McUtils.GaussianInterface
A module for making use of the results of calculations run by the Gaussian electronic structure package. We’d like to be able to also support the NWChem and Psi4 packages, but haven’t had the time, yet, to write it out.
Two main avenues of support are provided:
- importing Gaussian results
- setting up Gaussian jobs
The first is likely to be more useful to you, but we’re hoping to be able to hook (2.) into the Psience.Molecools
package.
The goal there is to provide automated support for setting up scans of molecular vibrations & the like.
There are already direct hooks into (1.) in Psience.Data
through the DipoleSurface
and PotentialSurface
objects.
These are still in the prototype stage, but hopefully will allow us to unify strands of our Gaussian support,
and also make it easy to unify support for Psi4 and NWChem data, once we have the basic interface down.
Members
Examples
FChk Parsing
Gaussian .fchk
files have a set structure which looks roughly like
key data_type data_size
data
This allows us to provide a complete parser for any key
.
The actual parser is a subclass of [Parsers.FileStreamReader]
(../Parsers/) called GaussianFchkReader
.
The syntax to parse is straightforward
target_keys = {"Current cartesian coordinates", "Numerical dipole derivatives"}
with GaussianFchkReader("/path/to/output.log") as parser:
res = parser.parse(target_keys)
and to access properties you will pull them from the dict, res
my_coords = res["Current cartesian coordinates"]
Log Parsing
Gaussian .log
files are totally unstructured (and a bit of a disaster).
This means we need to write custom parsing logic for every field we might want.
The basic supported formats are defined in GaussianLogComponents.py
.
The actual parser is a subclass of Parsers.FileStreamReader
called GaussianLogReader
.
The syntax to parse is straightforward
target_keys = {"StandardCartesianCoordinates", "DipoleMoments"}
with GaussianLogReader("/path/to/output.log") as parser:
res = parser.parse(target_keys)
and to access properties you will pull them from the dict, res
my_coords = res["StandardCartesianCoordinates"]
Adding New Parsing Fields
New parse fields can be added by registering a property on GaussianLogComponents
.
Each field is defined as a dict like
GaussianLogComponents["Name"] = {
"description" : string, # used for docmenting what we have
"tag_start" : start_tag, # starting delimeter for a block
"tag_end" : end_tag, # ending delimiter for a block None means apply the parser upon tag_start
"parser" : parser, # function that'll parse the returned list of blocks (for "List") or block (for "Single")
"mode" : mode # "List" or "Single"
}
The mode
argument specifies whether all blocks should be matched first and send to the parser
("List"
) or if they should be fed in one-by-one "Single"
.
This often provides a tradeoff between parsing efficiency and memory efficiency.
The parser
can be any function, but commonly is built off of a Parsers.StringParser
.
See the documentation for StringParser
for more.
You can add to GaussianLogComponents
at runtime.
Not all changes need to be integrated directly into the file.
Creating Parsers
As a concrete example, for a simple property the coordinates, there is a series of blocks each of which looks like
---------------------------------------------------------------------
Center Atomic Atomic Coordinates (Angstroms)
Number Number Type X Y Z
---------------------------------------------------------------------
1 8 0 0.000000 0.000000 0.000000
2 1 0 0.000000 0.000000 0.565929
3 1 0 0.549420 0.000000 -0.135695
---------------------------------------------------------------------
and the corresponding StringParser
looks like
You don’t need to use a StringParser
.
It can just be convenient for declaring complicated patterns in a way that can be
parsed efficiently.
header_pattern = Named(
Repeating(
Capturing(PositiveInteger),
min=3, max=3,
prefix=Optional(Whitespace),
suffix=Whitespace
),
"GaussianStuff", handler=StringParser.array_handler(dtype=int)
)
coords_pattern = Named(
Repeating(
Capturing(Number),
min=3,
max=3,
prefix=Optional(Whitespace),
joiner=Whitespace
),
"Coordinates", handler=StringParser.array_handler(dtype=float)
)
CartParser = StringParser(
Repeating((header_pattern, coords_pattern), suffix=Optional(Newline))
)
There’s quite a bit going on here, but each bit is straightforward.
First off, we note that in the data we actually want, Gaussian puts three ints before each x y z
triple.
We capture this by the header_pattern
, which is given the name "GaussianStuff"
and
which declares a set of exactly three repeating positive integers that we want to capture in the output,
joined and ended by whitespace.
We want to then take these and save them as an array of ints (the handler
field)
Second, we have a named "Coordinates"
block which is identical to the header_pattern
except
we capture plain Numbers
(i.e. positive or negative floats) and save the results as an array of floats.
Then we string these two patterns together in a repeating block joined by newlines.
Next, we need to tell the system how to identify these blocks. To do so we need a start and end tag.
For that, we declare the patterns
cart_delim = """ --------------------------------------------------------------"""
cartesian_start_tag = FileStreamerTag(
"""Center Atomic Atomic Coordinates (Angstroms)""",
follow_ups=cart_delim
)
cartesian_end_tag = cart_delim
The cartesian_start_tag
tells the code where to start looking. It first finds the line
Center Atomic Atomic Coordinates (Angstroms)
and then follows that up by looking for cart_delim
, as specified by follow_ups
.
Finally, we declare the actual parser function we’ll use which will take the list of identified blocks
and apply the StringParser
we defined like so
def cartesian_coordinates_parser(strs):
strss = "\n\n".join(strs)
parse = CartParser.parse_all(strss)
coords = (
parse["GaussianStuff", 0],
parse["Coordinates"].array
)
return coords
where this returns each array of coordinates and the first match of the GaussianStuff
(we assume they’re all the same).
A regular function that uses string.split
and friends would have worked here too.
It would have just been a little bit more involved.
A regex would also work, and is what StringPattern
uses under the hood.
Finally, we register the parser as
GaussianLogComponents["CartesianCoordinates"] = {
"tag_start": cartesian_start_tag,
"tag_end" : cartesian_end_tag,
"parser" : cartesian_coordinates_parser,
"mode" : "List"
}
There are few different coordinate orientations that Gaussian uses, and we get the different orientations by matching different tags. For instance, we get the Z-matrix oriented coordinates like so
GaussianLogComponents["ZMatCartesianCoordinates"] = {
"tag_start": FileStreamerTag('''Z-Matrix orientation:''', follow_ups = (cart_delim, cart_delim)),
"tag_end" : cartesian_end_tag,
"parser" : cartesian_coordinates_parser,
"mode" : "List"
}
where the reader now initially looks for the tag Z-Matrix orientation:
and then skips over the delimiters twice.
Dealing with Optimizations
We can use the same type of trick to deal with parameters coming from optimizations. For instance, to get the dipole moments from a relaxed scan, we do the following
tag_start = " Dipole ="
tag_end = " Optimization"
def convert_D_number(a, **kw):
import numpy as np
return np.array([float(s.replace("D", "E")) for s in a])
DNumberPattern = RegexPattern((Number, "D", Integer), dtype=float)
OptimizedDipolesParser = StringParser(
RegexPattern(
(
"Dipole", "=",
Repeating(
Capturing(DNumberPattern, handler=convert_D_number),
min=3,
max=3,
suffix=Optional(Whitespace)
)
),
joiner=Whitespace
)
)
def parser(mom):
"""Parses dipole block, but only saves the dipole of the optimized structure"""
import numpy as np
mom = "Dipole =" + mom
# print(">>>>>", mom)
grps = OptimizedDipolesParser.parse_iter(mom)
match = None
for match in grps:
pass
if match is None:
return np.array([])
return match.value.array
# else:
# grp = match.value
# dip_list = [x.replace("D", "E") for x in grp]
# dip_array = np.asarray(dip_list)
# return dip_array.astype("float64")
mode = "List"
parse_mode = "Single"
GaussianLogComponents["OptimizedDipoleMoments"] = {
"tag_start" : tag_start,
"tag_end" : tag_end,
"parser" : parser,
"mode" : mode,
"parse_mode": parse_mode
}
The StringParser
is not much more sophisticated than the one for the Cartesian coordinates, with the caveat that
we need to convert D
to E
to allow python to recognize the floats.
The conversion could have been made more efficient, but it is perhaps edifying to see what a simplistic handler
looks like.
The tag_start
simply looks for a mention of Dipole
and the tag_end
looks for the Optimization
tag.
This will identify the block between when the dipole moment is first printed and the optimization completes.
As the dipole moment is expected to be printed multiple times in this block, we then need to skip to the
final time the dipole moment is printed and only use that value.
We handle this in the parser
function by first adding the tag_start
back onto the matched string
and then using OptimizedDipolesParser.parse_iter
to create an iterator that allows us to loop over the
blocks that match the pattern declared above. We do nothing for all of these matches, just exhausting the iterator.
Once the iterator has been run dry, the value of match
is the final match found by the OptimizedDipolesParser
.
If no matches were found, the value of match
defaults to None
and we do nothing.
Finally, we allow the match
to actually parse out the dipole moment by accessing its value
attribute, and
then we return the array
from that which contains the matched dipole moment.
GJF Setup
Support is also provided for the automatic generation of Gaussian job files (.gjf
) through the GaussianJob
class.