PyCIFRW provides facilities for reading, manipulating and writing CIF and STAR files. In addition, CIF files and dictionaries may be validated against DDL1/2/m dictionaries.
(Note: these instructions refer to version 4.0 and higher. For older versions, see the documentation provided with those versions).
As of version 4.0, it is sufficient to install the PyCIFRW “wheel” using pip, for example:
pip install --use-wheel PyCifRW-4.2-cp27-none-linux_i686.whl
or using the platform independent source package found on PyPI:
pip install pycifrw
If you want to include PyCIFRW with your package, you can install the PyCIFRW wheel into your development environment and then bundle the contents of the CifFile directory found in the Python local libraries directory (usually site-packages).
If PyCIFRW has installed properly, the following command should complete without any errors:
import CifFile
CIF files are represented in PyCIFRW as CifFile objects. These objects behave identically to Python dictionaries, with some additional methods. CifFile objects can be created by calling the ReadCif function on a filename or URL:
from CifFile import ReadCif
cf = ReadCif("mycif.cif")
df = ReadCif("ftp://ftp.iucr.org/pub/cifdics/cifdic.register")
Errors are raised if CIF syntax/grammar violations are encountered in the input file or line length limits are exceeded.
A compiled extension (StarScan.so) is available in binary distributions which increases parsing speed by a factor of three or more. To use this facility, include the keyword argument scantype='flex' in ReadCif commands:
cf = ReadCif("mycif.cif",scantype="flex")
Binary distributions are generally only provided for the 'manylinux' target, but may also be generated from the source distribution for any platform if the appropriate compilers are available on that platform.
Alternatively, you may initialise a CifFile object with the URI:
cf = CifFile("mycif.cif",scantype="flex")
If your CIF file contains characters that are not encoded in UTF8 or ASCII, you may pass the 'permissive' option to ReadCif, which will try other encodings (currently only latin1). Use of this option is not encouraged.
There are three variations in CIF file syntax. An early, little-used version of the standard allowed non-quoted data strings to begin with square bracket characters ('['). This was disallowed in version 1.1 in order to reserve such usage for later developments. The recently introduced CIF2 standard adds list and table datastructures to CIF1. Detection of the appropriate CIF grammar is automatic, but potentially time-consuming for multiple files, so specification of the particular version to use is possible with the grammar keyword:
cf = ReadCif('oldcif.cif',grammar='1.0') #oldest CIF syntax
cf = ReadCif('normcif.cif',grammar='1.1') #widespread
cf = ReadCif('future.cif',grammar='2.0') #latest standard
cf = ReadCif('unknown.cif',grammar='auto') #try 2.0->1.1->1.0
Reading of STAR2 files is also possible by setting grammar='STAR2'. Currently, the default is set to 'auto'.
A new CifFile object is usually created empty:
from CifFile import CifFile
cf = CifFile()You will need to create at least one CifBlock object to hold your data. The CifBlock is then added to the CifFile using the usual Python dictionary notation. The dictionary 'key' becomes the blockname used for output.
from CifFile import CifBlock
myblock = CifBlock()
cf['a_block'] = myblockA CifBlock object may be initialised with another CifBlock, in which case a copy operation is performed.
Note that most operations on data provided by PyCIFRW involve CifBlock objects.
The simplest form of access is using standard Python square bracket notation. Data blocks and data names within each data block are referenced identically to normal Python dictionaries:
my_data = cf['a_data_block']['_a_data_name']All values read in are stored as strings 1, with CIF syntactical elements stripped, that is, no enclosing quotation marks or semicolons are included in the values. The value associated with a CifFile dictionary key is always a CifBlock object. All standard Python dictionary methods (e.g. get, update, items(), keys()) are available for both CifFile and CifBlock objects. Note also the convenience method first_block(), which will return the first datablock stored which is not necessarily the first datablock in the physical file:
my_data = cf.first_block()If a data name occurs in a loop, a list of values is returned for the value of that dataname - the next section describes ways to access looped data.
For the purpose of the examples, we use the following example CIF file:
data_testblock
loop_
_item_5
_item_7
_item_6
1 a 5
2 b 6
3 c 7
4 d 8
Any table can be interacted with in a column-based or a row-based way. A PyCIFRW CifBlock object provides column-based access using normal square bracket syntax as described above: for example cf['testblock']['_item_6'] will return ['5','6','7','8'].
The CifLoopBlock object represents a loop structure in the CIF file and facilitates row-based access. A CifLoopBlock object can be obtained by calling the CifBlock method GetLoop(dataname). Column-based access remains available for this object (e.g. keys() returns a list of datanames in the loop and square bracket notation returns a list of column values for that column).
A particular row can be selected using the CifLoopBlock GetKeyedPacket method:
>>> lb = cf['testblock'].GetLoop('_item_6')
>>> myrow = lb.GetKeyedPacket('_item_7','c')
>>> myrow._item_5
'3'In this example, the single packet with a value of 'c' for _item_7 is returned, and packet values can then be accessed using the dataname as an attribute of the packet. Note that a KeyError is raised if more than one packet matches, or no packets match, and that the packet returned is a copy of the data read in from the file, and therefore can be changed without affecting the CifBlock object.
You may also access the nth value in this CifLoopBlock object. 2, and values can be obtained from these packets as attributes.
>>> lb = cb.GetLoop("_item_5")
>>> lb[0]
['1', 'a', '5']
>>> lb[0]._item_7
'a'An alternative way of accessing loop data uses Python iterators, allowing the following syntax:
>>> for a in lb: print `a["_item_7"]`
'a' 'b' 'c' 'd' Note that in both the above examples the row packet is a copy of the looped data, and therefore changes to it will not silently alter the contents of the original CifFile object, unlike the lists returned when column-based access is used.
If many operations are going to be performed on a single data block, it is convenient to assign that block to a new variable:
cb = cf['my_block']A new data name and value may be added, or the value of an existing name changed, by straight assignment:
cb['_new_data_name'] = 4.5
cb['_old_data_name'] = 'cucumber'Old values are overwritten silently. Note that values may be strings or numbers.
To create a loop, simply set the column values to same-length lists, and then call the CifBlock method CreateLoop with a list of the looped datanames as a single argument. This method will raise an error if the datanames have different length columns assigned to them. For example, the following commands create the example loop above:
cb['_item_5'] = [1,2,3,4]
cb['_item_7'] = ['a','b','c','d']
cb['_item_6'] = [5,6,7,8]
cb.CreateLoop(['_item_5','_item_7','_item_6'])Another method, AddToLoop(dataname,newdata), adds columns in newdata to the pre-existing loop containing dataname, silently overwriting duplicate data. newdata should be a Python dictionary of dataname - datavalue pairs.
Note that lists (and other listlike objects except packets) returned by PyCIFRW actually point to the list currently inside the CifBlock object, and therefore any modification to them will modify the stored list. While this is often the desired behaviour, if you intend to manipulate such a list in other parts of your program while preserving the original CIF information, you should first copy the list to avoid destroying the loop structure:
mysym = cb['_symmetry_ops'][:]
mysym.append('x-1/2,y+1/2,z')Item (and block) order has no semantic significance in CIF files. However, the readability of CIF files in simple text editors leads to a desire to organise the output order for human readers. The ChangeItemOrder method allows the order in which data items appear in the printed file to be changed:
mycif['testblock'].ChangeItemOrder('_item_5',0)will move _item_5 to the beginning of the datablock. When changing the order inside a loop block, the loop block's method must be called i.e.:
aloop = mycif['testblock'].GetLoop('_loop_item_1')
aloop.ChangeItemOrder('_loop_item_1',4)Note also that the position of a loop within the file can be changed in this way as well, by passing the 'block number' object as the first argument. Each loop is assigned a simple integer number, which can be found by calling FindLoop with the name of a column in that loop:
loop_id = mycif['testblock'].FindLoop('_item_6')
mycif['testblock'].ChangeItemOrder(loop_id,0)will move the loop block to the beginning of the printed datablock.
While it is most efficient to add columns to the CifBlock and then bind them together once into a loop, it is possible to add a new row into an existing loop using the AddPacket(packet) method of CifLoopBlock objects:
aloop = mycif['testblock'].GetLoop('_item_7')
template = aloop.GetKeyedPacket('_item_7','d')
template._item_5 = '5'
template._item_7 = 'e'
template._item_6 = '9'
aloop.AddPacket(template)Note we use an existing packet as a template in this example. If you wish to create a packet from scratch, you should instantiate a StarPacket:
from CifFile import StarFile #installed with PyCIFRW
newpack = StarFile.StarPacket()
newpack._item_5 = '5'
...
aloop.AddPacket(newpack)Note that an error will be raised when calling AddPacket if the packet attributes do not exactly match the item names in the loop.
A packet may be removed using the RemoveKeyedPacket method, which chooses the packet to be removed based on the value of the given dataname:
aloop.RemoveKeyedPacket('_item_7','a')The CifFile method WriteOut returns a string which may be passed to an open file descriptor:
outfile = open("mycif.cif")
outfile.write(cf.WriteOut())Or the built-in Python str() function can be used:
outfile.write(str(cf))
WriteOut takes an optional keyword argument, comment, which should be a string containing a comment which will be placed at the top of the output file. This comment string must already contain # characters at the beginning of lines:
outfile.write(cf.WriteOut("#This is a test file"))Two additional keyword arguments control line length in the output file: wraplength and maxoutlength. Lines in the output file are guaranteed to be shorter than maxoutlength characters, and PyCIFRW will additionally insert a line break if putting two data values or a dataname/datavalue pair together on the same line would exceed wraplength. In other words, unless data values are longer than maxoutlength characters long, no line breaks will be inserted into those datavalues the output file. By default, wraplength = 80 and maxoutlength = 2048. Note that the CIF line folding protocol is used, which makes wrapping of long datavalues reversible.
These values may be set on a per block basis by calling the SetOutputLength method of the block.
The order of output of items within a CifFile or CifBlock is specified using the ChangeItemOrder method (see above). The default order is the order that items were first added in to the CifFile/CifBlock. Note that this order is not guaranteed to be the order in which they appear in the input file.
If you want precise control of the layout of your CIF file, you can pass a template file to the CifBlock.process_template method. A 'template' is a CIF file containing a single block, where the datanames are laid out in the way that the user desires. The layout elements that are picked up from this template are:
CifBlock)Constraints on the template:
loop_ and and datablock tokens should appear as the only non-blank characters on their linesAfter calling process_template with the template file as the argument, subsequent calls to WriteOut will respect the template information, and revert to default behaviour for any datanames that were not found in the template. Templating is most useful when formatting CIF dictionaries which are read heavily by human readers, and have many (thousands!) of datablocks, each containing the same limited number of datanames.
CIF files are output by default in CIF2 grammar, but with the CIF2-only triple quotes avoided unless explicitly requested through a template. Therefore, as long as CIF2-only datastructures (lists and tables) are absent, the output CIF files will conform to 1.0,1.1 and 2.0 grammar. The grammar of the output files can be changed by calling CifFile.set_grammar with the choices being 1.0,1.1,2.0 or STAR2.
The ValidCifFile class is deprecated and will be removed in a future version.
A program which uses PyCIFRW for validation, validate_cif.py, is included in the distribution in the Programs subdirectory. It will validate a CIF file (including dictionaries) against one or more dictionaries which may be specified by name and version or as a filename on the local disk. If name and version are specified, the IUCr canonical registry or a local registry is used to find the dictionary and download it if necessary.
python validate_cif.py [options] ciffile
--version show version number and exit
-h,--help print short help message
-d dirname directory to find/store dictionary files
-f dictname filename of locally-stored dictionary
-u version dictionary version to resolve using registry
-n name dictionary name to resolve using registry
-s store downloaded dictionary locally (default True)
-c fetch and use canonical registry from IUCr
-r registry location of registry as filename or URL
-t The file to be checked is itself a DDL2 dictionary
The source files are in a literate programming format (noweb) with file extension .nw. HTML documentation generated from these files and containing both code and copious comments is included in the downloaded package. Details of interpretation of the current standards as relates to validation can be found in these files.
This deviates from the current CIF standard, which mandates interpreting unquoted strings as numbers where possible and in the absence of dictionary definitions to the contrary (International Tables, Vol. G., p24).↩
Warning: row and column order in a CIF loop is arbitrary; while PyCIFRW currently maintains the row order seen in the input file, there is nothing in the CIF standards which mandates this behaviour, and later implementations may change this behaviour↩