# libcifpp
As the name implies, this library was originally written to work with mmCIF files
using C++ as the programming language. Its design leans heavily on the structure
of CIF files. These files can be thought of as a text dump of a relational
database with, often but not always, a very strict schema describing the data.
These schemas are called dictionaries.
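
To make the "text dump of a relational database" idea concrete, here is a minimal sketch that reads a single mmCIF `loop_` into a table of named columns and rows. The `Table` type and `parse_loop` function are hypothetical illustrations, not libcifpp's API, and they ignore quoting, multi-line text fields and everything else a real CIF parser has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type for illustration only, not libcifpp's cif::category:
// a category seen as a relational table with a name, columns and rows.
struct Table
{
    std::string name;                           // e.g. "atom_site"
    std::vector<std::string> items;             // column names, e.g. "label_atom_id"
    std::vector<std::vector<std::string>> rows; // one vector of values per row
};

// Parse one simple loop_ construct. Simplifying assumptions: values are
// separated by whitespace, there are no quoted strings or ';' text fields,
// and the text holds a single loop_ for a single category.
Table parse_loop(const std::string &text)
{
    std::istringstream in(text);
    std::string token;
    Table t;

    if (!(in >> token) || token != "loop_")
        return t;

    std::vector<std::string> values;
    while (in >> token)
    {
        if (token[0] == '_')
        {
            // "_atom_site.label_atom_id" -> category "atom_site", item "label_atom_id"
            auto dot = token.find('.');
            t.name = token.substr(1, dot - 1);
            t.items.push_back(token.substr(dot + 1));
        }
        else
            values.push_back(token);
    }

    // Every items.size() consecutive values form one row
    const std::size_t n = t.items.size();
    for (std::size_t i = 0; n > 0 && i + n <= values.size(); i += n)
        t.rows.emplace_back(values.begin() + i, values.begin() + i + n);

    return t;
}
```

Each item name plays the role of a column and each group of values a row, which is why categories map so naturally onto tables.
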

Using information from the content of an mmCIF file and an optional schema,
libcifpp allows you to access the data in the file as a collection of datablocks,
each containing a collection of categories with rows of data. The categories can
be searched using queries written in regular C++ syntax. When a dictionary is
specified, inserted data is checked for validity. Likewise, removing data may
result in cascaded removal of linked data in other categories, using the
parent/child relationship information.
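
The cascaded removal described above can be sketched with a toy model in which a datablock maps category names to rows, and parent/child links such as `entity.id` → `entity_poly.entity_id` are given as plain data. Everything here (`Row`, `Datablock`, `Link`, `erase_cascade`) is hypothetical and only illustrates the idea; it is not libcifpp's API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy model (hypothetical, not libcifpp's API): a datablock maps category
// names to rows, and a row maps item names to values.
using Row = std::map<std::string, std::string>;
using Datablock = std::map<std::string, std::vector<Row>>;

// A parent/child link, e.g. entity.id -> entity_poly.entity_id
struct Link
{
    std::string parent_category, parent_item;
    std::string child_category, child_item;
};

// Remove all rows in `category` matching `pred`, then recursively remove
// the rows in child categories that referred to the removed rows.
void erase_cascade(Datablock &db, const std::vector<Link> &links,
    const std::string &category, const std::function<bool(const Row &)> &pred)
{
    auto &rows = db[category];
    std::vector<Row> removed;
    for (auto i = rows.begin(); i != rows.end();)
    {
        if (pred(*i))
        {
            removed.push_back(*i);
            i = rows.erase(i);
        }
        else
            ++i;
    }

    for (const auto &link : links)
    {
        if (link.parent_category != category)
            continue;
        for (const auto &parent : removed)
        {
            auto p = parent.find(link.parent_item);
            if (p == parent.end())
                continue;
            const std::string value = p->second;
            // Child rows whose child_item equals the removed parent's key
            erase_cascade(db, links, link.child_category,
                [&](const Row &child) {
                    auto c = child.find(link.child_item);
                    return c != child.end() && c->second == value;
                });
        }
    }
}
```

Recursion stops naturally once no more rows match, so deeper chains of links (grandchildren and so on) are handled as well.
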

Since many programs still used the legacy PDB format when development started,
a layer was added that converts data between the PDB and mmCIF formats. This
means you can manipulate PDB files as if they were normal mmCIF files.
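
The PDB-to-mmCIF direction can be illustrated for a single coordinate record. PDB files use fixed columns, and an `ATOM` line maps naturally onto `atom_site` items. The sketch below is hypothetical and extracts only a handful of fields, using the column positions from the PDB format specification; a full conversion has to cover many more record types and items. It fills `auth_*` items, since the PDB atom name, residue name, chain ID and residue number correspond to those author-provided values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A few atom_site items filled from a PDB ATOM/HETATM record.
// Hypothetical helper for illustration, not libcifpp's PDB reader.
struct AtomSite
{
    std::string group_PDB;    // "ATOM" or "HETATM"
    int id;                   // atom serial number
    std::string auth_atom_id; // atom name
    std::string auth_comp_id; // residue name
    std::string auth_asym_id; // chain identifier
    int auth_seq_id;          // residue sequence number
    double x, y, z;           // Cartn_x, Cartn_y, Cartn_z
};

// Extract columns [first, last] (1-based and inclusive, as in the PDB
// format specification) and strip surrounding spaces.
std::string field(const std::string &line, std::size_t first, std::size_t last)
{
    if (line.size() < first)
        return {};
    std::string s = line.substr(first - 1, last - first + 1);
    auto b = s.find_first_not_of(' ');
    auto e = s.find_last_not_of(' ');
    return b == std::string::npos ? std::string{} : s.substr(b, e - b + 1);
}

AtomSite parse_atom(const std::string &line)
{
    AtomSite a;
    a.group_PDB = field(line, 1, 6);
    a.id = std::stoi(field(line, 7, 11));
    a.auth_atom_id = field(line, 13, 16);
    a.auth_comp_id = field(line, 18, 20);
    a.auth_asym_id = field(line, 22, 22);
    a.auth_seq_id = std::stoi(field(line, 23, 26));
    a.x = std::stod(field(line, 31, 38));
    a.y = std::stod(field(line, 39, 46));
    a.z = std::stod(field(line, 47, 54));
    return a;
}
```
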

Apart from this basic functionality, libcifpp also offers code to help with
symmetry calculations, 3D manipulations and obtaining information from the
[Chemical Component Dictionary (CCD)](https://www.wwpdb.org/data/ccd).
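
As a sketch of what a symmetry calculation involves: applying a crystallographic symmetry operation to a position in fractional coordinates amounts to x' = Rx + t, after which the result is moved back into the unit cell. The `SymOp` type and `apply` function are hypothetical and not libcifpp's symmetry API. The test uses the operator (-x, y+1/2, -z) of space group P 2<sub>1</sub>.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// A symmetry operation in fractional coordinates: x' = R x + t,
// with R an integer rotation matrix and t a translation vector.
struct SymOp
{
    std::array<std::array<int, 3>, 3> rot;
    Vec3 trn;
};

Vec3 apply(const SymOp &op, const Vec3 &x)
{
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
    {
        r[i] = op.trn[i];
        for (int j = 0; j < 3; ++j)
            r[i] += op.rot[i][j] * x[j];
        // Move the coordinate back into the unit cell, i.e. into [0, 1)
        r[i] -= std::floor(r[i]);
    }
    return r;
}
```

Working in fractional coordinates keeps the rotation part integral; converting to and from Cartesian coordinates needs the unit cell parameters and is a separate step.
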
## Documentation
The documentation can be found at [github.io](https://pdb-redo.github.io/libcifpp/).
## Synopsis
```cpp
// A simple program counting residues with an OXT atom

#include <filesystem>
#include <iostream>

#include <cif++.hpp>

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        std::cerr << "Usage: " << argv[0] << " <file>\n";
        return 1;
    }

    // Read file, can be PDB or mmCIF and can even be compressed with gzip.
    cif::file file = cif::pdb::read(argv[1]);

    if (file.empty())
    {
        std::cerr << "Empty file\n";
        return 1;
    }

    // Take the first datablock in the file
    auto &db = file.front();

    // Use the atom_site category
    auto &atom_site = db["atom_site"];

    // Count the atoms with atom-id "OXT"
    auto n = atom_site.count(cif::key("label_atom_id") == "OXT");

    std::cout << "File contains " << atom_site.size() << " atoms of which "
              << n << (n == 1 ? " is" : " are") << " OXT\n"
              << "residues with an OXT are:\n";

    // Loop over all atoms with atom-id "OXT" and print out some info.
    // That info is extracted using C++17 structured bindings.
    for (const auto &[asym, comp, seqnr] :
        atom_site.find<std::string, std::string, int>(
            cif::key("label_atom_id") == "OXT",
            "label_asym_id", "label_comp_id", "label_seq_id"))
    {
        std::cout << asym << ' ' << comp << ' ' << seqnr << '\n';
    }

    return 0;
}
```
## Installation
You might be able to install libcifpp using the package manager of your
OS distribution, but that package is most likely out of date. It is
therefore recommended to build *libcifpp* from source. This is not hard,
but please read the following instructions carefully.
### Requirements
The code for this library was written in C++17, so you need a recent
compiler to build it. During development, gcc >= 9.4 and clang >= 9.0
have been used, as well as MSVC 2019.
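
If you are not sure your compiler is recent enough, compiling a small translation unit that uses C++17 features such as structured bindings (used in the synopsis), `std::optional` and `std::filesystem` is a quick smoke test. The `find_by_extension` function below is a hypothetical example, not part of libcifpp; compile it with e.g. `-std=c++17`.

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Structured bindings, std::optional and std::filesystem are all C++17.
// If this compiles, the compiler supports these language and library features.
std::optional<std::string> find_by_extension(const std::map<std::string, fs::path> &files,
    const std::string &ext)
{
    for (const auto &[name, path] : files)
    {
        if (path.extension().string() == ext)
            return name;
    }
    return std::nullopt;
}
```
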

The other hard requirement is [CMake](https://cmake.org). The minimum
version is currently 3.16, but this may be raised soon. You should also
install a GUI version of CMake to make setting build options easier. On
Debian I prefer the curses version, installed with the package
`cmake-curses-gui`.

It is very useful to have [mrc](https://github.com/mhekkel/mrc) available.
However, this is only an option on Windows or on operating systems that
use the ELF executable format (e.g. Linux or FreeBSD). MRC is a resource
compiler that embeds data files into the executable, making them easier
to install.
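
Conceptually, a resource compiler turns data files into bytes that are linked into the executable, together with a table to look them up by name. The sketch below only illustrates that idea with tiny inline stand-ins; `Resource`, `kResources` and `load_resource` are hypothetical names, and mrc's real interface differs.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical embedded resource: a name and the file contents.
struct Resource
{
    std::string_view name;
    std::string_view data;
};

// In a real build these bytes would be generated from files such as the
// mmcif_pdbx dictionary; here they are tiny inline stand-ins.
constexpr Resource kResources[] = {
    {"mmcif_pdbx.dic", "data_mmcif_pdbx.dic\n"},
    {"components.cif", "data_ALA\n"},
};

// Returns an empty view when the resource is not embedded; a program
// without embedded resources would need to read an installed file instead.
std::string_view load_resource(std::string_view name)
{
    for (const auto &r : kResources)
    {
        if (r.name == name)
            return r.data;
    }
    return {};
}
```

Because the data travels inside the binary, there is nothing extra to install or locate at run time, which is the convenience the paragraph above refers to.
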

Other libraries you might want to install beforehand are:
- [libeigen](https://eigen.tuxfamily.org/index.php?title=Main_Page), a
  library for, among other things, matrix calculations. It can usually be
  installed using your package manager; on Debian/Ubuntu the package is
  called `libeigen3-dev`.
- [zlib](https://github.com/madler/zlib), the development version of this
  library. On Debian/Ubuntu this is the package `zlib1g-dev`.
- [pcre2](https://www.pcre.org/), the Perl Compatible Regular Expressions
  library. On Debian/Ubuntu this is the package `libpcre2-dev`.
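
zlib is presumably what lets libcifpp read the gzip-compressed files mentioned in the synopsis. Before decompressing, a reader has to find out whether a file is compressed at all: gzip data always starts with the magic bytes `0x1f 0x8b` (RFC 1952). The hypothetical `is_gzip` helper below sketches that check; it is not necessarily how libcifpp does it.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// gzip streams start with the two magic bytes 0x1f 0x8b (RFC 1952).
// Peek at them to decide whether to decompress (e.g. with zlib) before
// parsing. Hypothetical helper, not part of libcifpp.
bool is_gzip(std::istream &in)
{
    char magic[2] = {};
    in.read(magic, 2);
    bool result = in.gcount() == 2 &&
                  static_cast<unsigned char>(magic[0]) == 0x1f &&
                  static_cast<unsigned char>(magic[1]) == 0x8b;

    // Clear eof/fail flags and rewind so the caller can read from the start
    in.clear();
    in.seekg(0);
    return result;
}
```
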
### Building
First you need to download the code:
```console
git clone https://github.com/PDB-REDO/libcifpp.git
cd libcifpp
```
Start by deciding where to install libcifpp. If you have sufficient
permissions on your computer, the default location is probably fine, but
libcifpp can be installed anywhere, including e.g. *$HOME/.local*.

The next step is to configure the build using the CMake GUI application.
If you installed the curses version of CMake, run `ccmake`; on Windows you
can use `cmake-gui.exe`.
To install in the default location:
```console
ccmake -S . -B build
```
To install elsewhere, e.g. *$HOME/.local*:
```console
ccmake -S . -B build -DCMAKE_INSTALL_PREFIX=$HOME/.local
```
In the CMake window, run the configure command (use the button or press 'c').
After the first configure step you will see a list of options. Alter these
to match your preferences. Most options are self-explanatory and come with a
description, but some need a bit more explanation:
- `CIFPP_DATA_DIR`: this directory is used to store initial versions of the
  mmcif_pdbx dictionary as well as the optional CCD file.
- `CIFPP_DOWNLOAD_CCD`: the CCD file is huge and perhaps you think you don't
  need it. In that case you can leave this OFF, but that will limit the use
  cases.
- `CIFPP_INSTALL_UPDATE_SCRIPT`: the files in `CIFPP_DATA_DIR` quickly become
  out of date. On FreeBSD and Linux you can install a script that updates
  these files on a weekly basis.
- `CIFPP_CRON_DIR`: the directory where the update script is to be installed.
- `CIFPP_ETC_DIR`: the update script will only work if the file
  *libcifpp.conf* in this *etc* directory contains an uncommented line with
  ```console
  update=true
  ```
- `CIFPP_CACHE_DIR`: when the update script is installed and enabled, new
  files are written to this directory.
- `CIFPP_RECREATE_SYMOP_DATA`: if you have CCP4 sourced into your
  environment, this option allows you to recreate the symop data file.
- `BUILD_FOR_CCP4`: build a special version of libcifpp to be installed in
  the CCP4 environment.
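
The rule for *libcifpp.conf* (an uncommented `update=true` line) can be illustrated in C++ as below. This is only a sketch of that rule: `update_enabled` is a hypothetical name, the assumption that `#` starts a comment is ours, and the real update script may treat whitespace, comments and other settings differently.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns true when the configuration contains an uncommented line
// "update=true". Assumptions: '#' starts a comment, and leading and
// trailing whitespace on a line is ignored.
bool update_enabled(std::istream &conf)
{
    std::string line;
    while (std::getline(conf, line))
    {
        // Strip everything from the first '#' onwards
        if (auto hash = line.find('#'); hash != std::string::npos)
            line.erase(hash);

        // Trim whitespace; skip lines that are now empty
        auto b = line.find_first_not_of(" \t\r");
        if (b == std::string::npos)
            continue;
        auto e = line.find_last_not_of(" \t\r");

        if (line.substr(b, e - b + 1) == "update=true")
            return true;
    }
    return false;
}
```
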

After setting these options, run the configure step again and then use the
generate command (press 'g' in `ccmake`) to create the build files.
Building and installing is then as simple as:
```console
cmake --build build
cmake --install build
```
If this fails due to lack of permissions, you can try:
```console
sudo cmake --install build
```
Tests are created by default, and to test the code you can run:
```console
ctest --test-dir build
```