mol22lt.py
===========
*mol22lt.py* is a program for converting MOL2 files
into moltemplate (LT) file format.
## *WARNING: BETA SOFTWARE. THIS SOFTWARE IS EXPERIMENTAL AS OF 2024-12-05*
## Usage:
```
mol22lt.py \
--in FILE.MOL2 \
--out FILE.LT \
[--name MOLECULE_NAME] \
[--charges charges.txt] \
[--ff FORCE_FIELD_NAME] \
[--ff-file FORCE_FIELD_FILE_NAME]
```
## Example:
Convert polyphenylene sulfide (PPS) polymer
(stored in a file named "PPS_5mer.mol2")
into moltemplate format:
```
mol22lt.py \
--in PPS_5mer.mol2 \
--out PPS_5mer.lt \
--name PPS5 \
--ff GAFF2 \
--ff-file "gaff2.lt"
```
Later on, you can use the "PPS_5mer.lt" file we just created
by importing it from another file (usually "system.lt").
Here is an example "system.lt" file that does this:
```
import "PPS_5mer.lt"
pps5_copy = new PPS5 # (instantiate a single copy of the "PPS5" polymer)
```
To make multiple copies of "PPS5", you could use:
```
import "PPS_5mer.lt"
pps5_copy1 = new PPS5.move(-24.7, -3.9, -4.3)
pps5_copy2 = new PPS5.move(-21.3, 1.9, 0.7)
```
To prepare a LAMMPS simulation, we would enter this command into the terminal:
```
moltemplate.sh system.lt
```
*(Once defined, molecules (like "PPS5") can be customized
and combined with (bonded to) other molecules, as demonstrated in the
[moltemplate manual](https://moltemplate.org/doc/moltemplate_manual.pdf#section.9).)*
## *WARNING: THIS SOFTWARE DOES NOT WORK WITH MULTIPLE CHAINS*
This software does not work with MOL2 files containing multiple "chains".
*("Chains" are optional features located in the
[SUBSTRUCTURE section of some MOL2 files](http://chemyang.ccnu.edu.cn/ccb/server/AIMMS/mol2.pdf).)*
However, there is a manual workaround.
([See below](#working-with-multiple-chains).)
## Details
The [MOL2 file format](https://zhanggroup.org/DockRMSD/mol2.pdf)
is a versatile file format generated by many popular molecular simulation software
tools (including AmberTools, Gaussian, OpenBabel, and the
[RED-server](https://upjv.q4md-forcefieldtools.org)).
This program will extract the following information from a MOL2 file,
converting the result to a moltemplate LT file
(using the "full" atom-style).
- charge (column 9 of the ATOM section)
- atom-names (column 2 of the ATOM section)
- XYZ coordinates (columns 3,4,5 of the ATOM section)
- atom-type (column 6 of the ATOM section)
- subunit-id (column 7 of the ATOM section)
- subunit-name (column 8 of the ATOM section)
- bonds (columns 2 and 3 from the BOND section)
This program will *IGNORE* the following information in a MOL2 file:
- *any information* ***not*** *contained in the ATOM or BOND sections*
- atom id (column 1 from the ATOM section)
- bond id (column 1 from the BOND section)
- bond type (column 4 from the BOND section)
- "chain" (subunit/substructure ID numbers *are* considered, but not the "chain")
- status bits (columns 10 and 5 from the ATOM and BOND sections, respectively)
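The column layout listed above can be illustrated with a short Python sketch. This is not the parser used by *mol22lt.py*. It is a minimal illustration that assumes whitespace-separated columns and that all nine ATOM columns are present. The `Atom` class, the `parse_mol2` function, and the sample data (including the charges) are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str          # column 2 of the ATOM section
    x: float           # columns 3, 4, 5
    y: float
    z: float
    atom_type: str     # column 6
    subunit_id: int    # column 7
    subunit_name: str  # column 8
    charge: float      # column 9


def parse_mol2(text):
    """Return (atoms, bonds) read from the ATOM and BOND sections only."""
    atoms, bonds = [], []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
            continue
        cols = line.split()
        if section == "ATOM" and len(cols) >= 9:
            atoms.append(Atom(cols[1], float(cols[2]), float(cols[3]),
                              float(cols[4]), cols[5], int(cols[6]),
                              cols[7], float(cols[8])))
        elif section == "BOND" and len(cols) >= 3:
            # Columns 2 and 3 identify the two bonded atoms.
            # Column 1 (bond id) and column 4 (bond type) are skipped.
            bonds.append((int(cols[1]), int(cols[2])))
    return atoms, bonds


# Made-up two-atom fragment, used only to exercise the sketch.
example = """\
@<TRIPOS>MOLECULE
demo
@<TRIPOS>ATOM
1 C1 0.000 0.000 0.000 ca 1 PPS -0.10
2 S1 1.800 0.000 0.000 ss 1 PPS 0.10
@<TRIPOS>BOND
1 1 2 1
"""
atoms, bonds = parse_mol2(example)
print(atoms[0].name, atoms[0].atom_type, bonds)
```

Everything outside the ATOM and BOND sections (here, the MOLECULE section) is skipped, mirroring the list of ignored information.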
If the MOL2 file contains multiple subunits, a new molecule-object
definition will be created for each subunit.
In that case, if you want the entire system to be stored in a single
molecule definition, use the *--name* argument. (See below.)
#### MOL2 file format requirements
- The *atom-names* (2nd column) must be unique
within each molecular subunit.
- All of the atom-ID numbers and subunit-ID numbers
in the file must be unique and begin at 1
(although the order can vary).
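Before converting a file, these requirements can be checked with a few lines of Python. The sketch below is only one possible reading of the rules: it treats "begin at 1" as "the smallest ID is 1", which is an assumption. The function names are hypothetical and are not part of *mol22lt.py*.

```python
from collections import Counter


def duplicate_atom_names(atoms):
    """atoms: iterable of (subunit_id, atom_name) pairs.

    Return the pairs that occur more than once, i.e. atom names
    that are not unique within their subunit.
    """
    counts = Counter(atoms)
    return sorted(pair for pair, n in counts.items() if n > 1)


def ids_are_valid(ids):
    """True if the IDs are unique and the smallest one is 1 (any order)."""
    ids = list(ids)
    return len(set(ids)) == len(ids) and min(ids, default=1) == 1


# The name "C1" may be reused in a different subunit, but not in the same one.
print(duplicate_atom_names([(1, "C1"), (1, "S1"), (2, "C1")]))  # no duplicates
print(duplicate_atom_names([(1, "C1"), (1, "C1")]))             # one duplicate
print(ids_are_valid([3, 1, 2]))
```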
### Force Fields
The atom type names (column 6 of the MOL2 file)
may correspond to atom types used by
popular force-fields (such as AMBER GAFF or GAFF2).
If you want to use these force fields in your simulations,
you must let moltemplate know the name of the force field and the file
that stores the force field parameters using the *--ff* and *--ff-file*
arguments. *(Example: "--ff GAFF2 --ff-file gaff2.lt")*
### Molecular Subunits
LT files are typically used to store (one or more) molecule type definitions
(or monomers or other types of molecular subunits).
The LT files generated by *mol22lt.py* contain definitions of all of the
molecules or molecular subunits (a.k.a. "substructures")
defined in the MOL2 file.
Again, if you want the entire system to be stored in a single
molecule definition, use the *--name* argument.
#### Redundant Subunits
If the MOL2 file contains multiple identical types of molecules
or molecular subunits, the resulting LT file will contain multiple
redundant definitions of the same molecular subunits
(but with different atomic coordinates).
This won't cause any problems (other than larger LT files).
*(If, for some reason, the user wants to avoid redefining the
same types of molecules or molecular subunits,
they should supply a MOL2 file containing only
a single copy of that molecule or subunit.
Later they can use moltemplate's "new", ".move()", and ".rot()" commands to
instantiate multiple copies of the molecular subunit at those positions
instead of redefining it.)*
### Centering the molecule(s)
*mol22lt.py* ignores the "CENTROID" and "CENTER_OF_MASS"
sections of the MOL2 file.
Instead, each molecular subunit (or the entire molecule) can be manually
recentered or rotated by editing the LT file generated by this
program and appending a line containing a sequence of *.move()* and/or *.rot()*
commands to correct the position.
In the example above, if the "PPS5" polymer is centered at
(24.7,3.9,4.3), we could append this line
to the end of the "PPS_5mer.lt" file to recenter it:
```
PPS5.move(-24.7, -3.9, -4.3)
```
This will modify the definition of the "PPS5" molecule,
adding (-24.7, -3.9, -4.3) to the coordinates of all the atoms in the molecule
(before it is copied/instantiated using the "new" command).
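If you do not already know where the molecule is centered, the offset can be computed from the atomic coordinates. The sketch below is an illustration only: it uses the unweighted geometric center (not the center of mass), and the `recentering_command` helper and the two sample coordinates are hypothetical.

```python
def recentering_command(mol_name, coords):
    """Return a ".move()" line that shifts the geometric center to the origin.

    coords: non-empty list of (x, y, z) tuples.
    """
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    # Adding 0.0 turns an exact -0.0 into 0.0 so it is not printed as "-0.0".
    return "{}.move({:.1f}, {:.1f}, {:.1f})".format(
        mol_name, -cx + 0.0, -cy + 0.0, -cz + 0.0)


# Two made-up atoms whose geometric center is (24.7, 3.9, 4.3).
coords = [(24.0, 3.0, 4.0), (25.4, 4.8, 4.6)]
print(recentering_command("PPS5", coords))
```

The printed line can then be appended to the end of the LT file, as described above.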
## Arguments
### --in FILE.mol2
Specify the name of the MOL2 file you want to convert.
*(If omitted, the terminal (stdin) is used by default.)*
### --out FILE.lt
Specify the name of the moltemplate file (LT file) you want to create.
*(If omitted, the terminal (stdout) is used by default.)*
## Optional Arguments
### --charges CHARGES.txt
By default *mol22lt.py* will read the charges from the MOL2 file (if present).
But if the charges in the MOL2 file are absent or incorrect,
you can also customize them by supplying a file containing
the correct charges using the *--charges* argument.
This is a one-column text file containing one number per line.
*(Comments following '#' characters are allowed.)*
The charges in this file must appear in the same order as the
atom-ID numbers in the first column of the MOL2 file.
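To make the file format concrete, here is a minimal Python sketch that reads such a file. It is not the code used by *mol22lt.py*. It assumes that blank lines and comment-only lines are skipped, and the `read_charges` function and the sample values are hypothetical.

```python
def read_charges(lines):
    """Read one charge per line, ignoring text after '#' and empty lines."""
    charges = []
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if text:
            charges.append(float(text))
    return charges


# Made-up contents of a "charges.txt" file for a two-atom molecule.
sample = [
    "# charges listed in atom-ID order",
    "-0.10   # atom 1",
    "",
    "0.10    # atom 2",
]
print(read_charges(sample))
```

The position of each number in the resulting list corresponds to the atom-ID order of the MOL2 file, so the list should contain one entry per atom.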
### --name MOLECULE_NAME
By default *mol22lt.py* will treat each molecular subunit
(a.k.a. "substructure") in the MOL2 file as an independent molecule.
If there are bonds connecting them together, they will be included,
however each molecular subunit will have a different molecule name.
*(And the atoms in different subunits will be assigned to
different molecule-ID numbers.)*
This is inconvenient.
If you later want to create multiple copies of the entire molecule (polymer), you
will have to copy each of the molecular subunits that it is built from.
The *--name* argument allows you to group everything together in
a single molecule definition. Later on, you can refer to this entire
compound molecule using the *MOLECULE_NAME* you gave it.
*(And all of the atoms in the entire file will share the same molecule-ID.)*
This is useful if you plan to use this molecule as a building block for
creating larger simulations.
*Note:* There is no need to use the *--name* argument
if your MOL2 file only contains a single molecular subunit definition.
This argument was intended for use with more complex molecules
that contain multiple subunits, such as polymers.
### --ff FORCE_FIELD
If the molecules are associated with a particular force field (such as GAFF2),
the user can specify that using this argument (e.g. "--ff GAFF2").
The atom type names in the MOL2 file will be used to look up the force field
parameters from that force field.
*(You should probably also specify the name of the file containing
that force field using the --ff-file argument.)*
### --ff-file FORCE_FIELD_FILE
This will add a line to the beginning of the LT file generated by this program
telling moltemplate to load a file.
(Typically this file contains atom type definitions and force field parameters.)
In the example above, if you are using the GAFF2 force field, you would use
*"--ff-file gaff2.lt"*. (The "gaff2.lt" stores the GAFF2 parameters.)
### --upper-case-types
This will force all of the atom *type* names to use upper-case letters.
*(This is useful for fixing some force-field specific format errors.)*
### --lower-case-types
This will force all of the atom *type* names to use lower-case letters.
*(This is useful for fixing some force-field specific format errors.)*
### --upper-case-names
This will force all of the atom names to use upper-case letters.
### --lower-case-names
This will force all of the atom names to use lower-case letters.
*(Note that atom names are used to identify atoms in bonds.
They are not used to lookup force-field information.
Make sure they remain uniquely named, even after changing capitalization.)*
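One way to check for name clashes before changing capitalization is sketched below in Python. The `case_collisions` helper is hypothetical and not part of *mol22lt.py*. It assumes you pass it the atom names from a single molecular subunit.

```python
from collections import defaultdict


def case_collisions(names, transform=str.lower):
    """Group names that become identical after applying `transform`.

    Returns a dict mapping each transformed name to the sorted list of
    distinct original names that collapse onto it (collisions only).
    """
    groups = defaultdict(set)
    for name in names:
        groups[transform(name)].add(name)
    return {key: sorted(group) for key, group in groups.items()
            if len(group) > 1}


# "CA" and "Ca" are distinct names, but not after lower-casing.
print(case_collisions(["CA", "Ca", "N"]))
print(case_collisions(["CA", "Ca", "N"], transform=str.upper))
```

An empty result means the names stay unique under the chosen capitalization.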
## Working with multiple chains
If your MOL2 file contains multiple chains,
split it into multiple MOL2 files (one per chain).
Then convert each file separately.
Afterwards, if you want to define a large molecular complex
(such as a protein with quaternary structure),
you can use moltemplate to define a large molecule composed of
multiple chain subunits. For example, suppose we have a .mol2 file containing
two chains. If we split that file into two files ("chainA.mol2", "chainB.mol2"),
we can create two .lt files, one for each chain:
```
mol22lt.py --in chainA.mol2 --out chainA.lt --name ChainA --ff GAFF2 --ff-file "gaff2.lt"
mol22lt.py --in chainB.mol2 --out chainB.lt --name ChainB --ff GAFF2 --ff-file "gaff2.lt"
```
Then we can manually create a new .lt file
(e.g. "protein_with_2_chains.lt")
defining a molecular complex containing two chains:
```
import "chainA.lt" # Defines "ChainA"
import "chainB.lt" # Defines "ChainB"
ProteinWith2Chains {
  a = new ChainA
  b = new ChainB
}
```
And then (in our "system.lt" file) we can instantiate that
complex this way (for example):
```
protein1 = new ProteinWith2Chains
```
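If there are many chains, the per-chain conversion commands can be generated with a short script. The Python sketch below only assembles the command lines using the arguments documented in this file. The `mol22lt_command` helper is hypothetical, and the file names follow the two-chain example above.

```python
import shlex


def mol22lt_command(mol2_path, lt_path, name=None, ff=None, ff_file=None):
    """Build the argument list for one mol22lt.py conversion."""
    cmd = ["mol22lt.py", "--in", mol2_path, "--out", lt_path]
    if name:
        cmd += ["--name", name]
    if ff:
        cmd += ["--ff", ff]
    if ff_file:
        cmd += ["--ff-file", ff_file]
    return cmd


for chain in ("A", "B"):
    cmd = mol22lt_command("chain{}.mol2".format(chain),
                          "chain{}.lt".format(chain),
                          name="Chain{}".format(chain),
                          ff="GAFF2",
                          ff_file="gaff2.lt")
    # Print the command so it can be inspected or pasted into a terminal.
    print(shlex.join(cmd))
```

To run the conversions directly instead of printing them, each argument list could be passed to `subprocess.run(cmd, check=True)`, assuming *mol22lt.py* is available on your PATH.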
## Python API
It is possible to access the functionality of *mol22lt.py* from
within python. Example:
```python
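# Assumes ConvertMol22Lt is defined in the "mol22lt" module of the
# moltemplate package (the module behind the mol22lt.py command).
from moltemplate.mol22lt import ConvertMol22Lt

# Open the MOL2 file to convert and the moltemplate (LT) file to create.
# The "with" statement closes both files when the conversion finishes.
with open('PPS_5mer.mol2', 'r') as fMol2, open('PPS_5mer.lt', 'w') as fLT:
    # Write the contents of the new file
    ConvertMol22Lt(fMol2,
                   fLT,
                   ff_name='GAFF2',      # <-- optional argument (force field)
                   ff_file='gaff2.lt',   # <-- optional argument (ff file)
                   object_name='PPS5')   # <-- optional argument (molecule name)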
```