Version 2.8.0 (2022-10-29)
Support for undecahectane/undecadictane (previously only hendeca was supported)
Support for dicarboximido
Improved support for lysergic acid derivatives
Added a few more sugars e.g. digitalose
Added borodeuteride and hydro contractions of pharmaceutical salts e.g. hydromethanesulfonate
Support substitution on glyceric acid
Corrected interpretation of imidazolium, trioxane and phthalhydrazide
Version 2.7.0 (2022-08-16)
Improved coverage of flavonoid parent structures
Support for apiofuranosyl, added 5 locant to apiose
Improved support for n-amyl
Superscripted numbers in poly spiro systems are now intelligently determined if the input lacks superscript indication
Support for annulynes
Fixed issues where amino acid salts were being interpreted as functionalisation of the amino acid
Fixed bug where annulene parsing was case sensitive
Chalcone, in accordance with current IUPAC recommendations, is now interpreted as specifically the trans isomer
Minor dependency updates
Version 2.6.0 (2021-12-21)
OPSIN now requires Java 8 (or higher)
OPSIN command-line functionality moved to opsin-cli module
OPSIN standalone jars are now built with mvn package
Updated from InChI 1.03 to InChI 1.06
Support for capturing relative/racemic stereochemistry (output via CxSmiles) [contributed by John Mayfield]
Support for deaza/dethia
Support nitrile as a suffix on amino acids [contributed by John Mayfield]
Support more glycero-n-phospho substituents
Support for chloroxime and other haloximes
Support cis/trans on rings where a stereocenter has two non-hydrogen substituents, using Cahn-Ingold-Prelog rules to determine which are relative
Multiple improvements to implicit bracketing logic
Corrected interpretation of methylselenopyruvate
Added group 1/2 nitrides e.g. magnesium nitride
Added molecular diatomics e.g. molecular hydrogen (or dihydrogen)
Fixed out of memory error if a fusion bracket referenced an interior atom instead of a peripheral atom
Fixed out of memory error while parsing very long ambiguous input, by switching parsing algorithm from breadth-first to depth-first
Dependency changes:
Updated logging from Log4J v1.2.17 to the latest Log4J2 (v2.17.0). Neither OPSIN 2.5.0 nor 2.6.0 is vulnerable to Log4Shell. The logging implementation is only included in the opsin-cli module
opsin-inchi now uses JNA-InChI (https://github.com/dan2097/jna-inchi) rather than JNI-InChI. This supports the latest version of InChI and also supports new Macs with ARM64 processors
Woodstox now uses groupid com.fasterxml.woodstox (the groupid change did not signify a break in API compatibility)
dk.brics.automaton now uses groupid dk.brics (the groupid change did not signify a break in API compatibility)
commons-cli is only used by the opsin-cli module
Version 2.5.0 (2020-10-04)
OPSIN now requires Java 7 (or higher)
Support for traditional oxidation state names e.g. ferric
Added support for defining the stereochemistry of phosphines/arsines
Added newly discovered elements
Improved algorithm for correctly interpreting ester names with a missing space e.g. 3-aminophenyl-4-aminobenzenesulfonate
Fixed structure of canavanine
Corrected interpretation of silver oxide
Vocabulary improvements
Minor improvements/bug fixes
Internal XML Changes:
tokenList files now all use the same schema (tokenLists.dtd)
Version 2.4.0 (2018-12-23)
OPSIN is now licensed under the MIT License
Locant labels included in extended SMILES output
Command-line now has a name flag to include the input name in SMILES/InChI output (tab delimited)
Added support for carotenoids
Added support for Vitamin B-6 related compounds
Added support for more fused ring system bridge prefixes
Added support for anilide as a functional replacement group
Allow heteroatom replacement as a detachable prefix e.g. 3,6,9-triaza-2-(4-phenylbutyl)undecanoic acid
Support Boughton system isotopic suffixes for 13C/14C/15N/17O/18O
Support salts of acids in CAS inverted names
Improved support for implicitly positively charged purine nucleosides/nucleotides
Added various biochemical groups/substituents
Improved logic for determining intended substitution in names with too few brackets
Incorrectly capitalized locants can now be used to reference ring fusion atoms
Some names no longer allow substitution e.g. water, hydrochloride
Many minor precision/recall improvements
Version 2.3.1 (2017-07-23)
Fixed fused ring numbering algorithm incorrectly numbering some ortho- and peri-fused systems involving 7-membered rings
Support P-thio to indicate thiophosphate linkage
Count of isotopic replacements no longer required if locants given
Fixed bug where CIP algorithm could assign priorities to identical substituents
Fixed "DL" before a substituent not assigning the substituted alpha-carbon as racemic stereo
L-stereochemistry no longer assumed on semi-systematic glycine derivatives e.g. phenylglycine
Fixed some cases where substituents like carbonyl should have been part of an implicitly bracketed section
Fixed interpretation of leucinic acid and 3/4/5-pyrazolone
Version 2.3.0 (2017-02-23)
D/L stereochemistry can now be assigned algorithmically e.g. L-2-aminobutyric acid
Other minor improvements to amino acid support e.g. homoproline added
Extended SMILES added to command-line interface
Names intended to include the triiodide/tribromide anion no longer erroneously have three monohalides
Ambiguity detected when applying unlocanted subtractive prefixes
Better support for adjacent multipliers e.g. ditrifluoroacetic acid
deoxynucleosides are now implicitly 2'-deoxynucleosides
Added support for <number> as a syntax for a superscripted number
Added support for amidrazones
Aluminium hydrides/chlorides/bromides/iodides are now covalently bonded
Fixed names with isotopes less than 10 not being supported
Fixed interpretation of some trivial names that clash with systematic names
Version 2.2.0 (2016-10-16)
Added support for IUPAC system for isotope specification e.g. (3-14C,2,2-2H2)butane
Added support for specifying deuteration using the Boughton system e.g. butane-2,2-d2
Added support for multiplied bridges e.g. 1,2:3,4-diepoxy
Front locants after a von Baeyer descriptor are now supported e.g. bicyclo[2.2.2]-7-octene
onosyl substituents now supported e.g. glucuronosyl
More sugar substituents e.g. glucosaminyl
Improved support for malformed polycyclic spiro names
Support for oximino as a suffix
Added method [NameToStructure.getVersion()] to retrieve OPSIN version number
Allowed bridges to be used as detachable prefixes
Allow odd numbers of hydro to be added e.g. trihydro
Added support for unbracketed R stereochemistry (but not S, for the moment, due to the ambiguity with sulfur locants)
Various minor bug fixes e.g. stereochemistry was incorrect for isovaline
Minor vocabulary improvements
Version 2.1.0 (2016-03-12)
Added support for fractional multipliers e.g. hemihydrochloride
Added support for abbreviated common salts e.g. HCl
Added support for sandwich compounds e.g. ferrocene
Improved recognition of names missing the last 'e' (common in German)
Support for E/Z directly before double bond indication e.g. 2Z-ylidene, 2Z-ene
Improved support for functional class ethers e.g. "glycerol triglycidyl ether"
Added general support for names involving an ester formed from an alcohol and an ate group
Grignard reagents and certain compounds (e.g. uranium hexafluoride) are now treated as covalent rather than ionic
Added experimental support for outputting extended SMILES. Polymers and attachment points are annotated explicitly
Polymers when output as SMILES now have atom classes to indicate which end of the repeat unit is which
Support * as a superscript indicator e.g. *6* to mean superscript 6
Improved recognition of racemic stereochemistry terms
Added general support for names like "beta-alanine N,N-diacetic acid"
Allowed "one" and "ol" suffixes to be used in more cases where another suffix is also present
"ic acid halide" is now interpreted the same as "ic halide"
Fixed some cases where ambiguous operations were not considered ambiguous e.g. monosubstituted phenyl
Improvements/bug fixes to heuristics for detecting when spaces are omitted from ether/ester names
Improved support for stereochemistry in older CAS index names
Many precision improvements e.g. cyclotriphosphazene, thiazoline, TBDMS/TBDPS protecting groups, S-substituted-methionine
Various minor bug fixes e.g. names containing "SULPH" not recognized
Minor vocabulary improvements
Internal XML Changes:
Synonyms of the same concept are now or-ed rather than being separate entities e.g. <token>tertiary|tert-|t-</token>
Version 2.0.0 (2015-07-10)
MAJOR CHANGES:
Requires Java 1.6 or higher
CML (Chemical Markup Language) is now returned as a String rather than a XOM Element
OPSIN now attempts to identify if a chemical name is ambiguous. Names that appear ambiguous return with a status of WARNING with the structure provided being one interpretation of the name
Added support for "alcohol esters" e.g. phenol acetate [meaning phenyl acetate]
Multiplied unlocanted substitution is now more intelligent e.g. all substituents must connect to same group, and degeneracy of atom environments is taken into account
The ester interpretation is now preferred in more cases where a name does not contain a space but the parent is methanoate/ethanoate/formate/acetate/carbamate
Inorganic oxides are now interpreted, yielding structures with [O-2] ions
Added more trivial names of simple molecules
Support for nitrolic acids
Fixed parsing issue where a directly substituted acetal was not interpretable
Fixed certain groups e.g. phenethyl, not having their suffix attached to a specific location
Corrected interpretation of xanthyl, and various trivial names that look systematic
Name to structure is now ~20% faster
Initialisation time reduced by a third
InChI generation is now ~20% faster
XML processing dependency changed from XOM to Woodstox
Significant internal refactoring
Utility functions designed for internal use are no longer on the public API
Various minor bug fixes
Internal XML Changes:
Groups lacking a labels attribute now have no locants (previously had ascending numeric locants)
Syntax for addGroup/addHeteroAtom/addBond attributes changed to be easier to parse and allow specification of whether the name is ambiguous if a locant is not provided
Version 1.6.0 (2014-04-26)
Added API/command-line options to generate StdInchiKeys
Added support for the IUPAC recommended nomenclature for carbohydrate lactones
Added support for boronic acid pinacol esters
Added basic support for specifying chalcogen acid tautomer form e.g. thioacetic S-acid
Fused ring bridges are now numbered
Names with Endo/Exo/Syn/Anti stereochemistry can now be partially interpreted if warnRatherThanFailOnUninterpretableStereochemistry is used
The warnRatherThanFailOnUninterpretableStereochemistry option will now assign as much stereochemistry as OPSIN understands (All ignored stereochemistry terms are mentioned in the OpsinResult message)
Many minor nomenclature support improvements e.g. succinic imide; hexaldehyde; phenyldiazonium, organotrifluoroborates etc.
Added more trivial names that can be confused with systematic names e.g. Imidazolidinyl urea
Fixed StackOverflowError that could occur when processing molecules with over 5000 atoms
Many minor bug fixes
Minor vocabulary improvements
Minor speed improvements
NOTE: This is the last release to support Java 1.5
Version 1.5.0 (2013-07-21)
Command line interface now accepts files to read and write to as arguments
Added option to allow interpretation of acids missing the word acid e.g. "acetic" (off by default)
Added option to treat uninterpretable stereochemistry as a warning rather than a failure (off by default)
Added support for nucleotide chains e.g. guanylyl(3'-5')uridine
Added support for parabens, azetidides, morpholides, piperazides, piperidides and pyrrolidides
Vocabulary improvements e.g. homo/beta amino acids
Many minor bug fixes e.g. fulminic acid correctly interpreted
Version 1.4.0 (2013-01-27)
Added support for dialdoses, diketoses, ketoaldoses, alditols, aldonic acids, uronic acids, aldaric acids, glycosides, oligosaccharides, named systematically or from trivial stems, in cyclic or acyclic form
Added support for ketoses named using dehydro
Added support for anhydro
Added more trivial carbohydrate names
Added support for sn-glycerol
Improved heuristics for phospho substitution
Added hydrazido and anilate suffixes
Allowed more functional class nomenclature to apply to amino acids
Added support for inverting CAS names with substituted functional terms e.g. Acetaldehyde, O-methyloxime
Double substitution of a deoxy chiral centre now uses the CIP rules to decide which substituent replaced the hydroxy group
Unicode right arrows, superscripts and the soft hyphen are now recognised
Version 1.3.0 (2012-09-16)
Added option to output radicals as R groups (* in SMILES)
Added support for carbolactone/dicarboximide/lactam/lactim/lactone/olide/sultam/sultim/sultine/sultone suffixes
Resolved some cases of ambiguity in the grammar; the program's capability to handle longer peptide names is improved
Allowed one (as in ketone) before yl e.g. indol-2-on-3-yl
Allowed primed locants to be used as unprimed locants in a bracket e.g. 2-(4'-methylphenyl)pyridine
Vocabulary improvements
SMILES writer will no longer reuse ring closures on the same atom
Fixed case where a name formed of many words that could be parsed ambiguously would cause OPSIN to run out of memory
NameToStructure.getInstance() no longer throws a checked exception
Many minor bug fixes
Version 1.2.0 (2011-12-06)
OPSIN is now available from Maven Central
Basic support for cyclised carbohydrates e.g. alpha-D-glucopyranose
Basic support for systematic carbohydrate stems e.g. D-glycero-D-gluco-Heptose
Added heuristic for correcting esters with omitted spaces
Added support for xanthates/xanthic acid
Minor vocabulary improvements
Fixed a few minor bugs/limitations in the Cahn-Ingold-Prelog rules implementation and made more memory efficient
Many minor improvements and bug fixes
Version 1.1.0 (2011-06-16)
Significant improvements to fused ring numbering code, specifically 3/4/5/7/8 member rings are no longer only allowed in chains of rings
Added support for outputting to StdInChI
Small improvements to fused ring building code
Improvements to heuristics for disambiguating what group is being referred to by a locant
Lower case indicated hydrogen is now recognised
Improvements to parsing speed
Many minor improvements and bug fixes
Version 1.0.0 (2011-03-09)
Added native isomeric SMILES output
Improved command-line interface. The desired format i.e. CML/SMILES/InChI as well as options such as allowing radicals can now all be specified via flags
Debugging is now performed using log4j rather than by passing a verbose flag
Added traditional locants to carboxylic acids and alkanes e.g. beta-hydroxybutyric acid
Added support for cis/trans indicating the relative stereochemistry of two substituents on rings and fused ring systems
Added support for stoichiometry ratios and mixture indicators
Added support for alpha/beta stereochemistry on steroids
Added support for the method for naming spiro systems described in the 1979 recommendations rule A-42
Added detailedFailureAnalysis option to detect the part of a chemical name that fails to parse
Added support for deoxy
Added open-chain saccharides
Improvements to CAS index name uninversion algorithm
Added support for isotopes, allowing deuterio/tritio
Added support for R/S stereochemistry indicated by a locant which is also used to indicate the point of substitution for a substituent
Many minor improvements and bug fixes
Version 0.9.0 (2010-11-01)
Added transition metals/f-block elements and noble gases
Added support for specifying the charge or oxidation number on elements e.g. aluminium(3+), iron(II)
Calculations based off a van Arkel diagram are now used to determine whether functional bonds to metals should be treated as ionic or covalent
Improved support for prefix functional replacement e.g. hydrazono/amido/imido/hydrazido/nitrido/pseudohalides can now be used for functional replacement on appropriate acids
Ortho/meta/para handling improved - can now only apply to six membered rings
Added support for methylenedioxy
Added support for simple bridge prefixes e.g. methano as in 2,3-methanoindene
Added support for perfluoro/perchloro/perbromo/periodo
Generalised alkane support to allow alkanes of lengths up to 9999 to be described without enumeration
Updated dependency on JNI-InChI to 0.7, hence InChI 1.03 is now used.
Improved algorithm for assigning unlocanted hydro terms
Improved heuristic for determining meaning of oxido
Improved charge balancing e.g. ionic substance of an implicit ratio 2:3 can now be handled rather than being represented as a net charged 1:1 mixture
Grammar is a bit more lenient of placement of stereochemistry and multipliers
Vocabulary improvements especially in the area of nucleosides and nucleotides
Esters of biochemical compounds e.g. triphosphates are now supported
Many minor improvements and bug fixes
Version 0.8.0 (2010-07-16)
NameToStructureConfig can now be used to configure whether radicals e.g. ethyl are output or not.
Names like carbon tetrachloride are now supported
glycol ethers e.g. ethylene glycol ethyl ether are now supported
Prefix functional replacement support now includes halogens e.g. chlorophosphate
Added support for epoxy/epithio/episeleno/epitelluro
Added support for hydrazides/fluorohydrins/chlorohydrins/bromohydrins/iodohydrins/cyanohydrins/acetals/ketals/hemiacetals/hemiketals/diketones/disulfones named using functional class nomenclature
Improvements to algorithm for assigning and finding atoms corresponding to element symbol locants
Added experimental right to left parser (ReverseParseRules.java)
Vocabulary improvements
Parsing is now even faster
Various bug fixes and name interpretation fixes
Version 0.7.0 (2010-06-09)
Added full support for conjunctive nomenclature e.g. 1,3,5-benzenetriacetic acid
Added basic support for CAS names
Added trivial poly-noncarboxylic acids and more trivial carboxylic acids
Added support for spirobi/spiroter/dispiroter and the majority of spiro(ring-locant-ring) nomenclature
Indicators of the direction that a chemical rotates plane polarised light are now detected and ignored
Fixed many cases of trivial names being interpreted systematically by adding more trivial names and detecting such cases
Names such as oxalic bromide cyanide where a halide/pseudohalide replaces an oxygen are now supported
Amino acid esters named from the neutral amino acid are now supported e.g. glycine ethyl ester
Added more heteroatom replacement terms
Allowed creation of an OPSIN parser through NameToStructure.getOpsinParser()
Added support for dehydro - for unsaturating bonds
Improvements to element symbol locant assignment and retrieving appropriate atoms from locants like N2
OPSIN's SMILES parser now accepts specification of number of hydrogens in cases other than chiral atoms
Mixtures specified by separating components with a semicolon followed by a space are now supported
Many internal improvements and bug fixes
Version 0.6.1 (2010-03-18)
Counter ions are now duplicated so as to give, where possible, a neutral compound
In names like nitrous amide the atoms modified by the functional replacement can now be substituted
Allowed ~number~ for specifying superscripts
Vocabulary improvements
Added quinone suffix
Tetrahedral sulfur stereochemistry is now recognised
Bug fixes for incorrect interpretation of some names e.g. triphosgene is now unparseable rather than 3 x phosgene; phospho has different meanings depending on whether it is used on an amino acid or another group, etc.
Version 0.6.0 (2010-02-18)
OPSIN is now a mavenised project consisting of two modules: core and inchi. Core converts name --> CML; inchi depends on core and allows conversion to InChI
Instead of CML an OpsinResult can be returned which can yield information as to why a name was not interpretable
Added support for unlocanted R/S/E/Z stereochemistry. Removed limit on number of atoms that stereochemistry code can handle
Added support for polymers e.g. poly(ethylene)
Improvements in handling of multiplicative nomenclature
Improvements to fusion nomenclature handling: multiplied components and multi parent systems are now supported
Improved support for functional class nomenclature; space detection has been improved and support has been added for anhydride, oxide, oxime, hydrazone, semicarbazone, thiosemicarbazone, selenosemicarbazone, tellurosemicarbazone, imide
Support for the lambda convention
Locanted esters
Improvements in dearomatisation code
CML output changed to being CML-Lite compliant
Speed improvements
Support for Greek letters e.g. as alpha or $a or α
Added more infixes
Added more suffixes
Vocabulary improvements
Systematic handling of amino acid nomenclature
Added support for perhydro
Support for ylium/uide
Support for locants like N-1 (instead of N1)
Fixed potential infinite loop in fused ring numbering
Made grammar more lenient in many places e.g. euphonic o, optional square brackets
Sulph is now treated like sulf as in sulphuric acid
and many misc fixes and improvements
Version 0.5.3 (2009-10-22)
Added support for amic, aldehydic, anilic, anilide, carboxanilide and amoyl suffixes
Added support for cyclic imides e.g. succinimide/succinimido
Added support for amide functional class
Support for locants such as N5 which means a nitrogen that is attached in some way to position 5. Locants of this type may also be used in ester formation.
Some improvements to functional replacement using prefixes e.g. thioethanoic acid now works
Disabled stereochemistry in molecules with over 300 atoms as a temporary fix to the problem in 0.5.2
Slight improvement in method for deciding which group detachable hydro prefixes apply to.
Minor vocabulary update
Version 0.5.2 (2009-10-04)
Outputting directly to InChI is now supported using the separately available nameToInchi jar (an OPSIN jar is expected in the same location as the nameToInchi jar)
Fused rings with any number of rings in a chain or formed entirely of 6 membered rings can now be numbered
Added support for E/Z/R/S where locants are given. Unlocanted cases will be dealt with in a subsequent release. In very large molecules a lack of memory may be encountered; this will be resolved in a subsequent release
Some Infixes are now supported e.g. ethanthioic acid
All spiro systems with Von Baeyer brackets are now supported e.g. dispiro[4.2.4.2]tetradecane
Vocabulary increase (especially: terpenes, inorganic acids, fused ring components)
Fixed some problems with components with both acyclic and cyclic sections e.g. trityl
Improved locant assignments e.g. 2-furyl is now also fur-2-yl
Speed improvements
Removed dependence on Nux/Saxon
Misc minor fixes
Version 0.5.1 (2009-07-20)
Huge reduction in OPSIN initialisation time (typically ~7 seconds --> 800 ms)
Allowed thio/seleno/telluro as divalent linkers and for functional replacement when used as prefixes. Peroxy can now be used for functional replacement
Better support for semi-trivially named hydrocarbon fused rings e.g. tetracene
Better handling of carbonic acid derivatives
Improvements to locant assignment
Support for names like triethyltetramine and triethylene glycol
Misc other fixes to prevent OPSIN generating the wrong structure for certain types of names
Version 0.5 (2009-06-23)
Too many changes to list
Version 0.1 (2006-10-11)
Initial release