Overview of the feature definition (FDef) file format

An FDef file contains all the information needed to define a set of chemical features. Feature types are defined by queries written in Daylight's SMARTS language. The FDef file can optionally also include definitions of atom types, which make feature definitions more readable.

Chemical Features

Chemical features are defined by a Feature Type and a Feature Family. The Feature Family is a general classification of the feature (such as "Hydrogen-bond Donor" or "Aromatic"), while the Feature Type provides additional, higher-resolution information about the feature. Pharmacophore matching is done using Feature Families.
Each feature type contains the following pieces of information:

The family to which the feature belongs.
A SMARTS pattern that describes atoms (one or more) matching the feature type.
Weights used to determine the feature's position based on the positions of its defining atoms.

Syntax of the FDef file

AtomType definitions

An AtomType definition allows you to assign a shorthand name to be used in place of a SMARTS string defining an atom query. This allows FDef files to be made much more readable. For example, defining a non-polar carbon atom like this:

AtomType Carbon_NonPolar [C&!$(C=[O,N,P,S])&!$(C#N)]

creates a new name that can be used anywhere else in the FDef file where this SMARTS would be useful. To reference an AtomType, include its name in curly brackets. For example, this excerpt from an FDef file defines another atom type, Hphobe, which references the Carbon_NonPolar definition:

AtomType Carbon_NonPolar [C&!$(C=[O,N,P,S])&!$(C#N)]
AtomType Hphobe [{Carbon_NonPolar},c,s,S&H0&v2,F,Cl,Br,I]

Note that {Carbon_NonPolar} is used in the new AtomType definition without any additional decoration (no square brackets or recursive SMARTS markers are required).
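
The substitution described above can be pictured as plain text replacement. The following is a minimal sketch, not the actual parser: the function name `expand_atom_types` is hypothetical, and wrapping the referenced definition in a recursive SMARTS `$(...)` is an assumption about how a reference could be expanded so that the result stays a valid atom query.

```python
import re

def expand_atom_types(smarts: str, atom_types: dict[str, str]) -> str:
    """Replace each {Name} reference with the named SMARTS.

    Assumption: the referenced definition is wrapped as recursive
    SMARTS, $(...), so the author does not have to write it by hand.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in atom_types:
            # AtomTypes must be defined before they are referenced.
            raise KeyError(f"AtomType {name!r} referenced before definition")
        return "$(" + atom_types[name] + ")"

    return re.sub(r"\{(\w+)\}", substitute, smarts)

# The two definitions from the excerpt above.
atom_types = {"Carbon_NonPolar": "[C&!$(C=[O,N,P,S])&!$(C#N)]"}
hphobe = expand_atom_types("[{Carbon_NonPolar},c,s,S&H0&v2,F,Cl,Br,I]", atom_types)
print(hphobe)
# [$([C&!$(C=[O,N,P,S])&!$(C#N)]),c,s,S&H0&v2,F,Cl,Br,I]
```

Raising an error for an unknown name mirrors the rule that an AtomType has to be defined before it is used.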

Repeating an AtomType results in the two definitions being combined using the SMARTS "," (or) operator. Here's an example:

AtomType d1 [N&!H0]
AtomType d1 [O&!H0]

This is equivalent to:

AtomType d1 [N&!H0,O&!H0]

Which is equivalent to the more efficient:

AtomType d1 [N,O;!H0]

Note that these examples tend to use the SMARTS high-precedence "and" operator "&" rather than the low-precedence "and" operator ";". This matters when repeated AtomTypes are combined: the SMARTS "," operator has higher precedence than ";", so definitions that use ";" can lead to unexpected results.
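
To make the precedence issue concrete, here is a minimal sketch that joins two definitions textually with ",". The helper names `strip_brackets` and `combine_repeated` are hypothetical, and the sketch assumes each definition is a single bracketed atom expression; the real parser may combine definitions differently.

```python
def strip_brackets(smarts: str) -> str:
    """Return the body of a single bracketed atom SMARTS such as "[N&!H0]"."""
    if not (smarts.startswith("[") and smarts.endswith("]")):
        raise ValueError(f"expected a bracketed atom SMARTS, got {smarts!r}")
    return smarts[1:-1]

def combine_repeated(first: str, second: str) -> str:
    """Join two definitions of the same AtomType with the SMARTS "," (or)."""
    return "[" + strip_brackets(first) + "," + strip_brackets(second) + "]"

safe = combine_repeated("[N&!H0]", "[O&!H0]")
risky = combine_repeated("[N;!H0]", "[O;!H0]")
print(safe)   # [N&!H0,O&!H0]
print(risky)  # [N;!H0,O;!H0]
```

Because "," binds more tightly than ";", the second result reads as N and (!H0 or O) and !H0, which matches only nitrogens that carry a hydrogen. The first result keeps the intended meaning because "&" binds more tightly than ",".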

It is also possible to define negative AtomType queries:

AtomType d1 [N,O,S]
AtomType !d1 [H0]

The negative query gets combined with the first to produce a definition identical to this:

AtomType d1 [!H0;N,O,S]

Note that the negative AtomType is added to the beginning of the query.
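
The combination can be sketched as string assembly. This is an illustration only: `add_negative` is a hypothetical helper, and it assumes the negated query is a single primitive (such as H0) so that prefixing "!" negates all of it.

```python
def add_negative(existing: str, negated: str) -> str:
    """Prepend a negated query to an existing bracketed atom SMARTS.

    Assumption: both arguments are single bracketed atom expressions and
    the negated one holds a single primitive, so "!" applies to all of it.
    """
    existing_body = existing[1:-1]   # "[N,O,S]" -> "N,O,S"
    negated_body = negated[1:-1]     # "[H0]"    -> "H0"
    # The low-precedence ";" keeps the negation separate from the "," list.
    return "[!" + negated_body + ";" + existing_body + "]"

combined = add_negative("[N,O,S]", "[H0]")
print(combined)  # [!H0;N,O,S]
```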

Feature definitions

A feature definition is more complex than an AtomType definition and stretches across multiple lines:

DefineFeature HDonor1 [N,O;!H0]
    Family HBondDonor
    Weights 1.0
EndFeature

The first line of the feature definition includes the feature type and the SMARTS string defining the feature.
The next two lines (order not important) define the feature's family and its atom weights (a comma-delimited list with one entry per atom defining the feature). The atom weights are used to calculate the feature's location as a weighted average of the positions of the atoms defining the feature. More detail on this is provided below.
The final line of a feature definition must be EndFeature. It is perfectly legal to mix AtomType definitions with feature definitions in the FDef file. The one rule is that AtomTypes must be defined before they are referenced.
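
As a rough illustration of the block structure described above, the sketch below reads one feature definition into a small record. The names `FeatureDef` and `parse_feature` are hypothetical, and treating both Family and Weights as required is an assumption of this sketch rather than a documented rule.

```python
from dataclasses import dataclass

@dataclass
class FeatureDef:
    feature_type: str
    smarts: str
    family: str
    weights: list[float]

def parse_feature(block: str) -> FeatureDef:
    """Parse a single DefineFeature ... EndFeature block."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    header = lines[0].split(None, 2)
    if len(header) != 3 or header[0] != "DefineFeature":
        raise ValueError("block must start with 'DefineFeature <type> <SMARTS>'")
    if lines[-1] != "EndFeature":
        raise ValueError("block must end with 'EndFeature'")
    family, weights = None, None
    # Family and Weights may appear in either order.
    for ln in lines[1:-1]:
        keyword, _, value = ln.partition(" ")
        if keyword == "Family":
            family = value.strip()
        elif keyword == "Weights":
            weights = [float(w) for w in value.split(",")]
        else:
            raise ValueError(f"unexpected line: {ln!r}")
    if family is None or weights is None:
        raise ValueError("both Family and Weights are required in this sketch")
    return FeatureDef(header[1], header[2], family, weights)

example = """
DefineFeature HDonor1 [N,O;!H0]
    Family HBondDonor
    Weights 1.0
EndFeature
"""
print(parse_feature(example))
```

A fuller reader would also expand AtomType references in the SMARTS and check that the number of weights matches the number of atoms in the pattern; both are left out here to keep the sketch short.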

Additional syntax notes:

Any line that begins with a # symbol is considered a comment and will be ignored.
A backslash character, \, at the end of a line is a continuation character: it indicates that the data from that line is continued on the next line of the file. Blank space at the beginning of these additional lines is ignored. For example, this AtomType definition:
```
AtomType tButylAtom [$([C;!R](-[CH3])(-[CH3])(-[CH3])),\
                     $([CH3](-[C;!R](-[CH3])(-[CH3])))]
```
is exactly equivalent to this one:
```
AtomType tButylAtom [$([C;!R](-[CH3])(-[CH3])(-[CH3])),$([CH3](-[C;!R](-[CH3])(-[CH3])))]
```
(though the first form is much easier to read!)
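
The two rules above (comment lines and backslash continuation) can be sketched as a small preprocessing step. The function name `logical_lines` is hypothetical, and the sketch assumes that a line is only tested for a leading # when it is not the continuation of a previous line.

```python
def logical_lines(text: str) -> list[str]:
    """Drop comment lines and join backslash-continued lines."""
    result, parts = [], []
    for raw in text.splitlines():
        if not parts and raw.startswith("#"):
            continue  # comment line, ignored (assumed: not inside a continuation)
        # Leading blank space is ignored only on continuation lines.
        piece = raw.lstrip() if parts else raw
        if piece.endswith("\\"):
            parts.append(piece[:-1])  # drop the backslash and keep reading
            continue
        parts.append(piece)
        result.append("".join(parts))
        parts = []
    if parts:  # the file ended right after a continuation character
        result.append("".join(parts))
    return result

two_line = (
    "# t-butyl atoms\n"
    "AtomType tButylAtom [$([C;!R](-[CH3])(-[CH3])(-[CH3])),\\\n"
    "                     $([CH3](-[C;!R](-[CH3])(-[CH3])))]"
)
one_line = (
    "AtomType tButylAtom [$([C;!R](-[CH3])(-[CH3])(-[CH3])),"
    "$([CH3](-[C;!R](-[CH3])(-[CH3])))]"
)
print(logical_lines(two_line) == logical_lines(one_line))  # True
```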

Atom weights and feature locations
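
The feature-definition syntax describes the weights as producing a weighted average of the positions of the defining atoms. A minimal sketch of that calculation follows; `feature_position` and the coordinates are hypothetical, and dividing by the sum of the weights is the usual definition of a weighted average rather than a documented detail of the format.

```python
def feature_position(coords, weights):
    """Weighted average of 3D atom positions, one weight per atom."""
    if len(coords) != len(weights):
        raise ValueError("need exactly one weight per defining atom")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return tuple(
        sum(w * xyz[axis] for w, xyz in zip(weights, coords)) / total
        for axis in range(3)
    )

# Hypothetical coordinates for a feature defined by two atoms.
atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(feature_position(atoms, [1.0, 1.0]))  # (1.0, 0.0, 0.0)
print(feature_position(atoms, [3.0, 1.0]))  # (0.5, 0.0, 0.0)
```

With equal weights the feature sits at the midpoint of its atoms; a larger weight pulls the location toward the corresponding atom.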

Frequently Asked Question(s)

What happens if a Feature Type is repeated in the file? Here's an example:
```
DefineFeature HDonor1 [O&!H0]
    Family HBondDonor
    Weights 1.0
EndFeature
DefineFeature HDonor1 [N&!H0]
    Family HBondDonor
    Weights 1.0
EndFeature
```
In this case both definitions of the HDonor1 feature type will be active. This is functionally identical to:
```
DefineFeature HDonor1 [O,N;!H0]
    Family HBondDonor
    Weights 1.0
EndFeature
```
However, the version with a duplicated feature type is considerably less efficient and more confusing than the simpler combined definition.