#!/usr/bin/perl
use strict;
use warnings;
use Chemistry::OpenSMILES::Parser;
use Encode qw( encode );
use File::Basename qw( basename );
use Getopt::Long::Descriptive;
my $basename = basename $0;
my( $opt, $usage ) = describe_options( <<"END" . 'OPTIONS',
USAGE
$basename [<args>] [<files>]
DESCRIPTION
$basename reads in SMILES and outputs a summary chemical formula for
each SMILES entry.
END
    [ 'h-isotopes' => hidden => {
            one_of => [
                [ 'distinguish-hydrogen-isotopes' =>
                  'output deuterium as D and tritium as T' ],
                [ 'no-distinguish-hydrogen-isotopes' =>
                  'output deuterium and tritium as H [default]' ]
            ],
            default => 'no_distinguish_hydrogen_isotopes'
        }
    ],
    [],
    [ 'help', 'print usage message and exit', { shortcircuit => 1 } ],
);

if( $opt->help ) {
    print $usage->text;
    exit;
}

my $errors = 0;

while (<>) {
    chomp;
    utf8::decode( $_ );

    # An optional trailing tab-separated field is split off here and
    # carried through to the output or the error message.
    my $additional_position;
    if( s/\t([^\t]*)$// ) {
        $additional_position = $1;
    }

    my @moieties;
    my $parser = Chemistry::OpenSMILES::Parser->new;
    eval {
        @moieties = $parser->parse( $_ );
    };
    if( $@ ) {
        # Strip the program-name prefix; \Q...\E keeps any regex
        # metacharacters in $0 (such as '.' or '+') literal.
        $@ =~ s/^\Q$0\E:\s*//;
        $additional_position = defined $additional_position
                                ? ' ' . $additional_position
                                : '';
        print STDERR sprintf '%s: %s(%d)%s: %s',
                             basename( $0 ),
                             $ARGV,
                             $.,
                             encode( 'utf8', $additional_position ),
                             encode( 'utf8', $@ );
        $errors++;
        next;
    }

    # Count atoms per element symbol across all moieties.
    my %formula;
    for my $moiety (@moieties) {
        for my $atom ($moiety->vertices) {
            my $symbol = ucfirst $atom->{symbol};
            if( $opt->h_isotopes eq 'distinguish_hydrogen_isotopes' &&
                $symbol eq 'H' && $atom->{isotope} ) {
                if ( $atom->{isotope} == 2 ) {
                    $symbol = 'D';
                } elsif ( $atom->{isotope} == 3 ) {
                    $symbol = 'T';
                }
            }
            $formula{$symbol}++;
        }
    }
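
    # Hill-style ordering: if carbon is present, C comes first, then H,
    # then the remaining symbols alphabetically; otherwise all symbols
    # are listed alphabetically.
    my @formula;
    if( exists $formula{C} ) {
        push @formula, 'C' . ($formula{C} == 1 ? '' : $formula{C});
        push @formula, 'H' . ($formula{H} == 1 ? '' : $formula{H}) if exists $formula{H};
        delete $formula{C};
        delete $formula{H};
    }
    for my $element (sort keys %formula) {
        push @formula, $element . ($formula{$element} == 1 ? '' : $formula{$element});
    }

    $additional_position = defined $additional_position
                            ? "\t" . $additional_position
                            : '';
    # The input line was decoded from UTF-8 above, so encode on output,
    # as the error path already does.
    print encode( 'utf8', "@formula$additional_position\n" );
}
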
exit( $errors > 0 );