#!/usr/local/bin/perl
# POD docs at end of file
use strict;
use Getopt::Long;
use FileHandle;
use GO::Parser;
use Data::Stag;
$|=1;
my $opt = {};
GetOptions($opt,
           "help|h",
           "format|p=s",
           "min|m=s",
           "err|e=s",
           "id=s@",
           "names",
           "use_cache",
           "count",
           "to=s",      # read by the export call below; falls back to 'obo'
          );
if ($opt->{help}) {
    system("perldoc $0");
    exit;
}
my $errf = $opt->{err};
my $errhandler = Data::Stag->getformathandler('xml');
if ($errf) {
    $errhandler->file($errf);
}
else {
    $errhandler->fh(\*STDERR);
}
my $outer_parser = new GO::Parser({handler=>'obj'});
my $handler = $outer_parser->handler;
my @files = GO::Parser->new->normalize_files(@ARGV);
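while (my $fn = shift @files) {
    # Start from the command-line options; guess the format from the
    # file name unless one was given explicitly with --format.
    my %h = %$opt;
    my $fmt;
    if ($fn =~ /\.obo/) {
        $fmt = 'obo_text';
    }
    if ($fn =~ /gene_assoc/) {
        $fmt = 'go_assoc';
    }
    if ($fmt && !$h{format}) {
        $h{format} = $fmt;
    }
    my $parser = new GO::Parser(%h);
    # Every parser feeds the same handler, so all files accumulate
    # into a single graph.
    $parser->handler($handler);
    #$parser->litemode(1);
    $parser->errhandler($errhandler);
    $parser->parse($fn);
}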
my $g = $handler->graph;
my $subg = GO::Model::Graph->new;
my $min = $opt->{min} || 1;
my %ok = ();
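# Pass 1: record every term with at least $min gene products attached
# at or below it. Nothing is deleted yet, so the counts are taken on
# the complete graph.
foreach my $term (@{$g->get_all_terms}) {
    my $id = $term->acc;
    my $n = scalar(@{$g->deep_product_list($id)});
    if ($n >= $min) {
        #print STDERR "OK: $id [c=$n]\n";
        $ok{$id} = 1;
    }
}
# Pass 2: delete every term that did not reach the threshold.
foreach my $term (@{$g->get_all_terms}) {
    my $id = $term->acc;
    unless ($ok{$id}) {
        #print STDERR "XX: $id\n";
        $g->delete_node($id);
    }
}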
$g->export({format=>$opt->{to} || 'obo'});
$errhandler->finish;
exit 0;
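__END__

=head1 NAME

go-make-slim.pl - generates a slim file based on an association file

=head1 SYNOPSIS

  go-make-slim.pl [-m MIN] [-e ERRFILE] gene_ontology.obo gene_associations.fb

=head1 DESCRIPTION

Parses the ontology and association files given on the command line
into a single graph, deletes every term that has fewer than the
minimum number of gene products attached at or below it, and exports
the remaining graph as the slim.

=head1 ARGUMENTS

=head3 -m MIN

minimum number of distinct gene products a node must have attached
at-or-below that node for it to be included in the slim (default 1)

=head3 -e ERRFILE

writes parse errors in XML to ERRFILE - defaults to STDERR
(there should be no parse errors in well-formed files)

=head2 DOCUMENTATION

L<http://www.godatabase.org/dev>

=cut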