File: gff2annot.pl

jalview 2.11.4.1+dfsg-1
#!/usr/bin/perl
##
# Jalview - A Sequence Alignment Editor and Viewer ($$Version-Rel$$)
# Copyright (C) $$Year-Rel$$ The Jalview Authors
# 
# This file is part of Jalview.
# 
# Jalview is free software: you can redistribute it and/or
# modify it under the terms of the GNU General Public License 
# as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
#  
# Jalview is distributed in the hope that it will be useful, but 
# WITHOUT ANY WARRANTY; without even the implied warranty 
# of MERCHANTABILITY or FITNESS FOR A PARTICULAR 
# PURPOSE.  See the GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License along with Jalview.  If not, see <http://www.gnu.org/licenses/>.
# The Jalview Authors are detailed in the 'AUTHORS' file.
##

use strict;
use warnings;

my %annotLines;
my %featureids;
while (<>) {
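    ($_=~/^\#/) and next; # skip comment lines
    # awk-style split: leading whitespace does not produce an empty first field
    my @fields = split ' ', $_;
    # skip blank lines and lines lacking the five mandatory fields
    if (scalar @fields >= 5) {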
        (defined $annotLines{$fields[1]}) or $annotLines{$fields[1]}=[];
        # this is the tab-separated set of fields forming a jalview annotation line
        # we only use sequence IDs, not numbers
        my $line = [$fields[2],$fields[0],"-1", $fields[3], $fields[4], $fields[2]];
        $featureids{$fields[2]}="FF0000"; # red is the colour.
        my $attribs = {};
        if (scalar @fields>5) {
            $attribs->{"gff:score"}=$fields[5];
            (scalar @fields>6) and $attribs->{"gff:strand"}=$fields[6];
            (scalar @fields>7) and $attribs->{"gff:frame"}=$fields[7];
            if (scalar @fields>8) {
                # attribute/value pairs start at index 8, after the frame field
                for (my $i=8; ($i+1)<(scalar @fields); $i+=2) {
                    $attribs->{"gff:".$fields[$i]} = $fields[$i+1];
                }
            }
        }
        push @{$annotLines{$fields[1]}}, [$line, $attribs];
    }
}
foreach my $labels (keys %featureids) {
    print "$labels\t".$featureids{$labels}."\n"; 
}
foreach my $labels (keys %annotLines) {
    print "STARTGROUP\t".$labels."\n";
    foreach my $annot (@{$annotLines{$labels}}) {
        # bare minimum is written - no attributes/links yet.
        print join("\t", @{$annot->[0]}), "\n";
    }
    print "ENDGROUP\t".$labels."\n";
}

=pod

=head1 NAME

gff2annot.pl - convert GFF annotation lines to a Jalview features file

=head2 SYNOPSIS


  gff2annot.pl [one or more files containing gff annotation]

Generates a nominally usable Jalview features file on B<STDOUT> from arbitrary GFF annotation lines.

=head2 DESCRIPTION

This script generates a Jalview features file on B<STDOUT> from a set of GFF annotation lines, read from the files named on the command line or from B<STDIN> if none are given.

For a series of whitespace-separated GFF annotation lines of the form:

E<lt>seqIdE<gt> E<lt>sourceE<gt> E<lt>nameE<gt> E<lt>startE<gt> E<lt>endE<gt> [E<lt>scoreE<gt> E<lt>strandE<gt> E<lt>frameE<gt> [E<lt>AttributeE<gt> E<lt>Attribute-ValueE<gt>]]

The script generates a sequence features file on B<STDOUT> in which annotations sharing a particular B<source> string are grouped together under that name. Only the name, sequence ID, start and end are written; score, strand, frame and attributes are parsed but not yet output.

=head2 Example

Passing some GFF annotation through STDIN:

  perl gff2annot.pl
  Seq1 blastx significant_hsp 1 5 0.9 + 1 link http://mylink/
  # a comment
  Seq1 blasty significant_hsp 15 25 0.9 + 1 link http://mylink/
  Seq1 blastz significant_hsp 32 43 0.9 + 1 link http://mylink/
  Seq2 blastx significant_hsp 1 5 0.9 + 1 link http://mylink/
  Seq2 blasty significant_hsp 1 5 0.9 + 1 link http://mylink/
  Seq2 blastz significant_hsp 1 5 0.9
  Seq3 blastx significant_hsp 50 70
  <Control^D/Z>

Produces the following (groups are emitted in Perl hash order, so their order may differ between runs):

  significant_hsp	FF0000
  STARTGROUP	blasty
  significant_hsp	Seq1	-1	15	25	significant_hsp
  significant_hsp	Seq2	-1	1	5	significant_hsp
  ENDGROUP	blasty
  STARTGROUP	blastx
  significant_hsp	Seq1	-1	1	5	significant_hsp
  significant_hsp	Seq2	-1	1	5	significant_hsp
  significant_hsp	Seq3	-1	50	70	significant_hsp
  ENDGROUP	blastx
  STARTGROUP	blastz
  significant_hsp	Seq1	-1	32	43	significant_hsp
  significant_hsp	Seq2	-1	1	5	significant_hsp
  ENDGROUP	blastz

=cut
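
The mapping described in the POD above can be illustrated in isolation. The following is a minimal sketch, not part of gff2annot.pl: the helper name gff_to_feature is hypothetical, and sorting the group names is an assumption made here to get repeatable output, whereas the script itself iterates over unsorted hash keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in gff2annot.pl): turn one whitespace-separated
# GFF-like line into (source, feature line). Returns an empty list for
# comments, blank lines and lines with fewer than five fields.
sub gff_to_feature {
    my ($gff) = @_;
    return if $gff =~ /^\s*#/;
    my @f = split ' ', $gff;
    return if @f < 5;
    my ($seqid, $source, $name, $start, $end) = @f;
    # Same column order as the script: name, sequence ID, -1 (sequence IDs
    # are used rather than sequence numbers), start, end, name again.
    return ($source, join("\t", $name, $seqid, -1, $start, $end, $name));
}

# Group feature lines by their source string, as the script does.
my %groups;
for my $gff (
    "Seq1 blastx significant_hsp 1 5 0.9 + 1 link http://mylink/",
    "# a comment",
    "Seq3 blastx significant_hsp 50 70",
    "Seq1 blasty significant_hsp 15 25 0.9 + 1 link http://mylink/",
) {
    # List assignment in scalar context counts the returned values,
    # so skipped lines (empty list) fall through to "next".
    my ($source, $feature) = gff_to_feature($gff) or next;
    push @{ $groups{$source} }, $feature;
}

# Assumption: sort the group names so the output order is repeatable.
for my $source (sort keys %groups) {
    print "STARTGROUP\t$source\n";
    print "$_\n" for @{ $groups{$source} };
    print "ENDGROUP\t$source\n";
}
```

Sorting the keys is one way to make the group order stable across runs; whether Jalview depends on the order of groups in a features file is not addressed by the script or its documentation.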