1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
|
#!/usr/bin/env perl
use strict;
use warnings;
use Carp;
use Getopt::Long qw(:config posix_default no_ignore_case bundling pass_through);
use Data::Dumper;
my $help_flag;
my $usage = <<__EOUSAGE__;
####################################################################################
#
# required:
#
# --gtf <string> gtf transcript structure file
#
# --pruned_transcripts_fasta <string> fasta file containing transcript sequences.
#
####################################################################################
__EOUSAGE__
;
my $gtf_file;
my $transcripts_fasta_file;
&GetOptions ( 'h' => \$help_flag,
'gtf=s' => \$gtf_file,
'pruned_transcripts_fasta=s' => \$transcripts_fasta_file,
);
unless ($gtf_file && $transcripts_fasta_file) {
die $usage;
}
main: {
my %trans_ids_want;
{
open (my $fh, $transcripts_fasta_file) or die "Error, cannot open file $transcripts_fasta_file";
while (<$fh>) {
if (/^>(\S+)/) {
my $trans_id = $1;
$trans_ids_want{$trans_id} = 1;
}
}
close $fh;
}
my %remain_to_find_in_gtf = %trans_ids_want;
open(my $fh, $gtf_file) or die $!;
while (<$fh>) {
unless (/\w/) { next; }
if (/^\#/) { next; }
my $line = $_;
if (/transcript_id \"([^\"]+)\"/) {
my $trans_id = $1;
if ($trans_ids_want{$trans_id}) {
print $line;
delete $remain_to_find_in_gtf{$trans_id};
}
}
}
close $fh;
if (%remain_to_find_in_gtf) {
die "Error, didn't encounter entries for the following transcripts in the gtf file: " . Dumper(\%remain_to_find_in_gtf);
}
exit(0);
}
|