1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110
|
#!/usr/bin/env perl
use strict;
use warnings;
use Carp;
use Getopt::Long qw(:config no_ignore_case bundling);
use Cwd;
use File::Basename;
use lib ("/usr/lib/trinityrnaseq/PerlLib");
use Fasta_reader;
use Data::Dumper;
my @matrices = @ARGV;
unless (scalar @matrices > 1) {
die "\n\n\tusage: $0 matrixA matrixB ...\n\n";
}
my %matrix;
my %genes;
main: {
foreach my $matrix (@matrices) {
if (-s $matrix) {
&parse_matrix($matrix);
}
else {
print STDERR "WARNING: cannot locate matrix file: $matrix";
}
}
## output new matrix:
my @colnames = sort keys %matrix;
print "\t" . join("\t", @colnames) . "\n";
foreach my $gene (sort keys %genes) {
print "$gene";
foreach my $colname (@colnames) {
my $val = $matrix{$colname}->{$gene};
unless (defined $val) {
$val = "NA";
}
print "\t$val";
}
print "\n";
}
exit(0);
}
####
sub parse_matrix {
my ($matrix_file) = @_;
open (my $fh, $matrix_file);
my $header = <$fh>;
chomp $header;
my @pos_to_col = split(/\t/, $header);
my $check_column_ordering_flag = 0;
foreach my $sample (@pos_to_col) {
if (exists $matrix{$sample}) {
die "Error, already encountered column header: $sample, cannot have redundant column names across matrices.";
}
}
while (<$fh>) {
chomp;
my @x = split(/\t/);
unless ($check_column_ordering_flag) {
if (scalar(@x) == scalar(@pos_to_col) + 1) {
## header is offset, as is acceptable by R
## not acceptable here. fix it:
unshift (@pos_to_col, "");
}
$check_column_ordering_flag = 1;
}
my $gene = $x[0];
$genes{$gene} = 1;
for (my $i = 1; $i <= $#x; $i++) {
my $col = $pos_to_col[$i];
my $val = $x[$i];
$matrix{$col}->{$gene} = $val;
}
}
return %matrix;
}
|