.\" Man page generated from reStructuredText.
.
.TH "HTSEQ-QA" "1" "Oct 01, 2017" "0.6.1p1" "HTSeq"
.SH NAME
htseq-qa \- Perform simple quality assessment of high-throughput sequencing reads
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.sp
The Python script \fBhtseq\-qa\fP takes a file with sequencing reads (either
raw or aligned reads) and produces a PDF file with useful plots to assess
the technical quality of a run.
.SH PLOT
.sp
Here is a typical plot:
[image]
.sp
The plot was made from a SAM file that contained both aligned and unalignable reads.
The left column is made from the non\-aligned reads, the right column from the aligned
reads. The header gives the name of the SAM file and the number of
reads.
.sp
The upper row shows how often each base was called at each position in the
read. In this sample, the non\-alignable reads have a clear excess of A. The
aligned reads show a balance between complementary bases: A and T (reddish colours)
have equal levels, and so do C and G (greenish colours). The sequences seem to be AT
rich. Furthermore, nearly all aligned reads start with a T, followed by an A, and then
a C in 70% and an A in 30% of the reads. Such an imbalance would be a reason for concern
if it had no good explanation. Here, the explanation is that the sample was fragmented
by enzyme digestion.
.sp
The lower row shows the abundance of base\-call quality scores at the different positions
in the read. Nearly all aligned reads have a quality of 34 over their whole length, while
some of the non\-aligned reads have lower quality scores towards their ends.
.SH USAGE
.sp
Note that \fBhtseq\-qa\fP needs matplotlib to produce the plot, so you need to install this
module, as described \fI\%here\fP on the matplotlib web site.
.sp
After you have installed HTSeq (see install) and matplotlib, you can run \fBhtseq\-qa\fP from
the command line:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
htseq\-qa [options] read_file
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If the file \fBhtseq\-qa\fP is not in your path, you can instead call the script with:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
python \-m HTSeq.scripts.qa [options] read_file
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The \fIread_file\fP is either a FASTQ file or a SAM file. For a SAM file, a plot with two columns
is produced as above; for a FASTQ file, you get only one column.
.sp
The output is written into a file with the same name as \fIread_file\fP, with the suffix \fB\&.pdf\fP
added. View it with a PDF viewer such as the Acrobat Reader.
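.sp
As a minimal illustration of the default output naming described above (the input file
names \fBsample.sam\fP and \fBreads.fastq\fP are hypothetical placeholders), the following
calls should write \fBsample.sam.pdf\fP and \fBreads.fastq.pdf\fP, respectively:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# SAM input: two\-column plot (non\-aligned and aligned reads)
htseq\-qa \-t sam sample.sam

# FASTQ input: one\-column plot
htseq\-qa \-t fastq reads.fastq
.ft P
.fi
.UNINDENT
.UNINDENT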
.SS Options
.INDENT 0.0
.TP
.B \-t <type>, \-\-type=<type>
The file type of the \fIread_file\fP\&. Supported values for \fI<type>\fP are:
.INDENT 7.0
.IP \(bu 2
\fBsam\fP: a SAM file (note that \fI\%SAMtools\fP contains Perl scripts to convert
most alignment formats to SAM)
.IP \(bu 2
\fBsolexa\-export\fP: an \fB_export.txt\fP file as produced by the SolexaPipeline
software after aligning with Eland (\fBhtseq\-qa\fP expects the new Solexa quality
encoding as produced by version 1.3 or newer of the SolexaPipeline)
.IP \(bu 2
\fBfastq\fP: a FASTQ file with standard (Sanger or Phred) quality encoding
.IP \(bu 2
\fBsolexa\-fastq\fP: a FASTQ file with Solexa quality encoding, as produced by
the SolexaPipeline after base\-calling with Bustard (\fBhtseq\-qa\fP expects
the new Solexa quality encoding as produced by version 1.3 or newer
of the SolexaPipeline)
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-o <outfile>, \-\-outfile=<outfile>
output filename (default is \fI<read_file>\fP\fB\&.pdf\fP)
.UNINDENT
.INDENT 0.0
.TP
.B \-r <readlen>, \-\-readlength=<readlen>
the maximum read length (when not specified, the
script guesses it from the file)
.UNINDENT
.INDENT 0.0
.TP
.B \-g <gamma>, \-\-gamma=<gamma>
the gamma factor for the contrast adjustment of the
quality score plot
.UNINDENT
.INDENT 0.0
.TP
.B \-n, \-\-nosplit
do not split reads into unaligned and aligned ones, i.e., produce
a one\-column plot
.UNINDENT
.INDENT 0.0
.TP
.B \-m <maxqual>, \-\-maxqual=<maxqual>
the maximum quality score that appears in the data (default: 40)
.UNINDENT
.INDENT 0.0
.TP
.B \-h, \-\-help
Show a usage summary and exit
.UNINDENT
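.sp
The options can be combined. The following sketch uses only options listed above; the
file names and the values passed to \fB\-r\fP and \fB\-g\fP are hypothetical and chosen
purely for illustration:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# One\-column plot from a SAM file, written to a chosen output file
htseq\-qa \-t sam \-n \-o run1_qa.pdf run1.sam

# Solexa\-encoded FASTQ with an explicit read length and gamma factor
htseq\-qa \-t solexa\-fastq \-r 36 \-g 0.5 reads_sequence.txt
.ft P
.fi
.UNINDENT
.UNINDENT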
.SH AUTHOR
Simon Anders
.SH COPYRIGHT
2017, Simon Anders
.\" Generated by docutils manpage writer.
.