This package contains command line utilities for preprocessing, computing
feature count density (coverage), sorting, and indexing data files.
See also http://www.broadinstitute.org/software/igv/igvtools_commandline.
***************************************************************************
Java 11 is required for this release. See our website for more
information about support for Java 8.
***************************************************************************
---------------------------------------------------------------------------
Starting with shell scripts
---------------------------------------------------------------------------
The utilities are invoked from one of the following scripts:
igvtools (command line version for Linux and macOS 10.x)
igvtools_gui (gui version for Linux and macOS 10.x)
igvtools_gui_hidpi (HiDPI gui version for Linux)
igvtools_gui.command (alternative double-clickable gui version for macOS 10.x)
igvtools.bat (command line version for Windows)
igvtools_gui.bat (gui version for Windows)
The general form of the command-line version is:
igvtools [command] [options][arguments]
or
igvtools.bat [command] [options][arguments]
Recognized commands, options, arguments, and file types are described below.
---------------------------------------------------------------------------
Starting with java
---------------------------------------------------------------------------
Igvtools can also be started directly using java as shown below. This option
allows more control over java parameters, such as the maximum memory to
allocate. The example below starts igvtools with 1500 MB of memory and is
run from the directory where you unpacked igvtools.
java -Xmx1500m --module-path=lib @igv.args --module=org.igv/org.broad.igv.tools.IgvTools [command] [options][arguments]
To start with a gui the command is
java -Xmx1500m --module-path=lib @igv.args --module=org.igv/org.broad.igv.tools.IgvTools gui
Note that the command line has become more complex with Java 11 compared to
Java 8. We recommend the shell scripts above for most users.
---------------------------------------------------------------------------
Memory settings
---------------------------------------------------------------------------
The scripts above allocate a fixed amount of memory. If this amount is not
available on your platform you will get an obscure error along the lines of
"Could not start the Virtual Machine". If this happens you will need to
edit the scripts to reduce the amount of memory requested, or use the Java
startup option. The memory is set via a "-Xmx" parameter. For example
-Xmx1500m requests 1500 MB, -Xmx1g requests 1 gigabyte.
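As a rough illustration of how the "-Xmx" suffixes map to sizes, the sketch
below converts a value such as "-Xmx1500m" or "1g" to bytes. The helper name
xmx_to_bytes is hypothetical and is not part of igvtools; it only assumes the
JVM convention that k, m, and g are 1024-based multiples.

```python
def xmx_to_bytes(value: str) -> int:
    """Convert a -Xmx setting (e.g. "-Xmx1500m", "1g") to a number of bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    value = value.strip().lower()
    # Accept the value with or without the leading "-Xmx".
    if value.startswith("-xmx"):
        value = value[4:]
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)  # no suffix: the number is already in bytes


if __name__ == "__main__":
    for setting in ("-Xmx1500m", "-Xmx1g"):
        print(setting, "requests", xmx_to_bytes(setting), "bytes")
```

Comparing the result with the memory actually free on the machine is one way
to choose a value that avoids the startup error described above.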
---------------------------------------------------------------------------
HiDPI settings
---------------------------------------------------------------------------
HiDPI is supported natively by Java on Mac and Windows. Users on these
platforms can ignore this section.
For Linux users, the igvtools_gui_hidpi script is set up for 2x scaling.
To modify it to do 4x scaling, for example, change the value
-Dsun.java2d.uiScale=2
to
-Dsun.java2d.uiScale=4
Fractional values are *NOT* supported at this time.
---------------------------------------------------------------------------
Genome
---------------------------------------------------------------------------
The genome argument in the toTDF (formerly tile) and count commands can be
either an id or a full path to an IGV .genome file. Genome definitions for
the IGV-supplied genomes are in the "genomes" subdirectory of the igvtools
install. The id of each is derived by removing the extension from the
filename.
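The id rule can be sketched in a couple of lines. The function name genome_id
and the example path are illustrative assumptions, not part of igvtools.

```python
from pathlib import Path


def genome_id(genome_file: str) -> str:
    """Derive a genome id by dropping the directory and the final extension."""
    return Path(genome_file).stem


if __name__ == "__main__":
    print(genome_id("genomes/hg18.genome"))
```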
---------------------------------------------------------------------------
COMMANDS
---------------------------------------------------------------------------
The recognized commands are toTDF (formerly tile), count, sort, index,
formatexp, gui, help, and version. Note that these utilities are for working
with ASCII file formats, including SAM, but do not work with BAM files. For
manipulating BAM files use samtools (http://samtools.sourceforge.net/).
---------------------------------------------------------------------------
Command "tile"
---------------------------------------------------------------------------
Warning: This command is deprecated. Use "toTDF" instead.
---------------------------------------------------------------------------
Command "toTDF"
---------------------------------------------------------------------------
The "toTDF" command converts a sorted data input file to a binary tiled
data (.tdf) file. Input file formats supported are .wig, .cn, .igv,
and .gct, TCGA mage-tab files, and "list" files.
List files are text files containing a list of files in one of the supported formats,
one file per line. When using a list file the format of the contained files must be
specified explicitly with the "fileType" parameter. List files must end with the
extension ".list". File paths can be absolute or relative to the directory containing
the list file.
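The path rule for list files can be sketched as follows. This is only an
illustration of the description above, not igvtools' own code; the function
name read_list_file is hypothetical, and skipping blank lines is an
assumption.

```python
from pathlib import Path


def read_list_file(list_path):
    """Return the data file paths named in a ".list" file.

    Relative entries are resolved against the directory that contains
    the list file; absolute entries are returned unchanged.
    """
    list_path = Path(list_path)
    if list_path.suffix != ".list":
        raise ValueError("list files must end with the extension .list")
    base = list_path.parent
    resolved = []
    for line in list_path.read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines are ignored
        path = Path(entry)
        resolved.append(path if path.is_absolute() else base / path)
    return resolved
```

Because the file type cannot be inferred from the ".list" extension itself,
the "fileType" parameter still has to be passed on the command line.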
Usage:
igvtools toTDF [options] [inputFile] [outputFile] [genome]
Required arguments:
inputFile The input file (see supported formats above).
outputFile Binary output file. Must end in ".tdf".
genome A genome id or filename. See the "Genome" section above. Default is hg18.
Options:
-z, --maxZoom num Specifies the maximum zoom level to precompute. The default
value is 7 and is sufficient for most files. To reduce file
size at the expense of IGV performance this value can be
reduced.
-f, --windowFunctions list A comma delimited list specifying window functions to use
when reducing the data to precomputed tiles. Allowed
values are min, max, mean, median, p2, p10, p90, and p98.
The "p" values represent percentile, so p2=2nd percentile,
etc.
-p, --probeFile file Specifies a "bed" file to be used to map probe identifiers
to locations. This option is useful when preprocessing gct
files. The bed file should contain 4 columns:
chr start end name
where name is the probe name in the gct file.
--fileType Explicitly specify the file type. This is a required parameter for TCGA mage-tab and ".list" files.
Possible values are mage-tab, .wig, .cn, .igv, and .gct. Only mage-tab files downloaded from the
TCGA data center or related sites are supported at this time.
Conversion of ".gct" and "mage-tab" files results in the creation of an ".igv" file, which is sorted by genome
position using the "sort" command. For this case the following optional parameters can be specified.
-t, --tmpDir tmpdir Specify a temporary working directory. For large input files
this directory will be used to store intermediate results of
the sort. The default is the user's temp directory.
-m, --maxRecords number The maximum number of records to keep in memory during the
sort. The default value is 500000. Increase this number
if you receive "too many open files" errors. Decrease it
if you experience "out of memory" errors.
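To make the window functions concrete, the sketch below reduces the values
that fall in one window to the statistics named in a --windowFunctions list.
It is an illustration only: the function names are hypothetical, and the
nearest-rank percentile rule is an assumption, since the interpolation used
by igvtools is not stated here.

```python
import math
from statistics import mean, median


def percentile(ordered, pct):
    """Nearest-rank percentile of a sorted, non-empty list (assumed rule)."""
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def reduce_window(values, functions):
    """Apply a comma delimited list of window functions to one window."""
    ordered = sorted(values)
    table = {
        "min": lambda: ordered[0],
        "max": lambda: ordered[-1],
        "mean": lambda: mean(ordered),
        "median": lambda: median(ordered),
        "p2": lambda: percentile(ordered, 2),
        "p10": lambda: percentile(ordered, 10),
        "p90": lambda: percentile(ordered, 90),
        "p98": lambda: percentile(ordered, 98),
    }
    return {name: table[name]() for name in functions.split(",")}


if __name__ == "__main__":
    print(reduce_window([3.0, 1.0, 4.0, 1.5, 9.0], "min,mean,p90"))
```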
Example:
igvtools toTDF -z 5 copyNumberFile.cn copyNumberFile.tdf hg18
Notes:
Data file formats, with the exception of .gct files, must be sorted by
start position. If necessary, files can be sorted with the "sort" command
described below. Attempting to preprocess an unsorted file will result
in an error.
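A quick pre-check for sort order can be sketched as below. It assumes that
"sorted by start position" means each chromosome's records form one
contiguous block with non-decreasing start values; that reading, and the
function name is_sorted_by_start, are assumptions rather than igvtools'
documented behavior.

```python
def is_sorted_by_start(records):
    """Check (chromosome, start) pairs for sort order.

    Returns True when every chromosome appears as a single contiguous
    block and start positions never decrease within a block.
    """
    seen = set()
    current = None
    last_start = None
    for chrom, start in records:
        if chrom != current:
            if chrom in seen:
                return False  # chromosome reappears after another one
            seen.add(chrom)
            current, last_start = chrom, start
        elif start < last_start:
            return False  # start position went backwards
        else:
            last_start = start
    return True


if __name__ == "__main__":
    print(is_sorted_by_start([("chr1", 10), ("chr1", 20), ("chr2", 5)]))
```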
---------------------------------------------------------------------------
Command "count"
---------------------------------------------------------------------------
The "count" command computes average feature density over a specified
window size across the genome. Common usages include computing coverage
for alignment files and counting hits in ChIP-seq experiments. Supported
file formats are .sam, .bam, .aligned, .sorted.txt, .bed, and .bam.list.
A .bam.list file is a plain text file containing a list of alignment or
bed files, one file per line.
Usage:
igvtools count [options] [inputFile] [outputFile] [genome]
Required arguments:
inputFile The input file (see supported formats above).
outputFile Either a binary tdf file, a text wig file, or both. The output file type is determined
by file extension, for example "output.tdf". To output both formats supply two file names
separated by a comma, for example "outputBinary.tdf,outputText.wig". To display feature
intensity in IGV, the density must be computed with this command, and the resulting file
must be named <feature track filename>.tdf.
The special string "stdout" can be used in either position, in which case the output will
be written to the standard output stream in wig format.
genome A genome id or filename. See the "Genome" section above. Default is hg18.
Options:
-z, --maxZoom num Specifies the maximum zoom level to precompute.
-w, --windowSize num The window size over which coverage is averaged. Defaults
to 25 bp.
-e, --extFactor num The read or feature is extended by the specified distance
in bp prior to counting. This option is useful for ChIP-seq
and RNA-seq applications. The value is generally set to the
average fragment length of the library minus the average read length.
--preExtFactor num The read is extended upstream from the 5' end by the specified distance.
--postExtFactor num Effectively overrides the read length, defines the downstream extent
from the 5' end. Intended for use with preExtFactor.
-f, --windowFunctions list A comma delimited list specifying window functions to use
when reducing the data to precomputed tiles. Possible
values are min, max, mean, median, p2, p10, p90, and p98.
The "p" values represent percentile, so p2=2nd percentile,
etc.
--strands [arg] By default, counts from both strands are combined.
This setting outputs the count for each strand separately.
Legal argument values are 'read' or 'first'.
'read' separates counts by read strand; 'first' uses the strand of the first read in the pair.
Results are saved in a separate column for .wig output, and a separate track
for TDF output.
--bases Count the occurrence of each base (A,G,C,T,N). Takes no arguments.
Results are saved in a separate column for .wig output, and a separate track for TDF output.
--query [querystring] Only count a specific region. Query string has syntax <chr>:<start>-<end>. e.g. chr1:100-1000.
Input file must be indexed.
--minMapQuality [mqual] Set the minimum mapping quality of reads to include. Default is 0.
--includeDuplicates Include duplicate alignments in count. Default false. If this flag is included, duplicates
are counted. Takes no arguments.
--pairs Compute coverage from paired alignments counting the entire insert as covered. When using this option only
reads marked "proper pairs" are used.
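The --query syntax <chr>:<start>-<end> can be validated before running a
long job. The parser below is a sketch with a hypothetical name
(parse_query); it accepts only plain digits and makes no claim about whether
igvtools treats the coordinates as 0-based or 1-based.

```python
import re

# <chr>:<start>-<end>, for example chr1:100-1000
_QUERY = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def parse_query(query):
    """Split a query string into (chromosome, start, end)."""
    match = _QUERY.match(query)
    if match is None:
        raise ValueError("expected <chr>:<start>-<end>, got %r" % query)
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    if start > end:
        raise ValueError("start must not be greater than end")
    return chrom, start, end


if __name__ == "__main__":
    print(parse_query("chr1:100-1000"))
```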
Notes:
The input file must be sorted by start position. The samtools package can
be used to sort .bam files. Other file types can be sorted with the "sort"
command (see below).
Example:
igvtools count -z 5 -w 25 -e 250 alignments.bam alignments.cov.tdf hg18
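The idea behind the window size and extension factor can be illustrated with
a small sketch. It is not the igvtools implementation: it assumes reads are
half-open (start, end) intervals on a single chromosome, that every read is
on the forward strand so the extension is added to the end, and that density
is the number of covered bases summed over reads divided by the window size.
The name window_density is hypothetical.

```python
def window_density(reads, window_size=25, ext_factor=0):
    """Average coverage per window for (start, end) half-open intervals.

    Returns a dict mapping each window's start position to the mean
    per-base coverage in that window. Windows with no reads are omitted.
    """
    covered = {}
    for start, end in reads:
        end += ext_factor  # assumption: forward strand, extend the 3' end
        first = start // window_size
        last = (end - 1) // window_size
        for window in range(first, last + 1):
            w_start = window * window_size
            w_end = w_start + window_size
            overlap = min(end, w_end) - max(start, w_start)
            covered[window] = covered.get(window, 0) + overlap
    return {w * window_size: bases / window_size
            for w, bases in sorted(covered.items())}


if __name__ == "__main__":
    print(window_density([(10, 35), (20, 45)], window_size=25, ext_factor=0))
```

A larger window smooths the track and shrinks the output; a larger extension
factor spreads each read over more windows.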
---------------------------------------------------------------------------
Command "sort"
---------------------------------------------------------------------------
Sorts the input file by start position. This command supports the following
file formats: .bed, .gff, .cn, .igv, .sam, and .bam.
Usage:
igvtools sort [options] [inputFile] [outputFile]
The special string "stdout" can be used as [outputFile], in which case the output will
be written to the standard output stream instead of a file.
Options:
-t, --tmpDir tmpdir Specify a temporary working directory. For large input files
this directory will be used to store intermediate results of
the sort. The default is the user's temp directory.
-m, --maxRecords number The maximum number of records to keep in memory during the
sort. The default value is 500000. Increase this number
if you receive "too many open files" errors. Decrease it
if you experience "out of memory" errors.
---------------------------------------------------------------------------
Command "index"
---------------------------------------------------------------------------
Creates an index for an alignment or feature file. Indexes are
required for loading alignment files into IGV, and can significantly
improve performance for large feature files. The input file must be
sorted by start position. This command does not take an output file
argument; instead, the filename is generated by appending ".sai" (for alignments)
or ".idx" (for features) to the input filename. IGV relies on this naming
convention to find the index.
Supported file formats include .bed, .gff, .vcf, .sam, and .bam files.
Usage:
igvtools index [inputFile]
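The naming convention described above can be sketched as a one-line rule.
The function name index_filename and the "alignment" / "feature" labels are
illustrative assumptions; the caller decides which kind a file is, because
this text does not spell out the mapping from extension to kind.

```python
def index_filename(input_file, kind):
    """Return the index name built by appending a suffix to the input name.

    kind is "alignment" (suffix ".sai") or "feature" (suffix ".idx").
    """
    suffixes = {"alignment": ".sai", "feature": ".idx"}
    if kind not in suffixes:
        raise ValueError("kind must be 'alignment' or 'feature'")
    return input_file + suffixes[kind]


if __name__ == "__main__":
    print(index_filename("reads.sam", "alignment"))
    print(index_filename("peaks.bed", "feature"))
```

Since IGV looks for the index under this name, the index file should stay in
the same directory as the data file and should not be renamed.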
---------------------------------------------------------------------------
Command "formatexp"
---------------------------------------------------------------------------
Format GCT or RES files for display. This should only be used if the file has not previously been log-transformed and has no negative numbers. The module:
1. Takes the log2 of the data.
2. Computes the median and subtracts it from each log2 probe value (i.e., centers on the median).
3. Computes the MAD (median absolute deviation) using the definition here: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/mad.html
4. Divides each log2 probe value by the MAD.
Supported input file formats are: .gct and .res
Usage:
igvtools formatexp [inputFile] [outputFile]
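The four steps above can be sketched for a single list of values. This is an
illustration under stated assumptions, not the igvtools code: the function
name format_expression is hypothetical, the 1.4826 scale factor is the
default constant in the linked R mad() definition and may or may not be what
igvtools applies, and the grouping of values (per sample, per probe, or per
file) is not specified here.

```python
import math
from statistics import median

MAD_CONSTANT = 1.4826  # default scale factor in R's mad(); assumed here


def format_expression(values):
    """Log2-transform, center on the median, and scale by the MAD."""
    if any(v <= 0 for v in values):
        # log2 is undefined for zero and negative input
        raise ValueError("values must be positive and not log-transformed")
    logged = [math.log2(v) for v in values]           # step 1
    center = median(logged)
    centered = [v - center for v in logged]           # step 2
    mad = MAD_CONSTANT * median([abs(v) for v in centered])  # step 3
    if mad == 0:
        raise ValueError("MAD is zero; values cannot be scaled")
    return [v / mad for v in centered]                # step 4


if __name__ == "__main__":
    print(format_expression([1, 2, 4, 8, 16]))
```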
---------------------------------------------------------------------------
Command "gui"
---------------------------------------------------------------------------
Starts the igvtools gui.
Usage:
igvtools gui
---------------------------------------------------------------------------
Command "help"
---------------------------------------------------------------------------
"igvtools help" will display a list of available commands. "igvtools help [command]"
displays help on a particular command.
Example:
igvtools help index
---------------------------------------------------------------------------
Command "version"
---------------------------------------------------------------------------
Prints the igvtools version number.