1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164
|
/**
* logarithmic k-mer count based on a Bloom filter
* Copyright 2014 BCGSC
*/
#include "config.h"
#include "plc.h"
#include "CountingBloomFilter.h"
#include "Common/IOUtil.h"
#include "Common/Options.h"
#include "Common/StringUtil.h"
#include "DataLayer/Options.h"
#include <cassert>
#include <getopt.h>
#include <iostream>
#include <cstring>
#if _OPENMP
# include <omp.h>
#endif
using namespace std;
#define PROGRAM "logcounter"
static const char VERSION_MESSAGE[] =
PROGRAM " (" PACKAGE_NAME ") " VERSION "\n"
"Written by ?.\n"
"\n"
"Copyright 2014 Canada's Michael Smith Genome Science Centre\n";
static const char USAGE_MESSAGE[] =
"Usage: " PROGRAM " [OPTION]... [READS]...\n"
" -j, --threads=N use N parallel threads [1]\n"
" -k, --kmer=N the size of a k-mer\n"
" -s, --seed=N the seed value used\n"
" -b, --bloom-size=N size of bloom filter [500M]\n"
" --chastity discard unchaste reads [default]\n"
" --no-chastity do not discard unchaste reads\n"
" --trim-masked trim masked bases from the ends of reads\n"
" --no-trim-masked do not trim masked bases from the ends\n"
" of reads [default]\n"
" -q, --trim-quality=N trim bases from the ends of reads whose\n"
" quality is less than the threshold\n"
" --standard-quality zero quality is `!' (33)\n"
" default for FASTQ and SAM files\n"
" --illumina-quality zero quality is `@' (64)\n"
" default for qseq and export files\n"
" -v, --verbose display verbose output\n"
" --help display this help and exit\n"
" --version output version information and exit\n"
"\n"
"Report bugs to <" PACKAGE_BUGREPORT ">.\n";
namespace opt {
/** The number of parallel threads. */
static unsigned threads = 1;
/** The size of the bloom filter in bytes. */
size_t bloomSize = 500 * 1024 * 1024;
/** The size of a k-mer. */
unsigned k;
/** The seed value to use for random number gen **/
unsigned s;
// /** Prefix for output files */
// static string outputPrefix;
}
///** Counters */
//static struct {
//} g_count;
static const char shortopts[] = "j:k:s:q:v";
enum { OPT_HELP = 1, OPT_VERSION };
static const struct option longopts[] = {
{ "bloom-size", required_argument, NULL, 'b' },
{ "threads", required_argument, NULL, 'j' },
{ "kmer", required_argument, NULL, 'k' },
{ "seed", required_argument, NULL, 's' },
{ "chastity", no_argument, &opt::chastityFilter, 1 },
{ "no-chastity", no_argument, &opt::chastityFilter, 0 },
{ "trim-masked", no_argument, &opt::trimMasked, 1 },
{ "no-trim-masked", no_argument, &opt::trimMasked, 0 },
{ "trim-quality", required_argument, NULL, 'q' },
{ "standard-quality", no_argument, &opt::qualityOffset, 33 },
{ "illumina-quality", no_argument, &opt::qualityOffset, 64 },
{ "verbose", no_argument, NULL, 'v' },
{ "help", no_argument, NULL, OPT_HELP },
{ "version", no_argument, NULL, OPT_VERSION },
{ NULL, 0, NULL, 0 }
};
int main(int argc, char** argv)
{
bool die = false;
for (int c; (c = getopt_long(argc, argv,
shortopts, longopts, NULL)) != -1;) {
istringstream arg(optarg != NULL ? optarg : "");
switch (c) {
case '?':
die = true; break;
case 'b':
opt::bloomSize = SIToBytes(arg); break;
case 'j':
arg >> opt::threads; break;
case 'k':
arg >> opt::k; break;
case 's':
arg >> opt::s; break;
case 'q':
arg >> opt::qualityThreshold; break;
case 'v':
opt::verbose++; break;
case OPT_HELP:
cout << USAGE_MESSAGE;
exit(EXIT_SUCCESS);
case OPT_VERSION:
cout << VERSION_MESSAGE;
exit(EXIT_SUCCESS);
}
if (optarg != NULL && (!arg.eof() || arg.fail())) {
cerr << PROGRAM ": invalid option: `-"
<< (char)c << optarg << "'\n";
exit(EXIT_FAILURE);
}
}
if (opt::k == 0) {
cerr << PROGRAM ": missing mandatory option `-k'\n";
die = true;
}
if (die) {
cerr << "Try `" << PROGRAM
<< " --help' for more information.\n";
exit(EXIT_FAILURE);
}
#if _OPENMP
if (opt::threads > 0)
omp_set_num_threads(opt::threads);
#endif
//set seed
srand (opt::s);
Kmer::setLength(opt::k);
assert(opt::bloomSize > 0);
//size determined by
CountingBloomFilter<plc> bloom(opt::bloomSize, 1);
for (int i = optind; i < argc; i++)
Bloom::loadFile(bloom, opt::k, string(argv[i]), opt::verbose);
return 0;
}
|