#!/bin/bash -e
PROGNAME=$(basename "$0")
function usage() {
echo "
USAGE:
fetch_refseq_bacterial_genomes_by_name.sh -n STRING -o DIR
DESCRIPTION:
Downloads RefSeq GenBank records for bacterial or archaeal genomes whose
organism name matches a partial or complete species name.
The --min and --max options can be used to restrict the size of the
returned sequences.
REQUIRED ARGUMENTS:
-n, --name STRING
Complete or partial name of the bacterial species.
-o, --output DIR
The output directory to download the GenBank files into.
OPTIONAL ARGUMENTS:
-m, --min INTEGER
Records with a sequence length shorter than this value will be ignored.
-x, --max INTEGER
Records with a sequence length longer than this value will be ignored.
-h, --help
Show this message.
EXAMPLE:
fetch_refseq_bacterial_genomes_by_name.sh -n 'Escherichia*' -o my_project/comparison_genomes
"
}
function error_exit() {
echo "${PROGNAME}: ${1:-"Unknown Error"}" 1>&2
exit 1
}
function remove_trailing_slash() {
string="$1"
new_string=$(echo "$string" | perl -nl -e 's/\/+$//;' -e 'print $_')
echo "$new_string"
}
min_length=""
max_length=""
# Loop on the argument count so that an empty option value does not end parsing early
while [ $# -gt 0 ]; do
case $1 in
-n | --name)
shift
name=$1
;;
-o | --output | --directory)
shift
directory=$1
;;
-m | --min)
shift
min_length=$1
;;
-x | --max)
shift
max_length=$1
;;
-h | --help)
usage
exit
;;
*)
usage
exit 1
;;
esac
shift
done
# The CCT_HOME variable must be set
if [ -z "$CCT_HOME" ]; then
error_exit "Please set the \$CCT_HOME environment variable to the path to the cgview_comparison_tool directory."
fi
cct_home=$CCT_HOME
if [ -z "$name" ]; then
error_exit "Please use '-n' to specify a species name. Use '-h' for help."
fi
if [ -z "$directory" ]; then
error_exit "Please use '-o' to specify an output directory. Use '-h' for help."
fi
directory=$(remove_trailing_slash "$directory")
if [ ! -d "$directory" ]; then
mkdir -p "$directory"
fi
query="${name}[Organism] AND nucleotide genome[Filter] AND (bacteria[Filter] OR archaea[Filter]) AND refseq[Filter] NOT wgs[Filter]"
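# Add min and max lengths to the query using the [SLEN] (sequence length) field
if [ -n "$min_length" ]; then
if [ -n "$max_length" ]; then
# Both bounds given: keep records within the inclusive range min:max
query=$query" AND ${min_length}:${max_length}[SLEN]"
else
# Only a lower bound given: exclude records of length 1 to (min - 1)
min_length="$(( min_length - 1 ))"
query=$query" NOT 1:${min_length}[SLEN]"
fi
elif [ -n "$max_length" ]; then
# Only an upper bound given: keep records of length 1 to max
query=$query" AND 1:${max_length}[SLEN]"
fi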
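# Illustrative sketch of the length filter that the block above appends to the
# base query. The lengths shown are hypothetical values, not defaults:
#   -m 1000000 -x 6000000  ->  ... AND 1000000:6000000[SLEN]
#   -m 1000000             ->  ... NOT 1:999999[SLEN]
#   -x 6000000             ->  ... AND 1:6000000[SLEN]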
perl "${cct_home}/scripts/ncbi_search.pl" -q "$query" -d nucleotide -o "$directory" -s -r gbwithparts -v
echo "Downloaded sequences saved to ${directory}"