File: ChangeLog

Version 6.6.0 15-Jul-2013

	Restrict and related applications now ignore matches where the
	enzyme site is wider than the sequence length.

	The SRS server at EMBL-EBI no longer serves the EMBL database!
	EBI's SRS server databases in server.srs have been updated to
	reflect their reduced service.

	Reading large sequences is more efficient. Reference counted
	strings are used for output. Where gaps do not need to be
	replaced, a single copy of the sequence string is used for input,
	processing and output.
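
	As a rough illustration of sharing one copy of a sequence string
	between input, processing and output, the C sketch below keeps a
	reference count in a heap-allocated string and frees it only when
	the last holder releases it. RcString, rc_new, rc_ref and
	rc_release are hypothetical names, not the AJAX string API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string (not AjPStr). */
typedef struct
{
    size_t refs;   /* number of holders sharing this buffer */
    size_t len;    /* string length, excluding the terminator */
    char text[];   /* C99 flexible array member holding the characters */
} RcString;

/* Allocate a new string with one reference. Returns NULL on failure. */
static RcString *rc_new(const char *s)
{
    size_t len = strlen(s);
    RcString *r = malloc(sizeof *r + len + 1);
    if (r == NULL)
        return NULL;
    r->refs = 1;
    r->len = len;
    memcpy(r->text, s, len + 1);
    return r;
}

/* Share the existing buffer instead of copying it. */
static RcString *rc_ref(RcString *r)
{
    r->refs++;
    return r;
}

/* Drop one reference; free the buffer when none remain. */
static void rc_release(RcString *r)
{
    if (r != NULL && --r->refs == 0)
        free(r);
}
```

	A caller that would otherwise copy the sequence calls rc_ref
	instead, and every holder calls rc_release when it is finished.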

	New sequence format iguspto supports a variant of the
	intelligenetics format with tolerance for format variants on
	input.

	Calculation of isoelectric point has been updated to use the same
	data values as Expasy and the Open Bio packages. New data file
	Epkexpasy.dat holds the values used by Expasy.

	The final position of the reverse strand is now correctly numbered
	in the output of sixpack and showseq.

	Eukaryote join features in union were not correctly copied after
	subfeatures were implemented to hold exons. The union code now
	correctly relocates subfeatures.

	Complex (join) feature positions were not relocated when the
	parent sequence was trimmed by start and end position. This was
	introduced when subfeatures were implemented, and is now
	corrected.

	New option -methionine for transeq translates any start codon as
	methionine when a specific range is given (including 1 to end) and
	an alternative genetic code is specified.

	Wildcard filenames were broken by the query language rewrite. The
	previous functionality is restored. Any query can use a wildcard
	filename with '*' or '?' characters. The order in which files are
	processed is determined by the operating system.
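
	A minimal sketch of '*' and '?' matching against a single
	filename, assuming '*' matches any run of characters (including
	none) and '?' exactly one character. On POSIX systems the standard
	fnmatch() and glob() functions provide this behaviour;
	wildcard_match is a hypothetical name.

```c
#include <stdbool.h>

/* '*' matches any run of characters (including none);
 * '?' matches exactly one character; anything else must match itself. */
static bool wildcard_match(const char *pattern, const char *name)
{
    if (*pattern == '\0')
        return *name == '\0';

    if (*pattern == '*')
    {
        /* Let '*' absorb zero, one, two... characters in turn. */
        for (;;)
        {
            if (wildcard_match(pattern + 1, name))
                return true;
            if (*name == '\0')
                return false;
            name++;
        }
    }

    if (*name == '\0')
        return false;

    if (*pattern == '?' || *pattern == *name)
        return wildcard_match(pattern + 1, name + 1);

    return false;
}
```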

	Dbxreport and dbxstat now support databases with a dbalias
	(alternative base name for the database files).

	Restriction digest applications occasionally reported more than
	one identical match where several enzymes recognize the same
	target site. The testing of isoschizomers has been improved to
	catch these cases. In practice, most runs use only a few named
	enzymes with different sites.

	Fragment lengths in restrict are now included as extra columns in
	the output, giving the fragments to the 5' and 3' side of each cut
	in the forward strand. Note that the output includes all possible
	cut sites, though it may be impossible for a double digest to
	physically cut at each of two closely spaced sites.
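
	A sketch of how the two fragment-length columns can be derived,
	assuming each cut position means "cut after this base" in a
	1-based forward strand and that every listed site is cut.
	fragment_lengths is a hypothetical helper, not the restrict
	source.

```c
#include <stddef.h>

/* cuts[] holds n distinct forward-strand cut positions in ascending
 * order, each meaning "cut after base cuts[i]" in a 1-based sequence of
 * length seqlen. For each cut, store the length of the fragment on its
 * 5' side in left[i] and on its 3' side in right[i]. */
static void fragment_lengths(const size_t *cuts, size_t n, size_t seqlen,
                             size_t *left, size_t *right)
{
    for (size_t i = 0; i < n; i++)
    {
        size_t prev = (i == 0) ? 0 : cuts[i - 1];          /* sequence start */
        size_t next = (i + 1 == n) ? seqlen : cuts[i + 1]; /* sequence end */
        left[i] = cuts[i] - prev;
        right[i] = next - cuts[i];
    }
}
```

	For cuts after bases 10 and 25 in a 100-base sequence, the first
	cut has fragments of 10 and 15 bases on its 5' and 3' sides, and
	the second has 15 and 75.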

	The -name option of restrict had no effect on report output and
	has been removed.

	Cachedbfetch corrects bad EDAM references to EDAM_syntax: instead
	of EDAM_format: in the definitions returned by EBI's dbfetch
	and wsdbfetch servers.

	Characters that may confuse output file generation are now
	replaced in sequence identifiers: any forward slash or backslash
	(interpreted as host system paths), comma, semicolon or colon
	becomes an underscore.
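
	The substitution amounts to a single pass over the identifier.
	This C sketch (sanitize_identifier is a hypothetical name) edits
	the string in place.

```c
#include <string.h>

/* Replace '/', '\', ',', ';' and ':' with '_' so the identifier can be
 * used safely when building an output filename. */
static void sanitize_identifier(char *id)
{
    for (char *p = id; *p != '\0'; p++)
    {
        if (strchr("/\\,;:", *p) != NULL)
            *p = '_';
    }
}
```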

	Sequence input now warns for bad sequence characters when the
	format is known. When auto-detecting the format the warnings are
	turned off so that failed formats can silently be ignored, but
	when reading further sequences from the same input file warnings
	are enabled. They can be disabled for individual format parsers by
	passing zero as the format code to seqAppendWarn.

	New nibble (nib) format stores sequence data in half-byte binary
	compressed format. The format is available for input and output,
	but as a binary format can only be read from a file, not from a pipe.
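
	A sketch of half-byte packing, two bases per byte with the first
	base in the high nibble. The code table (T, C, A, G, N as 0 to 4)
	and the nib_* names are assumptions for illustration only; the
	real nib layout, its file header and any case-masking bits are not
	shown. Unknown characters, including lower case, are stored as N.

```c
#include <stddef.h>
#include <string.h>

/* Assumed code table: index in this string is the 4-bit code. */
static const char NIB_CODES[] = "TCAGN";

static unsigned char nib_encode_base(char base)
{
    const char *hit = (base != '\0') ? strchr(NIB_CODES, base) : NULL;
    return hit != NULL ? (unsigned char)(hit - NIB_CODES) : 4; /* 4 = N */
}

/* Pack len bases into out, which must hold (len + 1) / 2 bytes. */
static void nib_pack(const char *seq, size_t len, unsigned char *out)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char code = nib_encode_base(seq[i]);
        if (i % 2 == 0)
            out[i / 2] = (unsigned char)(code << 4); /* high nibble */
        else
            out[i / 2] |= code;                      /* low nibble */
    }
}

/* Read back base i from packed data. */
static char nib_unpack_base(const unsigned char *packed, size_t i)
{
    unsigned char byte = packed[i / 2];
    unsigned char code = (i % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
    return code < 5 ? NIB_CODES[code] : 'N';
}
```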

	New GDE format for sequence input and output - a simple format
	with a #id prefix.

	Support added for SwissProt OH (viral host) records.

	New sequence input associated qualifier (available for all
	sequence inputs) -squick reads only the id, accession, description
	and sequence, saving unnecessary parsing of more complex input
	formats such as swissprot, embl and genbank.

	String parsing objects are now reused rather than deleted to save
	memory reallocation in parsing input streams with a large number
	of entries.  Input source code now uses reusable token objects
	cleared only when the program exits.

	Acdpretty now correctly preserves in-line comments in ACD files.

	Efficiency improvements in matching sets of characters in strings,
	especially in functions used for each entry in a large set of
	input sequences.

	New applications xmlget and xmltext read XML data, for example
	from dbfetch:embl which offers emblxml format. Output can be as
	input or in reformatted versions.

	QA tests of EMBASSY applications look in a test/data directory in
	the EMBASSY package as an alternative place for data files
	prefixed by TESTDATA:

	Clustal omega data types added to knowntypes.standard file.

	Ranges can use a syntax of start+len or start,+len to give the
	length rather than the end position. The end is calculated from
	the start and length and used internally. This syntax allows a
	closer fit to the command line of primer3_core in eprimer32 where
	ranges in the native application are always specified as start and
	length.
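
	A sketch of converting the start+len and start,+len forms into a
	start and end, assuming 1-based inclusive positions so that
	end = start + len - 1. parse_start_len is a hypothetical helper;
	the EMBOSS range parser is not reproduced here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Accept "start,+len" or "start+len" with nothing after the length. */
static bool parse_start_len(const char *text, long *start, long *end)
{
    long s, len;
    char tail; /* only assigned if unexpected trailing text exists */

    if (sscanf(text, "%ld,+%ld%c", &s, &len, &tail) != 2 &&
        sscanf(text, "%ld+%ld%c", &s, &len, &tail) != 2)
        return false;
    if (s < 1 || len < 1)
        return false;

    *start = s;
    *end = s + len - 1; /* assumed 1-based inclusive end */
    return true;
}
```

	With this convention, "100+20" and "100,+20" both give the range
	100 to 119.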

	List file inputs now report an error if any text follows the first
	token on a line, unless it is a comment following a '#'
	character. Previous versions treated any remaining text as a
	comment and silently ignored it.
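
	A sketch of the stricter check, assuming a blank line or a line
	holding only a comment is still allowed. list_line_ok is a
	hypothetical name and treats '#' as starting a comment even
	directly after the token.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if the line has at most one whitespace-delimited token,
 * optionally followed by a '#' comment. */
static bool list_line_ok(const char *line)
{
    const char *p = line;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0' || *p == '#')
        return true; /* blank or comment-only line */

    /* Skip the single token. */
    while (*p != '\0' && !isspace((unsigned char)*p) && *p != '#')
        p++;

    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0' || *p == '#'; /* nothing else may follow */
}
```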

	New sequence format iguspto supports a multi-line IG format used
	by the US Patent Office. The multi-line descriptions are preserved
	only if EMBOSS reads and writes in this format. We can add the
	capability to any other multi-line input format where the original
	description lines should be preserved. Other formats treat
	descriptions as a single record to be wrapped where there is a
	maximum record length (e.g. in EMBL format).

	Programs dreg and preg now only report sequences where a pattern
	match was found, which is the same behaviour as fuzznuc, fuzzpro
	and fuzztran.

	New code added to handle xml datatype. Supports multiple named XML
	formats, using the DOM parsers to interpret data. Multiple XML
	input formats are supported, but on output, in the absence of a
	conversion method, the original XML is normally reported as plain
	"xml" format.

	In database definitions, "example" is now a list attribute which
	can appear multiple times, allowing multiple example queries to be
	defined as separate records, with possible documentation following
	a '!' delimiter.

	Showserver now scales the column headers better for long cache
	file names.

	Showdb now displays the taxons, examples, and aliases defined for
	a database. Examples and aliases can be preceded by a count of the
	number of each. All columns are displayed with -full; individual
	elements are controlled by -numtaxons, -taxscope (-taxonomy is a
	database type option), -examples, -numexamples, -aliases and
	-numaliases.

	Showdb now displays a count of the number of fields in addition to
	the list of field names. New command line qualifier -numfields
	controls the display of the field count.

	Showdb now displays all types defined for a database, separated by
	commas, but will only display a database once so that, for
	example, a protein and protfeatures database will appear in the
	protein database set first (if displayed). If only the features
	databases are displayed then it will appear with them.

	Showdb no longer shows the access levels (id, query and all) by
	default for a database. New command line qualifier -access or the
	existing -full qualifier will show these values.

	Entrez access was specific to sequence data retrieval. Entrez
	server retrievals can now automatically detect ID and accession
	fields and read text entries with textget where a text format is
	available.

	Genbank-related protein formats Refseqp and Genpept are updated to
	process all record types. Genpept feature handling is updated to
	correct the handling of multiple locations by using subfeatures.

	GenBank and Refseq formats now handle the full set of record types
	including common species names, reference details and comments.

	Dbtell -full reports any alias names for a database after the
	definition.

	Dbtell recognizes alias names for a database, reporting the master
	database definition and a comment describing where the alias is
	defined.

	Dbtell -server reports the database definition for a server. All
	attributes are reported in the database definition, whether
	defined for the database or at the server level.

	Servertell -full now reports the definitions of all databases for
	the server, including all aliases defined in the server definition
	file. Without -full an extra comment line in the output suggests
	running with -full for more detailed information.

	Restrict output now sorts by the position closest to the start for
	matches on the reverse strand (for an asymmetric target
	site). This sort change can produce additional matches in the
	output of restover.

	Embossversion is now set to fail with a message if the update
	information URL is unreachable.

	HTTP and FTP error messages were simplified and blank lines removed.

	The valgrind.pl script has a new qualifier -debug which runs the
	test with -debug on the command line.

	Needle, needleall and water now fail with "die" message if there
	is insufficient virtual memory to calculate the alignment between
	two long sequences.
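
	One way to fail cleanly rather than crash is to check the matrix
	size for overflow before allocating, as in this sketch. It assumes
	a single (len1 + 1) x (len2 + 1) float matrix, a simplification of
	what the alignment applications allocate; alloc_dp_matrix is a
	hypothetical name, and the caller is expected to print the message
	and exit when it returns NULL.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a (len1 + 1) x (len2 + 1) score matrix, or return NULL if
 * the size overflows size_t or malloc fails. */
static float *alloc_dp_matrix(size_t len1, size_t len2)
{
    size_t rows = len1 + 1;
    size_t cols = len2 + 1;

    if (rows == 0 || cols == 0)
        return NULL; /* a length of SIZE_MAX wrapped around */
    if (rows > SIZE_MAX / cols / sizeof(float))
        return NULL; /* rows * cols * sizeof(float) would overflow */

    return malloc(rows * cols * sizeof(float));
}
```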

	Indexing with the dbx applications miscalculated the secondary
	page capacity when the secondary page size is less than the
	primary page size.

	Ranges in a file can use a dash as a delimiter for the start and
	end positions in addition to white space.
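
	A sketch of reading one range line in either form, trying the dash
	form first. parse_range_line is a hypothetical helper, and any
	text after the second number is ignored.

```c
#include <stdbool.h>
#include <stdio.h>

/* Read "start-end" or "start end" (any white space) from one line. */
static bool parse_range_line(const char *line, long *start, long *end)
{
    /* Spaces in the format match zero or more white-space characters. */
    if (sscanf(line, "%ld - %ld", start, end) == 2)
        return true;
    return sscanf(line, "%ld %ld", start, end) == 2;
}
```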

	For all data types, format names can be replaced by EDAM format
	term identifiers, for example 1927 for "embl". The format terms
	are defined in the source code. We will need to define aliases or
	use more complex queries if a format splits into a hierarchy
	but this is unlikely in most cases.

	On FreeBSD systems embossversion source code has quotes corrected
	on the line that reports FreeBSDLF is defined.

Version 6.5.0 15-Jul-2012

	On Windows (mEMBOSS) the user home directory is checked for the
	.embossrc file and .embossdata directory, using emboss.default for
	settings defined for all users.

	Database definitions with multiple types and formats now check
	that there is at least one valid format defined for each type of
	data.

	The qatest.pl script handles references to the user's home
	directory on Windows. "~/" is replaced with the user's home
	directory, with the full path or filename enclosed in quotes.

	The qatest.pl script has a new qualifier -debug which runs the
	test with -debug on the command line. For ACD utility tests the
	application name is taken from the first command line parameter
	and will not match the debug file so these will give an error for
	an unknown .dbg file. For all other tests this is a simple way to
	obtain debug output for a problematic test result.

	EMBOSS supports soap protocol access using the Apache axis2c
	library.  We use version 1.6.0 for testing. Installation can be
	tricky on some systems. We are happy to help anyone who has
	problems. A copy of the library is included in the initial 6.5.0.0
	mEMBOSS build.

	Date parsing for EMBL, GenBank, SwissProt, Refseq and related
	formats has been made more robust.

	New application embossupdate checks for the availability of an
	updated EMBOSS distribution or patches from the EMBOSS website and
	FTP server. Embossupdate can be run at the end of a successful
	installation or reinstallation. We hope this will help our
	users to keep their versions up to date more easily.

	Feature data can be read from PIR and GCG formatted databases.

	EDAM is updated to release 1.1. EDAM is used to define EMBOSS and
	EMBASSY applications, to describe EMBOSS defined databases and
	entries in the DRCAT data resource catalogue. This is a prerelease
	from the EDAM team to ensure EMBOSS has the most recent set of
	terms.

	Lists and tables now support very large numbers, requiring long
	integers (datatype ajulong) to represent the return values from
	ajListGetLength, ajTableGetLength and ajTableGetSize. Further
	extensions are planned in future releases.

	Directory inputs now interpret ~/ or ~user/ in the user response
	in the same way as file inputs.

	Application embossversion -full now reports the versions of all
	libraries, and all configuration settings used to compile EMBOSS,
	plus the sizes of standard data types.

	Dbxfasta has a new format "idsv" which finds sequence version
	values if the accession number has a .number suffix.

	Dbxflat creates a sequence version for UniProt entries using the
	accession number and the sequence version from the DT records.

	Dbx indexing stores secondary reference file positions only if the
	database has more than one data file per entry. The entries file
	records the number of files in the database and can if needed
	store more than one reference file. Identifier indexes can store
	more entries per page for databases with one file (embl, uniprot),
	but support reference files for gcg, pir and taxonomy indexing.

	Dbx indexing supports separate caches for primary and secondary
	pages. Larger caches can reduce the number of physical reads and
	writes at the cost of a small increase in CPU time.  The organism
	and description indexes for large databases can have terms that
	appear in a very large number of entries (e.g. 'protein' in
	UniProt or 'bacteria' in EMBL). Secondary cache sizes up to 100k
	can be used to try to reduce the physical page rewrites needed as
	these indexes grow.

	Dbx indexing supports a smaller size for secondary index
	pages. These hold the lists of entry ids for indexed strings, and
	the file offsets for non-unique identifiers (e.g. secondary
	accession numbers). The environment variable EMBOSS_SECPAGESIZE
	defaults to 512, a quarter of the EMBOSS_PAGESIZE value of 2048.
	Resource definitions can specify field-specific secondary page
	sizes, for example accsecpagesize: "256".
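
	A sketch of reading such a setting with a fallback default,
	assuming the value is taken from the process environment (EMBOSS
	can also read variables from its own configuration files, which
	this ignores). parse_page_size and page_size_from_env are
	hypothetical names.

```c
#include <stdlib.h>

/* Parse a page size; unset, empty, non-numeric or non-positive values
 * fall back to the default. */
static long parse_page_size(const char *value, long fallback)
{
    char *endp;
    long size;

    if (value == NULL || *value == '\0')
        return fallback;
    size = strtol(value, &endp, 10);
    return (*endp == '\0' && size > 0) ? size : fallback;
}

/* Look the setting up in the environment, e.g. EMBOSS_SECPAGESIZE. */
static long page_size_from_env(const char *name, long fallback)
{
    return parse_page_size(getenv(name), fallback);
}
```

	For example, page_size_from_env("EMBOSS_SECPAGESIZE", 512) returns
	512 unless the variable holds a positive integer.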

	Dbx indexing applications (dbxflat, dbxfasta, dbxgcg, dbxedam,
	dbxobo, dbxresource, dbxtax) secondary index files (e.g. keyword,
	taxonomy and description indexes) are more compact. The entry ids
	for each keyword are stored as a simple list unless more than one
	index page is needed. As most indexed tokens are in only a few
	entries this saves many pages while the index is being built. The
	compressed index size is also smaller.

	Dbxflat, dbxfasta and dbxgcg now report index terms that exceed the
	maximum length (attributes idlen, acclen, deslen, orglen, keylen,
	svlen, gilen). Each term beyond the current maximum is
	reported. When the run is completed, the longest term length for
	each index field is reported so that excessively large values can
	be reduced.

	Dbxflat, dbxfasta and dbxgcg have improved memory efficiency on
	large indexing runs. Many more internal data structures are reused
	in the parsers.

	Window length options are renamed to -window consistently across
	all EMBOSS applications. The change applies to pepwindow and
	pepwindowall.

	Multiple inputs to einverted gave inconsistent results as two
	internal variables were not reset for each new sequence.

	Resource definitions for uniprot (swissresource) and embl
	(emblresource) are updated to allow the maximum size for database
	index keys. If the database contains longer values in future they
	will be truncated and the maximum size found by the parser will be
	reported by dbxflat.

	New resource definitions chebiresource and sworesource are
	provided in emboss.standard to index ontologies with
	exceptionally large index keys.

	Ontologies CHEBI, ECO, GO, PW, RO, SO are updated.

	Ontology SWO is added. This is the software ontology, in its OBO
	format. Some identifiers are really URLs.

	Sequence and other databases with an organism ('org') or taxonomy
	('tax') index can restrict retrieval to one or more indexed
	organism names or any other indexed level in the
	taxonomy. Examples include EMBL or UniProt whether indexed locally
	with dbxflat or accessed through the EBI's SRS server as srs:embl
	or srs:uniprot. A new database attribute 'organisms' can be used
	to define one or more organisms or taxonomy levels to restrict
	data retrieval from the master index of the complete file. A value
	using EMBOSS query syntax of "rattus|mus" will allow data from
	both genera to be retrieved. Values can also be separated by tabs,
	commas ',' or semicolons ';'. As organisms can include spaces we
	chose not to allow space as a delimiter. The organisms attribute
	is implemented for method "emboss" and "srswww" to allow remote
	retrieval. We can implement organisms for other access methods if
	there is a demand from the user community.
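
	A sketch of splitting an organisms value on the permitted
	delimiters while keeping spaces inside names. split_organisms is
	hypothetical, treats '|' the same as the other delimiters
	(ignoring its query meaning), and uses strtok, so it modifies its
	input and is not reentrant.

```c
#include <stddef.h>
#include <string.h>

/* Split value in place on '|', tab, ',' or ';' (not on spaces), trim
 * spaces around each name, store up to max names, return the count. */
static size_t split_organisms(char *value, char **names, size_t max)
{
    size_t n = 0;

    for (char *tok = strtok(value, "|\t,;"); tok != NULL && n < max;
         tok = strtok(NULL, "|\t,;"))
    {
        while (*tok == ' ')
            tok++;                       /* trim leading spaces */
        size_t len = strlen(tok);
        while (len > 0 && tok[len - 1] == ' ')
            tok[--len] = '\0';           /* trim trailing spaces */
        if (len > 0)
            names[n++] = tok;            /* skip empty fields */
    }
    return n;
}
```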

	Ontology databases can combine more than one branch of an ontology
	in a single file. Examples include the Gene Ontology (GO) with
	namespaces for cellular_component, molecular_function and
	biological_process, and EDAM with data, format, identifier,
	operation and topic. A new database attribute 'namespace' can be
	used to define one or more namespaces to restrict data retrieval
	from the master index of the complete file. This is tricky for
	EDAM data which is in the data or identifier namespaces. A value
	using EMBOSS query syntax of "data|identifier" or spaced with
	"data identifier" will allow data from both namespaces to be
	retrieved. The namespace attribute is implemented for method
	"emboss" (how the ontologies are indexed in the distribution) and
	"srswww" to allow remote retrieval. We can implement namespace for
	other access methods if there is a demand from the user community.

	EDAM release 1.0 is included. Major changes were needed to EMBOSS
	internals as the identifiers are all changed (different term ID
	number and different prefix). ACD files and the DRCAT data
	resource catalogue are updated with the nearest equivalent terms
	from EDAM 1.0.

	Assembly data is now loaded a few records at a time using a new
	"loader" object. This allows very large files to be processed in
	chunks.

	Variation data is now loaded a few records at a time using a new
	"loader" object. This allows very large files to be processed in
	chunks.

	Support for BioPerl/Open-Bio OBDA flatfile indexes is included as
	database access method 'obda'. The indexing in BioPerl 1.6 is
	broken for EMBL as the semicolon is not removed from
	identifiers. The secondary index files have duplicated
	records. Both problems should be fixed in a future BioPerl
	release. Note also that OBDA indexing parses only the primary
	accession number so that other accessions are not retrievable from
	OBDA index files.

	When reading EMBL entries with a single (source) feature, the
	feature could be ignored.

	Output files for fuzznuc, fuzzpro, fuzztran, dreg and preg
	included the pattern name and the pattern string in the last
	release. The output format is changed to remove the space between
	the pattern name and string so that parsers see the expected number
	of space-delimited fields in the output.

	The query language parser has been rewritten to handle the new
	-iquery and -ioffset qualifiers. Badly formed queries may now
	produce different error messages.

	Any input type that uses queries, with the exception of URL
	inputs, can use two new associated qualifiers. -ioffset is the
	initial non-zero offset when reading from a file or a
	URL. -iquery is the query field, which can be applied to an FTP or
	HTTP URL or to any query in a list file. These names also apply to
	sequence and feature input where other qualifiers begin with 's'
	and 'f' respectively.

	FTP and HTTP URLs can now be used directly as input queries for
	all data types in place of file names. EMBOSS automatically
	detects the ftp:// or http:// prefix and uses the appropriate
	protocol. Any query or offset is ignored as there is no way to
	distinguish these from a genuine part of the URL.

	Patterns for fuzznuc, fuzzpro and fuzztran can include escaped
	codes to skip the expansion of ambiguity codes and look for them
	explicitly in the input. A backslash (shells may need two) before
	the code specifies an exact match, for example \S will only match
	S in the input.

	Patterns for fuzznuc with ambiguity codes are now expanded to
	include the ambiguity code (and any overlapped ambiguity
	codes). For example, S matches [GCS] and B (not A) matches
	[TGCBSYK].
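
	The two examples are consistent with a rule that a pattern code
	matches any input code whose set of bases is contained in its own
	set. This sketch encodes that rule with 4-bit masks; it is a
	reconstruction from the examples, not the fuzznuc source, U is not
	handled, and expand_code and iupac_mask are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* IUPAC nucleotide codes as base sets: A=1, C=2, G=4, T=8. */
static unsigned iupac_mask(char code)
{
    switch (toupper((unsigned char)code))
    {
    case 'A': return 1;      case 'C': return 2;
    case 'G': return 4;      case 'T': return 8;
    case 'R': return 1 | 4;  case 'Y': return 2 | 8;
    case 'S': return 2 | 4;  case 'W': return 1 | 8;
    case 'K': return 4 | 8;  case 'M': return 1 | 2;
    case 'B': return 2 | 4 | 8;  case 'D': return 1 | 4 | 8;
    case 'H': return 1 | 2 | 8;  case 'V': return 1 | 2 | 4;
    case 'N': return 15;
    default:  return 0;      /* unknown code matches nothing */
    }
}

/* Write into out (at least 16 bytes) every input code matched by the
 * pattern code. An escaped code (written "\S" in a pattern) matches only
 * itself. */
static void expand_code(char code, int escaped, char *out)
{
    static const char all_codes[] = "ACGTRYSWKMBDHVN";
    unsigned target = iupac_mask(code);
    size_t n = 0;

    if (escaped)
        out[n++] = (char)toupper((unsigned char)code);
    else
        for (const char *c = all_codes; *c != '\0'; c++)
            if ((iupac_mask(*c) & ~target) == 0) /* subset of target */
                out[n++] = *c;
    out[n] = '\0';
}
```

	With these masks S expands to CGS and B to CGTYSKB, the same sets
	as [GCS] and [TGCBSYK].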

	A new AJAX source file ajtagval.c handles general tag-value pairs
	of strings which have uses beyond feature internals.

	Pepwheel can plot up to 5 sets of residues, with a total of
	"steps" at each level. Leucine zipper plots with a step of 7 and 2
	turns required more residues to be visible. The updated pepwheel
	rescales the size of the inner wheel to allow more residues to be
	displayed.

	Sequence and assembly reading in BAM format always fails if no
	match was found in the first pass. Attempting to read again could
	loop with the same result as the file is rewound. Rereading is
	intended for text formats such as FASTA where the next entry may
	match.

	Header files in AJAX and NUCLEUS have been cleaned to remove
	redundant references. A new include file ajlib.h includes the core
	set of ajdefine, ajarch, ajmem, ajmess, ajfmt and ajstr which were
	almost universally included. Applications are expected to use
	emboss.h as their only include, but references to ajax.h and
	emboss.h in the libraries are now all replaced with the minimally
	required set of include files.

	The server.entrez file has been updated using a script
	serverentrez.pl which queries Eutils to obtain a list of database
	names and fields. An internal array is used to define the
	datatypes and formats for each database as these are defined only
	in a series of HTML tables in other pages.

	Reading from the NCBI Entrez server failed. The cause was trimming
	newlines from a reference-counted string where the data returned
	has CR-LF format but only one character was removed.

	New xygraph output device support for datafile formats. "bedgraph"
	outputs in BedGraph format. "wig" outputs in Wiggle format.

	The "sequence" attribute is implemented for xygraph outputs.  If
	set true, the X-axis label defaults to the name of the first input
	and the source name used in datafile outputs is also the name of
	the first input.

	Dottup and dotmatcher now have the first sequence on the X axis
	and the second on the Y axis. This follows standards for datafile
	output of graphical data which default to the X axis relating to
	the first input sequence.

	Dbx index files from earlier releases defaulted to "secondary"
	indexes. The test for an index with no "Type" parameter defined
	now picks up the standard Identifier indexed fields (id, acc, sv
	and gi) correctly. The files were identified by field name, but
	the test was using the file extension.

	Fuzznuc, fuzzpro, fuzztran, dreg and preg when searching with a
	regular expression found only the largest possible match at each
	start position. A new function in recent releases of the PCRE
	regular expression library supports searching for all matches
	using function ajRegExecallC instead of ajRegExecC. These
	applications can now find all overlapping matches to a pattern
	using a regular expression.

	The PCRE library is updated to include the pcre_dfa_exec
	function. This is called by ajRegExecall and ajRegExecallC. The
	regular expression can be compiled as usual. The new calls set an
	internal value to the number of matches found, retrievable by
	ajRegGetMatches. Offsets (ajRegOffsetI) and substrings (ajRegSubI)
	return these matches, starting at zero which is the longest match
	(the same as in ajRegExec). Any shorter matches with the same
	start are stored in place of bracketed substrings.

	Prettyplot options are changed to remove dependencies on other
	options. Option -plurality (which depended on the sequence
	alignment weight or the number of input sequences) is now -ratio
	with a default of 0.5. This is exactly equivalent to the default
	-plurality value or half the total weight. Option -resbreak is
	replaced by -blocksperline with a default value of 1. This has the
	same default output as the -resbreak option which defaulted to the
	-residuesperline value.

	All header files now have an @include comment block which includes
	the LGPL licence and RCS tags. Header files are commented in
	consistent sections. The C++ compile extern wrapper for C
	declarations is now a macro to avoid indentation issues in emacs
	and other editors.
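
	The wrapper macro pattern looks roughly like this. The EXAMPLE_*
	macro names and example_function are hypothetical; the point is
	that the declarations carry no visible brace, so editors do not
	indent them.

```c
/* Hypothetical macro names; EMBOSS headers define their own. */
#ifdef __cplusplus
#define EXAMPLE_BEGIN_DECLS extern "C" {
#define EXAMPLE_END_DECLS }
#else
#define EXAMPLE_BEGIN_DECLS
#define EXAMPLE_END_DECLS
#endif

EXAMPLE_BEGIN_DECLS

/* Declarations stay at column 0: no open brace is visible here. */
int example_function(int value);

EXAMPLE_END_DECLS

/* Definition, so the sketch links and can be called. */
int example_function(int value)
{
    return value * 2;
}
```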

	All obsolete functions are moved to the end of source files and
	wrapped in an #ifdef AJ__COMPILE_DEPRECATED block. The configure
	option --enable-buildalldeprecated includes these functions in
	compilation. Functions described in the 6.2.0 books are included
	in a similar AJ__COMPILE_DEPRECATED_BOOK block and built with the
	--enable-buildbookdeprecated configure option.

	Diffseq produced incorrect results when reporting an insertion in
	the second sequence. The error was introduced in release 6.0.0. It
	is fixed by defining a "between" location for the insert site in
	the first sequence, and by adding support for "between" features
	to diffseq and other report formats. A new constructor
	ajFeatNewBetween with one position makes creating such features
	easier.

	New function ajListDrop removes a node from a list by searching
	for its address.
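
	A sketch of removing a node by address from a singly linked list,
	using a pointer to each link so that the head needs no special
	case. struct node and list_drop are hypothetical and far simpler
	than AjPList; the node is unlinked but not freed.

```c
#include <stdbool.h>
#include <stddef.h>

struct node
{
    int value;
    struct node *next;
};

/* Unlink target (found by address, not by value) from the list whose
 * head pointer is *head. Returns true if the node was found. */
static bool list_drop(struct node **head, const struct node *target)
{
    for (struct node **link = head; *link != NULL; link = &(*link)->next)
    {
        if (*link == target)
        {
            *link = target->next; /* bypass the node */
            return true;
        }
    }
    return false;
}
```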

	Test data includes a new EMBL data file syn.dat containing a
	circular sequence.

	GFF3 input combines features with the same ID under a generated
	parent so that features can be linked as subfeatures and sorted
	together. These features are identified by the Flags attribute and
	excluded from GFF3 output.

	GFF3 output is required to use different feature types for
	parent and child. This is broken by the annotated parent feature
	we need to represent EMBL/GenBank/DDBJ joins. For these, the
	parent has a new type of biological_region with a new featflag
	type=CDS (for example) so we can restore the correct internal
	representation when reading the GFF3 file.

	A new sequence associated qualifier -scircular defines a sequence
	input as a circular molecule where this is not defined in the
	input format, for example EMBL/Genbank and GFF3 have the
	information but FASTA input does not. For feature input there is a
	new -fcircular qualifier. Any circular definition in a sequence
	format overrides this qualifier. Sequences with features are set
	circular if the feature table input is defined as circular.

	GFF3 format has been corrected using the online GFF3
	validator. Protein feature type names are corrected to use the
	current SO term name. Tags are converted to lower case on output
	and back to standard case on input, for example /EC_number in EMBL
	format, as GFF tags must start in lower case.

	In GFF3 protein features now always use '.' for the
	strand. Previous releases could also write '+'. Both are
	acceptable as input.

	GFF3 and GFF2 scores now use a general floating point format to
	write 4 significant figures (rather than 3 decimal places) to cope
	with very large and very small score values. Trailing zeroes
	after the decimal point are omitted in this format. A score of
	zero is written as a dot (missing value).
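
	C's %g conversion already behaves this way: %.4g gives four
	significant figures, drops trailing zeroes, and switches to
	exponent notation for very large or small values. format_gff_score
	is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Write a GFF score into buf: "." for zero (missing value), otherwise
 * up to 4 significant figures without trailing zeroes. */
static void format_gff_score(double score, char *buf, size_t size)
{
    if (score == 0.0)
        snprintf(buf, size, ".");
    else
        snprintf(buf, size, "%.4g", score);
}
```

	For example, 123.456 is written as 123.5 and 1234567 as 1.235e+06.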

	Sequence queries can use two alternative syntaxes for sequence
	ranges.  Appending :start:end allows a syntax similar to DAS
	queries. Appending :start..end allows a syntax similar to
	EMBL/GenBank locations in other entries. Both can be followed by
	:r to reverse the sequence region.
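
	A sketch of parsing both range forms and the optional :r, assuming
	the suffix has already been separated from the database and entry
	name. parse_query_range is a hypothetical helper, not the EMBOSS
	query parser.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse ":start:end" or ":start..end", optionally followed by ":r". */
static bool parse_query_range(const char *suffix, long *start, long *end,
                              bool *reverse)
{
    int used = 0; /* characters consumed, recorded by %n */

    if (sscanf(suffix, ":%ld:%ld%n", start, end, &used) != 2 &&
        sscanf(suffix, ":%ld..%ld%n", start, end, &used) != 2)
        return false;

    const char *rest = suffix + used;
    if (strcmp(rest, "") == 0)
        *reverse = false;
    else if (strcmp(rest, ":r") == 0)
        *reverse = true;
    else
        return false; /* unexpected trailing text */
    return true;
}
```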

	Sequences and reference sequences can be read from EMBL CON
	division entries by using the same database with an ACC (accession
	number) index to read the sequence fragments defined in the CO
	record(s).

	New code added to handle reference sequences in ajrefseq* source
	files. The AjPRefseq object will hold large reference sequence data
	in managed memory buffers.

	Database definitions can use a new attribute "special" to give a
	name=value definition for any attribute specific to one access
	method. The first instances are SpeciesIdentifier for
	ensemblgenomes databases, and tags for processing assembled entries
	in CON (constructed) entries in EMBL. ConDatabase is the database
	name used, ConField is the index field. By default CON entries use
	the ACC field of the same database.

	Standardized all licensing references in the libraries to GNU Lesser
	GPL version 2.1. Added CVS keywords to record the CVS file
	version, and the date and user of the latest commit.

	Microbial genomes in ensemblgenomes have an enumerated species
	code which must be included in a data retrieval request. The
	codes are temporarily added to the comment attribute of the
	databases in the server cache file. This will be replaced by a
	more complete solution in the next release.

	The DRCAT.dat file has a new set of lines to handle Nucleic Acids
	Research classifications. A new NARCat line code is now separately
	parsed by dbxresource into the NAR category name and the URI.

	Long tag values in GFF3 format could exceed limits in the regular
	expression. This is fixed by first testing for and replacing
	escaped quotes and then using a simpler expression to extract
	quoted string values.

	When reading ranges from a file the strings were overwritten by
	the parser.

	Application tcode results disagreed with the original
	publication. The calculation parameters have been corrected.

	EDAM.obo is updated. 28 terms were added. Descriptions were
	updated and names changed.

	Short descriptions of EMBOSS and EMBASSY applications have been
	updated to use consistent terminology and grammar rules.

	Dbxflat failed to parse the organism ('org') field of a GenBank
	entry when another secondary field (keyword or description) was
	also parsed in the same run.

	Dbxflat and dbiflat now use a separate parser for SwissProt format
	data files. Previous releases used the EMBL parser which failed to
	identify the first word in the specially formatted SwissProt
	description records. The change only affects the 'des' index
	field.

	Reading ABI format failed to read the sample name field and
	machine name. The sample name is now correctly parsed. The sample
	name is used by EMBOSS as the sequence identifier.

	Formats specified on the command line were ignored by database
	queries. This behaviour was correct in previous releases, where
	only one format was permitted, but from 6.4.0 a database may have
	multiple possible formats and the requested format must be
	used. Any format defined elsewhere on the command line is now used
	if there is no format in the query string.

	ACD files are stricter in checking ambiguous qualifiers. Options
	that are also a short form of another qualifier now generate
	warnings. These can be turned off with the application attribute
	wrapper: "Y" where a third party command line is wrapped.

	Showfeat had an option -type which was ambiguous. The options are
	changed so that those with a match option (-typematch) have a show
	equivalent (-typeshow) to display the column.

	Emma had options -dend and -slow which were short forms of other
	qualifiers. They are renamed -dendreuse and -slowalign. The old
	qualifier name will now give an "ambiguous qualifier" error
	message and report the new name.

	Eprimer3 and eprimer32 had options -otm and -osize which were
	short forms of other qualifiers, and could cause confusion
	between optimum and oligo values. They are renamed -opttm and
	-optsize.

	Helixturnhelix had an advanced option -sd which was a short form
	of sequence qualifier -sdbname. It is renamed to -sdvalue.

	Prettyplot had an option -box which was a short form of other
	qualifiers. It has been renamed -doboxes to match the related
	qualifier -docolour.

	Showserver had an option -server which was a short form of
	-serverversion (itself named to avoid a clash with -version). This
	option is now renamed -servername.

	Supermatcher and wordfinder had an option -errorfile which was a
	longer form of the standard qualifier -error which can suppress
	the reporting of error messages. The -errorfile qualifiers are
	renamed -errfile.

	Revseq added 'Reversed:' to the sequence description. For use
	cases where the original sequence description is preferred
	(e.g. descriptions formatted for FASTQ format) a new -notag option
	retains the original description.

	Cirdna prints text inside solid blocks invisibly. When labels were
	printed outside the blocks, the text scaling was too small. The
	text scale is now adjusted for the radius and sequence length so
	that labels should be readable outside the box.

	When using a pattern file, fuzznuc, fuzzpro and fuzztran ignored
	the command line -mismatch qualifier for the first pattern. The
	default mismatch is now set to this value at the start of the
	pattern matching loop in the library.

	qatest.pl, which runs the QA tests, now checks for a qatest.dat file
	in the EMBOSS source directory and additional qatest.dat files in
	the test subdirectory for all EMBASSY packages found under the
	source embassy/ directory. By providing individual qatest.dat
	files for each package we can simplify testing for a core
	distribution. Some of the older EMBASSY packages derived from
	domainatrix have cross-dependencies where one test uses the output
	of an application from another package. New AX and AY lines define
	foreign tests which are executed even where a single EMBASSY
	package has been specified with the -embassy=package qualifier on
	the command line.

Version 6.4.0 15-Jul-2011

	DBXFLAT can index FASTQ format short read sequence files, allowing
	individual sequences to be rapidly retrieved by name.

	Genpept format has changed since we last tested it. The LOCUS line
	is simpler. EMBOSS now supports GenPept as documented and
	distributed by NCBI.

	Sequence input in SAM format now ignores the reference sequence
	name. Previous releases saved it as the accession number, but this
	is inappropriate as it is then reported as the identifier in EMBL
	format.

	The -help output (and documentation) for align and report output
	types now includes the default format if defined in the ACD file.

	New code added to handle variation data in ajvar* source
	files. The AjPVar object will hold genetic variation data from the
	Ensembl API and from VCF input files.

	New access methods for URLs have been added as ajurlread.c and for
	URL output methods as ajurlwrite.c - supporting collecting and
	reporting of URLs as output. URLs are saved as an array of strings,
	intended to be reported as a set of links to the underlying data.

	Sequence format "raw" now only reads binary files, which means it
	cannot be used for piped data. The change was needed to avoid
	accepting binary data where a file has a NULL and then no newline,
	for example ABI data files where the initial 'ABIF' could be read
	as a valid sequence.

	Application tcode failed to plot results for more than one
	sequence. It also reported a plplot error when reading random
	non-coding input, and failed to report the threshold lines when
	they were outside the range of observed scores.

	Four new functions combine tables where the keys and values are of
	the same types. In each case the tables are resized to the larger
	of the hash array sizes, and then at each hash array position all
	keys in both tables are compared. The functions differ only in the
	actions taken when a match is or is not found. ajTableMergeAnd
	keeps all keys that are in both tables. ajTableMergeEor is the
	inverse, keeping only keys that are in only one table.
	ajTableMergeNot removes keys that are also in the second
	table. ajTableMergeOr adds keys from the second table that do not
	match. All remaining keys and values are deleted using the tables'
	built-in destructor functions.
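
	As a rough illustration of the four merge modes, the C sketch
	below applies them to plain arrays of unique string keys. It is a
	stand-in under stated assumptions, not the AJAX code: MergeMode
	and mergeKeys are hypothetical names, and a linear membership test
	replaces the per-bucket key comparison described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical merge modes mirroring the four functions above. */
typedef enum { MERGE_AND, MERGE_EOR, MERGE_NOT, MERGE_OR } MergeMode;

/* Return 1 if key occurs in keys[0..n-1]. */
static int hasKey(const char *key, const char *const *keys, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(key, keys[i]) == 0)
            return 1;
    return 0;
}

/* Compute the keys kept after merging table a with table b. Keys in
   each table are assumed unique, as in a hash table. 'out' must hold
   at least na + nb pointers. Returns the number of keys written. */
static size_t mergeKeys(MergeMode mode,
                        const char *const *a, size_t na,
                        const char *const *b, size_t nb,
                        const char **out)
{
    size_t n = 0;

    /* Keys from the first table. */
    for (size_t i = 0; i < na; i++) {
        int inB = hasKey(a[i], b, nb);
        int keep = (mode == MERGE_AND) ? inB
                 : (mode == MERGE_OR)  ? 1
                 : !inB;          /* EOR and NOT drop shared keys */
        if (keep)
            out[n++] = a[i];
    }

    /* Unmatched keys from the second table (OR and EOR only). */
    if (mode == MERGE_OR || mode == MERGE_EOR)
        for (size_t j = 0; j < nb; j++)
            if (!hasKey(b[j], a, na))
                out[n++] = b[j];

    return n;
}
```

	With keys {x, y, z} and {y, w}, AND keeps y, NOT keeps x and z, OR
	adds w to all three, and EOR gives x, z and w.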

	Some data resource catalogue applications failed when run with the
	-debug option. Their debug calls have been updated.

	New application dbtell reports the attributes for a database.

	All messages written to the user are also logged to the debug file
	to help locate where they are generated when debugging.

	Applications showfeat, extractfeat and coderet are updated to
	follow the new feature and subfeature data structures.

	When using a simple numeric database identifier, the SV field is
	only searched if it is defined.

	Access to local SRS databases created an invalid command line for
	getz with a stray '+' character needed only in the web version.

	Nexus format input can now handle a missing taxlabels block by
	using the matrix block to read sequence names.

	GFF3 tag names are automatically converted to lower case unless
	they match a known GFF3 "special" tag name.

	GFF3 format has been rewritten to comply strictly with the GFF3
	standard on the Sequence Ontology website. Characters are now
	escaped in tag values. The 'featflag' tag has been changed to
	convert the hex value into a readable list of flags, with some
	flags now inferred from the content of the GFF line. The GFF3
	special tags (all starting with an upper-case letter) are now
	stored separately. The ID and Parent tags are used in
	post-processing to build subfeatures which are stored under the
	feature with an ID matching their first Parent tag.
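
	A minimal sketch of tag value escaping, assuming the rules in the
	GFF3 specification: percent, control characters and the column 9
	separators ';', '=', '&' and ',' become %XX. The function name
	gff3EscapeValue is hypothetical and the EMBOSS implementation may
	differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Percent-encode one GFF3 column 9 tag value. Escaped: '%', the
   column 9 separators ';', '=', '&' and ',', and control characters
   (including tab, CR and LF). Returns the escaped length, or -1 if
   'out' (outsize bytes) is too small. */
static int gff3EscapeValue(const char *in, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    if (outsize == 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *) in; *p; p++) {
        int esc = *p < 0x20 || *p == 0x7f || strchr("%;=&,", *p) != NULL;
        size_t need = esc ? 3 : 1;

        if (n + need >= outsize)   /* keep room for the final NUL */
            return -1;
        if (esc) {
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0f];
        } else {
            out[n++] = (char) *p;
        }
    }
    out[n] = '\0';
    return (int) n;
}
```

	For example, "a;b=c" is written as "a%3Bb%3Dc". Spaces are left
	unescaped, as the specification allows.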

	GFF3 input requires the optional EMBOSS type comment to identify a
	protein GFF3 file as there is currently no safe way to distinguish
	protein from nucleotide features using only the standard GFF3
	format.

	GFF3 sequence format failed to read files with additional ##
	comment records after the header block. These comments are now
	ignored.

	Feature objects have been extended. A feature may now include a
	list of subfeatures. This is intended to allow exons to be stored
	under the feature to which they belong. With this new structure,
	sorting feature tables becomes easy as there is no need to match
	group tags and sort by ID. Features simply sort by their main
	(parent) feature, with the other subfeatures (exons) unseen by the
	sort algorithm.

	Application restrict crashed when the enzyme list was empty. It
	reported invalid enzyme names, but not 'no enzyme name given'.

	Reference-counted lists are enabled with the constructor
	ajListNewRef creating a reference-counted copy. Lists are only
	deleted when the reference count falls to zero.

	Reference-counted tables are enabled with the constructor
	ajTableNewRef creating a reference-counted copy. Tables are only
	deleted when the reference count falls to zero.
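
	Both constructors follow the usual reference-counting pattern. The
	sketch below shows it with a hypothetical RefBox type standing in
	for a list or table; refBoxNew, refBoxNewRef and refBoxDel are
	illustrative names, not AJAX functions.

```c
#include <stdlib.h>

/* Hypothetical reference-counted container; field names are
   illustrative only. */
typedef struct RefBox {
    int *data;          /* shared payload */
    unsigned refcount;  /* number of live handles */
} RefBox;

static RefBox *refBoxNew(size_t n)
{
    RefBox *box = malloc(sizeof *box);
    if (!box)
        return NULL;
    box->data = calloc(n, sizeof *box->data);
    if (!box->data) {
        free(box);
        return NULL;
    }
    box->refcount = 1;
    return box;
}

/* Like a ...NewRef constructor: the same object, one more reference. */
static RefBox *refBoxNewRef(RefBox *box)
{
    box->refcount++;
    return box;
}

/* Drop one reference and free only when the count reaches zero.
   The caller's handle is always cleared. Returns 1 if freed. */
static int refBoxDel(RefBox **pbox)
{
    RefBox *box = *pbox;
    int freed = 0;

    if (!box)
        return 0;
    if (--box->refcount == 0) {
        free(box->data);
        free(box);
        freed = 1;
    }
    *pbox = NULL;
    return freed;
}
```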

	Table code has been rewritten to automatically delete keys unless
	the table is created with a Const version of the constructor. All
	table constructors are renamed, with the older names retained as
	"deprecated" functions which do not delete keys or values. All
	EMBOSS code has been changed to use the new function names.

	New functions ajTableMatch, ajTableMatchC and ajTableMatchS test
	whether a key is present in a table. They can be used where
	ajTableFetch is inadequate because the value may be NULL. Some
	code used ajTableFetchKey but this is intended only for
	case-insensitive keys.

	Tables (AjPTable) have defined functions to hash and compare
	keys. Two new functions can be defined to delete keys and
	values. By default these are NULL and no keys or values are
	deleted. The functions can be ajMemFree to simply free memory, or
	more complex object destructors. As these require a void** argument
	(all keys and values are void* internally) wrappers are needed
	around object destructors. We recommend appending 'Void' to the
	standard destructor name and casting the void** argument to pass
	to the object-specific destructor.
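
	A sketch of the recommended wrapper, using a hypothetical Obj type
	with a conventional objDel(Obj**) destructor. objDelVoid has the
	void** signature a table destructor needs. It copies the void*
	into an Obj* rather than casting the void** directly; the effect
	is the same on common platforms, without relying on the two
	pointer types sharing a representation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type with a conventional Obj** destructor. */
typedef struct Obj {
    char *name;
} Obj;

static Obj *objNew(const char *name)
{
    size_t len = strlen(name) + 1;
    Obj *obj = malloc(sizeof *obj);
    if (!obj)
        return NULL;
    obj->name = malloc(len);
    if (!obj->name) {
        free(obj);
        return NULL;
    }
    memcpy(obj->name, name, len);
    return obj;
}

/* Object-specific destructor: frees the members and the object, then
   clears the caller's pointer. */
static void objDel(Obj **pobj)
{
    if (!pobj || !*pobj)
        return;
    free((*pobj)->name);
    free(*pobj);
    *pobj = NULL;
}

/* Wrapper with the void** signature a table key or value destructor
   needs, named by appending 'Void' to the destructor name. */
static void objDelVoid(void **pv)
{
    Obj *obj = *pv;   /* convert the stored void* to the real type */
    objDel(&obj);
    *pv = NULL;       /* clear the table's own copy of the pointer */
}
```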

	Tables (AjPTable) can be resized using the ajTableResizeLen
	function. When adding to a table with ajTablePut the table is
	automatically resized when the number of entries exceeds an
	average of 8 per bucket.
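
	The growth rule can be sketched with a hypothetical chained hash
	table of string keys. Table, tablePut and tableResize are
	illustrative names, duplicate keys are not checked, and doubling
	the bucket count is an assumption: the entry above gives only the
	trigger of 8 entries per bucket.

```c
#include <stdlib.h>

/* Hypothetical chained hash table of string keys (no values). */
typedef struct Node {
    const char *key;
    struct Node *next;
} Node;

typedef struct {
    Node **buckets;
    size_t size;   /* number of buckets */
    size_t count;  /* number of keys stored */
} Table;

static size_t hashStr(const char *s, size_t size)
{
    size_t h = 0;
    while (*s)
        h = h * 31u + (unsigned char) *s++;
    return h % size;
}

static int tableInit(Table *t, size_t size)
{
    t->buckets = calloc(size, sizeof *t->buckets);
    t->size = size;
    t->count = 0;
    return t->buckets != NULL;
}

/* Move every node into a new bucket array of 'newsize' buckets. */
static int tableResize(Table *t, size_t newsize)
{
    Node **nb = calloc(newsize, sizeof *nb);
    if (!nb)
        return 0;
    for (size_t i = 0; i < t->size; i++) {
        Node *n = t->buckets[i];
        while (n) {
            Node *next = n->next;
            size_t h = hashStr(n->key, newsize);
            n->next = nb[h];
            nb[h] = n;
            n = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->size = newsize;
    return 1;
}

/* Insert a key, then grow when the average exceeds 8 per bucket. */
static int tablePut(Table *t, const char *key)
{
    Node *n = malloc(sizeof *n);
    if (!n)
        return 0;
    size_t h = hashStr(key, t->size);
    n->key = key;
    n->next = t->buckets[h];
    t->buckets[h] = n;
    t->count++;
    if (t->count > 8 * t->size)
        return tableResize(t, 2 * t->size);
    return 1;
}

static void tableFree(Table *t)
{
    for (size_t i = 0; i < t->size; i++) {
        Node *n = t->buckets[i];
        while (n) {
            Node *next = n->next;
            free(n);
            n = next;
        }
    }
    free(t->buckets);
    t->buckets = NULL;
    t->size = t->count = 0;
}
```

	Starting with 2 buckets, the table holds 16 keys unchanged and
	doubles to 4 buckets when the 17th key is added.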

	Function ajMemFree now accepts a void** argument and sets the
	pointer to NULL after freeing the memory. All EMBOSS code calls
	this through the AJFREE macro, which is now safer to use as the
	pointer appears only once in the generated code.
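
	A sketch of the same idea with the hypothetical names memFree and
	MEMFREE; it mirrors the behaviour described above but is not the
	AJAX source. Because the macro expands its argument only once, an
	argument such as arr[i++] is evaluated exactly once, unlike a
	macro that expands to free(p) followed by p = NULL.

```c
#include <stdlib.h>

/* Free through the address of the caller's pointer so the pointer
   itself can be cleared afterwards. */
static void memFree(void **pp)
{
    if (!pp)
        return;
    free(*pp);    /* free(NULL) is a safe no-op */
    *pp = NULL;
}

/* 'p' appears once in the expansion, so side effects happen once.
   The cast assumes object pointers share void*'s representation,
   which holds on common platforms but is not required by ISO C. */
#define MEMFREE(p) memFree((void **) &(p))
```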

	Application digest conflicted with the name of a utility on some
	systems. It has been renamed to pepdigest.

	In the emboss.standard and emboss.default files certain attributes
	can appear more than once if defined as type "ATTR_LIST" in the
	ajnam.c source file. These include a new attribute 'field:' defined
	once for each database query field, superseding the 'fields:'
	list of field names. The 'field:' attribute has a list of field
	names, with the first being the name preferred by EMBOSS and
	others acceptable on the command line. A '!' delimiter marks the
	end of the field names and the start of a free text description.
	This style of description is also allowed for other attributes,
	including 'taxon:' and the 'edam*:' attributes. The syntax is
	taken from the metadata in OBO format.
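
	A sketch of how such a value might be split, assuming the field
	names are separated by white space. FieldDef and parseFieldAttr
	are hypothetical names, and the example value in the comment is
	invented for illustration rather than copied from emboss.standard.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAXNAMES 8
#define MAXLEN 64

/* Hypothetical parsed form of one 'field:' attribute value. */
typedef struct {
    char names[MAXNAMES][MAXLEN]; /* names[0] is the preferred name */
    size_t nnames;
    char desc[256];               /* text after '!', may be empty */
} FieldDef;

/* Parse a value such as "acc accnumber ! Accession number": names
   are the words before the first '!', the description is the trimmed
   text after it. Long names are truncated. Returns the name count. */
static size_t parseFieldAttr(const char *value, FieldDef *def)
{
    const char *bang = strchr(value, '!');
    const char *end = bang ? bang : value + strlen(value);
    const char *p = value;

    def->nnames = 0;
    def->desc[0] = '\0';

    while (p < end && def->nnames < MAXNAMES) {
        while (p < end && isspace((unsigned char) *p))
            p++;
        const char *start = p;
        while (p < end && !isspace((unsigned char) *p))
            p++;
        size_t len = (size_t) (p - start);
        if (len == 0)
            break;
        if (len >= MAXLEN)
            len = MAXLEN - 1;
        memcpy(def->names[def->nnames], start, len);
        def->names[def->nnames][len] = '\0';
        def->nnames++;
    }

    if (bang) {
        const char *d = bang + 1;
        while (isspace((unsigned char) *d))
            d++;
        size_t len = strlen(d);
        while (len > 0 && isspace((unsigned char) d[len - 1]))
            len--;
        if (len >= sizeof def->desc)
            len = sizeof def->desc - 1;
        memcpy(def->desc, d, len);
        def->desc[len] = '\0';
    }
    return def->nnames;
}
```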

	Data retrieval using the HTTP protocol now checks for redirects in
	the header and replaces the file buffer with the results from the
	new URL. This allows EMBOSS to read outdated URLs for database
	access.

	New trace functions ajTableFetchTrace and ajTablePutTrace help to
	debug adding new keys to a table.

	New parsing function ajStrTokenNextParseDelimiters returns the
	delimiter string in addition to the token parsed from a string
	token handler.

	Application einverted could report a bad alignment if the matched
	region reached the end of the search window. Matches which go
	beyond the search window are now ignored. This bug was reported
	with a very low threshold score and was unlikely to be noticed
	with the default settings.

	Sequence format treecon failed if the only line of input started
	with a number. Failure to find a second record now simply returns
	false.

	Tables can now use integer keys and values of four types - integer
	and long, signed and unsigned. The unsigned longs are used
	internally for emblcd index reading and for b+tree index creation.

	Report output from pattern matching applications (fuzznuc,
	fuzzpro, fuzztran, dreg, preg) now includes the pattern as well
	as the pattern name in the '*pat' or 'Pattern_name' feature tag
	value.

	New applications search the EDAM ontology by each of its query
	fields, with common options to restrict the results to one of the
	7 EDAM namespaces. There are also new applications to look for
	EDAM terms with each of the 5 common relationships for EDAM data
	terms: has_input, has_output, is_identifier_of, is_format_of and
	is_source_of. The sixth relationship, has_attribute, is only used
	by the obsolete 'entity' namespace terms.

	New application dbxresource indexes the data resource catalogue
	DRCAT.dat which is distributed with EMBOSS. Most fields in DRCAT
	are indexed. The EDAM and Taxon fields are used by other
	applications to search the EDAM and TAXON databases for terms which
	are in turn used to select DRCAT entries by taxon, data type,
	format, identifier and resource.

	Any menu (list and selection ACD types) which allows all options
	to be selected now accepts "*" to select everything. This can be
	the default (e.g. for database index fields) or can be specified
	by the user with quotes to protect it from interpretation by the
	Unix shell.

	Tokens indexed with the dbx* programs now have white space indexed
	as underscores. Any index files with spaces in the tokens need to
	be re-indexed. This applies to keyword and organism indexes.

	New code added to handle short read assemblies in ajassem* source
	files. The AjPAssem object will hold large numbers of short reads
	in managed memory buffers.

	New template for adding data types with specific formats for input
	and output and data access methods. These templates are stored in
	ajwxyz* source files with a script newdatatypes.pl to
	automatically create new, properly named, stub functions in the
	AJAX core and ajaxdb libraries.

	Program nthseq now simply reports an error (not a fatal error) if
	too few sequences were read.

	Feature input and output was in one large file. This has now been
	refactored with ajfeatdata.h for the data structures, ajfeatread.c
	for input formats, ajfeatwrite.c for output formats and remaining
	feature object handling code in ajfeat.c.

	New access methods for text have been added as ajtextread.c and
	for text output methods as ajtextwrite.c - supporting text and
	(preserved) HTML and XML output. Text is saved as an array of
	strings, intended to be used as one per input record although
	storing the entire text in the first string is also possible.

	Data queries have been made general. A new AjPQuery object handles
	queries for any datatype, storing a list of field names and
	queries, plus an operator (OR, AND, NOT, EOR, ELSE) for combining
	fields. Previous releases had a hard-coded search for "id or
	accession" which now uses the new query structure. Extensions to
	the query language will allow more complex combinations, and will
	allow any field to be defined for an external data resource
	(e.g. fields for an SRSWWW server).

	All data reading access methods have been restructured. Methods
	that essentially return an open file with the pointer set to the
	start of an entry (which covers most of the original access
	methods) are moved to a new source file ajtextdb.c and use a new
	AjPTextin input object which is included within AjPSeqin for
	sequence input and AjPOboin for OBO term input. These functions
	are generalized for any input data in some text-based file
	format. Sequence access will first check for a text-based access
	method, and then for a sequence-specific method (e.g. ensembl).
	Other input datatypes can do the same. The code for OBO ontology
	terms will use the new text access methods. Code for access to
	other input data types (feature, alignment) will now be relatively
	easy to add. Text retrieval of data from a new list of data
	resources can also use these access methods.

	Program einverted required at least one base between the halves of
	an inverted repeat. Blunt joins are now reported where previous
	versions reported a 2 base gap.

	Error messages from database indexing now include the filename of
	the index file. This is useful when identifying the indexing
	operation where the problem occurred.

	EMBOSS database index files are extended to mark numeric and
	string index pages. In previous releases all were marked as
	strings. Older index files remain valid for sequence retrieval,
	but not for the new dbxreport index analysis application.

	New application dbxreport analyses the contents of an EMBOSS
	index, reporting the numbers of keys of various types, number of
	pages, and percent free space. It also checks that all pages in
	the index have been used and are linked to a higher page.

	New application dbxedam is an extended version of dbxobo which
	also indexes EDAM-specific relationships between terms.

	New application dbxobo indexes OBO format ontology files. Index
	fields are id, acc (alt_id records), name (name and synonym
	records), ns (namespace records), isa (is_a records pointing to
	the parent term) and des (def records).

	EMBOSS database index files include an extra count value
	"fullcount" for the total number of words indexed. The "count"
	value is the number of unique terms (for example, words in
	descriptions or accession numbers).

	EMBOSS database index files include an extra type value "Type"
	with the value "Identifier" for a simple primary identifier such
	as ID or accession, and "Secondary" for an index of secondary terms
	which points to the entry unique ID.

	Database indexing application dbxfasta may corrupt index files with
	long words in the description index. Dbxfasta now checks the
	maximum word length, and as an added safeguard the indexing
	library code also checks and truncates any word longer than the
	maximum.

	New application seqcount returns the number of sequences read.
	This simple application was requested on the EMBOSS mailing list
	to avoid complicated command line manipulations and unnecessary
	sequence output.

	acdpretty now writes lines up to 75 characters wide. The width was
	restricted to 50 to allow space for in-line comments but this
	restricted the length of indented text too severely.

	In emboss.defaults and the user's .embossrc file variables are now
	resolved at read time, including the names of include files. This
	can simplify the configuration files for sites running more than
	one installation.

	Patched: SAM format file entries with negative insert sizes are
	valid but were wrongly rejected.

	Patched: BAM format misread the quality scores. An offset of 33
	used to report values for debugging was incorrectly included in
	the stored values.

	Configuration now uses autoheader and has less dependency
	on the libtool version.

Version 6.3.0 15-Jul-2010
	'ensembl' is a new access method for accessing Ensembl
	from MySQL. Queries take the form:
	   seqret ensembl:human:ENST00000262160
	   seqret ensembl:human:ENST0000026216?
	   seqret ensembl:human:ENSE00001533831
	showing that transcripts, translations and exons are retrievable
	and that partial queries are allowed. Example database
	definitions are given in the emboss.default.template file. Please
	read the note above those definitions regarding fair use of
	the public Ensembl servers.

	'sql' is a new access method for networked SQL servers
	(MySQL or PostgreSQL). The server and database are described
	using the 'url' field. As for biomart (described below) the
	database definition must include definitions of new attributes
	'sequence' (the sequence column) and 'identifier' (the
	column used in the query). Additional columns may be
	returned as description text if they are listed in the 'returns'
	attribute of the DB definition. An example definition is
	given in emboss.default.template.

	tfextract has been updated to deal with multiple pattern lines
	and empty sequence lines.

	Three automatic EMBOSS environment variables are
	added. EMBOSS_INSTALLDIRECTORY is the installation directory
	reported by embossversion -full, EMBOSS_BASEDIRECTORY is the base
	directory reported by embossversion -full, and
	EMBOSS_ROOTDIRECTORY is the root directory reported by
	embossversion -full. These are needed to allow the QA test
	database definitions to point to the test data for the current
	installation, and appear in the test/.embossrc file.

	Validation of EMBL/GenBank feature tables has been updated by
	reading EMBL release 104 (June 2010) and allowing many non-standard
	feature qualifier values that appear in that release.

	Biomart is a new access method for sequence databases. The
	database definition must include definitions of new attributes
	'sequence' (the biomart sequence attribute) and 'identifier' (the
	Biomart identifier attribute). Additional attributes may be
	returned as description text if they are listed in the 'returns'
	attribute of the DB definition. An example definition is
	given in emboss.default.template.

	Database definitions have a new attribute serverversion which is
	used by SRSWWW access to choose the best way to retrieve data.

	SRSWWW database access, for example from the EBI's srs.ebi.ac.uk
	server, had a problem processing queries returning more than 30
	entries. This is now corrected by first asking the server for the
	number of entries and then accessing the data in chunks. This will
	unfortunately slow down SRSWWW access for single entries but was
	the only solution available after checking with EBI's SRS support
	team.

	Infoseq has a new column "organism" which shows the species line
	from an EMBL or UniProt entry. In a future release this may be
	changed to show the standard name for the NCBI taxon identifier
	from an entry as the species definitions for these databases can
	be long with alternative names and possibly additional species.

	Amino acid 280nm extinction coefficients in file Eamino.dat have
	been adjusted to match those of the Expasy 'protparam' tool.
	Pepstats now reports values with cysteine residues reduced and as
	cysteine bridges.

	Database types, originally defined as simply "N" for nucleotide
	and "P" for protein, should now be named in full. The names are
	expanded automatically when reading the definitions in the
	emboss.default and .embossrc files. Expanding the types allows for
	new database types to be added in the near future.

	EMBOSS can now read and write BAM (binary SAM) sequence files to
	extract all sequences and quality scores, for example to write
	them out in FASTQ format. Although BAM data can also be read
	through a pipe as standard input, in this case the format must be
	specified on the command line as it is not currently possible for
	EMBOSS to read a buffered text file as binary data.

	Needle dynamic programming algorithm updated to allow adjacent
	gaps in opposite strands.

	Rabin-Karp multi pattern search algorithm moved into the nucleus
	library. supermatcher application seed finding step updated to use
	Rabin-Karp multi-pattern search.
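
	For reference, a minimal Rabin-Karp search for several patterns of
	equal length in one pass over the text. This is a generic sketch,
	not the nucleus library code: rkMultiSearch is a hypothetical
	name, each window hash is compared against every pattern hash (a
	hash set would scale better for many patterns), and equal hashes
	are confirmed with memcmp to rule out collisions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RK_BASE 256u
#define RK_MOD  1000000007u
#define RK_MAXPATS 64   /* arbitrary limit for this sketch */

/* Hash of the first m bytes of s, modulo RK_MOD. */
static uint64_t rkHash(const char *s, size_t m)
{
    uint64_t h = 0;
    for (size_t i = 0; i < m; i++)
        h = (h * RK_BASE + (unsigned char) s[i]) % RK_MOD;
    return h;
}

/* Search text[0..n-1] for k patterns, all of length m. Each match
   stores its text offset in pos[] and pattern index in pat[], up to
   maxhits entries. Returns the total number of matches. */
static size_t rkMultiSearch(const char *text, size_t n,
                            const char *const *pats, size_t k, size_t m,
                            size_t *pos, size_t *pat, size_t maxhits)
{
    uint64_t phash[RK_MAXPATS];
    size_t hits = 0;

    if (m == 0 || m > n || k == 0 || k > RK_MAXPATS)
        return 0;

    for (size_t j = 0; j < k; j++)
        phash[j] = rkHash(pats[j], m);

    /* RK_BASE^(m-1) mod RK_MOD, used to drop the leading character. */
    uint64_t lead = 1;
    for (size_t i = 1; i < m; i++)
        lead = (lead * RK_BASE) % RK_MOD;

    uint64_t h = rkHash(text, m);
    for (size_t i = 0; ; i++) {
        for (size_t j = 0; j < k; j++) {
            if (h == phash[j] && memcmp(text + i, pats[j], m) == 0) {
                if (hits < maxhits) {
                    pos[hits] = i;
                    pat[hits] = j;
                }
                hits++;
            }
        }
        if (i + m >= n)
            break;
        /* Roll the window: remove text[i], append text[i + m]. */
        h = (h + RK_MOD - (unsigned char) text[i] * lead % RK_MOD) % RK_MOD;
        h = (h * RK_BASE + (unsigned char) text[i + m]) % RK_MOD;
    }
    return hits;
}
```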

	The banded Smith-Waterman algorithm used by the supermatcher and
	wordfinder applications has been revised, fixing a problem with
	occasional inconsistent alignments. Basic SAM format support has
	been added for these two applications and for the wordmatch
	application. Supermatcher treats the second sequence as the
	reference sequence, while wordfinder and wordmatch treat the
	first sequence as the reference sequence.

	The acdvalid application now reads the EDAM (EMBRACE Data and
	Methods) ontology to validate EDAM references in relations
	attributes. All applications are expected to have at least one
	topic and at least one operation term. Other qualifiers can have
	any number of data terms.

	New source file ajtax.c provides parsing and validation for the
	NCBI taxonomy in its .dmp file form. The parser reads all taxonomy
	data into memory. This takes up too much space for practical use,
	so is only intended for subsets. The parser will be reused to
	develop indexing applications to provide fast lookup of taxon
	identifiers.

	New source file ajobo.c provides parsing and validation for OBO
	format ontology files. The parser includes strict warnings
	according to the OBO format documentation, but these can be turned
	off as in many cases the OBO foundry ontologies do not follow the
	exact standard. Examples include terms not in sorted order,
	Typedef stanzas following Term stanzas, and dbxrefs to
	non-existent terms (e.g. GO:ma in the gene ontology to cite a
	curator).

	Support for PDF and SVG graphic file output has been added. SVG
	requires no additional libraries. PDF support requires the libhpdf
	library (which, somewhat confusingly, is provided by the libharu
	project). EMBOSS will attempt to find the library and development
	files automatically and add PDF support (or not) appropriately.
	However, if libhpdf is in a non-standard place, a --with-hpdf=DIR
	configuration switch can be optionally used.

	The output of showalign has changed. The reference sequence now
	appears at the top, if selected. The ticks and sequence position
	numbering are relative to a selected reference sequence. Gaps
	within the reference appear as '.' and are not counted in
	numbering. End gaps appear as '.' with 'V' and 'v' as the major
	and minor tick marks, and numbering from -1 before the start and
	from +1 after the end of the reference. The additional copy of the
	consensus is no longer reported.

	When reading ABI trace files the quality scores can now be
	read. They are undefined in ABI files, but assumed to be phred
	scores. ABI files can have two sequences and sets of quality
	scores. The first is from the instrument base calling. The second
	is from a second base caller. Where two sets are found, EMBOSS now
	reads the second set.

	Application nospace has a new -menu option to trim all, trailing, or
	excess whitespace.

	Output type outfileall is obsolete (it is essentially an outfile)
	and has been deleted. No application was using it.

	Input type filelist (comma-delimited list of filenames) now trims
	excess whitespace from the beginning and end of each filename.

	Command line qualifiers with an '=' but no value now have a
	value of an empty string. Previous releases set the value to "=".

	The file extension for directory, dirlist and outdir ACD datatypes
	is now a qualifier. This allows it to be defined as a default in
	the ACD file but also substituted by the user. An empty string
	means 'ignore the extension'. To specify 'no extension' a single
	space can be used as the value.

	On the command line, for a parameter (with no qualifier name
	given) a single dot was used as a missing value in previous
	versions. This causes problems when specifying the current
	directory as a dot. On the command line an empty (missing) value
	must now be an empty quoted string '' or "".

	Ampersands in application descriptions have been removed. They
	confuse HTML versions of documentation.

	The QA test script qatest.pl has new options -simple to turn off
	messages when running with a local test file, and -with to cancel
	-without options.

	Output redirected to a file can now use ajSysExecOutname functions
	to pass the filename to be used for standard output and possibly
	standard error. The filename is most usefully picked up from a new
	function ajAcdGetOutfileName which closes an ACD outfile and
	returns the name of the file. The file will be empty if simply
	opened, or will have existing contents if the append attribute is
	true in the ACD file.

	The output from tfscan is now in report format, replacing the
	undefined text file produced in previous releases.

	Where a new string is created by ajStrAssignS (the standard string
	copy functions) the reserved space for the string is enough to
	hold the current string value. In past releases the reserved
	memory was the same size as the reserved memory of the string
	being copied. This wasted memory where a large string had a short
	value, especially when copying records read from a buffered input
	file.
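
	A sketch of the copy rule with a hypothetical Str type holding a
	length and a reserved size, loosely modelled on the description
	above. strCopyFit reserves len + 1 bytes instead of copying the
	source's reservation; the real code may round the size up.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string with separate length and reserved size. */
typedef struct {
    char *ptr;
    size_t len;  /* characters in use */
    size_t res;  /* bytes reserved, including the terminating NUL */
} Str;

/* Copy src into a new string sized for its current value only, so a
   short value held in a large buffer is not copied at full size. */
static Str *strCopyFit(const Str *src)
{
    Str *dst = malloc(sizeof *dst);
    if (!dst)
        return NULL;
    dst->res = src->len + 1;
    dst->ptr = malloc(dst->res);
    if (!dst->ptr) {
        free(dst);
        return NULL;
    }
    memcpy(dst->ptr, src->ptr, src->len + 1);
    dst->len = src->len;
    return dst;
}
```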

	Sequence input formats now turn off buffering of input once they
	can no longer fail (for example, FASTA format after the header
	record will read everything until it finds another header).

	Make ajaxdb code IPv6 compliant. Remove gethostbyname config
	check.

	pcre, expat & zlib include files now install to separate
	subdirectories.

	Showfeat failed to sort features with 'join' locations. The
	sorting is corrected. A future internal change will improve
	feature sorting in all cases.

	Restriction mapping applications now process bad enzyme input
	files without crashing.

	PNG graphics output had an unwanted blank margin that did not
	appear in other output formats. This is now turned off through
	plplot.

	Prettyplot formatting is corrected to improve the centring of
	characters within boxes.

	Restriction mapping applications no longer have an upper limit on
	the number of cuts.

	Warning messages for EMBL format sequences created by ENSEMBL
	have been turned off.

	Corrected references to the EMBL/GenBank feature table
	documentation in ACD files and web pages.

	embossversion now reports the setting of debug options, and
	corrects variable name warnrange to acdwarnrange.

	Any numeric ACD type (integer, float, range or array) with
	calculated values for the minimum or maximum attributes can
	potentially have an impossible range (maximum less than minimum)
	at run time. ACD processing now discovers these calculated values,
	and requires a definition for a new attribute 'failrange'. If this
	is defined as true, a 'failmessage' attribute must also be defined to
	explain why the values are invalid (e.g. input sequence too short
	for the algorithm). If 'failrange' is false, a value for another
	new attribute 'trueminimum' must be set to define which of the
	minimum or maximum values is to be used as the only accepted
	value.

	PNG graphics output had a plplot-defined margin limiting the
	available plot space. This is now removed, allowing applications
	such as prettyplot more space to display results.

	Resource attribute identifier: is obsoleted. No code used it. It
	is no longer allowed in resource definitions.

	Database attributes identifier: description: and command: are
	obsoleted. No code used them. They are no longer allowed in
	database definitions.

Version 6.2.0 15-Jan-2010

	Fixed GFF2 and GFF3 feature formats to always have the start
	position less than the end position for features on the '-'
	strand.

	Updated sequence format refseqp to handle features for proteins in
	the latest release of refseq protein.gpff files.

	A new function ajDebugTest can be used to turn on/off specific
	debug calls. The only argument is a quoted string. A file
	.debugtest in the current directory or the user's home directory
	is read. This contains a list of tokens to be debugged, so
	ajDebugTest returns true if any of these tokens is passed in.
	Optionally, the name in .debugtest can be followed by a number
	which is the maximum number of times that token will be reported.
	ajDebugTest is intended for developers who use ajDebug calls that
	may be expensive or be excessively called.

	Some attributes in ACD files may appear more than once. These
	include any relations: attribute (now being populated with
	references to the new EDAM ontology), the groups attribute for
	applications, the (currently unused) keywords attribute for
	applications, and the external attribute for applications.

	Any external application must now be defined in the ACD file with
	an external: attribute in the application section. The string
	value has the name of the application as the first word, followed
	by a message to be printed if it is not found. When the ACD file
	is parsed, before any user prompts, the external applications are
	searched for by first looking for an environment variable
	EMBOSS_appname and then checking for an executable file in the
	current directory or in the path.
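
	The search order can be sketched in POSIX C. findExternalApp is a
	hypothetical name, and building the variable name as EMBOSS_
	followed by the application name unchanged is an assumption about
	its exact case. For brevity the sketch neither rejects directories
	that pass the executability test nor checks for truncated paths.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Look for an external program: first the environment variable
   EMBOSS_<appname> (assumed spelling), then an executable in the
   current directory, then each directory on PATH. Writes the path to
   'out' and returns 1 if found, 0 otherwise. */
static int findExternalApp(const char *appname, char *out, size_t outsize)
{
    char var[256];
    const char *val;

    snprintf(var, sizeof var, "EMBOSS_%s", appname);
    val = getenv(var);
    if (val && *val) {
        snprintf(out, outsize, "%s", val);
        return 1;
    }

    if (access(appname, X_OK) == 0) {
        snprintf(out, outsize, "./%s", appname);
        return 1;
    }

    const char *path = getenv("PATH");
    while (path && *path) {
        const char *colon = strchr(path, ':');
        size_t dirlen = colon ? (size_t) (colon - path) : strlen(path);
        if (dirlen > 0) {   /* empty entries: cwd already checked */
            snprintf(out, outsize, "%.*s/%s", (int) dirlen, path, appname);
            if (access(out, X_OK) == 0)
                return 1;
        }
        path = colon ? colon + 1 : NULL;
    }
    return 0;
}
```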

	All applications should be launched by using the name returned by
	the new ajAcdGetPathC or ajAcdGetPathS functions. This ensures the
	application has been found in ACD processing and any
	EMBOSS_appname variable has been tested.

	The acdvalid utility now tests for duplicate attributes.

	Format specifiers for strings and characters (%S, %s and %c) now
	have two flags: U (e.g. %US) for upper case and L for lower case
	output.

	The configure.in and main package Makefile.am files handle
	--enable-devwarnings differently. For the imported libraries this
	level of warning message is turned off. Messages are still
	generated for warnings from the main EMBOSS libraries and
	applications.

	The QA testing script qatest.pl has new options -nocheck to skip
	"make check" applications and -noembassy to skip EMBASSY packages.

	Extractfeat failed to accept all features by default.

	Extractfeat failed on reverse direction nucleotide features.

	Coderet miscounted non-coding sequences in the output table.

	Graphics devices now have improved and additional checks. 'tek'
	was rejected as an ambiguous match. 'das' is only valid for an
	xygraph - one based on sequence positions. On Windows (using
	mEMBOSS) the plplot version supports fewer devices and these are
	now excluded from selection.

	The change to graphics library access makes the ajGraphInit call
	which registered graphics functions for use by ACD parsing
	redundant. In its place we need to register data access
	functions. As all applications make use of this, we now include
	this automatically in embInit so there is no longer a need for
	applications to make a separate call before invoking
	code (e.g. ACD parsing) that may require registration of
	functions.

	The AJAX ACD code is now in a separate library. New core library
	functions store and retrieve ACD persistent data such as the
	program name, command line and list of inputs. As ACD is now
	linked separately from core AJAX and the graphics library, the
	callback mechanism for ajGraph functions to be called from ACD is
	no longer needed.

	The database access code in ajseqdb.c has been moved to a separate
	higher level library. This is where we will insert code to access
	the new ensembl library functions in AJAX, and possible future
	data access libraries. A callback mechanism is used so that the
	embInit call automatically registers data access methods to make
	them available within the core library functions that read
	sequences. This allows ajSeqRead to remain in the core library
	while calling database access methods that in turn may invoke
	ensembl access.
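	The sketch below is a rough illustration of the callback idea. The
	core library keeps a table of function pointers, and the start-up
	call registers the access methods. The reader then calls through the
	table without linking directly to the higher-level library. All
	names (seqAccessRegister, seqReadFrom, demoInit and the method
	signature) are hypothetical and are not the AJAX API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical access method: copies the entry for id into buf and
   returns 1 on success, 0 on failure. */
typedef int (*SeqAccessFn)(const char *id, char *buf, size_t buflen);

#define MAX_METHODS 8

/* Registry held by the core library: it knows only function pointers,
   not the higher-level library that supplies them. */
static struct {
    const char *name;
    SeqAccessFn fn;
} methodTable[MAX_METHODS];
static int methodCount = 0;

/* Called at start-up (the role the text gives to embInit). */
int seqAccessRegister(const char *name, SeqAccessFn fn)
{
    if (methodCount >= MAX_METHODS)
        return 0;
    methodTable[methodCount].name = name;
    methodTable[methodCount].fn = fn;
    methodCount++;
    return 1;
}

/* Core reader: finds the method by name and calls through the pointer. */
int seqReadFrom(const char *method, const char *id, char *buf, size_t buflen)
{
    int i;
    for (i = 0; i < methodCount; i++)
        if (strcmp(methodTable[i].name, method) == 0)
            return methodTable[i].fn(id, buf, buflen);
    return 0; /* method not registered */
}

/* Example method as a higher-level library might provide it:
   it simply echoes the identifier back. */
static int demoAccess(const char *id, char *buf, size_t buflen)
{
    if (buflen == 0)
        return 0;
    strncpy(buf, id, buflen - 1);
    buf[buflen - 1] = '\0';
    return 1;
}

/* Hypothetical start-up routine registering the demo method. */
void demoInit(void)
{
    seqAccessRegister("demo", demoAccess);
}
```

	Unregistered method names fail cleanly, so the core reader does not
	need to know which access libraries were linked.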

	The PCRE (perl-compatible regular expressions) code in AJAX has
	been updated to release 7.9 of PCRE. Previous releases were still
	at version 4.3. The code is standard PCRE code with the LINK_SIZE
	set to 4 bytes to allow matches in long sequences.

	ACD files include relations attributes with text taken from terms
	in the EMBRACE EDAM ontology. These terms are also described in
	the knowntypes.standard file and are matched to the known types
	when validating ACD files.

	EMBOSS now uses a more complete User-Agent string when
	communicating with HTTP servers.

	FASTQ short read sequence formats now read and write faster using
	lookup tables to avoid calculations in the conversion of quality
	scores.
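	The lookup-table idea can be shown with a minimal sketch. It assumes
	Sanger encoding (quality character = phred score + 33) and uses
	hypothetical function names. The tables EMBOSS actually builds are
	not described here. The table is filled once. After that, decoding
	needs one array access per character, and out-of-range characters
	are flagged as -1.

```c
#include <limits.h>

/* Maps every byte value to a Sanger phred score (offset 33);
   -1 marks characters outside '!'..'~'. */
static int sangerTable[UCHAR_MAX + 1];

void sangerTableInit(void)
{
    int c;
    for (c = 0; c <= UCHAR_MAX; c++)
        sangerTable[c] = (c >= '!' && c <= '~') ? c - 33 : -1;
}

/* Decodes len quality characters with one table lookup each.
   Returns the number of invalid characters found. */
int sangerDecode(const char *qual, int *scores, int len)
{
    int i, bad = 0;
    for (i = 0; i < len; i++) {
        scores[i] = sangerTable[(unsigned char) qual[i]];
        if (scores[i] < 0)
            bad++;
    }
    return bad;
}
```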

	FASTQ short read formats have additional warning messages for bad
	or incomplete data.

	All sequence input formats now recognize invalid partial entries
	at the end of the input data and report an error message. A
	notable exception is FASTA format where a partial entry is still a
	valid ID line - these will give errors for zero length sequence
	unless empty sequences are allowed.

	Common output formats now write faster, using lightweight output
	functions to copy strings to the output file.

	SwissProt output formats now wrap long OS lines.

	Needle now supports end gap penalties, allowing complete global
	pairwise alignments. Three new options have been added: -endopen
	and -endextend specify the gap opening and extension penalties for
	end gaps, and -endweight turns weighting of end gaps on or off.
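	The function below is a hypothetical scorer for an existing
	alignment; it does not perform the alignment itself. It shows how
	the three end gap options interact with the internal gap penalties.
	It assumes a gap of length L costs open + (L - 1) * extend. It also
	uses a toy +1/-1 match score in place of a substitution matrix. Gaps
	touching either end of the alignment are charged the end penalties
	only when endweight is set.

```c
#include <string.h>

/* Assumed convention: open + (len - 1) * extend. */
static double gapCost(int len, double open, double extend)
{
    return open + (len - 1) * extend;
}

/* Scores two equal-length aligned strings ('-' = gap) with a toy
   +1 match / -1 mismatch scheme. Gap runs touching either end are
   end gaps: charged endopen/endextend if endweight is non-zero,
   free otherwise. Internal gaps use gapopen/gapextend. */
double scoreAlignment(const char *a, const char *b,
                      double gapopen, double gapextend,
                      double endopen, double endextend, int endweight)
{
    int n = (int) strlen(a);
    const char *rows[2];
    double score = 0.0;
    int i, r;

    rows[0] = a;
    rows[1] = b;

    for (i = 0; i < n; i++)
        if (a[i] != '-' && b[i] != '-')
            score += (a[i] == b[i]) ? 1.0 : -1.0;

    for (r = 0; r < 2; r++) {
        const char *s = rows[r];
        i = 0;
        while (i < n) {
            int start, len;
            if (s[i] != '-') {
                i++;
                continue;
            }
            start = i;
            while (i < n && s[i] == '-')
                i++;
            len = i - start;
            if (start == 0 || i == n) {          /* end gap */
                if (endweight)
                    score -= gapCost(len, endopen, endextend);
            } else {                             /* internal gap */
                score -= gapCost(len, gapopen, gapextend);
            }
        }
    }
    return score;
}
```

	Consider the alignment "--ACGT-A" against "TTACGTCA" with penalties
	of 10.0 and 0.5 for both internal and end gaps. It scores -15.5 with
	end gap weighting on and -5.0 with it off, because the leading
	two-residue gap is then free.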

	New application needleall for all against all global/overlap
	pairwise alignment of sequences in two multi-sequence files.

	Wordmatch has been updated for multi-sequence files, using a
	modified version of the Rabin-Karp algorithm for multi-pattern
	search. A log file with statistical information on pattern matches
	has also been added. The updated wordmatch can, for example,
	efficiently find multiple patterns in large FASTQ files.
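	The sketch below is the textbook form of Rabin-Karp for a set of
	patterns that share one length (in wordmatch, the word size). The
	modifications made in EMBOSS are not described here, so this is not
	its code. A rolling hash moves along the text one byte at a time.
	Each hash hit is confirmed with memcmp. For brevity the pattern
	hashes are scanned linearly; a larger set would use a hash table.

```c
#include <stdint.h>
#include <string.h>

#define RK_BASE 257u
#define RK_MAXPAT 64

/* Hash of the k bytes at s; arithmetic wraps modulo 2^64. */
static uint64_t rkHash(const char *s, size_t k)
{
    uint64_t h = 0;
    size_t i;
    for (i = 0; i < k; i++)
        h = h * RK_BASE + (unsigned char) s[i];
    return h;
}

/* Counts occurrences (overlaps included) of any of npat patterns,
   each exactly k bytes long, in text. */
size_t rkCountMatches(const char *text, const char **pats,
                      size_t npat, size_t k)
{
    size_t n = strlen(text);
    size_t i, p, count = 0;
    uint64_t phash[RK_MAXPAT];
    uint64_t topPow = 1;   /* RK_BASE^(k-1): weight of the byte leaving */
    uint64_t h;

    if (k == 0 || n < k || npat == 0 || npat > RK_MAXPAT)
        return 0;
    for (p = 0; p < npat; p++)
        phash[p] = rkHash(pats[p], k);
    for (i = 1; i < k; i++)
        topPow *= RK_BASE;

    h = rkHash(text, k);
    for (i = 0; ; i++) {
        for (p = 0; p < npat; p++)
            if (h == phash[p] && memcmp(text + i, pats[p], k) == 0)
                count++;
        if (i + k >= n)
            break;
        /* roll the window: drop text[i], append text[i + k] */
        h = (h - (unsigned char) text[i] * topPow) * RK_BASE
            + (unsigned char) text[i + k];
    }
    return count;
}
```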

	Application documentation has a new HTML table format for the
	command line options. This is excluded from the text
	documentation, where the format of the help output is improved.

	Function names have been standardized in ajcod.c, ajrange.c,
	ajtranslate.c, ajgraph.c and ajhist.c, and a few other functions
	have been renamed. The old names continue to work as "deprecated"
	functions, although these will generate warning messages with the
	gcc compiler.

	Infoseq option -version is renamed -seqversion to avoid a clash
	with the new global -version qualifier.

	Three new "make check" applications entrailshtml, entrailsbook and
	entrailswiki generate tables of internal data in HTML, DocBook or
	WikiText formats. These are intended to update the website, books
	and Wiki with the latest internal details. The -tables qualifier
	specifies one or more tables to be printed. By default, all tables
	are produced. The book tables are sorted in format name order.

	Alignment output now includes headers only for EMBOSS-specific
	formats. The headers have been dropped from the FASTA MARKX0
	through MARKX10 formats so that standard FASTA suite parsers can
	read the EMBOSS versions of these outputs.

	Fastq-solexa sequence formats converted phred scores of 1 to
	Solexa scores of -6. They now convert to the limit of -5.

	Fastq-sanger sequence format incorrectly stopped when the quality
	scores started with a '@' (phred quality 31).
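	One way to avoid this problem is sketched below. Quality lines are
	read until their total length matches the sequence length, rather
	than treating any line that starts with '@' as a new header. The
	function name is hypothetical, and the sketch reads an in-memory
	array of lines rather than a file. It is not necessarily how EMBOSS
	fixed the bug.

```c
#include <string.h>

/* Parses one FASTQ record from lines[*pos..nlines-1] (lines without
   newlines). Sequence lines are read up to the '+' separator, then
   quality lines until their total length equals the sequence length,
   so a quality line beginning with '@' (phred 31) is not taken as the
   next header. Returns 1 and advances *pos on success, 0 otherwise. */
int fastqParse(const char **lines, int nlines, int *pos,
               char *seq, char *qual, size_t bufsize)
{
    int i = *pos;
    size_t slen = 0, qlen = 0;

    if (bufsize == 0 || i >= nlines || lines[i][0] != '@')
        return 0;
    i++;
    seq[0] = qual[0] = '\0';

    while (i < nlines && lines[i][0] != '+') {
        size_t len = strlen(lines[i]);
        if (slen + len >= bufsize)
            return 0;
        memcpy(seq + slen, lines[i], len + 1);
        slen += len;
        i++;
    }
    if (i >= nlines)
        return 0;                 /* missing '+' separator */
    i++;

    while (qlen < slen && i < nlines) {
        size_t len = strlen(lines[i]);
        if (qlen + len >= bufsize)
            return 0;
        memcpy(qual + qlen, lines[i], len + 1);
        qlen += len;
        i++;
    }
    if (qlen != slen)
        return 0;                 /* incomplete or overlong quality */
    *pos = i;
    return 1;
}
```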

	Intelligenetics sequence format now correctly ignores additional
	carriage control characters.

	Genbank-like protein formats (genpept and refseqp) failed when
	reading more than one sequence. The input is now buffered when
	the format is automatically reassigned to a related parser.

	The -help output now includes the one-line documentation string
	from the ACD file and the version number information reported by
	--version.

	All applications have a -version (or --version) qualifier which
	will report the EMBOSS version number. For EMBASSY applications it
	will also report the EMBASSY package version number as
	"PACKAGE:version". All EMBASSY applications need to call embInitP
	with an additional parameter, VERSION, which is defined
	automatically by the configure.in template. If the "versionnumber"
	attribute is defined in the ACD file, this will also be reported as
	the application version "programname:version".

	The ACD application attribute "version:" is renamed
	"versionnumber:" to avoid a name clash with the new -version
	qualifier. We need to use the qualifier name "-version" for
	compatibility with other systems and applications, so the renaming
	of the attribute is unavoidable. We believe it was only used (as
	originally intended) for the definition of external applications
	by SoapLab.

Version 6.1.0 15-Jul-2009

	New application showpep displays protein sequences. Showseq is now
	limited to nucleotide sequences. Many of the showseq options are
	not appropriate for proteins. Showpep makes the remaining showseq
	options available.

	A new data structure AjPSeqXref holds details of cross-references
	between a sequence object and any other data resource. The
	cross-reference attributes include a type to indicate the source
	of the cross-reference, for example XREF_DR for a reference in a
	DR line from EMBL or Swiss-Prot. The other attributes are the
	database name and up to 4 identifiers (as in the Swiss-Prot DR
	line definition) and a start and end position where the source is
	a feature table entry.
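	The sketch below shows the rough shape of such a record in C, with a
	small parser for the body of a DR line. The field and type names are
	hypothetical; only XREF_DR comes from the text above. It is not the
	actual AjPSeqXref definition, and the sample DR line in the usage
	check is illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical source types; only XREF_DR is named in the text. */
typedef enum { XREF_UNKNOWN, XREF_DR, XREF_TAX } XrefType;

/* Illustrative record: database name, up to four identifiers as on
   a Swiss-Prot DR line, and a feature range (0 if not a feature). */
typedef struct {
    XrefType type;
    char db[32];
    char id[4][32];
    int nid;
    int start;
    int end;
} SeqXref;

/* Fills x from a DR line body such as "PDB; 1A00; X-ray; 2.0 A."
   Returns the number of identifiers stored. */
int xrefFromDr(SeqXref *x, const char *body)
{
    char buf[256];
    char *tok;
    size_t len;
    int field = 0;

    memset(x, 0, sizeof *x);
    x->type = XREF_DR;
    strncpy(buf, body, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '.')
        buf[len - 1] = '\0';            /* DR lines end with '.' */

    for (tok = strtok(buf, ";"); tok != NULL; tok = strtok(NULL, ";")) {
        while (*tok == ' ')
            tok++;                      /* skip space after ';' */
        if (field == 0)
            snprintf(x->db, sizeof x->db, "%s", tok);
        else if (x->nid < 4)
            snprintf(x->id[x->nid++], sizeof x->id[0], "%s", tok);
        field++;
    }
    return x->nid;
}
```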

	When reading a sequence with an identifiable species, attempts are
	made to define the NCBI taxonomy identifier for the
	species. Possible sources include the OX line in Swiss-Prot, the
	taxon cross-reference in the EMBL/GenBank/DDBJ feature table
	(available only if the feature table is read) and the species name
	which can be matched to a set of common species obtained from
	NCBI.

	Swissprot entry descriptions in FASTA output no longer have a
	trailing '.'. Where the source entry has the new Swiss-Prot DE
	line format the name is built from the recommended full name with
	other names in round brackets.

	Binary files now consistently have null characters after strings
	to pad them to full length. Previous versions wrote whatever
	followed the NULL in the string object. The resulting files now
	look cleaner although any extra characters were always ignored
	when reading dbi index files.

	Test databases were updated on 24th June 2009.

	Blank lines are ignored before any sequence input. This is to
	support the use of seqret to read data pasted into web forms where
	extra blank lines are often accidentally included.

	FASTQ is now a valid sequence format and can be detected
	automatically. "fastq" format ignores all quality scores as there
	is no automatic and safe way to determine whether scores are for
	Sanger/phred or Illumina/Solexa quality. To read the quality
	scores we support formats "fastq-sanger" and "fastq-illumina". We
	also support "fastq-int" to read quality scores as integers. These
	scores are assumed to be Sanger quality. For Illumina quality
	scores out of range, a warning message is written once for each
	sequence. Sanger scores do not have out of range values as they
	allow the full set of quality characters, although high values
	(over 40) should only appear for contig consensus sequences.

	MEGA format has been rewritten to support the file format used by
	MEGA 4. Title can be in mixed case. Format and Gene/domain command
	lines are processed. Multiple gene/domain files are read by EMBOSS
	as separate alignment sets by seqretsetall. This may change in a
	future release as MEGA4 processes them as one alignment with
	annotated gene regions. While EMBOSS has no annotation specific to
	alignments this is a reasonable compromise.

	embossdata will now always return directory listings alphabetically.

	A new ACD function replaces an attribute value with an EMBOSS or
	environment variable. The attribute syntax is (@value:VARNAME).

	Infile datatypes in ACD have a new attribute directory: which
	defines the default directory to be searched. If the user
	specifies an explicit path the directory attribute is ignored.

	Applications writing out multiple sets of sequences now correctly
	reset the sequence output. This only affected one test application
	in EMBOSS 6.0.1 (input type seqsetall and output type seqoutall).

	Applications that use single letter qualifier names (for example
	the HMMERNEW wrappers for HMMER applications) can be confused if a
	single letter qualifier name uniquely matches an associated
	qualifier for a preceding command line qualifier. An additional
	check now ensures that a unique qualifier (for example -o) is
	correctly recognized.

	Global alignments with needle in rare cases missed the optimal
	alignment of the first 2 residues. This was a bug introduced in
	6.0.0.

	When reading data using a launched application, including the SRS
	access method which launches "getz", closing the input without
	reading to the end caused the file close function to loop
	forever. Examples included nthseq and seqret -firstonly both of
	which stop reading when they have reached the nth or first
	sequence. File closing now only waits if the input has reached end
	of file, and has a timeout on the wait to break out of the loop.
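	The sketch below shows a bounded wait in POSIX terms. It polls
	waitpid with WNOHANG once a second and gives up after a limit. The
	function name and the one-second poll interval are assumptions. A
	real implementation might also need to signal a child that is still
	running when the timeout expires.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Waits up to maxsecs seconds for child pid to exit, polling once a
   second. Returns 1 if the child was reaped, 0 on timeout (the child
   is left running), -1 if waitpid reports an error. */
int waitWithTimeout(pid_t pid, unsigned maxsecs)
{
    unsigned waited = 0;
    int status;

    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 1;
        if (r < 0)
            return -1;
        if (waited >= maxsecs)
            return 0;
        sleep(1);
        waited++;
    }
}
```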

	Intelligenetics format sequence files with more than one sequence
	are now read correctly. Where the sequence ends with a number,
	intelligenetics format sequences can now be automatically
	detected.

	Added a -methylation option to restrict, restover, remap and
	showseq to simulate, for example, dam/dcm restriction enzyme
	knockouts.

	remap now correctly reports restriction enzymes cutting a
	greater number of times than an optionally-supplied maximum
	value. The primary function of the application was unaffected.

	showfeat has a new option -joinfeatures to display all exons on
	one line for a join feature location. In previous releases this
	was one of the -sort options. It is now possible to use
	-joinfeatures and to select a sort order.

	Installing without X11 (using the --without-x option for
	./configure) used "x11" as the default graphics device in some
	applications. These now use "png" (if available) or "ps".

	needle and water with the -nobrief option repeated the report
	header information on the longest and shortest similarities and
	identities because the previous header content was not cleared. This only
	affected results where there was more than one sequence as the
	second input.

	In the EMBL/GenBank feature table the group() and one_of()
	operators are obsolete. They are automatically converted to
	order().

	The command line syntax using the master qualifier name as a
	suffix (for example -sreverse_asequence) ignored the master
	qualifier name and set values for all matching inputs. This syntax
	is intended as a way for wrappers to better control the use of
	associated qualifiers, as it is cleaner than using a numeric
	suffix (-sreverse1 -sreverse2 etc.).

	Using -sreverse on the command line could reverse protein
	sequences for inputs that can read more than one sequence (seqall,
	seqaset, seqsetall). -sreverse is now only set for nucleotide
	sequence inputs. Single sequence inputs correctly ignored the
	-sreverse value.

	Multiple sequence sets can be read as input type seqsetall, but
	when this input was used for a single sequence set input (type
	seqset) all sequence sets were read. seqset input now stops after
	the first set (for example a PHYLIP or MSF alignment).

	Genbank test data had incorrect format. The data was extracted
	from a set of test GCG databases and had spaces in the feature
	locations.

	extractfeat now uses the new feature fetch functions and can
	retrieve features that include joins across entries.

	Feature parsing functions are added to fetch sequences from other
	entries. These depend on reusing the USA of the original sequence,
	with the identifier of the external sequence inserted in place
	of the original. This is known to work for database references and
	flat files.

	coderet was limited to EMBL/GenBank feature tables. It now
	processes any valid feature input including GFF files. The
	previous parsing functions are obsolete and have been removed
	as coderet was the only application calling them.

	Very large pairwise alignments can fail to back trace through the
	alignment because of rounding error. The alignment and traceback
	functions now use double precision to maintain accuracy.

	pepwindow and pepwindowall missed the plot value for the last
	window in the sequence.
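	For n residues and a window of length w there are n - w + 1 windows.
	The hypothetical helper below makes the boundary explicit. Its loop
	condition i + w <= n includes the last window; a test such as
	i + w < n is one way an implementation could miss it. The text does
	not say what the original error was.

```c
#include <stddef.h>

/* Stores the mean of each length-w window over vals[0..n-1] in out
   (which must hold n - w + 1 values) and returns the window count. */
size_t windowMeans(const double *vals, size_t n, size_t w, double *out)
{
    size_t i, j, count = 0;

    if (w == 0 || w > n)
        return 0;
    for (i = 0; i + w <= n; i++) {      /* includes the final window */
        double sum = 0.0;
        for (j = i; j < i + w; j++)
            sum += vals[j];
        out[count++] = sum / (double) w;
    }
    return count;
}
```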

	pepwindow and pepwindowall now process sequence ranges -sbegin and
	-send.

	pepwindow and pepwindowall now default to a window length of 19,
	ideal for transmembrane regions. The old default of 7 was short
	and gave noisy results.

	pepwindow and pepwindowall have an extra option -normalize to
	convert the amino acid data in the datafile to mean 0.0 and
	standard deviation 1.0. The default Kyte-Doolittle data is not
	normalized.
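	The sketch below shows the normalization. It assumes the population
	standard deviation (dividing by n); the text does not say which form
	pepwindow uses. The Newton square root only keeps the example
	independent of libm, and real code would call sqrt from <math.h>.
	normalizeScale is a hypothetical name.

```c
#include <stddef.h>

/* Newton's iteration for sqrt(x), started above the root. */
static double sqrtNewton(double x)
{
    double r;
    int i;
    if (x <= 0.0)
        return 0.0;
    r = (x > 1.0) ? x : 1.0;
    for (i = 0; i < 200; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Rescales v[0..n-1] in place to mean 0.0 and (population) standard
   deviation 1.0. Returns 0 if n is 0 or all values are equal. */
int normalizeScale(double *v, size_t n)
{
    double mean = 0.0, var = 0.0, sd;
    size_t i;

    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        mean += v[i];
    mean /= (double) n;
    for (i = 0; i < n; i++)
        var += (v[i] - mean) * (v[i] - mean);
    sd = sqrtNewton(var / (double) n);
    if (sd == 0.0)
        return 0;
    for (i = 0; i < n; i++)
        v[i] = (v[i] - mean) / sd;
    return 1;
}
```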

	The EMBL/Genbank feature table definitions have been updated to
	version 8.0 (October 2008). Sequence ontology terms are now
	available for all feature types except S_region for which no
	specific SO term exists. S_region is attached to an internal term
	derived from SO:0000301 as a placeholder.

	Programs searching with regular expressions and patterns reported
	the pattern name with '1' added to the end. This was to support
	pattern and regular expression files with multiple patterns. When
	only one pattern is given on the command line the '1' is no longer
	added.

	Programs searching with regular expressions (dreg and preg) missed
	overlapping matches to the pattern. The algorithm now steps
	forward one character from the start of the match and searches
	again. Some regular expressions with wildcards may produce a large
	number of overlapping matches especially in low-complexity regions.
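	The stepping strategy can be sketched with the POSIX <regex.h>
	interface rather than the PCRE code EMBOSS uses. After a match
	starting at offset s, the search restarts at s + 1. REG_NOTBOL stops
	'^' from matching at a restart point. findOverlapping is a
	hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/* Finds all matches of an extended regular expression in seq,
   including overlapping ones, by restarting one character after each
   match start. Stores up to maxhits start offsets in starts and
   returns the total number of matches, or -1 if the pattern is bad. */
int findOverlapping(const char *seq, const char *pattern,
                    size_t *starts, int maxhits)
{
    regex_t re;
    regmatch_t m;
    size_t offset = 0;
    int nhits = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (seq[offset] != '\0' &&
           regexec(&re, seq + offset, 1, &m,
                   offset ? REG_NOTBOL : 0) == 0) {
        size_t s = offset + (size_t) m.rm_so;
        if (nhits < maxhits)
            starts[nhits] = s;
        nhits++;
        offset = s + 1;          /* step one past the match start */
    }
    regfree(&re);
    return nhits;
}
```

	For example, the pattern AA in AAAA gives matches starting at
	offsets 0, 1 and 2. A search that resumed after the end of each
	match would report only 0 and 2.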

	Protein sequences in GFF format now use GFF3 by default. For
	release 6.0.0 protein sequences were written in GFF2 while the
	GFF3 protein feature definitions were redefined using the Sequence
	Ontology. This process is now completed.

	When a sequence is reversed by revseq the description is tagged
	with "Reversed: " so that the output and any sequence derived from
	it has a note of the history.

	EMBL and GenBank formats when used to read multiple entries failed
	to reset the list of citations. Although the first set of
	citations was reported correctly, all other entries in the same
	run included the citation list from the first entry.

	SwissProt/UniProt entries now preserve the complete entry content
	when read and rewritten. All feature types are preserved and
	feature lines wrap according to the widths in UniProt 14.8. Date
	lines are stored and written. Comments are stored in blocks.
	Database cross-references are stored in a list. The description
	lines are saved in the new SwissProt structure. Tests on a set of
	complex entries confirm that EMBOSS is able to read and write an
	exact copy of this sample set.

	Protein feature keys now use the Sequence Ontology identifiers
	as internal names. This may change the way some feature keys are
	converted between data formats. Protein feature keys have been
	updated to correct some conversions, for example to distinguish
	between "coiled coil" from pepcoil and "random coil" from garnier
	output.

	Fitch sequence format was only able to read a single
	sequence. EMBOSS can now read 'fitch' as a multiple sequence
	format.

	Extractfeat now cleanly processes minscore and maxscore as limits
	on the score. By default any score is allowed if these are
	unchanged. Previous releases required minimum and maximum to be
	equal - or minimum greater than maximum - to permit any feature
	score.

	New feature XML output format DASGFF. Feature output functions
	have a changed interface to pass the AjPFeattabOut object so that
	additional processing can handle the opening and closing of an XML
	output file.

	New sequence output formats "dasdna" and "das" write DASDNA and
	DASSEQUENCE XML outputs. Sequence output functions have a new
	capability to define a Cleanup function to write the final lines
	of an XML output file. The AjPSeqout data structure already has
	the Count attribute needed to identify the first sequence so that
	the XML header can be written.

	New environment variable EMBOSS_ACDFILENAME provides an
	alternative way to set the default output filename for EMBOSS
	applications. If set to true, the filename is used rather than the
	current behaviour of using the first sequence name as the default
	filename. When the filename is used the case of the name is
	preserved.

	Corrected display of exon ranges in showseq. Exons now display in
	their original frame (all were displayed in frame 1 in earlier
	versions). Display of 3-letter amino acid names corrected (but we
	hope nobody is using 3-letter codes any more!)

	Added create attribute for outdir datatype in ACD. If true, the
	output directory will be created if it does not already exist.
	The default is false: output directories must already exist, as
	in previous releases.

	Added attribute aligned for datatype seqoutall in ACD
	files. Applications can write multiple sequences as a seqoutset
	(aligned or unaligned) and can also write seqoutall - writing
	sequences one at a time without first storing them as a set.

	For phylogenetic applications (PHYLIPNEW) reading distance matrix
	files failed for some formats written by other
	applications. Distance matrix input now works for multiple
	matrices in square, upper-triangular and lower-triangular formats.

	The PLPLOT graphics library uses 4 environment variables to allow
	local configuration. EMBOSS uses a local copy in libeplplot. For
	sites that have the native PLPLOT also in use we have renamed the
	environment variables to use the prefix EPLPLOT. This protects
	EMBOSS from any configuration set only for the local plplot.
	The variables are: EPLPLOT_BIN EPLPLOT_LIB EPLPLOT_TCL and
	EPLPLOT_HOME. Versions of EMBOSS up to 2.8.0 defined PLPLOT_LIB
	but this value is now automatically set and the environment
	variable is no longer needed.

	Command line qualifiers are renamed where the first 5 characters
	are the same. These were:
	    eprimer3 major revision of all options
	    est2genome -splice to -usesplice
	    prettyplot -boxcolval to -boxuse
	    octanol -*plot to -plot*
	    showfeat -match* to -*match; -source to -origin
	    showpep -match* to -*match
	    showseq -match* to -*match; -source to -origin
	    vectorstrip -vectorfile to -readfile; -linker* to -*linker
	and similar changes for EMBASSY applications.

	ACD processing now objects if two or more qualifiers are not
	unique in the first 6 characters. In a future release we would
	like to reduce this to a 5 character unique name. Several EMBASSY
	applications need to be modified to comply with this requirement.

	MEMENEW updated for meme/mast version 4.0.0. ememe now
	produces fasta, html, text, xml and xsl outputs. A new variant,
	ememetext, produces only the text and fasta outputs.

	DBX index file key deletion code added for ID/ACC/SV/KW/DE/TX
	indexes.

	HTTP access now adds a User-Agent string with the EMBOSS version
	number so that servers can count the number of EMBOSS requests.

	PDB model structures failed to generate a new name for each
	model. Duplicate sequence names are not ideal. The model number
	(from the MODEL record) is now appended to each sequence name in
	"pdb" and "pdbnuc" format. The "pdbseq" and "pdbnucseq" formats
	read a single copy of each sequence from the SEQRES records.

	Added two new PDB formats to read nucleotide data. These are named
	"pdbnuc" and "pdbnucseq". They are not available by default, to
	avoid the problem of reading both protein and nucleotide sequence
	data from a structure file for an oligonucleotide binding protein.

	Alignment outputs now include most of the multiple sequence
	alignment formats that EMBOSS can write. The functions for these
	are trivial to write. New functions can be added to use any
	existing sequence output format for alignments.

	PDB entries can be read in two ways, with two named
	formats. Sequence format "pdb" reads the ATOM records. Sequence
	format "pdbseq" reads the SEQRES records. By default, only "pdb"
	format was used, and could crash on entries where the ATOM records
	were missing. Both formats now fail silently if no sequences are
	found. By default, "pdb" format is used first, and if that fails
	"pdbseq" will be tried.

	The EMBOSS logfile (defined by variable EMBOSS_LOGFILE) now
	reports two extra values: the number of CPU seconds and the
	elapsed time in seconds.
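	Standard C provides both values: clock() gives processor time, and
	time() with difftime() gives wall-clock time. A minimal sketch
	follows, using a hypothetical RunTimer type. How EMBOSS formats the
	logfile line is not shown here.

```c
#include <time.h>

/* Start points for CPU and wall-clock measurement. */
typedef struct {
    clock_t cpuStart;
    time_t wallStart;
} RunTimer;

void runTimerStart(RunTimer *t)
{
    t->cpuStart = clock();
    t->wallStart = time(NULL);
}

/* Returns CPU seconds and elapsed seconds since runTimerStart. */
void runTimerReport(const RunTimer *t, double *cpu, double *elapsed)
{
    *cpu = (double) (clock() - t->cpuStart) / CLOCKS_PER_SEC;
    *elapsed = difftime(time(NULL), t->wallStart);
}
```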

	Extra stop codons in getorf for ORFs ending close to the end of
	the input sequence no longer appear.

	For optional qualifiers (defined as "nullok" in the ACD file) the
	command line option -no(qualname) was causing output files to
	appear by resetting the value to an empty string, which in turn
	was converted to the default filename. Now -no(qualname) turns off
	any output file defined with nullok, and -(qualname) "" asks for
	an output file that is off by default and uses the default
	filename for it.

	Report output has a new tail format that reports the total
	sequences and total sequence length read by the applications. The
	previous "Total_sequences" report was the number of sequences
	included in the report. This is renamed to "Reported_sequences".
	Where the number of hits was limited by the -rmaxseq or -rmaxall
	options, the number of unreported hits also appears. If the
	rmaxall limit was exceeded, the report tail ends with
	"Maxhits_stop: Y". If the -rmaxseq limit is exceeded, the sequence
	report includes (as before) "HitLimit: max/total".

	Refseq protein and Genpept now use a modified genbank format to
	avoid warnings for "aa" replacing "bp" on the LOCUS line and to
	provide better control over any other differences between
	nucleotide and protein entries. Genbank format automatically calls
	refseqp format if a LOCUS line has "aa".

	Swissprot output was missing a '.' at the end of the organism line.

	vectorstrip failed if the user did not provide a filename for
	the -vectorsfile option and did not specify -novectorfile to
	turn off file reading. The ACD file is changed so a vectorsfile is
	required if -vectorfile is true and a check is put into the code
	to catch the problem if the ACD interface changes in future.

	Allow user-defined -carboxyl parameter for iep.

	jaspscan now allows multiple sequences to be scanned.

Version 6.0.0 15-Jul-2008

	New application aligncopy reads a set of aligned sequences and
	prints a report in one of the standard alignment formats that can
	accept the same number of sequences. Pairwise alignment formats
	can only be used if the input has exactly two sequences.

	New application aligncopypair reads a set of aligned sequences and
	prints a report for each pair of aligned sequences in one of the
	standard alignment formats.

	New application featreport reads a sequence and a feature table,
	and writes a report in any of the standard report formats.

	New application featcopy reads and writes a feature table to
	convert feature formats.

	New applications maskambignuc and maskambigprot replace ambiguity
	characters in nucleotide sequences with 'N' and in protein
	sequences with 'X'.

	New application consambig reports an alignment consensus sequence
	using ambiguity characters. The intended use cases are sequencing
	reads and SNP reporting.

	New application sizeseq sorts sequences in ascending or descending
	order of length. This is a port of the application seqsort from
	the domsearch EMBASSY package.

	New application skipredundant uses pairwise sequence matches to
	exclude sequences that are similar from an input set. This is a
	modified version of the application seqnr from the domsearch
	EMBASSY package.

	New applications provide utility functions for former GCG users:
	nohtml removes HTML tags, notab replaces tabs with spaces,
	nospace removes all whitespace from a file, skipspace removes
	extra whitespace from a file.

	Older EMBOSS applications can now generate a warning message
	stating that they are marked as 'obsolete' with an explanation and
	an indication of alternative programs in EMBOSS or in an EMBASSY
	package. This warning can be turned off by defining environment
	variable EMBOSS_WARNOBSOLETE with a value of "N" or by defining
	the same variable in the emboss.defaults or ~/.embossrc files. We
	will begin to mark applications as 'obsolete' in future releases.

	A new EMBASSY package "myembossdemo" contains the demonstration
	applications demoalign, demofeatures, demolist, demoreport,
	demosequence, demostring, demostringnew and demotable that
	illustrate how to use EMBOSS data types in your own
	applications. The myembossdemo package allows novice developers to
	try simple EMBOSS programming. The myemboss package is available
	for adding your own applications. The demo applications are no
	longer distributed with the main EMBOSS package. They were not
	installed and were only built with the "make check" option.

	Application short descriptions have been revised. The maximum
	length of application one-line descriptions is increased from 60
	to 70 characters, making the descriptions easier to write. Output
	from wossname can now be 90 characters wide. Interfaces that use
	the description in menus may need to allow some extra space.

	Function names in ajfile.c have been standardized. Old names are
	still accepted but are marked as "deprecated" and will generate
	warnings with the gcc compiler (see ajstr below). Other compilers
	will see no difference. New source files ajfiledata.c and
	ajfileio.c have been added. The buffered file data structures are
	renamed internally to be more consistent (AjPFileBuff to AjPFilebuff).

	notseq was unable to search for IDs containing '|' characters.
	notseq uses string matching (not regular expressions), and '|'
	characters are valid in NCBI-style FASTA files if read with the
	"pearson" format, which accepts the whole ID string without parsing.

	The sequence alignment code has been updated. Sequence alignments
	with low gap penalties failed to allow two gaps (one in each
	sequence) without a match in between. The embAlign functions are
	now simplified. Scores are returned by the PathCalc functions. The
	Walk functions that walk through the path and return the aligned
	sequences are faster and need fewer parameters. Profile alignments
	occasionally duplicated residues in the sequence around gap
	positions. Fast alignments around a limited width include
	additional residues at each end and require an offset rather than
	separate start positions. The offset is the difference between the
	two start positions used in 5.0.0 and earlier releases.

	Eprimer3 citations are corrected in the help text (from the ACD
	file) and in the documentation. The citation errors were traced to
	the original primer3_core documentation which has now been
	corrected.

	Wordmatch could confuse overlapping matches. It occasionally
	extended the wrong match and missed a corresponding new match.

	Seqmatchall results were correct with the default output
	format which reports match positions, but gave incorrect results
	with some other local alignment formats that include the sequence.
	Seqmatchall now stores alignments in the same way as other local
	alignment applications, and the alignment internals are corrected
	to ensure other applications will not have the same problem.

	Emma was officially supporting clustalw 1.83. Issues with clustalw
	2.0 are now resolved and this version is supported if clustalw2 is
	installed. Emma executes an application called clustalw (not
	clustalw2) so version 2.0 must be installed under this name or an
	environment variable EMBOSS_CLUSTALW needs to be defined to point
	to the executable clustalw2 file.

	Sequence format "selex" allows invalid sequence data files to be
	accepted as input. Selex format is still available but is no
	longer included in the formats that can be automatically
	detected. When reading selex format data, users need to put
	"-sformat selex" on the command line, or specify "selex::" at the
	front of the USA. See the HMMER (old version EMBASSY package)
	documentation for examples. HMMERNEW (recommended) examples use
	Stockholm format and so are unchanged.

	Program dbxfasta now defaults to a filename of "*.fasta".
	The previous default "*.dat" is not commonly used for FASTA format
	databases.

	Program msbar block mutations were 1 longer than the specified
	block and may crash if the block size was fixed (minimum and
	maximum block sizes the same). This off-by-one error is now
	corrected.

	In GenBank output format, multiple line KEYWORD sections were not
	formatted correctly.

	ACD list and select values (the menus that appear in the user
	prompt) can now have ACD variables. Although useful for local
	application development these are not used in EMBOSS distributed
	ACD files because the variables are difficult for web and GUI
	interfaces to resolve when presenting the menu text.

	List and Table internal data structures are now cached so that
	creating and deleting temporary lists and tables is more efficient.

	In emboss.default database definitions the filename and exclude
	values can be delimited by spaces, commas or semicolons. Previous
	releases used only spaces. Parsing is now consistent with the
	fields definition which allowed all the above characters.
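	The splitting rule can be sketched with strtok, which treats any run
	of spaces, commas or semicolons as a single separator. splitDbList is
	a hypothetical name, and the sample value in the usage check is made
	up.

```c
#include <string.h>

/* Splits a filename or exclude value in place on spaces, commas or
   semicolons. Stores up to maxparts token pointers in parts and
   returns the number stored. */
int splitDbList(char *value, char **parts, int maxparts)
{
    int n = 0;
    char *tok;

    for (tok = strtok(value, " ,;"); tok != NULL && n < maxparts;
         tok = strtok(NULL, " ,;"))
        parts[n++] = tok;
    return n;
}
```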

	Protein sequences with pyrrolysine ('O') had 'O' converted to a
	gap because this was a gap character in early versions of
	Phylip. This was patched in 5.0.0 to allow 'O' in UniProt release
	13. The gap character is upper case only, so 'o' was correctly
	read as pyrrolysine.

	Wordfinder used the same descriptions for two pairs of qualifiers.
	The descriptions are changed to make their meaning clear in
	commandline help and in web interfaces.

	New function ajTimeDiff returns the difference in seconds between
	two time values.

	Profiling tests showed that file reading and string handling can
	be made faster. String handling called functions many levels
	deep. Making this code inline and using macro versions improved
	performance for applications (e.g. database indexing) that use
	many string calls. File input requires each input line to be
	copied. Using copy-by-reference (ajStrAssignRef) often makes this
	more efficient. Existing macros now test for undefined strings:
	MAJSTRGETLEN, MAJSTRGETPTR, MAJSTRGETRES and MAJSTRGETUSE. New
	macros are added for string handling: MAJSTRDEL,
	MAJSTRGETUNIQUESTR, MAJSTRCMPC and MAJSTRCMPS.
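	NULL-tolerant macros let callers skip the test for an undefined
	string before each access, and the macro form avoids a function
	call. The illustration below is cut down and uses a hypothetical
	DemoStr type. The real AjPStr structure and macro definitions
	differ. Like any such macros, these evaluate their argument more
	than once.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down string object; not the AJAX AjPStr layout. */
typedef struct {
    char *Ptr;
    size_t Len;
} DemoStr;

/* Accessors that treat a NULL (undefined) string as empty. */
#define DEMOSTRGETLEN(s) ((s) ? (s)->Len : (size_t) 0)
#define DEMOSTRGETPTR(s) ((s) ? (s)->Ptr : "")

/* Comparison with a C string, also safe for NULL strings. */
#define DEMOSTRCMPC(s, c) strcmp(DEMOSTRGETPTR(s), (c))
```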

	Memory management includes new macros AJCRESIZE0 and AJRESIZE0,
	which provide resize functions that guarantee new memory is set to
	zero. The functions must be given the original allocated size.
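	The original size is needed because realloc does not clear the bytes
	it adds, so the caller must say where the old block ended. The sketch
	below illustrates this with reallocZero, a hypothetical function that
	is not the AJAX macro.

```c
#include <stdlib.h>
#include <string.h>

/* Resizes p to newsize bytes and zeroes every byte beyond oldsize.
   newsize must be non-zero. On failure returns NULL and p is still
   valid, as with realloc. */
void *reallocZero(void *p, size_t oldsize, size_t newsize)
{
    char *q;

    if (newsize == 0)
        return NULL;                /* not handled in this sketch */
    q = realloc(p, newsize);
    if (q == NULL)
        return NULL;
    if (newsize > oldsize)
        memset(q + oldsize, 0, newsize - oldsize);
    return q;
}
```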

	Using the GNU C run-time library, calls to mcheck and mprobe are
	available to test for memory corruption by examining the bytes
	before and after an address allocated by malloc. This can be
	turned on for any application, including Unix commands, with the
	environment variable MALLOC_CHECK_, which takes the values 0, 1,
	2 or 3: 0 ignores errors, 1 writes to standard error when a
	problem is found, 2 aborts the program, and 3 does both. No
	recompilation is needed for this simple method. EMBOSS now has a
	./configure option --enable-mprobe which enables two new
	functions. ajMemProbe, passed an address from malloc (AJNEW0,
	AJCNEW0, etc.), tests the bytes before and after it and reports
	any errors. The advantage of using ajMemProbe rather than mprobe
	is that the macro MAJMEMPROBE also reports the file and line
	number where it was called. To avoid large numbers of messages
	(when code has problems) a limit can be set with
	ajMemCheckSetLimit, after which the program will exit. Note that
	--enable-mprobe is incompatible with using valgrind to test for
	memory leaks, as mprobe and mcheck have to look at illegal bytes
	before and after allocated memory blocks. Memory checking is
	turned on by a call to mcheck, passing the function ajMemCheck,
	in ajnam.c before the first memory allocation. If any program
	calls malloc before calling embInit or embInitP, this call will
	fail and issue a warning (if compiled with --enable-mprobe). A
	special call ajStrProbe tests any string with mprobe. Special
	calls ajListProbe and ajListProbeData test lists and their
	contents. For more details see http://www.gnu.org/software/libc/manual/

	Protein sequences from the Staden package were read as nucleotide
	because they were missing information on the ID line to identify
	EMBL or SWISSPROT format. The sequences are now tested and
	correctly typed.

	Wordcount now accepts protein sequences as input. Previous
	releases only allowed nucleotide sequences.

	Wordfinder options had the same information prompt. These have
	been changed from "limit" to "minimum" and "maximum" to make their
	function clear.

	Prompting for values from the user now includes a test for
	standard input in use as an input file. If standard input is open,
	the default response is accepted and a message is written to the
	user. This is to avoid problems with command lines that use
	"stdin" as an input and do not include -auto.

	The acdpretty utility can now preserve comments in ACD files.
	Comments are maintained in blocks with blank lines before and
	after. Inline comments are started in column 50 unless they are
	exceptionally long. Comments themselves have white space cleaned
	up but otherwise are not reformatted.

	A new function ajAcdGetValueDefault is added to return the default
	value of an ACD qualifier. This can be combined with
	ajAcdIsUserdefined in wrappers to test for values changed by the
	user.

	Infile qualifiers in ACD have a new attribute "trydefault" which
	allows the default filename to fail. Any filename provided by the
	user has to exist. This was added to support the behaviour of the
	MIRA EMBASSY package. To allow an infile to fail, the attribute
	"nullok" must also be set to "Y".

	Applications which produce an output file or graphics often
	created an empty output file when the plot was selected.
	The ACD files have been corrected to only create the file if it
	will be written to. Applications changed are charge, dan,
	freak, hmoment, iep and tcode.

	Whichdb only writes to its output file if -get is false.
	With -get it creates sequences. The outfile is no longer created
	when whichdb is in -get mode.

	String functions are corrected so that Case in the name always
	means case-insensitive, working by converting to upper case. Some
	functions were defined the wrong way, with "Case" for the
	case-sensitive form.
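	A minimal sketch of the naming rule, using a hypothetical
	function str_match_case (not an EMBOSS call): the "Case" version
	ignores case by upper-casing both characters before comparing.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical "Case" comparison: case-insensitive, implemented by
 * converting each character to upper case before comparing. */
bool str_match_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;            /* both strings must end together */
}
```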

	GFF3 format is now the default feature output.

	A new function ajFeatIsCds identifies protein coding nucleotide
	features (CDS) using the SO identifier. A new function
	ajFeattagIsNote identifies feature tags that are for the default
	feature tag.

	Protein features now use the new Sequence Ontology terms defined
	by BioSapiens. These are not yet accepted by GFF3 validators. The
	new SO identifiers are added to protein feature definitions and
	used internally.

	Feature format definitions (the Efeatures and Etags files)
	now allow #include references to other files. This allows a
	standard EMBL and Swissprot feature table definition to be
	included by the internal and GFF definitions. Redefinitions are
	allowed using + and - prefixes to add and remove tags for existing
	feature types.

	GFF3 format feature (and report) output is added.

	A new application "density" has been added. This reports the
	A+C+G+T and AT+GC densities of nucleic acid sequences within
	an adjustable sliding window. Plots of A+C+G+T or AT+GC are
	optionally produced.

	Molecular weight programs (e.g. digest, mowse) now have a
	-mono switch to allow use of monoisotopic weights.
	By default, average molecular weights are used.

	The Eamino.dat format has changed. Molecular weight information
	has been removed and put in its own Emolwt.dat file. This latter
	now allows specification of average and monoisotopic weights. Values
	for hydrogen and oxygen are specified as well as the amino acid weights.

	The library representation of amino acid property information
	has been changed. The EmbPropTable global table has been
	removed and replaced with EmbPPropAmino and EmbPPropMolwt objects.

	Pepcoil now produces a report (replacing a text output) in "motif"
	format. The default is changed to not report non coiled-coil
	regions as they are hard to distinguish in this format.

	The "motif" report format is extended to allow two score positions
	marked with "*" and "+" and labelled internally as "pos" and
	"pos2". No application uses pos2 (it was added for pepcoil, but
	both score maximum positions are always the same).

	A new function ajAcdIsUserdefined allows wrappers to test which
	qualifiers have values changed by the user so that they can use
	shorter command lines to launch the wrapped application.

	jaspscan application added. Scans sequences for transcription
	factors using the JASPAR matrices.

	jaspextract application added to move the JASPAR matrices into the
	EMBOSS data area subdirectories.

	Alignment format "trace", used to display internal data content, is
	renamed to "debug" to be consistent with other formats. A "debug"
	format is added for feature output.

	Application documentation has been updated to remove obsolete
	references to EMBL database identifiers. These are replaced with
	the correct accession numbers.

	Two new entries have been added to the "tembl" test EMBL database
	for use in the QA tests.

	Report output now checks the sequence and feature table type. If
	the sequence is not a valid protein, protein-only formats (pir,
	swiss) will fail with an error message. Similarly, if the sequence
	is not a valid nucleotide sequence then nucleotide-only formats
	(embl, genbank) will fail with an error message.

	Garnier now uses the correct SwissProt and internal feature keys
	for protein secondary structure. The results appear much better,
	for example, as a SwissProt feature table. This required
	rewriting the internals by recoding the secondary structure
	features with a "garnier" tag replacing the previous "helix",
	"sheet", "turns" and "coil" tags. The default output is
	unchanged. The results in other report formats will be changed.

	Silent no longer reports the "Dir" column. This is replaced by the
	new "Strand" column which reports "+" for a forward feature and
	"-" for a reverse feature.

	The following programs have changed default report output, with
	the strand included for nucleotide sequences: equicktandem,
	etandem, fuzznuc, fuzztran, recoder, restrict, silent, tcode,
	twofeat. The strand column can be removed with the new command line
	associated qualifier -norstrandshow.

	Reports for nucleotide sequences have confusing ways to represent
	the start and end positions for features on the complementary
	strand. A strand column has been added to these reports,
	controlled by a new -rstrandshow qualifier and attribute. By
	default the strand is shown for all nucleotide reports (see a list
	of changed program outputs above). The start position is always
	lower than the end position for features on the complementary
	strand, indicating the region that should be reversed. In past
	releases the seqtable report format (fuzznuc, dreg, dan)
	confusingly reversed start and end positions to indicate the
	unreported strand. For all report formats (nametable, table) the
	start and end positions are now consistent with nucleotide feature
	formats (gff, embl, genbank).

	Reports from dreg incorrectly reported sequences reversed with the
	-sreverse qualifier.

	Report headers now include the text "(Reversed)" when the input
	sequence(s) are reverse complemented.

	Phylogenetic trees in newick format are now parsed into internal
	trees and converted back for use by Phylip. This allows us to
	read other tree formats and pass them to Phylip (e.g. Nexus).

	Some ACD data types did not allow the input to be NULL because
	extra tests were carried out on the results. These are all cleaned
	up and tested so that they can safely be set to nullok and missing
	in local applications.

	New sequence reading formats for PDB files. By default the ATOM
	records are used (format "pdb"). An alternative format "pdbseq"
	will read the SEQRES records which give the original sequence. The
	ATOM records give the sequence determined from the structure.

	Improved the help text for the -stdout and -filter options to
	explain that output files are written to standard output. Some
	users expected graphics output (from plplot) to be controlled by
	these options as well.

Version 5.0.0 15-jul-2007

	Extractalign is a new application to extract regions from a
	sequence alignment in the same way extractseq extracts regions
	from single sequences.

	The MRS server in Nijmegen changed its syntax just before our
	release. A new database access method "MRS3" supports the main
	MRS3 server. We have very little documentation on the changed URL
	query syntax. Access by ID appears to work at this stage. The
	database URL is defined as http://mrs.cmbi.ru.nl/mrs-3/plain.do
	The plain text output is now defined in the URL. The database
	names have all changed on the server. At present the same server
	appears to still support the old MRS access method with the URL
	http://mrs.cmbi.ru.nl/mrs/cgi-bin/mrs.cgi

	ACD parsing now allows square brackets within quoted strings.

	Functions for lists and tables have been renamed to new standard
	naming conventions. Some source files remain to be standardized
	after the release, most importantly ajfile, ajfeat and some
	remaining ajseq source files.

	Warning messages are available for sequence formats that do not
	allow additional characters. The environment variable
	EMBOSS_SEQWARN needs to be set to "Y" to enable warnings. For
	example, EMBL format allows numbers in the sequence records. Fasta
	and related formats now warn for any characters that are not
	whitespace and not known sequence characters. These warnings are
	controlled by an environment variable so they can be disabled (or
	enabled) for specific installations and/or wrappers. We expect
	many cut-and-paste inputs to generate warnings. EMBOSS will
	normally silently remove non-sequence characters.

	Regular expression pattern file names (for dreg and preg) were
	converted to upper case if the ACD file required the patterns to
	be upper case.

	The EMBOSS commandline now accepts gnu-style syntax with
	--qualifier (we allow one or two '-' characters). Users who tried
	this syntax were confused because EMBOSS treated --qualifier as a
	parameter. In many cases it was used as the output filename, which
	would give no error message but make it hard to find the output.

	Antigenic now accepts any protein sequence as input (earlier
	versions did not allow ambiguity codes). B and Z are treated as
	weighted averages of D/N and E/Q. All others are converted to X
	and treated as a weighted average of all values. The data table
	used has no information for selenocysteine or pyrrolysine.

	Dottup is corrected to plot only the selected sequence range. The
	plot lines were 1 residue too long (only noticeable on very short
	sequences).

	Distance matrix data can now read multiple distance matrices from
	a single input file. This is used by three programs (fneighbor,
	ffitch and fkitsch) in the phylipnew EMBASSY package.

	Discrete states input now correctly defaults to all non-space
	characters if no characters attribute is given in the ACD file.
	This was the intention, but two programs (fpars and fdiscboot)
	were instead accepting only 0 and 1. Other phylip programs have
	their discrete state character set specified in the ACD file.

	A new function ajSystemOut calls a system command, and redirects
	standard output to a named file.

	Function names are standardized for the ajsys, ajtime and ajutil
	functions.

	New function ajStrTableFreeKey frees only the key from tables
	where the value is a constant.

	Error messages from reading badly formatted comparison matrix
	files are improved to report the line and the token that failed
	to parse.

	Test data has been updated. EMBL and SwissProt entries are updated
	to the latest versions of these entries. Swnew entries are now a
	selection from the SpTrEmbl subset in UniProt. The wormpep
	database is obsolete. We do not have current data for the gb
	directory which contained GCG reformatted genbank entries.

	NBRF (or PIR) format failed to read some entries from SRSWWW
	servers because the sequence ID does not match if the protein is a
	fragment.

	Efficiency of building large strings is greatly improved by
	doubling the reserved space each time the end is reached. This
	speeds up the reading of all long sequences.
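	The doubling strategy can be sketched as below. GrowStr and
	growstr_append are hypothetical names, not the AjPStr internals,
	and the starting reservation of 16 bytes is an arbitrary
	assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: len characters used, res reserved. */
typedef struct {
    char  *buf;
    size_t len;
    size_t res;
} GrowStr;

/* Append n bytes, doubling the reserved size whenever it runs out.
 * Returns 0 on success, -1 if memory could not be allocated. */
int growstr_append(GrowStr *s, const char *text, size_t n)
{
    size_t need = s->len + n + 1;          /* +1 for the terminator */

    if (need > s->res) {
        size_t res = s->res ? s->res : 16;
        while (res < need)
            res *= 2;                      /* geometric growth */
        char *p = realloc(s->buf, res);
        if (p == NULL)
            return -1;
        s->buf = p;
        s->res = res;
    }
    memcpy(s->buf + s->len, text, n);
    s->len += n;
    s->buf[s->len] = '\0';
    return 0;
}
```

	Starting from GrowStr s = {NULL, 0, 0}, appending a long sequence
	line by line triggers only a logarithmic number of reallocations,
	so the total copying stays proportional to the final length
	instead of growing quadratically as it can with a fixed increment.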

	String function ajStrFmtWrap to wrap strings for output now
	respects newlines in the original string. A new function
	ajStrFmtWrapAt prefers to wrap at a selected character, for
	example ',' for author lists.

	Sequence objects are extended to include the full set of fields
	defined in EMBL, Genbank and UniProt database entries. The "embl"
	"genbank" and "swissprot" formats now read and write all fields,
	so that entries will be rewritten exactly as in the originals
	except for a few minor corrections (extra spaces in feature tables
	are removed). We cannot guarantee that information is preserved
	when writing out in a different format. For example, EMBL and
	Genbank formats do not contain the same information.

	GIF graphics output added where the gd library is a recent enough
	version to provide support.

	The plplot graphics library has been updated to 5.7.2. New files
	are disptab.h and pldll.h. File gd.c replaces file gdpng.c and
	needed one change for FREETYPE.

	Infoseq can now optionally display the database name.

	The acdvalid utility warns about qualifier names that do not fit
	the standard naming convention. The messages now include a
	suggested valid name, for example an input file called -sites
	will be suggested as -sitesfile.

	Sequence output in EMBL and SWISS formats now defaults to the new
	format of the databases from 2006. The previous formats are still
	available as "emblold" and "swissold". As sequence input, "embl"
	and "swiss" formats will read both versions of the files.

	Function ajTableRemove deletes an entry in a table, but only
	returns the value. This is replaced by ajTableRemoveKey which also
	returns the original key. The caller now owns both the value and
	the key, and is responsible for deleting them. ajTableRemove is now
	declared obsolete and will be removed from a future release.

	Infoseq by default uses columns with fixed width, but this fails
	to delimit long sequence names (for example, long file names and
	paths). Two changes make this better. Infoseq now inserts a space
	in column-delimited output (the default) when a string fills the
	whole column. It is also now possible to specify a tab as
	delimiter with -nocolumn -delimiter "\t" to return to 3.0.0
	behaviour. This was needed for the W2H interface and maybe some
	other wrappers.

	Renamed libplplot to libeplplot and plplot headers are now
	installed to include/eplplot. This avoids collisions with later
	versions of plplot.

Version 4.1.0 04-mar-2007

	Bugfix 1: graphics output failed to reset the title correctly in
	some applications. Prettyplot and banana badly rescaled the output
	from the second page of multipage output. Abiview produced
	additional blank pages with only the title. Abiview also had bugs
	in display when the user changed the window size or asked for
	separate plots for each trace.

	A new ACD attribute outputmodifier: "Y" identifies qualifiers that
	cause the kinds of output changes that can break parsers. An
	obvious example is the -html qualifier on many of the utility
	programs. This attribute is a warning to wrapper developers and
	maintainers that they may want to fix the value of this qualifier
	and not allow users to change it. In some cases (as with toggle
	qualifiers) it may be useful to wrap each possible value
	separately. For example, tfm can run as an HTML version (-html)
	and a text version (-nohtml -nomore).

	Backtranseq now keeps stop positions in the sequence and replaces
	them with the most common stop codon. Previous releases converted
	stops to 'X' and back translated them as 'NNN'.

	Reading sequences in NBRF (or PIR) format now only removes one '*'
	from the end, allowing protein sequences to end with a stop codon.

	Reading NBRF format sequences in FASTA format was retaining a ';'
	in front of the sequence ID. This is now fixed.

	Pattern files and regular expression files now use the -pformat
	and -pname associated qualifiers which were ignored when they
	first appeared in 4.0.0. Pattern file formats are "fasta" for the
	original format in 4.0.0 with FASTA style identifiers, and
	"simple" for files with a single pattern on each line. The format
	defaults to testing the first character for a '>'. The pattern
	name is used to set a name of "name1", "name2" and so on if no
	name is in the FASTA file. By default patterns are called
	"pattern1" and regular expressions are called "regex1".
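	The default format test and naming can be sketched as follows.
	Both functions are hypothetical illustrations, not the embpatlist
	code; the base name ("pattern", "regex" or a -pname value) is
	assumed to be supplied by the caller.

```c
#include <stdio.h>
#include <string.h>

/* Default format test: a first character of '>' means "fasta",
 * anything else (including an empty line) means "simple". */
const char *pattern_file_format(const char *firstline)
{
    return (firstline[0] == '>') ? "fasta" : "simple";
}

/* Build a default name such as "pattern1" or "regex1" from a base
 * name and a 1-based count, for patterns that have no name. */
void default_pattern_name(char *out, size_t outsize,
                          const char *base, int count)
{
    snprintf(out, outsize, "%s%d", base, count);
}
```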

	Added a new function to read from a buffered file and trim
	newlines. It was not needed before because input functions were
	doing their own trimming.

	Valgrind memory leak tests now cover all QA tests. The command
	line is captured and used to generate test cases. Script
	valgrind.pl knows about the few cases that need input files copied
	and preprocesses them by name. A few tests can be flagged as
	ignored. This is intended for tests known to run for a very long
	time under valgrind. Memory leaks are fixed for all programs in
	the main EMBOSS package and for the most used ones in the EMBASSY
	packages.

	A new environment variable ACDCOMMANDLINELOG takes a filename as
	its value. This saves the command line equivalent of a program
	run, converting user responses to prompts into their command line
	equivalents. A number of bugs in command line saving for report
	headers were identified and fixed.

	Two string functions had their names reversed. ajStrRemoveWhite is
	to remove all white space from a string, ajStrRemoveWhiteExcess is
	to remove white space from the ends and replace internal
	whitespace with single spaces. When function names were
	standardized these names were reversed. As function calls were
	converted automatically, EMBOSS code worked as before, but
	developers will have noticed that the functions did not behave as
	expected. This is now corrected, and all existing calls in the
	EMBOSS code have been checked and converted.

	Showseq with a sequence end position now stops output at the end
	of the user-specified range. Previous releases printed the whole
	of the line with the last base/residue.

	SRS servers use "gid" as the field name for GI numbers. The field
	name has been changed to allow GI searches with local SRS and
	remote SRSWWW access to Genbank.

	A new configure option for developers --enable-devwarnings
	turns on many more warning messages from the gcc compiler. Not all
	warnings are useful - the less useful gcc options are documented
	(and commented out) in the configure.in file devwarnings section.
	Warnings include missing function prototypes, signed/unsigned
	comparisons, potential loss of precision in casts, use of global
	names (index for example) as variables.

	Function names in ajseqwrite.c have been standardized. Old names are
	still accepted but are marked as "deprecated" and will generate
	warnings with the gcc compiler (see ajstr below). Other compilers
	will see no difference.

	Edialign is a new application, a port of the DIALIGN2 program by
	B. Morgenstern, using an ACD file written by Guy Bottu.  It takes
	as input nucleic acid or protein sequences and produces as output
	a multiple sequence alignment. The sequences need not be similar
	over their complete length, since the program constructs
	alignments from gapfree pairs of similar segments of the
	sequences.

	Wordfinder is a new application to find word-based matches of
	limited size. It is based on code from supermatcher. The inputs
	are reversed so the query sequence set (unaligned) is compared to
	a streamed database of sequences. (Supermatcher should perhaps
	have its inputs in this order too). Limits are provided for the
	length of the word match and the length of the alignment. The
	default gap penalties are also increased to limit the gaps allowed
	in alignment.

	Word-based algorithms found too many matches where both sequences
	contain runs of X (protein) or N (nucleotide). These are now
	ignored when building the word table.

	Word-based algorithms complained if a sequence was shorter than
	the wordsize. This was a problem for database searches with some
	short sequences present. They now run silently and simply return
	no word matches.

	The EMBL format sequence entry parser was able to read swissprot
	sequence data, but not the feature table. Efficiency improvements
	to set the sequence type to nucleotide for EMBL entries showed
	that swissprot entries were being read by the EMBL parser. A test
	for swissprot protein information on the ID line should redirect
	these entries to the swissprot parser. In previous releases the
	sequence type was not set, so there was no problem with the
	sequence type - although feature lines may not have been readable
	from swissprot format flat files. Database definitions specify the
	swiss or embl format so they are not affected.

	Large sequences were running very slowly. This was traced to the
	way sequence types are tested using regular expressions processed
	by calls to the PCRE library. These calls were replaced by simple
	string functions as they are only testing that a sequence is
	entirely composed of characters from an allowed set. An
	additional speedup was achieved by defining only upper case
	characters as required (almost halving the number of tests) and
	testing the upper case version of the sequence characters.
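	A minimal sketch of the table-driven test described above. The
	function name and the character sets in the example are
	illustrative assumptions, not the EMBOSS tables. Only upper case
	characters are marked as allowed and each sequence character is
	upper-cased before the lookup, so one pass with a single array
	lookup per character replaces the regular expression match.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if every character of seq is in the allowed set, where the
 * allowed set lists upper case characters only. */
bool seq_all_in_set(const char *seq, const char *allowed)
{
    bool ok[256] = {false};

    for (const unsigned char *a = (const unsigned char *)allowed; *a; a++)
        ok[*a] = true;

    for (const unsigned char *p = (const unsigned char *)seq; *p; p++)
        if (!ok[toupper(*p)])
            return false;       /* first disallowed character ends it */

    return true;
}
```

	For example, seq_all_in_set("acgtNNACGT", "ACGTN") is true, while
	a 'U' in the sequence fails against the same set.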

	Sequence translation in the reverse direction adds extra amino
	acids for partial codons. In the forward direction the overhang
	was miscalculated so these codons were missed. No users have
	complained, probably because in most cases they are translated as
	'X' (it needs a 4-base wobble in the code to convert the first 2
	bases of a codon into a single amino acid).
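	The overhang arithmetic can be sketched as follows, with
	hypothetical names. For a range of len bases read from offset
	frame, the complete codons number (len - frame) / 3 and the
	partial codon left over has (len - frame) % 3 bases, which a
	translator may still turn into an amino acid, usually 'X'.

```c
/* Complete codons and trailing partial-codon bases for one frame. */
typedef struct {
    long codons;
    long overhang;              /* 0, 1 or 2 bases */
} CodonSplit;

CodonSplit codon_split(long len, int frame)
{
    CodonSplit c = {0, 0};
    long usable = len - frame;  /* bases from the frame offset on */

    if (usable > 0) {
        c.codons = usable / 3;
        c.overhang = usable % 3;
    }
    return c;
}
```

	For example, codon_split(10, 0) gives 3 codons and a 1-base
	overhang, and codon_split(10, 2) gives 2 codons and a 2-base
	overhang.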

	Sequence translation was relatively slow, at least on very large
	sequences. Profiling with gprof suggested changes to reduce the
	number of string handling calls (each call was very fast, but
	there was a very large number of calls). The internal tables were
	resized (from 15 elements to 16) for more efficient mapping.

	Parsing NCBI format ID lines saves the database. This is available
	for writing NCBI formatted output ID lines, but is not to be used
	in reporting the USA.

	Added "refseq" as a sequence and feature format. Initially a
	simple alias of GenBank but we may let them diverge later.

	REFSEQ entries have their own idea of what a ProteinID in the
	feature table looks like, as they use REFSEQP protein IDs.
	Validation now allows the third character to be an underscore.

	Large numbers of database files could make the dbi indexing
	programs (dbiflat, dbifasta, dbigcg, dbiblast) fail at the sort
	merge stage when the index files are combined. The sort merge is
	now in 2 steps to limit the number of open files required in the
	system sort utility.
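	The arithmetic behind a two-step merge is sketched below with a
	hypothetical helper; the open-file limit and grouping actually
	used by the dbi programs are not stated here. With at most
	maxopen files open, step one merges groups of up to maxopen index
	files into intermediate files and step two merges the
	intermediates, which works while there are no more than maxopen
	of them.

```c
/* Number of intermediate files for a two-step merge of nfiles sorted
 * files with at most maxopen (> 0) open at once, or -1 if step two
 * would still need more than maxopen files open. */
long two_step_intermediates(long nfiles, long maxopen)
{
    long groups = (nfiles + maxopen - 1) / maxopen;   /* ceiling */

    return groups <= maxopen ? groups : -1;
}
```

	For example, 1000 index files with a limit of 64 open files give
	16 intermediate files in the first step.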

	Added a script emblsplit.pl to split EMBL and UniProt database files
	into 2Gbyte chunks.

	The -sid qualifier now overwrites the sequence id if used. The
	-sid value will be used for creating the output filename and for
	reporting the sequence identifier in output files. For more than
	one sequence as input currently the same ID is used. We may change
	this in future to generate new IDs from this base name.

	New sequence format gifasta is the same as "ncbi" but uses the GI
	number as the identifier. Because the output is the same for both
	formats we have to require -sformat gifasta to be on the
	commandline. The default for such files will remain "ncbi" as the
	automatically processed format. On output if there is no GI number
	a dummy value of "000000" is currently used.

	coderet now writes non-coding sequence to a new output file.

	New feature function ajFeatLocMark marks selected features as
	lower case. Used by coderet to report non-coding regions.

	The help output now correctly reports output sequence default
	filenames.

	Phylip input distance matrices now allow integer values to be
	treated as reals, although there is a possible confusion over
	integer replicate values so the use of a trailing ".0" is strongly
	recommended.

	Sequences with NCBI deflines and no ID after the final "|" were
	using the version part of the seqversion ("1" from "AB123456.1")
	instead of the "AB123456" part to set the ID.
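	A minimal sketch of taking the ID from the accession part of the
	sequence version. The function name is hypothetical and this is
	not the EMBOSS defline parser: everything before the last '.' is
	kept, so "AB123456.1" gives "AB123456".

```c
#include <string.h>

/* Copy the part of a sequence version before the last '.' into out
 * (outsize bytes, truncated to fit). Without a '.', copy it all. */
void accession_from_seqversion(const char *sv, char *out, size_t outsize)
{
    const char *dot = strrchr(sv, '.');
    size_t n = dot ? (size_t)(dot - sv) : strlen(sv);

    if (outsize == 0)
        return;
    if (n >= outsize)
        n = outsize - 1;
    memcpy(out, sv, n);
    out[n] = '\0';
}
```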

	Graph titles were not standard on the general "graph" type output,
	but are consistent for xygraph outputs. A new attribute gdesc
	defines a prefix for graph titles which can be appended to by the
	calling program, usually with a description of the input (sequence
	USA, input filename). A new call ajGraphSetTitlePlus defines the
	text to add to the gdesc as "[gdesc] of [text]". All graphs were
	standardized except pepinfo which has 10 subplot titles already in
	the intended format. This will be corrected later to have standard
	main titles and shorter subplot titles.

	The version of plplot we use has a bug in calculating character
	sizes where the origin in user units is not the default of
	(0,0). This has been fixed in the plgchrW and plstrlW functions in
	the copy that is included with EMBOSS.

	Dreg and preg ignored sequence begin and end positions. Both
	programs now use the embpatlist function calls to process sequence
	ranges.

	Fuzznuc, fuzzpro and fuzztran lost the ability to use the sequence
	begin and end positions when we switched to pattern lists. This
	has been restored in the pattern list processing code.

	The logfile caused a file close error if it was read only (because
	it had not been successfully opened). Opening the logfile now
	tests the file is writable and ignores logging for a read-only file.

	More case-sensitive sequence comparison and matching functions
	added to be consistent about providing both versions.

	A few sequence databases have no accession number. For these a new
	database attribute hasaccession: "N" in emboss.default prevents
	EMBOSS trying to search the ACC field in addition to the ID field.

	A few databases whose IDs differ only in case should be treated as
	case-sensitive. The original example was a pdbprot database,
	containing FASTA format sequences of individual chains from PDB
	entries. In PDB, the entry itself is a 4-character string, and the
	chain is a single character A through Z. When an entry has more
	than 26 chains, the next 26 are labelled a through z. Pdbprot
	appends these as _A, _B, etc. PDBPROT is available from some
	public SRS servers - see the official list at
	http://downloads.lionbio.co.uk/publicsrs.html.
	This is resolved by adding a new database attribute caseidmatch in
	emboss.default. A value of "Y" will force EMBOSS to exactly match
	the case of the whole ID. This is done by post-processing and
	rejecting entries with an ID that fails to match.

	The run date included in report output has changed format to have
	the day first and to lose the leading zero when the day is 1st to
	9th of the month.

	Program cpgplot can run on more than one input sequence, but the
	plot failed on the second sequence. Fixing this required adding a
	new function ajGraphDataReplaceI to replace the 1st, 2nd, 3rd,
	etc. subgraph. Some memory cleanup was also added to remove
	the replaced graph data objects.

	Programs pepwindow and pepwindowall can now process any
	protein sequence. In previous versions pepwindow was restricted to
	pureprotein (no ambiguity codes) while pepwindowall accepted any
	protein sequence (it has to handle gaps) but was using a score of
	zero for unknown amino acid residues. Changed so that missing amino
	acid values can be filled in using Dayhoff frequency weighted
	averages for B, J and Z and an overall average for X, U and O.
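	The frequency-weighted average can be sketched as below, using a
	hypothetical helper rather than the EMBOSS code. For B, which
	stands for D or N, the filled-in value is
	(f_D v_D + f_N v_N) / (f_D + f_N), where f is the Dayhoff
	frequency and v the program's value for each residue. Z uses E
	and Q in the same way, and X averages over all residues.

```c
/* Frequency-weighted average of n residue scores. Returns 0.0 if
 * all frequencies are zero. */
double weighted_average(const double *score, const double *freq, int n)
{
    double sum = 0.0, wsum = 0.0;

    for (int i = 0; i < n; i++) {
        sum  += score[i] * freq[i];
        wsum += freq[i];
    }
    return wsum > 0.0 ? sum / wsum : 0.0;
}
```

	With made-up values 1.0 and 3.0 and frequencies 1 and 3, the
	result is (1 + 9) / 4 = 2.5.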

	Program octanol can accept any protein sequence. Interpolated
	values are used for B, Z and J. An average over all values is used
	for X and also for O and U where there is no data. Interpolations
	and averages used the Dayhoff amino acid frequencies.

	Program iep can accept any protein sequence. Ambiguity codes B and
	Z are resolved by converting to the carboxylic acid (D or E) or
	amide (N or Q) according to the Dayhoff amino acid frequencies,
	giving a consistent value for any input protein.

	Sequence set type testing was checking whether the seqset is
	defined as protein but ignoring the type of the first
	sequence. This is now fixed.

	Program tfm looked in the obsolete install directory with the
	-html option. It is changed to find the embassy package name from
	the installed ACD file and then to find the installed HTML file.
	If EMBOSS has not been installed, it will also search the original
	source files.

	Modified NCBI/FASTA format to preserve the database name from the
	NCBI style ID. The database name is reported in one of the many
	and varied NCBI syntax variants, depending on whether there is a
	version or accession number, and whether there is an EMBOSS
	database name also involved (for example, an entry in a file
	indexed with dbxfasta or dbifasta).

	Modified "pearson" sequence format to keep the FASTA file ID
	complete. For historical reasons GCG-style dbname:id syntax was
	still having the db part trimmed. This will still be trimmed from
	fasta or ncbi format.

	The report for digest has Cterm and Nterm columns capitalized to
	match the rest of the report. Sequence ranges now give correct
	cterm and nterm results.

	The list file Cut.index for codon usage tables was changed to
	remove old file names (commented out list at the end) and to
	remove underscores from the species names.

	Programs water, needle, merger and prophet calculate an internal
	path size from the lengths of the input sequences. For sequences
	that are too long, a fatal error is produced. But if the sequences
	are extremely long, the test failed and the program gave a
	segmentation fault. This fix tests in a different way that will
	catch all cases. (added as a fix to 4.0.0)
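	The ChangeLog does not say exactly how the new test works. One
	common way to make such a size check safe, shown here as a sketch
	with a hypothetical function name, is to divide the limit instead
	of multiplying the lengths: len1 * len2 can wrap around to a
	small value for extremely long sequences, but maxsize / len1
	cannot overflow.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if a len1-by-len2 path matrix fits within maxsize cells,
 * tested without computing the possibly overflowing product. */
bool path_size_ok(size_t len1, size_t len2, size_t maxsize)
{
    if (len1 == 0 || len2 == 0)
        return true;
    return len2 <= maxsize / len1;
}
```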

	The new MRS access method used a general search. This gave strange
	results when the ID or accession appeared in any other entry. It
	appears that MRS can search for id or accession only. This worked
	on the main MRS server at least. (added as a fix to 4.0.0)

	New database access methods MRS and DBFETCH need to be explicitly
	turned on so that showdb can report them. (added as a fix to
	4.0.0)

	When deleting the last line of buffered input, the code failed to
	reset the pointer to the last buffered line. This only affected debug
	traces. Unfortunately, the ajFileBuffClear function does call the
	debug trace. In practice we have only seen this bug when
	processing sequence data in EMBL format from an MRS server. (added
	as a fix to 4.0.0)

	Pattern and regular expression searches failed to correctly
	reverse a nucleotide sequence. The change is to use
	ajSeqReverseForce (always reverses the sequence provided) instead
	of ajSeqReverseDo (which only reverses if the reverse flag is
	set). (added as a fix to 4.0.0)

	Reports in list format failed to write a usable USA for "asis"
	sequence input, and incorrectly reported reverse strand nucleotide
	features. (added as a fix to 4.0.0)

	The list files Matrices.nucleotide, Matrices.protein and
	Matrices.proteinstructure now have comment headers explaining
	their format.  Fixed issues with nucleotide features in the
	reverse direction in reports. The start/end positions were stored
	the wrong way around and then reversed again when reported in one
	of the report formats. However, reporting as EMBL features showed
	the incorrect storage. ajFeatNewII now checks start/end and
	reverses the feature if start is greater than end. ajFeatNewIIRev
	sets the reverse strand and also checks that the start position is
	greater than (or equal to) the end position. (added as a fix to 4.0.0)

	To reduce the size of very large reports, for example when fuzznuc
	or fuzzpro run over very large databases, new qualifiers are added
	to report output. -rmaxseq gives the maximum hits for any one
	sequence, -rmaxall gives the total maximum number of hits. The
	report tail contains a record of the number of hits reported and
	found. The qualifiers are intended for web interfaces to control
	the maximum output they need to report. When the maximum hits
	figure is reached, ajReportWrite returns false so that programs
	can terminate at that point. (added as a fix to 4.0.0)
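	The limit logic might look like the following sketch. The HitLimit
	type and hitlimit_* functions are hypothetical, not the AJAX report
	code; a limit of 0 is assumed to mean "unlimited", and hits beyond
	the per-sequence limit are assumed to be counted as found but not
	written:

```c
#include <stdbool.h>

/* Hypothetical counters for the two report limits. */
typedef struct {
    unsigned maxseq;   /* -rmaxseq: hits per sequence, 0 = unlimited */
    unsigned maxall;   /* -rmaxall: total hits, 0 = unlimited */
    unsigned seqhits;  /* hits written for the current sequence */
    unsigned reported; /* hits written overall */
    unsigned found;    /* hits found overall, written or not */
} HitLimit;

/* Call before processing each new sequence. */
void hitlimit_next_sequence(HitLimit *h)
{
    h->seqhits = 0;
}

/* Record one hit. Sets *written if the hit should be output.
 * Returns false once the total limit is reached, so the caller
 * can stop searching. */
bool hitlimit_add(HitLimit *h, bool *written)
{
    h->found++;
    *written = false;
    if (h->maxall && h->reported >= h->maxall)
        return false;
    if (!h->maxseq || h->seqhits < h->maxseq) {
        h->seqhits++;
        h->reported++;
        *written = true;
    }
    return !(h->maxall && h->reported >= h->maxall);
}
```

	The found and reported counters correspond to the figures recorded
	in the report tail.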

	Reports now write a header and tail when closed, to make sure that
	all programs will write something to the report file. The default
	header contains the command line provenance, the tail contains the
	number of sequences and hits. (added as a fix to 4.0.0)

Version 4.0.0 15-jul-2006

	The format of the knowntypes.standard file in the emboss/acd
	directory has changed to list the knowntype first, then the
	datatype and finally the description. The file should be sorted by
	knowntype, and any description should not end in "file" so that
	file and directory prompts can be generated.

	Standard prompts can be generated from the knowntype for files,
	directories and other data types. This can reduce the need for
	special information: attributes, but to help those who maintain
	parsers and wrappers we will try to keep an information string in
	the ACD file to match the prompt generated by EMBOSS. Acdvalid
	will report cases where the information string does not match the
	generated prompt. There may be a few cases where two inputs or
	outputs of the same knowntype are needed.

	The output produced by -help provides more information about
	associated qualifiers than the HTML table view (from acdtable)
	which is included in the HTML documentation in the
	distribution. However, there is also a lot of extra information
	in the acdtable output on the default values and the allowed
	values for each qualifier. The -help output is now expanded to
	include all the information provided by the acdtable view. A
	benefit of this is that we can now remove the badly formatted
	acdtable from the text version of the documentation. This is used
	by tfm so the output of the tfm program will now be easier to read.

	The default prompts for input and output files have been very
	simple for the first 10 years. EMBOSS now has a "known type"
	defined for all files in ACD. The known type is now included in
	the automatically generated prompt for input and output files. To
	help in this process, the known type should not have the word
	"file" at the end. This will be added automatically in the prompt.

	Printing with conversion type %g could write extra zeros where the
	decimal point was stripped. In C, %g conversion removes trailing
	zeros, and removes the decimal point if no digits remain after
	it. The AJAX print conversion functions added extra zeros at the
	start of the output to extend the result up to the expected width.
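	For comparison, standard C %g behaviour with and without a field
	width. show_g is a hypothetical wrapper around snprintf; note that
	a width pads with leading spaces, not zeros:

```c
#include <stdio.h>

/* Hypothetical wrapper: format value with %g and an optional minimum
 * field width (0 means no minimum). Standard C removes trailing zeros
 * and a bare decimal point; a width adds leading spaces. */
void show_g(char *buf, size_t size, double value, int width)
{
    snprintf(buf, size, "%*g", width, value);
}
```

	For example, 2.50 prints as "2.5", 3.0 prints as "3", and 3.0 with
	width 5 prints as "    3".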

	Prophet modified to use an "align:" ACD definition rather than an
	"outfile:".  A bug which was mixing up the name of the profile with
	the name of the sequence has been fixed.

	Simple XML DOM added. This has no additional library
	dependencies. This is a preliminary step in producing (revisiting)
	XML graphics output etc.

	EMBL/Genbank have agreed to add a new amino acid code 'O' for
	pyrrolysine. O has been added to EMBOSS checking for protein
	sequence data, and to the existing data files that contain 'U'
	(selenocysteine). IUPAC/IUBMB has accepted the use of O for protein
	sequences. This means that any alphabetic text is now a valid
	protein sequence: the 20 standard amino acids, plus 'X' (unknown),
	'B' ('D' or 'N') and 'Z' ('E' or 'Q') for analysis of complete
	digests, 'J' ('I' or 'L' in mass spectrometry), 'U'
	(selenocysteine) and 'O' (pyrrolysine). There is a small
	complication: older versions of phylip sometimes use 'O' as a gap
	character. EMBOSS will still allow this in nucleotide sequences.

	New sequence access method "mrs" uses CMBI's "Maarten's Retrieval
	System" http://mrs.cmbi.ru.nl/mrs/cgi-bin/mrs.cgi to query
	databases by ID or accession.

	New sequence access method "dbfetch" uses the EBI's dbfetch REST
	services http://www.ebi.ac.uk/cgi-bin/dbfetch to query databases
	by ID or accession.

	iep changed to allow users to specify number of modified
	(uncharged) lysines and intrachain disulphide bridges. This
	includes extensions to embIep functions to include the two new
	parameters. These updates were provided by Clemens Broger of
	F. Hoffmann-La Roche Ltd.

	Changes to splitter and union by Kim Rutherford (Artemis
	maintainer at the Sanger Institute) allow features to be preserved
	for nucleotide sequences. The default operation of both programs
	is unchanged.

	Regular expression pattern lists are accepted by dreg and preg.
	The output reports include pattern names which default to regex1,
	regex2, and so on. The "regex" prefix can be set using the new
	associated qualifier -pname on the command line.

	Prosite pattern lists are accepted by fuzznuc, fuzzpro and fuzztran.
	The output reports include pattern names which default to pattern1,
	pattern2, and so on. The "pattern" prefix can be set using the new
	associated qualifier -pname on the command line.

	Regular expressions have the same syntax as the new pattern
	datatype - they can be in a file, with pattern names, and have a
	qualifier -pname to set the name for a pattern. Regular
	expressions also have a type defined in ACD which can be
	nucleotide (e.g. for dreg), protein (e.g. for preg) and string for
	general patterns. Function ajAcdGetRegexSingle will read a single
	regular expression. ajAcdGetRegex now reads a list of regular
	expressions.

	New ACD pattern type reads a PROSITE style pattern, or @filename
	where filename contains patterns with names in FASTA
	format. Patterns in the file are concatenated if on multiple
	lines. The file may also contain mismatch=n after the ID to set
	the number of mismatches for a pattern. Patterns also have
	associated qualifiers -pmismatch and -pname for the pattern on the
	commandline or all patterns in the file.

	Pattern processing is changed to use lists of patterns, as
	submitted by Henrikki Almusa of Medicel in Helsinki. This is
	implemented as new ACD data type "pattern", which required some
	nucleus embPat functions and data types to be moved to AJAX ajPat
	so that they can be called from ajacd.c.

	"a2m" alignment format (which is just fasta) is now supported in
	ACD.

	New EMBASSY MEME package containing "wrapper" applications
	providing an EMBOSS-style interface to the applications in
	the original MEME package version 3.0.14 developed by Timothy
	L. Bailey.  The package is fully documented.

	New EMBASSY HMMER package contains "wrapper" applications
	providing an EMBOSS-style interface to the applications in
	the original HMMER package version 2.3.2 developed by Sean Eddy.
	The package is fully documented.

	ACD dirlist: order of list of files is now system-independent.

	fuzztran: now always generates an output file, even if there
	is no data.

	coderet: now writes any permutation of cds, mrna and protein
	sequence output to separate files.  Output file formats may
	be set independently and have the default file extensions of
	"cds", "mrna" and "prot".

	oddcomp: New ACD option to set the window size equal to the
	length of the current protein. Code cleaned up.

	Restrict: alphabetic sorting fixed in the case where -limit
	is specified.

	Digest changed to add ragging option. Original code was
	contributed by Gregoire R Thomas.

	infoseq: code largely rewritten.  Two new advanced ACD options
	to specify output using a user-defined delimiter or in columns.
	Output much cleaner, e.g. columns are aligned.

	Digest changed to read a sequence stream (earlier versions read
	only one sequence). Code for this was contributed by Henrikki
	Almusa of Medicel in Finland.

	Two new programs makenucseq and makeprotseq have been submitted by
	Henrikki Almusa of Medicel in Finland. They create sets of random
	sequences. Sequence composition can be specified by a codon usage
	file or by pepstats output.

	New format "swissnew", with aliases "swnew" and "swissprotnew",
	added. UniProt has announced future changes to the UniProt entry
	format, which is still called "swiss" in EMBOSS. The ID line will
	have "Reviewed" and "Unreviewed" in place of "STANDARD" and
	"PRELIMINARY", and will no longer have the "PRT;" placeholder for
	the EMBL format "division", now obsolete as EMBL has changed this
	part of their ID line in the latest release. In EMBOSS 4.0.0 we
	replace "STANDARD" with "Unreviewed" as more appropriate to
	entries that come from FASTA files and other sources.

	Programs which analyze nucleotide features now call ajFeatGet
	functions in most places. In previous releases, some of these
	programs used the internal feature data structures directly.

	GFF format feature files are designed for nucleotide
	sequences. EMBOSS also supports the use of GFF for protein sequences.

	Feature keys (to use the EMBL/Genbank feature table term) are now
	defined with external names for each format and a list of internal
	names to be used by EMBOSS. This greatly simplified the
	conversion of SwissProt and PIR feature tables. The internal table
	also has a list of aliases. The internal aliases for nucleotide
	features are as far as possible identifiers from the Sequence
	Ontology SOFA (feature annotation) subset. In a few cases, where
	multiple EMBL/Genbank terms map to a single SOFA term, new terms
	have been added to extend the SOFA name uniquely (we simply append
	the EMBL/Genbank feature key).

	MSF format files with more than 5000 sequences were truncated on
	input - only the first 5000 names were being read. This limit has
	been removed. As "emma" uses MSF format for the clustalw run it
	launches, this problem limited emma to 5000 output sequences in
	previous releases.

	The EMBL database has changed its ID line. The new line has
	semicolons after each token, the primary accession instead of the
	ID (there is no ID in the new EMBL format), and the sequence
	version as a number. Internally in EMBOSS we continue to build the
	accnum.n style sequence version. We expect most other packages
	will take some time to change EMBL formats, so for output this is
	called "emblnew" format. As input, "embl" format will accept both
	the old and new style entries. For database indexing, dbiflat and
	dbxflat will read old and new formats as "embl" by looking for SV
	on the ID line. EMBL and EMBLNEW format output is also improved by
	wrapping long DE lines.

	Wossname will now search for each word in a phrase used as the
	search text. By default, all words must match. A new qualifier
	-noallmatch tells wossname to match any word in the
	search. Partial word matches are accepted so "restrict" will match
	"restriction". The search term is also compared to the groups and
	keywords attributes in the ACD file. A new qualifier -showkey will
	report the keywords to help explain why applications were matched.
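	A sketch of all-words and any-word matching with partial word
	matches. The function names are hypothetical, the searched text
	stands in for the description, groups and keywords that wossname
	checks, and case-insensitive matching is an assumption:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive substring test on the first len characters of
 * word, so a partial word such as "restrict" matches "restriction". */
static bool contains_word(const char *text, const char *word, size_t len)
{
    for (; *text; text++) {
        size_t i = 0;
        while (i < len && text[i] &&
               tolower((unsigned char)text[i]) ==
               tolower((unsigned char)word[i]))
            i++;
        if (i == len)
            return true;
    }
    return false;
}

/* Split the search phrase on spaces. With allmatch, every word must
 * be found; otherwise (as with -noallmatch) any one word is enough. */
bool phrase_matches(const char *text, const char *phrase, bool allmatch)
{
    bool any = false;
    const char *p = phrase;

    while (*p) {
        while (*p == ' ')
            p++;
        if (!*p)
            break;
        size_t len = strcspn(p, " ");
        bool hit = contains_word(text, p, len);
        if (allmatch && !hit)
            return false;
        if (hit)
            any = true;
        p += len;
    }
    return any;
}
```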

	All ACD files have a new application attribute keywords: which
	provides keywords to search for in addition to the groups.  This
	is intended for keywords which are hard to include correctly in
	the short description. A file keywords.standard is provided with a
	list of all keywords. This is for use by utilities searching
	programs by keyword, which will be expected to check the groups
	and keywords attributes in a single query.

	Reading a sequence of type "any" sets the sequence type to
	nucleotide by default. Any x or X ambiguity codes will be
	converted to 'n' or 'N' to avoid confusion in programs that will
	convert a second nucleotide sequence (alignment programs, for
	example). X is allowed as an unknown character in nucleotide
	sequences (and N is also allowed as 'any base').
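	A minimal sketch of the x/X conversion, preserving case
	(any_to_nucleotide is a hypothetical name, not an AJAX function):

```c
#include <stddef.h>

/* Hypothetical helper: treat an "any" sequence as nucleotide by
 * replacing the unknown codes x/X with the 'any base' codes n/N.
 * Returns the number of characters changed. */
size_t any_to_nucleotide(char *seq)
{
    size_t changed = 0;

    for (char *p = seq; *p; p++) {
        if (*p == 'x') {
            *p = 'n';
            changed++;
        } else if (*p == 'X') {
            *p = 'N';
            changed++;
        }
    }
    return changed;
}
```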

	Stockholm and Selex sequence formats, used mainly by the HMMER and
	HMMERNEW embassy packages, have been corrected for a few cases
	where automatic format detection generated errors.

	Function names in ajseq.c have been standardized. Old names are
	still accepted but are marked as "deprecated" and will generate
	warnings with the gcc compiler (see ajstr below). Other compilers
	will see no difference.

	Further correction to reversed sequence numbering for local alignments
	from water and supermatcher. For these local alignments all reversed
	alignments were ending at "1" because the end offset was not
	calculated correctly. Matcher called a different function to set
	sequence positions and reported correct positions.

	For alignments with a line of gaps, adjusted the numbering to
	report the last sequence position instead of the next at the start
	of the line.

	Program einverted output is changed to include the sequence ID
	and the program input is changed to process more than one sequence
	as input. The change to the output format was needed to indicate
	which sequence is reported. The program is also speeded up by not
	dynamically resizing the internal arrays used to hold sequence
	positions.

	Added additional information to "entrails" output (entrails is
	built by "make check" and displays internal data to assist
	developers of wrappers and interfaces). The output now includes
	application attributes and reports definitions which are aliases
	(with -full on the commandline).

	Added -mincount option to wordcount to report only words occurring
	a given number of times. The default of 1 does not change the
	previous results.

	Oddcomp had a number of bugs. A window size equal to the sequence
	length resulted in no hits. The word size was used before reading
	the input file. A match in the last possible window was missed.

	Biosed modified to specify a position so it can be used to edit A
	to L in position 2 (for example) in a single sequence or
	throughout an alignment. Normal use is unchanged. If there is
	demand, the target could be changed from a string to a pattern.

	Clustal sequence format output is now version 1.83 with 60
	bases/residues per line. Previous EMBOSS releases reported it as
	1.4 and printed 50 bases/residues per line.

	The tmap program had an upper limit of 6000 residues and 300
	sequences. All fixed size arrays were made dynamic. The length
	limit was exceeded by one of our users.

	GCG formatted databases were found to have split entries into more
	than 1000 chunks - for example human chromosome 7 in a TPA (third
	party annotation) entry in EMBL. A regular expression is now used
	to check for any number of subsequences in GCG data.

	ajSysStrTok and ajSysStrTokR changed to match the behaviour of the
	C run time library function strtok. Both now keep their internal
	pointer at the first delimiter after the matched token. This only
	changes the result if the delimiter set is changed on the next call.
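	For reference, standard C strtok accepts a different delimiter set
	on each call, which is the situation where the pointer position
	matters. split_fields is a hypothetical example:

```c
#include <string.h>

/* Hypothetical example: split "name:id,rest" using ':' for the first
 * token and ',' for the second. An empty delimiter set on the last
 * call returns whatever text remains. strtok modifies its input. */
void split_fields(char *text, char **name, char **id, char **rest)
{
    *name = strtok(text, ":");
    *id = strtok(NULL, ",");
    *rest = strtok(NULL, "");
}
```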

	Another code cleanup is the addition of Exit functions to all AJAX
	and NUCLEUS source files that could still have static memory
	allocated when a program ends. We aim to clean up memory for all
	the standard memory tests in test/memtest.dat. This includes
	creating a new function acdReset which resets the state of ACD
	processing so that a new ACD file could, in theory, be read once a
	program has completed. All programs need to call the embExit
	function at the end to call the NUCLEUS and AJAX cleanup
	functions. Some of these functions will also log memory usage
	statistics if debugging is turned on (-debug on the command line).

	We are working through all the library code making standard
	function names. Old function names will be retained at least until
	release 4.0.0. They are marked with the __deprecated flag, which
	causes the gcc compiler to report all uses of the old name. Other
	compilers are not affected. The first set to be processed is in
	ajstr.c (string and character functions).

	Sequence reading from website URLs now defaults to HTTP 1.1, with
	chunked blocks of data. A bug in processing small (single line)
	chunks was fixed.

	Report and alignment output now includes the full commandline used
	to run the program, with any replies to prompts included.

	Excel report format includes a column for Strand to indicate
	sequences on the reverse strand. The strand column is + for a
	forward feature (all protein features are forward) or - for a
	reverse direction feature.

	New sequence type gapstopprotein for proteins with gaps and
	internal stops.

	Translation functions in ajax/ajtranslate.c have been cleaned up.

	New program backtranambig back-translates a protein sequence into
	the most ambiguous codons.

	Phylip sequence format can now read sets of alignments with blank
	lines in between. Such formats were produced by the new fseqboot
	program and used by the new phylip programs and seqsetall in ACD.

	The list of graph devices produced when an invalid device (or '?')
	is given now lists only the unique devices (those defined
	differently in the plplot library code) with alternative names
	(xwindows for x11, for example) added in brackets. Specifying an
	ambiguous device used to accept the first match found, now an
	error message is given.

	Prettyplot and cons were producing different consensus
	sequences. Comparison of the results showed two problems. Cons was
	missing consensus characters because of an error in calculating
	the plurality (since fixed in prettyplot, but the library function
	used by cons had not been corrected). Prettyplot was missing
	consensus characters for a different reason - prettyplot has a
	"collision detection" feature to skip consensus characters for
	positions where more than one amino acid or base is valid as a
	consensus character. This was turned on by default, when the ACD
	file clearly states it should be turned off. In fixing both bugs
	the two programs will give the same consensus, except for cases
	where collisions occur - in these cases prettyplot may not select
	the same character as cons, where both are equally valid.

	Programs that write sequences need to call ajSeqWriteClose before
	they exit. This forces output from sequence formats that save up
	sequences in memory and write at the end. An example is MSF, which
	has to wait for all sequences in order to calculate the file
	checksum.
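	The MSF header checksum combines the checksums of every sequence.
	A sketch, assuming the conventional GCG checksum (upper-cased
	characters weighted by position, cycling 1 to 57, modulo 10000);
	the function names are hypothetical:

```c
#include <ctype.h>

/* Conventional GCG checksum of one sequence: each upper-cased
 * character is weighted by its position (1..57, then back to 1),
 * and the sum is taken modulo 10000. */
int gcg_checksum(const char *seq)
{
    long check = 0;
    int pos = 0;

    for (const char *p = seq; *p; p++) {
        pos = pos % 57 + 1;
        check += (long)pos * toupper((unsigned char)*p);
    }
    return (int)(check % 10000);
}

/* MSF file checksum: sum of all sequence checksums modulo 10000.
 * Every sequence is needed before this value can be written. */
int msf_checksum(const char *const *seqs, int nseq)
{
    long total = 0;

    for (int i = 0; i < nseq; i++)
        total += gcg_checksum(seqs[i]);
    return (int)(total % 10000);
}
```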

	Functions that process directories now skip the '.' and '..'
	directories so that '*' wildcards will work correctly.

	Prettyplot has been revised. A debugging commandline option has
	been removed. String commandline options have been changed to
	array and select types for better validation with the same user
	responses. Colours are now corrected for proteins - in version
	3.0.0 and earlier the colours depended on the column order in the
	matrix. Nucleotide colours follow the ABI base colours used in
	abiview. The examples in the documentation showed no boxes because
	of low sequence weights in the MSF format input data. The weights
	have been updated to give the 'expected' results.

	All programs now store the command line needed to recreate the
	run. The result is logged by the database indexing programs, and
	will be added to other program outputs in a future release. The
	command line includes all non-default responses to prompts by the
	user.

	dbiflat, dbifasta, dbigcg and dbiblast set the system sort to use
	normal "C" sort order. On systems where the locale is set to a
	language other than English, sort can have strange behaviour. In
	particular, the underscore character fails to sort in the correct
	place so that indexing SwissProt/UniProt or RefSeq entries fails
	to put certain entries in the correct sort position for
	retrieval. There is now no need to set LC_ALL=C locally, although
	this is good practice whenever sort is used.
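	Under LC_ALL=C, sort compares bytes, as strcmp does. A sketch
	showing where the underscore falls in that order; sort_ids is a
	hypothetical helper and the IDs are made-up examples:

```c
#include <stdlib.h>
#include <string.h>

/* Byte-wise comparison, matching sort under LC_ALL=C. The
 * underscore (0x5F) sorts after digits and upper-case letters and
 * before lower-case letters, whatever the user's locale. */
static int cmp_c_locale(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort an array of ID strings in C order. */
void sort_ids(const char **ids, size_t n)
{
    qsort(ids, n, sizeof *ids, cmp_c_locale);
}
```

	In this order "ABC1_HUMAN" < "ABCD_HUMAN" < "ABC_HUMAN"; a locale
	that ignores punctuation when collating can order these
	differently.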

Version 3.0.0 15-jul-2005

	Gap penalty qualifiers were standardized for all programs.

	water, needle and other alignment programs occasionally could
	report suboptimal alignments (off by the gap extension penalty
	score). The reported alignments were correct, but rearranging the
	gaps could give a slightly higher score. Matcher and stretcher use
	different alignment functions and were unaffected.

	Cpgplot no longer has a -shift option to speed processing on long
	sequences. The output was broken. We will restore it if there is
	demand.

	Two new variables added for developers using the MYEMBOSS package
	to write their own EMBOSS programs. EMBOSS_MYEMBOSSROOT (the same
	will work for other EMBASSY packages) points to the location of
	the ACD files for an EMBASSY package which is not installed - as
	would be the case for an ordinary user developing and maintaining
	their own code using MYEMBOSS. This requires the use of embInitP
	rather than embInit to pass the package name, something all
	EMBASSY programs should (and will) do. The second variable is
	EMBOSS_ACDUTILROOT and is required so that utilities such as
	acdvalid can also find the ACD files. Utilities acdvalid, acdc,
	acdhelp, acdtable and acdpretty use embInit as they know nothing
	about any package name.

	Sequence sets (seqset and seqsetall) have a new ACD attribute
	"aligned" which is true or false. If true, the sequences will be
	extended with gaps and passed to the application as a full
	alignment. It is assumed that they are already aligned. If false,
	the application needs all sequences in memory but has no need for
	aligned input. The aligned attribute is required (to help ACD
	parsers) so acdvalid will object if it is not found.

	embossdata now requires a filename, or an empty string to search
	for all files. If no filename is given, it will prompt for one
	with a default of an empty string.

	acdvalid now tests the order in which sections appear in the ACD
	file. The order must be: input, required, additional, advanced,
	output. There are already constraints on which ACD data types can
	appear in each section. All existing ACD files passed this test.
	If any external ACD files have a problem the acdvalid tests can be
	revised.

	Sequence format "experiment" is now correctly the Staden package
	experiment file format. The description is taken from the "EX"
	experiment description line. EMBL line types (including features)
	are allowed in this format and are supported if used before the
	sequence. The accuracy values are read and stored (one per base,
	using the highest base value if all 4 bases have individual
	numbers) and written. These values could possibly be passed to
	primer3, for example.

	Staden and GCG input formats can now parse out comments from
	anywhere in the sequence records.

	Nexus and nexusnon output formats now correctly report the
	datatype for protein alignments.

	Documentation of the @data datatype header tags updated on the
	developers webpages.

	Coderet reports the number of CDS, mRNA and translation sequences
	to an output file. Requested for easier tracing of inputs that
	gave no sequences.

	Nbrf (pir) input can now read from an SRSWWW server. The problem
	was that SRS reports an extra ">P1;seqid" header before the
	sequence. Now if there is no sequence, a duplicate header (one
	with the same ID) can be skipped.

	Clustal output format no longer writes in blocks of 10.

	Clustal and other multiple sequence formats were unable to return
	single named sequences. Fixed for all such formats.

	Phylip3 output renamed phylipnon for compatibility with other
	formats. The phylip3 name is retained for back compatibility. The
	header for phylip non-interleaved format is corrected to that
	accepted by phylip 3.6 (no need for YF on the header line, and
	correct number of sequences). Documentation of these formats (for
	seqret and general format documentation) has been updated.

	Programs chips, cusp, prettyseq and showtran used a codon usage
	table as input only to define the genetic code (amino acids for
	each codon) for the table they produce. This is no longer needed
	as a new AjPCod constructor ajCodNewCode can be given a genetic
	code (default 0 to use the standard code) and will set the amino
	acid data.

	The ajCodClear function now clears all data, including the amino
	acid assignments, for use in reading multiple codon usage
	formats. A new function ajCodClearData clears only the data and
	other values, and leaves the amino acid assignments in case other
	applications may make the same assumptions.

	Codon usage input filenames can now be used to set the output
	filename. The codcmp program for example will no longer default to
	"outfile.codcmp" for output. However, this can cause unexpected
	results when a codon usage table and a sequence are read in, so
	codon usage filenames are only used if no other input file (or
	sequence, or feature table, or other input type) has been
	read. This is done by passing a "reset" boolean when setting the
	saved first input file name so that other inputs can overwrite a
	name defined by a codon usage input. A remaining side effect is
	that if the first input is stdin (for example with -filter on)
	then a second input file can now set the default for output. The
	recommendation for anyone developing wrappers is to always
	explicitly set the output filenames if there is a need to know the
	name for a specific output.

	Codon usage tables support multiple formats. All can be read
	automatically. EMBOSS will now, for example, accept native GCG
	codon usage tables including those used by the codonusage and
	transterm databases. The format can be specified for "codon" input
	by a -format qualifier. Outcodon is now used as an ACD datatype
	for writing codon usage tables, and has a -oformat qualifier. A
	new application codcopy can inter-convert the codon usage table
	formats. The default codon usage table format is called "emboss"
	and includes structured comments to identify the species, database
	release, database division, number of CDSs and codons, and GC
	content. These values are calculated or searched for in the text
	within a file for other formats.

	In the emboss.default and .embossrc files the same name can be
	used for variables, databases, and resources. In previous versions
	a single table was used and name clashes could occur. This becomes
	an issue with the increasing use of resource definitions.

	Colours for abiview set to the ABI standard colours.

	Sequence types explicitly set in source code for cons, sixpack and
	backtranseq. GCG format output was showing nucleotide instead of
	protein sequence type.

	Correction to reversed sequence numbering for local alignments
	from water.

Version 2.10.0 03-Jan-2005

	Profile analysis with gprof indicates that the regular expressions
	(and the PCRE library) are very inefficient. Wildcards in regular
	expressions lead to millions of recursive calls to the match
	function. Although regular expressions are very readable for code
	maintenance, they have been replaced for EMBL sequence and feature
	reading to get about a 4-fold speedup. Profile analysis will
	continue up to version 3.0.0.

	Feature table updated for nucleotide sequences to
	EMBL/GenBank/DDBJ version 6.2. A few qualifiers are now obsolete.

	tranalign now allows for the proteins to have Methionine residues
	at the start which now match a START codon in the corresponding
	nucleic acid sequence.

	diffseq has a new option '-global' which makes it treat the whole
	of the sequences as regions to be aligned, rather than the
	default which looks for the longest region of overlap and only
	reports differences within that overlapping region.  This new
	option is useful when looking at protein and mRNA sequences
	which are expected to align over their whole length.

	Alignment output issues resolved. Specifying begin and end of
	input sequences now works for all alignment formats. Markx formats
	have been rewritten as the original code we used has nasty
	dependencies on global variables which we struggled to reproduce
	for all cases. The rewritten code is much simpler. Note that the
	gap penalty reported by markx10 format is the EMBOSS
	penalty. Markx10 as used in the FASTA package subtracts the gap
	extension penalty from the gap penalty ... and adds it back when
	calculating.

	transeq failed to check sequence ranges in list files
	correctly. It was only using the range from the first sequence if
	the USA included a start and end. The range is now reset for each
	sequence.

	remap (and other programs that display translations) had problems
	with masking ORFs (using strange characters instead of '0'),
	caused by bad calls to an AJAX function.

	Entrez added as an access method. Sequence format must be
	genbank. Server URL is hard-coded at NCBI (for now). Works by
	finding GIs (GenInfo Identifiers) that match the query, and then
	retrieving them one at a time. This is still a prototype - more work is
	needed. Note that apparently Entrez cannot retrieve by LOCUS (id).

	Seqhound added as an access method. Sequence format must be
	genbank. Needs a URL to find the server. Works by finding GIs
	(GenInfo Identifiers) that match the query, and then retrieving
	them one at a time. This is still a prototype - more work is
	needed. Some Entrez error conditions are less graceful in
	SeqHound. Des and Key searches are turned off until SeqHound adds
	indexing for these. Org searches work, but require the numeric
	taxon ID. This is not friendly, so we are looking for a way to get
	the taxid from the species or genus.

	Direct access databases now support exclude wildcards. The syntax
	is as for emblcd indexing, but only files listed in filename are
	included.

	Database names must be letters, numbers and underscores
	only. Reading emboss.default and .embossrc now generates a warning
	message for any bad database name. Bad names were ignored by USA
	processing, leading to confusing results.
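	A minimal check for the naming rule. valid_dbname is hypothetical,
	and treating an empty name as invalid is an assumption:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical check: a database name must be non-empty (assumed)
 * and contain only letters, digits and underscores. */
bool valid_dbname(const char *name)
{
    if (!*name)
        return false;
    for (const char *p = name; *p; p++)
        if (!isalnum((unsigned char)*p) && *p != '_')
            return false;
    return true;
}
```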

	seqretsplit has a new -feature option (as for seqret).

	noreturn can write files for PC or Mac file systems using a new
	-system qualifier.

	FASTA format sequence files with a sequence ID starting P1; were
	assumed to be PIR format. These can now be read as FASTA, assuming
	that PIR format has already been tested for.

	Sequences with zero length were accepted. Sequences must now have a
	length of at least 1. Some user scripts could create FASTA format
	files with no sequence, or with the sequence on the ID line. These
	can crash many programs, including a core dump from clustalw
	(through emma).

	Added a calculated attribute "haslengths" to (phylogenetic) tree
	input in ACD for use in phylipnew interfaces.

	Wossname and seealso have a new commandline option -showembassy
	which defines one embassy package to be shown. The main use is in
	finding applications when automatically building the
	documentation, but end users and interface builders may find some
	uses for this option too.

	Added an "embassy" string attribute to the application in ACD so
	that wossname can find whether an application is in EMBASSY or
	not. Wossname was depending on the source directory, but could not
	distinguish between EMBOSS and EMBASSY ACD files once they were
	installed.

	The EFUNC and EDATA databases have been enhanced to provide better
	views and links within SRS. The new versions are available at both
	HGMP and EBI. In future, EBI will probably become the sole site
	(as HGMP/RFCGR is closing in 2005).

	The official EMBOSS website has moved to emboss.sourceforge.net.
	The move included redefining links in applications and major
	modifications to the scripts which maintain the application web
	pages. The sourceforge web pages are now committed to CVS under
	doc/sourceforge. The pages on sourceforge itself can only be
	modified by registering at sourceforge and joining the emboss
	project.

Version 2.9.0 15-Jul-2004

	ajListMapRead and ajListstrMapRead functions for read-only lists.
	As an added check, the functions these call for each element have
	a different prototype.

	ajStrStr function now returns const, as do various 'Get' functions.
	The few cases where a true char* is needed must now call
	ajStrStrMod with the AjPStr passed by reference so that we can
	check it is being modified. All calls to ajStrStr in EMBOSS and
	most EMBASSY packages have been updated to remove compiler
	warning messages. ajStrFix also needs the AjPStr passed by
	reference.

	tfm -html now gives full path to image files.

	Remove need for the definition of PLPLOT_LIB.

	Add configuration for cygwin dlls.

	Allow filenames of the form drive:/filename for cygwin.

	Fixes for list files with sequence ranges in the USAs. The
	sequence input object is now reset during list processing.

	Sequence sets with begin and end positions are now automatically
	trimmed on input. This applies for example to list input with
	ranges in the USAs for programs such as polydot which were
	previously reporting the entire sequence.

	graph output now has the default title including the date in
	dd-mmm-yy format instead of the unreadable dd/mm/yy format.
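
	A minimal sketch of the new title date format, assuming the
	standard C strftime() call is used. The helper name
	graph_title_date is hypothetical and not part of the EMBOSS
	graphics library. In the default "C" locale, %b gives the English
	abbreviated month name.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format a date as dd-mmm-yy (e.g. 15-Jul-04)
 * for a default graph title. Returns the number of characters
 * written, or 0 if the buffer is too small. */
size_t graph_title_date(const struct tm *when, char *buf, size_t buflen)
{
    /* %d = day of month (01-31), %b = abbreviated month name,
     * %y = year without century (00-99) */
    return strftime(buf, buflen, "%d-%b-%y", when);
}
```

	With tm_mday 15, tm_mon 6 and tm_year 104 this writes "15-Jul-04",
	where the old format would have given the ambiguous "15/07/04".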

	Seqmatchall now writes align output (like wordmatch). The
	algorithm does not maintain the sequence accession and
	description information. They may be restored in a future update.

        infoalign now also displays the weight of the sequences in the
        alignment.  This can be turned off using '-noweight'.

	New output types in ACD for all input data types, including those
	for phylogenetics and protein structure data. Initially these are
	a new AjPOutfile type with a defined format (fixed until any of
	them has a choice).

	Programs that produce graphics or text (outfile) output now
	by default will not create the outfile if there is a graph (done
	by setting the nullok attribute of the outfile).

	Acdvalid now checks for incomplete ACD types and attributes.

        trimest now has the option '-toplower' which changes the
        poly-A tail to lower-case instead of cutting it off.
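
	A simplified sketch of the -toplower behaviour. The function
	polya_tail and its minlen parameter are hypothetical, and unlike
	trimest this sketch allows no mismatches in the tail: it only
	looks for an unbroken run of A at the 3' end.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified poly-A handling: find a run of at least
 * minlen 'A' bases at the 3' end and either lower-case it
 * (toplower != 0) or cut it off. Returns the tail length found,
 * or 0 if the run is shorter than minlen (seq is then unchanged). */
size_t polya_tail(char *seq, size_t minlen, int toplower)
{
    size_t len = strlen(seq);
    size_t start = len;

    /* walk back from the 3' end while the bases are A (any case) */
    while (start > 0 && toupper((unsigned char)seq[start - 1]) == 'A')
        start--;

    size_t tail = len - start;
    if (tail < minlen)
        return 0;

    if (toplower) {
        for (size_t i = start; i < len; i++)
            seq[i] = (char)tolower((unsigned char)seq[i]);
    } else {
        seq[start] = '\0';          /* default: remove the tail */
    }
    return tail;
}
```

	For "GATTCAAAAAA" with minlen 4, the -toplower style call gives
	"GATTCaaaaaa" while the default call gives "GATTC".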

	new ACD attribute 'relation' added to all ACD types. This will
	hold some information about how output data types relate to inputs
	and parameters. The syntax of the string is not yet clear. Running
	of EMBOSS programs will not be affected - the relation string is
	defined for web services and related wrappers to maintain
	provenance better.

	New ACD function oneof added, syntax is @($(var)=={a,b,c}) to test
	for a choice of menu options. Intended to clean up some ACD files
	- but they are already clean so it may not be useful. At some
	stage the unused ACD functions should be declared obsolete for
	simplicity (and efficiency). We will leave the code in place, but
	remove them from the list of functions tested.

	acdvalid now tests the knowntype attribute for strings. ACD files
	have been cleaned up to give knowntypes for all strings (defined
	in knowntypes.standard) or to convert strings to datafile or other
	ACD types as appropriate.

        showfeat now has the qualifier '-annotation'. This allows you to
        add your own brief annotations of regions on the displayed
        figure.

        remap now has the option '-frame' which allows you to specify
        a list of the frames to be translated and displayed.

	Major cleanup of @data documentation. Added @datatype for typedef
	data types (e.g. AjBool). Checking all have attributes, and all
	attribute names and types match. Comments in the code are moved to
	the @attr documentation. Added an @cc documentation line for
	comments.

	Eprimer3 has been changed so that it runs a separate child process
	of primer3_core for every sequence. This is to cure a problem
	seen when more than about 23 sequences were input, in which there
	was some blocking contention between the input and output streams.

	Major cleanup of ACD files to match acdvalid standards. Featout
	qualifiers are now -outfeat, so all output qualifiers start with
	-out. This clashes with -outfile, so -outf is not always usable
	as an abbreviation.

	Options for emma have been cleaned up. -insist is no longer used
	(use -sprotein instead) and -slowfast is now a simple boolean
	-slow. Both changes lead to a much cleaner ACD file.

	Options for eprimer3 have been cleaned up. New options -primer
	(true) and -hybridprobe (false) make the dependencies far
	simpler. The default task is now 1 (same as the old zero) and the
	-hybridprobe option is needed to calculate the hybridization
	probes. This removes a lot of dependencies on tasks 1 and 4
	(hybridprobe) and not-task-4 (primer).

	New AjPDir to hold directory path and default extension. Intended
	for domainatrix applications. This requires changing
	ajAcdGetDirectory to return an AjPDir and providing
	ajAcdGetDirectoryName to return the path as a string. Several
	programs were changed to reflect this changed call.

	New ACD type outdirectory for a directory to which files will be
	written. Must have a knowntype describing the files that will
	appear there. Expected qualifier name is -outdir.

	compseq now has the option '-calcfreq'. This makes it calculate
	the expected frequencies of the words in the sequences from the
	observed frequencies of the single bases or residues in those
	sequences.
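
	A sketch of the -calcfreq idea for nucleotides: the expected
	frequency of a word is taken as the product of the observed
	frequencies of its single bases. The function name and the
	restriction to A, C, G and T are assumptions for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Expected frequency of word in seq, computed as the product of the
 * observed single-base frequencies. Only A, C, G and T are counted;
 * returns 0.0 if seq has none of them or word has another character. */
double expected_word_freq(const char *seq, const char *word)
{
    const char *bases = "ACGT";
    double count[4] = {0.0, 0.0, 0.0, 0.0};
    double total = 0.0;

    /* observed single-base counts */
    for (const char *p = seq; *p; p++) {
        const char *hit = strchr(bases, toupper((unsigned char)*p));
        if (hit) {
            count[hit - bases] += 1.0;
            total += 1.0;
        }
    }
    if (total == 0.0)
        return 0.0;

    /* multiply the frequency of each base in the word */
    double expected = 1.0;
    for (const char *w = word; *w; w++) {
        const char *hit = strchr(bases, toupper((unsigned char)*w));
        if (!hit)
            return 0.0;
        expected *= count[hit - bases] / total;
    }
    return expected;
}
```

	For "AACG" the observed frequencies are A 0.5, C 0.25 and G 0.25,
	so the expected frequency of "AC" is 0.5 x 0.25 = 0.125.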

	HTML data from remote sites is becoming more complex. EMBOSS now
	makes a first pass to look for a single preformatted block and
	accepts this as the data (thus avoiding horrors such as the Entrez
	headers and javascript which NCBI's search service includes).
	At the same time, an old fix to patch SRS 6.1.0 output has been
	removed as this clashed with the new code.
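
	A hypothetical sketch of the first pass: the page is accepted only
	if it holds exactly one <pre> block, and that block's contents are
	returned. Tag matching here is case-sensitive and ignores <pre>
	tags with attributes, which is a simplification of real HTML; the
	function name is invented.

```c
#include <stdlib.h>
#include <string.h>

/* If html contains exactly one <pre>...</pre> block, return a newly
 * allocated copy of its contents (caller frees). Otherwise return
 * NULL so the caller can fall back to parsing the whole page. */
char *single_pre_block(const char *html)
{
    const char *open = strstr(html, "<pre>");
    if (!open)
        return NULL;
    if (strstr(open + 5, "<pre>"))      /* a second block: ambiguous */
        return NULL;

    const char *body = open + 5;        /* skip "<pre>" */
    const char *close = strstr(body, "</pre>");
    if (!close)
        return NULL;

    size_t n = (size_t)(close - body);
    char *out = malloc(n + 1);
    if (!out)
        return NULL;
    memcpy(out, body, n);
    out[n] = '\0';
    return out;
}
```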

	Optional outputs have a new behaviour. With nulldefault defined,
	an output is, by default, turned off and will return a NULL value
	to the calling program if nullok is set. Setting the value to ""
	on the command line will now ask for the standard filename to be
	generated. The "missing" attribute, if defined, allows simply
	-qualname on the commandline to request the default filename,
	although care must be taken to avoid anything following the
	qualifier appearing to be a filename. This means the qualifier
	must be last on the commandline, or must be followed by another
	qualifier.

	Indexing programs dbifasta and dbiflat no longer store the source
	directory in the division.lkp file - directory is specified in the
	database definition. This was only done originally to share index
	files with "efetch" at the Sanger Centre. With index files and
	data files in the same directory (as for efetch) it is not needed.

	All ACD files revised for new acdvalid checks.

	New ACD section "additional" added for qualifiers with
	additional:"Y" defined. These have been put in the "advanced"
	section until now. Acdvalid checks that these qualifiers are in
	the appropriate section.

	Acdvalid now checks that qualifiers are in the expected
	section. All input qualifiers (including cfile and datafile) are
	now in the input section, all output qualifiers are in the output
	section. All (remaining) standard, additional and advanced
	qualifiers are in the "required", "additional" and "advanced"
	sections.

	New ACD type "toggle" added. This is the same as "boolean" but is
	allowed in any section by "acdvalid" checks. Toggle is to be used
	for ACD qualifiers that "toggle" (turn on or off) other
	qualifiers. An example in many ACD files would be "-plot".

	Cirdna and lindna now dynamically allocate memory. For simplicity
	they do still have an upper limit for the number of groups and
	labels per group, but no longer have static arrays.

Version 2.8.0 30-Nov-2003

	tfm accepts the PAGER environment variable. It can be overridden
	by EMBOSS_PAGER.
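
	A sketch of the lookup order described above. The function name
	and the final fallback "more" are assumptions, not taken from the
	tfm source.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* declare POSIX setenv()/unsetenv() */
#endif
#include <stdlib.h>
#include <string.h>

/* Return the pager to use: EMBOSS_PAGER overrides PAGER. Empty values
 * are treated as unset. "more" is a hypothetical default. */
const char *choose_pager(void)
{
    const char *pager = getenv("EMBOSS_PAGER");

    if (pager && *pager)
        return pager;
    pager = getenv("PAGER");
    if (pager && *pager)
        return pager;
    return "more";
}
```

	With PAGER=less alone the sketch picks "less"; adding
	EMBOSS_PAGER=most switches it to "most" without touching PAGER,
	which other programs may rely on.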

	Fix for HTTP 1.1 lines for MacOSX added (Cedric Rossi).

	The home directory ~/.embossrc file can be turned off with
	"setenv EMBOSS_RCHOME N". This was added for cleaner QA tests
	but may have other uses.

	Report format output added (by Henrikki Almusa) for dreg, preg,
	recoder and silent.

	pestfind renamed to epestfind and handling of terminal water
	residue adjusted.

	Align formats: Added "tcoffee" as a valid -aformat which writes a
	T-Coffee library file suitable for input as -in=Lfilename to
	T-Coffee.

	Pepstats: added molar extinction coefficient and extinction
	coefficient at 1mg/ml for A280.
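
	A sketch of the two values, assuming the commonly used Pace et al.
	(1995) coefficients at 280 nm (Trp 5500, Tyr 1490, cystine 125);
	that pepstats uses exactly these constants is an assumption. The
	1 mg/ml figure is the molar coefficient divided by the molecular
	weight. Function names are hypothetical.

```c
/* Molar extinction coefficient at 280 nm (M^-1 cm^-1), assuming the
 * Pace et al. values. ncystine counts disulphide-bonded Cys pairs. */
double molar_extinction(int ntrp, int ntyr, int ncystine)
{
    return 5500.0 * ntrp + 1490.0 * ntyr + 125.0 * ncystine;
}

/* Absorbance at 280 nm of a 1 mg/ml solution (1 cm path): the molar
 * coefficient divided by the molecular weight in daltons (g/mol). */
double a280_1mg_per_ml(double molar_ext, double mw)
{
    return molar_ext / mw;
}
```

	For example, one Trp and one Tyr give 6990, and a 10000 Da protein
	with that composition would have an A280 of about 0.699 at 1 mg/ml.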

	Nexus format sequence input added, with new functions to parse all
	standard nexus files. Later releases will accept nexus format for
	other input data.

	Jackknifer, Mega, Treecon, Mase and Fitch formats parsed, at least in
	their EMBOSS output forms.

	Underscores are allowed in accession numbers and sequence versions
	to handle REFSEQ fasta format entries.

	New function ajRegPre returns the original string before the
	regular expression match.
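
	An illustration of what "the string before the match" means,
	written with POSIX <regex.h> rather than the EMBOSS regular
	expression code (which uses PCRE). regex_pre is a hypothetical
	name, not the ajRegPre signature.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the part of str before the first
 * match of pattern (POSIX extended syntax), or NULL if the pattern
 * does not compile or does not match. Caller frees the result. */
char *regex_pre(const char *pattern, const char *str)
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    int rc = regexec(&re, str, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return NULL;

    size_t n = (size_t)m.rm_so;        /* offset where the match starts */
    char *pre = malloc(n + 1);
    if (!pre)
        return NULL;
    memcpy(pre, str, n);
    pre[n] = '\0';
    return pre;
}
```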

	New function ajStrArrayDel deletes a string array.

	New function ajListstrToArrayApp appends strings in a list to the
	end of a string array.

	Sequence input changes: Allow '?' as a valid character (it has
	been seen in phylip sequences) for 'unknown' and convert to X for
	protein (or any) and 'N' for nucleotide. Note that this can give
	an X or N depending on whether the program accepts nucleotide only
	or any sequence. We may find a cleaner fix, but it would depend on
	knowing the sequence type.

	Added binding factor output to tfscan plus option to specify a
	custom data file.

	Removed the Henry Spencer regular expression libraries. There were
	a few calls to the ajPosReg functions, but only to test it worked
	the same way as ajReg. Added a case-insensitive ajRegComp and
	ajRegCompC (which the ajPosReg functions had) using
	PCRE. Farewell, Henry. You were a great servant to EMBOSS.

	Water S-W alignment program no longer truncates some matches.

	Vector arithmetic added to ajax library.

	Compilation now uses large file handling by default. To disable use
	--disable-large when configuring. An effect is to make the default
	size of ajlongs 64 bits.

	Pepstats modified to allow multiple sequences.

	Major (well, obvious impact on ACD authors) ACD change - the
	"required" attribute is renamed "standard" and the "optional"
	attribute is renamed "additional". They have exactly the same
	functions as before. The change is to (hopefully) make their
	meaning more obvious to those developing ACD parsers and wrappers
	for EMBOSS. ACD attribute "standardtype" clashed with "standard"
	and is renamed "knowntype".

	ACD attributes have been added for applications and for all ACD
	types to make wrappers easier to control. These new attributes are
	specifically for SoapLab from EBI, and need not have any impact on
	other wrappers (SoapLab uses ACD to define non-EMBOSS applications
	and needs extra attributes to define some additional properties).

	pepinfo now writes to a file with a standard output filename of
	(sequenceid).pepinfo instead of pepinfo.out

	Completed the standardization of ACD definitions, using "acdvalid"
	to remove all errors and allowing only selected and hard to avoid
	warnings to remain. The warnings are for calculated "required" or
	"optional" definitions (simple true/false relations to another
	boolean are accepted). In particular: all essential inputs and
	outputs are parameters, with standardtype defined. Non-essential
	inputs and outputs have the nullok attribute set. Information
	strings are defined only where there is no standard prompt.

	The definition of AjPStr and other "pointers to structs" is
	causing strange problems in specifying "const" for structs that
	are unchanged by function calls. In summary, it appears (for all
	compilers we tried) that "const" only knows it is for a pointer if
	it can see the "*" in the type. This means, for example, that
	"const AjPStr" failed but "const AjOStr*" worked. With "const" if
	it knows it is a pointer, it makes the data structure
	constant. Otherwise it makes the pointer itself constant, the
	equivalent of "AjOStr* const". We fixed this by changing AjPStr to
	be a #define of AjOStr*. This has the advantages that most code is
	unaffected and that const now works as expected. The only code
	changes we needed are lines with multiple AjPStr definitions
	(which is anyway deprecated), for example "AjPStr astr, bstr"
	which clearly fail when you think about the #define (astr is an
	AjPStr, but bstr is now an AjOStr and will give strange compiler
	errors). We may change this again to define a separate const data
	type for each struct, but probably the #define is a good solution
	and we expect to stay with it.
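
	A small illustration of the difference. The struct fields are
	hypothetical, not the real AjOStr layout, and AjPStrT is an
	invented name standing for the old typedef form of AjPStr.

```c
#include <stddef.h>

/* Hypothetical minimal string object (fields are illustrative). */
typedef struct AjSStr {
    size_t Len;
    char *Ptr;
} AjOStr;

/* Old style: "const AjPStrT" means "AjOStr* const", so the POINTER
 * is constant while the string it points to can still change. */
typedef AjOStr *AjPStrT;

/* New style: "const AjPStr" expands to "const AjOStr*", so the
 * string itself is read-only, which is what was wanted.
 * Pitfall: "AjPStr astr, bstr;" expands to "AjOStr* astr, bstr;",
 * making bstr a whole AjOStr. Declare one variable per line. */
#define AjPStr AjOStr*

size_t str_len(const AjPStr str)
{
    /* str->Len = 0;  would not compile: the data are read-only */
    return str->Len;
}

void str_clear(const AjPStrT str)
{
    str->Len = 0;   /* compiles: only the pointer itself is const */
    /* str = NULL;  would not compile: the pointer is read-only */
}
```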

	PCRE is now the library of choice for regular expressions. This
	allows the full Perl regular expression syntax, and was very easy
	to integrate. Regular expressions are used internally for parsing
	and for manipulating strings such as file and directory names, and
	also for matching by programs such as dreg and preg.

	The previous Henry Spencer library functions are renamed from
	ajReg to ajHsReg. The Posix version of the Henry Spencer library
	remains available as ajPosReg but may be removed as it was not
	used by the EMBOSS distribution, and PCRE can provide the same or
	higher functionality.

	acdpretty now writes the name of the output file to standard
	output. For example "Created seqret.acdpretty".

	The ACD qualifiers -acdpretty -acdtable and -acdlog are
	removed. Programs acdpretty and acdtable do the first two tasks
	(in the same way as before). To turn on the acdlog file, use
	environment variable EMBOSS_ACDLOG.

	Graphs can now use "-graph data" to produce files compatible with
	the Staden package's spin2 and spin GUIs. This makes some ACD
	options obsolete, especially the various -data and -outfile
	combinations. Banana already wrote an output file which caused
	some confusion in these options. The outfile and the graph are
	both produced by default, but have the nullok attribute and can be
	turned off with -nooutfile or -nograph on the command line.

	graph and xygraph output can now be optional - the ACD files can
	have a nullok: "Y" attribute which allows -nograph on the command
	line.

	In ACD files alternatives for protein and nucleotide input are
	common. Added an automatic variable $(acdprotein) which is defined
	as the calculated ".protein" attribute of the first input
	sequence(s). The value will be "Y" or "N". Acdvalid will check
	that this is how proteins are tested, so the original
	"$(asequence.protein)" syntax will become obsolete. The intention is
	that any wrappers can use this to make protein and nucleotide
	versions of the ACD file, and in general to use only simple
	boolean tests in calculated ACD values.

	Added a wait call so that EMBOSS waits for a piped command to
	complete before reading data (needed for listfile input with
	many piped reads, for example getz calls from SRS databases).

Version 2.7.1 03-jun-2003

	Corrected Jemboss for displaying emma & prettyplot forms.

	Corrected display of the recognition sequence for restrict -solofragment.

Version 2.7.0 01-jun-2003

	Standardtype attribute added for filelist in ACD.

	Datafile for mwfilter changed from string to datafile ACD type.

	A new test application acdvalid will check for deprecated ACD
	syntax and report errors for something that should be fixed, or
	warnings for something still to be clearly defined. None of these
	"errors" will stop an ACD file from working correctly, but they do
	cause confusion to the authors and maintainers of wrappers, GUIs,
	and so on.

	Sequence types are extended to include new types for programs that
	can handle selenocysteine.

	Sequence types are simplified so that input can be converted to
	the specified type. Gaps can be removed, and unsupported
	characters can be converted to X for protein or N for nucleotide.
	A few applications may be unable to handle any ambiguity
	(pureprotein, puredna, etc.) and will require correct input. To
	make it safe to run a program over (for example) swissprot or
	embl, such programs should read single sequences only, or be
	converted to support ambiguity codes. This may take a little
	time. banana, octanol and pepwindow already read single sequences.
	In need of attention are hmoment and iep.
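
	A hypothetical sketch of the conversion: gaps are removed and
	unsupported characters become X (protein) or N (nucleotide). The
	alphabets are simplified assumptions, not the EMBOSS tables; in
	particular nucleotide ambiguity codes such as R and Y are omitted,
	so this sketch would turn them into N. Output is upper-cased.

```c
#include <ctype.h>
#include <string.h>

/* Convert seq in place: drop '-' and '.' gaps, upper-case everything,
 * and replace characters outside the (simplified) alphabet with
 * 'X' for protein or 'N' for nucleotide. */
void convert_seq(char *seq, int protein)
{
    const char *allowed = protein ? "ACDEFGHIKLMNPQRSTVWYBZX"
                                  : "ACGTUN";
    char unknown = protein ? 'X' : 'N';
    char *out = seq;

    for (const char *in = seq; *in; in++) {
        char c = (char)toupper((unsigned char)*in);
        if (c == '-' || c == '.')
            continue;                       /* remove gaps */
        *out++ = strchr(allowed, c) ? c : unknown;
    }
    *out = '\0';
}
```

	For example, the protein input "mk-l?*" becomes "MKLXX", and the
	nucleotide input "AC-GT?J" becomes "ACGTNN".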

	In ACD files a new application attribute "external" is added where
	a third-party tool is needed. Examples include clustalw (emma) and
	primer3_core (eprimer3 and primers).

	ACD definitions for feature and featout now have a "type"
	attribute. The feature output type defaults to the sequence type,
	as for sequence output. Feature types are "protein" or
	"nucleotide" or "any".

	ACD sections now have "information" instead of merely "info" for
	consistency.

	Boundary fix for ajStrMask.

	Tightened up on reporting of isoschizomer groups in 'showseq -limit'
	and 'remap -limit'.

        Added embPatRestrictPreferred.

	Added -individual option to RESTRICT. This gives the fragment
	lengths that would be produced if each enzyme in the set that can
	cut the sequence were used on its own. Results are added to the
	tail section of the report.
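
	A sketch of the fragment arithmetic for one enzyme on a linear
	sequence; -individual would repeat it for each enzyme. The cut
	convention (a cut at position p falls after base p, 1-based) and
	the function name are assumptions for illustration.

```c
#include <stddef.h>

/* Given the sorted cut positions of one enzyme on a linear sequence
 * of length seqlen, write the fragment lengths to frag[] and return
 * how many there are. frag must have room for ncuts + 1 entries. */
size_t fragment_lengths(size_t seqlen, const size_t *cuts, size_t ncuts,
                        size_t *frag)
{
    size_t prev = 0;

    for (size_t i = 0; i < ncuts; i++) {
        frag[i] = cuts[i] - prev;
        prev = cuts[i];
    }
    frag[ncuts] = seqlen - prev;    /* last fragment runs to the end */
    return ncuts + 1;
}
```

	For a 100 base sequence cut at 30 and 70 this gives fragments of
	30, 40 and 30 bases.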

	Added a -equivalences option (on by default) to rebaseextract.
	This option calculates an embossre.equ file using RE
	prototypes in the withrefm file.

	A guide to the EMBASSY package domainatrix (domainatrix.doc)
	has been added to /emboss/emboss/doc/manuals

	Extractfeat now has the -describe qualifier to allow it to add
	the value of selected tags to the Description line of the output
	sequence.

	Revseq can now read in gapped nucleic acid sequences.

	Removed old corba code in preparation for adding corba server as
	an embassy package.

	Simplified error messages for sequence reading, and corrected
	handling of a bad USA as the first in a list file.

	Padded temporary filename for emma to avoid clustalw bug with
	short input filename (this will not work in all cases and
	a corrected clustalw should be used nevertheless).

	-help output modified to align all the qualifiers.

	acdpretty output revised to resolve to full names.

	Complete overhaul of all ACD error conditions. Parsing and command
	line validation messages are now all used, and all tested in the
	qatest suite. These tests use bad ACD files in the test/acd
	directory.

	whichdb failed to report error messages. They are now turned on -
	and most of the common errors are reported with less verbosity.

	TCODE application added. Calculates the TESTCODE statistic.

	Eprimer3 now reports the primer positions using the coordinates
	of the original sequence when -sbegin and -send are used to
	specify a sub-sequence to consider.  The input ranges, such as
	the -exclude and -target ranges are always given using the
	positions from the original sequence.

	tfm looks for documentation in EMBOSS_DOCROOT (an environment
	variable, or defined in emboss.default), then in the install
	directory, and finally the original build directory.

	In some cases, EMBOSS programs could terminate with an exit status
	of 255 (-1). Terminating with a "Die:" message exits with status 1.
	All exit calls now use either 0 (success) or the standard
	library EXIT_FAILURE value (usually 1).

	All report output fields have a new attribute (and qualifier)
	rscoreshow which defaults to "Y". Setting rscoreshow: "N" will
	remove the score from the output, except for GFF where it is
	required, and SRS format where it can be kept for use in standard
	parsers. The aim is to exclude the score value from applications
	that have no scoring method (restrict for example). For these,
	putting -rscore on the command line will override the ACD file and
	display the score.

	Showseq and showfeat both now have the qualifier '-stricttags'.
	By default if any tag/value pair in a feature matches the
	specified tag and value, then all the tags/value pairs of that
	feature will be displayed.  If '-stricttags' is set to be true,
	then only those tag/value pairs in a feature that match the
	specified tag and value will be displayed.

	Megamerger now has the qualifier '-prefer' which makes it use
	the first sequence to create the merged sequence whenever there is a
	mismatch between the two sequences.

	Sirna now has the qualifier '-context' which writes the first
	two bases (in brackets) of the 23 base target region.
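
	A sketch of the -context layout; sirna_context is a hypothetical
	name and not taken from the sirna source.

```c
#include <stdio.h>
#include <string.h>

/* Write a 23-base target region with its first two bases in brackets,
 * e.g. (AA)GCT... Returns 0 on success, or -1 if the target is not
 * exactly 23 bases or buf cannot hold the result. */
int sirna_context(const char *target, char *buf, size_t buflen)
{
    if (strlen(target) != 23 || buflen < 26)  /* 23 bases + "()" + NUL */
        return -1;
    snprintf(buf, buflen, "(%.2s)%s", target, target + 2);
    return 0;
}
```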

	Maskseq and maskfeat now both have the qualifier '-tolower'
	which will change the masked regions to lower-case characters
	instead of replacing them with a mask character.

	ACD parsing internals are rewritten to find and report errors more
	cleanly and to make the syntax stricter for other ACD parsers used
	by (for example) GUI developers.

	Sequence output types now have a 'type:' attribute which defaults
	to the type of the first input sequence. For most applications
	this is good enough as a default. For those which add gaps or
	translate DNA to protein (or vice versa) a 'type:' attribute will
	be needed. This is to improve support for automated workflow
	building by more strongly typing input and output data.

	acdpretty now wraps long lines of ACD definitions, splitting at
	any lone backslash (which defines a newline for -help output) or
	at whitespace. Attributes and sections are indented by 2 spaces.

	Until now, the ACD file syntax has allowed name=value syntax and
	the use of {} () and even <> for quoted strings just in case they
	needed both ' and " characters. These are now removed. We believe
	no ACD files were using this syntax.

	valgrind.pl is a new addition to the script directory that runs
	valgrind memory leak tests under Linux. The tests are a copy of
	those in purify.pl - they may one day move to a separate file.

	EMBOSS feature output now copies (where available) the name of the
	input sequence as the filename, so filenames match more closely to
	the sequence output. For example, "seqret -feat tembl:paamir" will
	now create 2 files called paamir.fasta and paamir.gff where the
	feature file previously was called 'unknown.gff'

	EMBOSS feature output defaults (as before) to GFF format, but the
	default format can now be set by variable EMBOSS_OUTFEATFORMAT

	All EMBOSS output files now have a default output directory
	(required by some webservices implementations that run in
	the 'wrong' default directory). Variable EMBOSS_OUTDIRECTORY
	if set becomes the default output directory for outfile, align,
	report, graph, sequence and feature output.

	The output directory can also be set from the command line (or as
	an ACD attribute) using the associated qualifier -odirectory
	(outfile), -rdirectory (report) -adirectory (align) -gdirectory
	(Graph and graphxy) -osdirectory (sequence) or -ofdirectory
	(featout).

	The "g*" attributes for graph and graphxy in ACD have been deleted as
	they have the same name (and function) as existing associated
	qualifiers - and can still be used with these names in ACD files.
	Duplicate ACD attribute and associated qualifier functions exist
	in many ACD types, but usually have different names and so are
	left for compatibility purposes.

	emboss.default and ~/.embossrc configuration files now have
	extensive error messages reporting filename and line number.
	showdb has additional validation for all database definitions.
	Environment variable EMBOSS_NAMVALID (boolean) turns this on for
	all programs.

	ajnam.c has debugging turned on by environment variable
	EMBOSS_NAMDEBUG (boolean). This processing (of emboss.default and
	~/.embossrc) happens before command line option -debug has taken
	effect. The output goes to standard error.

	Function ajFmtVPrintS is added as the previously missing complement
	to ajFmtPrintS.
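
	A sketch of why a va_list version is useful: the variadic function
	only packages its arguments, so any other variadic wrapper (an
	error reporter, for example) can forward its own arguments to the
	va_list version. The names are hypothetical and write to a char
	buffer instead of an AjPStr.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* va_list version: does the actual formatting. */
int fmt_vprint_s(char *buf, size_t size, const char *fmt, va_list ap)
{
    return vsnprintf(buf, size, fmt, ap);
}

/* Variadic version: collects its arguments and delegates. */
int fmt_print_s(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    int n = fmt_vprint_s(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```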

	EMBL/Genbank feature tables updated to FTv5.0

	SwissProt feature table '<' '>' and '?' location modifiers are
	now handled correctly.

	Added new applications acdlog, acdpretty and acdtable. Run like
	acdc, they provide the same functions as the command line options
	-acdlog, -acdpretty and "-acdtable -help". These -acd options are
	now obsolete and will be removed in a future release to clean up
	the ACD interface.

	Transeq now has the option '-clean' that converts all '*'
	characters to 'X's.  This may be useful because not all programs
	accept protein sequences containing '*' characters.

Version 2.6.0 20-Sep-2002

	Showdb now can display the presence of any of the extra sv, des,
	org, and key search fields that can be used to index and search in
	databases.

	Added twofeat - Finds neighbouring pairs of features in sequences.

	Extractfeat - added option (-featinname) to include the name of
	the feature as part of the ID name of the sequence that is
	written out.

	Added sirna - designs siRNA probes in mRNA.

	Sigcleave sorts results highest score first.

	Helixturnhelix sorts results highest score first and reports the
	score position as an integer.

	Added pestfind.

	Moved the following programs into the "domainatrix" embassy
	package:
	 contacts, domainer, fraggle, hetparse, hmmgen, interface,
	 pdbparse, pdbtosp, profgen, scopalign, scopnr, scopparse,
	 scoprep, scopreso, scopseqs, seqalign, seqnr, seqsearch,
	 seqsort, seqwords, siggen, sigplot, sigscan

	Palindrome no longer reports palindromes that are only composed
	of N's.

	Msbar can now check that the result doesn't match a set of
	other input sequences.  For example you could specify that it
	doesn't match the input sequence or a set of previously produced
	mutation results.

	Getorf reporting of circular genome positions tidied up - it now
	reports positions starting in the range 1 to the sequence length
	and indicates if the ORF goes through the breakpoint.  A clear
	indication of when ORFs are in the reverse sense has been added.
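
	A sketch of the position arithmetic, assuming ORFs are found on
	the sequence joined to a copy of itself, so that raw positions can
	run past the sequence length. Both function names are
	hypothetical.

```c
#include <stddef.h>

/* Map a 1-based position on the doubled sequence back into the range
 * 1..len of the circular sequence (pos must be >= 1). */
size_t circular_pos(size_t pos, size_t len)
{
    return (pos - 1) % len + 1;
}

/* An ORF that starts within the sequence but ends beyond its length
 * goes through the breakpoint. */
int crosses_breakpoint(size_t start, size_t end, size_t len)
{
    return start <= len && end > len;
}
```

	On a 100 base circular sequence, an ORF found from 90 to 110 is
	reported as 90 to 10 and flagged as crossing the breakpoint.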

	Pasteseq now behaves correctly when -sask2, -sbegin2 or -send2
	are used.

Version 2.5.1 12-Aug-2002

	Whichdb has a new option -showall to see which databases are being
	searched for use where searches hang. The order of searching is
	undefined - it depends on the order in which databases are
	returned from the internal table, which is unrelated to the order
	in which they were defined.

	Wordmatch alignments save the entire sequence but use part only.
	Fixed all alignment formats to work with these by adding a
	SubOffset attribute.

	Duplicate IDs fix. The database indexing programs skipped
	duplicate IDs but did not reset the size of the entryname index
	file so some queries could fail to find the later IDs in the
	databases. Duplicate IDs are illegal for -nosystemsort (no easy
	way to correct because entry numbers are stored internally). For
	the default case duplicate IDs are merged even if they are
	different. REFSEQ is the main problem area.

	Writing data files used EMBOSS_DATA, or by default the install
	directory. Earlier versions, if not installed, could write to the
	source tree emboss/data directory. Fixed to continue if there is
	no install data directory, and to check EMBOSS_DATA (if defined) is
	a real directory.
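
	A sketch of checking that EMBOSS_DATA names a real directory,
	using POSIX stat(). The function name is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* declare POSIX setenv()/unsetenv() */
#endif
#include <stdlib.h>
#include <sys/stat.h>

/* Return 1 if EMBOSS_DATA is set and names an existing directory,
 * 0 otherwise, so the caller can fall back to the install data
 * directory or report an error. */
int emboss_data_is_dir(void)
{
    const char *dir = getenv("EMBOSS_DATA");
    struct stat st;

    if (!dir || !*dir)
        return 0;
    if (stat(dir, &st) != 0)
        return 0;                   /* missing or inaccessible */
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```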

	Sigcleave options pval and nval are now hardcoded. They depend on the
	weight matrix size - which is hardcoded as 15 in the ACD file and
	is not checked in the program. They were introduced in EGCG in
	1988 but never used because no other weight matrix length was
	tried.

Version 2.5.0 25-may-2002

	"fasta" format now uses the "ncbi" parser, so both formats report
	"fasta" as the format. "pearson" is the old "fasta" format for a few
	cases (empty IDs for example) where ncbi parsing fails completely.

	SPLITTER changed to match documentation. Old behaviour is
	now selectable by using the -addoverlap command line
	option.

	Configuration modifications. --without-x works. Removed odd
	but harmless -I definitions. PNG detection improved.

	Corrected EMBLCD index searching for queries that start with a
	wildcard. For example, tembl-key:?* should search for all entries
	that have a keyword (key:* is regarded as 'all entries'). Entries
	with no keyword (in PIR's pir4.ref file for example) will be
	ignored.

	Updated source code docs for EFUNC and EDATA. Corrected all bad
	headers. efunc.out has no errors. efunc.check only reports
	'missing headers' for duplicated function names (#ifdef code)
	which is a known 'feature'.

	Updated source code to fix most lines over 80 bytes.

	Calculated ACD attributes now QA tested. Feature attributes will
	be correctly set, although none are used in the ACD files at present.

	purify.pl has a new option -block=n where n is a number from 1
	upwards.  1 runs the first 10 tests, 2 runs the next 10
	(blocksize=10 is hardcoded for now).

	Cleaned up string position code. Inspections showed ajStrPos and
	related functions gave results from 0 to length of a string. This
	caused confusion in many other functions and applications. These
	functions are now static strPos functions because only ajstr.c had
	calls to them (though the ajStrPos versions are still available).
	All calls were checked for positions out of range. As a result,
	many calls to ajStrAssSub and ajStrCut were fixed. ajStrInsertC
	requires a value from 0 to length (start position to insert can be
	before or after the string, or any position in between). Fixed by
	passing length+1 to strPosII.

	Added a function ajUtilCatch for use in debugging with gdb. When
	a nasty special case occurs, call ajUtilCatch and make it a
	breakpoint in gdb. The resulting backtrace will give the call stack
	and all variable values.

	Cleaned up code for chunked HTTP input. Added a new variable
	EMBOSS_HTTPVERSION which defaults to 1.0 (so HTTP is not chunked)
	and a DB attribute httpversion. This must be a floating point
	number, and is included in the HTTP header to specify the HTTP
	protocol version to be used. There is no check in the code to
	change behaviour for different versions. This is used in the
	SRSWWW and URL access methods.
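
	A sketch of how the version number could appear in the request
	line. The function name and the example path are hypothetical.
	With HTTP/1.0 a server should not reply with chunked transfer
	encoding, which is an HTTP/1.1 feature.

```c
#include <stdio.h>
#include <string.h>

/* Build a GET request line using a floating point HTTP version (for
 * example from EMBOSS_HTTPVERSION or the httpversion attribute),
 * printed with one decimal place. Returns the snprintf result. */
int http_request_line(char *buf, size_t size, const char *path,
                      double version)
{
    return snprintf(buf, size, "GET %s HTTP/%.1f\r\n", path, version);
}
```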

	Added check to qatest.pl to report any EMBOSS (rather than
	EMBASSY) applications for which there is no defined test. The
	EMBASSY test uses wossname results, checked against the names of
	ACD files in the source tree, as qatest always runs in the test/qa
	directory.

	Allowed sequences as values for EMBL rpt_unit feature qualifiers
	because so many entries have them. They are illegal according to
	the Version 4.0 (current) feature table document.

	Allow ? before from and to feature locations in SwissProt. For
	now, these are ignored, though we could add something to hold them
	for accurate output.

	Added modified Harrison solubility probability to PEPSTATS

	ACD attributes now have descriptions in the ajacd.c code which are
	reported by 'entrails'. All ACD attributes have been checked by
	inspection of the code to note those which are used/unused by ACD.
	The ACD "type" attribute for files is renamed "standardtype" to
	reflect its intended use to note standard file types for linking
	applications. Sequences and alignments still have a "type"
	attribute for protein or dna sequence types.

	Aaindexextract (new) reads the AAINDEX database and writes each
	entry to data/AAINDEX directory. New function ajFileDataDirNew to
	read data files from a named directory. New ACD datafile attribute
	'directory' passed to ajFileDataDirNew. AAINDEX directory defined
	for pepwindow and pepwindowall.

	Palindrome can now read in multiple sequences

	Palindrome no longer prints a '|' in an alignment where there
	is a mismatched pair of bases.

	Added filelist datatype to ACD

	Mwcontam program added. Displays molecular weights that are common
	across a set of files.
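	One way to find weights common to several files, sketched under the
	assumptions that each file has already been read into an array of
	masses and that two weights agree when they differ by no more than a
	tolerance in parts per million. The function names and the ppm
	tolerance are illustrative assumptions, not taken from mwcontam.

```c
#include <stddef.h>

/* Return 1 if w lies within tol_ppm parts per million of any value in list. */
static int weight_in_list(double w, const double *list, size_t n, double tol_ppm)
{
    size_t i;

    for (i = 0; i < n; i++) {
        double diff = w - list[i];
        if (diff < 0)
            diff = -diff;
        if (diff <= w * tol_ppm * 1e-6)
            return 1;
    }
    return 0;
}

/* Copy to out each weight of sets[0] that matches a weight in every other
 * set; out needs room for sizes[0] values. Returns the number copied. */
static size_t common_weights(const double *const sets[], const size_t sizes[],
                             size_t nsets, double tol_ppm, double *out)
{
    size_t i, k, count = 0;

    if (nsets == 0)
        return 0;
    for (i = 0; i < sizes[0]; i++) {
        int everywhere = 1;

        for (k = 1; k < nsets && everywhere; k++)
            everywhere = weight_in_list(sets[0][i], sets[k], sizes[k], tol_ppm);
        if (everywhere)
            out[count++] = sets[0][i];
    }
    return count;
}
```

	Only the first set needs to be scanned, since a weight common to
	all sets must appear in it.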

	Showfeat - added '-sort join' to display joined features on one line.

	Diffseq - don't give summary of SNPs if the sequences are proteins.

	Use stat64 and readdir64 when offsetbits=64 (ajfile.c
	and ajsys.c)

	Workaround for broken Solaris readdir64_r (jembossctl)

	Infoseq can now optionally display GI and Sequence Version numbers.

	Notseq can now read in a file of sequence names.

	Added the '-alternative' qualifier to transeq so that reverse frame
	translations use the codons counted from the start of the reversed
	sequence, rather than the default, which uses the codons of the
	corresponding forward frame.
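	The two definitions differ only in where the first codon of a
	reverse frame starts in the reverse-complemented sequence. The
	sketch below derives that offset from the description above; the
	function name is hypothetical, frames are numbered 1 to 3, and the
	reverse complement is assumed to be computed elsewhere.

```c
#include <stddef.h>

/* 0-based position in the reverse-complemented sequence at which the first
 * codon of reverse frame -f is read, for f = 1, 2 or 3 and len bases.
 *
 * Default: frame -f reuses the codon boundaries of forward frame f. A
 * forward codon starting at p = f - 1 + 3k maps to reverse position
 * len - 3 - p, so the offset is (len - f + 1) mod 3. Adding 3 keeps the
 * unsigned arithmetic non-negative.
 *
 * Alternative: codons are counted from the start of the reversed
 * sequence, so frame -f simply starts at f - 1. */
static size_t reverse_frame_offset(size_t len, int f, int alternative)
{
    if (alternative)
        return (size_t)(f - 1);
    return (len + 4 - (size_t)f) % 3;
}
```

	For a 10-base sequence, default frame -1 starts one base into the
	reverse complement, so the base left over at the end of forward
	frame 1 is also skipped by frame -1; with -alternative, frame -1
	starts at the first base.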

	Added the qualifier '-join' to the program extractfeat.
	If '-join' is set, joined features such as 'CDS' and 'mRNA'
	are output as a single concatenated sequence.
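	A sketch of how the segments of a join location might be
	concatenated, assuming 1-based inclusive ranges on the forward
	strand, already in join order. The Range type and join_segments
	function are hypothetical; complement() locations and segments in
	other entries are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* A 1-based inclusive range, as in an EMBL join(10..20,30..40) location. */
typedef struct {
    size_t start;
    size_t end;
} Range;

/* Return a newly allocated string holding the segments concatenated in
 * order, or NULL if any range falls outside seq. The caller frees it. */
static char *join_segments(const char *seq, const Range *r, size_t n)
{
    size_t len = strlen(seq), total = 0, i, pos = 0;
    char *out;

    for (i = 0; i < n; i++) {
        if (r[i].start < 1 || r[i].start > r[i].end || r[i].end > len)
            return NULL;
        total += r[i].end - r[i].start + 1;
    }
    out = malloc(total + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        size_t seglen = r[i].end - r[i].start + 1;

        memcpy(out + pos, seq + r[i].start - 1, seglen);
        pos += seglen;
    }
    out[pos] = '\0';
    return out;
}
```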

	Changed the default output filename from 'stdout' to a file for
	the following:
	    infoalign
	    megamerger
	    merger
	    showalign
	    showfeat
	    showseq
	    textsearch

	Lindna/cirdna can now draw filled boxes, and the user can change the
	text size on the command line. They can also read and display
	complete genomic sequences.

	Major new revision of the protein structure applications, not yet
	fully documented.

	New applications have been added:
	     pdbparse.c / acd
	     scopseqs.c / acd
	     scopnr.c / acd
	     seqsearch.c / acd
	     seqwords.c / acd
	     seqalign.c / acd
	     hetparse.c / acd
	     scopreso.c / acd
	     scoprep.c / acd
	     profgen.c / acd
	     funky.c / acd
	     hmmgen.c / acd
	     fraggle.c / acd

	Some applications have been deleted:
	     scope.c / acd
	     nrscope.c / acd
	     psiblasts.c / acd
	     swissparse.c / acd
	     alignwrap.c / acd
	     dichet.c / acd

	The deleted applications have been replaced as follows:
	     coordnew   --> pdbparse (coordnew was deleted a while back)
	     scope --> scopparse
	     nrscope --> scopnr
	     psiblasts --> seqsearch
	     swissparse --> seqwords
	     alignwrap  --> seqalign

	New versions of code have been committed:
	     pdbparse.c / acd
	     domainer.c / acd
	     contacts.c / acd
	     interface.c / acd
	     pdbtosp.c / acd
	     scopparse.c / acd
	     scopreso.c / acd
	     scopseqs.c / acd
	     scopnr.c / acd
	     scoprep.c / acd
	     scopalign.c / acd
	     seqsearch.c / acd
	     seqwords.c / acd
	     seqsort.c / acd
	     seqnr.c / acd
	     seqalign.c / acd
	     siggen.c / acd
	     sigscan.c / acd
	     sigplot.c / acd
	     hetparse.c / acd
	     profgen.c / acd
	     funky.c / acd
	     hmmgen.c / acd
	Plus
	     ajxyz.c / ajxyz.h

	Short summaries of the applications are as follows:
	     pdbparse - Parses pdb files and writes cleaned-up protein
			coordinate files.
	     domainer - Reads protein coordinate files and writes
			domains coordinate files.
	     contacts - Reads coordinate files and writes files of
			intra-chain residue-residue contact data.
	     interface- Reads coordinate files and writes files of
			inter-chain residue-residue contact data.
	     pdbtosp  - Converts raw swissprot:pdb equivalence file to
			embl-like format.
	     scopparse- Converts raw scop classification files to a
			file in embl-like format.
	     scopreso - Removes low resolution domains from a scop
			classification file.
	     scopseqs - Adds pdb and swissprot sequence records to a
			scop classification file.
	     scopnr   - Removes redundant domains from a scop
			classification file.
	     scoprep  - Reorders a scop classification file so that the
			representative structure of each family is
			given first.
	     scopalign- Generates alignments for families in a scop
			classification file by using STAMP.
	     seqsearch- Generates files of hits for families in a scop
			classification file by using PSI-BLAST with
			seed alignments.
	     seqwords - Generates files of hits for scop families by
			searching swissprot with keywords.
	     seqsort  - Reads multiple files of hits and writes a
			non-ambiguous file of hits (scop families file)
			plus a validation file.
	     seqnr    - Removes redundant hits from a scop families file.
	     seqalign - Generates extended alignments for families in
			a scop families file by using CLUSTALW with seed
			alignments.
	     siggen   - Generates a sparse protein signature from an
			alignment and residue contact data.
	     sigscan  - Scans a signature against swissprot and writes
			a signature hits file.
	     sigplot  - Reads a signature hits file and validation file
			and generates gnuplot data files of signature
			performance.
	     profgen  - Generates various profiles for each alignment
			in a directory.
	     hmmgen   - Generates a hidden Markov model for each alignment
			in a directory.
	     hetparse - Converts raw dictionary of heterogen groups to
			a file in embl-like format.
	     funky    - Reads clean coordinate files and writes a file
			of protein-heterogen contact data.

	Updated "make check" program entrails. Corrected sequence format
	reports, added report and alignment formats and database access
	methods.

	Added scripts/logreport1.pl to report EMBOSS usage from the
	logfile. Takes the logfile name on the command line. Reports
	total use, most active user, and total user count.

	Extractseq now only reads one sequence as input.

Version 2.4.1 14-may-2002

	Fixed error reading multiple databases

	Fixed MacOSX reading of incomplete sequence files

	Fixed indexing of REFSEQ

Version 2.4.0 11-Apr-2002

	New Jemboss authorizing server code. This uses a new set-uid
	program (jembossctl) to perform tasks as the user.

	New alignment output format "match" for wordmatch, reports the
	length, sequence names, and range in each sequence.

	emboss.default.template has been changed to include the new SRSWWW
	access method and the field definitions for the test databases.

	In dbiblast, renamed the -filename option -filenames to match the
	other dbi indexing programs, and because wildcard filenames are
	supported.

	Removed the -staden option for the dbi indexing programs. This had
	no effect (it was originally included to rename files as
	division.lookup for use by internal utilities at the Sanger
	Centre).

	In the qatest.pl test script, added a test for a missing expected
	file. This was only seen for obsolete secondary output files; no
	tests were passing that should have failed.

	Script (scripts/dbilist.pl) to report the contents of EMBLCD
	database indices created by dbiflat, dbigcg, dbifasta or dbiblast.

	Proxy HTTP access for remote servers. Define EMBOSS_PROXY as an
	environment variable, or in emboss.defaults. Can also be set for
	any database as proxy: "hostname:port" or overridden with
	proxy: ":" to use a local server for a database. This is used by
	both the URL and SRSWWW access methods.
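	A sketch of the proxy rules, assuming a database "proxy" value,
	when present, replaces EMBOSS_PROXY, and that ":" (or no value)
	means no proxy is used. The parse_proxy and proxy_for_db names are
	hypothetical, not EMBOSS functions.

```c
#include <stdlib.h>
#include <string.h>

/* Parse a proxy value such as "cache.example.org:8080". Returns 1 and fills
 * host (capacity hostlen) and port for a usable proxy, 0 when no proxy is to
 * be used (NULL, empty, or ":"), and -1 for a malformed value. */
static int parse_proxy(const char *spec, char *host, size_t hostlen, int *port)
{
    const char *colon;
    char *end;
    size_t n;
    long p;

    if (spec == NULL || spec[0] == '\0' || strcmp(spec, ":") == 0)
        return 0;
    colon = strrchr(spec, ':');
    if (colon == NULL || colon == spec)
        return -1;
    n = (size_t)(colon - spec);
    if (n >= hostlen)
        return -1;
    p = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || p < 1 || p > 65535)
        return -1;
    memcpy(host, spec, n);
    host[n] = '\0';
    *port = (int)p;
    return 1;
}

/* A database's own "proxy" value, when defined, overrides EMBOSS_PROXY. */
static const char *proxy_for_db(const char *db_proxy, const char *emboss_proxy)
{
    return db_proxy != NULL ? db_proxy : emboss_proxy;
}
```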

	New ajListUnique function to remove duplicate nodes in a list.
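	A sketch of duplicate removal on a plain singly linked list, keeping
	the first occurrence of each item and the original order. The Node
	type, list_unique and cmp_int are hypothetical; this is not the AJAX
	list code, and the real ajListUnique may work differently (for
	example, by sorting the list first).

```c
#include <stdlib.h>

typedef struct Node {
    void *item;
    struct Node *next;
} Node;

/* Remove each later node whose item compares equal (cmp returns 0) to an
 * earlier node's item. freeitem, if not NULL, releases dropped items.
 * Quadratic time, but needs no ordering of the items. */
static void list_unique(Node *head, int (*cmp)(const void *, const void *),
                        void (*freeitem)(void *))
{
    for (Node *a = head; a != NULL; a = a->next) {
        Node *prev = a;
        Node *b = a->next;

        while (b != NULL) {
            if (cmp(a->item, b->item) == 0) {
                prev->next = b->next;       /* unlink the duplicate */
                if (freeitem != NULL)
                    freeitem(b->item);
                free(b);
                b = prev->next;
            } else {
                prev = b;
                b = b->next;
            }
        }
    }
}

/* Example comparison for lists of int pointers. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;

    return (a > b) - (a < b);
}
```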

	New embxyz.c / .h embXyzSeqsetNRRange functions added

	Report format "table" is the default for several applications. In
	this format, the sequence USA has been removed because it already
	appears in the sequence header part of the report. A new format
	"-rformat nametable" will produce the previous report output for
	users who are relying on parsing it.

	Output files defined with the "nullok" attribute in ACD are not
	created unless requested. The file name and extension are ignored.
	It is possible to add a new associated qualifier to control this
	behaviour, but its use may be confusing with more than one output
	file.

	Precision attribute for report score (default is 3). Other
	floating point report values are written as strings by the
	original application so their precision is defined in the
	code. The score is a float, as part of the internal (GFF) feature
	structure.  A zero value produces an integer score (strictly, it
	uses %.0f as the format). Set precision for etandem, fuzznuc,
	fuzzpro, fuzztran, patmatdb, patmatmotifs (integer scores) and
	restrict (no score).
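	The precision can be passed to printf-style formatting through the
	"*" field width, which shows why a precision of 0 gives an integer
	score. The format_score helper is hypothetical.

```c
#include <stdio.h>

/* Format a report score with the requested number of decimal places.
 * A precision of 0 is equivalent to "%.0f", an integer-looking score. */
static int format_score(char *buf, size_t n, float score, int precision)
{
    return snprintf(buf, n, "%.*f", precision, (double)score);
}
```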

	Report output for equicktandem and etandem, with -origfile to
	write the original output format for sites (Sanger for example)
	that still require it. By default, the origfile output file is
	not created.

	Report output for patmatdb and patmatmotifs. For patmatmotifs the
	prosite documentation appears in the report footer, with the
	addition of the motif name and the number of matches in the
	sequence.

	Report headers and footers automatically trim last newline.

	Reports in -rformat SeqTable right-align numbers.

	Report output for marscan (-rformat GFF by default)

	Report output for fuzztran (-rformat table with the translation
	included as a report field). Using -rformat seqtable with fuzztran
	now also shows the original DNA sequence.

	Report output for fuzznuc and fuzzpro (-rformat SeqTable by default)

	New report qualifiers -raccshow to include accession in header
	and -rdesshow to include description in header

	Two access methods "file" and "offset" were defined as valid in
	database definitions, but are really reserved for simple file reading.
	They are removed from the database access methods list.

	Two access methods "cmd" and "nbrf" are obsolete (cmd was never
	implemented, nbrf is replaced by gcg which includes a query
	mechanism). Both are removed from the database access methods list,
	and the source code is commented out.

	SRS, SRSFASTA and SRSWWW database access can read all entries. This
	is not recommended for SRSWWW access because it will read
	everything into memory - all of EMBL for example - then strip out
	HTML tags before reading. For SRS it is not recommended because
	"methodall: direct" is faster. For SRSFASTA it is necessary
	because using SRSFASTA implies EMBOSS does not read the original
	data format. However, not implementing an "all" search left a gap
	in the SRS access methods which would generate a bad SRS command
	line or URL.

	NBRF sequence reading trims the last character only if it is '*',
	to catch cases where SRS reports the sequence as 'plain'.
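	A sketch of the trimming rule; the function name is hypothetical.

```c
#include <string.h>

/* Remove a trailing '*' terminator; any other final character is kept. */
static void trim_nbrf_terminator(char *seq)
{
    size_t len = strlen(seq);

    if (len > 0 && seq[len - 1] == '*')
        seq[len - 1] = '\0';
}
```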

	GCG database text has the spaces in ". ." strings removed.

	Database entry text and sequence saved for binary formats (GCG, BLAST)
	for use by entret and other applications

	Dbiblast indices with split databases (formatdb -v) fixed for reading
	all entries (was only reading the first file)

	Dbiblast and dbigcg indices support exclude and file definitions
	to create database subsets

	Database include and file definitions can use the simple filename.
	In some cases the full path was used. Database files are checked
	both with and without the directory path for back-compatibility.

	New srswww access method to query a remote SRS web server. This is
	preferred to URL access because it can build SRS queries.

	Sequence objects include the SeqVersion, Keyword list and Taxonomy
	list.

	The GI number is read as an alternative SeqVersion where it is
	available (GenBank and some NCBI formats). The GI number is
	reported in GenBank format if available, but the GenBank VERSION
	line may have only the SeqVersion if, for example, the sequence
	was read from an EMBL entry. "sv" queries check both the
	SeqVersion and GI number.

	Accession numbers have a strict definition, which covers the old
	and new EMBL/GenBank format, SwissProt, PIR, and REFSEQ
	(NM_nnnnnn). Earlier versions would accept any "accession number"
	in some sequence formats, especially NCBI format.

	SeqVersion (EMBL SV line, GenBank VERSION line) is used in preference
	to accession number where available. Can also be read in FASTA
	and NCBI formats. Where only the SeqVersion is available, the
	accession number is generated.
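	A sketch of generating the accession from a SeqVersion, assuming
	the usual ACCESSION.VERSION form (for example AB012345.2). The
	function name is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Derive an accession from a SeqVersion by removing the ".version" suffix.
 * Returns 1 on success, 0 if sv does not end in '.' followed by digits or
 * the result does not fit in acc (capacity acclen). */
static int accession_from_sv(const char *sv, char *acc, size_t acclen)
{
    const char *dot = strrchr(sv, '.');
    const char *d;
    size_t n;

    if (dot == NULL || dot == sv || dot[1] == '\0')
        return 0;
    for (d = dot + 1; *d != '\0'; d++)
        if (!isdigit((unsigned char)*d))
            return 0;
    n = (size_t)(dot - sv);
    if (n >= acclen)
        return 0;
    memcpy(acc, sv, n);
    acc[n] = '\0';
    return 1;
}
```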

	USA queries implement searches by SV, DES, ORG and KEY. These work
	with SRS access methods (SRS, SRSFASTA, SRSWWW) by building SRS
	queries, and with direct access (simple file reading) by
	testing the sequence object.

	Key and Org queries are for full keywords (including spaces) and
	for each level of the taxonomy.

	Des queries, where the access method has no mechanism (no index)
	of its own, are applied to words within the description. Words
	start with a letter or number and end with a letter or number. SRS
	typically does the same, but allows a single quote at the end.
	This catches words such as 3' and 5' but is a problem with some
	quoted text.
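	A sketch of the word rule under stated assumptions: words are
	whitespace-separated tokens trimmed of leading and trailing
	characters that are not letters or digits, and matching ignores
	case. Setting allow_quote keeps one trailing single quote, as SRS
	does. The function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return the start of the next word after *p and set *len, or NULL when no
 * words remain. *p is advanced past the token that was consumed. */
static const char *next_word(const char **p, size_t *len, int allow_quote)
{
    const char *s = *p;

    while (*s != '\0') {
        const char *start, *tok_end, *word_end;

        while (*s != '\0' && isspace((unsigned char)*s))
            s++;
        start = s;
        while (*s != '\0' && !isspace((unsigned char)*s))
            s++;
        tok_end = s;

        /* Trim characters that are not letters or digits from both ends. */
        while (start < tok_end && !isalnum((unsigned char)*start))
            start++;
        word_end = tok_end;
        while (word_end > start && !isalnum((unsigned char)word_end[-1]))
            word_end--;
        if (word_end == start)
            continue;                       /* no letters or digits */
        if (allow_quote && word_end < tok_end && *word_end == '\'')
            word_end++;                     /* SRS style: keep one quote */

        *p = s;
        *len = (size_t)(word_end - start);
        return start;
    }
    *p = s;
    return NULL;
}

/* Return 1 if query equals (ignoring case) some word of desc. */
static int des_has_word(const char *desc, const char *query, int allow_quote)
{
    size_t qlen = strlen(query), len, i;
    const char *w;

    while ((w = next_word(&desc, &len, allow_quote)) != NULL) {
        if (len != qlen)
            continue;
        for (i = 0; i < len; i++)
            if (tolower((unsigned char)w[i]) != tolower((unsigned char)query[i]))
                break;
        if (i == len)
            return 1;
    }
    return 0;
}
```

	With allow_quote set, 5' is kept whole, but 'quoted' becomes
	quoted' and no longer matches a search for quoted, which is the
	problem with quoted text noted above.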

	Queries for ID ACC SV DES ORG and KEY are valid for all file
	access methods, including URL, external, cmd, app, file and by
	default any new method added. If the internal query data is not
	flagged by the access method (to show the database has been
	queried) the sequence object is automatically tested.

	Missing description, keyword, organism, or seqversion fields cause
	queries to fail if they are used on inappropriate data.

	Dbiflat, dbigcg, dbifasta and dbiblast can index the new
	fields. All fields are available in dbiflat and dbigcg. The sv and
	des fields are available in dbifasta and dbiblast. If any specific
	formats make it possible to parse the org (or key) field they can
	be added as new formats.

	The new EMBLCD index files are named as follows: des for the
	descriptions (no obvious standard name), seqvn for the seqversion
	(no obvious standard name), keyword for keywords (EMBLCD
	distribution name) and taxon for organism (EMBLCD distribution
	name). The EMBLCD distribution also included a freetext index
	which is similar to the SRS alltext search so we did not use the
	name for the description index.

	We are working through the EMBLCD format documentation to make
	EMBOSS indices more compatible. For example, all tokens in the TRG
	index files should have trailing spaces. We use a NULL to mark the
	end of the string.

	EMBLCD index files now expand to fit the longest token, including
	the entryname index which was limited to 12 characters (only one
	site reported a problem with this in dbifasta with long ID names).

	A new qualifier -maxindex sets an upper limit (25 is recommended)
	to limit the size of all index files. Currently this applies to
	all indices. We can add separate maxima for each field if
	needed. We expect very few sites to use the extra index fields
	as SRS is a simpler alternative.

	New database definition token 'fields' with a list of indexed fields
	can be set to 'sv des org key' for SRS databases.

	USAs check the query field against the database 'fields'
	definition. ID and ACC are always allowed. dbname:name still
	searches ID and ACC (no change from previous version)

	USAs with a filename can include the new query fields. The syntax is
	filename:field:query for example empro.dat:id:eclaci (the extended
	syntax is because empro.dat-id:eclaci looks like a filename ending
	in -id)
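	A sketch of splitting the filename form of a USA, assuming that the
	two-part form filename:query simply leaves the field empty. The
	function names are hypothetical; format prefixes such as "embl::"
	and filenames that themselves contain ':' are not handled.

```c
#include <string.h>

/* Copy [begin, end) into dst (capacity n); fail on empty or oversized parts. */
static int copy_part(char *dst, size_t n, const char *begin, const char *end)
{
    size_t len = (size_t)(end - begin);

    if (len == 0 || len >= n)
        return 0;
    memcpy(dst, begin, len);
    dst[len] = '\0';
    return 1;
}

/* Split "filename:field:query", or "filename:query" with field left empty.
 * Each output buffer holds n bytes. Returns 1 on success, 0 otherwise. */
static int split_file_usa(const char *usa, char *file, char *field,
                          char *query, size_t n)
{
    const char *c1 = strchr(usa, ':');
    const char *c2;

    if (c1 == NULL || !copy_part(file, n, usa, c1))
        return 0;
    c2 = strchr(c1 + 1, ':');
    if (c2 == NULL) {
        field[0] = '\0';
        return copy_part(query, n, c1 + 1, c1 + 1 + strlen(c1 + 1));
    }
    return copy_part(field, n, c1 + 1, c2) &&
           copy_part(query, n, c2 + 1, c2 + 1 + strlen(c2 + 1));
}
```

	For example, empro.dat:id:eclaci splits into empro.dat, id and
	eclaci.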

	Application 'tranalign' added.
	This aligns nucleic coding regions based on a set of aligned proteins.

Version 2.3.1 07-mar-2002

	Est2genome fixed for large alignments (over 40Mbase for
	est * genomic sequence length).

	Sequence reading for ABI files fixed (and selex files tested).

	Genbank feature input working.

	Pepinfo PNG output larger to make the text readable (only affects
	PNG output).

	Empty sequence file input fails gracefully.

	Empty sequence input fails gracefully (and only needs one
	^D from stdin).

Version 2.3.0 03-mar-2002

	Seqretall, seqretallfeat and seqretset moved to 'make check'.
	Seqret has all the functionality of the above.

	Fix for NBRF accession number reading (ajseqread.c).

	Whichdb program added.

	Fix for dbifasta and wormpep.

	Fix for problem reading plain format sequences by primer3.

	Primer3 renamed eprimer3 to avoid conflicts with Whitehead's
	Primer3 version 3.0.6.

	Transeq's '-frame' can have a list of values, as: '-frame=1,2,3'.

	Non-existent files in lists are again ignored.

	Various wildcard database search fixes.

	ESIM4 added as an embassy package.

Version 2.2.0 12-Jan-2002

	New applications:
	Biosed, Contacts, Dichet, Psiblasts,
	Scopalign, Sigscan, Siggen.

	Configure tidy.

	Alignment report fixes.

Version 2.1.0  24-Dec-2001

	Jemboss.

	More formats for reports and alignments.

Version 2.0.1  29-jul-2001

	Release of HMMER as an embassy package.

	DBIGCG bugfix

Version 2.0.0  15-jul-2001

	New feature table handling etc.

Version 1.13.1 25-may-2001

	Fix emboss.default.template problem

Version 1.13.0 24-may-2001

	New applications showalign and embossversion.

	Prophet fixed.

Version 1.12.0 17-Apr-2001

	New applications distmat and cai.

Version 1.11.0 10-mar-2001

	New applications charge and degapseq.

Version 1.10.1 26-Feb-2001

	Bug fixes for marscan, getorf and garnier

Version 1.10.0 18-Feb-2001

	New applications scope, nrscope, domainer.

	Initial large file model support.

Version 1.9.0 22-Jan-2001

	New applications abiview and recode.

	Linked list and string iterator code rewritten.

Version 1.8.0 20-Nov-2000

	New application coderet.

	Corba test routines

Version 1.7.0 31-Oct-2000

	New application entret.

	GCG output style changed.

	Fixed -slower & -supper input options for multiple sequences

Version 1.6.3 25-Oct-2000

	Further mods for seqed files.

	Rewrite of profile core routines.

	Added %id, %sim and fasta output to needle and water.

Version 1.6.2 23-Oct-2000

	Now reads GCG seqed mangled files.

	Phylip output fixed.

	Numerous minor changes.

Version 1.6.1 11-Oct-2000

	RedHat Linux 7.0 fpos_t fix

Version 1.6.0 06-Oct-2000

	New application cons.

Version 1.5.6 3-Oct-2000

	URL access handles new SRS6.07* format.

	Library and applications leak-free.

	Error messages made less daunting.

Version 1.5.5 28-Sep-2000

	dbigcg changes for genbank.

	Memory leaks plugged.

Version 1.5.4 23-Sep-2000

	Added blast multi-volume support for database indexing.

	More gui hints in ACD files.

Version 1.5.3 18-Sep-2000

	LinuxPPC support added.

Version 1.5.2 5-Sep-2000

	dbigcg changes for embl database in GCG format.

Version 1.5.1 09-Sep-2000

	Changes to graphics data output for GUIs.

Version 1.5.0 07-Sep-2000

	New application emowse.

Version 1.4.3 03-Sep-2000

	tfm corrected.

	HTML documentation corrected.

	More GUI work.

Version 1.4.2 29-aug-2000

	Changes to graphics data output for GUIs.

Version 1.4.1 25-aug-2000

	Minor library changes.

Version 1.4.0 20-aug-2000

	New application silent

Version 1.3.1 18-aug-2000

	Indexing filenamelen fix.

	Modification to diffseq.

Version 1.3.0 17-aug-2000

	New applications vectorstrip and diffseq.

Version 1.2.0 15-aug-2000

Version 1.1.0 09-aug-2000

Version 1.0.2 08-aug-2000

Version 1.0.0 15-jul-2000

Version 0.0.4 Dec-1998