File: awt_csp.hlp

package info (click to toggle)

arb 6.0.2-1%2Bdeb8u1

links: PTS, VCS
area: non-free
in suites: jessie
size: 65,916 kB
ctags: 53,258
sloc: ansic: 394,903; cpp: 250,252; makefile: 19,620; sh: 15,878; perl: 10,461; fortran: 6,019; ruby: 683; xml: 503; python: 53; awk: 32

file content (77 lines) | stat: -rw-r--r-- 2,541 bytes

parent folder | download | duplicates (6)

#Please insert up references in the next lines (line starts with keyword UP)
UP	arb.hlp
UP	glossary.hlp

#Please insert subtopic references  (line starts with keyword SUB)
SUB	pos_var_pars.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile strunk********
TITLE		Estimate Parameters from Column Statistics

OCCURRENCE	ARB_DIST

DESCRIPTION	In a standard RNA, base frequencies are not equally
		distributed. Especially in the archea subclass we find
		extremely G+C rich sequences.
		This yielded in a couple of new rate corrections, algorithms
		and programs which:

			- calculate the average G+C content of all/two sequences
			- correct the distance.

		But further research showed us that the G+C frequencies are
		not equally distributed within a sequence. Especially helical
		parts have a significant higher G+C content than non
		helical parts.
		One strait forward algorithm would calculate each frequency
		independently for each column.
		Especially for small datasets the resulting frequencies would
		look like random data, as too few examples are analyzed.

		In ARB we implemented a combination of the 2 approaches.
			Lets say we want to estimate a Parameter 'P' with
			a maximum variance 'maxvar', so we need a minimum
			samples 'minsap'.

 			- All sequence positions are clustered according to

				- helical/non helical region
				- variability

			  The size of the cluster is choosen with respect
			  to the variability of the sequences to get a
			  minimum of independent events.

			- The final parameter estimate for a column is a
			  weighted sum between the estimate for the
			  cluster and the estimate for the single position.

		You can give your favorite method a higher weight by
		controlling the smoothing parameter:

			Less smoothing -> independent parameter estimates

			Much smoothing -> clustered parameter estimates

		To get a good tree we recommend you to try all selections.

NOTES		To get parameters from a column statistic you first have
		to create one.
		Do this with <ARB_NT/SAI/Positional Variability (Parsimony M.)>

WARNINGS	Problems may occur when

			 1. 'independent parameter estimates' is selected and
			 2. your dataset is quite small (<100 Sequences) and
			 3. one sequence is bad or badly aligned

		or

			 1. Much smoothing of parameters is selected and
			 2. you are analyzing ribosomal RNA and
			 3. 'Use Helix Information' is turned off


BUGS		No bugs known