
#=================================================================#
# #
# PAL2NAL: robust conversion of protein sequence alignments #
# into the corresponding codon-based DNA alignments #
# #
# version 1.0 June 15, 2005 #
# version 1.1 October 11, 2005 #
# version 2.1 March 28, 2006 #
# version 8.0 May 29, 2006 #
# version 10.0 August 1, 2006 #
# version 11.0 August 9, 2006 #
# version 12.0 February 2, 2007 #
# version 12.1 June 22, 2009 #
# version 12.2 September 9, 2009 #
# version 13.0 July 26, 2010 #
# version 14.0 December 2, 2011 #
# version 14.1 October 17, 2017 #
# #
# Mikita Suyama (mikita@bioreg.kyushu-u.ac.jp) #
# #
#=================================================================#
#------------------#
# What is pal2nal?
#------------------#
PAL2NAL is a program that converts a multiple sequence alignment
of proteins and the corresponding DNA (or mRNA) sequences into
a codon-based DNA alignment. The program automatically assigns
the corresponding codon sequence even if the input DNA sequence
has mismatches with the input protein sequence, or contains UTRs
or polyA tails. It can also deal with frame shifts in the input
alignment, which makes it suitable for the analysis of pseudogenes.
The resulting codon-based DNA alignment can then be used to
calculate synonymous (Ks) and non-synonymous (Ka)
substitution rates.
The script is licensed under GPL v2.
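The conversion described above can be illustrated with a minimal Python sketch. It is not PAL2NAL's actual algorithm: it assumes the DNA starts exactly at the first codon, has no UTRs, mismatches, or frame shifts, and the function name `thread_codons` and the toy sequences are made up for illustration.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an ungapped coding sequence onto one gapped protein alignment row."""
    codons = []
    pos = 0  # current offset into the coding sequence
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")            # a protein gap becomes a three-base gap
        else:
            codons.append(cds[pos:pos + 3])  # the next codon belongs to this residue
            pos += 3
    return "".join(codons)


# Toy input (hypothetical): two aligned protein rows and their coding sequences.
protein_rows = {"seq1": "MK-L", "seq2": "MKAL"}
cds = {"seq1": "ATGAAACTG", "seq2": "ATGAAGGCTCTG"}

for name, row in protein_rows.items():
    print(name, thread_codons(row, cds[name]))
```

Each output row is three times as long as its protein row, so the columns of the protein alignment are preserved codon by codon.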
#-----------#
# Reference
#-----------#
If you use PAL2NAL, please cite the following paper:
- Mikita Suyama, David Torrents, and Peer Bork (2006)
PAL2NAL: robust conversion of protein sequence alignments into
the corresponding codon alignments.
Nucleic Acids Res. 34:W609-W612.
#-------#
# Files
#-------#
The distribution version should contain the following files:
README - This document
pal2nal.pl - The script (version 14) (written in Perl)
test.aln - test data (protein alignment)
test.nuc - test data (DNA sequences)
for_paml (directory)
test.cnt - control file for codeml
test.tree - tree file used for codeml
test.codeml.ori - an example of codeml output
#-------#
# Usage
#-------#
Usage: pal2nal.pl pep.aln nuc.fasta [nuc.fasta...] [options]
pep.aln: protein alignment in either CLUSTAL or FASTA format
- works with pairwise alignments as well as
alignments of more than two sequences.
- if there are frame shifts in your alignment,
you have to mark those positions with numbers:
for example, '2' in the alignment means that
the codon has only two bases (i.e. a one-base deletion)
(see 'test.aln').
nuc.fasta: DNA sequences
(a single multi-FASTA file,
or several separate files)
Options:
-h Show help
-output (clustal|paml|fasta|codon)
Output format, default = clustal
-blockonly Show only user-specified blocks, marked with
'#' under the CLUSTAL alignment (see example)
-nogap Remove columns with gaps and in-frame stop codons
-nomismatch Remove mismatched codons (mismatch between
pep and cDNA) from the output
-codontable (1(default)|2|3|4|5|6|9|10|11|12|13|14|15|16|21|22|23)
NCBI GenBank codon table
1 Universal code
2 Vertebrate mitochondrial code
3 Yeast mitochondrial code
4 Mold, Protozoan, and Coelenterate Mitochondrial code
and Mycoplasma/Spiroplasma code
5 Invertebrate mitochondrial
6 Ciliate, Dasycladacean and Hexamita nuclear code
9 Echinoderm and Flatworm mitochondrial code
10 Euplotid nuclear code
11 Bacterial, archaeal and plant plastid code
12 Alternative yeast nuclear code
13 Ascidian mitochondrial code
14 Alternative flatworm mitochondrial code
15 Blepharisma nuclear code
16 Chlorophycean mitochondrial code
21 Trematode mitochondrial code
22 Scenedesmus obliquus mitochondrial code
23 Thraustochytrium mitochondrial code
-html HTML output (only for the web server)
-nostderr No STDERR messages (only for the web server)
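The choice of -codontable matters because the same codon can encode different residues under different genetic codes, which changes what counts as a mismatch or an in-frame stop. The Python sketch below compares NCBI tables 1 and 2 only. The helper names `codon_table` and `is_mismatch` are hypothetical, and how PAL2NAL itself detects mismatches is not described in this document.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in NCBI order: first base varies slowest, each base cycles T, C, A, G.
TABLE_AAS = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}


def codon_table(table_id):
    """Build a codon -> amino acid dictionary for one of the tables above."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, TABLE_AAS[table_id]))


def is_mismatch(codon, residue, table_id=1):
    """True if the codon does not translate to the given residue ('*' = stop)."""
    return codon_table(table_id)[codon.upper()] != residue.upper()


standard, vert_mito = codon_table(1), codon_table(2)
differing = sorted(c for c in standard if standard[c] != vert_mito[c])
for codon in differing:
    print(codon, standard[codon], vert_mito[codon])
```

For example, TGA paired with tryptophan (W) is a mismatch under table 1 but not under table 2, so vertebrate mitochondrial data would normally be run with -codontable 2.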
- The correspondence of IDs between pep.aln and nuc.fasta is
automatically checked:
- If you use the same IDs in both pep.aln and nuc.fasta,
the sequences don't have to be in the same order.
- If not, the order of the sequences in pep.aln and nuc.fasta
has to be the same.
- IDs in pep.aln are used in the output.
Example: pal2nal.pl test.aln test.nuc -output paml -nogap
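The ID correspondence rules above can be summarized in a short Python sketch. It assumes that "the same IDs" means the two ID sets are identical; the behaviour for partially overlapping IDs is not specified in this document, and the function name `pair_sequences` is hypothetical.

```python
def pair_sequences(pep_ids, nuc_ids):
    """Return (pep_id, nuc_id) pairs following the two rules above."""
    if len(pep_ids) != len(nuc_ids):
        raise ValueError("different number of protein and DNA sequences")
    if set(pep_ids) == set(nuc_ids):
        # Same IDs in both files: match by name, so the order is irrelevant.
        return [(pid, pid) for pid in pep_ids]
    # Otherwise fall back to position: the n-th protein goes with the n-th DNA.
    return list(zip(pep_ids, nuc_ids))


print(pair_sequences(["human", "mouse"], ["mouse", "human"]))
print(pair_sequences(["human", "mouse"], ["NM_0001", "NM_0002"]))
```

In both cases the first element of each pair, the protein ID, is the one that would appear in the output.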
#---------------------------------#
# How to calculate Ks, Ka values?
#---------------------------------#
To calculate Ks and Ka values, you need the codeml program, which
is included in the PAML package. You can download PAML from
http://abacus.gene.ucl.ac.uk/software/paml.html
As an example, a control file (test.cnt) and a tree file (test.tree)
are in the "for_paml" sub-directory.
The control file and tree file are designed for the 'test' data
used in PAL2NAL.
Example:
pal2nal.pl test.aln test.nuc -output paml -nogap > for_paml/test.codon
cd for_paml
codeml test.cnt
You can find the output of codeml in "test.codeml".
The Ks and Ka values are at the very end of the output file.
For comparison, a sample output file, "test.codeml.ori", is provided.
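If you prefer to drive the two steps above from a script, the following Python sketch wraps the same commands. It assumes that pal2nal.pl and codeml are both on your PATH and that test.cnt reads "test.codon" and writes "test.codeml"; the function names `pal2nal_argv` and `run_example` are hypothetical.

```python
import subprocess
from pathlib import Path


def pal2nal_argv(aln, nuc):
    """Argument list equivalent to: pal2nal.pl <aln> <nuc> -output paml -nogap"""
    return ["pal2nal.pl", aln, nuc, "-output", "paml", "-nogap"]


def run_example(workdir="."):
    """Run PAL2NAL on the test data, then codeml inside for_paml/."""
    workdir = Path(workdir)
    paml_dir = workdir / "for_paml"
    # Redirect the codon alignment to for_paml/test.codon, as in the example above.
    with open(paml_dir / "test.codon", "w") as out:
        subprocess.run(pal2nal_argv("test.aln", "test.nuc"),
                       cwd=workdir, stdout=out, check=True)
    # codeml is run from inside for_paml so that it finds test.cnt and test.tree.
    subprocess.run(["codeml", "test.cnt"], cwd=paml_dir, check=True)
    return paml_dir / "test.codeml"


# run_example()  # uncomment where both tools and the test files are available
```

With check=True, a failure in either program raises an exception instead of silently leaving an incomplete output file.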
#------------#
# WWW server
#------------#
http://www.bork.embl.de/pal2nal
or
http://www.genome.med.kyoto-u.ac.jp/cgi-bin/suyama/pal2nal/index.cgi (not working)
#---------#
# Contact
#---------#
If you have any questions or comments, please email me:
Mikita Suyama
Medical Institute of Bioregulation
Kyushu University,
812-8582 Fukuoka, JAPAN
tel: +81 92 642 6384
fax: +81 92 642 6562
email: mikita@bioreg.kyushu-u.ac.jp