1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777
|
/*
* puzzle.h
*
*
* Part of TREE-PUZZLE 5.2 (July 2004)
*
* (c) 2003-2004 by Heiko A. Schmidt, Korbinian Strimmer, and Arndt von Haeseler
* (c) 1999-2003 by Heiko A. Schmidt, Korbinian Strimmer,
* M. Vingron, and Arndt von Haeseler
* (c) 1995-1999 by Korbinian Strimmer and Arndt von Haeseler
*
* All parts of the source except where indicated are distributed under
* the GNU public licence. See http://www.opensource.org for details.
*
* ($Id$)
*
*/
#ifndef _PUZZLE_
#define _PUZZLE_
#ifndef PACKAGE
# define PACKAGE "tree-puzzle"
#endif
#ifndef VERSION
# define VERSION "5.2-generic"
#endif
#define DATE "July 2004"
/* prototypes */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <ctype.h>
#include <string.h>
#include <limits.h>
#include <float.h>
#include "pstep.h"
#include "util.h"
#include "ml.h"
#include "consensus.h"
#include "treesort.h"
#ifdef PARALLEL
# include "ppuzzle.h"
#endif
#define STDOUT stdout
/* time check interval in seconds for ML/puzzling step status (900 = 15min) */
# define TIMECHECK_INTERVAL 900
/* filenames */
# define FILENAMELENGTH 2048
/* maximum exp difference, such that 1.0+exp(-TP_MAX_EXP_DIFF) == 1.0 */
# define TP_MAX_EXP_DIFF 40
# define INFILEDEFAULT "infile"
# define OUTFILEDEFAULT "outfile"
# define TREEFILEDEFAULT "outtree"
# define INTREEDEFAULT "intree"
# define DISTANCESDEFAULT "outdist"
# define TRIANGLEDEFAULT "outlm.eps"
# define UNRESOLVEDDEFAULT "outqlist"
# define ALLQUARTDEFAULT "outallquart"
# define ALLQUARTLHDEFAULT "outallquartlh"
# define SITELHDEFAULT "outsitelh"
# define SITELHBDEFAULT "outsitelhb"
# define SITERATEDEFAULT "outsiterate"
# define SITERATEBDEFAULT "outsiterateb"
# define OUTPARAMDEFAULT "outparam"
# define SUBSETDEFAULT "insubsetmatr"
# define OUTPTLISTDEFAULT "outpstep"
# define OUTPTORDERDEFAULT "outptorder"
# define INFILE infilename
# define OUTFILE outfilename
# define TREEFILE outtreename
# define INTREE intreename
# define DISTANCES outdistname
# define TRIANGLE outlmname
# define UNRESOLVED outqlistname
# define ALLQUART outallquartname
# define ALLQUARTLH outallquartlhname
# define SITELH outsitelhname
# define SITELHB outsitelhbname
# define SITERATE outsiteratename
# define SITERATEB outsiteratebname
# define OUTPARAM outparamname
# define SUBSET insubsetmatrname
# define OUTPTLIST outpstepname
# define OUTPTORDER outptordername
# define FILEPREFIX fileprefixname
EXTERN char infilename [FILENAMELENGTH];
EXTERN char outfilename [FILENAMELENGTH];
EXTERN char outtreename [FILENAMELENGTH];
EXTERN char intreename [FILENAMELENGTH];
EXTERN char outdistname [FILENAMELENGTH];
EXTERN char outlmname [FILENAMELENGTH];
EXTERN char outqlistname [FILENAMELENGTH];
EXTERN char outallquartname [FILENAMELENGTH];
EXTERN char outallquartlhname [FILENAMELENGTH];
EXTERN char outsitelhname [FILENAMELENGTH];
EXTERN char outsitelhbname [FILENAMELENGTH];
EXTERN char outsiteratename [FILENAMELENGTH];
EXTERN char outsiteratebname [FILENAMELENGTH];
EXTERN char outparamname [FILENAMELENGTH];
EXTERN char insubsetmatrname [FILENAMELENGTH];
EXTERN char outpstepname [FILENAMELENGTH];
EXTERN char outptordername [FILENAMELENGTH];
EXTERN char fileprefixname [FILENAMELENGTH];
#define OUTFILEEXT "puzzle"
#define TREEFILEEXT "tree"
#define DISTANCESEXT "dist"
#define TRIANGLEEXT "eps"
#define UNRESOLVEDEXT "qlist"
#define ALLQUARTEXT "allquart"
#define ALLQUARTLHEXT "allquartlh"
#define SITELHEXT "sitelh"
#define SITELHBEXT "sitelhb"
#define SITERATEEXT "siterate"
#define SITERATEBEXT "siterateb"
#define OUTPARAMEXT "param"
#define SUBSETEXT "subsetmatr"
#define OUTPTLISTEXT "pstep"
#define OUTPTORDEREXT "ptorder"
/* resetprefix values (xxx) */
#define RESET_NONE 0
#define RESET_INFILE 1
#define RESET_INTREE 2
/* setprefix_optn values (xxx) */
#define SETPREFIX_NONE 0
#define SETPREFIX_INFILE 1
#define SETPREFIX_INTREE 2
#define SETPREFIX_RESET 3
#define SETPREFIX_PREFIX 4
/* auto_aamodel/auto_datatype values (xxx) */
#define AUTO_OFF 0
#define AUTO_GUESS 1
#define AUTO_DEFAULT 2
/* qptlist values (xxx) */
#define PSTOUT_NONE 0
#define PSTOUT_ORDER 1
#define PSTOUT_LISTORDER 2
#define PSTOUT_LIST 3
/* dtat_optn values (xxx) */
#define NUCLEOTIDE 0
#define AMINOACID 1
#define BINARY 2
/* typ_optn values (xxx) */
#define TREERECON_OPTN 0
#define LIKMAPING_OPTN 1
#define NUMTYPES 2
/* puzzlemodes (xxx) */
#define QUARTPUZ 0
#define USERTREE 1
#define CONSENSUS 2
#define PAIRDIST 3
#define NUMPUZZLEMODES 4
/* modes for rootsearch (xxx) */
#define ROOT_USER 0
#define ROOT_AUTO 1
#define ROOT_DISPLAYED 2
#define NUMROOTSEARCH 3
#if 0
/* rhetmodes (xxx) Modes of rate heterogeneity */
#define UNIFORMRATE 0
#define GAMMARATE 1
#define TWORATE 2
#define MIXEDRATE 3
#endif
/* defines for types of quartet likelihood computation (xxx) */
#define EXACT 0
#define APPROX 1
/* defines for results of sequence number check (seqnumcheck) */
#define SEQNUM_OK 0
#define SEQNUM_TOOFEW 1
#define SEQNUM_TOOMANY 2
#if 0
typedef struct {
uli fullres_pro;
uli fullres_con;
uli partres_pro;
uli partres_con;
uli unres;
uli missing;
uli qsum;
} qsupportarr_t;
EXTERN cmatrix biparts; /* bipartitions of tree of current puzzling step */
EXTERN cmatrix consbiparts; /* bipartitions of majority rule consensus tree */
EXTERN int xsize; /* depth of consensus tree picture */
EXTERN ivector consconfid; /* confidence values of majority rule consensus tree */
EXTERN ivector conssizes; /* partition sizes of majority rule consensus tree */
EXTERN ivector xcor; /* x-coordinates of consensus tree nodes */
EXTERN ivector ycor; /* y-coordinates of consensus tree nodes */
EXTERN ivector ycormax; /* maximal y-coordinates of consensus tree nodes */
EXTERN ivector ycormin; /* minimal y-coordinates of consensus tree nodes */
/* splits for consensus */
EXTERN int splitlength; /* length of one entry in splitpatterns */
EXTERN uli maxbiparts; /* memory reserved for maxbiparts bipartitions */
EXTERN uli numbiparts; /* number of different bipartitions */
EXTERN int *splitsizes; /* size of all different splits of all trees */
EXTERN uli *splitcomptemp; /* temp variable to compress bipartitions coding */
EXTERN uli *splitfreqs; /* frequencies of different splits of all trees */
EXTERN uli *splitpatterns; /* all different splits of all trees */
EXTERN qsupportarr_t *qsupportarr; /* quartet support values per split */
EXTERN uli consincluded; /* number of included biparts in the consensus tree */
EXTERN uli consfifty; /* number of biparts >= 50% */
EXTERN uli conscongruent; /* number of first incongruent bipart */
#endif
/* variables */
EXTERN cmatrix Seqchars; /* characters contained in data set */
EXTERN ivector Seqgapchar; /* counter for gaps contained in sequence */
EXTERN ivector Seqotherchar; /* counter for ambiguous contained in sequence */
EXTERN cmatrix treepict; /* picture of consensus tree */
EXTERN double minscore; /* value of edgescore on minedge */
EXTERN double tstvf84; /* F84 transition/transversion ratio */
EXTERN double tstvratio; /* expected transition/transversion ratio */
EXTERN double yrtsratio; /* expected pyrimidine/purine transition ratio */
EXTERN dvector ulkl; /* log L of user trees */
EXTERN dmatrix allsites; /* log L per sites of utrees (numutrees,Numptrn) */
EXTERN dvector ulklc; /* log L of utrees (clock) (numutrees,Numptrn) */
EXTERN dmatrix allsitesc; /* log L per sites of user trees (clock) */
EXTERN FILE *utfp; /* pointer to user tree file */
EXTERN FILE *ofp; /* pointer to output file */
EXTERN FILE *seqfp; /* pointer to sequence input file */
EXTERN FILE *tfp; /* pointer to tree file */
EXTERN FILE *dfp; /* pointer to distance file */
EXTERN FILE *trifp; /* pointer to triangle file */
EXTERN FILE *unresfp; /* pointer to file with unresolved quartets */
EXTERN FILE *tmpfp; /* pointer to temporary file */
EXTERN FILE *qptlist; /* pointer to file with puzzling step trees */
EXTERN FILE *qptorder; /* pointer to file with unique puzzling step trees */
EXTERN int SHcodon; /* whether SH should be applied to 1st, 2nd codon positions */
EXTERN int utree_optn; /* use first user tree for estimation */
EXTERN int listqptrees; /* list puzzling step trees */
EXTERN int approxqp; /* approximate QP quartets */
EXTERN int codon_optn; /* declares what positions in a codon should be used */
EXTERN int compclock; /* computation of clocklike branch lengths */
EXTERN int chooseA; /* leaf variable */
EXTERN int chooseB; /* leaf variable */
EXTERN int clustA, clustB, clustC, clustD; /* number of members of LM clusters */
EXTERN int Frequ_optn; /* use empirical base frequencies */
/* PSTEP_ORIG_H */
EXTERN int Maxbrnch; /* 2*Maxspc - 3 */
EXTERN int Maxseqc; /* number of sequence characters per taxum */
EXTERN int mflag; /* flag used for correct printing of runtime messages */
EXTERN int numclust; /* number of clusters in LM analysis */
EXTERN int outgroup; /* outgroup */
EXTERN int puzzlemode; /* computation of QP tree and/or ML distances */
EXTERN int rootsearch; /* how location of root is found */
EXTERN int rhetmode; /* model of rate heterogeneity */
EXTERN int seqnumcheck; /* result of sequence number checkheterogeneity */
EXTERN int usebestq_optn; /* use only best quartet topology, no bayesian weights */
EXTERN int printrmatr_optn; /* print rate matrix to screen */
EXTERN int usebranch_optn; /* use branch lengths given in usertree */
EXTERN int show_optn; /* show unresolved quartets */
EXTERN int savequart_optn; /* save memory block which quartets to file */
EXTERN int savequartlh_optn; /* save quartet likelihoods to file */
EXTERN int saveqlhbin_optn; /* save quartet likelihoods binary */
EXTERN int savesitelh_optn; /* save site likelihoods (PHILIP-like) */
EXTERN int savesitelhb_optn; /* save site likelihoods (binary) */
EXTERN int savesiterate_optn; /* save site rates (PHILIP-like) */
EXTERN int savesiterateb_optn; /* save site rates (binary) */
EXTERN int readquart_optn; /* read memory block which quartets from file */
EXTERN int fixedorder_optn; /* use fixed puzzling insert order 1,2,3,...n */
#if 0
EXTERN int skipmlbranch_optn;/* skip ml branches in final tree */
#endif
EXTERN int conssub50_optn; /* do consensus down to percentage of first incongruence/ambiguity */
EXTERN int qsupport_optn; /* compute quartet support for the splits */
EXTERN int dotreetest_optn; /* compute tree statistics (ELW, KH, SH) */
EXTERN int dotreelh_optn; /* compute lh/branch lengths (QP,consensus) */
EXTERN int setprefix_optn; /* use other FILEPREFIX that INFILENAME */
EXTERN int consensus_optn; /* do consensus tree instead of user trees */
EXTERN int sym_optn; /* symmetrize doublet frequencies */
EXTERN int ytaxcounter; /* counter for establishing y-coordinates of all taxa */
EXTERN int numutrees; /* number of users trees in input tree file */
EXTERN ivector clusterA, clusterB, clusterC, clusterD; /* clusters for LM analysis */
EXTERN ivector ycortax; /* y-coordinates of all taxa */
EXTERN uli badqs; /* number of bad quartets */
EXTERN uli unresqs; /* number of unresolved quartets, formerly badqs */
EXTERN uli partresqs; /* number of partly resolved quartets */
EXTERN uli fullresqs; /* number of fully resolved quartets */
EXTERN uli missingqs; /* number of fully resolved quartets */
EXTERN ulivector badtaxon; /* involment of each taxon in a bad quartet */
EXTERN ulimatrix qinfomatr; /* sums of quartet topologies per taxon [0]=sum */
EXTERN uli Currtrial; /* counter for puzzling steps */
EXTERN uli Numquartets; /* number of quartets */
EXTERN uli Numtrial; /* number of puzzling steps */
EXTERN uli lmqts; /* quartets investigated in LM analysis (0 = ALL) */
EXTERN int auto_datatype; /* guess datatype ? */
EXTERN int guessdata_optn; /* guessed datatype */
EXTERN int auto_aamodel; /* guess amino acid modell ? */
EXTERN int guessauto_aamodel; /* guessed amino acid modell ? */
EXTERN int guessDayhf_optn; /* guessed Dayhoff model option */
EXTERN int guessJtt_optn; /* guessed JTT model option */
EXTERN int guessblosum62_optn; /* guessed BLOSUM 62 model option */
EXTERN int guessmtrev_optn; /* guessed mtREV model option */
EXTERN int guesscprev_optn; /* guessed cpREV model option */
EXTERN int guessvtmv_optn; /* guessed VT model option */
EXTERN int guesswag_optn; /* guessed WAG model option */
/* missing data: for handling subsets */
EXTERN int readsubset_optn; /* guessed WAG model option */
EXTERN int Maxsubset; /* number of subsets */
EXTERN imatrix ss_setovlgraph; /* size of overlap >= 3 between 2 subsets */
EXTERN imatrix ss_setoverlaps; /* size of overlap between 2 subsets */
EXTERN imatrix ss_setovllist; /* list with ovlerlapping subsets */
EXTERN ivector ss_setovllistsize; /* size of list with ovlerlapping subsets */
EXTERN imatrix ss_matrix; /* boolean list: taxon in set? */
EXTERN imatrix ss_list; /* list of taxa in set */
EXTERN ivector ss_listsize; /* size of list with taxa */
/* counter variables needed in likelihood mapping analysis */
EXTERN uli ar1, ar2, ar3;
EXTERN uli reg1, reg2, reg3, reg4, reg5, reg6, reg7;
EXTERN uli reg1l, reg1r, reg2u, reg2d, reg3u, reg3d,
reg4u, reg4d, reg5l, reg5r, reg6u, reg6d;
EXTERN unsigned char *quartetinfo; /* place where quartets are stored */
EXTERN dvector qweight; /* for use in QP and LM analysis */
EXTERN dvector sqdiff;
EXTERN ivector qworder;
EXTERN ivector sqorder;
EXTERN int randseed;
EXTERN int psteptreestrlen;
#if 0
typedef struct treelistitemtypedummy {
struct treelistitemtypedummy *pred;
struct treelistitemtypedummy *succ;
struct treelistitemtypedummy *sortnext;
struct treelistitemtypedummy *sortlast;
char *tree;
int count;
int id;
int idx;
} treelistitemtype;
EXTERN treelistitemtype *psteptreelist;
EXTERN treelistitemtype *psteptreesortlist;
EXTERN int psteptreenum;
EXTERN int psteptreesum;
#endif
/* prototypes */
void makeF84model(void);
void compnumqts(void);
void setoptions(void);
int openfiletoread(FILE **, char[], char[]);
int openfiletowrite(FILE **, char[], char[]);
int openfiletoappend(FILE **, char[], char[]);
void closefile(FILE *);
void symdoublets(void);
void computeexpectations(void);
void putdistance(FILE *);
void findidenticals(FILE *);
double averagedist(int maxspc, dmatrix distanmat, double *meandist, double *mindist, double *maxdist, double *stddevdist, double *vardist);
void initps(FILE *);
void plotlmpoint(FILE *, double, double);
void finishps(FILE *);
void makelmpoint(FILE *, double, double, double);
void timestamp(FILE *);
void writeoutputfile(FILE *, int);
/* definitions for writing output */
#define WRITEALL 0
#define WRITEPARAMS 1
#define WRITEREST 2
void writetimesstat(FILE *ofp);
void writecutree(FILE *, int);
void starttimer(void);
void checktimer(uli);
/* void estimateparametersnotree(void); * moved to mlparam.c */
/* void estimateparameterstree(void); * moved to mlparam.c */
int main(int, char *[]);
int ulicmp(const void *, const void *);
int intcmp(const void *, const void *);
void readid(FILE *, int);
char readnextcharacter(FILE *, int, int);
/* skip rest of the line */
void skiprestofline(FILE *ifp, /* input file stream */
int notu, /* taxon number - for error msg. */
int nsite); /* sequence position - for error msg. */
/* skip control characters and blanks */
void skipcntrl(FILE *ifp, /* input file stream */
int notu, /* taxon number - for error msg. */
int nsite); /* sequence position - for error msg. */
/* read sequences of one data set */
void getseqs(FILE *ifp, /* in: input file stream */
int Maxspc, /* in: number of taxa */
int Maxseqc, /* in: number of sites */
cmatrix *seqch, /* out: alignment matrix */
cmatrix identif); /* io: taxon names (10 char w/o stop) */
/* initialize identifer arrays */
void initid(int t, /* in: number of taxa */
cmatrix *identif, /* out: name array w/o end of string */
cmatrix *namestr); /* out: name array with end of string */
/* copy undelimited identifer array to '\0' delimited identifer array */
void identif2namestr(int num, /* number of taxa */
cmatrix Identif, /* non-delimited names */
cmatrix Namestr); /* proper delimited names */
void fputid10(FILE *, int);
int fputid(FILE *, int);
/* read first line of sequence data set */
void getsizesites(FILE *ifp, /* in: input file stream */
int *Maxspc, /* out: number of taxa */
int *Maxseqc); /* out: number of sites */
/* read alignment from file */
void readsequencefile(FILE *seqfp, /* in: sequence input file stream */
int *Maxspc, /* out: number of taxa */
int *Maxseqc, /* out: number of sites */
cmatrix *identif, /* out: name array w/o end of str */
cmatrix *namestr, /* out: name array with end of str */
cmatrix *Seqchars); /* out: alignment matrix */
/* read subsets from file */
void readsubsetfile(FILE *seqfp, /* in: sequence input file stream */
int Maxspc, /* in: number of taxa */
cmatrix namestr, /* in: names taxa (seq file) */
int *Maxsubset, /* out: number of subsets */
imatrix *ss_setovlgraph, /* out: size of overlap >= 3 between 2 subsets */
imatrix *ss_setoverlaps, /* out: size of overlap between 2 subsets */
imatrix *ss_setovllist, /* out: list with ovlerlapping subsets */
ivector *ss_setovllistsize, /* out: size of list with ovlerlapping subsets */
imatrix *ss_matrix, /* out: boolean list: taxon in set? */
imatrix *ss_list, /* out: list of taxa in set */
ivector *ss_listsize); /* out: size of list with taxa */
/* permute taxon order */
void permutetaxa_ss(int Maxspc, /* in: number of taxa */
ivector permutation, /* permuted taxon order */
cmatrix namestr, /* in: names taxa (seq file) */
int Maxsubset, /* out: number of subsets */
imatrix ss_setovlgraph,/* out: size of overlap >= 3 between 2 subsets */
imatrix ss_setoverlaps,/* out: size of overlap between 2 subsets */
imatrix ss_setovllist, /* out: list with overlapping subsets */
ivector ss_setovllistsize,/* out: size of list with ovlerlapping subsets */
imatrix ss_matrix, /* out: bool list: taxon in set? */
imatrix ss_list, /* out: list of taxa in set */
ivector ss_listsize); /* out: size of list with taxa */
/* print subsets */
void fprintfss(FILE *ofp, /* in: output file stream */
int Maxspc, /* in: number of taxa */
int Maxsubset, /* out: number of subsets */
imatrix ss_setovlgraph, /* out: size of overlap >= 3 between 2 subsets */
imatrix ss_setoverlaps, /* out: size of overlap between 2 subsets */
imatrix ss_setovllist, /* out: list with ovlerlapping subsets */
ivector ss_setovllistsize, /* out: size of list with ovlerlapping subsets */
imatrix ss_matrix, /* out: boolean list: taxon in set? */
imatrix ss_list, /* out: list of taxa in set */
ivector ss_listsize); /* out: size of list with taxa */
/* check connectedness of subsets */
void checkss(FILE *ofp, /* in: output file stream */
int Maxspc, /* in: number of taxa */
int Maxsubset, /* out: number of subsets */
imatrix ss_setovlgraph, /* out: size of overlap >= 3 between 2 subsets */
imatrix ss_setoverlaps, /* out: size of overlap between 2 subsets */
imatrix ss_setovllist, /* out: list with ovlerlapping subsets */
ivector ss_setovllistsize, /* out: size of list with ovlerlapping subsets */
imatrix ss_matrix, /* out: boolean list: taxon in set? */
imatrix ss_list, /* out: list of taxa in set */
ivector ss_listsize); /* out: size of list with taxa */
/* guess data type: NUCLEOTIDE:0, AMINOACID:1, BINARY:2 */
int guessdatatype(cmatrix Seqchars, /* alignment matrix (Maxspc x Maxseqc) */
int Maxspc, /* number of taxa */
int Maxseqc); /* number of sites */
void translatedataset(int maxspc, int maxseqc, int *maxsite, cmatrix seqchars, cmatrix *seqchar, ivector *seqgapchar, ivector *seqotherchar);
void estimatebasefreqs(void);
void guessmodel(void);
#if 0
int *initctree();
void copytree_trueID(int *ctree, /* out: copy for effective sorting */
int *trueID, /* in: permutation vector */
ONEEDGE *edgeset, /* in: intermediate tree topology */
int *edgeofleaf, /* dito. */
int nextleaf); /* in: next free leaf (bound) */
void freectree(int **snodes);
void printctree(int *ctree);
char *sprintfctree(int *ctree, int strlen);
void fprintffullpstree(FILE *outf, char *treestr);
int printfsortctree(int *ctree);
int sortctree(int *ctree);
int ct_1stedge(int node);
int ct_2ndedge(int node);
int ct_3rdedge(int node);
void printfpstrees(treelistitemtype *list);
void printfsortedpstrees(treelistitemtype *list);
void fprintfsortedpstrees(FILE *output, treelistitemtype *list, int itemnum, int itemsum, int comment, float cutoff);
void sortbynum(treelistitemtype *list, treelistitemtype **sortlist);
treelistitemtype *addtree2list(char **tree,
int numtrees,
treelistitemtype **list,
int *numitems,
int *numsum);
void freetreelist(treelistitemtype **list,
int *numitems,
int *numsum);
#endif
#if 0
void resetedgeinfo(void);
void incrementedgeinfo(int, int);
void minimumedgeinfo(void);
#endif
#if 0
void initconsensus(void);
/* recursive function to get bipartitions */
void makepart_trueID(int i, /* */
int curribrnch, /* in: current branch in traversal */
ONEEDGE *edge, /* in: tree topology */
int *edgeofleaf, /* in: ext. edge list */
int *trueID, /* in: permutation list */
cmatrix biparts, /* out: split strings, edge by edge */
int Maxspc); /* in: number of species in tree */
/* compute bipartitions of current puzzling step tree */
void computebiparts_trueID(ONEEDGE *edge, /* in: tree topology */
int *edgeofleaf, /* in: ext. edge list */
int *trueID, /* in: permutation list */
cmatrix biparts, /* out: splits */
int outgroup, /* in: outgroup of tree */
int Maxspc); /* in: number of taxa in tree */
void printsplit(FILE *, uli);
void makenewsplitentries(cmatrix bip, /* in: split string vector */
int numspl, /* in: no. of new splits */
uli **in_splitpatterns,/* io: known compr splits */
int **in_splitsizes, /* io: kn. split sizes: '.'*/
uli **in_splitfreqs, /* io: kn. split frequences*/
uli *in_numbiparts, /* io: no. of splits so far*/
uli *in_maxbiparts, /* io: reserved memory */
int Maxspc); /* in: no. of species */
/* copy bipartition n of all different splitpatterns to consbiparts[k] */
void copysplit(uli n, uli *splitpatterns, int k, cmatrix consbipartsvec);
void makeconsensus(uli, int);
/* write node (writeconsensustree) */
void writenode(FILE *treefile, /* in: output stream */
int node, /* current node */
int qsupp_optn, /* 'print quartet support' flag */
qsupportarr_t *qsupparr, /* quartet support values */
int *column); /* current line position */
/* write consensus tree */
void writeconsensustree(FILE *treefile, /* in: output stream */
int qsupp_optn, /* 'print quartsupp' flag */
qsupportarr_t *qsupparr); /* quartet support values */
void writeconsensustree(FILE *, int, qsupportarr_t *);
void nodecoordinates(int);
void drawnode(int, int);
void plotconsensustree(FILE *);
#endif
unsigned char *callocquartets(int);
void freequartets(void);
unsigned char readquartet(int, int, int, int);
void writequartet(int, int, int, int, unsigned char);
void sort3doubles(dvector, ivector);
void computeallquartets(void);
/* check the branching structure between the leaves (not the taxa!)
A, B, C, and I (A, B, C, I don't need to be ordered). As a result,
the two leaves that are closer related to each other than to leaf I
are found in chooseX and chooseY. If the branching structure is
not uniquely defined, chooseX and chooseY are chosen randomly
from the possible taxa */
void checkquartet(int A, /* quartet taxon ID */
int B, /* dito. */
int C, /* dito. */
int I, /* dito., to be inserted */
int *chooseX, /* (chooseX,chooseY | x,I) */
int *chooseY); /* chooseX+Y are non-neighbors of I */
void checkquartet_trueID(int A, /* quartet leaf idx in permutation array */
int B, /* dito. */
int C, /* dito. */
int I, /* dito., to be inserted */
int *trueID, /* permutation array */
int *chooseX, /* (chooseX,chooseY | x,I) */
int *chooseY); /* chooseX+Y are non-neighbors of I */
void num2quart(uli qnum, int *a, int *b, int *c, int *d);
uli numquarts(int maxspc);
uli quart2num (int a, int b, int c, int d);
void writetpqfheader(int nspec, FILE *ofp, int flag);
/* extracted from main (xxx) */
void compute_quartlklhds(int a, int b, int c, int d, double *d1, double *d2, double *d3, int approx);
/* definitions for timing */
#define OVERALL 0
#define GENERAL 1
#define OPTIONS 2
#define PARAMEST 3
#define QUARTETS 4
#define PUZZLING 5
#define TREEEVAL 6
typedef struct {
int currentjob;
clock_t tempcpu;
clock_t tempfullcpu;
clock_t tempcpustart;
time_t temptime;
time_t tempfulltime;
time_t temptimestart;
clock_t maxcpu;
clock_t mincpu;
time_t maxtime;
time_t mintime;
double maxcpublock;
double mincpublock;
double mincputick;
double mincputicktime;
double maxtimeblock;
double mintimeblock;
double generalcpu;
double optionscpu;
double paramestcpu;
double quartcpu;
double quartblockcpu;
double quartmaxcpu;
double quartmincpu;
double puzzcpu;
double puzzblockcpu;
double puzzmaxcpu;
double puzzmincpu;
double treecpu;
double treeblockcpu;
double treemaxcpu;
double treemincpu;
double cpu;
double fullcpu;
double generaltime;
double optionstime;
double paramesttime;
double quarttime;
double quartblocktime;
double quartmaxtime;
double quartmintime;
double puzztime;
double puzzblocktime;
double puzzmaxtime;
double puzzmintime;
double treetime;
double treeblocktime;
double treemaxtime;
double treemintime;
double time;
double fulltime;
} timearray_t;
EXTERN double cputime, walltime;
EXTERN double fullcpu, fulltime;
EXTERN double fullcputime, fullwalltime;
EXTERN double altcputime, altwalltime;
EXTERN clock_t cputimestart, cputimestop, cputimedummy;
EXTERN time_t walltimestart, walltimestop, walltimedummy;
EXTERN clock_t Startcpu; /* start cpu time */
EXTERN clock_t Stopcpu; /* stop cpu time */
EXTERN time_t MLstepStarttime; /* start time */
EXTERN time_t MLstepStoptime; /* stop time */
EXTERN time_t PStepStarttime; /* start time */
EXTERN time_t PStepStoptime; /* stop time */
EXTERN time_t PEstStarttime; /* start time */
EXTERN time_t PEstStoptime; /* stop time */
EXTERN time_t Starttime; /* start time */
EXTERN time_t Stoptime; /* stop time */
EXTERN time_t time0; /* timer variable */
EXTERN time_t time1; /* yet another timer */
EXTERN time_t time2; /* yet another timer */
EXTERN timearray_t tarr;
void resetqblocktime(timearray_t *ta);
void resetpblocktime(timearray_t *ta);
void inittimearr(timearray_t *ta);
void addtimes(int jobtype, timearray_t *ta);
#ifdef TIMEDEBUG
void printtimearr(timearray_t *ta);
#endif /* TIMEDEBUG */
/* compute Bayesian weights from log-lkls d1, d2, d3 */
unsigned char loglkl2weight(int a,
int b,
int c,
int i,
double d1,
double d2,
double d3,
int usebestq);
#if 0
#ifdef PARALLEL
# include "ppuzzle.h"
#endif
#endif
#endif /* _PUZZLE_ */
|