Methods

PDM: Probabilistic Divergence Measures method

The PDM method uses a Markov chain Monte Carlo (MCMC) approach to phylogenetic analysis and attempts to detect recombinants by analysing the marginal posterior distribution of tree topologies. A fixed-size window (e.g., 500 bp wide) is moved along a sequence alignment. At every position, the posterior probability of the tree topologies, conditional on the subsequence alignment selected by the moving window, is determined by an MCMC simulation.

On moving into a recombinant region, this marginal posterior distribution of topologies can be expected to change. The change can be quantified by probabilistic divergence measures, for example a local measure (AS) that compares the distributions on two adjacent windows. The divergence measure is then plotted along the alignment. The MCMC approach is currently limited to analyses of about 10 sequences. The method attempts to focus only on topology changes and to exclude all rate heterogeneity effects.
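The comparison of adjacent windows can be illustrated with a minimal sketch. It assumes that the posterior over topologies in each window is estimated from the relative frequencies of topologies sampled by the MCMC run, and it uses a symmetrised Kullback-Leibler divergence as the local measure. Both choices, along with the function names, the pseudocount and the sample data, are assumptions made for illustration; the text above does not specify the exact form of the measure.

```python
import math

def topology_posterior(sampled_topologies, all_topologies, pseudocount=1e-3):
    """Estimate a posterior over topologies from MCMC samples by relative frequency.

    The pseudocount is an assumption of this sketch; it keeps every
    probability strictly positive so the divergence below stays finite.
    """
    counts = {t: pseudocount for t in all_topologies}
    for t in sampled_topologies:
        counts[t] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for two dicts over the same keys."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

def local_divergence(p, q):
    """Symmetrised KL divergence between the posteriors of two adjacent windows."""
    return 0.5 * (kl(p, q) + kl(q, p))

# Hypothetical topology samples for three consecutive window positions.
topologies = ["AB|CD", "AC|BD", "AD|BC"]
window_samples = [
    ["AB|CD"] * 95 + ["AC|BD"] * 5,
    ["AB|CD"] * 93 + ["AC|BD"] * 7,
    ["AC|BD"] * 90 + ["AB|CD"] * 10,  # topology support shifts here
]

posteriors = [topology_posterior(s, topologies) for s in window_samples]
profile = [
    local_divergence(posteriors[i], posteriors[i + 1])
    for i in range(len(posteriors) - 1)
]
```

Plotting `profile` against the window position would give a curve that stays near zero while adjacent windows support the same topology and rises where the support shifts.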

More information on the PDM method is available in:

More information on the Pruned PDM method is available in:


HMM: Hidden Markov Model method

A hidden Markov model (HMM) approach can be applied to the problem of detecting recombination in small alignments (number of sequences = 4). The mean distance between recombination breakpoints is modelled by the probability of a recombination event as we move along the sequence alignment. For four sequences, there are three possible unrooted tree topologies.

The transitions between the three topologies are assigned probabilities. The hidden states of the HMM represent the different phylogenetic tree topologies: we observe the sequences but cannot directly see the "hidden" tree topologies. The parameters of the model, namely the branch lengths associated with each topology and the recombination probability, are optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The HMM method focuses only on topology changes in the alignment, and attempts to reduce rate heterogeneity effects. Statistical significance is assessed by (posterior) probabilities assigned to each topology for each position in the alignment.
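As a rough illustration of how per-position topology probabilities can be obtained, the sketch below runs the standard forward-backward algorithm on a three-state HMM. It makes several assumptions that are not taken from the text above: the recombination probability `nu` is fixed rather than optimized by EM, a switch is equally likely to lead to either of the other two topologies, the initial distribution is uniform, and the per-site likelihoods under each topology are supplied as ready-made numbers (in practice they would come from a phylogenetic likelihood calculation, which is not shown).

```python
def transition_matrix(nu, n_states=3):
    """Stay in the same topology with probability 1 - nu, otherwise switch
    to one of the other topologies with equal probability (an assumption)."""
    switch = nu / (n_states - 1)
    return [
        [1.0 - nu if i == j else switch for j in range(n_states)]
        for i in range(n_states)
    ]

def topology_posteriors(site_likelihoods, nu):
    """Posterior probability of each topology at each site.

    site_likelihoods[t][k] is P(alignment column t | topology k).
    """
    n = len(site_likelihoods)
    k = len(site_likelihoods[0])
    trans = transition_matrix(nu, k)

    # Forward pass, rescaled at every site to avoid numerical underflow.
    forward = []
    for t, lik in enumerate(site_likelihoods):
        if t == 0:
            alpha = [lik[j] / k for j in range(k)]  # uniform initial state
        else:
            prev = forward[-1]
            alpha = [
                lik[j] * sum(prev[i] * trans[i][j] for i in range(k))
                for j in range(k)
            ]
        scale = sum(alpha)
        forward.append([a / scale for a in alpha])

    # Backward pass, rescaled in the same way.
    backward = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt_lik, nxt_beta = site_likelihoods[t + 1], backward[t + 1]
        beta = [
            sum(trans[i][j] * nxt_lik[j] * nxt_beta[j] for j in range(k))
            for i in range(k)
        ]
        scale = sum(beta)
        backward[t] = [b / scale for b in beta]

    # Combine both passes and normalise per site.
    posteriors = []
    for f, b in zip(forward, backward):
        gamma = [x * y for x, y in zip(f, b)]
        total = sum(gamma)
        posteriors.append([g / total for g in gamma])
    return posteriors

# Made-up likelihoods: 20 sites favour topology 0, then 20 favour topology 1.
example_likelihoods = [[0.6, 0.2, 0.2]] * 20 + [[0.2, 0.6, 0.2]] * 20
example_posteriors = topology_posteriors(example_likelihoods, nu=0.05)
```

In this toy input, the posterior mass moves from the first topology to the second around the middle of the sequence, which is the kind of signal that would be read as a putative breakpoint.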

More information on the HMM method is available in:

 

DSS: Difference of Sums of Squares method

DSS uses fast, approximate distance-based phylogenetic methods and can be used with large alignments. The method slides a fixed-size window (e.g., 500 bp wide) along the alignment, comparing the left-hand window (WL) with the right-hand window (WR). In WL, the matrix of pairwise genetic distances among the sequences is calculated, and a phylogenetic tree is then estimated by minimizing the sum of squares (SSL) between the observed distances and the distances based on the tree. A distance matrix is then calculated for WR, and the WL topology is fitted to it, yielding a second sum-of-squares value (SSR). When the topology in WR has changed due to recombination, the WL topology will be a poor fit to the WR distance matrix.

Putative recombination breakpoints can then be observed by plotting the difference between SSL and SSR (the DSS statistic) against the window centre. The influence of mean rate heterogeneity is removed from the analysis, but the DSS statistic will still be inflated when branch lengths change non-uniformly among branches as we move along the alignment. Recent improvements allow the statistical significance of DSS peaks to be estimated with parametric bootstrapping.
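The idea behind the statistic can be sketched for the simplest case of four sequences. The example below is a simplified illustration rather than the actual implementation: it uses uncorrected p-distances, it fits branch lengths by ordinary least squares without a non-negativity constraint (for a quartet, the residual sum of squares then has the closed form used in `sum_of_squares`), it takes the statistic as SSR minus SSL, and it omits the correction for mean rate heterogeneity and the bootstrap test. All function names and the toy sequences are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Matrix of pairwise p-distances among the sequences of one window."""
    n = len(seqs)
    return [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]

# The three unrooted topologies for taxa 0..3, written as the two pairs
# separated by the internal branch.
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def sum_of_squares(d, topology):
    """Residual sum of squares of an unconstrained least-squares fit of a
    quartet topology to distance matrix d (closed form for four taxa)."""
    (a, b), (c, e) = topology
    return (d[a][c] + d[b][e] - d[a][e] - d[b][c]) ** 2 / 4.0

def dss(left_seqs, right_seqs):
    """Fit the best topology for the left window to both windows and
    return SSR - SSL (the sign convention is an assumption of this sketch)."""
    d_left = distance_matrix(left_seqs)
    d_right = distance_matrix(right_seqs)
    best = min(QUARTETS, key=lambda t: sum_of_squares(d_left, t))
    ss_left = sum_of_squares(d_left, best)
    ss_right = sum_of_squares(d_right, best)
    return ss_right - ss_left

# Toy windows: the left one supports 01|23, the right one supports 02|13.
left = ["AAAAGGGGGGGG", "AAAAGGGGGGGG", "CCCCGGGGGGGG", "CCCCGGGGGGGG"]
right = ["AAAAGGGGGGGG", "CCCCGGGGGGGG", "AAAAGGGGGGGG", "CCCCGGGGGGGG"]

no_change = dss(left, left)    # same topology on both sides
breakpoint = dss(left, right)  # topology differs between the windows
```

With these toy windows, `no_change` is zero because the left-hand topology fits both sides equally well, while `breakpoint` is positive because the left-hand topology fits the right-hand distances poorly.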

More information on the DSS method is available in: