TriFusion features

Overview

TriFusion is a modern GUI and command line application designed to make the life of anyone with proteome and/or alignment sequence data easier and more pleasurable. Regardless of your experience in bioinformatics, TriFusion is easy to use and offers a wide array of powerful features to help you deal with your data. At the same time, it was developed to handle the enormous amount of data that is generated nowadays.

Fast and efficient

User friendly

CLI version available

Open Source

Orthology

Search orthologs across proteomes

Searching for orthologs across multiple proteomes can be a daunting and complex task. Current pipelines are great, but they often require a long and tedious setup and/or execution process. TriFusion handles all of that for you, while providing a user-friendly interface that leverages the most popular framework for finding orthologs, OrthoMCL.

No fuss. No complications. Searching for orthologs in TriFusion is as simple as it gets.

Jump to orthology search tutorial

Explore and filter your orthologs

So, you have completed your search for orthologs. Now what?

The Explore section of TriFusion's orthology module was designed to let you easily explore your ortholog search results. You can filter ortholog groups according to gene copy number and/or taxon representation. You can also check the effect that these filters have on the final number of ortholog groups using a suite of interactive plot screens.

Jump to orthology exploration tutorial

Export ortholog groups into protein/nucleotide files

The final step of an ortholog search operation is usually the generation of sequence files for each potential ortholog group. TriFusion lets you apply multiple filters on your ortholog groups and export only the groups that you are interested in.

But that's not all. If you provide the CDS files corresponding to each proteome that you used in your search, you can also export those ortholog groups directly into nucleotide sequences. And don't worry if the headers of the sequences in the CDS and proteome files don't match. TriFusion searches for sequence equality, not header equality, so you can be sure that the sequences are always correctly converted.
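To illustrate what matching by sequence rather than by header can look like, here is a minimal sketch. It assumes that each CDS is in frame 1, uses the standard genetic code, and translates to exactly the protein sequence it corresponds to. The function names (`translate`, `match_cds_to_proteins`) are hypothetical and this is not TriFusion's actual implementation.

```python
from itertools import product

# Standard genetic code, written in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a CDS in frame 1; unknown codons become X, trailing stops are dropped."""
    cds = cds.upper()
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
    return protein.rstrip("*")


def match_cds_to_proteins(proteins, cds_records):
    """Pair each protein header with the CDS whose translation equals the protein.

    Both arguments are dicts of header -> sequence. Headers are never compared
    (hypothetical helper; assumes translations are unique).
    """
    by_translation = {translate(seq): seq for seq in cds_records.values()}
    matched = {}
    for header, protein in proteins.items():
        key = protein.upper().rstrip("*")
        if key in by_translation:
            matched[header] = by_translation[key]
    return matched
```

Because the lookup key is the translated sequence, a protein named `sp1|prot_001` can still be paired with a CDS named `gene_xyz`.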

Jump to ortholog export tutorial

Process

Blazing fast alignment conversion/concatenation

Designed from the ground up to handle the massive number of alignments that are currently being processed in phylogenomic analyses, TriFusion is able to convert and concatenate thousands of alignment files from several input formats into one or more popular output formats (check the supported formats).

Easily load your data and get the desired converted/concatenated output in seconds.
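For readers curious about what concatenation involves, the sketch below joins several alignments end to end and pads taxa that are absent from a given alignment with a missing data symbol. The `concatenate` function and the choice of `n` as the missing symbol are assumptions made for illustration, not TriFusion's own code.

```python
def concatenate(alignments, missing="n"):
    """Concatenate alignments (each a dict of taxon -> aligned sequence).

    Returns the concatenated alignment and the 1-based, inclusive range
    that each input alignment occupies in the result.
    """
    # Collect taxa in order of first appearance across all alignments.
    taxa = []
    for alignment in alignments:
        for taxon in alignment:
            if taxon not in taxa:
                taxa.append(taxon)

    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for alignment in alignments:
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            # Taxa absent from this alignment are filled with missing data.
            pieces[taxon].append(alignment.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length

    concatenated = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return concatenated, partitions
```

Keeping track of the partition ranges while concatenating is what later makes it possible to split the matrix back into its original alignments.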

Jump to basic conversion/concatenation tutorial

High performance and efficiency

So you have a 1Gb alignment file? Or perhaps a set of 7,000 individual alignment files. Don't panic, TriFusion can handle it! With the inclusion of high performance libraries and several efficient programming techniques, even the largest data sets can be processed in seconds or a few minutes. Here's a sample of the run time and RAM consumption benchmarks for the main operations of TriFusion for different types of data sets.

Data set	DS1	DS4	DS5	DS9
Files	614	3093	7378	1
Taxa	48	376	29	376
Total bases	~350k	1.25M	~601k	1.25M
Parsing (run time / RAM)
Alignment reading	0.8s / 13.9Mb	14.5s / 169.5Mb	13.6s / 117.6Mb	2.9s / 14.5Mb
Main operations (run time / RAM)
Conversion (nexus)	0.5s / 0.3Mb	17.0s / 0.6Mb	93.7s / 1.0Mb	2.4s / 22.8Mb
Conversion (interleave)	2.4s / 0.4Mb	49.9s / 0.6Mb	110.1s / 1.0Mb	74.9s / 17.0Mb
Concatenation	4.7s / 5.1Mb	554.4s / 21.6Mb	211.2s / 7.1Mb	NA

You can check the complete benchmarks table to see in detail how particular operations of TriFusion fare in terms of run time and memory usage across a wide range of datasets.

Collapse identical sequences

With the Collapse operation, you can collapse identical sequences into the same haplotype, creating an alignment that contains only unique sequences.

Here's a demonstration of how Collapse transforms an alignment:

Example:

# Four taxon alignment where there are only two unique sequences
>TaxonA
AAAAAA
>TaxonB
AAAAAA
>TaxonC
TTTTTT
>TaxonD
TTTTTT

# Collapsed alignment with only the two unique sequences
>Hap1
AAAAAA
>Hap2
TTTTTT

A .haplotypes file is generated alongside the collapsed alignment with the correspondence between haplotype names and the original taxon names.
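The idea behind Collapse can be sketched in a few lines. The `collapse` function below is a hypothetical helper, not TriFusion's API: it groups taxa by identical sequence, names each unique sequence `Hap1`, `Hap2`, and so on, and returns the haplotype-to-taxa correspondence that a .haplotypes file would record.

```python
def collapse(alignment, prefix="Hap"):
    """Collapse identical sequences in a dict of taxon -> sequence.

    Returns (collapsed alignment, haplotype name -> list of original taxa).
    """
    # Group taxon names by their sequence, preserving first-seen order.
    taxa_by_sequence = {}
    for taxon, sequence in alignment.items():
        taxa_by_sequence.setdefault(sequence, []).append(taxon)

    collapsed = {}
    correspondence = {}
    for index, (sequence, taxa) in enumerate(taxa_by_sequence.items(), start=1):
        name = f"{prefix}{index}"
        collapsed[name] = sequence
        correspondence[name] = taxa
    return collapsed, correspondence


alignment = {
    "TaxonA": "AAAAAA",
    "TaxonB": "AAAAAA",
    "TaxonC": "TTTTTT",
    "TaxonD": "TTTTTT",
}
collapsed, correspondence = collapse(alignment)
```

Applied to the four-taxon example above, this yields two haplotypes, with `Hap1` standing for TaxonA and TaxonB and `Hap2` for TaxonC and TaxonD.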

Jump to Collapse tutorial

Create consensus sequences from alignments

With the Consensus operation, you can create a consensus sequence for each alignment individually or for the concatenated alignment. Several methods are available for dealing with sequence variation.

Options for handling sequence variation during consensus:

IUPAC: (Nucleotide sequences only) Converts variable sites using the IUPAC nucleotide ambiguity code.
Soft mask: Replaces variable sites with the missing data symbol.
Remove: Removes variable sites from the consensus.
First sequence: An arbitrary method that simply uses the first sequence without modification.

Optionally, consensus sequences from multiple alignments can be merged into a single output file, ready for downstream analyses. Here's an example of how consensus sequences are created for three input alignments.

Example:

# Alignment1.fas
>TaxonA
ATAAAA
>TaxonB
AAAAAA
>TaxonC
AAAAAG
>TaxonD
AAAAAA

# Alignment2.fas
>TaxonA
TATTTT
>TaxonB
TTTTTC
>TaxonC
TTTTTT
>TaxonD
TTTTTT

# Alignment3.fas
>TaxonA
GAGGGG
>TaxonB
GGGGGG
>TaxonC
GGGGGG
>TaxonD
GGGGGG

Create individual consensus masking sequence variation

# Alignment1_consensus.fas
>consensus
ANAAAN

# Alignment2_consensus.fas
>consensus
TNTTTN

# Alignment3_consensus.fas
>consensus
GNGGGG

Optionally, merge all consensus into a single file

# consensus.fas
>Alignment1_consensus
ANAAAN
>Alignment2_consensus
TNTTTN
>Alignment3_consensus
GNGGGG
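The four ways of handling variable sites can be sketched as follows. This is an illustrative implementation under a few assumptions: sequences are uppercase nucleotides, the missing data symbol is `N`, and the method names passed to the hypothetical `consensus` function are our own labels rather than TriFusion's option names.

```python
# IUPAC ambiguity codes for every combination of two or more nucleotides.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

METHODS = ("iupac", "soft mask", "remove", "first sequence")


def consensus(sequences, method="soft mask", missing="N"):
    """Build a consensus from a list of aligned sequences."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if method == "first sequence":
        return sequences[0]

    result = []
    for column in zip(*sequences):
        states = frozenset(column)
        if len(states) == 1:
            result.append(column[0])          # invariable site, keep as is
        elif method == "soft mask":
            result.append(missing)            # mask the variable site
        elif method == "iupac":
            result.append(IUPAC.get(states, missing))
        # method == "remove": the variable column is simply skipped
    return "".join(result)


alignment1 = ["ATAAAA", "AAAAAA", "AAAAAG", "AAAAAA"]
masked = consensus(alignment1, "soft mask")
```

With the soft mask method, Alignment1 from the example above gives `ANAAAN`, matching the consensus shown.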

Jump to Consensus tutorial

Filter alignments by taxa

Alignments can be filtered depending on whether they contain or exclude user-defined taxa groups. These taxa groups can be easily defined in TriFusion or imported from text files and can represent any number of samples in the data set. For instance, in the example below, we filter out alignments that do not contain both TaxonA and TaxonD, resulting in only 2 out of 4 alignments being further processed.

Example:

# Alignment1.fas
>TaxonA
AAAAAA
>TaxonB
TTTTTT
>TaxonC
CCCCCC
>TaxonD
DDDDDD

# Alignment2.fas
>TaxonA
AAAAAA
>TaxonB
TTTTTT

# Alignment3.fas
>TaxonB
TTTTTT
>TaxonC
CCCCCC
>TaxonD
DDDDDD

# Alignment4.fas
>TaxonA
AAAAAA
>TaxonD
DDDDDD

Preserving only alignments that contain TaxonA and TaxonD

# Alignment1.fas
>TaxonA
AAAAAA
>TaxonB
TTTTTT
>TaxonC
CCCCCC
>TaxonD
DDDDDD

# Alignment4.fas
>TaxonA
AAAAAA
>TaxonD
DDDDDD
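The taxa filter boils down to a set comparison per alignment, as the sketch below shows. The `filter_by_taxa` function is hypothetical, and its reading of the "exclude" mode (keep alignments that contain none of the group's taxa) is an assumption about the intended behaviour.

```python
def filter_by_taxa(alignments, taxa_group, mode="contain"):
    """Filter a dict of alignment name -> {taxon: sequence}.

    mode="contain": keep alignments that contain every taxon in the group.
    mode="exclude": keep alignments that contain none of them (assumed meaning).
    """
    group = set(taxa_group)
    kept = {}
    for name, alignment in alignments.items():
        if mode == "contain":
            keep = group.issubset(alignment)
        elif mode == "exclude":
            keep = group.isdisjoint(alignment)
        else:
            raise ValueError(f"unknown mode: {mode}")
        if keep:
            kept[name] = alignment
    return kept


alignments = {
    "Alignment1.fas": {"TaxonA": "AAAAAA", "TaxonB": "TTTTTT",
                       "TaxonC": "CCCCCC", "TaxonD": "DDDDDD"},
    "Alignment2.fas": {"TaxonA": "AAAAAA", "TaxonB": "TTTTTT"},
    "Alignment3.fas": {"TaxonB": "TTTTTT", "TaxonC": "CCCCCC",
                       "TaxonD": "DDDDDD"},
    "Alignment4.fas": {"TaxonA": "AAAAAA", "TaxonD": "DDDDDD"},
}
kept = filter_by_taxa(alignments, ["TaxonA", "TaxonD"])
```

Here `kept` holds only Alignment1.fas and Alignment4.fas, as in the example above.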

Jump to taxa filter tutorial

Filter alignments by codons

Alignment columns can be filtered according to their codon positions in any combination. This is quite useful when you wish to remove, for example, saturated codon positions for downstream analyses. However, note that, for now, TriFusion will filter the same codon positions for each input alignment.
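As a rough sketch of what a codon position filter does, the hypothetical function below keeps only the columns that fall on the chosen codon positions. It assumes that the alignment is in frame, so that the first column is codon position 1.

```python
def filter_codon_positions(alignment, keep=(1, 2)):
    """Keep only the columns at the given codon positions (1, 2 and/or 3).

    alignment is a dict of taxon -> sequence; the first column is assumed
    to be codon position 1.
    """
    positions = set(keep)
    return {
        taxon: "".join(
            base for index, base in enumerate(sequence)
            if index % 3 + 1 in positions
        )
        for taxon, sequence in alignment.items()
    }


# Drop third codon positions, which tend to saturate first.
filtered = filter_codon_positions({"TaxonA": "ATGCCA", "TaxonB": "ATGCCG"})
```

Since the same `keep` tuple is applied to every sequence, this mirrors the current behaviour of filtering the same codon positions in each input alignment.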

Jump to codon filter tutorial

Filter alignments by missing data/genes

Filtering alignments and alignment columns according to missing taxa and missing data, respectively, is one of the most common processing steps when dealing with large amounts of data.

Filter missing data within alignments

1. Replace gaps at the extremities

The first step when filtering alignment columns for missing data is to replace the gap symbols at the extremities of the alignment by missing data symbols. Gaps at the extremities of the alignment are usually introduced by alignment programs but they actually represent missing data. The distinction may be important for several downstream analyses, for filtering according to gaps/missing data or for coding gaps at the end of the alignment matrix.

# Original alignment
>TaxonA
--AA-AA--A-
>TaxonB
-AAAA--AAAA
>TaxonC
AAA--AAAA--
>TaxonD
AAA--AA-AAA

# Gaps at the extremities replaced by missing data symbols
>TaxonA
NNAA-AA--AN
>TaxonB
NAAAA--AAAA
>TaxonC
AAA--AAAANN
>TaxonD
AAA--AA-AAA
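This first step is simple to express in code. The sketch below assumes `-` as the gap symbol and `N` as the missing data symbol, and `replace_terminal_gaps` is a hypothetical name.

```python
def replace_terminal_gaps(sequence, gap="-", missing="N"):
    """Replace leading and trailing gaps with the missing data symbol.

    Internal gaps are left untouched.
    """
    without_leading = sequence.lstrip(gap)
    leading = len(sequence) - len(without_leading)
    core = without_leading.rstrip(gap)
    trailing = len(without_leading) - len(core)
    return missing * leading + core + missing * trailing


alignment = {
    "TaxonA": "--AA-AA--A-",
    "TaxonB": "-AAAA--AAAA",
    "TaxonC": "AAA--AAAA--",
    "TaxonD": "AAA--AA-AAA",
}
replaced = {taxon: replace_terminal_gaps(seq) for taxon, seq in alignment.items()}
```

Running it on the alignment above reproduces the second block of the example, with internal gaps preserved.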

2. Filter columns with excessive gaps/missing data

You may want to filter your alignments in order to reduce the number of columns with excessive missing data, or simply remove all missing data from the alignments. TriFusion allows you to set two independent thresholds for gaps and missing data and then removes all alignment columns that exceed those thresholds.

# Filter columns with max gap 50%
# and max missing data 50%
>TaxonA
NNAA-AA--AN
>TaxonB
NAAAA--AAAA
>TaxonC
AAA--AAAANN
>TaxonD
AAA--AA-AAA

# Two columns filtered
>TaxonA
NAAAA--AN
>TaxonB
AAA--AAAA
>TaxonC
AA-AAAANN
>TaxonD
AA-AA-AAA
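A column filter with two independent thresholds can be sketched as below. Note the assumptions: gaps and missing data are counted separately, and a column is removed only when a proportion is strictly greater than its threshold. How TriFusion counts each symbol and treats values exactly at the threshold may differ, so treat `filter_columns` as an illustration rather than a reimplementation.

```python
def filter_columns(alignment, max_gap=0.5, max_missing=0.5,
                   gap="-", missing="N"):
    """Remove columns whose gap or missing data proportion exceeds a threshold.

    alignment is a dict of taxon -> sequence, all of the same length.
    """
    taxa = list(alignment)
    kept_columns = []
    for column in zip(*alignment.values()):
        gap_proportion = column.count(gap) / len(taxa)
        missing_proportion = column.count(missing) / len(taxa)
        if gap_proportion <= max_gap and missing_proportion <= max_missing:
            kept_columns.append(column)
    # Rebuild each taxon's sequence from the surviving columns.
    return {
        taxon: "".join(column[index] for column in kept_columns)
        for index, taxon in enumerate(taxa)
    }


alignment = {"T1": "A-NA", "T2": "A-NC", "T3": "A-AG", "T4": "AAAT"}
filtered = filter_columns(alignment, max_gap=0.5, max_missing=0.5)
```

In this small alignment the second column has 75% gaps and is removed, while the third column, with exactly 50% missing data, is kept under the strict comparison assumed here.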

Filter alignments with low taxa coverage

When you have multiple files, you may want to filter out alignments that have low taxa coverage in order to produce denser alignment matrices. This can be easily accomplished by setting a minimum taxa representation proportion that alignments must reach to be further processed. This can be used in combination with the taxa filter option for more fine-grained control over the resulting alignment matrices.

  # 6 taxa file
  >TaxonA
  AAAAAA
  >TaxonB
  AAAAAA
  >TaxonC
  AAAAAA
  >TaxonD
  AAAAAA
  >TaxonE
  AAAAAA
  >TaxonF
  AAAAAA

  # 3 taxa file
  >TaxonA
  AAAAAA
  >TaxonD
  AAAAAA
  >TaxonE
  AAAAAA

  # 2 taxa file
  >TaxonA
  AAAAAA
  >TaxonB
  AAAAAA

  # 5 taxa file
  >TaxonA
  AAAAAA
  >TaxonB
  AAAAAA
  >TaxonC
  AAAAAA
  >TaxonD
  AAAAAA
  >TaxonE
  AAAAAA

Minimum taxa representation of 50%

  # 6 taxa file
  >TaxonA
  AAAAAA
  >TaxonB
  AAAAAA
  >TaxonC
  AAAAAA
  >TaxonD
  AAAAAA
  >TaxonE
  AAAAAA
  >TaxonF
  AAAAAA

  # 5 taxa file
  >TaxonA
  AAAAAA
  >TaxonB
  AAAAAA
  >TaxonC
  AAAAAA
  >TaxonD
  AAAAAA
  >TaxonE
  AAAAAA
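The minimum taxa representation filter compares the number of taxa in each alignment with the total number of taxa in the data set. In the sketch below, `filter_min_taxa` is a hypothetical helper, and treating an alignment that sits exactly at the threshold as passing is an assumption; TriFusion may handle that boundary differently.

```python
def filter_min_taxa(alignments, min_proportion, total_taxa=None):
    """Keep alignments whose taxa representation reaches min_proportion.

    alignments is a dict of name -> {taxon: sequence}. When total_taxa is
    not given, it is the number of distinct taxa across all alignments.
    """
    if total_taxa is None:
        total_taxa = len({taxon for aln in alignments.values() for taxon in aln})
    return {
        name: aln
        for name, aln in alignments.items()
        if len(aln) / total_taxa >= min_proportion
    }


alignments = {
    "full.fas": {"TaxonA": "AAAAAA", "TaxonB": "AAAAAA",
                 "TaxonC": "AAAAAA", "TaxonD": "AAAAAA"},
    "sparse.fas": {"TaxonA": "AAAAAA"},
    "partial.fas": {"TaxonA": "AAAAAA", "TaxonB": "AAAAAA",
                    "TaxonC": "AAAAAA"},
}
dense = filter_min_taxa(alignments, min_proportion=0.5)
```

With four taxa in total, `sparse.fas` covers only 25% of them and is dropped, while the other two alignments are kept.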

Jump to missing data filter tutorial

Filter alignments by sequence variation

Alignments can be filtered according to the number of variable and/or parsimony informative columns. You can specify a range of acceptable values for either metric and keep only the alignments with the type of variation that you are interested in.

Example:

# 4 variable sites
# Alignment1.fas
>TaxonA
AAAAAA
>TaxonB
ATAATA
>TaxonC
TAAAAA
>TaxonD
AAAAAT

# 1 variable site
# Alignment2.fas
>TaxonA
AAAAAA
>TaxonB
AAAATA
>TaxonC
AAAAAA
>TaxonD
AAAAAA

# 0 variable sites
# Alignment3.fas
>TaxonA
AAAAAA
>TaxonB
AAAAAA
>TaxonC
AAAAAA
>TaxonD
AAAAAA

# 3 variable sites
# Alignment4.fas
>TaxonA
ATAATA
>TaxonB
AAAAAA
>TaxonC
AAAAAA
>TaxonD
AAAAAT

Filter out alignments with fewer than 4 variable sites

# 4 variable sites
>TaxonA
AAAAAA
>TaxonB
ATAATA
>TaxonC
TAAAAA
>TaxonD
AAAAAT
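Both metrics are easy to compute column by column. The sketch below uses the usual definitions: a site is variable when it shows more than one state, and parsimony informative when at least two states each occur in at least two sequences. Ignoring gaps and missing data (`-`, `N`, `?`) when counting states is an assumption, and `count_sites` is a hypothetical name.

```python
from collections import Counter


def count_sites(sequences, ignore="-N?"):
    """Return (variable sites, parsimony informative sites) for aligned sequences."""
    variable = 0
    informative = 0
    for column in zip(*sequences):
        counts = Counter(state for state in column if state not in ignore)
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states seen in two or more sequences.
            if sum(1 for count in counts.values() if count >= 2) >= 2:
                informative += 1
    return variable, informative


alignments = {
    "Alignment1.fas": ["AAAAAA", "ATAATA", "TAAAAA", "AAAAAT"],
    "Alignment2.fas": ["AAAAAA", "AAAATA", "AAAAAA", "AAAAAA"],
    "Alignment3.fas": ["AAAAAA", "AAAAAA", "AAAAAA", "AAAAAA"],
    "Alignment4.fas": ["ATAATA", "AAAAAA", "AAAAAA", "AAAAAT"],
}
# Keep only alignments with at least 4 variable sites.
kept = [name for name, seqs in alignments.items() if count_sites(seqs)[0] >= 4]
```

For the four alignments in the example, the variable site counts are 4, 1, 0 and 3, so only Alignment1.fas passes the filter.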

Jump to sequence variation filter tutorial

Code gaps into the alignment matrix

If you suspect that the indel patterns in your alignments may contain valuable phylogenetic information, TriFusion can code your indel patterns as binary states at the end of the alignment matrix and create a well-formatted Nexus file ready for downstream analysis with programs such as MrBayes. If you are concatenating multiple alignments, the binary matrix will be concatenated as well.
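To give an idea of what coding indels as binary states means, here is a deliberately simplified presence/absence scheme. Each distinct gap run, identified by its start and end coordinates, becomes one binary character, and a taxon scores 1 if it carries exactly that gap. This is an assumption for illustration only; the coding scheme TriFusion applies may differ, and `code_gaps` is a hypothetical name.

```python
import re


def code_gaps(alignment, gap="-"):
    """Code gap runs as binary characters.

    Returns the sorted list of indels, as (start, end) with 0-based,
    end-exclusive coordinates, and a dict of taxon -> binary string.
    """
    pattern = re.compile(re.escape(gap) + "+")
    runs = {
        taxon: {match.span() for match in pattern.finditer(sequence)}
        for taxon, sequence in alignment.items()
    }
    indels = sorted(set().union(*runs.values()))
    matrix = {
        taxon: "".join("1" if indel in spans else "0" for indel in indels)
        for taxon, spans in runs.items()
    }
    return indels, matrix


alignment = {
    "TaxonA": "AA--AA",
    "TaxonB": "AA--AA",
    "TaxonC": "AAAAAA",
    "TaxonD": "A-AAAA",
}
indels, matrix = code_gaps(alignment)
```

The binary strings in `matrix` are what would be appended to the end of each taxon's sequence in the output matrix.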

Jump to gap coding tutorial

Reverse a concatenated alignment

If you have a concatenated alignment and a partitions file, you can easily export those partitions as individual alignment files. More generally, TriFusion allows you to export subsets of any given alignment that are defined in a provided partitions file OR defined within TriFusion - even codon partitions!

Example:

If you have a 1,000bp alignment and wish to export 4 partitions of equal length as individual alignment files, you just need to write a partitions file like this (alternatively, you could also define those partitions inside TriFusion; see this tutorial):

# Export these 4 partitions
charset alignment1 = 1:250;
charset alignment2 = 251:500;
charset alignment3 = 501:750;
charset alignment4 = 751:1000;

The export of codon partitions is also possible using the /3 notation. For instance, to split the first partition in the example above into three codon partitions you could do the following:

# Export these 6 partitions
charset alignment1_1 = 1:250/3;
charset alignment1_2 = 2:250/3;
charset alignment1_3 = 3:250/3;
charset alignment2 = 251:500;
charset alignment3 = 501:750;
charset alignment4 = 751:1000;
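The sketch below shows how partition definitions written in the notation above could be parsed and applied to a sequence. It only handles the `start:end` and `start:end/3` forms shown in these examples, and the helper names (`parse_partitions`, `extract_partition`) are hypothetical.

```python
import re

# Matches lines such as "charset alignment1_1 = 1:250/3;"
CHARSET = re.compile(r"charset\s+(\S+)\s*=\s*(\d+):(\d+)(?:/(\d+))?\s*;")


def parse_partitions(text):
    """Return a list of (name, start, end, step) with 1-based, inclusive ranges."""
    partitions = []
    for line in text.splitlines():
        match = CHARSET.match(line.strip())
        if match:  # comment and blank lines are skipped
            name, start, end, step = match.groups()
            partitions.append((name, int(start), int(end), int(step or 1)))
    return partitions


def extract_partition(sequence, start, end, step=1):
    """Slice one partition out of a sequence."""
    return sequence[start - 1:end:step]


text = """# Export these partitions
charset alignment1_1 = 1:250/3;
charset alignment2 = 251:500;
"""
partitions = parse_partitions(text)
first_positions = extract_partition("ATGCCA", 1, 6, 3)
```

A range with `/3` translates directly into a slice step of 3, which is why starting at positions 1, 2 and 3 selects the three codon positions.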

Jump to reverse concatenation tutorial

Combine multiple operations on a single run

All operations of TriFusion's Process can be chained in any way during a single run. For instance, you may want to concatenate your alignments, filter alignments that exclude a given taxa group, remove alignment columns with high missing data and collapse sequences into unique haplotypes. All in the same execution.

In addition, the output of any individual operation can be saved to a new file that is independent from the main output file. For instance, you can select the concatenation and collapse of your alignments, but save the output of the collapse operation to a second file.

For performance reasons all these operations follow a specific order:

1. Main operation (Concatenation/Conversion/Reverse concatenation)
2. Filter, in this order: taxa filter, minimum taxa representation, codon filter, missing data filter, sequence variation filter
3. Collapse
4. Gap coding
5. Consensus

The video below demonstrates a single run where 141 alignments are concatenated, collapsed, filtered by taxa group, filtered by minimum taxa representation, filtered by missing data and filtered by sequence variation, and the consensus sequence of the concatenated alignment is also generated in an additional output file.

Jump to multiple Process operations tutorial

Setup and usage of data set groups

Sometimes you may want to perform different tasks on different sets of taxa or alignments.

For instance, you may have a data set of nuclear and mitochondrial alignments and wish to create a concatenated alignment for each set of alignments.

Or perhaps your data set includes several taxonomic groups of interest and you want to create multiple concatenated data matrices that maximize the data for those particular groups of taxa.

To make your life easier when these needs arise, TriFusion allows you to create or import groups of taxa and/or alignments and then rapidly switch between them. The video below demonstrates how groups of files and taxa can be quickly created and then how you can perform different tasks on them.

Jump to data set groups tutorial

Save frequently used data sets as projects

If you have one or more data sets that you need to process or analyze frequently, TriFusion lets you save any set of files currently loaded into the application as a project, so that they can be quickly loaded in future sessions. This can be useful in the same session as well, as a way of quickly switching between different sets of files.

Jump to Projects tutorial

Set partitions and substitution models

Partitions can be easily defined and changed within TriFusion, imported from partitions files, or specified within the alignment file in the Nexus format. You can also set and modify the substitution model for each partition, which can be useful for several downstream analyses. In the case of nucleotide sequences, codon partitions can also be defined in any possible combination and with independent substitution models.

Jump to Partitions tutorial

Statistics

Quick summary statistics

As soon as you load your alignment data and enter the Statistics screen, the calculation of summary statistics will start in the background. When completed, you will be able to see several statistics related to general, missing data and sequence variation information averaged across the data set. In addition, you can also see these statistics detailed for each input alignment file in a tabular format with sorting and search functions. Both of these formats can be exported into a .csv file that can be opened in any spreadsheet software.

Jump to Statistics tutorial

Plot categories

One of the main challenges of dealing with very large sets of alignment files is the difficulty in getting a good feel for the data, that is, understanding its characteristics and peculiarities.

How much missing data do I have? And sequence variation? How are these metrics distributed across taxa and alignments? Are there any outlier taxa/alignments?

Answering these questions is the main purpose of the Statistics module. There are dozens of plotting options available for you to quickly explore and visualize many aspects of your data. You can check the plot gallery to see some examples.

Plotting options

General information

Distribution of sequence size
Proportion of nucleotides or residues
Distribution of taxa frequency

Polymorphism and variation

Pairwise sequence similarity
Segregating sites
Alignment length/polymorphism correlation
Allele frequency spectrum

Missing data

Gene occupancy
Distribution of missing taxa
Distribution of missing data
Cumulative distribution of missing genes

Outlier detection

Missing data outliers
Segregating sites outliers
Sequence size outliers

Plot types

For each of these plotting options, you may have up to three plot types that provide a different focus on the same analysis:

Single gene: Plots the information along the alignment length (Sliding window).
Per species: Plots the information with the taxa along the x-axis or in triangular matrices.
Average: Plots information averaged across the active data set.

Jump to Statistics tutorial

Data visualization focusing on single alignments

Some plotting analyses can be performed on individual alignments and generate a sliding window plot with the variation of a given metric along the length of the alignment. For the moment, these plot types are only provided for some of the Polymorphism and variation plot options.
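As a rough illustration of a sliding window analysis, the sketch below counts segregating (variable) sites in each window along an alignment. The window and step sizes are arbitrary, `sliding_window_segregating` is a hypothetical name, and the plotting itself is left out.

```python
def sliding_window_segregating(sequences, window=3, step=1):
    """Count segregating sites in each window along aligned sequences."""
    # True for every column that shows more than one state.
    variable = [len(set(column)) > 1 for column in zip(*sequences)]
    return [
        sum(variable[start:start + window])
        for start in range(0, len(variable) - window + 1, step)
    ]


alignment = ["AAAAAA", "ATAATA", "TAAAAA", "AAAAAT"]
counts = sliding_window_segregating(alignment, window=3, step=1)
```

Each value in `counts` corresponds to one window position, so plotting it against the window start gives the variation of the metric along the alignment.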

Here's an example of the sliding window analyses for individual alignments.

Jump to Statistics tutorial

Data visualization focusing on taxa

If you are interested in evaluating the impact that some metric has on individual taxa, use the Taxa plot type, when available. This focuses the analysis on taxa by discriminating them along the x-axis, in a triangular matrix or along a distribution.

Here's an example with the generation of some taxa focused plots.

Jump to Statistics tutorial

Data visualization focusing on alignments average

If you are interested in data set wide trends or distributions, use the Average plot type, when available. These plot types will usually generate the averaged distribution of the metric of interest across the active data set.

Here's an example with the generation of some data set wide distributions.

Jump to Statistics tutorial

Fast plot switching

While exploring your data, you will often switch back and forth among several plots. Since some plots may take a while to be generated, depending on the size of the active data set, several techniques were implemented to allow fast switching among plots that have already been generated. In addition, some computations are stored locally in a SQLite database in order to reduce waiting times when changing the active data set.

Jump to Statistics tutorial

Usage of data set groups

Just like in the Process section, you can create or import groups of taxa/files and switch between them to generate the plots. Simply create your groups of interest and then select the ones you want for any plotting option.

Jump to Statistics tutorial

Export plots as figures or tables

All plots generated in the Statistics module can be exported as a figure (in several graphical formats, including vector formats). Generally, the plot generators were designed to produce high quality and publication ready graphics. However, you may want to tweak, change or add content to these plots. To make life easier for you, you can also export the data used to generate the plot as a .csv file, which can be easily loaded by other graphical libraries and software.

Jump to Statistics tutorial
