trifusion.process.data module

exception trifusion.process.data.InvalidPartitionFile(value)

Bases: exceptions.Exception

exception trifusion.process.data.PartitionException(value)

Bases: exceptions.Exception

class trifusion.process.data.Partitions

Bases: object

Alignment partitions interface for Alignment and AlignmentList.

The Partitions class is used to define partitions for Alignment and AlignmentList objects and associate substitution models to each partition. After instantiating, partitions may be set in two ways:

  • Partition files: Nexus charset blocks and RAxML partition files are currently supported
  • Tuple-like objects: Containing the ranges and names of the partitions

Attributes

partition_length (int) Total length of all partitions.
partitions (OrderedDict) Storage of partition names (key) and their range (values).
partitions_index (list) The index (starting point) for each partition, including codon partitions.
partitions_alignments (OrderedDict) Storage of the partition names (key) and their corresponding alignment files (values).
alignments_range (OrderedDict) Storage of the alignment names (key) and their range (values).
models (OrderedDict) Storage of partition names (key) and their models (values).
merged_files (dict) Storage of the original range (values) of every alignment file (key).
counter (int) Indicator of where the last partition ended.
partition_format (str) Format of the original partition file, if any.

Methods

add_partition(name[, length, locus_range, ...]) Adds a new partition.
change_name(old_name, new_name) Changes name of a partition.
get_model_name(params) Given a list of parameters, return the name of the model
get_partition_names() Returns a list with the name of the partitions
is_single() Returns whether the current Partitions has single or multiple partitions.
iter_files() Iterates over partitions_alignments.items().
merge_partitions(partition_list, name) Merges multiple partitions into a single one.
parse_nexus_model(string) Parses a substitution model defined in a prset and/or lset command.
read_from_dict(dict_obj) Reads partition information from a dict object
read_from_file(partitions_file) Parses partitions from file
read_from_nexus_string(nx_string[, ...]) Parses a single nexus string with partition definition.
remove_partition([partition_name, file_name]) Removes partitions.
reset([keep_alignments_range]) Clears partitions and attributes
set_length(length) Set total length of current locus (over all partitions).
set_model(partition, models[, links, apply_all]) Sets substitution model for a given partition.
split_partition(name[, new_range, new_names]) Splits one partition into two.
write_to_file(output_format, output_file[, ...]) Writes partitions to a file.
add_partition(name, length=None, locus_range=None, codon=False, use_counter=False, file_name=None, model_cls=None, auto_correct_name=True)

Adds a new partition.

Adds a new partition, given either its length or its range within the current alignment. If both are provided, the length takes precedence. The range of the partition should use Python (0-based) indexing, that is, the first position is 0, not 1.

Parameters:

name : str

Name of the partition.

length : int, optional

Length of the alignment.

locus_range : list or tuple, optional

Range of the partition.

codon : list, optional

If the codon partitions are already defined, provide their starting points in list format, e.g. [1, 2, 3]. Defaults to False, meaning no codon partitions.

use_counter : bool

If True, locus_range will be updated according to the counter attribute.

file_name : str

Name of the alignment file.

model_cls :

Specifies the substitution model that will be set in models.

auto_correct_name : bool

If set to True and a partition name already exists, a counter is appended to the end of the name.

Notes

IMPORTANT NOTE on self.models: The self.models attribute was designed to allow the storage of different substitution models under the same partition name. This is useful for codon partitions that share the same parent partition name. For example, a parent partition named “PartA” with 3 codon partitions can have a different model for each one, like this:

self.models["PartA"] = [[[..model1_params..], [..model2_params..],
    [..model3_params..]], ["GTR", "GTR", "GTR"], ["1", "2", "3"]]
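The range rule described for add_partition (0-based positions, with length taking precedence over locus_range) can be sketched as below. This is a minimal illustration, not TriFusion's implementation: next_range is a hypothetical helper, and it assumes inclusive end coordinates (as the partitions example, (0, 856) followed by (857, 1450), suggests) and a counter that marks where the next partition starts.

```python
def next_range(counter, length=None, locus_range=None):
    """Return the (start, end) range of a new partition.

    Hypothetical helper: 0-based coordinates with an inclusive end.
    When both arguments are given, length takes precedence, as documented
    for add_partition.
    """
    if length is not None:
        # Place the new partition right after the previous one.
        return (counter, counter + length - 1)
    if locus_range is not None:
        return tuple(locus_range)
    raise ValueError("provide either length or locus_range")


counter = 0  # position where the next partition starts (assumption)
ranges = []
for length in (857, 594):
    start, end = next_range(counter, length=length)
    ranges.append((start, end))
    counter = end + 1  # advance past the partition just added

print(ranges)  # [(0, 856), (857, 1450)]
```

Keeping the counter one past the last inclusive end means consecutive partitions never overlap and never leave gaps.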
alignments_range = None
change_name(old_name, new_name)

Changes name of a partition.

Parameters:

old_name : str

Original partition name.

new_name : str

New partition name.

counter = None

The counter attribute indicates where the last partition ends when one or more partitions are added.

get_model_name(params)

Given a list of parameters, return the name of the model

Parameters:

params : list

List of prset/lset parameters

Returns:

model : str or None

Returns the name of the model if one is found; otherwise, returns None.

get_partition_names()

Returns a list with the name of the partitions

Returns:

names : list

List with names of the partitions. When a parent partition has multiple codon partitions, it returns a partition name for every codon starting position present.

is_single()

Returns whether the current Partitions has single or multiple partitions.

Returns:

_ : bool

Returns True if there is only a single partition defined, and False if there are multiple partitions.

iter_files()

Iterates over partitions_alignments.items().

Returns:

_ : iter

Iterator of partitions_alignments.items().

merge_partitions(partition_list, name)

Merges multiple partitions into a single one.

Parameters:

partition_list : list

List with partition names to be merged.

name : str

Name of new partition

merged_files = None

This attribute will keep a record of the original ranges of every file that was merged. This is useful to split partitions according to files or to undo any changes. Each entry should be:

{"alignment_file1": (0, 1234), "alignment_file2": (3444, 6291)}
models = None

The self.models attribute will contain the same keys as self.partitions and will associate a substitution model with each partition. For each partition, the format should be as follows:

models["partA"] = [[[..model_params..]],[..model_names..],
                   ["12", "3"]]

The first element is a list that may contain the substitution model parameters for up to three subpartitions, the second element is a list with the corresponding names of the substitution models, and the third element is a list that stores any links between models.

parse_nexus_model(string)

Parses a substitution model defined in a prset and/or lset command.

Parameters:

string : str

String with the prset or lset command.

partition_length = None

The length of the locus may be necessary when partitions are defined in the input files using the “.” notation, meaning the entire locus. Therefore, to convert this notation into workable integers, the size of the locus must be provided using the set_length method.
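A minimal sketch of why the locus length is needed: once it is known, a “.” range end can be turned into a concrete index. resolve_end is a hypothetical helper, and the conversion from the 1-based coordinates of partition files to 0-based, inclusive indices is an assumption consistent with the rest of this page, not confirmed TriFusion behaviour.

```python
def resolve_end(token, partition_length):
    """Convert a 1-based range end from a partition file to a 0-based index.

    The "." token stands for the last position of the locus, so it can only
    be resolved once the total length is known (cf. set_length).
    """
    if token == ".":
        if partition_length is None:
            raise ValueError("set the locus length before resolving '.'")
        return partition_length - 1
    return int(token) - 1


# "charset geneB = 858-.;" on a locus with 1451 sites
print(resolve_end("858", 1451))  # 857
print(resolve_end(".", 1451))    # 1450
```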

partitions = None

partitions will contain the name and range of the partitions for a given alignment object. Both gene and codon partitions will be stored in this attribute, but gene partitions are the main entries. An example of different stored partitions is:

partitions = {"partitionA": ((0, 856), False),
              "partitionB": ((857, 1450), [857, 858, 859])}

“partitionA” is a simple gene partition ranging from 0 to 856, while “partitionB” is an assembly of codon partitions. The second element of the tuple is reserved for codon partitions: if there are none, it should be False; if there are codon partitions, it should be a list with the desired initial codons. In the example above, “partitionB” actually contains 3 partitions, starting at the first, second and third nucleotide of the main partition.
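To make the codon layout concrete, the sketch below expands each entry of a partitions-style dictionary into the sites of its (sub)partitions. codon_sites is a hypothetical helper, and it assumes (the source does not say so explicitly) that each codon subpartition takes every third site from its starting point up to the inclusive end of the parent range.

```python
from collections import OrderedDict

partitions = OrderedDict([
    ("partitionA", ((0, 856), False)),
    ("partitionB", ((857, 1450), [857, 858, 859])),
])


def codon_sites(entry):
    """Yield (start, sites) for each (sub)partition of one partitions entry."""
    (start, end), codons = entry
    if not codons:
        # Plain gene partition: every site in the inclusive range.
        yield start, list(range(start, end + 1))
        return
    for codon_start in codons:
        # Assumed codon layout: every third site from the starting codon.
        yield codon_start, list(range(codon_start, end + 1, 3))


for name, entry in partitions.items():
    for codon_start, sites in codon_sites(entry):
        print(name, codon_start, sites[:3], len(sites))
```

With this layout the three codon subpartitions of “partitionB” together cover all 594 of its sites exactly once.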

partitions_alignments = None

The partitions_alignments attribute will associate the partition with the corresponding alignment files. For single alignment partitions, this will provide information on the file name. For multiple alignments, besides the information of the file names, it will associate which alignments are contained in a given partition and support multi alignment partitions. An example would be:

partitions_alignments = {"PartitionA": ["FileA.fas"],
                         "PartitionB": ["FileB.fas", "FileC.fas"]}
partitions_index = None

partitions_index stores the index of all added partitions. This attribute was created because codon models are added to the same parent partitions, thus losing their actual index. This is important for Nexus files, where models are applied to the index of the partition. It simply stores the partition names, which can be accessed by their index, or searched to return their index. To better support codon partitions, each entry in partitions_index consists of a list, in which the first element is the partition name and the second element is the index of the subpartition. An example would be:

partitions_index = [["partA", 0], ["partA", 1], ["partA", 2],
                    ["partB", 0]]

in which partA has 3 codon partitions and partB has only one partition.
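The index layout above can be derived from a partitions-style dictionary, as in this minimal sketch. build_index is a hypothetical helper, not a TriFusion function; it emits one [name, subindex] pair per codon starting point, or a single pair for a plain gene partition.

```python
from collections import OrderedDict


def build_index(partitions):
    """Hypothetical sketch: one [name, subindex] entry per (sub)partition."""
    index = []
    for name, (_, codons) in partitions.items():
        n_sub = len(codons) if codons else 1
        index.extend([name, i] for i in range(n_sub))
    return index


partitions = OrderedDict([
    ("partA", ((0, 299), [0, 1, 2])),
    ("partB", ((300, 599), False)),
])
partitions_index = build_index(partitions)
print(partitions_index)
# [['partA', 0], ['partA', 1], ['partA', 2], ['partB', 0]]

# Nexus-style lookups: by global index, and the reverse search
print(partitions_index[2])                   # ['partA', 2]
print(partitions_index.index(["partB", 0]))  # 3
```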

read_from_dict(dict_obj)

Reads partition information from a dict object

Parses partitions defined and stored in a special OrderedDict. The keys of dict_obj should be the partition names, and their corresponding values should contain the loci range and substitution model, if any.

Parameters:

dict_obj : OrderedDict

Ordered dictionary with the definition of the partitions

Examples

Here is an example of a dict_obj:

from collections import OrderedDict

dict_obj = OrderedDict([("GeneA", [(0, 234), "GTR"]),
                        ("GeneB", [(235, 865), "JC"])])
read_from_file(partitions_file)

Parses partitions from file

This method parses a file containing partitions. It supports RAxML-style partition files and NEXUS charset blocks; a NEXUS file, however, must contain only the charset block. The model_nexus argument provides a namespace for the model variable in the NEXUS format, since this information is not present in the file, which ensures consistency in the Partitions object.

Parameters:

partitions_file : str

Path to partitions file.

Raises:

PartitionException

When one partition definition cannot be parsed.
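As an illustration of the RAxML-style input, here is a minimal parser for simple lines such as "DNA, geneA = 1-235". It is a sketch, not TriFusion's parser: the regular expression, parse_raxml_line and the local PartitionException stand-in are hypothetical, codon stride notation is not handled, and converting to 0-based, inclusive ranges is an assumption that matches the read_from_dict example.

```python
import re


class PartitionException(Exception):
    """Local stand-in for trifusion.process.data.PartitionException."""


# Hypothetical pattern for lines such as "DNA, geneA = 1-235"
RAXML_LINE = re.compile(r"^\s*(\w+)\s*,\s*([^\s=]+)\s*=\s*(\d+)\s*-\s*(\d+)\s*$")


def parse_raxml_line(line):
    """Return (model, name, (start, end)) with 0-based, inclusive coordinates."""
    match = RAXML_LINE.match(line)
    if match is None:
        # Mirrors the documented behaviour of raising on unparsable lines
        raise PartitionException("Cannot parse partition line: %r" % line)
    model, name, start, end = match.groups()
    return model, name, (int(start) - 1, int(end) - 1)


for line in ["DNA, geneA = 1-235", "DNA, geneB = 236-866"]:
    print(parse_raxml_line(line))
```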

read_from_nexus_string(nx_string, file_name=None, return_res=False)

Parses a single nexus string with partition definition.

Parameters:

nx_string : str

String with partition definition

file_name : str, optional

String with name of the file corresponding to the partition.

return_res : bool

If True, it will only return the parsed partition information. If False, it will add the parsed partition to the Partitions object.
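The kind of information read_from_nexus_string extracts from a single charset line can be sketched as follows. The pattern and parse_charset are hypothetical; they accept plain ranges and an optional "\3"-style stride for codon positions, and return 0-based, inclusive ranges by assumption. The actual method may accept more forms than this.

```python
import re

# Hypothetical pattern: "charset geneA = 1-857;" or "charset geneB_2 = 859-1451\3;"
CHARSET = re.compile(
    r"^\s*charset\s+([^\s=]+)\s*=\s*(\d+)\s*-\s*(\d+)(?:\\(\d+))?\s*;\s*$",
    re.IGNORECASE,
)


def parse_charset(nx_string):
    """Return (name, (start, end), stride) with 0-based, inclusive coordinates.

    stride is None for a plain gene partition, or e.g. 3 for a codon position.
    """
    match = CHARSET.match(nx_string)
    if match is None:
        raise ValueError("Not a charset definition: %r" % nx_string)
    name, start, end, stride = match.groups()
    step = int(stride) if stride else None
    return name, (int(start) - 1, int(end) - 1), step


print(parse_charset("charset geneA = 1-857;"))
print(parse_charset(r"charset geneB_2 = 859-1451\3;"))
```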

remove_partition(partition_name=None, file_name=None)

Removes partitions.

Removes partitions by a given partition name or file name. This handles any necessary changes to the remaining partitions. The changes are straightforward for most attributes, such as partitions_index, partitions_alignments and models, but partitions must be restructured because the ranges of the subsequent partitions have to be adjusted.

Parameters:

partition_name : str, optional

Name of the partition.

file_name : str, optional

Name of the alignment file.
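The range adjustment described for remove_partition can be sketched on a partitions-style dictionary. remove_and_shift is a hypothetical helper that assumes contiguous, 0-based ranges with inclusive ends and shifts codon starting points together with their parent range; it does not touch models, partitions_index or partitions_alignments.

```python
from collections import OrderedDict


def remove_and_shift(partitions, name):
    """Return a copy of partitions without `name`, closing the gap it leaves."""
    (start, end), _ = partitions[name]
    removed = end - start + 1
    result = OrderedDict()
    for key, ((s, e), codons) in partitions.items():
        if key == name:
            continue
        if s > end:  # partitions after the removed one move left
            s, e = s - removed, e - removed
            if codons:
                codons = [c - removed for c in codons]
        result[key] = ((s, e), codons)
    return result


partitions = OrderedDict([
    ("partitionA", ((0, 856), False)),
    ("partitionB", ((857, 1450), [857, 858, 859])),
    ("partitionC", ((1451, 1600), False)),
])
print(remove_and_shift(partitions, "partitionA"))
```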

reset(keep_alignments_range=False)

Clears partitions and attributes

Clears partitions and resets object to __init__ state. The original alignment range can be retained by setting the keep_alignments_range argument to True.

Parameters:

keep_alignments_range : bool

If True, the alignments_range attribute will not be reset.

set_length(length)

Set total length of current locus (over all partitions).

Sets the length of the locus. This may be needed to convert certain partition-defining notations, such as the “.” used to indicate the whole length of the alignment.

Parameters:

length : int

Integer that will be set as partition_length.

set_model(partition, models, links=None, apply_all=False)

Sets substitution model for a given partition.

Parameters:

partition : str

Partition name.

models : list

Model names for each of the three codon partitions. If there are no codon partitions, provide a list with a single element.

links : list, optional

Provide potential links between codon models. For example, if codon positions 1 and 2 are to be linked, it should be: links=["12", "3"]

apply_all : bool

If True, the current model will be applied to all partitions.
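The links notation can be read as groups of codon positions that share a model. The hypothetical link_groups helper below shows that interpretation only; how set_model stores the result internally is not documented here.

```python
def link_groups(links):
    """Map each codon position ("1", "2", "3") to the link group it belongs to.

    Hypothetical helper: with links=["12", "3"], positions 1 and 2 share a
    group (and therefore a model) while position 3 stands alone.
    """
    groups = {}
    for group in links:
        for position in group:
            if position not in "123":
                raise ValueError("unknown codon position %r" % position)
            if position in groups:
                raise ValueError("codon position %s is linked twice" % position)
            groups[position] = group
    return groups


print(link_groups(["12", "3"]))  # {'1': '12', '2': '12', '3': '3'}
```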

split_partition(name, new_range=None, new_names=None)

Splits one partition into two.

Splits the partition called name into two, using the list of tuples provided by new_range. If new_range is None, the partition is split by its alignment files instead.

Parameters:

name : str

Name of the partition to be split.

new_range : list or tuple, optional

List of two tuples, containing the ranges of the new partitions.

new_names : list, optional

The names of the new partitions.
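Before splitting by new_range, it is reasonable to check that the two tuples actually divide the parent range. The sketch below is a hypothetical validation step, assuming 0-based ranges with inclusive ends and that the two halves must be contiguous and cover the whole parent; the source does not state which checks split_partition performs.

```python
def split_range(parent, new_range):
    """Check that two new ranges split `parent` into contiguous halves."""
    p_start, p_end = parent
    (a_start, a_end), (b_start, b_end) = sorted(new_range)
    contiguous = b_start == a_end + 1
    covers = a_start == p_start and b_end == p_end
    non_empty = a_start <= a_end and b_start <= b_end
    if not (contiguous and covers and non_empty):
        raise ValueError("%r does not split %r in two" % (new_range, parent))
    return (a_start, a_end), (b_start, b_end)


print(split_range((857, 1450), [(857, 1100), (1101, 1450)]))
```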

write_to_file(output_format, output_file, model='LG')

Writes partitions to a file.

Writes the Partitions object into an output file according to the output_format. The supported output formats are RAxML and Nexus. The model option is for the RAxML format only.

Parameters:

output_format : str

Output format of partitions file. Can be either “nexus” or “raxml”.

output_file : str

Path to output file.

model : str

Name of the model for the partitions. “raxml” format only.
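For the "raxml" format, each partition becomes one line of the form "MODEL, name = start-end". The sketch below assumes 0-based, inclusive internal ranges that are written back in RAxML's 1-based notation; raxml_lines is hypothetical and ignores codon subpartitions. It returns the lines instead of writing a file so it has no side effects.

```python
from collections import OrderedDict


def raxml_lines(partitions, model="LG"):
    """Format gene partitions as RAxML partition-file lines (1-based, inclusive)."""
    lines = []
    for name, ((start, end), _codons) in partitions.items():
        lines.append("%s, %s = %d-%d" % (model, name, start + 1, end + 1))
    return lines


partitions = OrderedDict([("GeneA", ((0, 234), False)),
                          ("GeneB", ((235, 865), False))])
print("\n".join(raxml_lines(partitions)))
# LG, GeneA = 1-235
# LG, GeneB = 236-866
```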

class trifusion.process.data.Zorro(alignment_list, suffix='_zorro.out', zorro_dir=None)

Bases: object

Class that handles the concatenation of zorro weights.

Parameters:

alignment_list : trifusion.process.sequence.AlignmentList

AlignmentList object.

suffix : str

Suffix of the zorro weight files, based on the corresponding input alignments.

zorro_dir : str

Path to directory where zorro weight files are stored.

Methods

write_to_file(output_file) Creates a concatenated file with the zorro weights for the corresponding alignment files.
write_to_file(output_file)

Creates a concatenated file with the zorro weights for the corresponding alignment files.