trifusion.process.data module
-
class trifusion.process.data.Partitions
Bases: object
Alignment partitions interface for Alignment and AlignmentList.
The Partitions class is used to define partitions for Alignment and AlignmentList objects and to associate substitution models with each partition. After instantiation, partitions may be set in two ways:
- Partition files: Nexus charset blocks and RAxML partition files are currently supported.
- Tuple-like objects: these contain the ranges and names of the partitions.
Attributes
partition_length : int
Length of the total partitions.
partitions : OrderedDict
Storage of partition names (key) and their range (values).
partitions_index : list
The index (starting point) for each partition, including codon partitions.
partitions_alignments : OrderedDict
Storage of the partition names (key) and their corresponding alignment files (values).
alignments_range : OrderedDict
Storage of the alignment names (key) and their range (values).
models : OrderedDict
Storage of partition names (key) and their models (values).
merged_files : dict
Storage of the original range (values) of every alignment file (key).
counter : int
Indicator of where the last partition ended.
partition_format : str
Format of the original partition file, if any.
Methods
- add_partition(name[, length, locus_range, ...]): Adds a new partition.
- change_name(old_name, new_name): Changes name of a partition.
- get_model_name(params): Given a list of parameters, return the name of the model.
- get_partition_names(): Returns a list with the name of the partitions.
- is_single(): Returns whether the current Partitions has single or multiple partitions.
- iter_files(): Iterates over partitions_alignments.items().
- merge_partitions(partition_list, name): Merges multiple partitions into a single one.
- parse_nexus_model(string): Parses a substitution model defined in a prset and/or lset command.
- read_from_dict(dict_obj): Reads partition information from a dict object.
- read_from_file(partitions_file): Parses partitions from file.
- read_from_nexus_string(nx_string[, ...]): Parses a single nexus string with partition definition.
- remove_partition([partition_name, file_name]): Removes partitions.
- reset([keep_alignments_range]): Clears partitions and attributes.
- set_length(length): Set total length of current locus (over all partitions).
- set_model(partition, models[, links, apply_all]): Sets substitution model for a given partition.
- split_partition(name[, new_range, new_names]): Splits one partition into two.
- write_to_file(output_format, output_file[, ...]): Writes partitions to a file.
-
add_partition(name, length=None, locus_range=None, codon=False, use_counter=False, file_name=None, model_cls=None, auto_correct_name=True)
Adds a new partition.
Adds a new partition, given either the length or the range of the current alignment. If both are provided, the length takes precedence. The range of the partition should use Python (0-based) indexing, that is, the first position should be 0 and not 1.
Parameters: name : str
Name of the partition.
length : int, optional
Length of the alignment.
locus_range : list or tuple, optional
Range of the partition.
codon : list
If the codon partitions are already defined, provide their starting points as a list, e.g. [1, 2, 3].
use_counter : bool
If True, locus_range will be updated according to the counter attribute.
file_name : str
Name of the alignment file.
model_cls :
Specifies the substitution model that will be set in models.
auto_correct_name : bool
If set to True, when a partition name already exists, a counter is added to the end of the name.
Notes
IMPORTANT NOTE on self.models: The self.models attribute was designed to allow the storage of different substitution models under the same partition name. This is useful for codon partitions that share the same parent partition name. For example, a parent partition named “PartA” with 3 codon partitions can have a different model for each one, like this:
self.models["PartA"] = [[[..model1_params..], [..model2_params..], [..model3_params..]], [GTR, GTR, GTR], ["1", "2", "3"]]
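Partition files usually number alignment columns from 1, whereas locus_range is expected in Python index. A minimal sketch of that conversion, assuming an inclusive end position; to_python_range is a hypothetical helper and not part of trifusion:

```python
def to_python_range(start, end):
    """Convert a 1-based inclusive range into a 0-based inclusive range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end - 1


# Columns 1 to 857 of a partition file become positions 0 to 856
locus_range = to_python_range(1, 857)
print(locus_range)
```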
-
alignments_range = None
-
change_name(old_name, new_name)
Changes name of a partition.
Parameters: old_name : str
Original partition name.
new_name : str
New partition name.
-
counter = None
The counter attribute will be used as an indication of where the last partition ends when one or more partitions are added.
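The role of such a counter can be illustrated with plain Python. This is only a sketch: it assumes the counter holds the first free position after the last partition and that ranges are inclusive, and append_partition is a hypothetical helper rather than trifusion code.

```python
from collections import OrderedDict


def append_partition(partitions, counter, name, length):
    """Place a partition of the given length right after the last one.

    Returns the updated counter (assumed: first free position).
    """
    partitions[name] = ((counter, counter + length - 1), False)
    return counter + length


partitions = OrderedDict()
counter = 0
counter = append_partition(partitions, counter, "partitionA", 857)
counter = append_partition(partitions, counter, "partitionB", 594)
print(partitions["partitionB"], counter)
```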
-
get_model_name(params)
Given a list of parameters, returns the name of the model.
Parameters: params : list
List of prset/lset parameters
Returns: model : str or None
Returns the name of the model if one is found. Otherwise, returns None.
-
get_partition_names()
Returns a list with the names of the partitions.
Returns: names : list
List with names of the partitions. When a parent partition has multiple codon partitions, it returns a partition name for every codon starting position present.
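The expansion of codon partitions into one name per starting position might look like the sketch below. The "_1", "_2", "_3" suffixes are an illustrative assumption, since the naming scheme is not stated here, and expanded_names is a hypothetical helper.

```python
from collections import OrderedDict

partitions = OrderedDict([
    ("partitionA", ((0, 856), False)),
    ("partitionB", ((857, 1450), [857, 858, 859])),
])


def expanded_names(partitions):
    """Return one name per partition, or one per codon start if present."""
    names = []
    for name, (_locus_range, codon) in partitions.items():
        if codon:
            # Assumed naming: parent name plus the codon position number
            names.extend("{}_{}".format(name, i + 1) for i in range(len(codon)))
        else:
            names.append(name)
    return names


print(expanded_names(partitions))
```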
-
is_single()
Returns whether the current Partitions has single or multiple partitions.
Returns: _ : bool
Returns True if there is only a single partition defined, and False if there are multiple partitions.
-
iter_files()
Iterates over partitions_alignments.items().
Returns: _ : iter
Iterator of partitions_alignments.items().
-
merge_partitions(partition_list, name)
Merges multiple partitions into a single one.
Parameters: partition_list : list
List with partition names to be merged.
name : str
Name of new partition
-
merged_files = None
This attribute will keep a record of the original ranges of every file that was merged. This is useful to split partitions according to files or to undo any changes. Each entry should be:
{"alignment_file1": (0, 1234), "alignment_file2": (3444, 6291)}
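A record in this shape makes it easy to find which file originally contained a given position. A small sketch, assuming inclusive ranges; file_at is a hypothetical helper and not part of the class:

```python
merged_files = {"alignment_file1": (0, 1234), "alignment_file2": (3444, 6291)}


def file_at(position, merged_files):
    """Return the file whose original range contains position, or None."""
    for file_name, (start, end) in merged_files.items():
        if start <= position <= end:
            return file_name
    return None


print(file_at(4000, merged_files))
```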
-
models = None
The self.models attribute will contain the same key list as self.partitions and will associate the substitution models with each partition. For each partition, the format should be as follows:
models["partA"] = [[[..model_params..]],[..model_names..], ["12", "3"]]
The first element is a list that may contain the substitution model parameters for up to three subpartitions, the second element is also a list with the corresponding names of the substitution models and the third list will store any links between models
-
parse_nexus_model(string)
Parses a substitution model defined in a prset and/or lset command.
Parameters: string : str
String with the prset or lset command.
-
partition_length = None
The length of the locus may be necessary when partitions are defined in the input files using the "." notation, meaning the entire locus. Therefore, to convert this notation into workable integers, the size of the locus must be provided using the set_length method.
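Resolving the "." notation into integers could be sketched as follows. The input is assumed to be a 1-based inclusive range written as "start-end", and the result a 0-based inclusive range; resolve_range is a hypothetical helper, not the parser used by the class.

```python
def resolve_range(text, partition_length):
    """Turn a range such as "858-." into 0-based inclusive integers."""
    start, end = (token.strip() for token in text.split("-"))
    # "." stands for the last position of the locus
    end_position = partition_length if end == "." else int(end)
    return int(start) - 1, end_position - 1


print(resolve_range("858-.", 1451))
```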
-
partitions = None
partitions will contain the name and range of the partitions for a given alignment object. Both gene and codon partitions will be stored in this attribute, but gene partitions are the main entries. An example of different stored partitions is:
partitions = {"partitionA": ((0, 856), False), "partitionB": ((857, 1450), [857, 858, 859])}
“partitionA” is a simple gene partition ranging from 0 to 856, while “partitionB” is an assembly of codon partitions. The last element of each tuple is reserved for codon partitions. If there are none, it should be False. If there are codon partitions, a list with their starting positions should be provided. In the example above, “partitionB” actually has 3 partitions, starting at the first, second and third nucleotide of the main partition.
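To make the codon layout concrete, the sketch below lists the alignment columns covered by each entry. It assumes inclusive ranges and that each codon partition takes every third column from its starting position; codon_columns is a hypothetical helper.

```python
partitions = {
    "partitionA": ((0, 856), False),
    "partitionB": ((857, 1450), [857, 858, 859]),
}


def codon_columns(entry):
    """Return one range of column indices per (sub)partition."""
    (start, end), codon = entry
    if not codon:
        return [range(start, end + 1)]
    # Assumed: every third column, beginning at each codon starting point
    return [range(first, end + 1, 3) for first in codon]


columns = codon_columns(partitions["partitionB"])
print([list(cols[:3]) for cols in columns])
```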
-
partitions_alignments = None
The partitions_alignments attribute will associate the partition with the corresponding alignment files. For single alignment partitions, this will provide information on the file name. For multiple alignments, besides the information of the file names, it will associate which alignments are contained in a given partition and support multi alignment partitions. An example would be:
partitions_alignments = {"PartitionA": ["FileA.fas"], "PartitionB": ["FileB.fas", "FileC.fas"]}
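A mapping of this kind can be inverted to look up the partition of a given file. The sketch assumes each alignment file belongs to a single partition; file_to_partition is a name introduced here for illustration.

```python
from collections import OrderedDict

partitions_alignments = OrderedDict([
    ("PartitionA", ["FileA.fas"]),
    ("PartitionB", ["FileB.fas", "FileC.fas"]),
])

# Invert the mapping: alignment file name -> partition name
file_to_partition = {
    file_name: partition
    for partition, files in partitions_alignments.items()
    for file_name in files
}

print(file_to_partition["FileC.fas"])
```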
-
partitions_index = None
partitions_index will remember the index of all added partitions. This attribute was created because codon models are added to the same parent partitions, thus losing their actual index. This is important for Nexus files, where models are applied to the index of the partition. This will simply store the partition names, which can be accessed using their index, or searched to return their index. To better support codon partitions, each entry in partitions_index consists of a list, in which the first element is the partition name and the second element is the index of the subpartition. An example would be:
partitions_index = [["partA", 0], ["partA", 1], ["partA", 2], ["partB", 0]]
Here, partA has 3 codon partitions and partB has only one partition.
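Both directions of lookup work with ordinary list operations, as this short sketch shows:

```python
partitions_index = [["partA", 0], ["partA", 1], ["partA", 2], ["partB", 0]]

# From an index to the partition name and subpartition
name, subpartition = partitions_index[3]

# From a partition name and subpartition to its index
position = partitions_index.index(["partA", 2])

print(name, subpartition, position)
```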
-
read_from_dict(dict_obj)
Reads partition information from a dict object.
Parses partitions defined and stored in a special OrderedDict. The keys of dict_obj should be the partition names and their corresponding values should contain the loci range and substitution model, if any.
Parameters: dict_obj : OrderedDict
Ordered dictionary with the definition of the partitions
Examples
Here is an example of a dict_obj:
dict_obj = OrderedDict([("GeneA", [(0, 234), "GTR"]), ("GeneB", [(235, 865), "JC"])])
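A dict_obj of this shape can be built and unpacked as in the sketch below. The "GeneC" entry without a model and the unpack helper are hypothetical, added only to show the "if any" case.

```python
from collections import OrderedDict

dict_obj = OrderedDict([
    ("GeneA", [(0, 234), "GTR"]),
    ("GeneB", [(235, 865), "JC"]),
    ("GeneC", [(866, 990)]),  # hypothetical entry without a model
])


def unpack(dict_obj):
    """Return (name, range, model) triples; model is None when absent."""
    rows = []
    for name, value in dict_obj.items():
        locus_range = value[0]
        model = value[1] if len(value) > 1 else None
        rows.append((name, locus_range, model))
    return rows


for row in unpack(dict_obj):
    print(row)
```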
-
read_from_file(partitions_file)
Parses partitions from file.
This method parses a file containing partitions. It supports partitions files similar to RAxML’s and NEXUS charset blocks. The NEXUS file, however, must only contain the charset block. The model_nexus argument provides a namespace for the model variable in the nexus format, since this information is not present in the file. However, it assures consistency on the Partition object.
Parameters: partitions_file : str
Path to partitions file.
Raises: PartitionException
When one partition definition cannot be parsed.
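For orientation, a simple RAxML-style line and a Nexus charset line could be parsed along these lines. This is an independent sketch, not the parser used by read_from_file: it handles only plain "start-end" ranges, converts them to 0-based inclusive positions, and raises ValueError where the method itself is documented to raise PartitionException.

```python
import re

# e.g. "DNA, gene1 = 1-857"
RAXML_LINE = re.compile(
    r"^\s*(?P<model>[^,]+),\s*(?P<name>\S+)\s*=\s*"
    r"(?P<start>\d+)\s*-\s*(?P<end>\d+)\s*$"
)
# e.g. "charset gene2 = 858-1451;"
CHARSET_LINE = re.compile(
    r"^\s*charset\s+(?P<name>\S+)\s*=\s*"
    r"(?P<start>\d+)\s*-\s*(?P<end>\d+)\s*;\s*$",
    re.IGNORECASE,
)


def parse_partition_line(line):
    """Return (name, (start, end), model) for one partition definition."""
    for pattern in (RAXML_LINE, CHARSET_LINE):
        match = pattern.match(line)
        if match:
            fields = match.groupdict()
            start, end = int(fields["start"]), int(fields["end"])
            # Charset lines carry no model, so model is None for them
            return fields["name"], (start - 1, end - 1), fields.get("model")
    raise ValueError("cannot parse partition definition: {!r}".format(line))


print(parse_partition_line("DNA, gene1 = 1-857"))
print(parse_partition_line("charset gene2 = 858-1451;"))
```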
-
read_from_nexus_string(nx_string, file_name=None, return_res=False)
Parses a single nexus string with partition definition.
Parameters: nx_string : str
String with partition definition
file_name : str, optional
String with name of the file corresponding to the partition.
return_res : bool
If True, it will only return the parsed partition information. If False, it will add the parsed partition to the Partitions object.
-
remove_partition(partition_name=None, file_name=None)
Removes partitions.
Removes a partition by a given partition or file name. This will handle any necessary changes on the remaining partitions. The changes will be straightforward for most attributes, such as partitions_index, partitions_alignments and models, but the partitions attribute must be restructured because the ranges of the subsequent partitions have to be adjusted.
Parameters: partition_name : str
Name of the partition.
file_name : str
Name of the alignment file.
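The range adjustment described above can be pictured with this sketch, which shifts every partition located after the removed one by the removed length. It assumes inclusive, contiguous ranges; remove_and_shift is a hypothetical helper and does not touch the other attributes.

```python
from collections import OrderedDict


def remove_and_shift(partitions, name):
    """Return a copy without name, shifting the partitions that follow it."""
    (start, end), _ = partitions[name]
    removed_length = end - start + 1
    result = OrderedDict()
    for key, ((first, last), codon) in partitions.items():
        if key == name:
            continue
        if first > end:
            first, last = first - removed_length, last - removed_length
            if codon:
                codon = [c - removed_length for c in codon]
        result[key] = ((first, last), codon)
    return result


partitions = OrderedDict([
    ("partitionA", ((0, 856), False)),
    ("partitionB", ((857, 1450), [857, 858, 859])),
])
print(remove_and_shift(partitions, "partitionA"))
```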
-
reset(keep_alignments_range=False)
Clears partitions and attributes.
Clears partitions and resets object to __init__ state. The original alignment range can be retained by setting the keep_alignments_range argument to True.
Parameters: keep_alignments_range : bool
If True, the alignments_range attribute will not be reset.
-
set_length(length)
Sets the total length of the current locus (over all partitions).
Sets the length of the locus. This may be needed to convert certain partition notations, such as the use of "." to indicate the whole length of the alignment.
Parameters: length : int
Integer that will be set as partition_length.
-
set_model(partition, models, links=None, apply_all=False)
Sets substitution model for a given partition.
Parameters: partition : str
Partition name.
models : list
Model names for each of the three codon partitions. If there are no codon partitions, provide only a single element to the list.
links : list
Provide potential links between codon models. For example, if codon 1 and 2 are to be linked, it should be: links=["12", "3"]
apply_all : bool
If True, the current model will be applied to all partitions.
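The links notation can be read as groups of codon positions that share a model. A minimal sketch of that reading, using a hypothetical linked_groups helper:

```python
def linked_groups(links):
    """Map each codon position to the index of its link group."""
    groups = {}
    for index, link in enumerate(links):
        for position in link:
            groups[int(position)] = index
    return groups


# Codons 1 and 2 share group 0, codon 3 has its own group
print(linked_groups(["12", "3"]))
```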
-
split_partition(name, new_range=None, new_names=None)
Splits one partition into two.
Splits the partition called name into two, using the list of tuples provided by new_range. If new_range is None, the partition is split by its alignment files instead.
Parameters: name : str
Name of the partition to be split.
new_range : list or tuple, optional
List of two tuples, containing the ranges of the new partitions.
new_names : list, optional
The names of the new partitions.
-
write_to_file(output_format, output_file, model='LG')
Writes partitions to a file.
Writes the Partitions object into an output file according to the output_format. The supported output formats are RAxML and Nexus. The model option is for the RAxML format only.
Parameters: output_format : str
Output format of partitions file. Can be either “nexus” or “raxml”.
output_file : str
Path to output file.
model : str
Name of the model for the partitions. “raxml” format only.
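As a rough picture of the RAxML format, each partition becomes a "model, name = start-end" line with 1-based positions. The sketch below is an assumption about the layout, not the output of write_to_file; it ignores codon partitions, and raxml_lines is a hypothetical helper.

```python
from collections import OrderedDict


def raxml_lines(partitions, model="LG"):
    """Format partitions as RAxML-style lines with 1-based positions."""
    lines = []
    for name, ((start, end), _codon) in partitions.items():
        lines.append("{}, {} = {}-{}".format(model, name, start + 1, end + 1))
    return lines


partitions = OrderedDict([
    ("partitionA", ((0, 856), False)),
    ("partitionB", ((857, 1450), False)),
])
print("\n".join(raxml_lines(partitions)))
```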
-
class trifusion.process.data.Zorro(alignment_list, suffix='_zorro.out', zorro_dir=None)
Bases: object
Class that handles the concatenation of zorro weights.
Parameters: alignment_list : trifusion.process.sequence.AlignmentList
AlignmentList object.
suffix : str
Suffix of the zorro weight files, based on the corresponding input alignments.
zorro_dir : str
Path to directory where zorro weight files are stored.
Methods
- write_to_file(output_file): Creates a concatenated file with the zorro weights for the corresponding alignment files.
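Concatenation of weight files could be sketched as below. Several points are assumptions rather than documented behaviour: that each weight file holds one value per line, that its name is the alignment base name plus suffix, and that it sits next to the alignment unless zorro_dir is given. concatenate_weights is a hypothetical stand-in for the class.

```python
import os


def concatenate_weights(alignment_paths, output_file,
                        suffix="_zorro.out", zorro_dir=None):
    """Append the weight lines of every alignment, in order, to one file."""
    with open(output_file, "w") as out:
        for path in alignment_paths:
            base = os.path.splitext(os.path.basename(path))[0]
            directory = zorro_dir if zorro_dir else os.path.dirname(path)
            weight_file = os.path.join(directory, base + suffix)
            with open(weight_file) as handle:
                for line in handle:
                    if line.strip():  # skip blank lines
                        out.write(line.strip() + "\n")
```

With alignments "a.fas" and "b.fas", this would read "a_zorro.out" and "b_zorro.out" and write their lines, in that order, to output_file.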