trifusion.ortho.OrthomclToolbox module
-
class trifusion.ortho.OrthomclToolbox.Cluster(line_string)
Bases: object
Object for clusters of the OrthoMCL groups file. It sets a number of attributes that make subsequent filtering and processing much easier.
Methods
- apply_filter(gene_threshold, species_threshold): Updates two Cluster attributes, self.gene_flag and self.species_flag.
- parse_string(cluster_string): Parses the string and sets the group name and sequence list attributes.
- remove_taxa(taxa_list): Removes the taxa contained in taxa_list from self.sequences and ...
-
apply_filter(gene_threshold, species_threshold)
Updates two Cluster attributes, self.gene_flag and self.species_flag, which inform downstream objects whether this cluster respects the gene and species thresholds.
Parameters:
- gene_threshold – Integer, maximum number of gene copies per species
- species_threshold – Integer, minimum number of species present
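The two flags described above can be illustrated with a minimal sketch. It assumes cluster members are OrthoMCL-style "taxon|gene_id" strings; the function name cluster_flags is hypothetical and is not part of TriFusion.

```python
from collections import Counter


def cluster_flags(sequences, gene_threshold, species_threshold):
    """Return (gene_flag, species_flag) for a single cluster.

    Hypothetical helper for illustration only, not TriFusion's code.
    `sequences` is assumed to be a list of "taxon|gene_id" strings.
    """
    # Count how many gene copies each species contributes to the cluster
    copies = Counter(seq.split("|")[0] for seq in sequences)
    # Gene filter: no species may exceed the maximum copy number
    gene_flag = max(copies.values(), default=0) <= gene_threshold
    # Species filter: at least the minimum number of species is present
    species_flag = len(copies) >= species_threshold
    return gene_flag, species_flag


flags = cluster_flags(
    ["Hsap|g1", "Hsap|g2", "Mmus|g7"],
    gene_threshold=1,
    species_threshold=2,
)
print(flags)
```

In this example Hsap contributes two copies, so the gene flag is False, while two species are present, so the species flag is True.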
-
-
class trifusion.ortho.OrthomclToolbox.Group(groups_file, gene_threshold=None, species_threshold=None, project_prefix='MyGroups')
Bases: object
The main object of the OrthoMCL toolbox module. It is initialized with the file name of an OrthoMCL groups file and provides several methods that act on that groups file. To process multiple Group objects, see the MultiGroups object.
Methods
- bar_genecopy_distribution([dest, filt, ...]): Creates a bar plot with the distribution of gene copies across clusters.
- bar_species_coverage([dest, filt, ns, ...]): Creates a stacked bar plot with the proportion of ...
- bar_species_distribution([dest, filt, ns, ...]): Creates a bar plot with the distribution of species numbers across clusters.
- basic_group_statistics(): Creates a basic table in list format containing basic information of the groups file.
- exclude_taxa(taxa_list): Adds taxa to the excluded_taxa list and updates the filtered_groups list.
- export_filtered_group([output_file_name, ...]): Exports the filtered groups into a new file.
- get_filters(): Returns a tuple with the thresholds for max gene copies and min species.
- paralog_per_species_statistic([...]): Creates a CSV table with information on the number of paralog clusters per species.
- retrieve_sequences(database[, dest, mode, ...]): When provided with a database in Fasta format, uses the Alignment object to retrieve sequences.
- update_filtered_group(): Creates a new filtered group variable, like ...
- update_filters(gn_filter, sp_filter): Sets new values for self.species_threshold and self.gene_threshold and updates the filtered_group. gn_filter: int.
-
bar_genecopy_distribution(dest='./', filt=False, output_file_name='Gene_copy_distribution.png')
Creates a bar plot with the distribution of gene copies across clusters.
Parameters:
- dest – string, destination directory
- filt – Boolean, whether or not to use the filtered groups
- output_file_name – string, name of the output file
-
bar_species_coverage(dest='./', filt=False, ns=None, output_file_name='Species_coverage')
Creates a stacked bar plot with the proportion of ...
-
bar_species_distribution(dest='./', filt=False, ns=None, output_file_name='Species_distribution')
Creates a bar plot with the distribution of species numbers across clusters.
Parameters:
- dest – string, destination directory
- filt – Boolean, whether or not to use the filtered groups
- output_file_name – string, name of the output file
-
basic_group_statistics()
Creates a basic table in list format with summary information on the groups file: the total number of clusters, the total number of sequences, and the number of clusters that comply with the gene threshold, with the species threshold, and with both thresholds.
Returns: List of the form [total clusters, total sequences, clusters complying with the gene threshold, clusters complying with the species threshold, clusters complying with both thresholds]
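As a rough illustration of the returned list, the sketch below computes the five counts from an in-memory mapping of cluster names to "taxon|gene_id" members. The function name basic_stats and the input layout are assumptions made for the example. The gene threshold is treated as a maximum and the species threshold as a minimum, as in the Cluster.apply_filter description.

```python
from collections import Counter


def basic_stats(clusters, gene_threshold, species_threshold):
    """Return [total clusters, total sequences, gene-compliant clusters,
    species-compliant clusters, clusters compliant with both].

    Hypothetical helper; `clusters` maps a cluster name to a list of
    "taxon|gene_id" strings.
    """
    total_seqs = gene_ok = species_ok = both_ok = 0
    for members in clusters.values():
        total_seqs += len(members)
        copies = Counter(m.split("|")[0] for m in members)
        # Gene threshold is a maximum number of copies per species
        passes_gene = max(copies.values(), default=0) <= gene_threshold
        # Species threshold is a minimum number of species
        passes_species = len(copies) >= species_threshold
        gene_ok += passes_gene
        species_ok += passes_species
        both_ok += passes_gene and passes_species
    return [len(clusters), total_seqs, gene_ok, species_ok, both_ok]


example = {
    "grp1": ["A|1", "B|1", "C|1"],   # single copy, three species
    "grp2": ["A|1", "A|2", "B|1"],   # species A has two copies
    "grp3": ["A|1"],                 # only one species
}
stats = basic_stats(example, gene_threshold=1, species_threshold=2)
print(stats)  # [3, 7, 2, 2, 1]
```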
-
exclude_taxa(taxa_list)
Adds the taxa in taxa_list to the excluded_taxa list and updates the filtered_groups list.
-
export_filtered_group(output_file_name='filtered_groups', dest='./', get_stats=False, shared_namespace=None)
Exports the filtered groups into a new file.
Parameters:
- output_file_name – string, name of the filtered groups file
- dest – string, path to the directory where the filtered groups file will be created
- get_stats – Boolean, whether to return the basic count stats or not
- shared_namespace – Namespace object, for communicating with the main process
-
paralog_per_species_statistic(output_file_name='Paralog_per_species.csv', filt=True)
Creates a CSV table with information on the number of paralog clusters per species.
Parameters:
- output_file_name – string. Name of the output CSV file
- filt – Boolean. Whether to use the filtered groups (True) or total groups (False)
-
retrieve_sequences(database, dest='./', mode='fasta', filt=True, shared_namespace=None)
When provided with a database in Fasta format, uses the Alignment object to retrieve sequences.
Parameters:
- database – string. Fasta file
- dest – string. Path to the directory where the retrieved sequences will be saved
- mode – string, whether to retrieve sequences to a file ('fasta') or a dictionary ('dict')
- filt – Boolean. Whether to use the filtered groups (True) or total groups (False)
- shared_namespace – Namespace object. Meant for when sequences are retrieved in a background process and the main process needs to be informed of the changes made by this method
-
-
class trifusion.ortho.OrthomclToolbox.GroupLight(groups_file, gene_threshold=None, species_threshold=None, ns=None)
Bases: object
Analogous to the Group object, but with several changes to reduce memory usage.
Methods
- bar_genecopy_distribution([filt]): Creates a bar plot with the distribution of gene copies across clusters.
- bar_genecopy_per_species([filt])
- bar_species_coverage([filt]): Creates a stacked bar plot with the proportion of ...
- bar_species_distribution([filt])
- basic_group_statistics([update_stats])
- exclude_taxa(taxa_list[, update_stats]): Updates the excluded_taxa attribute and updates group statistics if update_stats is True.
- export_filtered_group([output_file_name, ...])
- groups(): Generator for the groups file.
- iter_species_frequency(): Iterable to use instead of the species_frequency attribute, so that the filtering of taxa does not change that attribute permanently.
- retrieve_sequences(sqldb, protein_db[, ...]): sqldb: string. Path to sqlite database file.
- update_filters(gn_filter, sp_filter[, ...]): Updates the group filter attributes and group summary stats if update_stats is True.
-
bar_genecopy_distribution(filt=False)
Creates a bar plot with the distribution of gene copies across clusters.
Parameters:
- filt – Boolean, whether or not to use the filtered groups
-
bar_species_coverage(filt=False)
Creates a stacked bar plot with the proportion of ...
-
exclude_taxa(taxa_list, update_stats=False)
Updates the excluded_taxa attribute and updates group statistics if update_stats is True. This does not change the Group object data permanently; it only sets an attribute that will be taken into account when plotting and exporting data.
Parameters:
- taxa_list – list. Taxa that should be excluded from downstream operations
- update_stats – Boolean. If True, the group statistics are updated
-
export_filtered_group(output_file_name='filtered_groups', dest='./', shared_namespace=None)
-
groups()
Generator for the groups file. This replaces the self.groups attribute of the original Group object. Instead of loading the whole file into memory, a generator is created to iterate over its contents. It may run a bit slower, but it is much more memory efficient.
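The generator approach described above can be sketched as follows. The function iter_groups is a hypothetical stand-in, and the line layout ("group_name: taxon|gene taxon|gene ...") is the usual OrthoMCL groups format; GroupLight's actual parsing may differ.

```python
import os
import tempfile


def iter_groups(path):
    """Lazily yield (group_name, members) tuples from a groups file.

    Hypothetical sketch: only one line is held in memory at a time,
    instead of a list with every cluster in the file.
    """
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, _, members = line.partition(":")
            yield name.strip(), members.split()


# Small demonstration with a temporary two-cluster file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("grp1: Hsap|g1 Mmus|g7\n\ngrp2: Hsap|g2\n")

parsed = list(iter_groups(tmp.name))
os.remove(tmp.name)
print(parsed)
```

The trade-off is the one noted in the description: every pass over the data re-reads the file, in exchange for memory use that does not grow with the number of clusters.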
-
iter_species_frequency()
To prevent permanent changes to the species_frequency attribute caused by the filtering of taxa, this iterable should be used instead of that attribute. It creates a temporary deepcopy of species_frequency, which is iterated over and eventually modified.
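A minimal sketch of the copy-then-iterate pattern is shown below. The class FrequencyHolder and the dictionary layout of species_frequency (taxon mapped to a dictionary of copy-number counts) are assumptions made for illustration; the real attribute may be structured differently.

```python
from copy import deepcopy


class FrequencyHolder:
    """Hypothetical stand-in for an object with species_frequency data."""

    def __init__(self, species_frequency, excluded_taxa=()):
        self.species_frequency = species_frequency
        self.excluded_taxa = set(excluded_taxa)

    def iter_species_frequency(self):
        # Work on a temporary deep copy so the stored attribute is untouched
        working = deepcopy(self.species_frequency)
        for taxon in self.excluded_taxa:
            working.pop(taxon, None)  # filtering only affects the copy
        yield from working.items()


holder = FrequencyHolder({"A": {1: 3}, "B": {1: 2}}, excluded_taxa=["B"])
visible = list(holder.iter_species_frequency())
print(visible)
print(holder.species_frequency)  # the original still contains taxon "B"
```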
-
retrieve_sequences(sqldb, protein_db, dest='./', shared_namespace=None, outfile=None)
Parameters:
- sqldb – string. Path to sqlite database file
- protein_db – string. Path to protein database file
- dest – string. Directory where sequences will be exported
- shared_namespace – Namespace object to communicate with TriFusion's main process
- outfile – If set, all sequences will instead be saved in a single output file. This is used for the nucleotide sequence export
-
update_filters(gn_filter, sp_filter, update_stats=False)
Updates the group filter attributes and group summary stats if update_stats is True. This method does not change the data of the Group object; it only sets attributes that will be taken into account when plotting or exporting data.
Parameters:
- gn_filter – integer. Maximum number of gene copies allowed in an ortholog cluster
- sp_filter – integer/float. Minimum number/proportion of taxa representation
- update_stats – Boolean. If True, the group summary statistics are updated
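Because sp_filter accepts either an absolute number or a proportion, one plausible way to turn it into a minimum taxa count is sketched below. The helper min_taxa, the rule that a float is a proportion of the total number of taxa, and the rounding up are all assumptions; the source does not state how TriFusion resolves the value.

```python
import math


def min_taxa(sp_filter, total_taxa):
    """Translate a species filter into a minimum number of taxa.

    Hypothetical helper. Assumption: a float is a proportion of
    `total_taxa` (rounded up), an integer is an absolute count.
    """
    if isinstance(sp_filter, float):
        if not 0.0 <= sp_filter <= 1.0:
            raise ValueError("a proportion must be between 0 and 1")
        return math.ceil(sp_filter * total_taxa)
    return int(sp_filter)


print(min_taxa(0.5, 9))  # proportion: half of 9 taxa, rounded up
print(min_taxa(4, 9))    # absolute count, used as given
```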
-
-
class trifusion.ortho.OrthomclToolbox.MultiGroups(groups_files=None, gene_threshold=None, species_threshold=None, project_prefix='MyGroups')
Bases: object
Creates an object composed of multiple Group objects.
Methods
- add_group(group_obj): Adds a group object.
- add_multigroups(multigroup_obj): Merges a MultiGroups object.
- bar_orthologs([output_file_name, dest, stats]): Creates a bar plot with the final ortholog values for each group file.
- basic_multigroup_statistics([output_file_name])
- get_gnames()
- get_group(group_id): Returns a group object based on its name.
- group_overlap(): Finds the overlap of orthologs between two group files.
- iter_gnames()
- remove_group(group_id): Removes a group object according to its name.
- update_filters(gn_filter, sp_filter[, ...]): Does not change the Group objects themselves, only the filter mapping.
-
add_multigroups(multigroup_obj)
Merges a MultiGroups object.
Parameters:
- multigroup_obj – MultiGroups object
-
bar_orthologs(output_file_name='Final_orthologs', dest='./', stats='total')
Creates a bar plot with the final ortholog values for each group file.
Parameters:
- output_file_name – string. Name of output file
- dest – string. Output directory
- stats – string. The statistics that should be used to generate the bar plot. Options are:
  - "1": Total orthologs
  - "2": Species compliant orthologs
  - "3": Gene compliant orthologs
  - "4": Final orthologs
  - "all": All of the above
Multiple options can be combined; for instance, "123" will display bars for total, species compliant and gene compliant stats.
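The way a combined stats string such as "123" maps onto the listed options can be sketched as below. The names STAT_LABELS and selected_stats are hypothetical, and the sketch only handles the options listed above, rejecting anything else.

```python
# Labels taken from the option list above; the mapping itself is illustrative
STAT_LABELS = {
    "1": "Total orthologs",
    "2": "Species compliant orthologs",
    "3": "Gene compliant orthologs",
    "4": "Final orthologs",
}


def selected_stats(stats):
    """Return the labels of the bars requested by a stats string."""
    if stats == "all":
        return list(STAT_LABELS.values())
    unknown = [char for char in stats if char not in STAT_LABELS]
    if unknown:
        raise ValueError(f"unsupported stats option(s): {unknown}")
    # dict.fromkeys drops repeated characters while keeping their order
    return [STAT_LABELS[char] for char in dict.fromkeys(stats)]


print(selected_stats("123"))
print(selected_stats("all"))
```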
-
basic_multigroup_statistics(output_file_name='multigroup_base_statistics.csv')
Parameters:
- output_file_name
-
get_group(group_id)
Returns a group object based on its name. If the name does not match any group object, returns None.
Parameters:
- group_id – string. Name of group object
-
group_overlap()
Finds the overlap of orthologs between two group files. THIS METHOD IS TEMPORARY AND EXPERIMENTAL
-
remove_group(group_id)
Removes a group object according to its name.
Parameters:
- group_id – string, name matching a Group object name attribute
-
update_filters(gn_filter, sp_filter, group_names=None, default=False)
Does not change the Group objects themselves, only the filter mapping. To reduce computations, the filter is only applied when a Group object is retrieved.
Parameters:
- gn_filter – int, filter for max gene copies
- sp_filter – int, filter for min species
- group_names – list, with names of group objects
-
-
class trifusion.ortho.OrthomclToolbox.MultiGroupsLight(db_path, groups=None, gene_threshold=None, species_threshold=None, project_prefix='MyGroups', ns=None)
Bases: object
Creates an object composed of multiple Group objects, like MultiGroups. However, instead of storing the groups in memory, they are shelved on disk.
Methods
- add_group(group_obj): Adds a group object.
- add_multigroups(multigroup_obj): Merges a MultiGroups object.
- bar_orthologs([group_names, ...]): Creates a bar plot with the final ortholog values for each group file.
- clear_groups(): Clears the current MultiGroupsLight object.
- get_group(group_id): Returns a group object based on its name.
- get_multigroup_statistics(group_obj)
- remove_group(group_id): Removes a group object according to its name.
- update_filters(gn_filter, sp_filter, ...[, ...]): Does not change the Group objects themselves, only the filter mapping.
-
add_multigroups(multigroup_obj)
Merges a MultiGroups object.
Parameters:
- multigroup_obj – MultiGroups object
-
bar_orthologs(group_names=None, output_file_name='Final_orthologs', dest='./', stats='all')
Creates a bar plot with the final ortholog values for each group file.
Parameters:
- group_names – list. If None, all groups in self.group_stats will be used to generate the plot. Otherwise, only the groups with the names in the list will be plotted
- output_file_name – string. Name of output file
- dest – string. Output directory
- stats – string. The statistics that should be used to generate the bar plot. Options are:
  - "1": Total orthologs
  - "2": Species compliant orthologs
  - "3": Gene compliant orthologs
  - "4": Final orthologs
  - "all": All of the above
Multiple options can be combined; for instance, "123" will display bars for total, species compliant and gene compliant stats.
-
calls = ['bar_genecopy_distribution', 'bar_species_distribution', 'bar_species_coverage', 'bar_genecopy_per_species']
-
get_group(group_id)
Returns a group object based on its name. If the name does not match any group object, returns None.
Parameters:
- group_id – string. Name of group object
-
remove_group(group_id)
Removes a group object according to its name.
Parameters:
- group_id – string, name matching a Group object name attribute
-
update_filters(gn_filter, sp_filter, excluded_taxa, group_names=None, default=False)
Does not change the Group objects themselves, only the filter mapping. To reduce computations, the filter is only applied when a Group object is retrieved.
Parameters:
- gn_filter – int, filter for max gene copies
- sp_filter – int, filter for min species
- group_names – list, with names of group objects
-