trifusion.process.base module
The base module includes the Base class, which is inherited by Alignment and AlignmentList classes and provides several methods of general use.
It also defines the CleanUp decorator used by TriSeq and TriStats to handle the generation of temporary data during their execution as well as keyboard interruptions.
class trifusion.process.base.Base
Bases: object

Methods
- autofinder(reference_file): Autodetects format, missing data symbol and sequence type.
- duplicate_taxa(taxa_list): Identifies duplicate items in a list.
- get_loci_taxa(loci_file): Gets the list of taxa from a .loci file.
- guess_code(sequence): Guesses the sequence type, i.e. DNA or protein.
- read_basic_csv(file_handle): Reads a basic CSV into a list.
- rm_illegal(taxon_string): Removes illegal characters from taxon name.
autofinder(reference_file)
Autodetects format, missing data symbol and sequence type.

Attempts to find the file format, missing data symbol and sequence type from a reference file. For performance reasons, this function reads reference_file only until it finds the first sequence. It then evaluates the file format and sequence type. It also attempts to get the missing data symbol, but the first sequence may have no missing data. In that case, the missing data symbol remains undefined and is re-evaluated during alignment parsing.

Parameters:
- reference_file : str - Path to the sequence file.
Returns:
- fmt : str - File format of reference_file.
- code : tuple - The sequence type as a string in the first element and the missing data symbol as a string in the second element.
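The "read only as far as needed" idea can be illustrated with a small format sniffer. This is a hedged sketch, not the module's implementation: the function name `sniff_format`, the returned format strings and the three header rules are all assumptions chosen for illustration.

```python
import io

def sniff_format(handle):
    """Guess a file format from the first non-blank line only (hypothetical rules)."""
    for line in handle:
        header = line.strip()
        if not header:
            continue  # skip leading blank lines
        if header.upper().startswith("#NEXUS"):
            return "nexus"
        if header.startswith(">"):
            return "fasta"
        fields = header.split()
        # Assumed Phylip-style header: "<number of taxa> <number of sites>"
        if len(fields) == 2 and all(field.isdigit() for field in fields):
            return "phylip"
        return None  # first informative line did not match any rule
    return None  # empty input

print(sniff_format(io.StringIO("\n>spa\nACGT\n")))
print(sniff_format(io.StringIO("3 120\nspa ACGT\n")))
```

Because the function returns on the first informative line, the rest of the file is never read, which is the performance trade-off the description mentions.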
static duplicate_taxa(taxa_list)
Identifies duplicate items in a list.

Used to detect, for instance, duplicated taxa in the Alignment object.

Parameters:
- taxa_list : list - List with taxon names.
Returns:
- duplicated_taxa : list - List with duplicated taxa from taxa_list.
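A minimal sketch of duplicate detection, assuming that each duplicated name should be reported once and in first-seen order (the source does not specify either). `find_duplicates` is a hypothetical stand-in, not the method itself.

```python
from collections import Counter

def find_duplicates(taxa_list):
    """Return the names that occur more than once, each reported once."""
    counts = Counter(taxa_list)
    # Counter keeps insertion order (Python 3.7+), so the output order is stable
    return [taxon for taxon, count in counts.items() if count > 1]

print(find_duplicates(["spa", "spb", "spa", "spc", "spb"]))  # ['spa', 'spb']
```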
static get_loci_taxa(loci_file)
Gets the list of taxa from a .loci file.

This is required prior to parsing the alignment in order to correctly add missing data when certain taxa are not present in a locus.

Parameters:
- loci_file : str - Path to the .loci file.
Returns:
- taxa_list : list - List of the taxon names as strings.
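The pre-scan can be sketched as follows. The layout is an assumption: sequence lines are taken to start with ">" followed by the taxon name and whitespace, and every other line (such as a "//" locus separator) is ignored. `loci_taxa` and the example data are hypothetical.

```python
import os
import tempfile

def loci_taxa(loci_file):
    """Collect the unique taxon names of a .loci file, in first-seen order."""
    taxa, seen = [], set()
    with open(loci_file) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # assumed: separators and blank lines carry no taxa
            fields = line[1:].split()
            if fields and fields[0] not in seen:
                seen.add(fields[0])
                taxa.append(fields[0])
    return taxa

# Two made-up loci; "spb" is missing from the second and "spc" from the first
example = (
    ">spa    ACGTACGT\n"
    ">spb    ACGTACGA\n"
    "//             *  |0|\n"
    ">spa    TTGCA\n"
    ">spc    TTGCC\n"
    "//          *  |1|\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".loci", delete=False) as tmp:
    tmp.write(example)
try:
    demo_taxa = loci_taxa(tmp.name)
finally:
    os.remove(tmp.name)
print(demo_taxa)  # ['spa', 'spb', 'spc']
```

Knowing the full taxon list up front is what allows a parser to pad "spb" and "spc" with missing data in the loci where they are absent.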
static guess_code(sequence)
Guesses the sequence type, i.e. DNA or protein.

Guesses the code of the provided sequence, that is, whether it is a DNA or protein sequence, based on the first sequence of the reference file. The second item of the returned code variable is a placeholder for the missing data symbol (either "?", "n" or "x"). This symbol will be evaluated during the Alignment parsing.

Parameters:
- sequence : string - Sequence string that will be used to guess the type.
Returns:
- code : list - Provides the (<sequence type>, <missing symbol character>). The sequence type can be either "DNA" or "Protein". The missing symbol character may be "?", "n", "x" or None (see Notes).

See also: autofinder, process.sequence.Alignment.read_alignment

Notes
The missing symbol character in the returned code may be None if it cannot be determined from the first reference sequence, which is the one used to guess the sequence type. If none of the potential missing data symbols is present in this reference sequence, the symbol remains undetermined and is evaluated during the alignment parsing.
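One way such a guess could work is a composition heuristic. The sketch below is an assumption throughout: the 0.9 cutoff, the set of characters counted as nucleotides, the order in which missing symbols are checked and the name `guess_sequence_type` are illustrative and not taken from the source.

```python
def guess_sequence_type(sequence, dna_cutoff=0.9):
    """Return [sequence type, missing symbol or None] for one sequence."""
    lowered = sequence.lower()
    # Ignore gaps and "?" when measuring composition
    residues = lowered.replace("-", "").replace("?", "")
    if not residues:
        raise ValueError("sequence contains no residues")
    nucleotide_count = sum(residues.count(char) for char in "acgtn")
    is_dna = nucleotide_count / len(residues) > dna_cutoff
    seq_type = "DNA" if is_dna else "Protein"
    # The missing symbol stays None when the sequence shows no candidate
    if "?" in lowered:
        missing = "?"
    elif is_dna and "n" in lowered:
        missing = "n"
    elif not is_dna and "x" in lowered:
        missing = "x"
    else:
        missing = None
    return [seq_type, missing]

print(guess_sequence_type("ATGCNNATGC"))  # ['DNA', 'n']
print(guess_sequence_type("MKVLAAGIXX"))  # ['Protein', 'x']
print(guess_sequence_type("ATGCATGC"))    # ['DNA', None]
```

The last call shows the case described in the Notes: the sequence type is clear, but no missing data symbol can be inferred from this sequence alone.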
static read_basic_csv(file_handle)
Reads a basic CSV into a list.

Parses a simple CSV file with only one column and one or more lines, stripping whitespace from each line.

Parameters:
- file_handle : file object - File object of the CSV file.
Returns:
- result : list - Contents of the CSV file with each line in a different entry.
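A minimal sketch of a single-column reader. Skipping blank lines is an assumption (the source only mentions stripping whitespace), and `read_single_column` is a hypothetical name.

```python
import io

def read_single_column(file_handle):
    """Return one stripped entry per non-blank line of an open text file."""
    return [line.strip() for line in file_handle if line.strip()]

print(read_single_column(io.StringIO(" spa \nspb\n\n  spc\n")))  # ['spa', 'spb', 'spc']
```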
 
class trifusion.process.base.CleanUp(func)
Bases: object

Decorator class that handles temporary data for TriSeq and TriStats.

This decorator class wraps the main execution functions of the TriSeq and TriStats programs. The __init__ requires only the function reference, defines the name of the temporary directory and evaluates whether the provided func is from TriSeq or TriStats. The only requirement of func is that its first argument is the argparse namespace (that is, the arguments must be parsed in the corresponding program before calling the main function).

Parameters:
- func : function - Main function of TriSeq or TriStats.

Attributes
- func : function - Main function of TriSeq or TriStats.
- temp_dir : str - Path to the temporary directory where the temporary data will be stored.
- idx : int - Integer identifier of the main function's program: 0 for TriSeq and 2 for TriStats.

Methods
- __call__(*args): Wraps the call of func.
__call__(*args)
Wraps the call of func.

When the main func is called, this code is wrapped around its execution. This mainly ensures that the temporary data stored in temp_dir is correctly removed at the end of the execution, whether it terminates successfully or not. It also clocks the duration of the execution.

Parameters:
- args : list - Arbitrary list of positional arguments of func. The only requirement is that the first element is the argparse namespace object.
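The wrapping pattern described above can be sketched as a class-based decorator built on try/finally. Everything here is illustrative: `TempDirCleanUp`, the temporary directory name, the `elapsed` attribute and the choice to swallow KeyboardInterrupt are assumptions, not the actual CleanUp code.

```python
import os
import shutil
import tempfile
import time
from argparse import Namespace

class TempDirCleanUp:
    """Class decorator sketch: temp dir removal, interrupt handling, timing."""

    def __init__(self, func):
        self.func = func
        # Hypothetical directory name; it is only created when func is called
        self.temp_dir = os.path.join(
            tempfile.gettempdir(), "cleanup_sketch_{}".format(os.getpid()))
        self.elapsed = None

    def __call__(self, *args):
        start = time.time()
        os.makedirs(self.temp_dir, exist_ok=True)
        try:
            return self.func(*args)
        except KeyboardInterrupt:
            # Assumed behavior: report the interruption and return None
            print("Interrupted by user")
        finally:
            # Runs on success, error and interruption alike
            shutil.rmtree(self.temp_dir, ignore_errors=True)
            self.elapsed = time.time() - start

@TempDirCleanUp
def main(namespace):
    # "main" now names the decorator instance, so temp_dir is reachable from it
    with open(os.path.join(main.temp_dir, "scratch.txt"), "w") as handle:
        handle.write(namespace.message)
    return namespace.message.upper()

result = main(Namespace(message="done"))
print(result, os.path.exists(main.temp_dir))  # DONE False
```

Placing the removal in the finally clause is what guarantees the cleanup regardless of how the wrapped function exits.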
 
trifusion.process.base.merger(ranges)
Generator that merges continuous ranges in a list of tuples.

Parameters:
- ranges : list - List of tuples, each with two integer elements defining a range, for instance (0, 100).
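A hedged sketch of such a generator. The merge rule is an assumption: ranges are treated as inclusive integer intervals, and two ranges are merged when they overlap or are directly adjacent. `merge_ranges` is a hypothetical name, and the input is sorted first so that unordered lists also work.

```python
def merge_ranges(ranges):
    """Yield merged (start, end) tuples from a list of integer ranges."""
    ordered = sorted(ranges)
    if not ordered:
        return
    start, end = ordered[0]
    for next_start, next_end in ordered[1:]:
        if next_start <= end + 1:
            # Overlapping or adjacent: extend the current range
            end = max(end, next_end)
        else:
            yield (start, end)
            start, end = next_start, next_end
    yield (start, end)

print(list(merge_ranges([(0, 100), (101, 200), (300, 400), (350, 420)])))
# [(0, 200), (300, 420)]
```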
trifusion.process.base.print_col(text, color, i=0, quiet=False)
Custom print function for terminal updates of CLI programs.

This print function homogenizes the progress logging of all CLI programs while providing some freedom in the formatting of the progress message, namely the colors of the messages. A list of terminal colors is provided and a has_colors function checks whether they are supported on the terminal. The colors in use are green for normal logging, yellow for warnings and red for errors. The final formatting of the message is something like:

[<Program Name>[-Error/Warning]] <message>

Parameters:
- text : str - The message that will appear in the terminal.
- color : variable reference - Reference to the terminal colors defined in process.base. Provided that they are imported in the module where they are being called, the options are: {GREEN, YELLOW, RED}.
- i : int - Integer that specifies which CLI program is being logged. The options are: 0: TriSeq, 1: OrthoMCL Pipeline, 2: TriStats, 3: TriOrtho.
- quiet : bool - Determines whether the message is logged. If True, no messages are printed to the terminal.
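The message format can be illustrated with a small stand-alone logger. This is a sketch under assumptions: the color constants are plain ANSI escape strings chosen for the example (the module's own constants may differ), `format_message` and `log_message` are hypothetical names, and color support is approximated with `isatty()` rather than the has_colors function mentioned above.

```python
import io
import sys

# Assumed ANSI escape codes standing in for the module's color constants
GREEN, YELLOW, RED = "\033[32m", "\033[33m", "\033[31m"
RESET = "\033[0m"
PROGRAMS = ("TriSeq", "OrthoMCL Pipeline", "TriStats", "TriOrtho")
SUFFIXES = {GREEN: "", YELLOW: "-Warning", RED: "-Error"}

def format_message(text, color, i=0):
    """Build '[<Program Name>[-Error/Warning]] <message>' without color codes."""
    return "[{}{}] {}".format(PROGRAMS[i], SUFFIXES[color], text)

def log_message(text, color, i=0, quiet=False, stream=None):
    """Write the formatted message unless quiet; colorize only on terminals."""
    if quiet:
        return
    stream = sys.stdout if stream is None else stream
    message = format_message(text, color, i)
    if hasattr(stream, "isatty") and stream.isatty():
        message = color + message + RESET
    stream.write(message + "\n")

buffer = io.StringIO()
log_message("Parsing 10 alignments", GREEN, stream=buffer)
log_message("Empty alignment", YELLOW, i=2, stream=buffer)
log_message("never shown", RED, quiet=True, stream=buffer)
print(buffer.getvalue(), end="")
# [TriSeq] Parsing 10 alignments
# [TriStats-Warning] Empty alignment
```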