trifusion.process.base module
The base module includes the Base class, which is inherited by Alignment and AlignmentList classes and provides several methods of general use.
It also defines the CleanUp decorator used by TriSeq and TriStats to handle the generation of temporary data during their execution as well as keyboard interruptions.
class trifusion.process.base.Base
Bases: object
Methods
autofinder(reference_file): Autodetects format, missing data symbol and sequence type.
duplicate_taxa(taxa_list): Identifies duplicate items in a list.
get_loci_taxa(loci_file): Get the list of taxa from a .loci file.
guess_code(sequence): Guess the sequence type (DNA or Protein).
read_basic_csv(file_handle): Reads a basic CSV into a list.
rm_illegal(taxon_string): Removes illegal characters from taxon name.
autofinder(reference_file)
Autodetects format, missing data symbol and sequence type.
Attempts to find the file format, missing data symbol and sequence type from a reference file. For performance reasons, this function only reads reference_file until it finds the first sequence. It then evaluates the file format and sequence type. It also attempts to get the missing data symbol, but the first sequence may contain no missing data. In that case, the missing data symbol remains undefined and is re-evaluated during alignment parsing.
Parameters: reference_file : str
Path to sequence file
Returns: fmt : str
File format of reference_file
code : tuple
The sequence type as string in first element and missing data symbol as string in the second element
static duplicate_taxa(taxa_list)
Identifies duplicate items in a list.
Used to detect, for instance, duplicated taxa in the Alignment object.
Parameters: taxa_list : list
List with taxon names
Returns: duplicated_taxa : list
List with duplicated taxa from taxa_list
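The duplicate detection described above can be sketched as follows. This is a minimal illustration and not TriFusion's own code: the name find_duplicates is hypothetical, and reporting each duplicated name once, in first-seen order, is an assumption.

```python
from collections import Counter


def find_duplicates(taxa_list):
    """Return the taxon names that occur more than once.

    Hypothetical helper: each duplicated name is reported once, in the
    order in which it first appears in taxa_list.
    """
    counts = Counter(taxa_list)
    # Counter keeps insertion order (Python 3.7+), so this preserves
    # first-seen order.
    return [taxon for taxon in counts if counts[taxon] > 1]


duplicated = find_duplicates(["spa", "spb", "spa", "spc", "spb"])
print(duplicated)
```

An empty result means every taxon name in the list is unique.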
static get_loci_taxa(loci_file)
Get the list of taxa from a .loci file.
This is required prior to parsing the alignment in order to correctly add missing data when certain taxa are not present in a locus.
Parameters: loci_file : str
Path to the .loci file.
Returns: taxa_list : list
List of the taxon names as strings.
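A rough sketch of collecting taxon names from a .loci file is shown below. It assumes a pyRAD-style layout in which sequence lines start with ">" followed by the taxon name and locus separator lines start with "//". Both the layout and the function name loci_taxa are assumptions for illustration, not the behaviour of get_loci_taxa itself.

```python
import os
import tempfile


def loci_taxa(loci_file):
    """Collect unique taxon names from a .loci file (assumed layout)."""
    taxa = []
    with open(loci_file) as handle:
        for line in handle:
            # Assumption: only sequence lines start with ">".
            if not line.startswith(">"):
                continue
            fields = line[1:].split()
            if fields and fields[0] not in taxa:
                taxa.append(fields[0])
    return taxa


# Small demonstration file with two loci; "spb" and "spc" are each
# missing from one locus.
sample = (
    ">spa    ACGT\n>spb    ACGA\n//    *\n"
    ">spa    TTGA\n>spc    TTGC\n//\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".loci", delete=False) as tmp:
    tmp.write(sample)

taxa = loci_taxa(tmp.name)
os.remove(tmp.name)
print(taxa)
```

With the full taxon list known in advance, a parser can fill in missing data for any taxon that is absent from a given locus.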
static guess_code(sequence)
Guess the sequence type (DNA or Protein).
Guesses the code of the provided sequence, that is, whether it is a DNA or Protein sequence, based on the first sequence of the reference file. The second item of the returned code variable is a placeholder for the missing data symbol (can be either "?", "n" or "x"). This symbol will be evaluated during the Alignment parsing.
Parameters: sequence : string
Sequence string that will be used to guess the type
Returns: code : list
Provides the (<sequence type>, <missing symbol character>). The sequence type can be either "DNA" or "Protein". The missing symbol character may be "?", "n", "x" or None (See Notes).
See also
autofinder, process.sequence.Alignment.read_alignment
Notes
The missing symbol character that is provided in the returned code tuple may be None if the missing character cannot be determined from the first reference sequence that is used to guess the sequence type. If none of the potential missing data symbols is present in this reference sequence, it will remain undetermined and will be evaluated during the alignment parsing.
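One way such a guess could work is sketched below. The heuristic (treating a sequence as DNA when more than half of its non-gap characters are A, C, G or T), the order in which missing symbols are checked, and the name guess_sequence_type are all assumptions made for illustration; the actual rule used by guess_code may differ.

```python
def guess_sequence_type(sequence):
    """Return (sequence type, missing symbol or None) for a sequence.

    Hypothetical heuristic: DNA if more than 50% of the non-gap
    characters are A, C, G or T; otherwise Protein.
    """
    residues = sequence.upper().replace("-", "")
    if not residues:
        raise ValueError("cannot guess the type of an empty sequence")

    dna_fraction = sum(char in "ACGT" for char in residues) / len(residues)
    seq_type = "DNA" if dna_fraction > 0.5 else "Protein"

    # Candidate missing data symbols for each type (assumed order).
    candidates = ("n", "?") if seq_type == "DNA" else ("x", "?")
    lowered = sequence.lower()
    missing = next((sym for sym in candidates if sym in lowered), None)
    return seq_type, missing


print(guess_sequence_type("ACGTNNACGT"))
print(guess_sequence_type("MKVLAAGIXX"))
print(guess_sequence_type("ACGTACGT"))
```

The last call illustrates the case described in the Notes: the sequence contains no candidate missing symbol, so the second element is None and has to be resolved later.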
static read_basic_csv(file_handle)
Reads a basic CSV into a list.
Parses a simple CSV file with only one column and one or more lines while stripping whitespace.
Parameters: file_handle : file object
File object of the CSV file
Returns: result : list
Contents of the CSV file with each line in a different entry
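The parsing described above amounts to little more than the following sketch. The name read_single_column is hypothetical, and skipping blank lines is an assumption that the description does not confirm.

```python
import io


def read_single_column(file_handle):
    """Read a one-column file into a list, stripping whitespace."""
    # Assumption: lines that are empty after stripping are skipped.
    return [line.strip() for line in file_handle if line.strip()]


# An in-memory file object stands in for an opened CSV file.
handle = io.StringIO("spa\n  spb  \n\nspc\n")
entries = read_single_column(handle)
print(entries)
```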
class trifusion.process.base.CleanUp(func)
Bases: object
Decorator class that handles temporary data for TriSeq and TriStats.
This decorator class wraps the main execution functions of the TriSeq and TriStats programs. The __init__ requires only the function reference, defines the name of the temporary directory and evaluates whether the provided func is from TriSeq or TriStats. The only requirement of func is that its first argument is the argparse namespace (that is, the arguments must be parsed in the corresponding program before calling the main function).
Parameters: func : function
Main function of TriSeq or TriStats
Attributes
func (function): Main function of TriSeq or TriStats
temp_dir (str): Path to temporary directory where the temporary data will be stored
idx (int): Integer identifier of the main function's program. 0 for TriSeq and 2 for TriStats
Methods
__call__(*args): Wraps the call of func.
__call__(*args)
Wraps the call of func.
When the main func is called, this code is wrapped around its execution. This mainly ensures that the temporary data stored in temp_dir is correctly removed at the end of the execution, whether it terminates successfully or not. It also clocks the duration of the execution.
Parameters: args : list
Arbitrary list of positional arguments of func. The only requirement is that the first element is the argparse namespace object.
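The general pattern of a decorator class that cleans up temporary data and times the call can be sketched as below. This is not the CleanUp implementation: the class name TempDirCleanup, the directory name, the elapsed attribute and the message field of the namespace are all hypothetical, and the idx detection described above is left out.

```python
import argparse
import os
import shutil
import tempfile
import time


class TempDirCleanup:
    """Hypothetical decorator class: temp dir cleanup plus timing."""

    def __init__(self, func):
        self.func = func
        # Only the name is defined here; the directory itself is
        # created when the wrapped function is called.
        self.temp_dir = os.path.join(
            tempfile.gettempdir(), "cleanup_sketch_{}".format(os.getpid())
        )
        self.elapsed = None

    def __call__(self, *args):
        start = time.time()
        os.makedirs(self.temp_dir, exist_ok=True)
        try:
            return self.func(*args)
        finally:
            # Runs on success, on errors and on KeyboardInterrupt.
            shutil.rmtree(self.temp_dir, ignore_errors=True)
            self.elapsed = time.time() - start


@TempDirCleanup
def main(namespace):
    # The first argument is the parsed argparse namespace.
    scratch = os.path.join(main.temp_dir, "scratch.txt")
    with open(scratch, "w") as handle:
        handle.write(namespace.message)
    return os.path.exists(scratch)


created = main(argparse.Namespace(message="temporary data"))
print(created, os.path.exists(main.temp_dir))
```

Placing the removal in a finally clause is what makes the cleanup independent of how the wrapped function ends.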
trifusion.process.base.merger(ranges)
Generator that merges continuous ranges of tuples in a list.
Parameters: ranges : list
List of tuples, each with two integer elements defining a range, (0, 100) for instance.
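A generator of this kind could look like the sketch below. The exact merging rule of merger is not stated here, so this example assumes that ranges are merged when they overlap or touch (the end of one equals the start of the next) and that the input may be unsorted; the name merge_ranges is hypothetical.

```python
def merge_ranges(ranges):
    """Yield merged (start, end) tuples from a list of ranges.

    Assumption: two ranges are merged when the next one starts at or
    before the end of the current one.
    """
    current = None
    for start, end in sorted(ranges):
        if current is None:
            current = [start, end]
        elif start <= current[1]:
            # Overlapping or touching: extend the current range.
            current[1] = max(current[1], end)
        else:
            yield tuple(current)
            current = [start, end]
    if current is not None:
        yield tuple(current)


merged = list(merge_ranges([(0, 100), (100, 200), (300, 400)]))
print(merged)
```

Because it is a generator, the merged ranges are produced lazily and need to be wrapped in list() when a list is required.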
trifusion.process.base.print_col(text, color, i=0, quiet=False)
Custom print function for terminal updates of CLI programs.
This print function homogenizes the progress logging of all CLI programs while providing some freedom on the formatting of the progress message, namely the colors of the messages. A list of terminal colors is provided and a has_colors function checks whether they are supported on the terminal. The colors in use are green for normal logging, yellow for warnings and red for errors. The final formatting of the message is something like:
[<Program Name>[-Error/Warning]] <message>
Parameters: text : str
The message that will appear in the terminal
color : variable reference
Reference to the terminal colors defined in process.base. Provided that they are imported in the module where they are being called, the options are: {GREEN, YELLOW, RED}
i : int
Integer that specifies which CLI program is being logged. The options are:
0: TriSeq
1: OrthoMCL Pipeline
2: TriStats
3: TriOrtho
quiet : bool
Determines whether the message is logged. If True, no messages are printed to the terminal
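The message format and the quiet switch can be illustrated with the sketch below. It is not the print_col implementation: it assumes ANSI escape codes for the colors and uses isatty() as a stand-in for the has_colors check, and the names format_message, log, PROGRAMS and SUFFIX are hypothetical.

```python
import io
import sys

# Assumed ANSI escape codes; the real color constants may differ.
GREEN, YELLOW, RED = "\033[32m", "\033[33m", "\033[31m"
RESET = "\033[0m"

PROGRAMS = {0: "TriSeq", 1: "OrthoMCL Pipeline", 2: "TriStats", 3: "TriOrtho"}
SUFFIX = {GREEN: "", YELLOW: "-Warning", RED: "-Error"}


def format_message(text, color, i=0, use_color=True):
    """Build '[<Program Name>[-Error/Warning]] <message>'."""
    label = "[{}{}]".format(PROGRAMS[i], SUFFIX[color])
    if use_color:
        label = color + label + RESET
    return "{} {}".format(label, text)


def log(text, color, i=0, quiet=False, stream=None):
    """Write the formatted message unless quiet is True."""
    if quiet:
        return
    stream = sys.stdout if stream is None else stream
    # Only colorize when the stream is an interactive terminal.
    use_color = hasattr(stream, "isatty") and stream.isatty()
    stream.write(format_message(text, color, i, use_color) + "\n")


buffer = io.StringIO()
log("Parsing 10 alignments", GREEN, i=0, stream=buffer)
log("Missing taxa found", YELLOW, i=2, stream=buffer)
log("This is never written", RED, quiet=True, stream=buffer)
print(buffer.getvalue(), end="")
```

Separating the formatting from the writing keeps the message layout easy to check without a terminal.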