trifusion.process.base module

The base module includes the Base class, which is inherited by Alignment and AlignmentList classes and provides several methods of general use.

It also defines the CleanUp decorator used by TriSeq and TriStats to handle the generation of temporary data during their execution as well as keyboard interruptions.

class trifusion.process.base.Base[source]

Bases: object

Methods

autofinder(reference_file) Autodetects format, missing data symbol and sequence type.
duplicate_taxa(taxa_list) Identifies duplicate items in a list.
get_loci_taxa(loci_file) Get the list of taxa from a .loci file.
guess_code(sequence) Guess the sequence type (DNA or protein).
read_basic_csv(file_handle) Reads a basic CSV into a list.
rm_illegal(taxon_string) Removes illegal characters from taxon name.

autofinder(reference_file)[source]

Autodetects format, missing data symbol and sequence type.

Attempts to find the file format, missing data symbol and sequence type from a reference file. For performance reasons, this function only reads reference_file until it finds the first sequence. It then evaluates the file format and sequence type. It also attempts to get the missing data symbol, but the first sequence may contain no missing data. In that case, the missing data symbol remains undefined and is re-evaluated during alignment parsing.

Parameters:

reference_file : str

Path to sequence file

Returns:

fmt : str

File format of reference_file

code : tuple

The sequence type as a string in the first element and the missing data symbol as a string in the second element

See also

guess_code

static duplicate_taxa(taxa_list)[source]

Identifies duplicate items in a list.

Used to detect, for instance, duplicated taxa in the Alignment object

Parameters:

taxa_list : list

List with taxon names

Returns:

duplicated_taxa : list

List with duplicated taxa from taxa_list
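
As an illustration of the kind of check described above, the following is a minimal sketch and not TriFusion's own code. The function name find_duplicates is hypothetical, and reporting each duplicated name only once, in first-seen order, is an assumption about the desired output.

```python
from collections import Counter


def find_duplicates(taxa_list):
    """Return the names that occur more than once, in first-seen order."""
    # Counter keeps insertion order (Python 3.7+), so the result follows
    # the order in which each name first appeared in taxa_list.
    counts = Counter(taxa_list)
    return [taxon for taxon, n in counts.items() if n > 1]


duplicates = find_duplicates(["spa", "spb", "spa", "spc", "spb"])
print(duplicates)
```

An empty result means every taxon name in the list is unique.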

static get_loci_taxa(loci_file)[source]

Get the list of taxa from a .loci file.

This is required prior to parsing the alignment in order to correctly add missing data when certain taxa are not present in a locus

Parameters:

loci_file : str

Path to the .loci file.

Returns:

taxa_list : list

List of the taxon names as strings.

static guess_code(sequence)[source]

Guess the sequence type (DNA or protein).

Guesses the code of the provided sequence, that is, whether it is a DNA or Protein sequence, based on the first sequence of the reference file. The second item of the returned code variable is a placeholder for the missing data symbol (either "?", "n" or "x"). This symbol will be evaluated during the Alignment parsing.

Parameters:

sequence : string

Sequence string that will be used to guess the type

Returns:

code : list

Provides the pair (<sequence type>, <missing symbol character>). The sequence type can be either "DNA" or "Protein". The missing symbol character may be "?", "n", "x" or None (see Notes).

See also

autofinder, process.sequence.Alignment.read_alignment

Notes

The missing symbol character provided in the returned code tuple may be None if it cannot be determined from the first reference sequence, which is the one used to guess the sequence type. If none of the potential missing data symbols is present in this reference sequence, the symbol remains undetermined and is evaluated during alignment parsing.
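
One common way to make this kind of guess is to measure how much of the sequence consists of nucleotide characters. The sketch below shows that heuristic under stated assumptions: the function name guess_sequence_type, the 0.9 cutoff, and the default placeholder symbols ("n" for DNA, "x" for Protein) are illustrative choices and are not confirmed by this documentation.

```python
def guess_sequence_type(sequence, cutoff=0.9):
    """Classify a sequence as DNA or Protein from its character composition."""
    # Gaps say nothing about the alphabet, so they are dropped first.
    residues = sequence.upper().replace("-", "")
    if not residues:
        raise ValueError("sequence contains no residues")

    # Fraction of characters that are nucleotides (N included as ambiguity).
    dna_count = sum(residues.count(base) for base in "ACGTN")
    if dna_count / len(residues) > cutoff:
        return ("DNA", "n")      # assumed placeholder symbol for DNA
    return ("Protein", "x")      # assumed placeholder symbol for Protein


print(guess_sequence_type("ACGTNNACGT--"))
print(guess_sequence_type("MKVLAAGIVGLLLAQ"))
```

A composition-based guess can misclassify very short sequences or proteins that happen to be rich in A, C, G and T, which is one reason to treat the result as provisional.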

static read_basic_csv(file_handle)[source]

Reads a basic CSV into a list.

Parses a simple CSV file with only one column and one or more lines, stripping whitespace from each entry.

Parameters:

file_handle : file object

File object of the CSV file

Returns:

result : list

Contents of the CSV file with each line in a different entry
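
A minimal sketch of single-column parsing along these lines is shown below. It is not the library's implementation: the name read_single_column is hypothetical, and skipping blank lines is an assumption that the description above does not state.

```python
import io


def read_single_column(file_handle):
    """Return the stripped, non-empty lines of an open text file as a list."""
    result = []
    for line in file_handle:
        entry = line.strip()
        if entry:  # assumption: blank lines are ignored
            result.append(entry)
    return result


# io.StringIO stands in for a real file opened with open(path).
handle = io.StringIO("  spa \nspb\n\n  spc\n")
print(read_single_column(handle))
```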

static rm_illegal(taxon_string)[source]

Removes illegal characters from taxon name.

The illegal characters are defined in the illegal_chars variable

Parameters:

taxon_string : str

Taxon name

Returns:

clean_name : str

Taxon name without illegal characters
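
The filtering step can be sketched as follows. The contents of ILLEGAL_CHARS here are hypothetical (characters that commonly clash with Newick or Nexus syntax), since the actual value of illegal_chars is not listed in this documentation, and remove_illegal is an illustrative name.

```python
# Hypothetical set of characters; the real illegal_chars may differ.
ILLEGAL_CHARS = (":", ",", ";", "(", ")", "[", "]", "'", '"')


def remove_illegal(taxon_string, illegal_chars=ILLEGAL_CHARS):
    """Return taxon_string with every character in illegal_chars removed."""
    return "".join(ch for ch in taxon_string if ch not in illegal_chars)


print(remove_illegal("Homo_sapiens(1);"))
```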

class trifusion.process.base.CleanUp(func)[source]

Bases: object

Decorator class that handles temporary data for TriSeq and TriStats.

This decorator class wraps the main execution functions of TriSeq and TriStats programs. The __init__ requires only the function reference, defines the name of the temporary directory and evaluates whether the provided func is from TriSeq or TriStats. The only requirement of func is that its first argument is the argparser namespace (that is, the arguments must be parsed in the corresponding program before calling the main function).

Parameters:

func : function

Main function of TriSeq or TriStats

See also

print_col

Attributes

func (function) Main function of TriSeq or TriStats
temp_dir (str) Path to temporary directory where the temporary data will be stored
idx (int) Integer identifier of the main function’s program. 0 for TriSeq and 2 for TriStats

Methods

__call__(*args) Wraps the call of func.

__call__(*args)[source]

Wraps the call of func.

When the main func is called, this code is wrapped around its execution. This mainly ensures that the temporary data stored in temp_dir is correctly removed at the end of the execution, whether it terminates successfully or not. It also clocks the duration of the execution.

Parameters:

args : list

Arbitrary list of positional arguments of func. The only requirement is that the first element is the argparse namespace object.
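
The general pattern of a decorator class that guarantees cleanup can be sketched as below. This is an illustration under stated assumptions, not the CleanUp source: the class name TempDirCleanup, the temporary directory location, the elapsed attribute and the example main function are all hypothetical.

```python
import argparse
import os
import shutil
import tempfile
import time


class TempDirCleanup:
    """Hypothetical decorator class: run func, then always remove temp_dir."""

    def __init__(self, func):
        self.func = func
        # Assumed location: a per-process directory under the system temp dir.
        self.temp_dir = os.path.join(
            tempfile.gettempdir(), "sketch_tmp_{}".format(os.getpid()))
        self.elapsed = None

    def __call__(self, *args):
        start = time.time()
        os.makedirs(self.temp_dir, exist_ok=True)
        try:
            return self.func(*args)
        except KeyboardInterrupt:
            print("Interrupted by user, removing temporary data")
        finally:
            # Runs on success, on error and on Ctrl+C alike.
            shutil.rmtree(self.temp_dir, ignore_errors=True)
            self.elapsed = time.time() - start


@TempDirCleanup
def main(namespace):
    # First argument is the parsed argparse namespace, as required above.
    scratch = os.path.join(main.temp_dir, "scratch.txt")
    with open(scratch, "w") as handle:
        handle.write(namespace.message)
    return namespace.message.upper()


result = main(argparse.Namespace(message="done"))
print(result, os.path.exists(main.temp_dir))
```

Placing the removal in a finally clause is what makes the cleanup independent of how the wrapped function exits.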

trifusion.process.base.merger(ranges)[source]

Generator that merges contiguous ranges from a list of tuples.

Parameters:

ranges : list

List of tuples, each with two integer elements defining a range, (0, 100) for instance.
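
A generator of this kind can be sketched as follows. The name merge_ranges is hypothetical, and two details are assumptions rather than documented behavior: the input is sorted first, and ranges are merged when they overlap or are directly adjacent (for example (0, 100) and (101, 200)).

```python
def merge_ranges(ranges):
    """Yield merged (start, end) tuples from overlapping or adjacent ranges."""
    current = None
    for start, end in sorted(ranges):
        if current is None:
            current = (start, end)
        elif start <= current[1] + 1:
            # Overlapping or adjacent: extend the range being built.
            current = (current[0], max(current[1], end))
        else:
            # Gap found: emit the finished range and start a new one.
            yield current
            current = (start, end)
    if current is not None:
        yield current


merged = list(merge_ranges([(0, 100), (101, 200), (300, 400)]))
print(merged)
```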

trifusion.process.base.print_col(text, color, i=0, quiet=False)[source]

Custom print function for terminal updates of CLI programs.

This print function homogenizes the progress logging of all CLI programs while providing some freedom on the formatting of the progress message, namely the colors of the messages. A list of terminal colors is provided and a has_colors function checks whether they are supported on the terminal. The colors in use are green for normal logging, yellow for warnings and red for errors. The final formatting of the message is something like:

[<Program Name>[-Error/Warning]] <message>

Parameters:

text : str

The message that will appear in the terminal

color : variable reference

Reference to the terminal colors defined in process.base. Provided that they are imported in the module where they are being called, the options are: {GREEN, YELLOW, RED}

i : int

Integer that specifies which CLI program is being logged. The options are: 0: TriSeq, 1: OrthoMCL Pipeline, 2: TriStats, 3: TriOrtho

quiet : bool

Determines whether the message is logged. If True, no messages are printed to the terminal
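
The message format described above can be sketched as follows. This is an illustration only: the ANSI escape codes used for GREEN, YELLOW and RED, the names format_message and log, and the choice to color only the bracketed prefix are assumptions, and the color constants actually defined in process.base may be of a different type.

```python
import sys

# Assumed ANSI escape codes; the real constants in process.base may differ.
GREEN, YELLOW, RED = "\033[32m", "\033[33m", "\033[31m"
RESET = "\033[0m"

PROGRAMS = ["TriSeq", "OrthoMCL Pipeline", "TriStats", "TriOrtho"]
SUFFIX = {GREEN: "", YELLOW: "-Warning", RED: "-Error"}


def format_message(text, color, i=0, use_color=True):
    """Build '[<Program Name>[-Error/Warning]] <message>'."""
    label = "[{}{}]".format(PROGRAMS[i], SUFFIX[color])
    if use_color:
        label = color + label + RESET
    return "{} {}".format(label, text)


def log(text, color, i=0, quiet=False):
    """Print the formatted message unless quiet is True."""
    if quiet:
        return
    # Only emit color codes when writing to an interactive terminal.
    print(format_message(text, color, i, use_color=sys.stdout.isatty()))


log("Parsing 10 alignments", GREEN)
log("Missing taxa in locus 3", YELLOW, i=2)
log("This message is suppressed", RED, quiet=True)
```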