trifusion.ortho.protein2dna module
This module deals with the conversion of protein sequences into their corresponding nucleotide sequences. Since a protein sequence cannot be converted back to DNA without knowing the original nucleotide sequence, this module contains functions that compile and store DNA sequences, translate them into amino acid sequences, and then try to match them to the original protein sequences.
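The ambiguity that motivates this matching approach can be quantified. The Python sketch below is not part of TriFusion (`count_back_translations` is a hypothetical helper). It counts how many distinct coding sequences translate to a given protein under the standard genetic code, which is why the original DNA has to be recovered by matching rather than by back-translation.

```python
from collections import Counter

# Standard genetic code, one amino acid per codon with codons enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons encoding each amino acid (e.g. M -> 1, A -> 4, L -> 6).
CODONS_PER_AA = Counter(AMINO_ACIDS)


def count_back_translations(protein):
    """Number of distinct DNA coding sequences that translate to `protein`.

    Unknown symbols (such as 'X') count as 0 codons, so the result is 0.
    """
    total = 1
    for aa in protein.upper():
        total *= CODONS_PER_AA[aa]
    return total


print(count_back_translations("MAL"))  # 1 * 4 * 6 = 24
```

Even this three-residue peptide has 24 candidate coding sequences, and the count grows multiplicatively with protein length.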
-
trifusion.ortho.protein2dna.convert_group(sqldb, cds_file_list, protein_db, group_sequences, usearch_bin, output_dir, shared_namespace=None)
Convenience function that wraps all required operations to convert protein to nucleotide files from a Group object.
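As a rough illustration of what convert_group orchestrates, the sketch below runs the three stages on in-memory dictionaries. It is a simplification under stated assumptions. `convert_group_sketch` is hypothetical and ignores the real parameters (sqldb, usearch_bin, output_dir and so on). It also pairs proteins with transcripts by exact sequence identity, whereas the module itself relies on USEARCH similarity searches.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA codon by codon (standard code); unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def convert_group_sketch(proteins, cds):
    """Toy end-to-end conversion (hypothetical, exact matching instead of USEARCH).

    proteins: {protein header: protein sequence}
    cds:      {transcript header: DNA sequence}
    Returns   {protein header: DNA sequence} for proteins that found a match.
    """
    # Step 1 (cf. create_db): translate every CDS and index it by its protein.
    index = {}
    for header, dna in cds.items():
        index.setdefault(translate(dna).rstrip("*"), header)
    # Step 2 (cf. USEARCH + get_pairs): pair each protein with a transcript.
    pairs = {}
    for header, seq in proteins.items():
        target = index.get(seq.rstrip("*"))
        if target is not None:
            pairs[header] = target
    # Step 3 (cf. convert_protein_file): retrieve the original DNA per pair.
    return {header: cds[target] for header, target in pairs.items()}
```

For example, `convert_group_sketch({"p1": "MA"}, {"t1": "ATGGCTTAA"})` returns `{"p1": "ATGGCTTAA"}`. Exact matching fails as soon as a protein carries a single substitution or a truncated end, which is the practical reason for using a similarity search instead.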
-
trifusion.ortho.protein2dna.convert_protein_file(pairs, group_obj, id_db, output_dir, shared_ns)
Converts a given protein file into its corresponding nucleotide sequences, using the database previously created by the create_db function.
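The final lookup step can be sketched as follows. The sketch assumes, hypothetically, that the pairs argument maps protein headers to transcript headers (as produced by get_pairs) and that the database maps transcript headers to DNA sequences (as described for create_db). `convert_protein_records` is an illustrative name. Reporting unmatched headers separately is a design choice of this sketch, not documented behavior.

```python
def convert_protein_records(protein_records, pairs, id_db):
    """Return (nucleotide FASTA text, list of unmatched protein headers).

    protein_records: iterable of (header, protein sequence) tuples
    pairs:           {protein header: transcript header}
    id_db:           {transcript header: DNA sequence}
    """
    out, missing = [], []
    for header, _seq in protein_records:
        transcript = pairs.get(header)
        if transcript is None or transcript not in id_db:
            # No USEARCH hit, or the hit is absent from the DNA database.
            missing.append(header)
            continue
        # Keep the protein header so the output aligns with the input file.
        out.append(">{}\n{}\n".format(header, id_db[transcript]))
    return "".join(out), missing
```

Keeping the protein headers in the output means the nucleotide file can be compared record by record with the protein file it came from.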
-
trifusion.ortho.protein2dna.create_db(f_list, dest='./', ns=None)
Creates a FASTA database file containing the translated protein sequences from the CDS files. The final transcripts.fas file will be used by USEARCH to find matches between the original protein sequences and their nucleotide counterparts. A dictionary database is also created, in which each transcript header is associated with its original DNA sequence so that the sequence can be retrieved later.
:param f_list: List containing the file names of the transcript files.
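A minimal sketch of the create_db idea follows. It assumes the CDS files are plain FASTA and that translation uses the standard genetic code, though TriFusion's actual parser and translation settings may differ. `translate`, `read_fasta`, `build_db` and `create_db_sketch` are hypothetical names. Only the transcripts.fas file name comes from the description above.

```python
import os

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_db(records):
    """Return (FASTA text of translated proteins, {header: original DNA})."""
    fasta_parts, db = [], {}
    for header, dna in records:
        fasta_parts.append(">{}\n{}\n".format(header, translate(dna)))
        db[header] = dna
    return "".join(fasta_parts), db


def create_db_sketch(f_list, dest="./"):
    """Write transcripts.fas into `dest` and return the header -> DNA dict."""
    records = [rec for path in f_list for rec in read_fasta(path)]
    fasta_text, db = build_db(records)
    with open(os.path.join(dest, "transcripts.fas"), "w") as fh:
        fh.write(fasta_text)
    return db
```

Because the returned database is a plain dictionary keyed by header, duplicate headers across CDS files would silently overwrite each other in this sketch.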
-
trifusion.ortho.protein2dna.create_query(input_list, dest='./')
To speed things up, all sequences in the input protein files are concatenated into a single file, which is used as the query in USEARCH.
:param input_list: List with the file names of the protein files to convert.
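Concatenation is simple, but one detail matters. If a file does not end with a newline, its last sequence line would fuse with the next file's header. The sketch below guards against that. Its helpers `join_fasta_texts` and `create_query_sketch` are hypothetical, and the output path is chosen by the caller. It reads each file fully into memory, which is acceptable for a sketch but not for very large inputs.

```python
def join_fasta_texts(texts):
    """Concatenate FASTA file contents, ensuring each one ends with a newline."""
    parts = []
    for text in texts:
        if text and not text.endswith("\n"):
            text += "\n"  # keep the last line from fusing with the next header
        parts.append(text)
    return "".join(parts)


def create_query_sketch(input_list, output_path):
    """Write all sequences from the protein files into one query file."""
    texts = []
    for path in input_list:
        with open(path) as fh:
            texts.append(fh.read())
    with open(output_path, "w") as out:
        out.write(join_fasta_texts(texts))
```

A single query file lets one USEARCH invocation search every protein, instead of starting the program once per input file.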
-
trifusion.ortho.protein2dna.create_query_from_dict(protein_dict)
Analogous to create_query, but starts from a dictionary provided by processed group files.
:param protein_dict: dictionary
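Assume, hypothetically, that protein_dict maps sequence headers to protein sequences; its structure is not documented here. Under that assumption, building the query amounts to rendering the dictionary as FASTA text. `query_from_dict` and its `width` parameter are illustrative only.

```python
def query_from_dict(protein_dict, width=60):
    """Render {header: protein sequence} as FASTA text, wrapping at `width`."""
    lines = []
    for header, seq in protein_dict.items():
        lines.append(">" + header)
        # Split the sequence into fixed-width lines; multi-line FASTA is standard.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""
```

For example, `query_from_dict({"p1": "MAL"}, width=2)` yields `">p1\nMA\nL\n"`.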
-
trifusion.ortho.protein2dna.get_pairs(dest='./', ns=None)
Parses the output of USEARCH and creates a dictionary with the header pairs between the original proteins and the transcripts.
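USEARCH can report hits in several tab-separated formats. The sketch below assumes the query (protein) header is in the first column and the target (transcript) header is in the second, as in USEARCH's `-blast6out` output. This page does not state which format TriFusion actually requests. `parse_pairs` is a hypothetical helper that keeps only the first hit reported for each query.

```python
def parse_pairs(lines):
    """Map query (protein) header -> target (transcript) header.

    Assumes tab-separated hit lines with the query in column 1 and the target
    in column 2. Blank, comment and malformed lines are skipped.
    """
    pairs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        # setdefault keeps the first hit seen for each query.
        pairs.setdefault(fields[0], fields[1])
    return pairs
```

Passing an open file handle works directly, since iterating over it yields lines: `with open(path) as fh: pairs = parse_pairs(fh)`.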