After identifying the exons and introns of an RNA string, we only need to delete the introns and concatenate the exons to form a new string ready for translation.
Given: A DNA string s (of length at most 1 kbp) and a collection of substrings of s acting as introns. All strings are given in FASTA format.
Return: A protein string resulting from transcribing and translating the exons of s. (Note: Only one solution will exist for the dataset provided.)
Sample Dataset
>Rosalind_10
ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG
>Rosalind_12
ATCGGTCGAA
>Rosalind_15
ATCGGTCGAGCGTGT
Sample Output
MVYIADKQHVASREAYGHMFKVCA
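Splicing on its own is just repeated substring removal. Here is a minimal sketch. It assumes, as the problem guarantees, that each intron occurs in s and that removing its first occurrence is the intended cut. The helper name `splice` is hypothetical.

```python
def splice(dna, introns):
    """Remove the first occurrence of each intron and return the joined exons."""
    for intron in introns:
        # str.replace with count=1 removes only the first match
        dna = dna.replace(intron, "", 1)
    return dna


if __name__ == "__main__":
    s = ("ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGG"
         "TCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG")
    # Prints the exons joined together, ready for transcription and translation
    print(splice(s, ["ATCGGTCGAA", "ATCGGTCGAGCGTGT"]))
```

On the sample, the spliced string is 75 nt long, which gives 25 codons: 24 amino acids followed by the stop codon TAG.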
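This problem asks us to cut out the introns, join the exons, and translate the result. Before translating, the DNA must be transcribed into RNA by replacing T with U. The RNA is then translated with a codon dictionary. To read the FASTA file, I reuse the function from the previous problem, Finding a Shared Motif.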
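The `readFASTA` helper comes from the earlier solution in `src.lcsm` and is not reproduced in this post. If that module is not available, the stand-in below may help. It assumes the reader should return parallel lists of record IDs and sequences, which matches how `description, sequence` is unpacked in the solution. The names `parse_fasta` and `read_fasta` are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into (ids, sequences), joining wrapped sequence lines."""
    ids, seqs = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            ids.append(line[1:])  # record ID without the leading '>'
            seqs.append("")
        else:
            seqs[-1] += line  # a sequence may be wrapped across several lines
    return ids, seqs


def read_fasta(path):
    """Read a FASTA file from disk and parse it (assumes the file starts with a header)."""
    with open(path) as handle:
        return parse_fasta(handle.read())
```

To swap it in, use `description, sequence = read_fasta('DATA/rosalind_splc.txt')`. Joining wrapped lines matters because Rosalind datasets often break long sequences across several lines.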
#!/usr/bin/env python3
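from src.lcsm import readFASTA  # FASTA reader from the Finding a Shared Motif solution

# Standard RNA codon table; stop codons are marked with '\n'
codons = {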
'UUU': 'F', 'CUU': 'L', 'AUU': 'I', 'GUU': 'V',
'UUC': 'F', 'CUC': 'L', 'AUC': 'I', 'GUC': 'V',
'UUA': 'L', 'CUA': 'L', 'AUA': 'I', 'GUA': 'V',
'UUG': 'L', 'CUG': 'L', 'AUG': 'M', 'GUG': 'V',
'UCU': 'S', 'CCU': 'P', 'ACU': 'T', 'GCU': 'A',
'UCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A',
'UCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A',
'UCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A',
'UAU': 'Y', 'CAU': 'H', 'AAU': 'N', 'GAU': 'D',
'UAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D',
'UAA': '\n', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E',
'UAG': '\n', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E',
'UGU': 'C', 'CGU': 'R', 'AGU': 'S', 'GGU': 'G',
'UGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G',
'UGA': '\n', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G',
'UGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'}
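def translate(rna):
    """Translate RNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = codons[rna[i:i + 3]]
        if aa == '\n':  # stop codon: the protein ends here
            break
        protein.append(aa)
    return ''.join(protein)


if __name__ == '__main__':
    description, sequence = readFASTA('DATA/rosalind_splc.txt')
    gene = sequence[0]      # first record: the full DNA string s
    introns = sequence[1:]  # remaining records: the introns
    # Cut each intron out of the gene (first occurrence only)
    for intron in introns:
        pos = gene.find(intron)
        if pos != -1:
            gene = gene[:pos] + gene[pos + len(intron):]
    # Transcribe DNA to RNA, then translate
    rna = gene.replace('T', 'U')
    print(translate(rna))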
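For readers without the `src.lcsm` module, here is a fully self-contained variant. Instead of typing out all 64 codons, it builds the same standard codon table from the UCAG ordering with `itertools.product`. Stops are marked with `*` rather than `'\n'`, and the FASTA text is parsed inline. The function names and taking the whole input as one string are illustrative assumptions, not part of the original solution.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order;
# '*' marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna):
    """Translate RNA into protein, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def solve(fasta_text):
    """Splice out the introns, transcribe to RNA, and translate (hypothetical helper)."""
    # Each record after splitting on '>' is: header line, then sequence lines
    records = [r.splitlines() for r in fasta_text.strip().split(">") if r]
    seqs = ["".join(line.strip() for line in r[1:]) for r in records]
    gene, introns = seqs[0], seqs[1:]
    for intron in introns:
        gene = gene.replace(intron, "", 1)  # remove first occurrence only
    return translate(gene.replace("T", "U"))
```

Usage: `print(solve(open('DATA/rosalind_splc.txt').read()))`. Because `product` varies its last position fastest, the codons come out in the same order as the 64-letter string, so the zip pairs each codon with its amino acid.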