s08 - RNA Splicing

Problem

After identifying the exons and introns of an RNA string, we only need to delete the introns and concatenate the exons to form a new string ready for translation.
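A minimal sketch of the splicing step. It assumes each intron is an exact substring of s that should be removed once. The helper name `splice` is hypothetical. Deleting the introns from the string leaves the exons already concatenated in order:

```python
from typing import List


def splice(dna: str, introns: List[str]) -> str:
    """Return the exon sequence left after deleting each intron from dna.

    Only the first occurrence of each intron is removed; an intron that does
    not occur in dna is silently ignored.
    """
    for intron in introns:
        dna = dna.replace(intron, '', 1)
    return dna


# Sample dataset from the problem statement
gene = 'ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG'
print(splice(gene, ['ATCGGTCGAA', 'ATCGGTCGAGCGTGT']))
```

`str.replace` with a count of 1 behaves like a find-then-slice: it locates the leftmost match and cuts it out.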

Given: A DNA string s (of length at most 1 kbp) and a collection of substrings of s acting as introns. All strings are given in FASTA format.

Return: A protein string resulting from transcribing and translating the exons of s. (Note: Only one solution will exist for the dataset provided.)

Sample Dataset

>Rosalind_10
ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG
>Rosalind_12
ATCGGTCGAA
>Rosalind_15
ATCGGTCGAGCGTGT

Sample Output

MVYIADKQHVASREAYGHMFKVCA

Solution

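The task is to cut out the introns, join the remaining exons, and translate the result. Before translating, the DNA must first be transcribed into RNA (every T becomes U). The RNA is then translated codon by codon using a codon dictionary. The FASTA file is read with the function written for the previous problem, Finding a Shared Motif.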
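The codon dictionary does not have to be typed out by hand. Here is a minimal sketch that assumes the standard genetic code. It enumerates all 64 codons in U, C, A, G order with `itertools.product` and pairs them with a 64-letter amino-acid string, using `'*'` as a stop marker. The names `CODON_TABLE`, `transcribe` and `translate` are illustrative, not taken from the original script.

```python
from itertools import product

# Standard genetic code with bases ordered U, C, A, G at each codon position
# (first position varies slowest); '*' marks a stop codon.
BASES = 'UCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W'
               'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR'
               'VVVVAAAADDEEGGGG')
CODON_TABLE = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna: str) -> str:
    """DNA -> RNA: replace every thymine (T) with uracil (U)."""
    return dna.replace('T', 'U')


def translate(rna: str) -> str:
    """Translate codon by codon until the first stop codon (or the end)."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == '*':  # stop codon: translation ends here
            break
        protein.append(aa)
    return ''.join(protein)


# Exons of the sample dataset (introns already removed)
exons = 'ATGGTCTACATAGCTGACAAACAGCACGTAGCATCTCGAGAGGCATATGGTCACATGTTCAAAGTTTGCGCCTAG'
print(translate(transcribe(exons)))
```

Stopping at the first stop codon, rather than skipping a fixed last codon, keeps the output correct even if a stop codon were to appear before the end of the spliced sequence.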

#!/usr/bin/env python3

from src.lcsm import readFASTA  # FASTA reader from the previous problem, Finding a Shared Motif

# RNA codon table; stop codons map to None
codons = {
    'UUU': 'F',      'CUU': 'L',      'AUU': 'I',      'GUU': 'V',
    'UUC': 'F',      'CUC': 'L',      'AUC': 'I',      'GUC': 'V',
    'UUA': 'L',      'CUA': 'L',      'AUA': 'I',      'GUA': 'V',
    'UUG': 'L',      'CUG': 'L',      'AUG': 'M',      'GUG': 'V',
    'UCU': 'S',      'CCU': 'P',      'ACU': 'T',      'GCU': 'A',
    'UCC': 'S',      'CCC': 'P',      'ACC': 'T',      'GCC': 'A',
    'UCA': 'S',      'CCA': 'P',      'ACA': 'T',      'GCA': 'A',
    'UCG': 'S',      'CCG': 'P',      'ACG': 'T',      'GCG': 'A',
    'UAU': 'Y',      'CAU': 'H',      'AAU': 'N',      'GAU': 'D',
    'UAC': 'Y',      'CAC': 'H',      'AAC': 'N',      'GAC': 'D',
    'UAA': None,     'CAA': 'Q',      'AAA': 'K',      'GAA': 'E',
    'UAG': None,     'CAG': 'Q',      'AAG': 'K',      'GAG': 'E',
    'UGU': 'C',      'CGU': 'R',      'AGU': 'S',      'GGU': 'G',
    'UGC': 'C',      'CGC': 'R',      'AGC': 'S',      'GGC': 'G',
    'UGA': None,     'CGA': 'R',      'AGA': 'R',      'GGA': 'G',
    'UGG': 'W',      'CGG': 'R',      'AGG': 'R',      'GGG': 'G'}


def translate(rna):
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    protein = ''
    for i in range(0, len(rna) - 2, 3):
        amino_acid = codons[rna[i:i + 3]]
        if amino_acid is None:  # stop codon reached
            break
        protein += amino_acid
    return protein


if __name__ == '__main__':
    description, sequence = readFASTA('DATA/rosalind_splc.txt')
    gene = sequence[0]      # the first record is the full DNA string
    introns = sequence[1:]  # the remaining records are the introns
    # Cut out each intron (its first occurrence), leaving the exons joined together
    for intron in introns:
        pos = gene.find(intron)
        if pos != -1:
            gene = gene[:pos] + gene[pos + len(intron):]
    # Transcribe DNA to RNA, then translate
    gene = gene.replace('T', 'U')
    print(translate(gene))
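The script imports `readFASTA` from `src.lcsm`, which is not shown in this post. Judging only from how it is called (`description, sequence = readFASTA(...)`, with `sequence[0]` as the gene and `sequence[1:]` as the introns), it appears to return parallel lists of headers and sequences. Below is a hypothetical stand-in under that assumption. The names `parse_fasta` and `read_fasta` are illustrative, not the original helper.

```python
from typing import Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split FASTA text into parallel lists of headers and sequences.

    Headers are returned without the leading '>'; a sequence wrapped over
    several lines is joined into a single string.
    """
    descriptions: List[str] = []
    sequences: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            descriptions.append(line[1:])
            sequences.append('')
        elif sequences:
            sequences[-1] += line  # continuation of the current record
        else:
            raise ValueError('sequence data found before the first header')
    return descriptions, sequences


def read_fasta(path: str) -> Tuple[List[str], List[str]]:
    """Hypothetical drop-in for readFASTA, assuming the same return shape."""
    with open(path) as handle:
        return parse_fasta(handle)
```

Keeping the parsing in `parse_fasta` separate from file handling makes it easy to check against an in-memory list of lines. Under the stated assumption, `read_fasta('DATA/rosalind_splc.txt')` could replace the `readFASTA` call.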
