Block F, Tri 1, 2011-12

Writing Lookup Functions

Orientation Imagine you want to convert a numerical representation for a month such as 4 for April, and obtain the string "April" when you use the value 4. This is an occasion for creating a lookup function.

Here is a gauche way to do the job. People who do this should be tarred and feathered.

def monthNames(n):
    if n == 1:
        return "January"
    elif n == 2:
        return "February"
    elif n == 3:
        return "March"
    elif n == 4:
        return "April"
    elif n == 5:
        return "May"
    elif n == 6:
        return "June"
    elif n == 7:
        return "July"
    elif n == 8:
        return "August"
    elif n == 9:
        return "September"
    elif n == 10:
        return "October"
    elif n == 11:
        return "November"
    elif n == 12:
        return "December"
    

A Matter of Taste This is in bad taste for two reasons. One is that this function has a dozen exit points. That is truly ugly. The excessive conditional logic is ugly, cumbersome and hard to maintain and understand. Now consider this means of doing things.

def monthNames(n):
    names = [ "January", "February", "March", "April", "May", "June",\
         "July", "August", "September", "October", "November", "December"]
    return names[n - 1]
    

Here we use a data structure to eliminate a lot of conditional logic. This is cleaner, tidier and far nicer.

Programming Problem 1 Create a function for finding the number 1-12 of a month from its name or abbreviation. Use a dictionary with the abbreviations and figure out how to trim the month names in a uniform fashion so the full name or the abbreviation works. A docstring and method stub is given. Write a small test suite.

def monthnumbers(name):
    """precondition: name is the full name or abbreviation for a month 
    postcondtion: the month's number 1-12 is returned.
    Example:  monthNumbers("Jan") → 1
    Example:  monthNumbers("February") → 2
    """
    return "TODO"
    
specified by the docstring.

Programming Problem 2 Create a function for computing the length of a month as specified by the docstring.

The leap year rule is as follows.

For example, there is no February 29, 1900 but there was a February 20, 2000. That was an exceptional day, and you probably did not notice that. Fie and shame.

def monthLength(month, year):
    """prec:  month is a number for a month
    year is a integer
    post:  returns the length of the month in question.  Note the 
    leap year adjustment.
    """
    return -1
    

Programming Problem 3 Create a function that checks to see if a date is valid. Here is the shell. Of course, you should use the other functions which you tested and coded first.

def isValidDate(day, month, year):
    """prec: day is an integer
    month is a integer
    year is an integer
    post:  returns True if and only if the date passed is valid
    """
    return False
    

Hitch up your Jeans for a Little Transcription Now we are going to consider a problem in genetics. We will sketch some bare-bones background so you understand the purpose of what you are programming.

You all know from Bio that DNA is a helical molecule. From our standpoint, that is not so important. We simply think of DNA as a ladder. Just unwind the helix and you will see that.

Genes consist of DNA, short for deoxyribonucleic acid. The rungs of the ladder are called base pairs of molecules. These rungs are stuck together with hydrogen bonds.

LetterName
AAdenine
TThiamine
GGuanine
CCystosine

We shall focus here on the rungs of the ladder. The runners are composed of phosphate and sugar molecules.

A segment of DNA looks like this


                    |-A---T-|
                    |-C---G-|
                    |-G---C-|
                    |-A---T-|
                    |-T---A-|
                    |-C---G-|
                    |-G---C-|
                    |-A---A-|

    

Observe that there is a symmetry; the descending letters on the left are the same as the ascending letters on the right. The molecules A and T always occur in a pair and the molecules G and C occur in a pair. The A-T pairs are bound by a double hydrogen bond and the G-C pairs are bound by a triple hydrogen bond. These bonds can be unzipped for the process of replication of DNA or for transcription of DNA.

The contents of a DNA strand are given by a character string from the alphabet {A, C, T, G}. Also you can see that if we unzip DNA by severing the bonds in the rungs, each piece remaining has sufficient information to reconstruct the other side. This principle is key in genetics.

Help, I am in Jail! DNA resides in the cell nucleus. Most of the time, it resides in a quiescent form and just hangs loosely in the nucleus. When the cell divides, DNA organizes into chromosomes by winding up tightly. In any event, DNA never leaves the nucleus; it is "in jail" there. It relies on another cell function to carry out the instructions encoded in it.

DNA is your generic firmware. It controls the creation of amino acids, which are critical to your metabolic function and to sustaining life. DNA is nature's data structure that makes you what you are.

DNA uses Messenger Ribonucleaic Acid (mRNA) to create sequences of amino acids. This process is called translation. Computationally, we create a messenger molecule. Two things must happen. One, is that every T is transformed into a U. The DNA moledule unzips and a messenger RNA strand is created in the process of translation. This strand can penetrate the wall of the nucleus and get out into the rest of the cell. In this step, the T molecule becomes a U molecule.

For an easy warmup, write and test this translaton function.

def translate(sequence):
    """prec: sequence is an ACTG gene sequence.  Transform it to caps to return it
    postc:  Every T is turned into a U.  Do not use looping or conditional execution;
    use a string method."""
    return ""
    

Malformed Sequences A translated sequence string may only contain the letters A,C, U or G. Its length must be divisible by 3, as the molecules are transcribed (we shall describe this units of 3 called codons. The sequence must begin with th e start codon AUG; this is the signal to the transcription process to begin transcribing. There are three end codons: UAG, UAA or UGA. Here is your method stub. Supply test cases.

def isValidSequence(sequence):
    """prec: sequence is a string
    postc: This returns true if a sequence is not malformed and False otherwise. 
    """
    return False
    

Transcribing a Sequence You are going to write a set of functions to transcribe mRNA into a protein sequence. First, we need a lookup function to transform codons to proteins. We will use this translation table obtained from Wikipedia.

Here is what is happening biologically. The DNA sequence is translated into mRNA. DNA can never leave the nucleus. The mRNA goes to a cell organelle called a ribosome, which then uses this sequence for creating amino acids and proteins needed for the cell's life activity.

  2nd base
U C A G
1st base U UUU (Phe/F) Phenylalanine UCU (Ser/S) Serine UAU (Tyr/Y) Tyrosine UGU (Cys/C) Cysteine
UUC (Phe/F) Phenylalanine UCC (Ser/S) Serine UAC (Tyr/Y) Tyrosine UGC (Cys/C) Cysteine
UUA (Leu/L) Leucine UCA (Ser/S) Serine UAA Stop (Ochre) UGA Stop (Opal)
UUG (Leu/L) Leucine UCG (Ser/S) Serine UAG Stop (Amber) UGG (Trp/W) Tryptophan    
C CUU (Leu/L) Leucine CCU (Pro/P) Proline CAU (His/H) Histidine CGU (Arg/R) Arginine
CUC (Leu/L) Leucine CCC (Pro/P) Proline CAC (His/H) Histidine CGC (Arg/R) Arginine
CUA (Leu/L) Leucine CCA (Pro/P) Proline CAA (Gln/Q) Glutamine CGA (Arg/R) Arginine
CUG (Leu/L) Leucine CCG (Pro/P) Proline CAG (Gln/Q) Glutamine CGG (Arg/R) Arginine
A AUU (Ile/I) Isoleucine ACU (Thr/T) Threonine         AAU (Asn/N) Asparagine AGU (Ser/S) Serine
AUC (Ile/I) Isoleucine ACC (Thr/T) Threonine AAC (Asn/N) Asparagine AGC (Ser/S) Serine
AUA (Ile/I) Isoleucine ACA (Thr/T) Threonine AAA (Lys/K) Lysine AGA (Arg/R) Arginine
AUG[A] (Met/M) Methionine ACG (Thr/T) Threonine AAG (Lys/K) Lysine AGG (Arg/R) Arginine
G GUU (Val/V) Valine GCU (Ala/A) Alanine GAU (Asp/D) Aspartic acid GGU (Gly/G) Glycine
GUC (Val/V) Valine GCC (Ala/A) Alanine GAC (Asp/D) Aspartic acid GGC (Gly/G) Glycine
GUA (Val/V) Valine GCA (Ala/A) Alanine GAA (Glu/E) Glutamic acid GGA (Gly/G) Glycine
GUG (Val/V) Valine GCG (Ala/A) Alanine GAG (Glu/E) Glutamic acid GGG (Gly/G) Glycine

Look at this example sequence.

AUGAAAGGGAUGUGA

It is properly formed. Now we break it into codons. There are 64 possible codons.

AUG AAA GGG AUG UGA

We code it as follows. Encode the ending codons with a *. Stop once you see an ending codon. Encode the protein with the letter code supplied in the table as in the following example.

AUG AAA GGG AUG UGA
          M   K   G   M   *
    

The process We can see that two processes are involved in coding a valid sequence. We need to take each codon and transcribe it into a protein. We then need to iterate through the sequence and transcribe the codons. This process should return a protein sequence. To do this you will write two functions. Create a test suite for each.

You should create a dictionary that does the lookup. Return a * for a stop codon and an X if there is any unknown codon. This is going to be annoying, but you need to encapsulate it so it does not stay annoying.

def proteinLookup(codon):
    """prec: codon is a, duh, codon.
    postc:  returns the protein letter"""
    

Now transcribe.

def transcribe(sequence):
    """prec:  the sequence is a legal sequence.
    postc:  returns the transcription into a protein sequence."""
    return ""