A wet lab has given you the task of
writing a program to compute some basic statistics on their data. Every
now and then they sequence oligos from different organisms. For each
organism they sequence, their wet lab machinery produces a file
containing the date of the sequencing, the name of the organism, a
short name for it with no spaces or punctuation, and the list of oligos
obtained together with their starting positions in the genome. There
may or may not be 100% coverage of the entire genome. No oligo is longer
than 12 base pairs. They want the following functionality, selected
with command line parameters:
- Given a new file in the format shown below, they want to merge its data into the database if that organism hasn't already been merged, i.e. if its short name doesn't already exist in the database. (--merge <filename>)
- For a given oligo, they want to know which organisms it occurs in, and at which positions. (--belongs <oligo sequence>)
- For a given organism, they want to know all of the oligos belonging to it, in order of starting position. (--genome <organism short name>)
- For two specified organisms, they want to know any shared oligos and their positions in each organism, displayed in a neat table. (--shared <organism short name 1> <organism short name 2>)
- They want to know the frequency of a given oligo in the entire database (--freq <oligo sequence>)
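One way to wire up these five options is Python's standard argparse module. The sketch below is written for Python 3 (the raw_input() mentioned in the notes suggests the course may have used Python 2). It places the options in a required mutually exclusive group so that exactly one operation runs per invocation; the function name build_parser and the metavar labels are illustrative choices, not part of the specification.

```python
import argparse


def build_parser():
    """Hypothetical CLI for oligodb; only the flag names come from the spec."""
    parser = argparse.ArgumentParser(prog="oligodb")
    # Required mutually exclusive group: exactly one operation per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--merge", metavar="FILENAME",
                       help="merge a sequencing file into the database")
    group.add_argument("--belongs", metavar="OLIGO",
                       help="list the organisms (and positions) containing OLIGO")
    group.add_argument("--genome", metavar="SHORTNAME",
                       help="list all oligos of an organism by position")
    # nargs=2 makes --shared consume exactly two short names, returned as a list.
    group.add_argument("--shared", nargs=2, metavar=("SHORTNAME1", "SHORTNAME2"),
                       help="table of oligos shared by two organisms")
    group.add_argument("--freq", metavar="OLIGO",
                       help="number of occurrences of OLIGO in the database")
    return parser
```

A main script would call `args = build_parser().parse_args()` and dispatch on whichever attribute is not None; `args.shared` arrives as a two-element list, and supplying two operations at once makes argparse print a usage error and exit.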
Notes:
- You will need to use a file to store the database, and you will need to modify its contents. You can either overwrite the entire thing (easiest), or modify it in place (+3 bonus marks above 100%, can offset poor marks from previous assignments). Your decision heavily influences the format of the database file.
- Remember to take input exclusively through command line parameters; this means no raw_input().
- There is no reason your database can't consist of more than one file
- Oligos must match exactly: do not combine adjacent oligos, nor search for sub-oligos within the ones in the database.
- You may not use any form of database module, inter alia: any relational database, the shelve or pickle modules, berkdb, ODBC, etc. Flat standard Python files only.
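As an illustration of the "overwrite the entire thing" option, here is a minimal sketch of a flat, tab-separated database that uses only standard file I/O. The record layout (one `O` line per organism followed by its `S` lines), the file name `oligo.db`, and the helpers load_db, save_db and merge are all hypothetical, and the sketch assumes dates and organism names never contain tab characters.

```python
import os
import tempfile

DB_PATH = "oligo.db"  # hypothetical database file name


def load_db(path=DB_PATH):
    """Return (organisms, oligos): short name -> (date, full name) and
    short name -> list of (position, oligo). A missing file means an empty DB."""
    organisms, oligos = {}, {}
    if not os.path.exists(path):
        return organisms, oligos
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "O":    # organism header line
                _, short, date, name = fields
                organisms[short] = (date, name)
                oligos.setdefault(short, [])
            elif fields[0] == "S":  # one oligo record
                _, short, pos, seq = fields
                oligos.setdefault(short, []).append((int(pos), seq))
    return organisms, oligos


def save_db(organisms, oligos, path=DB_PATH):
    """Rewrite the whole database into a temp file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w") as fh:
        for short, (date, name) in organisms.items():
            fh.write(f"O\t{short}\t{date}\t{name}\n")
            for pos, seq in sorted(oligos.get(short, [])):
                fh.write(f"S\t{short}\t{pos}\t{seq}\n")
    os.replace(tmp, path)  # atomic rename on POSIX


def merge(organisms, oligos, short, date, name, records):
    """Add an organism unless its short name already exists; True if merged."""
    if short in organisms:
        return False
    organisms[short] = (date, name)
    oligos[short] = sorted(records)
    return True
```

Writing to a temporary file and renaming it over the old one means an interrupted run leaves the previous database intact rather than half-written. The in-place bonus variant would need a different layout, for example fixed-width records.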
Below are two example files, followed by the program's output for
various command line parameters.
Contents of bugblatter.dat:
31 Mar 2008: Bugblatter of Traal Neuron
BugBlatNeuron
AACGATCTTACG 0
TGTTGAGACA 16
GCAGATGTCGA 43
CCGAGGCG 86
TGCAGACCATC 111
CACAAACCC 145

Contents of babel.dat:
02 Apr 2008: Babel Fish Brain Matrix Neuron
BabelMatrix
AGCTAGCATGC 3
CATGATGACGAT 45
TACGAGGA 78
CCGAGGCG 109
GTCCCAG 205

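A parser for this input format might look like the following sketch. It assumes the first line is always `<date>: <organism name>` (split on the first colon only), the second line is the short name, and every remaining non-blank line is `<oligo> <position>`. The name parse_sequencing_file and its validation checks are illustrative, not prescribed by the assignment.

```python
def parse_sequencing_file(text):
    """Parse one lab file into (date, full_name, short_name, [(position, oligo), ...])."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Line 1: "<date>: <organism name>"; partition splits on the first colon only.
    date, _, full_name = lines[0].partition(":")
    # Line 2: short name, which the spec says has no spaces or punctuation.
    short_name = lines[1]
    if not short_name.isalnum():
        raise ValueError(f"short name {short_name!r} contains spaces or punctuation")
    records = []
    for line in lines[2:]:
        oligo, pos = line.split()
        if len(oligo) > 12:  # spec: no oligo is longer than 12 bp
            raise ValueError(f"oligo {oligo!r} is longer than 12 bp")
        records.append((int(pos), oligo))
    return date.strip(), full_name.strip(), short_name, records
```

Reading from disk is then `with open("bugblatter.dat") as fh: parse_sequencing_file(fh.read())`, which returns `"31 Mar 2008"`, `"Bugblatter of Traal Neuron"`, `"BugBlatNeuron"` and six `(position, oligo)` pairs.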
$ oligodb --merge bugblatter.dat
$ oligodb --merge babel.dat
$ oligodb --genome BabelMatrix
AGCTAGCATGC (3)
CATGATGACGAT (45)
TACGAGGA (78)
CCGAGGCG (109)
GTCCCAG (205)
$ oligodb --freq GTCCCAG
1
$ oligodb --belongs CCGAGGCG
BugBlatNeuron: 86
BabelMatrix: 109
$ oligodb --shared BugBlatNeuron BabelMatrix
          BugBlatNeuron  BabelMatrix
CCGAGGCG  86             109
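The four read-only queries can be sketched over an in-memory mapping from short name to a list of `(position, oligo)` pairs, such as one loaded from the database file. The helper names genome, belongs, freq and shared are hypothetical. The sketch relies on Python 3.7+ dictionaries keeping insertion (merge) order for --belongs, and uses exact string equality so that no sub-oligo matching happens. When an oligo occurs more than once in one organism, shared joins its positions with commas; that is an assumption, since the spec does not say.

```python
def genome(db, short):
    """Oligos of one organism ordered by starting position, e.g. 'TACGAGGA (78)'."""
    return [f"{seq} ({pos})" for pos, seq in sorted(db[short])]


def belongs(db, oligo):
    """Exact-match lookup: 'ShortName: pos' for every occurrence, in merge order."""
    return [f"{short}: {pos}" for short, recs in db.items()
            for pos, seq in sorted(recs) if seq == oligo]


def freq(db, oligo):
    """Total number of exact occurrences of the oligo across all organisms."""
    return sum(seq == oligo for recs in db.values() for _, seq in recs)


def _positions(recs):
    """Map each oligo to its positions (as strings), in positional order."""
    out = {}
    for pos, seq in sorted(recs):
        out.setdefault(seq, []).append(str(pos))
    return out


def shared(db, a, b):
    """Left-aligned table of oligos present in both organisms, one row per oligo."""
    pos_a, pos_b = _positions(db[a]), _positions(db[b])
    rows = [("", a, b)] + [(seq, ",".join(pos_a[seq]), ",".join(pos_b[seq]))
                           for seq in sorted(set(pos_a) & set(pos_b))]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in rows]
```

With the two example files merged in order, `belongs(db, "CCGAGGCG")` gives `["BugBlatNeuron: 86", "BabelMatrix: 109"]`, `freq(db, "GTCCCAG")` gives 1, and `genome(db, "BabelMatrix")` matches the --genome output above line for line.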