Programming Assignment #1
Deadline: Monday, March 16 at 11:59:59PM.
In light of the confusion that has been created due to the posting of two
different versions of the assignment, two accommodations will be made:
1. The deadline for submission will be extended to Monday, March 16 at 11:59:59PM, as noted above.
2. Either version of the assignment, as noted below, will be accepted and graded.
Version #1
Your Job
1. Implement a
sequential version of Smith-Waterman in C/C++.
or adapt an existing implementation for
Smith-Waterman, e.g., JAligner or
gotoh.c
2. Parallelize the
sequential algorithm using pthreads on a single multicore machine of the
rlogin.cs.vt.edu cluster.
Evaluation
criteria
Documentation and Readability (20%)
Correctness
(40%)
Performance
(40%)
An Overview of Program Specifications
To make the performance of your programs comparable to
each other, the following aspects are specified, which include the scoring
matrix, way of runtime recording, input command line, and program output. (The
attached file param.h
provides the scoring matrix blosum62 and a function PrintRes()
for printing out your alignment result.)
1. Scoring Matrix
The
scoring matrix blosum62 is in the attached file param.h,
which specifies the matching score of individual characters. For instance, d(a,
a) = 4.0, d(c, g) = -3.0, etc. Also, in the param.h file, a
function char2index1() for
mapping a character to its index in the score matrix is provided, i.e., it maps
a to 0, c to 1, g
to 2, and t to 3.
2. Recording Time
The
function gettimeofday() should be used for runtime
recording. The start time is recorded when the two input sequences are read
from input files (i.e., the time for reading input is not counted). The end
time is when the trace back function is returned. Time for printing out the
alignment result is not counted. The unit of the runtime is millisecond with
one decimal digit.
3. Command Line
Your
program is compiled as:
make -f
makefile
You
should provide the makefile for compilation.
After
compilation, the executable file name is swat_XXX, where XXX is your 9-digit
hokie ID #.
To
run your program, the following command is used:
./swat_xxxx
input1 input2 open_penalty ext_penalty
swat_xxxx
is
your executable file name;
Input1 input
file containing the first sequence (input1_256.txt);
Input2 input
file containing the second sequence (input2_256.txt);
open_penalty is
the open gap penalty, (e.g., 5.0);
ext_penalty is
the extension penalty, (e.g., 1.0).
(Note:
generally, the open gap penalty is larger than the extension gap penalty. For
the two example values, it means if there is an open gap, 5.0 will be
subtracted; if there is an extension gap, 1.0 will be subtracted.
)
4. Program Output
Output
is stored in an file with name "output_(your 9-digit ID#)", which contains
the following contents:
1) Total
alignment score of the two input sequences. (if you use the blosum62
score matrix in Jaligner and set the same open gap penalty and extension gap
penalty in your program as you do in Jaligner, your alignment score should be
the same as that of Jaligner. )
2) Runtime:
its format is specified above with unit millisecond,
3) Alignment
result. The alignment result is printed by the following function:
void PrintResult(char *outstr1,
char
*outstr2,
int
nlen,
int
charsperline,
float
open_penalty,
float
ext_penalty);
PrintResult() prints characters in outstr1
and oustr2
reversely, i.e., the first character is printed last and the last
character is printed first. For example: For outstr1 "ggtcgacgag-----a" and outstr2: "gg--gaag-gtaatta", they are
printed as:
a-----gagcagctgg
| | |.|| ||
attaatg-gaag--gg
In PrintResult():
outstr1 and oustr2 are the aligned
output strings from the trace back function, in them, '-' is used to
indicate a gap.
nlen is the length of the two strings, whose lengths
should be equal.
charsperline is the number of characters printed in a
line, just set it to 50.
open_penalty is the open gap penalty.
ext_penalty is the extension gap penalty.
4) Output string
length, open gap penalty, and extension gap penalty (the PrintRes() function will
print these information for you, you do not need to do that yourself.)
One output example is in log1.txt (input sequence
length is 256).
5. Sample
input files
Two
groups of input files are provided, one is of length 256 (input1_256.txt,
input2_256.txt),
the other is of length 7k (input1_7168.txt,
input2_7168.txt).
What to
submit
Put your solution into a single gzipped tar file with all
source files and a Makefile, if any. Provide a README file to describe how the project
can be compiled and how to run the program. E-mail the above to feng@cs.vt.edu.
Version #2
Your
Assignment
1.
Implement
a sequential version of Smith-Waterman in C/C++
or adapt an existing implementation for
Smith-Waterman, e.g., JAligner or
gotoh.c
2.
Parallelize
the sequential algorithm using pthreads on a single
multicore machine of the rlogin.cs.vt.edu cluster.
Evaluation
Criteria
Documentation
and Readability (20%)
Correctness (40%)
Performance (40%)
Overview of Program Specifications
To make it easier to evaluate your programs in
a uniform way. Please follow the guidelines below.
1.
Recording Time
The function gettimeofday()
should be used for time recording. The start time is the point after the two
input sequences are read from the input files (only the reading operations
are before this point). The end time is when the trace back function is
returned. Time for printing out the alignment result is not counted. The unit
of the runtime is millisecond with one decimal digit, e.g., 12334.3 ms.
2.
Command Line
You should
provide the makefile for compiling your program.
Your program is compiled as:
make -f makefile
The output
executable file name is: swat_XXXXXXXXX,
where XXXXXXXXX is your nine digit hokie ID #.
To run your
program, the following command will be used:
./swat_xxxxxxxxx input1 input2
where
swat_xxxxxxxxx is your executable file name and
input1
input
file containing the first sequence (input1_256.txt);
input2
input
file containing the second sequence (input2_256.txt);
3.
Output
Your output will be printed to a file with the name "output_(your ID#)" and will contain the following:
1)
The
aligned input sequences, i.e., the two output sequences as those of the gotoh.c;
3)
Trace
back result as that of the gotoh.c
2)
Total run time:
its format is specified above with unit millisecond
One output example is
in log2.txt
(input sequence length is 256).
5.
Sample Input files
Two groups of
input files are provided, one is of length 256 (input1_256.txt,
input2_256.txt),
the other is of length 7k (input1_7168.txt,
input2_7168.txt).
What to
submit
Put your solution into a single gzipped tar file with all source files and a Makefile, if any. Provide a README file to describe how
the project can be compiled and how to run the program. E-mail the above to
feng@cs.vt.edu.