Programming Assignment #1

Deadline: Monday, March 16 at 11:59:59PM.

      In light of the confusion that has been created due to the posting of two different versions of the assignment, two accommodations will be made:

1.      The deadline for submission will be extended to Monday, March 16 at 11:59:59PM, as noted above.

2.      Either version of the assignment, as noted below, will be accepted and graded.


Version #1


Your Job

1.      Implement a sequential version of Smith-Waterman in C/C++. or adapt an existing implementation for Smith-Waterman, e.g., JAligner or gotoh.c

2.      Parallelize the sequential algorithm using pthreads on a single multicore machine of the rlogin.cs.vt.edu cluster.

Evaluation criteria

 Documentation and Readability (20%)

 Correctness (40%)

 Performance (40%)

An Overview of Program Specifications

To make the performance of your programs comparable to each other, the following aspects are specified, which include the scoring matrix, way of runtime recording, input command line, and program output. (The attached file param.h provides the scoring matrix blosum62 and a function PrintRes() for printing out your alignment result.)

1.      Scoring Matrix

The scoring matrix blosum62 is in the attached file param.h, which specifies the matching score of individual characters. For instance, d(a, a) =  4.0, d(c, g) = -3.0, etc. Also, in the param.h file, a function char2index1()  for mapping a character to its index in the score matrix is provided, i.e., it maps a to 0, c to 1, g to 2, and t to 3.

 

2.      Recording Time

The function gettimeofday() should be used for runtime recording. The start time is recorded when the two input sequences are read from input files (i.e., the time for reading input is not counted). The end time is when the trace back function is returned. Time for printing out the alignment result is not counted. The unit of the runtime is millisecond with one decimal digit.

 

3.      Command Line

Your program is compiled as:

        make -f makefile

You should provide the makefile for compilation.

After compilation, the executable file name is swat_XXX, where XXX is your 9-digit hokie ID #.

 

To run your program, the following command is used:

./swat_xxxx input1 input2 open_penalty ext_penalty

 

swat_xxxx                    is your executable file name;

Input1                         input file containing the first sequence (input1_256.txt);

Input2                         input file containing the second sequence (input2_256.txt);

open_penalty                        is the open gap penalty,  (e.g., 5.0);

ext_penalty                  is the extension penalty, (e.g., 1.0).

(Note:  generally, the open gap penalty is larger than the extension gap penalty. For the two example values, it means if there is an open gap, 5.0 will be subtracted; if there is an extension gap, 1.0 will be subtracted. )

 

4.      Program Output

Output is stored in an file with name "output_(your 9-digit ID#)", which contains the following contents:

1)      Total alignment score of the two input sequences. (if you use the blosum62 score matrix in Jaligner and set the same open gap penalty and extension gap penalty in your program as you do in Jaligner, your alignment score should be the same as that of Jaligner. )

2)      Runtime:  its format is specified above with unit millisecond,

3)      Alignment result.  The alignment result is printed by the following function:

void PrintResult(char *outstr1,

  char *outstr2,

  int  nlen,

  int  charsperline,

  float open_penalty,

  float ext_penalty);

PrintResult() prints characters in outstr1 and oustr2 reversely, i.e., the first character is printed last and the last character  is printed first. For example: For outstr1 "ggtcgacgag-----a" and outstr2:  "gg--gaag-gtaatta", they are printed as:

a-----gagcagctgg

|     | |.||  ||

attaatg-gaag--gg

In PrintResult():

outstr1 and oustr2 are the aligned output strings from the trace back function, in them,  '-' is used to indicate a gap.

nlen is the length of the two strings, whose lengths should be equal.

charsperline is the number of characters printed in a line, just set it to 50.

open_penalty is the open gap penalty.

ext_penalty is the extension gap penalty.

4)      Output string length, open gap penalty, and extension gap penalty (the PrintRes() function will print these information for you, you do not need to do that yourself.)

One output example is in log1.txt (input sequence length is 256). 

5.      Sample input files

Two groups of input files are provided, one is of length 256 (input1_256.txt, input2_256.txt), the other is of length 7k (input1_7168.txt, input2_7168.txt).  

What to submit

Put your solution into a single gzipped tar file with all source files and a Makefile, if any. Provide a README file to describe how the project can be compiled and how to run the program. E-mail the above to feng@cs.vt.edu.


Version #2


Your Assignment

1.      Implement a sequential version of Smith-Waterman in C/C++ or adapt an existing implementation for Smith-Waterman, e.g., JAligner or gotoh.c

2.      Parallelize the sequential algorithm using pthreads on a single multicore machine of the rlogin.cs.vt.edu cluster.

Evaluation Criteria

 Documentation and Readability (20%)

 Correctness (40%)

 Performance (40%)

Overview of Program Specifications

To make it easier to evaluate your programs in a uniform way. Please follow the guidelines below.

1.      Recording Time

The function gettimeofday() should be used for time recording. The start time is the point after the two input sequences are read from the input files (only the reading operations are before this point). The end time is when the trace back function is returned. Time for printing out the alignment result is not counted. The unit of the runtime is millisecond with one decimal digit, e.g., 12334.3 ms.

2.      Command Line

You should provide the makefile for compiling your program. Your program is compiled as:

        make -f makefile

The output executable file name is:  swat_XXXXXXXXX, where XXXXXXXXX is your nine digit hokie ID #.

 To run your program, the following command will be used:

./swat_xxxxxxxxx input1 input2

 where

 swat_xxxxxxxxx is your executable file name and

input1                          input file containing the first sequence (input1_256.txt);

input2                          input file containing the second sequence (input2_256.txt);

3.      Output

Your output will be printed to a file with the name "output_(your ID#)" and will contain the following:

1)      The aligned input sequences, i.e., the two output sequences as those of the gotoh.c;

3)      Trace back result as that of the gotoh.c

2)      Total run time:  its format is specified above with unit millisecond

One output example is in log2.txt (input sequence length is 256). 

5.      Sample Input files

Two groups of input files are provided, one is of length 256 (input1_256.txt, input2_256.txt), the other is of length 7k (input1_7168.txt, input2_7168.txt).  

What to submit

Put your solution into a single gzipped tar file with all source files and a Makefile, if any. Provide a README file to describe how the project can be compiled and how to run the program. E-mail the above to feng@cs.vt.edu.