NSF PROJECT PAGE

III-CXT: Collaborative Research: A High-Throughput Approach to the Assignment of Orthologous Genes Based on Genome Rearrangement

Introduction: Orthologous genes, or orthologs, are genes in different species that evolved directly from a common ancestral gene. Genome-scale assignment of orthologs is a fundamental and challenging problem in computational biology, with a wide range of applications in comparative and functional genomics. This project continues the development of the parsimony approach to assigning orthologs between closely related genomes. The approach attempts to transform one genome into the other using the smallest number of evolutionary events: genome rearrangements (reversal, translocation, fusion, and fission) and gene duplications. The project addresses three key algorithmic problems: (i) signed reversal distance with duplicates, (ii) signed transposition distance with duplicates, and (iii) minimum common string partition. Efficient solutions to these problems are combined and incorporated into a software system for ortholog assignment called MSOAR. The project also carries out a genome-wide analysis of orthologous (and paralogous) relationships between the human and mouse genomes, both to validate the approach and, more importantly, to address several evolutionary questions: characterizing the gains and losses of duplicated genes in the two genomes, elucidating gene movements in one genome relative to the other, and quantifying the different mechanisms of gene duplication.
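As a toy illustration of the parsimony idea only (this is not MSOAR's algorithm), the sketch below computes the exact signed reversal distance between two small genomes by breadth-first search. It assumes each genome is written as a tuple of signed integer gene IDs, where the sign gives the strand. The function names `reverse_segment` and `reversal_distance` are hypothetical. The search is exponential in genome length, ignores translocation, fusion, fission, and transposition, and does not resolve duplicated genes, which is exactly what makes the project's problems hard.

```python
from collections import deque


def reverse_segment(genome, i, j):
    """Apply one signed reversal to positions i..j (inclusive).

    The segment's order is inverted and each gene's strand sign is flipped.
    """
    segment = tuple(-g for g in reversed(genome[i:j + 1]))
    return genome[:i] + segment + genome[j + 1:]


def reversal_distance(source, target):
    """Minimum number of signed reversals turning source into target.

    Toy breadth-first search: exponential, so only usable on tiny genomes.
    Assumes both genomes contain the same genes (no duplicate handling).
    """
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("genomes must contain the same genes")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        genome, dist = queue.popleft()
        if genome == target:
            return dist  # BFS pops genomes in order of distance, so this is minimal
        n = len(genome)
        for i in range(n):
            for j in range(i, n):
                nxt = reverse_segment(genome, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    # Not reached: any signed arrangement of the same genes can be sorted by reversals.
```

For example, `reversal_distance([1, 2, 3, 4], [1, -3, -2, 4])` returns 1, because a single reversal of the middle two genes both swaps their order and flips their strands.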

Principal Investigators:

Tao Jiang (PI; see also the UCR project website)
Liqing Zhang (co-PI)

Graduate Students, Postdocs, and Alumni Funded by or Contributing to the Project:

Mingming Liu
Mark Lawson
Wenhui Huang
Deng Pan
Darrell Deconge

Publications:

  • Mingming Liu, Yanwei Huang, Liqing Zhang, and David Bevan. A new functional association-based protein complex prediction. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), Atlanta, GA, 2011, pp. 488-494. ISBN: 978-1-4577-1612-6.
  • Liqing Zhang, Layne T. Watson, and Lenwood S. Heath. A Network of SCOP Hidden Markov Models and Its Analysis. BMC Bioinformatics 12:191 (2011).
  • Mark J. Lawson, Lenwood S. Heath, Hai Zhao, and Liqing Zhang. Optimizing a Cost Matrix to Solve Rare-Class Biological Problems. Proceedings of the 2011 International Conference on Bioinformatics and Computational Biology (2011). Eds. Hamid R. Arabnia and Quoc-Nam Tran.
  • Yanwei Huang, Mingming Liu, and Liqing Zhang. Gene Selection for Cancer Classification by Multiple PCA with Sparsity. The 4th International Conference on Bioinformatics and Computational Biology (BICoB) (2011). Accepted.
  • Liqing Zhang, Layne T. Watson, and Lenwood S. Heath. A Network of Hidden Markov Models and Its Analysis. Proceedings of the 2011 International Conference on Bioinformatics and Computational Biology (2011). Eds. Hamid R. Arabnia and Quoc-Nam Tran.
  • Lenwood S. Heath, Ao-ping Hou, Huadong Xia, and Liqing Zhang. A genome compression algorithm supporting manipulation. The International Conference on Computational Systems Bioinformatics (CSB 2010).
  • Liqing Zhang and Layne T. Watson. The Expected Fitness Cost of a Mutation Fixation under the One-Dimensional Fisher Model. International Journal of Pure and Applied Mathematics, vol. 62, p. 129 (2010).
  • W. H. Huang, P. Wang, Z. Liu, and L. Q. Zhang. Identifying disease associations via genome-wide association studies. The Seventh Asia Pacific Bioinformatics Conference (APBC 2009).
  • M. J. Lawson and L. Q. Zhang. Sexy gene conversions: Locating gene conversions on the X-chromosome. Nucleic Acids Research, 2009, 1-10.
  • D. Pan and L. Q. Zhang. Burst of young retrogenes and independent retrogene formation in mammals. PLoS ONE 4, 2009, 1-19.
  • D. Pan and L. Q. Zhang. An atlas of the speed of copy number changes in animal gene families and its implications. PLoS ONE, accepted, 2009.
  • G. Shi, L. Q. Zhang, and T. Jiang. MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement. Computational Systems Bioinformatics (CSB) Conference 2009.
  • M. Lawson, L. Heath, N. Ramakrishnan, and L. Q. Zhang. Using Cost-Sensitive Learning to Determine Gene Conversions. Advanced Intelligent Computing Technology and Applications (ICIC 2008).
  • Deng Pan and Liqing Zhang. Tandemly Arrayed Genes in Vertebrate Genomes. In minor revision.
  • Valia Shoja, T. M. Murali, and Liqing Zhang. Expression Divergence of Tandemly Arrayed Genes in Human and Mouse. Comparative and Functional Genomics, vol. 2007 (2007).
  • D. Pan and L. Q. Zhang. Quantifying the major mechanisms of recent gene duplications in the human and mouse genomes: a novel strategy to estimate gene duplication rates. Genome Biology 2007, 8:R158.
Funding Sources:

This project is funded by NSF grant IIS-0710945 for the period September 15, 2007 to August 31, 2010. A collaborative grant was awarded simultaneously to the PI, Prof. Tao Jiang, for the same period.