Thomas J Wheeler
University of Maine
Department of Computer Science
Orono, ME 04469
wheeler@umcs.maine.edu
It is becoming increasingly common for research efforts, and the development of systems to support them, to be undertaken by multidisciplinary teams. In the life sciences, multidisciplinary research has become the norm rather than an innovation. There are several reasons for this trend. Insights from several points of view provide a richer understanding of issues and more opportunities for solutions. Also, insights in one discipline often come from the thought patterns of another discipline. In many disciplines the research in part(s) of the domain has reached the stage where exploring issues and advances in adjoining parts, and in the interaction of parts, is warranted. In hierarchical systems, especially life, research at individual levels is different in kind from research at others, and integration across levels has become possible and desirable.
Multidisciplinary research presents a marvelous opportunity, but also creates serious problems. Merging of the disciplines' conceptualizations must occur, at least in the (separate) minds of the collaborators. To be effective, merging must leverage the expertise of individual discipline members, as well as that of general purpose designers.
Systems that support this type of research are complex systems with significant semantic mismatch problems. The increasing use of computer databases to organize disparate research leads to data integration problems. Models for each database or data source are designed independently, in accordance with a domain’s conceptual model. These models are further specialized to a particular research effort, then encoded using general purpose data models. The independence of development and the differing cultures of the fields cause incompatibilities between models and programming interfaces. The general purpose nature of the notation loses (filters out) the insights and intuition carried by the domains' natural illustrations and explanations of key models.
Within the software engineering strategy focussed on a designer's perspective of development, there are three classes of concerns that must be addressed in creating a system: developing a substantial understanding of a problem and its domain, designing a system concept, and architecting and realizing that system. This paper explores a mechanism and a methodology for all but realization, based on integration of multiple disciplines' models. It distills the inherent structure of each model, blends models to create the structure for the integrated domain and creates views of this blended structure for each participating discipline.
The approach has four aspects. “Natural” graphic depictions and explanations are integrated with general purpose models. The underlying structure of the natural models is extracted by analysis of the metaphorical underpinnings of those models. Models are blended using the character of one to underlie semantics taken from others. A framework for visualization of the blended domain is created using the natural depictions, explanations and underlying metaphors.
This technique provides a framework for understanding, organizing and supporting interdisciplinary work. It improves the conceptual modeling process by integrating more domain intuition and insight into the process. We will illustrate the mechanisms and a methodology for use with excerpts from interdisciplinary projects in molecular biology and ecology.
Keywords
Integration, Model, Model Blending, Natural Graphic, Multidiscipline, Multidisciplinary Research
Paper Category: technical paper
Emphasis: research
Within the software engineering strategy focussed on a designer's perspective of development, there are three classes of concerns that must be addressed in creating a system [Guttag,Horning]. First, designers need to develop a substantial understanding of a problem to be solved, the domain in which it resides and the community that works in that domain. Second, they must design a system concept, by a creative process whose success appears to be based on insights into the central concepts and activities of the domain and its community. Third, they need to architect and realize that system. The process of developing the understanding and the insight is called system/domain analysis; the development of the system concept and the architecture of the system is called design; and realization is called implementation. Intuition and insight in the analysis/design process come from discipline specific understanding and are captured using formal notations. This perspective on development has modeling as its central focus.
A number of concepts that have emerged in cognitive science over the past decade can help with the domain understanding and system concept design concerns. First, understanding the structure and conceptual metaphor basis of human cognitive models provides a framework for understanding and design. Models, on which understanding and design are based, are captured in a more natural way. Second, there is a cognitive process for developing new meaning for concepts when they are placed in different contexts. In the new analysis of this process, conceptual model "blends" provide a basis for developing new meaning. An analog of this can support the cross discipline and multidiscipline envisioning which these types of systems need to exhibit. Third, explanation and understanding are related, with explanation reifying an understanding, while from the other direction, explanation develops understanding. This interplay of explanation and understanding provides leverage during analysis. And fourth, the categorization concept of radial categories leads to a useful characterization of "natural" depictions used in explanations. This characterization places constraints on the amount and type of abstraction used in models derived from those explanations, providing guidelines for analysis.
The interoperation of heterogeneous data types requires transformation of data from its original representation to a standard representation (in a data warehouse architecture) or to a usage representation (in a federated architecture). The main technology developers addressing these types of issues come from the Computer Science/General Purpose Modeling communities [e.g. Roth, Davidson]. Valid integration, however, depends on the expertise of scientific curators, who understand the source(s), and of scientists who design and perform virtual experiments and analyses with the resulting merged information [Bult].
To address architecture level concerns, general purpose formal and natural modeling languages must be combined. General purpose modeling languages and notations are needed as a basis for automation, but because they are general purpose, they must abstract away any discipline specific intuition and insight. General purpose modeling languages and notations are also formal. But scientists understand and explain concepts and issues in their field using notation and language natural to their discipline. Depth and insight require the notations and thought patterns natural to specific disciplines. The models emerging from their explanations and depictions need to be combined with general purpose modeling languages and notations, such as XML and UML. The general purpose, formal models are structured by the natural notations for more valid models. The natural notation models are integrated into the general purpose notation models, to retain the discipline's insight into the domain.
The aim of this effort is to develop a mechanism and a methodology for
integration of separate disciplines' models, based on distilling the inherent
structure of each model, blending them to create the structure for the
integrated domain and creating views of this blended structure for each
participating discipline. This paper results from work on a number of life
sciences projects, taking a systems biology approach, which is naturally
interdisciplinary, in areas from genomics to ecology. It looks at this
type of system from a point of view which uses component based architectures,
providing data integration through interfaces. The focus of this work is
on integrating ideas about cognitive models and model blending from cognitive
science into a model based development process.
While multidisciplinary team based research presents a marvelous opportunity, it also creates serious problems. In multidisciplinary research, blending of the disciplines' conceptualizations must occur, at least in the (separate) minds of the collaborators, but also in the resulting or supporting systems.
Research in the life sciences is increasingly taking on a multidisciplinary character, using the systems biology approach to understand issues from molecular biology to ecology. Researchers are finding that integrative issues create a barrier to progress in molecular biology [Paton] and that a complex system approach is essential in ecological systems [Wu,Marceau]. The multidisciplinary character of systems biology is changing the landscape of life science research.
Conceptual model mismatch problems manifest themselves in system level problems such as (mis)interpretation of results displayed by a system, difficulty in development using software from another discipline, and difficulty in a multidisciplinary team developing complementary and integrated understandings of others' concepts. These problems come about because the designer's conceptual model creates the character of the system and its components, and that character is usually difficult to understand from the (different) point of view natural to the user.
At the architecture level, a significant mismatch problem comes from the increasing use of computer databases for organizing research and its results, which leads to data integration problems for multidisciplinary research and systems. The data model for each database, or other data source, is designed independently, in accordance with a domain’s conceptual model, specialized to a particular research effort, then encoded using general purpose data models. Because of the independence of development and the differing cultures of the fields, incompatibilities occur between models and at programming interfaces. Because of the general purpose nature of the notation for encoding the data models, the insight and intuition in each domain’s natural illustrations and explanations of its key models are lost.
A related problem occurs in the interplay of formal and natural notations and thought patterns. Models that underlie the interfaces of systems and subsystems start in the notations, and are framed in terms of the thought patterns, of specific disciplines, but must be encoded in terms of general purpose formal notations. The integration or translation that occurs at system and subsystem interfaces requires the use of general purpose languages, models and notation, but scientific depth requires leveraging the notations and thought patterns of specific disciplines.
Everyday existence, thinking about and using familiar, commonplace objects and concepts is organized effectively by perceptual images and metaphor based mental models of the objects and their context [Fauconnier, Lakoff, Johnson, Mandler], so that humans can naturally deal with their everyday environment. These perceptions and concepts provide an organized understanding of the everyday environment. When one wants to function with similar ease and facility in an artificially created environment such as system development or interdisciplinary research, where one cannot have a naturally constructed framework in which to reason and act, one has to consciously create the organized understanding necessary for natural and effective action.
Software engineering has been a search for organization patterns for differing types of system concerns. Creating explicit organized representations for work products has provided guidance in organizing work as well as useful structuring of the results [Parnas, Guttag,Horning]. The work reported here seeks to find and develop a set of organization patterns for system development in the situation where the concerns are complex and dissimilar in some way, have different cultures, and where the development participants, and their products, have to interact.
Our approach is based on using results from cognitive science research, within a framework provided by software engineering, to provide a basis for system design, by capturing the models on which the design is based in a more natural way. This is done by explicit analysis of the natural models of each discipline, capturing the essence of each in a formal model which is then used for system design. The structure and conceptual metaphors used in explanations of the discipline's concepts are captured and analysed for use in formal models in the system design.
We also combine the use of formal and natural models in developing a cross-discipline or multidiscipline understanding of particular domains. This is based on an analysis of the cognitive process which develops new meaning for a concept when placed in a different context, using blended cognitive models and metaphorical mappings. An analog of this process can provide a basis for supporting the cross discipline and multidiscipline envisioning which these systems need to support.
The concept of radial categories characterizing "natural" depictions is used in analysing explanations, and developing the abstraction used in models derived from those explanations. "Natural" semantic support for creative insight in multidiscipline research is provided by the emergent structure of these blended cognitive models and metaphorical mappings of meanings in the user's discipline. The models developed are used to design the system, the perceptual interfaces provided by systems, and the abstract interfaces to data sources, analysis programs and other subsystems.
The mechanism consists of two parts: (1) analysis of idealized cognitive models and metaphorical semantics; and (2) synthesis supporting creativity, using cognitive model blending. Recent results show that people's (e.g. scientists' and developers') cognitive models appear to be based on idealized abstract cognitive models (Idealized Cognitive Models (ICMs) [Lakoff/Johnson], Schemata [MacEachren], Conceptual Structures [Jackendorf]) and structural mappings among their elements [Fauconnier]. Some of these are learned at an early age, are common to people in general, and are unconsciously applied. Others, specific to a (specialized) domain, are learned as an adult and shared among the members of the discipline; they are skills that are either applied unconsciously as a result of training and experience [MacEachren], or applied consciously [Fauconnier].
Metaphors provide a basis for compositional semantics of the natural and abstract world. Analysis of language use shows [Lakoff,Johnson, Lakoff&Johnson] the pervasiveness of conceptual metaphors for both primary concepts and for their composition. Primary metaphors become part of our cognitive unconscious automatically, beginning in infancy [Lakoff&Johnson], providing experiential semantics for abstract concepts and activities. Complex concepts are structured by structural metaphors and mappings [Johnson] to included or associated primary or complex metaphors. As an example of metaphorical semantics from molecular biology (Figure 1), consider the following sentence: "a strand of DNA consists of pairs of bases", where "strand" is metaphorically a path (or line) and "base pairs" are at the positions of steps (or points) along the path.
Mappings of various kinds between cognitive models appear to be at the heart of what we mean when we say we understand some concept [Fauconnier]. Projection mappings use the structure, and vocabulary, of one domain to understand some other domain. Function mappings structure correspondences, organizing the knowledge in a field. Schema mappings structure situations, transferring concepts into new contexts. In the example in Figure 1, there is a projection map from the domain of paths, a primary metaphor learned in infancy from (probably) crawling and/or actual observation of different things, animate and inanimate, moving along different paths, onto the domain of DNA strands.
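A projection mapping of the kind described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the mechanism used in the project: the relation triples, the `project` function and the `PATH_TO_DNA` table are hypothetical names introduced here, and the only correspondences taken from the text are path to strand and step to base pair.

```python
# Illustrative sketch: a projection mapping carries relations that hold in a
# source domain (paths) over to a target domain (DNA strands).
# All names below are hypothetical; only "path -> strand" and
# "step -> base pair" come from the example sentence in the text.

PATH_RELATIONS = [
    ("step", "lies_on", "path"),
    ("step", "precedes", "step"),
    ("traveler", "moves_along", "path"),  # has no counterpart in the mapping
]

PATH_TO_DNA = {"path": "strand", "step": "base pair"}


def project(relations, mapping):
    """Return the source relations re-expressed in target vocabulary.

    A relation is projected only when both of its terms have a counterpart
    in the mapping; the rest are dropped.
    """
    projected = []
    for subject, relation, obj in relations:
        if subject in mapping and obj in mapping:
            projected.append((mapping[subject], relation, mapping[obj]))
    return projected


dna_relations = project(PATH_RELATIONS, PATH_TO_DNA)
for triple in dna_relations:
    print(triple)
```

Relations whose terms are not covered by the mapping are simply left behind, which is one simple way to represent the partial nature of a metaphor.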
Figure 1
Figure 2
In a conceptual model blend (Figure 2), a person (say a scientist from domain 1) is trying to develop a conceptual model (model1 in Figure 2) of some subject matter. Another person (say a scientist from domain 2) explains the subject matter from her point of view, using a model (model2 in Figure 2) and terminology from her domain. There are some aspects of domain 2 and domain 1 which have a common, abstract semantic basis (modelg in Figure 2) and these serve to provide some abstract semantic anchors between the two people's concepts. But some of the concepts in model1 and model2 have the same metaphorical basis underpinning them (modelb in Figure 2), allowing (causing) the models to form a blend. The first person can understand model2 in the context of domain 1 by use of the blend (modelb) and the vocabulary of model2.
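As a rough illustration of the blend just described, each model can be represented as a table from concepts to the metaphor that underpins them, and the blend as the set of concept pairs that share a metaphorical basis. This is a minimal sketch under that assumption; the concept names, metaphor labels and the functions `blend` and `understand` are hypothetical and are not taken from Figure 2.

```python
# Illustrative sketch of a conceptual model blend.
# Assumption: a model is a dict {concept: underlying metaphor}.
# The concepts and metaphor labels are invented for illustration.

MODEL1 = {  # e.g. a spatial scientist's model
    "route": "path",
    "milepost": "point-on-path",
    "district": "container",
}

MODEL2 = {  # e.g. a molecular biologist's model
    "strand": "path",
    "base pair": "point-on-path",
    "transcript": "product",
}


def blend(model1, model2):
    """Pair concepts of the two models that rest on the same metaphor."""
    by_metaphor = {}
    for concept, metaphor in model2.items():
        by_metaphor.setdefault(metaphor, []).append(concept)
    pairs = {}
    for concept, metaphor in model1.items():
        for other in by_metaphor.get(metaphor, []):
            pairs[(concept, other)] = metaphor
    return pairs


def understand(term2, blended):
    """Return the model1 concepts that a model2 term blends with."""
    return [c1 for (c1, c2) in blended if c2 == term2]


MODEL_B = blend(MODEL1, MODEL2)
print(MODEL_B)
print(understand("strand", MODEL_B))
```

Concepts with no shared basis, such as "district" and "transcript" here, stay outside the blend and would have to be explained by other means.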
Providing a view of a model from one domain in terms of a model in another domain is done by a similar technique. The semantics of the second domain are overlain on the information from the first domain. The interpretation in the second domain uses that domain's thought patterns, expanded to include data from the first. We refer to this process as model morphing.
As an elaboration of the example of metaphorical semantics from molecular biology above (Figure 1), consider the following further sentence: "The DNA 'zipper' (another metaphor) must attach itself to the gene in an area a certain distance upstream from the area to be 'unzipped', for transcription to take place another certain distance downstream" (Figure 3).
Figure 3
Here the geography metaphor is used to explain (and model) the process of transcription. The geography model is (something like) a map of some terrain with a number of paths, with the DNA path being specialized to a stream flowing downhill in a valley. The DNA and the stream have the same shape. In an area on the map, "upstream" of the start of a distributary (overlain by the unzipping metaphor), the unzipping occurs. Following that (i.e. downstream from that place) transcription can take place, modeled as a distributary.
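The stream vocabulary can be overlain on positional data in a very small sketch. The assumptions here are ours, not the paper's: positions are integer coordinates along the strand, the function name `stream_relation` is hypothetical, and the direction of flow is reversed on the "minus" strand, following the usual genome annotation convention.

```python
# Illustrative sketch: describing positions on a strand in the vocabulary of
# the stream (geography) model. Assumptions: integer coordinates, and flow
# running toward higher coordinates on the "plus" strand and toward lower
# coordinates on the "minus" strand.


def stream_relation(site, reference, strand="plus"):
    """Describe `site` relative to `reference` as a point on a stream."""
    if site == reference:
        return "at"
    before = site < reference
    if strand == "minus":
        before = not before  # the stream flows the other way
    return "upstream" if before else "downstream"


# Hypothetical coordinates for an attachment site and an unzipping site.
attachment_site, unzip_site = 100, 150
print(stream_relation(attachment_site, unzip_site))
print(stream_relation(attachment_site, unzip_site, strand="minus"))
```

The point of the sketch is only that the same coordinates receive a different reading once a second domain's semantics are laid over them.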
Intuition and insight in the analysis process come from discipline specific
understanding. This comes from direct or indirect experience in the domain.
In systems which are primarily the product of an individual, the understanding
comes from working in the field. In systems developed by a team, the understanding
must be shared, through informal (conversations) or formal (meetings) verbal/visual
interactions or documented representations; preferably all of these. The
technique we describe here provides a framework for developing this
understanding.
A research project to look into the development of a database for
the Genome Spatial Information System (GenoSIS) Project required
an integrated genomic-spatial data model, which formalizes genomics (a computer
analog of DNA molecular biology) along with metric, topological, and metrically
uncertain properties and relationships among genome features. Such a genome
spatial data model facilitates the powerful spatial reasoning and inferences
that are part of spatial information science and thereby allows biologists
to ask questions about the contextual and organizational significance of
the spatial arrangement of genome features. These functional capabilities
should, in turn, aid in the automation of repetitive analytical tasks associated
with the mapping of genome features and drive the discovery of biologically
significant aspects of genome organization and function.
We begin the analysis by attempting to characterize the biological processes we hope to model. We characterize the models and thought patterns in the domain both informally by working with the different disciplines and listening to their explanations; and formally by use of the mechanisms described in this paper. We formally characterize the models and thought patterns in the domain in two ways: by considering the natural graphics that are used within the domain among practitioners and by constructing a lexicon or ontology of the concepts which are essential in the domain. With these tools we develop a conceptual model, which can be formalized as the data model.
Figure 4
Some "natural" graphic depictions of the biological processes of interest to us are shown in Figure 4. First, in the lower left, a picture-like image grounds the conceptualization with a real(istic) image. There are a number of natural maps (natural isomorphisms) from that image. It is mapped onto a spirally wound tube, which is then unwound to produce a depiction as a ribbon with the 5' to 3' molecule strand on top. There are two further natural mappings, the upper one showing a simplified straight line depiction, with supplementary colored segments, and the other showing a blowup making the sequence of the individual bases apparent. These natural depictions are used to illustrate explanations of the primary concepts in genomics.
The depictions and the accompanying explanations are part of the raw
material for the analyses described above. Another part of the raw material
is an analysis of the vocabulary in the explanations and from glossaries
or ontologies. An example of an explanation is as follows:
"An Organism is the largest category for our purposes
here. We wish to compare different organisms in some analyses. Each
organism has one or more Genomes. A genome is made
up of one or more Chromosomes. The genome contains
many Features, which we define to be recognizable functional elements.
A feature may be simple or composite, that is, composed
of other features making up a Feature Set. The genome and
any feature within it are sequences of Base Pairs.
The base pair sequence is the raw primary output of genome
sequencing efforts. Features are determined by applying a number
of algorithms, e.g. pattern matching, to the sequence.
We indicate the Start and Stop positions of a
feature as determined by the algorithm used to locate
the feature. Since DNA is double stranded, for any feature on
DNA we indicate which Strand contains the feature and how far
along the strand it starts. Biologists interested in comparing organisms
seek ... "
In this explanation, words denoting objects (concepts) in the model are boldfaced, words signaling the use of a metaphor are italicized, and words useful in guiding the modeling are underlined. For instance, "contains" signals the container metaphor, "Strand" and "how far along" signal the path metaphor specialized to a strand, while "genome and any feature within it" signals the structure of a genome sequence.
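The marking of signal words can be approximated mechanically. The following is a minimal sketch, assuming a small hand-built cue lexicon; the function `find_metaphor_cues` and the `CUES` table are hypothetical, and the "made up of" entry is our own assumption, whereas the container and path cues come from the explanation above.

```python
# Illustrative sketch: tagging metaphor cue phrases in an explanation.
# The lexicon is hand-built; "made up of" -> part-whole is an assumption
# added for illustration, the other cues are named in the text.
import re

CUES = {
    "contains": "container",
    "how far along": "path",
    "strand": "path",
    "made up of": "part-whole",  # assumption, not from the text
}


def find_metaphor_cues(text):
    """Return (cue, metaphor) pairs in their order of appearance."""
    lowered = text.lower()
    hits = []
    for cue, metaphor in CUES.items():
        pattern = r"\b" + re.escape(cue) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), cue, metaphor))
    return [(cue, metaphor) for _, cue, metaphor in sorted(hits)]


SENTENCE = (
    "for any feature on DNA we indicate which Strand contains the feature "
    "and how far along the strand it starts"
)
cues = find_metaphor_cues(SENTENCE)
print(cues)
```

A lexicon lookup like this only proposes candidates; deciding which metaphor actually structures a concept remains an analyst's judgment.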
Figure 5
The UML model developed from the analysis is shown in Figure 5 (color coded to highlight the biological science parts and the computer (genomics) parts).
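To make the structure in the quoted explanation concrete, the object classes it names can be sketched as Python dataclasses. This is a sketch derived from the explanation text only; it is not a transcription of the Figure 5 UML model, and the field names, the inclusive coordinate convention and the helper methods are assumptions.

```python
# Illustrative sketch of the classes named in the explanation
# (Organism, Genome, Chromosome, Feature). Field names, the inclusive
# start/stop convention and the helper methods are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    name: str
    start: int   # Start position along the strand
    stop: int    # Stop position along the strand (assumed inclusive)
    strand: str  # "plus" or "minus"
    parts: List["Feature"] = field(default_factory=list)  # Feature Set

    def length(self) -> int:
        return self.stop - self.start + 1

    def is_composite(self) -> bool:
        return bool(self.parts)


@dataclass
class Chromosome:
    name: str


@dataclass
class Genome:
    chromosomes: List[Chromosome]
    features: List[Feature] = field(default_factory=list)

    def features_within(self, start: int, stop: int) -> List[Feature]:
        """Features lying entirely inside the interval [start, stop]."""
        return [f for f in self.features if f.start >= start and f.stop <= stop]


@dataclass
class Organism:
    name: str
    genomes: List[Genome]


# Hypothetical example data.
promoter = Feature("promoter", 51, 100, "plus")
gene = Feature("gene1", 101, 200, "plus")
unit = Feature("gene2", 51, 200, "plus", parts=[promoter, gene])
genome = Genome([Chromosome("chr1")], [promoter, gene, unit])
organism = Organism("example organism", [genome])
print([f.name for f in genome.features_within(50, 100)])
```

Representing a composite feature as a feature that holds other features mirrors the container and part-whole readings of the explanation.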
A part of the formal Abstract Interface for the database, using this
model and an (object oriented) "XML++" (;-)) syntax, would look like:
______________________________________________________________
<!ELEMENT feature
    (feature_type,start_coordinate,end_coordinate,strand,
     feature_name,feature_symbol?,comment?,time_stamp?,
     transcript*)>                              <!-- (2) -->
    <!-- !Semantics:Structure -->
    <!-- !Metaphor:Part-Whole -->
......
<!ELEMENT gene1 is_a feature (annotation_list) >
    <!-- !Semantics:is_a = Structure Addition -->
    <!-- !Semantics:(..) = Structure -->
______________________________________________________________
Here, the structure is given by an XML Element definition (instead of BNF), the formal semantics is given by (a formal model from) a collection of formal models, and the "natural" semantic basis is given by (a conceptual metaphor from) a collection of (ground or complex) conceptual metaphors.
The automation of the semantics would be accomplished by pattern matching
at the formal model and the conceptual metaphor levels.
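One way such pattern matching might begin is by reading the model and semantics annotations back out of the declarations. The sketch below is an assumption about how this could be done, not the project's implementation: it expects annotations written as `<!-- !Model:... -->` and `<!-- !Semantics:... -->` comments following each `<!ELEMENT ...>` declaration, and the function names are hypothetical.

```python
# Illustrative sketch: collecting !Model / !Semantics annotations that follow
# each element declaration, then matching elements by annotation value.
# The annotation layout and all function names are assumptions.
import re

TOKEN = re.compile(
    r"<!ELEMENT\s+(\w+)[^>]*>"                          # group 1: element name
    r"|<!--\s*!?(Model|Semantics)\s*:\s*(.*?)\s*-->",   # groups 2, 3: kind, value
    re.S,
)


def read_annotations(text):
    """Map each element name to its first Model and Semantics annotation."""
    annotations = {}
    current = None
    for match in TOKEN.finditer(text):
        if match.group(1):
            current = match.group(1)
            annotations.setdefault(current, {})
        elif current is not None:
            annotations[current].setdefault(match.group(2).lower(), match.group(3))
    return annotations


def elements_with(annotations, kind, value):
    """Names of elements whose annotation of the given kind equals value."""
    return sorted(n for n, a in annotations.items() if a.get(kind) == value)


SAMPLE = """
<!ELEMENT start_coordinate (#PCDATA)> <!-- !Model:Nat Num --> <!-- !Semantics:Position_on_Line-->
<!ELEMENT end_coordinate (#PCDATA)> <!-- !Model:Nat Num --> <!-- !Semantics:Position_on_Line-->
<!ELEMENT feature_type (#PCDATA)> <!-- !Model:Name --> <!-- !Semantics:Symbol-->
"""

notes = read_annotations(SAMPLE)
print(elements_with(notes, "semantics", "Position_on_Line"))
```

Two interfaces annotated in this way could then be compared element by element, first on the formal model and then on the conceptual metaphor.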
As this project is an outgrowth of software engineering research, its core concepts are about design/research organization; characterizing, making explicit and managing the work products of the design/research efforts; developing prescriptive methods for separating the concerns of the effort, and addressing interaction, interface and interoperation issues between disciplines and multi-domain software (sub)systems.
This work is related to some of the work in the reuse community [WISR, & ?] which addresses conceptual underpinnings of reuse and interoperability [Wileden, Porter, Simos, Capilla, Kiczales, Latour]. It is related to the software architecture community [Garlan&Shaw, ] whose work is one of the major sources of organizing, and working at, the Abstract Implementation level in the model presented below. It is also related to, but addresses a different aspect of collaboration than, the computer supported cooperative work community [CSCW, ECSCW], who focus mainly on computer mediated interaction, whereas we focus on the perceptual, cognitive and (human) communication aspects of the problem.
The technique creates models and interfaces to software components that are valid with respect to scientific experiments. They accurately reflect the concepts used in the design of experiments by capturing them from the most accurate and insightful representations available. They are structured and given semantics in terms of models isomorphic to those apparent in the minds of discipline members.
The models and interfaces structurally, semantically, and pragmatically conform to each discipline's conceptual models because they capture the essence of the discipline's explanations [Tufte]. The multi-discipline models are blended by the same mechanisms used by discipline members. The models and interfaces use the thought patterns and activities of each discipline.
Because they capture the essence of the discipline's explanations, the
models and interfaces to software components should resonate with the intuitions
of each discipline. Because they are blended by the same mechanisms used
by discipline members, they should support development of new multi-discipline
intuitions and creation of new multidiscipline insights.
F. Belz, D. Suthers, and T. Wheeler, "Architecture Abstraction Hierarchy - Reference Model," IEEE Learning Technology (P1484) Guideline (P1484.1), 1997.
F. P. Brooks, The Mythical Man-Month, Reading, MA: Addison Wesley, 1975, 1996.
C. Bult et al., "Mouse Genome Informatics in a New Age of Biological Inquiry," Bio-Informatics and Biomedical Engineering (BIBE2000), Arlington, VA, Nov. 2000.
N. Chomsky, “Linguistics and Adjacent Fields: A Personal View,” The Chomskyan Turn,(A. Kasher, ed.), New York: Blackwell, 1991.
G. M. Cooper, The Cell: A Molecular Approach, Washington, D.C.: ASM Press, 1997.
S. Davidson, "BioKleisli: a Digital Library for Biomedical Research," Intl. J. Digit. Lib. 1(1), 1997.
G. Fauconnier, "Mappings in Thought and Language" Cambridge Univ. Press 1997
C. Gallistel, Organization of Learning, Cambridge, MA: MIT Press, 1993
D. Garlan, R. Allen, and J. Ockerbloom, “Architectural mismatch, or, why it’s hard to build systems out of existing parts,” 17th International Conference on Software Engineering, ICSE 95, April 1995.
J. Guttag, J. Horning, "Formal Specification as a Design Tool," Formal Specification Case Studies, MIT Press, 1989.
D. Hester, D. Parnas, and D. Utter, "Using Documentation as a Software Design Medium," Bell System Technical Journal, V60, 1981.
R. Jackendorf, "Cognitive Architecture of Language" MIT Press, 1984
G. Kiczales, “Aspect-Oriented Programming,” Eighth Annual Workshop on Software Reuse, March 1997.
G. Lakoff, Women, Fire, and Dangerous Things-What Categories Reveal About the Mind, Chicago: University of Chicago Press, 1987.
G. Lakoff and M. Johnson, Philosophy in the Flesh-The Embodied Mind and Its Challenge to Western Thought, New York: Basic Books, 1999.
L. Latour, T. J. Wheeler,and B. Frakes, "Descriptive and predictive aspects of the 3C's model: SETA1 working group summary," First Symposium on Environments and Tools for Ada, Ada Letters, XI, 3, (Spring 1991).
J. Mandler "Preverbal Representation and Language" In Language and Space Bloom, Peterson, Nadel, Garrett Eds. MIT Press 1996
D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, San Francisco: W.H. Freeman, 1982.
A.L. MacEachren, How Maps Work-Representation, Visualization, and Design, New York: The Guilford Press, 1995.
D. Norman "The Design of Everyday Things" Penguin 1986
N. Paton et al., "Conceptual Modelling of Genomic Information," Bioinformatics, V16, no. 6, 2000.
S. Pinker, The Language Instinct: How the Mind Creates Language, New York: William Morrow and Co., 1994.
M. I. Posner, ed., Foundations of Cognitive Science, Cambridge, MA: MIT Press, 1996.
M. Roth, F. Ozcan, L. Haas, "Don't Scrap it, Wrap it, A Wrapper Architecture for Legacy Data Sources" In Proc. VLDB Athens Greece Aug. 1997
M. Shaw and D. Garlan, Software Architecture: Perspectives on an Emerging
Discipline, Upper Saddle River, NJ: Prentice Hall, 1996.
SIGSOFT, Fifth Symposium on Software Reusability, May 1999.
M. A. Simos, “Domain Envisioning: A Lightweight, Incremental Approach to Getting a Company Started with Systematic Reuse,” Ninth Annual Workshop on Software Reuse, January 1999.
J. F. Sowa, Knowledge Representation-Logical, Philosophical, and Computational Foundations, Cambridge, MA: Brooks/Cole, 2000.
R. E. Slavin, Cooperative learning: Theory, research, and practice.
Englewood Cliffs, NJ: Prentice-Hall, 1990.
Spatial-Genomics Project, University of Maine
E. R. Tufte, Envisioning Information, Cheshire, CT: Graphics Press, 1990.
T. J. Wheeler and J. Richardson "A Two Layered Interfacing Architecture," Journal of Standards & Interfaces, v.13, Elsevier-North Holland, 1991.
T. J. Wheeler, "Object Database Interface," DARPA Open Object Oriented Database Workshop, Dallas, Tx., 1992.
T. Wheeler, M. Dolan, and J. Richardson, A Framework for Interdisciplinary Collaboration Univ. of Maine CS Report, 2000
L. Wong, "Kleisli, its Exchange Format, Support Tools, and an Application in Protein Interaction Extraction" Bio-Informatics and Biomedical Engineering (BIBE2000) Arlington VA Nov. 2000
J. Wu, D. Marceau, "Modeling Complex Ecological Systems: an Introduction"
Ecological Modelling 2002
<!DOCTYPE organisms[
<!ELEMENT organisms (organism*)>
    <!-- !Model: Set -->
    <!-- !Semantics: Collection -->
<!-- *************************************************************** -->
<!ELEMENT organism (kingdom,genus,species,subtype?,
                    common_name,comment?,genome+)>               <!-- (1) -->
    <!-- !Model: Structure -->
    <!-- !Semantics: Whole/Part Construction -->
<!ATTLIST organism id ID #REQUIRED
    <!-- !Model: Unique Nat Num -->
    <!-- !Semantics: (Source-Path)-Goal -->
                   Name (#PCDATA)>
    <!-- !Model: Name -->
    <!-- !Semantics: Symbol -->
<!ELEMENT kingdom (#PCDATA)>
    <!-- !Model: Name -->
    <!-- !Semantics: Symbol -->
<!ELEMENT genus (#PCDATA)>
    <!-- !Model: Name -->
<!ELEMENT species (#PCDATA)>
    <!-- !Model: Name -->
<!ELEMENT subtype (#PCDATA)>
    <!-- !Model: Name -->
<!ELEMENT common_name (#PCDATA)>
    <!-- !Model: Name -->
<!ELEMENT comment (#PCDATA)>
    <!-- !Model: ch* -->
    <!-- !Semantics: Points_on_Line -->
<!-- *************************************************************** -->
<!ELEMENT feature (feature_type,start_coordinate,
                   end_coordinate,strand,feature_name,feature_symbol?,
                   comment?,time_stamp?,transcript*)>             <!-- (2) -->
    <!-- !Model: Structure -->
    <!-- !Semantics: Construction -->
<!ATTLIST feature id ID #REQUIRED>
    <!-- !Model: Unique Nat Num -->
    <!-- !Semantics: (Source-Path)-Goal -->
<!ATTLIST feature idref IDREF #REQUIRED>
    <!-- !Model: REF -->
    <!-- !Semantics: Source-(Path-Goal) -->
<!ELEMENT feature_type (#PCDATA)>
    <!-- !Model: Name -->
    <!-- !Semantics: Symbol -->
<!ELEMENT start_coordinate (#PCDATA)>
    <!-- !Model: Nat Num -->
    <!-- !Semantics: Position_on_Line -->
<!ELEMENT end_coordinate (#PCDATA)>
    <!-- !Model: Nat Num -->
    <!-- !Semantics: Position_on_Line -->
<!ELEMENT strand (#PCDATA)>
    <!-- !Model: Name(="plus","minus") -->
<!ELEMENT feature_name (#PCDATA)>
    <!-- !Model: Name -->
<!ELEMENT feature_symbol (#PCDATA)>
    <!-- !Model: Name -->
<!ELEMENT DNA_SEQUENCE (#PCDATA)>
    <!-- !Model: ch* -->
    <!-- !Semantics: Points_on_Line -->
<!ELEMENT comment (#PCDATA)>
    <!-- !Model: ch* -->
<!ELEMENT time_stamp (#PCDATA)>
    <!-- !Model: Time -->
    <!-- !Semantics: Points_on_Line -->
<!-- *************************************************************** -->
<!ELEMENT gene1 is_a feature (transcript,annotation_list)>       <!-- (3) -->
    <!-- !Model: is_a = SubType (& deRef) -->
    <!-- !Semantics: Additional Construction & (Source-Path)-Goal -->
    <!-- !Model: (..) = Structure -->
    <!-- !Semantics: Construction -->
<!ELEMENT transcript (protein|enzyme)>
    <!-- !Model: Union -->
    <!-- !Semantics: Choice -->
<!ELEMENT protein (sequence_length,amino_acid_sequence)>
    <!-- !Model: Structure -->
    <!-- !Semantics: Construction -->
<!ELEMENT sequence_length (#PCDATA)>
    <!-- !Model: Nat Num -->
    <!-- !Semantics: Line_Segment -->
<!ELEMENT amino_acid_sequence (#PCDATA)>
    <!-- !Model: ch* -->
    <!-- !Semantics: Points_on_Line -->
<!ELEMENT annotation_list (annotation)*>
    <!-- !Model: annot* -->
    <!-- !Semantics: Points_on_Line -->
<!ELEMENT annotation (annot_type,annot_val)>
    <!-- !Model: Structure -->
    <!-- !Semantics: Part/Whole Construction -->
<!ELEMENT annot_type (#PCDATA)>
    <!-- !Model: Name -->
    <!-- !Semantics: Symbol -->
<!ELEMENT annot_val (#PCDATA)>
    <!-- !Model: ch* -->
    <!-- !Semantics: Points_on_Line -->
<!-- *************************************************************** -->
<!ELEMENT promoter is_a feature (annotation_list)>               <!-- (4) -->
    <!-- !Model: is_a = SubType & deRef -->
    <!-- !Semantics: Additional Construction & (Source-Path)-Goal -->
    <!-- !Model: (..) = Structure -->
    <!-- !Semantics: Construction -->
<!ELEMENT annotation_list (annotation)*>
    <!-- !Model: annot* -->
    <!-- !Semantics: Points_on_Line -->
<!ELEMENT annotation (annot_type,annot_val)>
    <!-- !Model: Structure -->
    <!-- !Semantics: Construction -->
<!ELEMENT annot_type (#PCDATA)>
    <!-- !Model: Name -->
    <!-- !Semantics: Symbol -->
<!ELEMENT annot_val (#PCDATA)>
    <!-- !Model: ch* -->
    <!-- !Semantics: Points_on_Line -->
<!-- *************************************************************** -->
<!ELEMENT gene2 is_a feature view_of(promoter, gene1)>           <!-- (5) -->
    <!-- !Model: is_a = SubType & deRef -->
    <!-- !Semantics: Additional Construction & (Source-Path)-Goal -->
    <!-- !Model: view_of = View -->
    <!-- !Semantics: Surface_of -->
    <!-- !Model: (..) = Structure -->
    <!-- !Semantics: Construction -->
]>
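Because the `!Model:` and `!Semantics:` annotations in the listing above are carried in ordinary comments, a tool can recover them without a DTD parser (which would in any case reject the `is_a` and `view_of` extensions). The following is a minimal sketch of one way to do this, assuming each annotation comment applies to the nearest preceding declaration. The function name `extract_annotations`, the record layout, and the `SAMPLE` text are hypothetical and introduced only for illustration.

```python
import re

# Hypothetical sample: a few declarations in the annotated style of the listing.
SAMPLE = """
<!ELEMENT organisms (organism*)>
    <!-- !Model: Set -->
    <!-- !Semantics:Collection -->
<!ELEMENT kingdom (#PCDATA)>
    <!-- !Model:Name -->
    <!-- !Semantics:Symbol-->
<!ELEMENT genus (#PCDATA)>
    <!-- !Model:Name -->
"""

# Matches either a declaration opener (<!ELEMENT name / <!ATTLIST name)
# or an annotation comment (<!-- !Model: X --> / <!-- !Semantics: Y -->).
TOKEN = re.compile(
    r"<!(?P<kind>ELEMENT|ATTLIST)\s+(?P<name>\w+)"
    r"|<!--\s*!(?P<key>Model|Semantics)\s*:\s*(?P<value>.*?)\s*-->",
    re.S,
)


def extract_annotations(dtd_text):
    """Return one record per declaration with its Model/Semantics notes.

    Assumption: an annotation belongs to the closest declaration before it.
    Annotations that appear before any declaration are ignored.
    """
    records = []
    for match in TOKEN.finditer(dtd_text):
        if match.group("kind"):
            records.append({
                "kind": match.group("kind"),
                "name": match.group("name"),
                "Model": [],
                "Semantics": [],
            })
        elif records:
            records[-1][match.group("key")].append(match.group("value"))
    return records


if __name__ == "__main__":
    for record in extract_annotations(SAMPLE):
        print(record["kind"], record["name"], record["Model"], record["Semantics"])
```

Keeping the annotations as lists rather than single values lets one declaration carry several models, as the `is_a` declarations in the listing do.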
<!-- *************************************************************** -->
<!-- *************************************************************** -->
<!-- INSTANCES: -->
<!-- Eukaryota_Rodentia_Mus_musculus_GALT -->
<!-- dtd(1) -->
<ORGANISM Name="Mus musculus">
  <KINGDOM> Eukaryota </KINGDOM>
  <GENUS> Rodentia </GENUS>
  <SPECIES> Mus musculus </SPECIES>
  <strain> B6/CGAFIJ </strain>
  <db_xref> taxon:10090 </db_xref>
  <sex> female </sex>
  <tissue_type> liver </tissue_type>
  <COMMON_NAME> House Mouse </COMMON_NAME>
  <annotation> This reference sequence was provided by the Mouse Genome database (MGD). </annotation>
  <CHROMOSOME>
    <CHROMOSOME_NUMBER> 4 </CHROMOSOME_NUMBER>
    <CHROMOSOME_NAME> chromosome 4 </CHROMOSOME_NAME>
    <CHROMOSOME_STRUCTURE> linear </CHROMOSOME_STRUCTURE>
    <STRAND> plus </STRAND>
    <Symbol> GALT </Symbol>
    <Feature_Name> galactose-1-phosphate uridyl transferase </Feature_Name>
    <cM_Position> 19.9 </cM_Position>
    <MGI_Accession_ID> M:96265 </MGI_Accession_ID>
<FEATURE>
  <FEATURE_TYPE> source </FEATURE_TYPE>
  <START_COORDINATE> 1 </START_COORDINATE>
  <END_COORDINATE> 13731 </END_COORDINATE>
  <FEATURE_NAME> GALT </FEATURE_NAME>
  <DNA_SEQUENCE>
       1 ttcagggtgg gtgggcgggg ggagacatgg aatggggcgc tcaccttgtg taccttaggt
      61 caattcgtgt ggcctcacgt cgcatagcga cgcgatcctg agcagcgcca cgaggcttca
     121 gaggcggacc gatggcagcg accttccggg cgagcgaaca ccagcatatt cgctacaacc
     181 cgctccagga cgagtgggtg ttagtgtcgg ctcatcgcat gaagcggccc tggcaaggac
     241 aagtggagcc ccagcttctg aagacagtgc cccgccacga cccactcaac cctctgtgtc
     301 ccggggccac acgagctaat ggggaggtga atccccacta tgatggtacc tttctgtttg
     361 acaatgactt cccggctctg cagcccgatg ctccggatcc aggacccagt gaccaccctc
     421 tcttccgagc agaggccgcc agaggagttt gtaaggtcat gtgcttccac ccctggtcgg
     481 atgtgacgct gccactcatg tctgtccctg agatccgagc tgtcatcgat gcatgggcct
     541 cagtcacaga ggagctgggt gcccagtacc cttgggtgca gatctttgaa aataaaggag
     601 ccatgatggg ctgttctaac ccccatcccc actgccaggt ttgggctagc agcttcctgc
     661 cagatatcgc ccagcgtgaa gagcgatccc agcagaccta tcacagccag catggaaaac
     721 ctttgttatt ggaatatggt caccaagagc tcctcaggaa ggaacgtctg gtcctaacca
     781 gtgagcactg gatagttctg gtccccttct gggcagtgtg gcctttccag acacttctgc
     841 tgccccggcg gcacgtgcgg cggctacctg agctgaaccc cgctgagcgt gatctcgcct
     901 ccatcatgaa gaagctcttg accaagtacg acaatctatt tgagacatcc tttccctact
     961 ccatgggctg gcatggggct cccacgggat taaagactgg agccacctgt gaccactggc
    1021 agctccacgc ccactactac cccccacttc tgcgatccgc aactgtccgg aagttcatgg
    1081 ttggaccgtg tacactggca gctcacgccc actacctacc cccacttctc ggatccgcaa
    1141 ctgtctatga aatgcttgcc caggcccagc gtgacctcac tcccgaacag gccccagaaa
    1201 gattaagggc gcttcccgag gtacactatt gcctggcgca gaaagacaag gaaacggcag
    1261 gatcaccatt gcttgactgt gaccacatca gggccttgaa tctttgtacc tgacagacct
    1321 gggacctgga gttcgggcag atgtgacatc aataaaactg cgtctcacat ttt
  </DNA_SEQUENCE>
</FEATURE>
<!-- *************************************************************** -->
<!-- dtd(3) -->
<GENE1>
  <FEATURE_TYPE> gene </FEATURE_TYPE>
  <START_COORDINATE> 132 </START_COORDINATE>
  <END_COORDINATE> 1313 </END_COORDINATE>
  <Feature_Name> GALT </Feature_Name>
  <DNA_SEQUENCE>
     121 atggcagcg accttccggg cgagcgaaca ccagcatatt cgctacaacc
     181 cgctccagga cgagtgggtg ttagtgtcgg ctcatcgcat gaagcggccc tggcaaggac
     241 aagtggagcc ccagcttctg aagacagtgc cccgccacga cccactcaac cctctgtgtc
     301 ccggggccac acgagctaat ggggaggtga atccccacta tgatggtacc tttctgtttg
     361 acaatgactt cccggctctg cagcccgatg ctccggatcc aggacccagt gaccaccctc
     421 tcttccgagc agaggccgcc agaggagttt gtaaggtcat gtgcttccac ccctggtcgg
     481 atgtgacgct gccactcatg tctgtccctg agatccgagc tgtcatcgat gcatgggcct
     541 cagtcacaga ggagctgggt gcccagtacc cttgggtgca gatctttgaa aataaaggag
     601 ccatgatggg ctgttctaac ccccatcccc actgccaggt ttgggctagc agcttcctgc
     661 cagatatcgc ccagcgtgaa gagcgatccc agcagaccta tcacagccag catggaaaac
     721 ctttgttatt ggaatatggt caccaagagc tcctcaggaa ggaacgtctg gtcctaacca
     781 gtgagcactg gatagttctg gtccccttct gggcagtgtg gcctttccag acacttctgc
     841 tgccccggcg gcacgtgcgg cggctacctg agctgaaccc cgctgagcgt gatctcgcct
     901 ccatcatgaa gaagctcttg accaagtacg acaatctatt tgagacatcc tttccctact
     961 ccatgggctg gcatggggct cccacgggat taaagactgg agccacctgt gaccactggc
    1021 agctccacgc ccactactac cccccacttc tgcgatccgc aactgtccgg aagttcatgg
    1081 ttggaccgtg tacactggca gctcacgccc actacctacc cccacttctc ggatccgcaa
    1141 ctgtctatga aatgcttgcc caggcccagc gtgacctcac tcccgaacag gccccagaaa
    1201 gattaagggc gcttcccgag gtacactatt gcctggcgca gaaagacaag gaaacggcag
    1261 gatcaccatt gcttgactgt gaccacatca gggccttgaa tctttgtacc tga
  </DNA_SEQUENCE>
  <TRANSCRIPT>
    <PROTEIN>
      <protein_id> AAA37658.1 </protein_id>
      <db_xref> GI:193422 </db_xref>
      <SEQUENCE_LENGTH> 109 </SEQUENCE_LENGTH>
      <AMINO_ACID_SEQUENCE>
        MAATFRASEHQHIRYNPLQDEWVLVSAHRMKRPWQGQVEPQLLKTVPRHDPLNPLCPG
        ATRANGEVNPHYDGTFLFDNDFPALQPDAPDPGPSDHPLFRAEAARGVCKVMCFHPWS
        DVTLPLMSVPEIRAVIDAWASVTEELGAQYPWVQIFENKGAMMGCSNPHPHCQVWASS
        FLPDIAQREERSQQTYHSQHGKPLLLEYGHQELLRKERLVLTSEHWIVLVPFWAVWPF
        QTLLLPRRHVRRLPELNPAERDLASIMKKLLTKYDNLFETSFPYSMGWHGAPTGLKTG
        ATCDHWQLHAHYYPPLLRSATVRKFMVGPCTLAAHAHYLPPLLGSATVYEMLAQAQRD
        LTPEQAPERLRALPEVHYCLAQKDKETAGSPLLDCDHIRALNLCT
      </AMINO_ACID_SEQUENCE>
    </PROTEIN>
  </TRANSCRIPT>
</GENE1>
<!-- *************************************************************** -->
<!-- dtd(5) -->
<GENE2>
  <GENE1>
    <FEATURE_TYPE> gene </FEATURE_TYPE>
    <START_COORDINATE> 132 </START_COORDINATE>
    <END_COORDINATE> 1313 </END_COORDINATE>
    <Feature_Name> GALT </Feature_Name>
    <DNA_SEQUENCE> 121> atggcagcg accttccggg cgagcgaaca ccagcatatt cgctacaacc ...
    <PROMOTER>
      <FEATURE_TYPE> UAS </FEATURE_TYPE>
      <START_COORDINATE> 13 </START_COORDINATE>
      <END_COORDINATE> 22 </END_COORDINATE>
      <DNA_SEQUENCE> 13> gggcgggggg </DNA_SEQUENCE>
    </PROMOTER>
  </GENE1>
</GENE2>
<!-- *************************************************************** -->