Bioinformatics Advance Access originally published online on February 15, 2006
Bioinformatics 2006 22(9):1137-1143; doi:10.1093/bioinformatics/btl054
Adapters, shims, and glue: service interoperability for in silico experiments
1 Department of Computer Science, University of Bonn D-53117 Bonn, Germany
2 Department of Computer Science, Humboldt-Universität Berlin D-10099 Berlin, Germany
3 Clinic for Psychiatry and Psychotherapy, University of Bonn Medical Center D-53105 Bonn, Germany
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Computationally, in silico experiments in biology are workflows describing the collaboration of people, data and methods. The Grid and Web services are proposed as the next-generation infrastructure for deploying bioinformatics workflows. But the growing number of autonomous and heterogeneous services poses challenges to the middleware with respect to composition, i.e. the discovery and interoperability of the services required within in silico experiments. In the IRIS project, we handle the problem of service interoperability with a semi-automatic procedure for identifying and placing customizable adapters into workflows built by service composition.
Results: We show the effectiveness and robustness of the software-aided composition procedure by a case study in the field of life science. In this study we combine different database services with different analysis services with the objective of discovering required adapters. Our experiments show that we can identify relevant adapters with high precision and recall.
Availability: The IRIS software and the profile language can be downloaded from http://www.cs.uni-bonn.de/III/bio/iris
Contact: ur{at}iai.uni-bonn.de
| 1 INTRODUCTION |
|---|
The bioinformatics research community has created several hundred databases and analysis methods (Galperin, 2005). These resources (services) differ in data formats, interfaces and the semantics of the concepts used. In silico experiments, however, require a flexible combination of services in bioinformatics workflows, applying selected methods to data extracted from databases (Stevens et al., 2003b). At the same time, the rapid pace of change in bioinformatics makes it almost impossible to develop the stable data formats and semantics required for easy service interoperability. These characteristics make the composition of services into meaningful workflows supporting in silico experiments a difficult, often time-consuming and error-prone task. Modern high-level middleware is needed to address the current challenges of composition, which involves the discovery and the interoperability of often heterogeneous services.
Today, Web services are proposed for building bioinformatics services. A Web service is a software entity designed to support machine-to-machine interaction over a network. Interaction is achieved through standardized interface and message protocols based on the eXtensible Markup Language (XML). The preferred standard for describing interfaces is the Web Services Description Language (WSDL). WSDL defines an interface in a port type that specifies the messages which a Web service can produce and consume. Messages are described by parameter names and data types. WSDL, however, ignores semantic issues on the application level, i.e. it does not contain semantic information about the services and the data. Such interfaces cannot distinguish between a sequence in FASTA format and one in EMBL format (both are represented as strings), nor can they distinguish between a DNA sequence and a journal article. Hence, Web services have failed to achieve semantic interoperability in bioinformatics (Wilkinson et al., 2005). For discovery and integration, the Web services technology suggests Universal Description, Discovery and Integration (UDDI). It mainly supports keyword-based retrieval, which can be realized using term frequency-inverse document frequency (TF-IDF). But UDDI can neither create new service compositions nor support semantic-aware service discovery.
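The limitation of keyword-based retrieval can be illustrated with a minimal TF-IDF sketch. The service names and descriptions below are hypothetical, and the scoring (sum of the TF-IDF weights of the query terms) is one simple variant chosen for illustration, not the retrieval function of any particular UDDI registry.

```python
import math
from collections import Counter

# Hypothetical service descriptions (not taken from any real registry).
DESCRIPTIONS = {
    "fasta_service": "returns a nucleotide sequence as string",
    "embl_service": "returns a nucleotide sequence as string",
    "pubmed_service": "returns a journal article as string",
}

def tokenize(text):
    """Lower-case a description and split it on whitespace."""
    return text.lower().split()

def rank(query, descriptions):
    """Rank services by the summed TF-IDF weight of the query terms."""
    tokens = {name: tokenize(text) for name, text in descriptions.items()}
    n_docs = len(tokens)
    # Document frequency: in how many descriptions does each term occur?
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        scores[name] = sum(
            (tf[t] / len(toks)) * math.log(n_docs / df[t])
            for t in tokenize(query)
            if t in tf
        )
    # Highest score first; ties are broken alphabetically by service name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

ranking = rank("nucleotide sequence", DESCRIPTIONS)
```

In this toy setting the two sequence services receive identical scores, because their textual descriptions are the same; the keyword index has no way of telling the FASTA provider from the EMBL provider.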
The Grid is proposed as the next-generation middleware supporting the design and execution of bioinformatics workflows. Workflows are defined by the control flow and data flow of services. Within the data flow, the outputs of previously executed services are used as inputs to subsequent services. Today, several BioGrid middleware systems are under development, like myGrid (Stevens et al., 2003a) or BioOpera (Bausch et al., 2003), supporting the discovery and orchestration of services to form workflows. Workflows are often created with visual builder tools like Taverna (Oinn et al., 2004). However, today's Grid middleware lacks suitable mechanisms for handling the issue of service interoperability. We tackle this problem with customizable intermediates, called adapters. Adapters, like services, are technically represented as Web services, but they have a different application purpose. A service is an entity providing business functionality; services represent external and internally created bioinformatics databases and applications. Adapters are entities that are automatically identified and combined in order to glue these (business) services together (Radetzki et al., 2003). They are concerned with data transformation, identifier mapping and so forth.
Adapters are described by WSDL and additionally by profiles in the Mediator Profile Language (MPL, http://www.cs.uni-bonn.de/III/bio/iris/MediatorProfile.owl), a language developed in the IRIS project (footnote 1). These profiles are stored in a repository called the adapter pool. Registered adapters are identified by a novel Software-aided Procedure (SP) supported by IRIS (Fig. 1). In SP we automatically derive a query against the adapter pool from the input and output descriptions of services connected by a data flow, in order to identify relevant adapters and adapter compositions. This query contains syntactic information as well as semantic concepts of a domain ontology. The discovery and matchmaking of appropriate adapters is a difficult task, because often the available adapters transform the data only partially, so that additional adapters are required. Our matchmaking algorithms provide the user with both a ranked list of (exact and less exact) matching adapters and a possible composition of adapters that form an operation network. These algorithms further support syntactic and semantic retrieval based on data types and ontologies, respectively (Radetzki et al., 2004). Once all matching adapters and adapter compositions (composite adapters) have been generated, the user selects the most appropriate adapters for the given workflow. Finally, these adapters are customized and stored in the adapter pool for future workflows.
This paper is organized as follows: Section 2 describes the generation of a query against the adapter pool. Section 3 addresses the discovery of suitable adapters. The SP is evaluated in Section 4 using an example bioinformatics experiment. In Section 5 this use case is successively modified to simulate evolution in in silico experiments. Section 6 discusses our approach and Section 7 concludes the paper.
| 2 DERIVING ADAPTER REQUIREMENTS |
|---|
The first phase of the SP is query generation, i.e. deriving the capabilities an adapter has to fulfill in order to glue together different services of a workflow. Hence, we have to analyze the WSDL descriptions of the data producing and data consuming services and extract the knowledge for the query. This knowledge should cover semantic as well as syntactic aspects. These queries are expressed in MPL in the form of query profiles. An adapter with the same or a similar profile is considered an answer to the query (see Section 3).
A major problem is that port types of WSDL only declare syntactic aspects and textual descriptions of services, but lack precisely defined semantics. It is not possible to assign concepts (specified by a domain ontology) to a WSDL description. The parameter names within service descriptions are often meaningless, and the data types of the parameters are often misleading. For instance, a character string is often declared although the string contains a complex XML document, e.g. representing a sequence in FASTA format. Thus, it is difficult to specify exactly, both semantically and syntactically, the requirements for the wanted adapters.
We tackle this problem by an algorithm used at design-time of the workflow. Our algorithm analyzes the available WSDL information of the participating services. In the first step of this algorithm the syntactic signature (name and data type of parameters) as well as the available documentation specified in WSDL are included into the query profile. Thus, the WSDL port type information is fully included in MPL.
In the next step we analyze the parameter names and operation names as well as the available documentation in order to identify descriptive terms that can be used for matchmaking with concepts from domain ontologies. These names are split as far as possible in order to obtain more meaningful terms, e.g. we split getNucSeq into get, nuc and seq. After that, the terms resulting from the splitting as well as terms from the documentation are normalized by stemming algorithms and included in a bag of terms. We then use stopword lists in order to filter non-specific terms, like get. The user can further decide whether linguistic methods defined by general-purpose ontologies, like synonyms, hyponyms or abbreviations, should be applied to the bag of terms. These elements can extend the bag of terms, making it less exact. Finally, the algorithm uses these terms for concept matching with a domain ontology. If a concept matches, it is added to the query profile. This is a major advantage of MPL over WSDL, because MPL allows the explicit annotation of concepts. The domain ontology can arise from a community-driven approach, like BioMOBY's object ontology (MOBY-S, http://biomoby.org/resources/moby-s/objects) (Wilkinson and Links, 2002), which contains concepts related to data formats and data types commonly used in bioinformatics. Moreover, it is possible to use a single domain ontology as well as several domain ontologies. At the end, the user has the possibility to modify and enhance the resulting query profile.
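The term extraction steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the stopword list, the toy concept table and the naive suffix-stripping stemmer are hypothetical stand-ins for the stopword lists, domain ontology and stemming algorithms used by IRIS, which are not specified here.

```python
import re

# Hypothetical stopword list and toy concept table (assumptions for illustration).
STOPWORDS = {"get", "set", "the", "a", "of", "in"}
CONCEPT_TERMS = {
    "nucleotide": {"nuc", "nucleotide"},
    "sequence": {"seq", "sequence"},
    "data format": {"format"},
}

def split_identifier(name):
    """Split a camelCase or snake_case identifier into lower-case parts."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def naive_stem(term):
    """Very rough stand-in for a real stemmer: strip a trailing plural 's'."""
    return term[:-1] if len(term) > 3 and term.endswith("s") else term

def bag_of_terms(identifiers, documentation=""):
    """Collect split identifier parts and documentation words, stem, drop stopwords."""
    terms = []
    for ident in identifiers:
        terms.extend(split_identifier(ident))
    terms.extend(re.findall(r"[a-z]+", documentation.lower()))
    return {naive_stem(t) for t in terms} - STOPWORDS

def match_concepts(bag):
    """Return every concept that shares at least one term with the bag."""
    return {concept for concept, terms in CONCEPT_TERMS.items() if bag & terms}

bag = bag_of_terms(["getNucSeq", "result"],
                   "Returns the sequence in the requested format")
concepts = match_concepts(bag)
```

Here `getNucSeq` is split into `get`, `nuc` and `seq`, the stopword `get` is filtered out, and the remaining terms match the toy concepts for nucleotide, sequence and data format.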
| 3 DISCOVERY OF NEEDED ADAPTERS |
|---|
A derived query profile specifies the requirements an adapter has to fulfill in order to glue together heterogeneous bioinformatics services of a designed workflow. In the discovery phase, we have to identify adapters that exactly match the requirements or that can be plugged-in between the participating services of the bioinformatics workflow.
Because we can expect to identify neither a single adapter that can be plugged in directly nor an adapter that exactly matches the requirements, the discovery should also support relaxed matchings. Several reasons obstruct a simple solution. First of all, the derived query profile is diffuse rather than exact (see Section 2). The specifications of the capabilities of adapters are also diffuse, because some calculations are very complex, making it difficult to describe the calculations themselves or their effects by pure logical conditions. Furthermore, adapters can be built by third parties that are independent of those who built the services. Finally, it is more likely that a combination of several adapters fulfills the requirements of the query, i.e. that their profiles together match the query profile. Thus, matching algorithms have to support relaxed matchings and also have to identify new compositions of adapters.
We address these requirements with a sophisticated registry and discovery unit called the adapter pool. The adapter pool contains the profiles of registered adapters. An adapter profile, declared in MPL, specifies in machine- and human-readable terms the capabilities of a customizable adapter. In order to be available for discovery, every adapter has to provide such a profile, which is advertised by the adapter pool.
The retrieval engine of the adapter pool supports different kinds of matchmaking algorithms. Some of them act as filters, while others realize relaxed matchings of adapters that also consider possible compositions of adapters (Radetzki and Cremers, 2004). The latter algorithms are based on the available syntactic and semantic information of adapters, while the former use tree structures, like adapter and service taxonomies, or keyword bags. In the following we limit our discussion to signature (syntax) and concept (semantic) matching. Information on the remaining filter algorithms can be found in Radetzki and Cremers (2004).
The algorithms for signature and concept matching operate on the parameter names, data types and the concepts of a domain ontology. They identify new compositions of adapters, support polymorphism as well as linguistic analysis like synonyms and abbreviations. Polymorphism is applied to the data types as well as the concepts, whereas linguistic analysis is applied to parameter names and descriptions.
A data type t' is a subtype of a data type t, iff every instance of t' is an instance of t. The subtype hierarchies can be defined explicitly by the type system, but can also be implicitly determined by inference rules (Sycara et al., 2002). Similarly, a concept c subsumes a concept c', iff every instance of the concept c' is also an instance of the concept c. Like subtype hierarchies, subsumption hierarchies can be defined explicitly by an ontology or can be implicitly derived by inference rules. Explicit subsumption relations of concepts are defined by is-a relations between concepts of an ontology.
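As a small illustration of explicit subsumption, the following sketch follows is-a edges upward from the more specific concept. The concept names loosely follow the example ontology used later in the paper, but the exact edges are assumptions made for this sketch; inference rules for implicit subsumption are not modelled.

```python
# Hypothetical is-a edges: each concept maps to its direct parent concepts.
IS_A = {
    "nucleotide sequence": ["biological sequence"],
    "amino acid sequence": ["biological sequence"],
    "agave": ["biological data format"],
    "bsml": ["biological data format"],
    "biological data format": ["data format"],
}

def subsumes(is_a, general, specific):
    """Return True if `general` subsumes `specific` via explicit is-a edges.

    Every concept subsumes itself, since each of its instances is trivially
    an instance of the same concept.
    """
    seen = set()
    stack = [specific]
    while stack:
        node = stack.pop()
        if node == general:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(is_a.get(node, ()))
    return False
```

The same traversal can be applied to an explicitly declared subtype hierarchy by storing supertype edges instead of is-a edges.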
In our adapter pool we have designed several database relations: for the adapter profiles, the subtype as well as the subsumption hierarchies, the linguistic methods, like synonyms, and the interconnections between different adapters. Let $r_{\mathrm{sub}}$ be the relation for subtypes, then $r_{\mathrm{sub}}(t, t')$ holds iff $t'$ is a subtype of $t$. Let $r_{\mathrm{con}}$ be the relation for subsumption hierarchies, then $r_{\mathrm{con}}(c, c')$ holds iff $c$ subsumes $c'$. Let $r_{\mathrm{syn}}$ be the relation for synonyms and abbreviations, then $r_{\mathrm{syn}}(n, n')$ holds iff $n'$ is a synonym or an abbreviation of $n$. Then, two adapters A and B can be automatically composed by the discovery unit, if one or more of the rules in Equation (1) hold.
Let pA be an output parameter of A and let pB be an input parameter of B. Let $t_A$, $t_B$ be data types, let $C_A$, $C_B$ be sets of concepts, and let $n_A$, $n_B$ be parameter names assigned to parameter pA and pB respectively.
$$
\begin{aligned}
&\text{(a)}\quad t_A = t_B \;\lor\; r_{\mathrm{sub}}(t_B, t_A)\\
&\text{(b)}\quad \exists\, c_A \in C_A,\; c_B \in C_B:\; c_A = c_B \;\lor\; r_{\mathrm{con}}(c_B, c_A)\\
&\text{(c)}\quad n_A = n_B \;\lor\; r_{\mathrm{syn}}(n_A, n_B)
\end{aligned}
\tag{1}
$$
Sets of concepts like $C_A$ are assigned to parameters to tackle the problem that adapters often consume and produce different data belonging to several concepts.
If a new adapter is advertised in the adapter pool, its capabilities are inserted into the corresponding database relation. Furthermore, interconnections to other adapters are calculated based on Equation (1). To achieve this, every parameter name $n$ of the newly advertised adapter is compared with the parameter names $n'$ of already registered adapters, also considering the relation $r_{\mathrm{syn}}$, i.e. the algorithm verifies that $r_{\mathrm{syn}}(n, n')$ holds. At the same time, the corresponding data types and concepts are compared, also considering the relations $r_{\mathrm{sub}}$ and $r_{\mathrm{con}}$. Here the algorithm checks whether $r_{\mathrm{sub}}$ holds for the data types and $r_{\mathrm{con}}$ holds for the concepts. If $r_{\mathrm{con}}$ holds for two parameters, we include an adapter link into the domain ontology between the concept of the input parameter and the concept of the output parameter. Thus, during discovery the ontology can act like an index structure for registered adapters.
Depending on the precision required for the given workflow the user can decide, if only one or more rules have to be fulfilled in order to allow a valid interconnection between two adapters. If the algorithm considers the data types only [Equation (1a)], it might result in poor precision. Taking also the names of the parameters into account may lead to a better precision, but a worse recall, because the matching may be too strict [Equation (1a) and (1c)]. If concepts are available, then interconnections based on concepts [Equation (1b)] are often the best choice, because in most cases concepts are more accurate and meaningful than parameter names and data types.
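The trade-off between the three kinds of rules can be made concrete with a small sketch. It encodes one plausible reading of the rules (data types via the subtype relation, concepts via subsumption, names via synonyms); the `Parameter` class, the relation contents and the `required` argument are assumptions introduced for illustration and are not part of MPL or the IRIS software.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Parameter:
    name: str
    data_type: str
    concepts: frozenset

# Hypothetical relation contents, stored as sets of pairs.
SUBTYPE = {("string", "xml_string")}              # (t, t'): t' is a subtype of t
SUBSUMES = {("biological data format", "agave")}  # (c, c'): c subsumes c'
SYNONYM = {("sequence", "seq")}                   # (n, n'): n' is a synonym of n

def rule_type(out_p, in_p):
    """Data type rule: equal types, or the output type is a subtype of the input type."""
    return (out_p.data_type == in_p.data_type
            or (in_p.data_type, out_p.data_type) in SUBTYPE)

def rule_concept(out_p, in_p):
    """Concept rule: some output concept equals or is subsumed by some input concept."""
    return any(c_out == c_in or (c_in, c_out) in SUBSUMES
               for c_out in out_p.concepts for c_in in in_p.concepts)

def rule_name(out_p, in_p):
    """Name rule: equal parameter names, or the input name is a synonym of the output name."""
    return out_p.name == in_p.name or (out_p.name, in_p.name) in SYNONYM

RULES = {"1a": rule_type, "1b": rule_concept, "1c": rule_name}

def can_connect(out_p, in_p, required=("1b",)):
    """Return True if every rule selected by the user holds for the parameter pair."""
    return all(RULES[label](out_p, in_p) for label in required)

agave_out = Parameter("result", "string", frozenset({"agave"}))
article_out = Parameter("result", "string", frozenset({"journal article"}))
converter_in = Parameter("data", "string", frozenset({"biological data format"}))
```

With data types alone, both `agave_out` and `article_out` connect to `converter_in`, because all three parameters are declared as strings; the concept rule accepts only the AGAVE output, and additionally requiring the name rule rejects both pairs.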
During matchmaking, the algorithm has to query the database relation for interconnections iteratively in order to retrieve adapters and new adapter compositions. Compositions can be identified if we consider that the relation for interconnections defines a graph of adapters. Furthermore, because the ontology is a graph, one can traverse this graph using adapter links as well as other available relations (e.g. is-a, part-of) in order to identify paths from one concept to another. Here, the source concept and target concept are specified by the query profile. Possible paths specify compositions of adapters, even if a concrete adapter is missing in the path. This can happen if no adapter link is available. In this case the algorithm inserts an empty, so-called proxy adapter and provides its corresponding profile.
The result of the matchmaking process is an adapter composition in the form of an operation network that fulfills the query profile. Figure 2 illustrates such a composition. For instance, adapter A may process an identifier (accession number in NCBI's GenBank) and generate different identifiers for other services (identifier for the MGI database, http://www.informatics.jax.org/, or the EMBL database). Because the output corresponds to different concepts in the domain ontology, we call the corresponding parameter a disjunctive parameter. It is then possible to split the composition into independent subparts, each subpart processing one of the possible concepts of the disjunctive parameter. In our example these are A, E, F and A, [B, C], D, where B and C can be executed in parallel. This differs from adapters D and F, where all the parameters are mandatory and need to be filled by previous adapters. Optional parameters are not included for simplicity. However, it is not required that all data for the input parameters of one adapter come from the same adapter. For instance, D gets data from B and C. Disjunctions and conjunctions of parameters can appear for input parameters as well as for output parameters.
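The path search with proxy adapters can be sketched as a breadth-first traversal of the concept graph. The edge list below is hypothetical; it is loosely modelled on the example of Section 4, and treating is-a and part-of edges as directed in the way shown is an assumption of this sketch, not a description of the IRIS implementation.

```python
from collections import deque

# Hypothetical concept graph: (source concept, target concept, edge kind, adapter name).
EDGES = [
    ("biological data format", "agave", "adapter", "bio data adapter"),
    ("agave", "annotated gene", "adapter", "agave adapter"),
    ("annotated gene", "biological sequence", "part-of", None),
]

def find_composition(edges, source, target):
    """Breadth-first search from source to target concept.

    Adapter links contribute the registered adapter; any other relation on
    the path is bridged by a placeholder proxy adapter. Returns the list of
    steps, or None if the target concept cannot be reached.
    """
    graph = {}
    for src, dst, kind, adapter in edges:
        graph.setdefault(src, []).append((dst, kind, adapter))
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        concept, steps = queue.popleft()
        if concept == target:
            return steps
        for dst, kind, adapter in graph.get(concept, []):
            if dst in visited:
                continue
            visited.add(dst)
            step = adapter if kind == "adapter" else f"proxy[{concept} -> {dst}]"
            queue.append((dst, steps + [step]))
    return None

composition = find_composition(EDGES, "biological data format", "biological sequence")
```

For the toy graph, the search returns the two registered adapters followed by a proxy step for the part-of edge, which corresponds to an adapter that still has to be implemented.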
In the last step of the SP, the user may customize the adapters of a composition in order to bridge the bioinformatics services of the given workflow. This customization can sometimes include the realization of required proxy adapters. In some cases these adapters can be automatically created by schema matching approaches (Rahm and Bernstein, 2001) with the use of ontologies (Bowers and Ludäscher, 2004). For more information about this subject see Radetzki (2005). Finally the customized adapters and the composition are stored in the adapter pool for future workflows.
| 4 SERVICE INTEROPERABILITY USING IRIS |
|---|
Owing to the lack of space and time, we cannot discuss a complete in silico experiment, but only a small fragment consisting of two types of services that often occur together in a bioinformatics workflow. Consider the case where a user wants to analyze and annotate nucleotide or protein sequences, e.g. related to schizophrenia, within the human genome and between the target genome and related species. Identification of homologous genes within the genome provides a resource for annotation as well as for evolutionary studies that examine gene and genome duplication events within humans. This can be achieved by aligning the genome with sequences from related species through methods like BLAST or FASTA (Yuan et al., 2005). The sequence information can be provided in various formats by different database services. The choice of a concrete database service depends on the species currently investigated. Figure 3 depicts this fragment of a bioinformatics workflow.
Services, like those represented in Figure 3, are built independently using different data formats and semantics. Thus service interoperability is not guaranteed, as the output of one service is often in a different format than the input required by the next service. Further, biologists need to be able to replace services without jeopardizing the entire workflow, e.g. replacing a BLAST service operating on one database with a BLAST or FASTA service operating on a different database. Here, the problem of heterogeneity arises again, because the interfaces and data structures of the replaced and replacing services may differ in unexpected ways. At this point, the SP of the IRIS platform shows its advantage.
For practical illustration of the SP we assume two composed services. Let the XEMBL service (http://www.ebi.ac.uk/xembl/) and the DDBJ BLAST service (http://xml.nig.ac.jp/wsdl/) be composed in a workflow, where XEMBL is the data provider and DDBJ BLAST is the data consumer. Table 1 depicts these services. The XEMBL service provides the operation getNucSeq that returns the parameter result of type string. Parameters also contain textual documentation. But parameter names like result carry no explicit semantic meaning. For example, result says nothing about the contents of the result or how it is represented. The data type is misleading too, because it is declared as string, but the operation returns a complex XML document in the form of BSML or AGAVE. Further, the DDBJ BLAST service requires, among other inputs, sequence information that is contained in BSML and AGAVE documents, but that is not directly equivalent to BSML or AGAVE. Thus, in order to have a fully functional workflow, we require an adapter that can glue between BSML and AGAVE, respectively, and the biological sequence.
In this scenario we assume that the adapter pool does not contain an adapter fulfilling the request. However, the adapter pool contains two other adapters specified in Table 2: a bio data adapter and an agave adapter. The bio data adapter can convert biological information available in different data formats, like AGAVE or BSML, into a new data representation. This is semantically specified by the concept biological data format. The behavior of its operation convert can be manipulated by the specified property. Properties can be used during the customization phase in SP. A target of a customization, here the convert operation, is specified by a link. In the case of the bio data adapter, one can determine the output format by using the value 03 of property targetFormat. The agave adapter can only process biological content that is represented in the AGAVE data format. This adapter can project on information contained in such documents, e.g. the annotated gene or the publications. The desired target is specified by the projectOn property. The different parameters and properties are semantically described in MPL by attaching concepts of a domain ontology. In this example, we use a small portion of a domain ontology represented in Figure 4. Such a domain ontology should contain information on data formats and data types commonly used by bioinformatics services.
Given the specified bioinformatics workflow and the advertised adapters, we apply the SP implemented by the IRIS platform. In the first step the query generation of SP generates the query profile depicted in Table 3. This query profile specifies the requirements for adapters gluing the XEMBL service with the DDBJ BLAST service. Besides the syntactic information, this profile also contains semantic concepts, e.g. agave, sequence. These are identified by concept matching based on the domain ontology used (Fig. 4). The concept data format is identified because synonyms are applied to the term format, which is used in the description of the XEMBL service.
Next, the discovery phase of the SP is invoked. In our example we decided to use the matchmaking based on concepts [Equation (1b)]. The algorithm identifies all kinds of related adapters with respect to the concepts of the query profile. Because the XEMBL service provides BSML as well as AGAVE, which are both biological data formats, the bio data adapter for converting biological data formats is identified. Because nucleotide and amino acid sequences are biological sequences and these are parts of an annotated gene, the algorithm further identifies the agave adapter for the projection on gene-related information. Finally, the algorithm proposes an adapter that should allow the extraction of the sequence based on the annotated gene. This adapter is not yet available in the adapter pool. Thus the algorithm includes an empty proxy adapter and provides its corresponding profile. The proxy adapter has to be realized in the customization phase. The composite adapter is illustrated in Figure 5. At the end of the SP, the implemented proxy adapter as well as the composite adapter are stored in the adapter pool.
| 5 SIMULATING WORKFLOW EVOLUTION |
|---|
In general, users want to alter bioinformatics workflows in order to get new insights into complex biological processes. For instance, they want to analyze various datasets using different algorithms, but they do not want to jeopardize the service interoperability they have already achieved. We simulate this user behavior by successively replacing the XEMBL service with other data providing services and the DDBJ BLAST service with different alignment services of different databases. Our goal is to reuse the composite adapter and the other adapters, including the implemented proxy adapter, that we customized and stored in the previous example. Thus we want to identify these adapters with high precision and recall. In the test scenario, we use 3 data providing services with 7 operations in combination with 21 data consuming services with 34 operations (Table 4), which we successively exchange. The adapter pool contains 14 adapters with 24 operations. We measure the overall performance using the average precision p and recall r over 238 runs of the SP and compare different matchmaking variants. Let TP (true positives) be the set of returned relevant adapters, FP (false positives) the set of returned but wrong adapters and FN (false negatives) the set of relevant adapters not identified. Then we define p = |TP|/(|TP| + |FP|) and r = |TP|/(|TP| + |FN|). The first matchmaking algorithm uses keyword matching based on TF-IDF. It represents a UDDI-like discovery. TF-IDF is known to produce a high recall, but also a low precision. We use TF-IDF as a benchmark and compare it with four variations of the concept matchmaking algorithm using Equation (1b), two of which use linguistic analysis (Ling). The Exact variants consider the information of connected parameters in the data flow separately, while the Merge variants merge the information of all parameters before they start retrieval. Thus, we expect that the Merge variants will be less precise, but produce a higher recall. Figure 6 depicts the results of this test scenario.
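The two measures can be computed per run from the set of returned adapters and the set of relevant adapters, as in the following minimal sketch. The adapter names are hypothetical, and averaging the per-run values with equal weight is an assumption about how the averages over runs are formed.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one run from two collections of adapter names."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)   # returned and relevant
    fp = len(returned - relevant)   # returned but wrong
    fn = len(relevant - returned)   # relevant but not identified
    precision = tp / (tp + fp) if returned else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

def average_precision_recall(runs):
    """Average precision and recall over a list of (returned, relevant) pairs."""
    results = [precision_recall(returned, relevant) for returned, relevant in runs]
    n = len(results)
    return (sum(p for p, _ in results) / n, sum(r for _, r in results) / n)

# Hypothetical run: four adapters returned, two of them relevant.
p, r = precision_recall(
    returned={"bio data adapter", "agave adapter", "adapter x", "adapter y"},
    relevant={"bio data adapter", "agave adapter"},
)
```

In this made-up run every relevant adapter is returned (recall 1.0), but half of the returned adapters are wrong (precision 0.5).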
As expected, TF-IDF produces a high recall (100%) but a low precision (51%). That means nearly every second adapter is an FP. The Exact variant without linguistic analysis produces a recall and a precision below average. In contrast, the LingExact variant, which uses linguistic analysis, achieves the best precision (81%) and at the same time a good recall (89%). Merging all parameter information enhances the recall of the algorithm in both cases, with and without linguistic analysis (LingMerge and Merge, respectively), but LingMerge produces a recall (96%) that is close to the recall of TF-IDF. The precision is nearly the same in both merge cases (61% and 63%) and, as expected, below the LingExact precision. To sum up, LingExact outperforms all other variants including TF-IDF: its recall is only slightly lower than that of LingMerge and TF-IDF, and it has a much higher precision, which is required in order to produce meaningful adapter compositions. Further, unlike TF-IDF, LingExact is able to identify new composite adapters if the bioinformatics workflow requires them.
| 6 DISCUSSION |
|---|
BioMiddleware supporting in silico experiments is an evolving issue in bioinformatics. Such middleware provides a framework where integration and combination of data and method services can happen on-the-fly. This is a major advantage of this kind of middleware over distributed query systems (DQS), which are realized as warehouses or federated databases. Unlike DQS, BioMiddleware offers higher flexibility, allowing users to tap into the most recent data from a number of different databases and to combine it in novel ways that could not have been foreseen by DQS designers (Wilkinson et al., 2005). This flexibility comes at the expense of combining autonomous and heterogeneous services for which interoperability is not guaranteed.
The IRIS project fills this gap by providing a software-aided solution for service interoperability. It is based on the identification of light-weight, customizable adapters that are inserted between incompatible services. A major precondition for identification is the availability of standard service ontologies and data ontologies. The development of such ontologies is not an integral part of IRIS, but available ontologies can be used within IRIS. The BioMOBY project as well as the myGrid project support such de facto standard ontologies, which have shown their benefits in practical scenarios (Stevens et al., 2003b; Wilkinson et al., 2005). They can directly be used by the adapter pool. Furthermore, the BioMOBY project and the IRIS project are complementary and can benefit from each other. Because in BioMOBY services are defined by ontology-based object definitions (Wilkinson et al., 2005), required adapters can be precisely identified by the adapter pool. Even if a new service requires extensions of the ontology, the adapter pool can detect valuable adapters that help bridge the gap, and it describes missing proxy adapters. In the scenario of Section 4 we have presented such a case, where the adapter pool contains adapters that only partially meet the requirements to bridge two services. We used this setup to illustrate benefits as well as restrictions of our approach. It can happen that a new (proxy) adapter has to be realized. We are aware of the problem that the realization of the proxy adapter could be more complex than the realization of an adapter for the whole bridge. Here the user has to decide what to do. However, in both cases the new adapter is stored and therefore available for future workflows. Furthermore, sometimes the realization can be done automatically by schema matching. Schema matching is a complementary approach to reach service interoperability that can be directly integrated in SP. Several groups address the problem of deriving mappings between heterogeneous schemas (Bowers and Ludäscher, 2004; Madhavan et al., 2003; Magnani et al., 2005; Miller et al., 2001). Rahm and Bernstein (2001) give a good survey and compare prominent approaches. However, schema matching approaches often cannot generate the complex mappings required in bioinformatics, nor can the generated mappings be customized and directly reused in different scenarios.
The query generation module of SP allows standard conforming services, i.e. WSDL services, to be mapped on MOBY-S services or services defined by MPL. By this mapping we are able to register adapters as well as bioinformatics services, like the alignment and data providing services presented above. Therefore, the discovery mechanisms of IRIS are suitable to search not only for adapters and composite adapters, but also for services and new data-driven bioinformatics workflows. These workflows are characterized, like composite adapters, by operation networks containing disjunctive as well as conjunctive parameters. Owing to limited space we cannot go into details any further. An extended discussion can be found in Radetzki (2005).
Although the algorithms for query generation and discovery provide valuable results, we plan to enhance them in a future release with data mining methods based on association rules (Dong et al., 2004). Currently, concept matching is based on linguistic methods using general-purpose ontologies. This requires terms to be equal, which is a serious problem, e.g. when non-established abbreviations, acronyms or different spellings are used. If association rules were used to cluster terms, with every cluster corresponding to a concept of the domain ontology, we could calculate the distance of a term to a cluster. Equality of terms would then no longer be required.
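The following sketch illustrates only the last step, assigning a term to the nearest cluster instead of requiring equality. It makes two loud assumptions: the clusters are written by hand rather than mined with association rules, and the distance is a plain string dissimilarity from Python's `difflib`, used as a stand-in for whatever measure a real implementation would choose. The concept names, cluster members and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hand-written stand-in for clusters that would be derived by data mining;
# each key is a (hypothetical) concept of the domain ontology.
CLUSTERS = {
    "NucleotideSequence": ["nucleotide sequence", "dna sequence"],
    "ProteinSequence": ["protein sequence", "amino acid sequence"],
}


def term_distance(a, b):
    """String dissimilarity in [0, 1]; 0 means the terms are identical."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_distance(term, members):
    """Distance of a term to a cluster: distance to its closest member."""
    return min(term_distance(term, member) for member in members)


def nearest_concept(term, clusters):
    """Return the concept whose cluster lies closest to the term."""
    return min(clusters, key=lambda c: cluster_distance(term, clusters[c]))


# Neither spelling occurs verbatim in any cluster, so an equality-based
# match would fail, but both terms still have a nearest concept.
print(nearest_concept("protein seq.", CLUSTERS))
print(nearest_concept("DNA-sequence", CLUSTERS))
```

Taking the minimum over cluster members is one design choice among several; an average over members or a distance threshold below which no concept is assigned would be equally plausible.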
| 7 CONCLUSION |
|---|
|
|
|---|
The SP of the IRIS platform provides a practical solution to the problem of service interoperability within in silico experiments. Customizable adapters and adapter compositions are identified, configured and inserted into workflows, thereby bridging the gap between heterogeneous services. By simulating workflow modifications, SP has been shown to achieve service interoperability with high precision and recall. Current limitations of the query generation and discovery mechanisms, as discussed above, are being actively addressed by the IRIS team, and enhanced features will be available in the next release of the platform.
| Acknowledgments |
|---|
The authors would like to thank S. Mancke and H. Wendt for assistance in the implementation work and T. Erdenberger, M. Schaefers and G. Witterstein for valuable discussions and fruitful comments.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alfonso Valencia
1 The project Interoperability and Reusability of Internet Services (IRIS) was initiated at the beginning of 2001 with the objective of developing middleware services for interoperability of Web services in science applications.
Received on October 20, 2005; revised on February 9, 2006; accepted on February 9, 2006
| REFERENCES |
|---|
|
|
|---|
Bausch, W., Pautasso, C., Alonso, G. (2003) Programming for dependability in a service-based grid. Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid), Tokyo, Japan, pp. 164-171.
Bowers, S. and Ludäscher, B. (2004) An ontology-driven framework for data transformation in scientific workflows. Proceedings of the International Workshop on Data Integration in the Life Sciences (DILS'04).
Dong, X., Halevy, A., Madhavan, J., Nemes, E., Zhang, J. (2004) Similarity search for web services. Proceedings of the 30th VLDB Conference, Toronto, Canada.
Galperin, M.Y. (2005) The molecular biology database collection: 2005 update. Nucleic Acids Res., 33, D5-D24.
Madhavan, J., Bernstein, P.A., Chen, K., Halevy, A., Shenoy, P. (2003) Corpus-based schema matching. Proceedings of the Workshop on Information Integration on the Web at the 18th International Joint Conference on Artificial Intelligence (IJCAI'2003), Acapulco, Mexico.
Magnani, M., et al. (2005) Schema integration based on uncertain semantic mappings. Proceedings of the Conceptual Modeling - ER 2005, LNCS, 3716, 31-46.
Miller, R.J., et al. (2001) The Clio project: managing heterogeneity. ACM Sigmod Rec., 30, 78-83.
Oinn, T., et al. (2004) Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20, 3045-3054.
Radetzki, U. (2005) Service-interoperability for science applications: identification and adaptation of component-based service mediators. PhD thesis, University of Bonn, Germany.
Radetzki, U., Bode, T., Witterstein, G., Gnasa, M., Cremers, A.B. (2003) A service-centric computing environment for heterogeneous biological databases and methods. Proceedings of the Currents in Computational Molecular Biology (RECOMB 2003), Berlin, Germany, pp. 25-26.
Radetzki, U. and Cremers, A.B. (2004) IRIS: a framework for mediator-based composition of service-oriented software. Proceedings of the 2004 IEEE International Conference on Web Services (ICWS 2004), San Diego, CA, USA, pp. 752-755.
Radetzki, U., Leser, U., Cremers, A.B. (2004) IRIS: a mediator-based approach achieving interoperability of web services in life science applications. Proceedings of the 3rd E-BioSci/ORIEL Annual Workshop, Hinxton, UK, pp. 25-26 (invited talk paper).
Rahm, E. and Bernstein, P.A. (2001) A survey of approaches to automatic schema matching. VLDB J., 10, 334-350.
Stevens, R.D., et al. (2003a) myGrid: personalised bioinformatics on the information grid. Bioinformatics, 19, 302-304.
Stevens, R.D., et al. (2003b) Exploring Williams-Beuren syndrome using myGrid. Bioinformatics, 20, 303-310.
Sycara, K., et al. (2002) LARKS: dynamic matchmaking among heterogeneous software agents in cyberspace. Autonomous Agents Multi-Agent Syst., 5, 173-203.
Wilkinson, M.D. and Links, M. (2002) BioMOBY: an open-source biological web services proposal. Briefings Bioinformatics, 3, 331-341.
Wilkinson, M., et al. (2005) BioMOBY successfully integrates distributed heterogeneous bioinformatics web services. The PlaNet exemplar case. Plant Physiol., 138, 5-17.
Yuan, Q., et al. (2005) The institute for genomic research Osa1 rice genome annotation database. Plant Physiol., 138, 1816







