Bioinformatics Advance Access originally published online on July 21, 2007
Bioinformatics 2007 23(18):2423-2432; doi:10.1093/bioinformatics/btm372

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

miniTUBA: medical inference by network integration of temporal data using Bayesian analysis

Zuoshuang Xiang 1,4,5, Rebecca M. Minter 2, Xiaoming Bi 2, Peter J. Woolf 3,5 and Yongqun He 1,4,5,*

1Unit for Laboratory Animal Medicine, 2Department of Surgery, 3Department of Chemical Engineering, 4Department of Microbiology and Immunology and 5Center for Computational Medicine and Biology, University of Michigan, Ann Arbor, MI, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 MATERIALS AND METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: Many biomedical and clinical research problems involve discovering causal relationships between observations gathered from temporal events. Dynamic Bayesian networks are a powerful modeling approach to describe causal or apparently causal relationships, and support complex medical inference, such as future response prediction, automated learning, and rational decision making. Although many engines exist for creating Bayesian networks, most require a local installation and significant data manipulation to be practical for a general biologist or clinician. No software pipeline currently exists for interpretation and inference of dynamic Bayesian networks learned from biomedical and clinical data.

Results: miniTUBA is a web-based modeling system that allows clinical and biomedical researchers to perform complex medical/clinical inference and prediction using dynamic Bayesian network analysis with temporal datasets. The software allows users to choose different analysis parameters (e.g. Markov lags and prior topology), and continuously update their data and refine their results. miniTUBA can make temporal predictions to suggest interventions based on an automated learning process pipeline using all data provided. Preliminary tests using synthetic data and laboratory research data indicate that miniTUBA accurately identifies regulatory network structures from temporal data.

Availability: miniTUBA is available at http://www.minituba.org

Contact: yongqunh{at}med.umich.edu


    1 INTRODUCTION
In biomedical research and clinical studies, experimental data are often collected across time over a number of similar trials or experimental units. We are often interested in knowing whether an intervention or an adverse event (e.g. a drug treatment or a pathogenic infection) affects the patient response over time, and if so, in what manner. Computer-based medical inference for clinical uses such as diagnostics and drug treatment will play an increasingly important role in clinical health care and biomedical research. Due to the noise intrinsic to clinical and biomedical data, medical inference should preferably be based on a probabilistic model. Several methods to model temporal clinical data have been explored, including simple correlation (Petrie and Sabin, 2000; Zou et al., 2003), differential equations (Perelson et al., 1993), spectral decomposition methods using Fourier transformation or wavelets (Cordes et al., 2000) and neural networks (William, 1990). Simple correlation assumes linear and typically pairwise relationships, thus limiting the investigator's ability to identify multi-dimensional relationships between variables (Stuart et al., 2004). Differential equation methods are accurate, but are often created by hand and as such are usually limited to a small number of variables (Stuart et al., 2004). Results derived from spectral decomposition methods rely on relatively low-noise measurements and can be difficult to relate back to mechanisms (Stuart et al., 2004; Thomas, 1995). Neural networks make accurate predictions by mapping the data onto a high-dimensional polynomial, thereby allowing all variables to influence each other in complex ways. However, the assumption that all variables influence one another makes it difficult to identify mechanisms using neural networks (Dreyfus, 2005).
All of these methods represent well characterized and powerful methods for analyzing temporal data, but none beyond correlations have found widespread use in translational medicine. This lack of adoption is due in part to the complexity and expertise required for interpretation and formulation, as well as the relative noise intolerance present in several of these models. Here, we demonstrate that dynamic Bayesian networks represent a promising alternative method for analyzing biologic and clinical data.

Dynamic Bayesian networks (DBNs) have been suggested as an alternative for analyzing and interpreting heterogeneous, fluctuating data for many systems including clinical data (Korb and Nicholson, 2004). At its core, a DBN is a directed acyclic graph describing how variables influence each other over time. To put DBNs in a broader mathematical context, a DBN can be thought of as a discrete time approximation of a stochastic differential equation or as a Markov chain model with possibly many states. In a first-order DBN, variables (nodes) at some time, t_i, can influence the nodes they are connected to at a future time step, t_{i+1}, but cannot influence any nodes at the same time step t_i. For computational speed, variables are assumed to have a finite number of states that they can take on, such as high, medium or low. As such, the relationship between variables in the future and variables in the past is expressed as conditional probability tables that indicate the probabilities of each child node outcome given the states of the parent nodes.
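
As an illustration of such a conditional probability table, the following minimal Python sketch estimates the table for one child variable from a discretized time series under a first-order DBN. The variable names, the toy data and the helper `estimate_cpt` are hypothetical and are not taken from miniTUBA.

```python
from collections import Counter, defaultdict

def estimate_cpt(series, child, parents, lag=1):
    """Estimate P(child at t+lag | parents at t) by counting transitions.

    `series` is a list of dicts, one per time step, mapping variable
    names to discrete states such as "low" or "high".
    """
    counts = defaultdict(Counter)
    for t in range(len(series) - lag):
        parent_state = tuple(series[t][p] for p in parents)
        counts[parent_state][series[t + lag][child]] += 1
    cpt = {}
    for parent_state, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[parent_state] = {s: n / total for s, n in child_counts.items()}
    return cpt

# Hypothetical two-variable series in which B copies the previous state of A.
series = [
    {"A": "high", "B": "low"},
    {"A": "low", "B": "high"},
    {"A": "high", "B": "low"},
    {"A": "high", "B": "high"},
    {"A": "low", "B": "high"},
]
cpt_b = estimate_cpt(series, child="B", parents=["A"])
```

Each key of `cpt_b` is a tuple of parent states at the earlier time step, and each value is the estimated distribution of the child at the later time step.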

Static Bayesian network engines for disease diagnosis have a long history of fruitful application to medicine (Burnside, 2005; Gevaert et al., 2006; Kline et al., 2005; Suermondt and Cooper, 1993). In general, these applications use a form to collect disease symptoms and from these symptoms create a set of likely disease explanations. In contrast, the DBN engine captures time varying clinical parameters and predicts a time course of both disease progression and the impact of various clinical interventions. In contrast to static Bayesian networks, DBNs also allow temporal cycles between variables allowing the user to interpret connections as temporal causation—a more clinically relevant definition of causation for many clinicians. DBNs have been applied to bioinformatic analyses of gene expression data (Yu et al., 2004; Zou and Conzen, 2005), and have generated insights that could not be obtained from static Bayesian analysis. In addition, DBNs have been used more recently to understand visual field deterioration (Tucker et al., 2005).

A key advantage of DBNs over static Bayesian network analysis is that the relationships described in DBNs always have an unambiguous direction of causality. Static BNs learned from observational data can have identical probability scores, even when some of the causal relationships are inverted. This property of Markov equivalence means that only some of the edges in a static Bayesian network represent relationships with known causation. In contrast, as defined above, DBNs only allow events from the past to influence events in the present. The result of this property is that two DBNs can never be Markov equivalent, as reversing any arrow would yield a network in which the present influences the past, which is not allowed.
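
The equivalence problem for static networks can be seen in a small numerical sketch. The code below uses the maximum-likelihood log-likelihood as a stand-in for the Bayesian score (an assumption made for brevity; it is not the score used by the DBN engine) and shows that the static networks A -> B and B -> A fit the same hypothetical observations equally well.

```python
import math
from collections import Counter

def max_loglik(pairs):
    """Maximum-likelihood log-likelihood of the two-node network X -> Y.

    Each element of `pairs` is one observation (x, y).
    """
    n = len(pairs)
    x_counts = Counter(x for x, _ in pairs)
    xy_counts = Counter(pairs)
    total = 0.0
    for (x, y), count in xy_counts.items():
        p_x = x_counts[x] / n                 # P(X = x)
        p_y_given_x = count / x_counts[x]     # P(Y = y | X = x)
        total += count * math.log(p_x * p_y_given_x)
    return total

# Hypothetical static observations of (A, B).
observations = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0)]
a_to_b = max_loglik(observations)
b_to_a = max_loglik([(b, a) for a, b in observations])
```

Both directions yield the same value up to floating-point rounding, so the data alone cannot orient the edge. In a DBN the edge A(t) -> B(t+1) has no admissible reverse, which removes this ambiguity.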

Presently, three difficulties with DBN modeling of biomedical data have prohibited the widespread usage of DBNs by clinicians and biomedical researchers. First, although DBN algorithms are publicly available, there is no integrated package for modeling biomedical data that uses DBNs. The data preparation for the DBN modeling is time consuming and requires extensive experience, and the computational power required to perform DBN learning can be large and not available to most experimental or clinical groups. Second, no existing pipeline is currently available for interpretation and inference of the DBN learned from the clinical data. Third, there has not been an easy way to store and update experimental data and analyzed results.

To overcome these problems, we have developed a web-based dynamic Bayesian network analysis system (miniTUBA, which stands for Medical Inference by Network Integration of Temporal Data Using Bayesian Analysis), with the goal of learning and simulating biomedical networks using temporal data from experimental and clinical studies.

To demonstrate the accuracy and utility of miniTUBA, we present results based on both synthetic and experimental data. Synthetic data are first presented to introduce the algorithms used in our approach. The experimental data shown have been gathered as part of a series of studies undertaken to explore the immune defect present in the setting of biliary obstruction. This immune defect is thought to lead to increased susceptibility for developing infectious complications, sepsis and death in patients with biliary obstruction (Nomura et al., 1999). Using a well-established mouse model of biliary obstruction (Minter et al., 2005), we measured a series of clinical parameters over a period of 8 weeks to determine if miniTUBA could accurately predict known or expected relationships between variables as well as previously unknown causal relationships. Additionally, we used miniTUBA's predictive features to determine if miniTUBA could accurately predict specific outcomes or trajectory of data based on network learning.


    2 MATERIALS AND METHODS
In the following sections, we outline the software architecture of miniTUBA and two validation datasets.

2.1 miniTUBA software pipeline
Figure 1 describes the main miniTUBA pipeline for analysis of the project data.


Fig. 1. miniTUBA system architecture.

 
2.1.1 Website login
Any user can register for an account on miniTUBA. Having an account enables the user to log into the system and track all of the user's experimental data. The system also stores the full analysis history, including analysis parameters and results. Any patient data entered into miniTUBA must be de-identified in accordance with the Health Insurance Portability and Accountability Act (HIPAA), and users confirm at the time of login that this has been done.

2.1.2 Data input and project management
miniTUBA is a project-oriented web-based system. One or more projects can be created by a registered user. Currently, each project needs to go through an internal review process. This review ensures that the computational resource is properly used, since an approved project can run analyses that take up to 144 h, which represents a significant computational investment. Once approved, a user can submit/update data, set up DBN settings and run each analysis. For each project, a user can run multiple analyses and these analyses will be stored in miniTUBA for later use.

2.1.3 DBN parameter setup
The user will next set different DBN parameters that specify how the data are pre-processed and constraints on the DBN learning algorithm. The DBN settings for each analysis will be stored and can be reused for future analyses. Different DBN settings may result in different results. Default DBN settings follow the best practices described elsewhere (Yu et al., 2004), but can be changed by the user if desired. DBN settings include settings for:

  • Selecting a subset of the available experimental units. In some cases it may be desirable to analyze only some of the patients without starting a new study.
  • Selecting a subset of the available variables. Similar to above, it is possible to modify the analysis to only focus on certain variables, while ignoring others.
  • Spline fitting. Most clinical data are not gathered at a consistent sampling interval, so miniTUBA overcomes this problem by interpolating across time using spline fits. Such a spline fitting approach has been shown by Yu et al. to yield good behavior for reasonably smooth temporal data (Yu et al., 2004). For spline fitting, the options include none for no fitting and natural fitting using R function splinefun (Forsythe et al., 1977) with method = ‘natural’.
  • Discretization. For efficient learning, the experimental data are first discretized into a finite number of bins. In miniTUBA discretization options include none for no discretization, 2–10 bins interval, 2–10 bins quantile and 2–10 bins customized.
  • Structural priors. In some cases, a user may know that some edges between variables must or must not be present. These constraints can be included in the analysis in miniTUBA as structural priors.
  • Query time. In miniTUBA, the learner will attempt to optimize the structure for a user specified compute time ranging from 1 min to 144 h. To speed up the searching process, the user can run their job in parallel by using between 1 and 16 analysis instances, each to be run on a separate node of the backend cluster.
  • Learning algorithm. miniTUBA can use two different discrete optimization algorithms to learn the underlying DBN: (1) simulated annealing and (2) greedy learning. Simulated annealing is the default method and is the method used in this article.
  • Markov lag. A user can also test different Markov lags ranging from 1 to 5 to explore the causal relationships across different time scales. Here, Markov lag means the time lag between the start of an event and its effect. For example, for a project with hourly datasets, a Markov lag of 1 implies that perturbations made now will have a measurable effect in 1 h, while a Markov lag of 2 means that the effect will be observable after 2 h. Depending on the nature of the experimental units and the purpose of the experiment, a user may need to try different Markov lags to find the optimal one. Although not used here, it is also possible to create models that cover a range of Markov lags. These more complete models are not included here as the results can be difficult to interpret mechanistically.
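
Two of the settings listed above, discretization and Markov lag, can be sketched in a few lines of Python. The helper names are hypothetical, and the binning rules are simple textbook versions of interval and quantile discretization (ties are broken by rank); they are not necessarily the exact rules implemented in miniTUBA.

```python
def interval_bins(values, n_bins):
    """Equal-width discretization: one bin index (0..n_bins-1) per value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def quantile_bins(values, n_bins):
    """Equal-frequency discretization based on the rank of each value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def lagged_pairs(states, lag):
    """Pair each observation with the one `lag` steps later (lag >= 1)."""
    return list(zip(states[:-lag], states[lag:]))

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # hypothetical measurements
by_interval = interval_bins(values, 3)       # [0, 0, 0, 2, 2, 2]
by_quantile = quantile_bins(values, 3)       # [0, 0, 1, 1, 2, 2]
pairs_lag2 = lagged_pairs(by_quantile, 2)    # (state at t, state at t+2)
```

With equal-width bins the middle bin stays empty for these skewed values, whereas quantile bins place two observations in each bin. A larger Markov lag yields fewer (parent, child) pairs from the same series, which is one reason scores are not comparable across lags.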

2.1.4 DBN modeling and distributed computation
miniTUBA uses a modified version of the software package BANJO (http://www.cs.duke.edu/~amink/software/banjo/), developed at Duke University, for DBN learning (Smith et al., 2006). The learning jobs are distributed to a 44-node cluster of Apple G5 computers using Xgrid technology. Xgrid is software developed by Apple's Advanced Computation Group that allows a group of networked computers to be turned into a supercomputing cluster for parallel computing (http://www.apple.com/server/macosx/features/xgrid.html). In parallel jobs, each processor either begins network learning from a random DBN topology or uses a different random seed when learning with a stochastic method such as simulated annealing. Due to the embarrassingly parallel nature of network structure learning, this approach results in a nearly linear decrease in computing time as additional nodes are added. Because initial network learning can take hours to days to complete, miniTUBA alerts registered users by email when a job completes.

2.1.5 Analysis result output
The top scoring and consensus networks generated by the DBN learning process are visualized using Graphviz (http://www.graphviz.org/) (Gansner and North, 2000). The top 10 scoring network graphs are shown in the results page. A consensus network among the top 10 scoring networks can be generated to show edges that are present in all 10 networks, indicating relationships that are present with high confidence. While other metrics for edge confidence are possible, such as p-values and probability of conservation, we have found from user studies that these more quantitative metrics tend to overwhelm most non-computational users and end up making the result less useful. As a result, we chose to use the simpler method for representing consensus relationships described above.
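
The consensus rule described above amounts to a set intersection over the edge lists of the top scoring networks. A minimal sketch, with hypothetical edge sets and a hypothetical helper name:

```python
def consensus_edges(networks):
    """Return the edges present in every network.

    Each network is a collection of (parent, child) tuples.
    """
    if not networks:
        return set()
    return set.intersection(*(set(n) for n in networks))

# Three hypothetical top scoring networks (miniTUBA uses the top 10).
top_networks = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "C"), ("A", "D")},
    {("A", "B"), ("B", "C")},
]
consensus = consensus_edges(top_networks)
```

Only the edges A -> B and B -> C survive, because every other edge is missing from at least one network.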

Clicking nodes in the networks generates a probability table calculated based on the input dataset and the proposed causal relationships associated with the node. To assess how much better or worse a network is than the others among the top 10 scoring networks, a plot of the Bayes score distribution for these networks can also be displayed in the results page.

To allow simple and intuitive interpretation of the relationships predicted by the DBN engine, a module was developed that allows a user to generate 2D/3D scatter plots by clicking on a variable node that has one or two other variable nodes as parents. The R ‘plot’ command and the LiveGraphics3D package (http://www.vis.uni-stuttgart.de/~kraus/LiveGraphics3D/) are used to draw 2D and 3D plots, respectively. The 3D scatter plot can be rotated or zoomed in/out so that users can find a better angle or resolution.

2.1.6 Prediction
It is possible to predict the values of future time points given a DBN, conditional probabilities generated from experimental data and initial values. A prediction module was written that combines a Gibbs sampler to sample future values and a bootstrapping step to de-discretize the predictions. First, the data are discretized (e.g. low, medium, high) and a conditional probability table is generated for each variable. The associated observations for each condition are also recorded. Next, Gibbs sampling is used to predict future states for each variable by sampling from the conditional probability distribution (Korb and Nicholson, 2004). Bootstrapping is used to de-discretize the states to continuous numerical values by sampling from the associated observations of the predicted states. In prediction mode, miniTUBA repeats this process of sampling and bootstrapping 10 000 times. For numerical variables, the mean and the standard error calculated from the 10 000 predictions are plotted along with the initial values. A probability table is given for variables with nominal values and a probability curve is shown for every possible value.
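
The sample-then-de-discretize idea can be sketched for a single variable that depends only on its own previous state. This sketch substitutes plain forward sampling for the Gibbs sampler, assumes that the reported standard error is the standard error of the mean, and uses a hypothetical conditional probability table and hypothetical raw observations; it is not the miniTUBA prediction module.

```python
import random
import statistics

def predict(cpt, observations, start_state, n_steps, n_runs, seed=0):
    """Forward-sample discrete states, then de-discretize each state by
    drawing one of the raw observations recorded for that state.

    Returns one (mean, standard error of the mean) pair per future step.
    """
    rng = random.Random(seed)
    samples = [[] for _ in range(n_steps)]
    for _ in range(n_runs):
        state = start_state
        for step in range(n_steps):
            states, probs = zip(*cpt[state].items())
            state = rng.choices(states, weights=probs)[0]
            samples[step].append(rng.choice(observations[state]))
    return [
        (statistics.mean(s), statistics.stdev(s) / len(s) ** 0.5)
        for s in samples
    ]

# Hypothetical table P(next state | current state) and raw values per state.
cpt = {
    "low": {"low": 0.8, "high": 0.2},
    "high": {"low": 0.3, "high": 0.7},
}
observations = {"low": [1.0, 1.2, 0.9], "high": [3.1, 2.8, 3.3]}
predictions = predict(cpt, observations, "low", n_steps=3, n_runs=1000)
```

Each entry of `predictions` corresponds to one future time step and could be plotted as a point with an error range.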

2.1.7 System architecture and database design
miniTUBA currently runs on two Dell Poweredge 2580 servers running the Redhat Linux operating system (Redhat Enterprise Linux ES 4) and Apache HTTP Server. Data are stored in miniTUBA using a MySQL database and the interface is constructed using a variety of scripts including PHP and Perl. The DBN learning jobs are processed by the Xgrid cluster described above.

2.2 Validation studies
Given the pipeline described above, we have carried out two validation studies of miniTUBA to demonstrate its user accessibility, utility and accuracy.

2.2.1 Synthetic network data
To illustrate the features of miniTUBA, we constructed a simple four variable clinical patient model. In this model, a patient's status is defined by four variables A, B, C and D. The causal relationship among the four variables is defined in Figure 2a and b. A synthetic dataset was generated for a specific patient based on a given probability table (Fig. 2c). The initial states of the four variables at time point 0 for the patient were randomly chosen, and the values of subsequent time points were sampled based on the probability table using an internally developed simulator. An observation includes a set of data for these four variables at a time point. The size of sample observations ranges from 50 to 2500 with an interval of 10 between two simulations. The data sampling process was repeated 100 times to gather statistics to identify how much data are needed to accurately identify a network.
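
A simulator of this kind can be sketched as follows. The structure used here (a chain A -> B -> C -> D across consecutive time steps, with binary states) and both probabilities are hypothetical choices for illustration; they are not the network or the conditional probability tables of Figure 2, and this is not the internally developed simulator.

```python
import random

# Hypothetical structure: A(t) -> B(t+1), B(t) -> C(t+1), C(t) -> D(t+1).
PARENT = {"A": None, "B": "A", "C": "B", "D": "C"}
P_COPY = 0.9     # hypothetical probability that a child copies its parent
P_A_ONE = 0.5    # hypothetical probability that the parentless A equals 1

def simulate(n_obs, seed=0):
    """Return `n_obs` observations, each a dict with states for A, B, C, D."""
    rng = random.Random(seed)
    state = {v: rng.randint(0, 1) for v in PARENT}   # random initial states
    data = [dict(state)]
    for _ in range(n_obs - 1):
        nxt = {}
        for var, parent in PARENT.items():
            if parent is None:
                nxt[var] = int(rng.random() < P_A_ONE)
            else:
                copy = rng.random() < P_COPY
                nxt[var] = state[parent] if copy else 1 - state[parent]
        state = nxt
        data.append(dict(state))
    return data

data = simulate(500)
# Fraction of time steps at which B equals the previous state of A.
agreement = sum(
    data[t + 1]["B"] == data[t]["A"] for t in range(len(data) - 1)
) / (len(data) - 1)
```

Repeating the simulation with different seeds and observation counts gives the replicate datasets needed to study how much data a learner requires.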


Fig. 2. A synthetic Bayesian network with four variables and the corresponding conditional probability tables. These conditional probability tables express the probability of the state of each variable given its parents (those it directly depends upon). (a) Unrolled temporal network; (b) collapsed temporal network and (c) conditional probability tables.

 
2.2.2 Synthetic network learning
After sampling this data, we then used miniTUBA to learn the DBN best describing these data. When learning this network, we tested two different structural prior conditions. In the first case, we included no structural priors and allowed the network searcher to find the best network without constraints. In the second case, we constrained the network using prior knowledge that the variable A has no parents. The datasets were discretized to three levels, and network learning was done using simulated annealing with the following settings: proposer = All Local Moves, initialTemperature = 1000, coolingFactor = 0.9, reannealingTemperature = 500, maxAcceptedNetworksBeforeCooling = 1000, maxProposedNetworksBeforeCooling = 10 000 and minAcceptedNetworksBeforeReannealing = 200. For both cases, we evaluated the accuracy of our DBN analysis with respect to the number of observations needed to find the true network.
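
For readers unfamiliar with simulated annealing, the sketch below shows a generic version of the search on a toy problem. It reuses the initial temperature (1000) and cooling factor (0.9) quoted above, but the acceptance rule, the fixed number of proposals per temperature, the absence of reannealing and the toy score are assumptions for illustration; this is not BANJO's implementation.

```python
import math
import random

def anneal(score, n_edges, initial_temperature=1000.0, cooling_factor=0.9,
           proposals_per_temperature=50, n_temperatures=150, seed=0):
    """Maximize `score` over 0/1 edge-indicator lists of length `n_edges`."""
    rng = random.Random(seed)
    current = [0] * n_edges                  # start from the empty network
    current_score = score(current)
    best, best_score = list(current), current_score
    temperature = initial_temperature
    for _ in range(n_temperatures):
        for _ in range(proposals_per_temperature):
            candidate = list(current)
            i = rng.randrange(n_edges)
            candidate[i] = 1 - candidate[i]  # local move: toggle one edge
            delta = score(candidate) - current_score
            # Always accept improvements; accept worse moves with a
            # probability that shrinks as the temperature falls.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_score = candidate, current_score + delta
                if current_score > best_score:
                    best, best_score = list(current), current_score
        temperature *= cooling_factor
    return best, best_score

# Toy score: negative number of mismatches against a hypothetical target.
target = [1, 0, 1, 1, 0, 0]

def toy_score(edges):
    return -sum(a != b for a, b in zip(edges, target))

best, best_score = anneal(toy_score, n_edges=len(target))
```

At high temperature nearly every move is accepted, so the search wanders freely; as the temperature falls the search becomes greedy and settles on a high scoring structure.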

2.2.3 Biomedical research data
To validate miniTUBA using clinical data, we next tested our pipeline using data from a mouse model describing liver injury. The biological experiment involved an analysis of 21 clinical and biochemical parameters obtained from mice that had undergone common bile duct ligation (CBDL), sham operation or no treatment (untreated). Blood samples were obtained three times weekly via tail vein bleeding, and analyzed for blood count, weight, survival status and a panel of molecular markers for liver injury. A complete list of parameters measured is provided in Table 1. This experimental approach provided a temporal dataset from each individual animal. These data were evaluated in miniTUBA to identify causal or apparently causal relationships between specific clinical or biochemical parameters and survival, thus identifying potential future targets for intervention. The datasets were discretized to three levels, and network learning was done using simulated annealing with the same settings as for the synthetic network data above.


Table 1. Variables used in the biomedical research data simulation

 

    3 RESULTS
3.1 Synthetic data
To validate the DBN analysis approach, we used our synthetic model to identify how much data are required to correctly identify features of the full network, as is shown in Figure 3. For each dataset, a DBN analysis was run, and the top 10 scoring networks were collected. The probability of correct network identification for each observation size was calculated by comparing the top scoring network structure with the true network. A consensus network represents a network structure that is shared by the top 10 scoring networks. For each observation set, the probability, P, of finding a consensus network with all of its connections a subset of the true network was calculated as follows:


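$$P = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{M} \big[ (\text{consensus})_j \in (\text{true edges}) \big]$$

where N is the number of replicates, and j = 1,...,M iterates through the edges in the consensus network from replicate i. The term (consensus)_j ∈ (true edges) equals 1 if consensus edge j exists in the true edges, and 0 otherwise.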
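
The probability P described above can be computed directly from the consensus edge sets of the replicates. The sketch assumes that P is the fraction of replicates in which every consensus edge is a true edge, i.e. the average over replicates of the product of the edge indicators; the helper name and the example edges are hypothetical.

```python
def consensus_subset_probability(consensus_by_replicate, true_edges):
    """Fraction of replicates whose consensus edges are all true edges."""
    true_edges = set(true_edges)
    hits = 0
    for consensus in consensus_by_replicate:
        # Product of indicators: 1 only if every consensus edge is true.
        indicator = 1
        for edge in consensus:
            indicator *= int(edge in true_edges)
        hits += indicator
    return hits / len(consensus_by_replicate)

true_edges = {("A", "B"), ("B", "C"), ("C", "D")}
replicates = [
    {("A", "B")},
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "D")},   # contains a false edge, contributes 0
    {("B", "C"), ("C", "D")},
]
p = consensus_subset_probability(replicates, true_edges)
```

Three of the four hypothetical replicates contain only true edges, so `p` is 0.75.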


Fig. 3. Network identification curves based on randomly sampled data from a synthetic network. Without prior knowledge, 1280 observations are needed to obtain 95 % probability of correct network identification, and 210 observations are needed to have 99 % probability of finding a consensus network with all of its connections being true (a). When prior knowledge is added that variable A has no parent, 1200 observations are required for 95 % probability of correct network identification and only 130 observations are needed to have 99 % probability of finding a consensus network with all of its connections being true (b).

 
The plots in Figure 3 show probabilities of correct identification of the network for different sized datasets. As the number of observations increases, the probability of correct identification also increases. For the case in which no structural priors are defined, Figure 3a plots the probability of correct network identification, and the probability of finding a consensus network with all of its connections being true, against the number of observations. With a sample size of 1280 and up, the analyses showed ≥95 % correct identification of the top scoring network as the true network (Fig. 3a). With a smaller sample size (210 and up), the analyses showed ≥99 % correct identification of the consensus network as a subset of the true network (Fig. 3a). Because the consensus network contains at least one edge and the total number of edges in the true network is three, the percentage of edges recovered with this criterion is ≥33.3 %.

Similar results are obtained when a strong prior that variable A cannot have parents is added (Fig. 3b). Fewer observations (1200) are required to achieve ≥95 % probability of correct network identification (Fig. 3b). Also, a smaller number of observations (130) are needed for obtaining ≥99 % probability of finding a consensus network with all of its connections being correct. These results suggest having prior knowledge reduces the amount of data required to find a correct network.

To test the effect of network size, we repeated this analysis with a different synthetic 20-node network. For this larger network, we found that ~2000 samples are needed to consistently recover the true network (data not shown).

3.2 Biomedical research data
We have used miniTUBA to study causal relationships among a set of clinical parameters (Table 1) related to the inflammatory profile and clinical condition of mice over a period of 8 weeks following common bile duct ligation (CBDL), sham operation or no treatment at all.

Figure 4 shows predicted top networks and score distributions with Markov lags of 1, 2, 3, 4 and 5 days. These networks were generated using a structural prior that forbids self-loops, thereby forcing the network to reveal connections between nodes. A similar set of networks that allow self-connections is provided in Figure 5. For most networks in the latter case, the best predictor of a variable is its previous state. However, such self-loops are generally uninteresting in that most researchers want to know what other factors influence the variable. Nevertheless, it is worthwhile to try both ways since forbidding self-loops may also bring in some weak connections. The absolute values of the model probability scores, p(M|D), cannot be compared directly across different Markov lags because the numbers of observations are different. The distribution of model probability scores in each Markov lag (shown as a subpanel in Figs 4 and 5) indicates that the top network usually has a distinctly higher score than the other top scoring networks. The large Markov lag models show relationships that span a longer time scale. This range of Markov lags was chosen as it is particularly important from a clinical standpoint to evaluate a time interval or Markov lag which will allow for the earliest possible diagnosis of disease development, while leaving enough time to potentially intervene and alter a predicted deleterious outcome.


Fig. 4. Predicted top networks and score distributions with different Markov lags ranging from 1 to 5 days using the structural constraint that self-loops are disallowed. The shaded nodes show the causal relationship between Survival and Procedure. The profiles to the right show the relative probability of the top 10 models given data.

 

Fig. 5. Predicted top networks and score distributions with different Markov lags ranging from 1 to 5 days, with self-loops allowed. Self-loops are not shown in this figure but appear in all nodes except Procedure and Survival due to the prior setup. The shaded nodes show the causal relationship between Survival and Procedure. The dotted portion of the network was further analyzed. Profiles to the right are the relative probabilities of the top 10 models.

 
Different lags predict different results. As an example, in the top scoring models with Markov lags of 1, 2, 3 and 5 days, the variable Procedure was found to be the primary cause of mouse death. For Markov lag 4, Procedure was found to cause a change in PLT (platelet count), which then led to mouse death. This is a valuable observation, as a decreasing platelet count could serve as a marker of an adverse outcome and these data could be easily and quickly obtained by a clinician or biomedical researcher. Further interpretation of the Bayesian network results is given in the Discussion section.

As an example of a predicted relationship, Figure 6 shows a specific analysis of the predicted top network with lag 1. Figure 6a shows the scatter plot representing the relationship between the parents MCHC (mean corpuscular hemoglobin concentration) and MCV (mean corpuscular volume) and the child MCH (mean corpuscular hemoglobin). This modeling result suggests a true relationship between these three variables, indicating that MCHC and MCV determine the value of MCH. A user can rotate or zoom in/out of the 3D scatter plot on the miniTUBA website to examine the relationships among the three variables more closely. Figure 6b shows the predicted results of time-dependent MCH. The prediction is based on a leave-one-out design, in which we learn a network with one mouse left out, then use the start of the left-out mouse's data to predict the rest of it. As demonstrated, miniTUBA accurately predicts the MCH value at subsequent time points. Figure 6c shows similar trends in the probability of survival using observed versus predicted data, suggesting that miniTUBA can accurately predict survival. A similar analysis for a network that allows self-loops is provided in Figure 7.
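
The leave-one-out design used for Figure 6b can be sketched as a pair of small helpers. The mouse identifiers, the values and the helper names are hypothetical; the network learning and prediction steps themselves are omitted.

```python
def leave_one_out(units):
    """Yield (held_out_id, training_ids) for each experimental unit."""
    ids = list(units)
    for held_out in ids:
        yield held_out, [u for u in ids if u != held_out]

def split_series(series, n_start):
    """Split a held-out series into its known start and the part to predict."""
    return series[:n_start], series[n_start:]

# Hypothetical per-mouse time series of one measured variable.
mice = {
    "mouse1": [14.1, 14.3, 14.0, 13.8],
    "mouse2": [15.0, 14.9, 14.7, 14.8],
    "mouse3": [13.9, 14.2, 14.4, 14.1],
}
folds = list(leave_one_out(mice))
known, to_predict = split_series(mice["mouse1"], n_start=2)
```

For each fold, a network would be learned from the training mice only, and the known start of the held-out series would serve as the initial values for prediction.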


Fig. 6. Specific analysis of the predicted top network with Markov lag 1 day as shown in Figure 4a. (a) Relationship between parents MCHC and MCV and child MCH, and a scatter plot showing this relationship. (b) Prediction of time-dependent MCH. The round points without error ranges show the values from previously known time points and the diamond shaped points with error ranges show the predicted values and the associated standard errors. (c) Probability of survival based on observed versus predicted data.

 

Fig. 7. Specific analysis of the predicted top network with a Markov lag of 1 day as shown in Figure 5a. (a) Relationship between weight change and neutrophil (NE) counts. (b) Prediction of time-dependent NE. The round points without error ranges show the values from previously known time points, and the diamond-shaped points with error ranges show the predicted values and the associated standard errors.

 

    4 DISCUSSION
 
miniTUBA provides a user-friendly web-based tool that allows clinicians and biomedical researchers to use dynamic Bayesian network analyses and to manage their projects easily. miniTUBA can explore how temporal data can be used to make medical inferences about the contribution of a specific intervention within a well-controlled experimental design.

The analysis using synthetic data suggested that the probability of correct identification improves as the number of observations increases. For the small four-variable network shown in Figure 2, a sample size of 1280 is enough for the top-scoring network to match the synthetic network structure 95 % of the time. With a smaller sample size (210 and up), the analyses showed 99 % correct identification of the consensus network as a subset of the true network (Fig. 3a). This indicates that 16.4 % of the optimal sample size (210 of 1280) is enough to identify a consensus network. An even smaller sample size is needed to identify the consensus network when a prior is defined (Fig. 3b). Note that these results are specific to this particular four-node network, and would differ for a different network conditional probability table or a different number of nodes. That said, these results suggest that miniTUBA is capable of predicting true causal relationships from a limited data sample.
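The subset criterion used above can be made concrete with edge sets. The sketch below is illustrative only: it assumes the consensus network is the set of edges shared by all top-scoring networks (one simple notion of consensus, which may differ from the definition used by miniTUBA), and the node names, edges and function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def consensus_edges(networks: Iterable[Set[Edge]]) -> FrozenSet[Edge]:
    """Return the edges present in every network (assumed consensus rule)."""
    nets = [set(n) for n in networks]
    if not nets:
        return frozenset()
    return frozenset(set.intersection(*nets))


def is_subset_of_truth(consensus: Iterable[Edge], truth: Set[Edge]) -> bool:
    """True when the consensus contains no edge absent from the true network."""
    return set(consensus) <= truth


# Hypothetical four-node true network and three top-scoring learned networks.
true_net = {("A", "B"), ("A", "C"), ("C", "D")}
top_nets = [
    {("A", "B"), ("A", "C"), ("B", "D")},  # contains one spurious edge
    {("A", "B"), ("A", "C"), ("C", "D")},
    {("A", "B"), ("A", "C")},
]

consensus = consensus_edges(top_nets)
consensus_ok = is_subset_of_truth(consensus, true_net)

# Sample-size ratio quoted in the text: 210 of 1280 observations.
fraction = 210 / 1280
```

In this toy case the spurious edge in the first network is dropped by the intersection, so the consensus passes the subset test even though not every individual network matches the truth. This is why a consensus can be identified reliably from fewer observations than the full structure.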

For the simple experimental dataset analyzed, miniTUBA largely identifies known interactions (Fig. 4). For example, the DBN predicts that CBDL leads to increased aspartate aminotransferase level (AST), alanine aminotransferase level (ALT) and bilirubin levels, which is expected (Minter et al., 2005). Examples of other expected results are: hemoglobin levels (Hb) related to red blood cell (RBC) levels, the relationship between white blood cells (WBC) and lymphocytes (LY) and the relationships among MCH, MCV and MCHC. Figure 4d also predicts that CBDL causes reduction of platelet counts (PLT) which induces mouse death. While a decreased platelet count is known to occur in the setting of liver disease, the observed relationship in the DBN that a decreased platelet count following CBDL leads to death 4 days later is quite useful. As the platelet count is easily determined by obtaining a complete blood cell count (CBC), this could be used as a marker to predict death and to potentially guide intervention in affected subjects. This causal relationship is only found using a Markov lag of 4 days, suggesting it is often necessary to analyze Bayesian networks using various Markov lags in order to uncover time-varying causal relationships. These biomedical results also provide evidence that miniTUBA is robust and is able to accurately identify established relationships.

Figure 7 indicates that when a mouse's weight gain relative to day 0 is within 5 g, the neutrophil level (NE) usually stays within 10 % of the total white blood cell count. However, when a mouse loses weight (weight change is negative), NE becomes much higher, and the level of NE increases as the weight loss becomes more pronounced. Bayesian network analysis provides an efficient way to find this type of causal relationship compared with traditional probability analysis approaches.

The DBN analysis carried out by miniTUBA also suggests a new approach to how temporal data are gathered. Currently, most clinical data are gathered using a fixed sampling rate, and are therefore generally constrained to examining phenomena that take place over that same time period. Using a DBN tool such as miniTUBA, a researcher can first gather preliminary data, then examine these data using different Markov lags to identify which sampling rate is the most informative.
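To illustrate what examining data at different Markov lags involves, the sketch below pairs each observation with the observation a fixed number of time steps later. It is a simplified illustration, not miniTUBA's code: the function name `lagged_pairs` and the daily platelet counts are hypothetical, and a real DBN analysis would build such pairs across all variables and subjects.

```python
from typing import List, Sequence, Tuple


def lagged_pairs(values: Sequence[float], lag: int) -> List[Tuple[float, float]]:
    """Pair each observation at time t with the observation at time t + lag."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [(values[t], values[t + lag]) for t in range(len(values) - lag)]


# Hypothetical daily platelet counts for one subject (days 0 to 5).
plt_counts = [900.0, 870.0, 820.0, 700.0, 610.0, 480.0]

# Build the parent/child observation pairs for several candidate lags.
pairs_by_lag = {lag: lagged_pairs(plt_counts, lag) for lag in (1, 2, 4)}
```

Longer lags leave fewer usable pairs per subject (five pairs at lag 1 but only two at lag 4 in this toy series), which is one practical reason to collect preliminary data densely before settling on a sampling rate.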

miniTUBA provides clinical and biomedical researchers with a user-friendly tool to perform dynamic Bayesian analysis on temporal datasets, and then make predictions for an individual patient or experimental subject based upon the learned network. Such ability has great applicability in many areas of biomedical research. In translational research, in particular, miniTUBA allows researchers to easily perform DBN analyses, find underlying causes of disease or biomarkers and make accurate prospective clinical predictions. To date, failure to appreciate the contextual nature of molecular markers or parameters mediating disease has been cited as a major factor contributing to failed translational research (Greenspan, 2007; Sabroe et al., 2007). Due to constraints in studying human disease in animal models or cell lines, there is a tendency to attribute an activity or outcome to intrinsic properties rather than the context or the environment in which the interaction occurs. With the development of genomic and proteomic technology, it is becoming increasingly clear that most relationships are many to many rather than one to one, and these relationships often change depending on the context in which they occur (Greenspan, 2007). DBN analysis allows for consideration of these contextual relationships.

The numerous failed sepsis therapeutics trials are a prime example of translational studies that did not succeed because the complex interactions between factors affecting disease outcome in variable contextual environments were not appreciated. A number of antibodies and drugs have been brought forward for Phase I and II clinical trials for the treatment of sepsis based upon animal studies. Unfortunately, with the exception of one study (Bernard et al., 2001), which demonstrated minimal benefit in a select group of patients, all other therapeutics have failed to decrease the high rate of mortality observed in patients with severe sepsis (Abraham et al., 1995; Bernard et al., 1997; Bone et al., 1987; Fisher et al., 1996; Opal et al., 1997). The reasons for these failed trials are likely many, but have been primarily attributed to the fact that sepsis is a complex, heterogeneous and dynamic process, with multiple factors influencing its outcome. None of the trials to date has been designed to take these factors fully into consideration (Remick, 2003).

miniTUBA may provide a tool that can alleviate some of these problems in future trials, allowing for targeted intervention based upon the exact context in which a patient resides at a given time point.

Although the focus of miniTUBA is on analyzing clinical data, the methods shown here are also directly applicable to other time-varying datasets. The DBN framework used in miniTUBA provides a combination of model inference, human-interpretable models and data prediction that significantly assists in constructing and using data-driven models.


    ACKNOWLEDGEMENTS
 
Z.X. and the miniTUBA server were supported by Y.H.'s institutional startup funding and NIH grant 1R21AI057875-01. The biomedical research was supported by NIH grant NIGMS K08 GM074678-01A1 (RMM) and the University of Surgeons Foundation (RMM). P.J.W. is supported by NIH grant U54-DA-021519. We acknowledge Alexander Hartemink for his excellent software BANJO and Andrew Hodges for his valuable insight and suggestions.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Jonathan Wren

Received on March 19, 2007; revised on July 10, 2007; accepted on July 11, 2007

    REFERENCES
 

    Abraham E, et al. Efficacy and safety of monoclonal antibody to human tumor necrosis factor alpha in patients with sepsis syndrome. A randomized, controlled, double-blind, multicenter clinical trial. TNF-alpha MAb Sepsis Study Group. JAMA (1995) 273:934–941.

    Bernard GR, et al. The effects of ibuprofen on the physiology and survival of patients with sepsis. The Ibuprofen in Sepsis Study Group. N. Engl. J. Med (1997) 336:912–918.

    Bernard GR, et al. Efficacy and safety of recombinant human activated protein C for severe sepsis. N. Engl. J. Med (2001) 344:699–709.

    Bone RC, et al. A controlled clinical trial of high-dose methylprednisolone in the treatment of severe sepsis and septic shock. N. Engl. J. Med (1987) 317:653–658.

    Burnside ES. Bayesian networks: computer-assisted diagnosis support in radiology. Acad. Radiol (2005) 12:422–430.

    Cordes D, et al. Mapping functionally related regions of brain with functional connectivity MR imaging. AJNR (2000) 21:1636–1644.

    Dreyfus G. Neural Networks: Methodology and Applications. (2005) New York: Springer, Berlin.

    Fisher CJ Jr, et al. Treatment of septic shock with the tumor necrosis factor receptor:Fc fusion protein. The Soluble TNF Receptor Sepsis Study Group. N. Engl. J. Med (1996) 334:1697–1702.

    Forsythe GE, et al. Computer Methods for Mathematical Computations. (1977) Upper Saddle River, New Jersey: Prentice Hall.

    Gansner E, North NC. An open graph visualization system and its applications to software engineering. Softw. Pract. Exper (2000) 30:1203–1233.

    Gevaert O, et al. Predicting the outcome of pregnancies of unknown location: Bayesian networks with expert prior information compared to logistic regression. Hum. Reprod (2006) 21:1824–1831.

    Greenspan NS. Conceptualizing immune responsiveness. Nat. Immunol (2007) 8:5–7.

    Kline JA, et al. Derivation and validation of a Bayesian network to predict pretest probability of venous thromboembolism. Ann. Emerg. Med (2005) 45:282–290.

    Korb KB, Nicholson AE. Bayesian Artificial Intelligence. (2004) London, UK: Chapman & Hall/CRC Press.

    Minter RM, et al. Altered Kupffer cell function in biliary obstruction. Surgery (2005) 138:236–245.

    Nomura T, et al. Impact of bactibilia on the development of postoperative abdominal septic complications in patients with malignant biliary obstruction. Int. Surg (1999) 84:204–208.

    Opal SM, et al. Confirmatory interleukin-1 receptor antagonist trial in severe sepsis: a phase III, randomized, double-blind, placebo-controlled, multicenter trial. The Interleukin-1 Receptor Antagonist Sepsis Investigator Group. Crit. Care Med (1997) 25:1115–1124.

    Perelson AS, et al. Dynamics of HIV infection of CD4+ T cells. Math. Biosci (1993) 114:81–125.

    Petrie A, Sabin C. Medical Statistics at a Glance. (2000) Oxford, Malden MA: Blackwell Science.

    Remick DG. Cytokine therapeutics for the treatment of sepsis: why has nothing worked? Curr. Pharm. Des (2003) 9:75–82.

    Sabroe I, et al. Identifying and hurdling obstacles to translational research. Nat. Rev. Immunol (2007) 7:77–82.

    Smith VA, et al. Computational inference of neural information flow networks. PLoS Comput. Biol (2006) 2:e161.

    Stuart A, et al. Kendall's Advanced Theory of Statistics. (2004) New York: Oxford University Press.

    Suermondt HJ, Cooper GF. An evaluation of explanations of probabilistic inference. Comput. Biomed. Res (1993) 26:242–254.

    Thomas JW. Numerical Partial Differential Equations. (1995) New York: Springer.

    Tucker A, et al. A spatio-temporal Bayesian network classifier for understanding visual field deterioration. Artif. Intell. Med (2005) 34:163–177.

    William GB. Use of an artificial neural network for data analysis in clinical decision-making: the diagnosis of acute coronary occlusion. Neural Comput (1990) 2:480–489.

    Yu J, et al. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics (2004) 20:3594–3603.

    Zou KH, et al. Correlation and simple linear regression. Radiology (2003) 227:617–622.

    Zou M, Conzen SD. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics (2005) 21:71–79.

