Skip Navigation


Bioinformatics Advance Access originally published online on December 3, 2008
Bioinformatics 2009 25(3):353-357; doi:10.1093/bioinformatics/btn622
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
25/3/353    most recent
btn622v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Yaragatti, M.
Right arrow Articles by Ungar, L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Yaragatti, M.
Right arrow Articles by Ungar, L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

A predictive model for identifying mini-regulatory modules in the mouse genome

Mahesh Yaragatti *, Ted Sandler and Lyle Ungar

Biotechnology Program, CIS, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19104, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 References
 

Motivation: Rapidly advancing genome technology has allowed access to a large number of diverse genomes and annotation data. We have defined a systems model that integrates assembly data, comparative genomics, gene predictions, mRNA and EST alignments and physiological tissue expression. Using these as predictive parameters, we engineered a machine learning approach to decipher putative active regions in the genome.

Results: Analysis of genomic sequences showed nucleosome-free region (NFR) modules containing a higher percentage of conserved regions, RNA-encoding sequences, CpG islands, splice sites and GC-rich areas. In contrast, random in silico fragments revealed higher percentages of DNA repeats and a lower conservation. The larger conserved sequences from the Vista enhancer browser (VEB) showed a greater percentage of short DNA sequence matches and RNA coding regions in multiple species.

Our model can predict small regulatory regions in the genome with >95% prediction accuracy using NFR modules and >85% prediction accuracy with VEB elements. Ultimately, this systems model can be applied to any organism to identify candidate transcriptional modules on a genome scale.

Contact: myar{at}seas.upenn.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 References
 
Discovering transcriptional regulatory regions or modules is prerequisite to deciphering gene regulatory networks in the mammalian genome. Regulatory modules (100–1000 bp) may have multiple transcription factor binding sites or elements that are located throughout the genome (Sinha et al. 2003). Current state-of-the-art approaches integrate highly parallel sequencing technologies and ChIP arrays that work in tandem with computational models (GuhaThakurta 2006). Although computational methods for modeling and targeting DNA regulatory elements have been developed over the past 30 years, recent studies using comparative genomics have reported that even ultra conserved regions (<200 bp) have a very high false detection rate of regulatory elements (Visel et al. 2008).

Despite the arsenal of bioinformatics tools already developed for promoter and gene-finding regions, it is still estimated that there are 105 more regulatory elements than coding genes in most eukaryotic genomes (Yuh et al. 1998). Significantly reducing this genome search space on the order of log scales requires that we use an array of fast and simple approximations. The ENCODE Consortium advanced this idea of collective computational analysis by using a compendium of models to identify functional elements inside 1% of the genome (ENCODE Project Consortium 2007).

A major issue in identifying functional sites in the genome is that regulatory elements are contingent on higher order chromatin structure that is dictated by selective constraints imposed by the cellular environment (Elnitski et al. 2006). Therefore, a model that integrates features of a DNA sequence reflecting genomic, cellular and physiological properties may have high predictive potential (Fig. 1).


Figure 1
View larger version (28K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Schematic of predictive model. In silico genomic fragments can range in length from 150 bp to 250 bp. Each genomic fragment is scored using available annotation data and is mapped as a (1 x n) binary vector. A training set (data not shown) is required to create the initial SVM model, where subsequent predictions of genomic fragments can be predicted based on those weighted values. Random genomic fragments can be classified as putative active regulatory modules (highlighted).

 
It is well established that 2% of the cellular genome is replete with untranslated regions, non-coding genes, chromosomal structural elements, conserved areas and nucleosome-free regions (NFRs) (Chiaromonte et al. 2003,Waterston et al. 2002). In a recent study, we identified small functional transcriptional modules derived from NFRs from the mouse genome (Yaragatti et al. 2008). In this article, we propose a parallel approach using encoded regions (150–250 bp) to train an in silico model with the hope of identifying novel regulatory modules in the genome.

Quantitative representation of the genome is rooted in systems biology including the mapping of DNA sequences into vector space (Muller and Koonin 2003). Likewise, methods using numerical feature vectors characterizing DNA sequences have been proposed that use position weight matrices and motifs based on biophysical considerations of protein–DNA interactions (Djordjevic et al. 2003). Faithful to this paradigm of the digital genome, our proposed model transforms genomic sequences into functional bits that can be analyzed in a high dimensional vector space.

We have introduced a systems model that integrates information from sequence features to physiological tissue expression with a widely used learning algorithm known as a support vector machine (SVM). In its most reduced form, genome annotation data have the capacity of classifying genomic DNA into discrete classes that can be either active or inactive. To learn this concept, the SVM is shown a set of labeled DNA vectors from two groups and is trained to differentiate between them. Upon successful training, the SVM will be able to predict functionality of new DNA vectors with a minimal number of errors (Vapnik 1995). SVMs often make significantly fewer errors than other learning algorithms such as neural networks, hierarchical clustering, self-organizing maps and Bayesian methods (Brown et al. 2000).

In this article, we have implemented a systems model using mouse genomic fragments characterized with expression and sequence data to predict putative transcriptional regulatory modules.


    2 SYSTEMS AND METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 References
 
2.1 Sequence database
All sequences used in our analysis can be found in Supplementary Table 2. The NFR sequences were experimentally validated in a mammalian F9 cell line to increase the level of GFP expression when inserted upstream of a promoter site (Yaragatti et al. 2008). The average size of these fragments was 149 bp.

Additional sequences were randomly taken from the Vista enhancer browser (VEB) database (http://enhancer.lbl.gov/). These elements were computationally predicted and tested using a LacZ vector and expression in embryonic mouse blastocysts (Pennachio et al. 2006). The average size of these fragments was 1200 bp.

Mouse chromosomes were downloaded from the NCBI website (ftp://ftp.ncbi.nih.gov/genomes/M_musculus). We used a PERL script to concatenate the file into a single linear sequence. In order to emulate the enzymatic approach used in (Yaragatti et al. 2008), we designed a program to search for fragments with ‘GGCC’ sites flanking both ends. Fragments with length more than 150 and less than 250 were selected and all sequences were mapped to the mouse genome (UCSC mm9, July 2007 Assembly). We labeled these sequences as in silico.

The dataset for LIBSVM requires fixed-length vectors. Each vector is labeled with a ‘+1’ to denote ‘functional’ transcriptional regulatory elements and ‘–1’ to denote in silico random fragments. Each vector contains values representative of its features (41 dimensions).

2.2 Support vector machine
We used LIBSVM, which is a publicly available online library for training and predicting with SVMs (Chih-Chung and Chih-Jen 2001). The dataset was imported into the application and we created the model using the linear kernel with default parameters. The model was saved and used later for the prediction phase.

In the training phase, the SVM algorithm searches for a hyperplane in the input space that separates the positive from the negative samples. Based on learning theory, when many hyperplanes exist, an optimal procedure selects the hyperplane that is farthest from any training vector. This optimal hyperplane is known as the maximum margin hyperplane and can be calculated by quadratic programming. In prediction mode, the model was applied to the test data to predict the classification of unknown data.

The linear kernel was chosen for our analysis where training vectors, xi, are mapped into a higher dimensional space by the function {Phi}.

After solving for the optimal separating hyperplane, predictions can be made for samples that are not in the training set. This involves predicting the associated +1/–1 label for each test sample by determining which side of the hyperplane it falls on.

Given a training set of instance-label pairs (xi,yi), i = 1,...,l where xi isin Rn and yisin {–1, +1}, the SVM solves the following optimization problem:


Formula

SVM finds a linear separating hyperplane with the maximal margin in this higher dimensional space. The decision function is:


Formula

The output decision values were used to generate ROC curves.

2.3 Statistical analysis
All the vectors in our dataset are separated into two parts (training and test), of which, the test set is considered unknown when training the classifier. In 10-fold cross-validation, the training set is divided into 10 subsets of equal size. Sequentially one subset is tested using the classifier trained on the nine remaining subsets. Thus, each instance of the whole training set is predicted once, so the cross-validation accuracy is the percentage of data that are correctly classified. The goal of cross-validation here is to assess out of sample accuracy.

Logistic regression analysis and t-tests were performed using SPSS software (version 15.0). All data are reported with P < 0.05. Discriminant function analysis was also performed over the dataset using the built in SPSS function. A leave-one-out feature was used to calculate the cross-validation accuracy of the discriminant analysis function.

We created ROC curves using an R script that took the SVM classifier's prediction scores and plotted recall versus false positive rate at each classification threshold.


    3 RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 References
 
This article introduces a quantitative scheme to functionally represent regulatory sequences and a robust predictive model to capture specific transcriptional regulatory regions in a genome for experimental analysis.

3.1 Current approaches
There is an abundance of bioinformatics tools currently available for identifying regulatory modules based on sequence matching using probabilistic models and gene expression features (Blanchette et al., 2006; Liu et al., 2008; Segal et al., 2003; Sinha et al., 2003). However, the paradigm in sequence analysis assumes evolutionary constraints and hinges on the idea that comparing sequence alignments from multiple species will separate regulatory elements from background sequences (Cliften et al. 2003,Kellis et al. 2003,Liu et al. 2008). Notwithstanding this likelihood, an ab initio analysis compared the widely used phastCon scores available at the UCSC genome browser (http://genome.ucsc.edu/). Our comparison of sequence alignment scores for functional regulatory modules shows a significant loss in precision with an increasing number of vertebrate species (Fig. 2). As expected, the random genomic fragments showed no significant change with an increasing number of species.


Figure 2
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. The conservation score (phastCon) uses a hidden Markov model to estimate a log-likelihood of a sequence alignment. This PhastCons algorithm produces a score between 0 and 1 that is used to indicate the overall level of conservation for a DNA segment across vertebrate species (e.g. phastCon4 = 4 species). To minimize bias, DNAs containing highly conserved, coding exon sequences or non-conserved sequences such as LTRs and Long Interspersed Elements (LINEs) were excluded from this analysis. For phastCon4 scores, functional sequences (NFR.n = 114, mean = 0.820) report (t = 6.520, DF = 215) when compared to random fragments (In silico.n = 103, mean = 0.586). For phastCon17 scores, functional sequences (n = 114, mean = 0.585) report (t = 1.332, DF = 217) when compared to random fragments (n = 105, mean = 0.513). The box size indicates the distribution of fragments and the middle bar represents the median or where the majority of fragments scored. This analysis shows that the Functional DNA sequences are more highly conserved relative to Random fragments as evident from the number of elements that display higher PhastCons scores.

 
Studies have shown that conserved areas in the genome have shown correlations with regions containing regulatory elements (Pennachio et al. 2006). Although conservation is a key feature for regulatory elements, incorporating more genomes into the phastCon algorithm yields a lower overall score as shown with our functional sequences. Thus, short regulatory elements may be missed if the genomic sequences that are being aligned come from distant organisms. If the organisms are evolutionarily close, however, the alignments could be extensive and therefore generate a large number of inaccurate elements. Regardless, multiple species alignments may still be equally useful in finding regulatory modules, but a complementary method may increase its overall accuracy. Using the same dataset, we attempt an SVM approach with the hope of showing more precision than using a sequence alignment-based approach.

3.2 Analysis of features
We performed a post hoc analysis to uncover characteristic differ-ences between NFR, in silico, and VEB sequences. In a t-test using a 95% confidence interval, we interpreted values (t > 3) as significant. All data can be referred to Supplementary Table 1 and more information about these features can be found at the UCSC Genome Browser website (Karolchik et al. 2008).

Transcriptionally active areas include promoter regions and tran-scription start sites that often have conserved regions, a large percentage of CpG islands, as well as GC-rich areas (Siepel et al. 2005). Significant differences between NFR modules and random genomic fragments included CpG Islands (t = 10.49), GC-rich areas (t = 12.6) and mammalian conservation (t = 6.39). However, VEB fragments only showed changes with mammalian conservation (t = 11.08) and in GC-rich areas (t = 4.59). When we compared NFR to VEB, we saw large differences in GC-rich areas (t = 26.79) and CpG islands (t = 10.66).

Sequence tagged sites (STS) are ~200 bp, while expressed sequence tagged sites (ESTs) are ~500 bp and both serve as markers for transcribed regions in the genome. These tagged sites can be found in GenBank. Additionally, NFR modules displayed a higher change in spliced (t = 6.52) and unspliced ESTs (t = 3.87).

Non-coding sequences compose over a third of the human genome and are typically derived from interspersed transposable elements during genomic evolution (Smit 1996). We defined features of these sequences as neutral since they are not directly associated with transcriptional activity. Tandem repeats have been associated with some regulatory functions or pathological roles; however, short interspersed elements (SINEs), long terminal repeats (LTRs) and low complexity are characteristics of degenerated sequences. Microsatellites include di- and trinucleotide repeats as well as other simple repeats maybe associated with roles in disease as well. Likewise, VEB sequences showed changes in SINE (t = 3.31), LTR (t = 3.42), low complexity DNA (t = 3.36) and short sequence matches (t = 9.85). Surprisingly, comparing NFR modules to VEB sequences revealed a compelling difference (t = 23.10). We report that NFR modules contain a higher number of short sequence matches (t = 5.66).

We interpreted physiological mouse tissue expression by using the Affymetrix chip data, however, only the Affy MOE430 chip (t = 3.59) showed significant differences. The mouse tissues covered include an array of brain and other physiological tissues (Karolchik et al. 2008).

RNA coding features are based on regions where reverse transcription into corresponding cDNA was reported. The unspliced ESTs refer to sequences that did not use 5'-caps, thus there is a possibility of unprocessed mRNA or non-mRNA contamination. Non-mouse RNA displayed translated blat alignments of vertebrate and invertebrate mRNA from GenBank of organisms other than mouse. NFR modules exhibited a higher fold difference of mouse RNAs (t = 4.37) and RNAs from other species (t = 2.67). Vista sequences show a high change in RNAs from other species (t = 9.45). NFR sharply differed from VEB sequences with predicted RNA transcripts from other species (t = 13.89).

A category described as protein-coding features is derived from an array of algorithms used to predict genic regions in the genome. NFR modules exhibited strong differences from random fragments based on models including MGC Genes (t = 4.61), Aceview Genes (t = 4.09), N-SCAN (t = 3.02), GeneTrap (t = 3.91), Exoniphy (t = 3.66) and Alternative Events (t = 3.34). VEB sequences showed a difference in SGP genes (t = 3.34).

By reason of a 10-fold size difference, the VEB sequences are capable of capturing more features imposing a bias in the t-test analysis. Albeit, their performance in the SVM analysis shows that their functional behavior is similar to NFR elements. This exemplifies the advantage of using a machine learning approach rather than a simpler classification technique that associates the distribution of features with functional and random sequences.

3.3 De novo prediction of modules
In order to understand the features associated with active sequences, we implemented a forward stepwise logistic regression to classify random and NFR sequences (random = 0, NFR = 1). We found that the model improves performance as we go from step 1 to step 5 with corresponding increasing values of R2 and decreasing –2log likelihood ratios (computed out of sample). Thus, using just five features (GC%, CpG islands, splice sites, short sequence matches and conservation), we can classify NFR and random sequences with 86.0% accuracy (P < 0.05).

Similarly, with random and VEB sequences (random = 0, VEB = 1), we found that the model improves performance as we go from step 1 to step 8. Thus, using just eight features (CCDS, non-mouse RNA, DNA repeats, tandem repeats, short sequence matches, SINEs, low complexity and conservation), we can classify random and VEB sequences with 84.4% accuracy (P < 0.05).

The logistic regression model is a reductionist way to identify novel modules, which is counterintuitive to a systems approach. Using the minimum number of features may not target distal regulatory regions in the genome (i.e. regions not associated with gene/promoter activity). Thus, using a more appropriate linear classifier such as an SVM may be more advantageous.

Overall, SVM using a linear kernel showed higher cross-validation scores than logistic regression (Fig. 3A) or than SVMs with polynomial, radial basis or sigmoid kernels (data not shown).


Figure 3
View larger version (34K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. (A) This graph compares the linear kernel (SVM) and logistic regression prediction performances. Cross-validation (10-fold) was measured on the training dataset. (B) Discriminant analysis was performed on the training and test sets used to generate the SVM model (P < 0.05).

 
In an initial test, we trained our SVM with only random sequences and ran test simulations on the NFR and VEB sequences. The prediction accuracies were significantly low (<50%) suggesting that random elements contain a combination of features that make them poor predictors of functional elements (data not shown). Our analysis using 50% of functional sequences and 50% random sequences verify our assumption that it can be predicted with higher accuracy (NFR > 95%, VEB > 85%). ROC analysis (Fig. 4) was implemented to compare the characterization of features against both sets of functional elements.


Figure 4
View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. Performance analysis of NFR (blue) and VEB (red) sequences. (A) This plot was generated by thresholding values calculated using a linear kernel in LIBSVM. The true positive rate (sensitivity) is plotted as a function of the false positive rate (specificity) for different cutoff points. Each point on the ROC plot represents a sensitivity/specificity pair corresponding to a particular decision threshold. A test with perfect discrimination (no overlap in the two distributions) has a ROC plot that passes through the upper left corner (100% sensitivity, 100% specificity). Therefore the closer the ROC plot is to the upper left corner, the higher the overall accuracy of the test. (B) This plot was calculated from the logistic regression analysis.

 
As a control for our SVM results, we used SPSS to perform discriminant analysis, a classification function, on our dataset (Fig. 3B). As expected, the group classification accuracies are comparatively high with corresponding leave-one-out cross-validation values. This suggests that the results from the SVM model are not just random occurrences and that subsets of sequences have correlation to certain features.

A key aspect in our findings is that NFR modules slightly edged the VEB sequences in performance despite being about 10-fold smaller in length. Apropos of our model, this novel aspect is precedent since it enables us to identify very small regions in the genome making it easier to target specific transcriptional binding sites.

3.4 Random versus functional DNA
We have shown that using randomly selected fragments from the genome lead to poor predictors of functional transcriptional modules (<50% accuracy). The random fragments may have contained genic, intergenic and non-coding regions of the genome, but it was critical that they were not selected with any bias or criteria in mind except for size. Thus, they do not necessarily function as negative control elements, but rather neutral elements that were not pre-selected in any way (i.e. conserved or nucleosome-free). The parameters used in our SVM do not exclusively define features of regulatory elements, which is why we defined it as a systems model. In fact, based on the sequences we used, the view that our model merely classifies genic versus non-genic regions may have some validity. Regardless, we underscore that this is an ideal high-throughput approach for generating experimentally testable transcriptional regulatory modules.

Our initial experimental approach for isolating NFR modules in (Yaragatti et al. 2008) was ideal in design, but we reported only 20% efficiency in recovering functional sequences. Unequivocally, the approach used to generate VEB sequences suffers from biased targeting of conserved areas, yields very large fragments and has reported a high false positive rate (Pennachio et al. 2006). In addition, both experimental systems only tested for enhancer activity leaving much of the genomic landscape uncharacterized. Thus, the computational model introduced here could easily complement these functional assays by predicting loci-specific genomic sequences.


    4 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 References
 
The integration of sequence information using bit vectors is a gateway for functionally representing modules in a genome and a platform for future analysis. Ultimately, we hope this systems model works in tandem with current experimental approaches to uncover novel transcriptional regulatory elements. Using this quantitative framework, we hope to develop a computational toolbox for analyzing sequences and inferring gene regulatory networks.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 References
 
We would like to specially thank Chris Stoeckert, Sridhar Hannanhalli and Ze Wang for reviewing this manuscript. We are indebted to Claudio Basilico and Lisa Dailey for their help. We also thank Leroy Hood and Scott Diamond for their support.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Trey Ideker

Received on June 26, 2008; revised on October 17, 2008; accepted on November 29, 2008

    References
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 ACKNOWLEDGEMENTS
 References
 

    Blanchette M. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res (2006) 5:656–668.

    Brown MP. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA (2000) 97:262–267.[Abstract/Free Full Text]

    Chiaromonte F, et al. The share of human genomic DNA under selection estimated from human–mouse genomic alignments. Cold Spring Harb. Symp. Quant. Biol (2003) 68:245–254.[Abstract/Free Full Text]

    Chih-Chung C, Chih-Jen L. LIBSVM, a library for support vector machines. last accessed date September 2008. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

    Cliften P. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science (2003) 301:71–76.[Abstract/Free Full Text]

    Djordjevic M. A biophysical approach to transcription factor binding site discovery. Genome Res (2003) 13:2381–2390.[Abstract/Free Full Text]

    Elnitski L. Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res (2006) 12:1455–1464.

    ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 447:799–816.[CrossRef][Web of Science][Medline]

    GuhaThakurta D. Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res (2006) 12:3585–3598.

    Karolchik D. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res (2008) 36:D773–D779.[Abstract/Free Full Text]

    Kellis M. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature (2003) 423:241–254.[CrossRef][Medline]

    Liu Y. Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res (2008) 3:451–458.

    Müller HM, Koonin SE. Vector space classification of DNA sequences. J. Theor. Biol. (2003) 2:161–169.

    Pennacchio LA. In vivo enhancer analysis of human conserved non-coding sequences. Nature (2006) 444:499–502.[CrossRef][Medline]

    Segal E. Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics (2003) 19:i273–i282.[Abstract]

    Siepel A. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res (2005) 8:1034–1050.

    Sinha S. A probabilistic method to detect regulatory modules. Bioinformatics (2003) 19:i292–i301.[Abstract]

    Smit AF. The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. (1996) 6:743–748.[CrossRef][Web of Science][Medline]

    Vapnik V. The Nature of Statistical Learning Theory. (1995) New York, NY: Springer.

    Visel A. Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nat. Genet. (2008) 40:158–160.[CrossRef][Web of Science][Medline]

    Waterston RH. Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 420:520–562.[CrossRef][Medline]

    Yaragatti M. Identification of active transcriptional regulatory modules by the functional assay of DNA from nucleosome-free regions. Genome Res (2008) 6:930–938.

    Yuh CH. Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene. Science (1998) 279:1896–1902.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Am. J. Roentgenol.Home page
D. Y. Lee and K. C. P. Li
Systems Diagnostics: The Systems Approach to Molecular Imaging
Am. J. Roentgenol., August 1, 2009; 193(2): 287 - 294.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
25/3/353    most recent
btn622v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Yaragatti, M.
Right arrow Articles by Ungar, L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Yaragatti, M.
Right arrow Articles by Ungar, L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?