Computational Genomics
Optimization of probe coverage for high-resolution oligonucleotide aCGH
1 Computer Science Department Technion, Israel
2 Agilent Laboratories CA, USA
3 Computer Science Department, Bar-Ilan University Israel
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: The resolution at which genomic alterations can be mapped by means of oligonucleotide aCGH (array-based comparative genomic hybridization) is limited by two factors: the availability of high-quality probes for the target genomic sequence and the array real-estate. Optimization of the probe selection process is required for arrays that are designed to probe specific genomic regions in very high resolution without compromising probe quality constraints.
Results: In this paper we describe a well-defined optimization problem associated with the problem of probe selection for high-resolution aCGH arrays. We propose the whenever possible
-cover as a formulation that faithfully captures the requirement of probe selection problem, and provide a fast randomized algorithm that solves the optimization problem in O(n logn) time, as well as a deterministic algorithm with the same asymptotic performance. We apply the method in a typical high-definition array design scenario and demonstrate its superiority with respect to alternative approaches.
Availability: Address requests to the authors.
Contact: dlipson{at}cs.technion.ac.il
| 1 INTRODUCTION |
|---|
|
|
|---|
Alterations in DNA copy number are characteristic of many cancer types and are thought to drive some cancer pathogenesis processes. These alterations include large chromosomal gains and losses as well as smaller scale amplifications and deletions. Because of their role in cancer development, regions of chromosomal instability are useful for elucidating other components of the process. For example, since genomic instability can trigger the activation of oncogenes and the silencing of tumor suppressors, mapping regions of common genomic aberrations has been used to discover cancer-related genes. Understanding genome aberrations is important for both the basic understanding of cancer and for diagnosis and clinical practice.
Alterations in DNA copy number have been initially measured using local fluorescence in situ hybridization-based techniques. These evolved to a genome-wide technique called Comparative Genomic Hybridization [CGH, (Kallioniemi et al., 1993)], now commonly used for the identification of chromosomal alterations in cancer (Mertens et al., 1997; Balsara and Testa, 2002). In this genome-wide cytogenetic method differentially labeled tumor and normal DNA are co-hybridized to normal metaphases. Ratios between the two labels allow the detection of chromosomal amplifications and deletions of regions that may harbor oncogenes and tumor suppressor genes. Classical CGH has, however, a limited resolution (1020 Mb). With such low resolution it is impossible to predict the borders of the chromosomal changes or to identify changes in copy numbers of single genes and small genomic regions. In a more advanced method termed array-based CGH (aCGH), tumor and normal DNA are co-hybridized to a microarray of thousands of BAC, cDNA or oligonucleotide probes (Pinkel et al., 1998; Pollack et al., 1999; Lucito et al., 2003; Brennan et al., 2004; Bignell et al., 2004; Barrett et al., 2004). The use of aCGH allows the determination of changes in DNA copy number of relatively small chromosomal regions. Using oligonucleotides arrays the resolution can, in theory, be finer than single genes.
In fact, when using oligonucleotides the resolution appears to have no limitation as these can be designed to probe any region in the genome of interest. In reality the resolution is limited for two main reasons. One is the fact that not all genomic locations can be effectively probed. For example, genomic sequences that are not unique (genomic repeats) or genomic regions with extreme GC content limit the design of specific probes. The other limitation is array real-estatethe number of genomic regions that can be probed in one array. Note that for technology implementations that use redundant probes this number is much smaller than the number of features on the array.
In this work we address the joint optimization of these two resolution limiting factors. The methods we develop are useful in the context of designing high-definition arrays. These are arrays designed to probe specific genomic regions in very high resolution. For this purpose a dense set of probes is selected for the region of interest. We are interested in doing so while maintaining optimal coverage and without compromising probe quality constraints. Major determinants of probe quality are its expected sensitivity, which is predicted by thermodynamic evaluation of the probetarget complex, and specificity, which is inferred from the extent to which close repeats of the probe can be found in the genome. In previous work (Lipson et al., 2002; Li and Stromo, 2001; Mei et al., 2003; Rouillard et al., 2003) probe sensitivity and specificity have been studied mostly in the context of expression profiling. The background context in CGH is different and so is the thermodynamics but the overall considerations remain valid. The methods described in the current paper are completely independent from those used for assigning quality to candidate probes and can therefore be applicable to any such assignment, including one derived from experimental data.
Lucito et al. (2003) describe design criteria that are based only on an empirically derived quality assignment. That is, they choose to use all probes with a given quality or better without taking any uniformity considerations into account. This approach leads to having regions that are more densely probed than others as well as to coverage discontinuities. When studying a given genomic region we are typically not biased to any specific subregions and seek to obtain copy number information for all subregions. It is therefore best to have a uniform (equidistant) distribution of probes in the region. This way we best avoid coverage discontinuities and have comparable information about all subregions. However, a uniformly distributed set of probes will greatly compromise quality considerations. We may be selecting probes that are highly non-specific, for example. This paper describes methods that assume a given quality threshold above which probes are acceptable and then optimize the coverage using only these candidate probes and working with the array real-estate constraints.
The rest of the paper is organized as follows: in Section 2 we describe a new score that accounts for probe coverage, and formally define the optimization problem. In Section 3 we describe two algorithms that efficiently solve the defined problem: a fast stochastic algorithm, and a deterministic variant. Finally, in Section 4, we demonstrate the application of our method in a biological scenario, and compare the obtained results with some alternative methods.
| 2 MATHEMATICAL FORMULATION |
|---|
|
|
|---|
In this section we formally define the optimization problems associated with probe selection for aCGH arrays, as discussed in the Introduction. We believe that such a formal definition is of independent interest, as the problem is most commonly stated in ill-defined terms.
Two types of parameters are typically considered when evaluating candidate probes for a hybridization assay. Sensitivity is a measure of a probe's ability to strongly interact with its target, and is typically assessed by considering the thermodynamic stability of the probetarget complex. Specificity is a measure of a probe's ability to discriminate between its intended target and other non-specific molecules it might cross hybridize to, and is typically assessed by considering the similarity of the target to the expected molecular background (e.g. the entire transcriptome in an expression profiling assay, or the entire genome in an aCGH assay). Methods for predicting the sensitivity and specificity of candidate probes have been extensively studied in the past (see Introduction) and will not be considered here. Instead, we shall assume we are provided with a quality parameter q(p), specifying the overall predicted performance of the probe p.
When designing oligonucleotide probes for aCGH an additional criterion should also be taken into considerationthat of coverage. Specifically, when an array is designed for the purpose of pinpointing genomic breakpoints, an important consideration in the design is to minimize the uncertainty at which the breakpoints are mapped, wherever they may be. Thus, probes should be somehow uniformly spaced, so as to be not too far from the location of any possible breakpoint.
Accordingly, when designing an aCGH array, we are interested in selecting highest quality probes, with the best possible coverage. In this section we provide formal definitions that capture the intuitive meaning of these notions.
2.1 Probe quality
Let P be the set of candidate probes. For brevity, we identify each probe with its genomic location, thus
. Let
be the quality function associating a quality score with each probe. Intuitively, we are interested in using the best possible probes, i.e. those with the highest quality score. However, the high quality probes may not be evenly distributed within the genomic area of interest. Thus, choosing the globally best probes may result in poor coverage of the genome. Hence, rather than using the globally highest quality probes, we seek to use the locally best probes. Specifically, for a probe p
P and window size w, we define the w-local quality of p to be the percentile of q(p) within scores in the w window around p. Formally, let Pw be the set of candidate probes within the window [p w/2, p + w/2]. Then
![]() | (1) |
[0,1], the w-local
-good probes are all the probes p for which qw(p)
. We shall seek to use only probes which are w-local
-good, for some proper choice of w and
. We note that it is also possible to optimize for
, as explained later.
2.2 Probe coverage
As mentioned, we are also interested that the probes cover the genomic region of interest with the most granular coverage possible. Specifically, if there is a breakpoint at some point on the genome, we would like to be able to determine its location as precisely as possible. However, two factors limit the precision that can obtained:
- With a limited number of probes in the array, probes cannot be placed at each genomic location. Rather, they must be spread across the region of interest. In this case, the localization of genomic events can only be determined by the pair of flanking probes. We shall seek that for all genomic locations, the gap between this pair of probes is as small as possible.
- Some genomic regions do not contain any candidate probes, or only probes of poor quality. In such cases, large gaps between probes are inevitable. Any breakpoint occurring within these gaps can only possibly be localized to within the pair of candidate probes bordering the gap.
DEFINITION 2.1
Given a genomic region G, a set of candidate probes P and a parameter, a subset C = (c1, ... , ck)
P is a whenever possible (WP)
-cover of G with respect to P if for any genomic location x
G, the following holds. Let ci and ci+1 be the two selected probes closest to x from the left and from the right, respectively (if x < c1 then c0 is set to be the left-end of G, and for x > ck, ck+1 is the right-end of G). Then, one of the following holds:
For such a cover C, we say that the resolution of C is
- ci+1 ci
![]()
(i.e. the flanking selected probes are within
distance of each other), or
- there is no candidate probe between ci and ci+1.
.
Thus, a WP
-cover of G guarantees that for any possible breakpoint x, it can either be localized to within
base pairs, or to within the best resolution that could have been obtained even if all candidate probes would have been used. We believe that the notion of WP
-cover faithfully captures the requirements of probe selection.
2.3 The optimization problem
Given a fixed number of probes in the array, the problem of designing an aCGH array is therefore a bicriteria optimization problem: select a subset of probes that are (i) of high quality (ii) with high resolution. A standard approach to such bicriteria problems is to optimize one criterion, given a constraint on the other. In our case, this gives rise to the following two optimization problems:
PROBLEM 2.2
(Probe selectionresolution optimization). Given an integer k, genomic region G, window size w and quality threshold, find the minimal possible resolution
* and a probe subset C
P such that:
- |C| = k,
- C contains only
-good probes,
- C is a WP
*-cover of G with respect to the
-good probes.
PROBLEM 2.3
(Probe selectionquality optimization). Given an integer k, genomic region G, window size w and resolution threshold, find a maximum quality score
*, and a probe subset C
P such that:
- |C| = k,
- C contains only
*-good probes,
- C is a WP
-cover of G with respect to the
*-good probes.
In this paper we focus on the resolution optimization version, but also provide an efficient solution to the quality optimization version.
2.4 Problem variants
2.4.1 Multiple genomic segments
A simple but important variant of the problem involves multiple genomic segments. In many cases, an aCGH array is designed to assay more than a single genomic segment at a time, most typically different chromosomal segments. In this case we would like to uniformly cover all the genomic regions of interest.
2.4.2 Biased selection
Another useful variant of the problem involves biased selection of probes in certain genomic segments. The most typical application of this variant is for increasing the resolution of probes in gene coding regions at the expense of non-coding regions, while preserving the uniform resolution within each of the subtypes.
| 3 ALGORITHMS |
|---|
|
|
|---|
In this section we describe efficient algorithms for the probe selection problem, focusing on the resolution optimization version. We first show how to determine the minimal number of probes required for a WP
-cover, for a given
. We then use this procedure to search for the minimal
for a given number of probes. We provide a fast randomized algorithm, described in Section 3.3.1, that finds the optimal
in O(nlogn) steps. A deterministic algorithm with the same asymptotic performance is described in Section 3.3.2. Finally, we show how these algorithms can be extended to accommodate the variants of the problem that were described in Section 2.4.
3.1 Determining the
-good probes
Given a window size w, it is possible to calculate qw(p) for each p
P by scanning the genome with a sliding window of size w. As we slide the window from left to right, at each step, at most one candidate probe is added and one omitted. We maintain the quality scores of the probes in the window in a balanced binary search tree. This tree can be maintained in O(nlogm) steps, using any of the known balanced tree data structures (here, m is the maximum number of probes in a window of size w, n = |P|). For each probe p, its w-local quality score can be obtained in O(nlogm) steps, by finding its rank in the search tree. Thus, the w-local quality of all candidate probes can be determined in O(nlogm) steps. This also allows to determine the
-good probes, for any threshold
. From here and on, we denote by
(p1 < p2 <
<
) the sequence of
-good candidate probes, for the chosen threshold
.
3.2 Finding the minimal size for a WP
-cover
In order to solve the resolution optimization problem, we first consider the reverse optimization problem:
PROBLEM 3.1
Given a genomic region G = [gbeg,gend] and a fixed value of, find a WP
-cover C for G of minimal size.
Algorithm 1, based on a greedy approach, solves this problem in
steps.
Algorithm 1 Find a minimal size WP
-cover
|
CLAIM 3.2
The subset C returned by Algorithm 1 is a WP-cover of G.
PROOF. Consider a point x
G. Let ci be leftmost probe in C to the right of x. Consider two cases:
- ci is within distance
of ci 1.
- ci is not within distance
of ci 1. In this case, by the algorithm, ci 1 and ci are consecutive candidate probes in
.
CLAIM 3.3
There is no WP-cover C* for G with |C*|<|C|.
PROOF. Assume there is such a WP
-cover C*, with
. Let
be the first i such that
(there must be one since
). Accordingly,
. Two cases are possible:
- (ci ci1) >
and therefore (
.
- ci was chosen as the rightmost probe following ci1 such that (ci ci1)
. Consequently,
.
Let x be a genomic location strictly between the two selected probes
and
.
and there exists an unselected candidate probe ci between
and
contradicting the fact that C* is a WP
-cover.
3.3 Optimizing resolution
In the probe selection problem, we are given a number k of probes and seek to optimize the resolution. We do so by performing a binary search on the resolutions, using Algorithm 1 as the decision criteria. The recursive binary search procedure is described in Algorithm 2.
Algorithm 2 Find a k-sized WP
-cover with minimal
.
|
The procedure performs a binary search for the optimal
within the range [1,|G|]. Clearly, the optimal
is a distance between some two probes
(otherwise,
could be reduced to the closest distance). There are
such pairwise distances. Thus a balanced binary search could pinpoint the optimal value in
recursive calls. However, these
pairwise distances are not provided to us explicitly, and enumerating them all would take
steps by itself. Thus, we seek to somehow perform a balanced binary search on this
-sized space without explicitly enumerating it. Interestingly, this can be done in
time, as explained hereunder. The key element is to find a good split value (Line 4) in linear time. We provide both randomized and deterministic procedures to find such a split value efficiently.
Each invocation of the RecursiveOptimizeResolution procedure takes
steps (the complexity of FindMinCover). Provided that the depth of the recursion is
, the overall complexity is
.
3.3.1 Fast randomized split selection
We first describe a fast randomized procedure for the selection of the split value. The procedure provides that the expected number of recursive calls in Algorithm 2 is
. A description of the algorithm is provided as Algorithm 3, and an explanation follows.
Algorithm 3 Fast randomized selection of split value
|
Let X = X (low, high) be the set of all pairs of probes
pi,pj
such that the distance between the two is within the range [low, high]. We wish to find a split value split such that the number of pairs in X for which the distance is above split is roughly the same as the number of pairs for which the distance is under split. For a given starting probe pi, let
(i) be the smallest index such that
pi, p
(i)
X (i.e. |p
(i) pi|
[low,high]), and let h(i) be the largest such index. Clearly, for any j between
(i) and h(i),
pi, pj
X. The algorithm first determines
(i) and h(i), for all
. This can be completed in
steps, as explained later. Given these values, we can compute the size of X (Line 4), and choose a random element from this set (Lines 56). The distance of this pair is the split value (Line 7). It is easy to see that with probability half the split value is between the 25-th and 75-th percentile of the distances of pairs in X. Hence, after an expected
recursive calls of Algorithm 2, the recursion ends.
It remains to show how to efficiently compute
(i) and h(i). Note that
, and similarly for h(i). Thus, with a single scan over the indexes
, we can determine all
(i)s and h(i)s. We thus obtain:
CLAIM 3.4
Algorithms 2 and 3 solve the resolution optimization version of the probe selection problem inexpected time (where
is the number of
-good candidate probes).
3.3.2 Deterministic split selection
A deterministic procedure for split selection, which also runs in
time, can be obtained using the methods for selection in sorted matrices, directed sum matrices, point distances in
, and other multisets (6,17,1). A full description of the algorithm, as it applies to our setting, is provided in the appendix. With this algorithm we obtain:
CLAIM 3.5
The resolution optimization version of the probe selection problem can be solved deterministically insteps (where
is the number of
-good candidate probes).
In practice, the randomized algorithm is substantially simpler and faster, and provides the same results.
3.4 Algorithmic variants
We now show how to solve the different problem variants discussed in Sections 2.3 and 2.4.
3.4.1 Quality optimization
Suppose we wish to optimize quality, rather than resolution, as described in Problem 3. In this case we perform an algorithm similar to Algorithm 2, using a binary search on quality, rather than resolution. In this case the number of different quality values is bounded by n, so a simple binary search can be employed.
3.4.2 Multiple genomic segments
The case of multiple genomic regions is handled almost identically to the single segment case. To do so, note that for any collection of disjoint genomic segments G1, G2, ... , Gt, any WP
-cover C for this collection can be divided into a collection of disjoint covers C1, C2, ... , Ct, each constituting a WP
-cover for the corresponding Gj. Thus, the minimal WP
-cover for G1, G2, ... , Gt, is the union of the minimal covers for each of the Gjs. Accordingly, in the binary search algorithm (Algorithm 2), we create the minimal cover for the collection of segments (Lines 2 and 5) as the union of the individual covers. All the rest of the algorithms remain the same, with the understanding that the distance between probes in separate segments is infinite.
3.4.3 Biased selection
Suppose that we have two types of regions, say coding and non-coding, and we seek a cover for which the resolution in the coding regions is
more granular than that in the non-coding regions. To do so, we multiply all distances within the coding regions by a factor of
and run the algorithm described above. We will obtain an optimal cover, such that the resolution of the coding region is smaller by a factor of
from that of the non-coding region. This technique can be extended for more than two types of regions as well.
| 4 APPLICATION |
|---|
|
|
|---|
We demonstrate the application of UniProbe (Algorithm 2) on a typical scenario of high-resolution probe design. Assume we are searching for a genomic breakpoint at the 6 Mb p-terminus of chromosome 10. Given a set of candidate probes, we would like to select 3000 oligonucleotide probes that offer the best guaranteed resolution of detecting breakpoints. Figure 1 depicts a set of 82 500 candidate probes, taken from a probe database (unpublished data). Each point represents the genomic position and quality score of a single probe, predicted from thermodynamical consideration of the probe sequence and comparison to the entire genome background.
|
In theory, placing the probes at equal spacing would provide a guaranteed 2 Kb resolution, in the sense that the genomic distance between each pair of consecutive probes would be 2 Kb. However, achieving this theoretical resolution is impossible. For example, unclosed gaps in the genomic sequence at positions 050 Kb and 5.635.68 Mb [UCSC Genome Browser, Kent et al., (2002)] are genomic regions at which maximal resolution cannot be obtained. In addition, we require that highest quality probes are used while sustaining a high degree of uniformity. As can be seen in Figure 1 this dual objective may force us to use probes of lower quality in genomic regions that contain only inferior probes, e.g. at 50130 Kb or 4.975.07 Mb.
We compare the set of probes selected by UniProbe to two different naive methods for uniformly spaced probe selection. The first method (Qual) selects probes on the basis of their quality alone, assuming that randomness in the positions of the superior probes will lead to some degree of uniformity in their coverage. This method was described by Lucito et al. (2003) for designing a whole-genome representational oligonucleotide array, where the probes with the best empirical performance were chosen. Here, we select the 3000 probes with highest quality score. The second naive method (Bin) is based on binning: the total genomic segment is divided into 3000 equally sized bins, and the highest quality probe in each bin is selected. Note that no probes can be selected from empty bins, reducing the total number of selected probes to below 3000.
Figure 2a compares of the resolution performance of the different algorithmsQual, Bin and UniProbe with three different values of
: 0.5, 0.75 and 0.85 (w = 2 Kb). For each algorithm, and for each genomic distance d
6 Kb, we note the fraction of genomic positions in the complete segment that lie within gaps of size
d between adjacent selected probe. For comparison we depict the optimal performance that could be expected from probes that are placed at equal spacing along the segment (Unif). Figure 2b depicts the quality distributions of the selected probes.
|
It is clear that the algorithm Qual performs very badly, in terms of resolution, as could be expected from an algorithm that does not take uniformity directly into consideration, although the probe qualities are clearly superior. The algorithm UniProbe guarantees a whenever possible resolution of
= 2085,2186,2258 for
= 0.5, 0.75, 0.85, respectively, and some 80% of all genomic positions are located within gaps of size 
between adjacent selected probes. The remaining 20% are located within gaps that cannot be probed at the guaranteed resolution. The new algorithm outperforms Bin for all three values of
. The performance of Bin converges with UniProbe for values of d > 4000. This observation is explained by the fact that Bin may deviate from the optimal solution by a maximal factor of 2 (the maximal distance between probes in two consecutive non-empty bins is twice the size of the bin). Overall, the resolution obtained by UniProbe is close to optimal, with values of
approaching the optimal 2 Kb. By definition, this resolution is guaranteed for all genomic locations for which this is possible whereas the closest possible probes are guaranteed for the remaining locations, which are located within gaps in the candidate probe set. The quality of the probes selected by UniProbe, although slightly inferior to those selected by Bin, are still satisfactory high. | APPENDIX |
|---|
|
|
|---|
Here, we show how to deterministically find a split point in
. As noted, the algorithm follows the ideas of (Johnson and Mizoguchi, 1978; Salowe, 1989; Agarwal et al., 1993) for selection in sorted matrices, direct sum sets, point distances in
, and other multisets. We describe the algorithm as it pertains to our problem. The algorithm provides that we find a split value that is guaranteed to be relatively-balanced, i.e. for at least one-fourth of pairs in X the distance is above the split value, and for at least one-fourth of the pairsthe distance is below (recall that X is the set of all probe pairs
pi, pj
such that the distance between the two is in the range [low, high]). A pseudocode description of the procedure is provided as Algorithm 4, and an explanation follows.
In the procedure, we use the following notations. For each
we denote:
(i)smallest index such that |p
(i) pi|
[low,high]
- h(i)largest index such that |ph(i) pi|
[low,high]
- (d(i) = h(i)
(i) + 1number of pairwise distances within the range [low, high] that start at the probe pi
midpoint between
(i) and h(i)
distance from pi to its corresponding mid-point.
The values of
(i) and h(i) can be computed in
steps for all i collectively, as explained above in the description of the randomized procedure. The other values can be computed from
(i) and h(i) in an additional
steps.
Algorithm 4 Deterministic selection of split value
|
Consider the set of mid-point distances
. For a given
, let
![]() | (2) |
, necessarily
. Similarly, setting
![]() | (3) |
for all
. Thus, we seek to find an i for which both below (i) and above (i) are large. Specifically, let i* be such that:- |below(i*)|
|X|/4,
- below(i*) is the smallest possible, subject to the above constraint.
Then,
CLAIM 0.1
For i* as defined above,
- |below (i*)|
|X|/4
- |above (i*)|
|X|/4
PROOF.
- By definition |below(i*)|
|X|/4.
- Suppose that |above(i*)|<|X|/4
Let i1 = argmax{v(i):v(i)<v(i*)}. Note that below(i1)
above(i*) covers at least half of X. Thus, |below(i1)|
|X|/4. However, below(i1)
below (i*), in contradiction to the minimality of below(i*).
Thus, choosing v(i*) as a split value guarantees that at least a constant fraction of the search space is eliminated.
CLAIM 0.2
Algorithm 4 returns v(i*) insteps.
PROOF. We first prove that it returns v(i*). By induction we prove that at all recursive calls to the function RecursiveSearch, S always contains i*. Initially, S is
and the claim holds. Suppose that i*
S for some call of the recursive function. Consider the i for which v(i) is the median chosen in Line 4. If i* < i then i*
S. The value of
is exactly the size of below(i). Hence, the test at Line 8 will succeed and the next iteration will be with S = S and the claim holds. Similarly, if i*
i then i*
S+. Now the test at Line 8 will fail and the next iteration will be with S = S+, and the claim holds.
The running time of executing a single iteration of the function RecursiveSearch is dominated by Lines 47, all of which can be completed in O(|S|) steps. At each recursive call, the size of S is reduced by half. Hence, the entire process is completed in
steps.
| REFERENCES |
|---|
|
|
|---|
Agarwal, P.K., et al. (1993) Selecting distances in the plane. Algorithmica, 9, 495514[CrossRef].
Balsara, B.R. and Testa, J.R. (2002) Chromosomal imbalances in human lung cancer. Oncogene, 21, 68776883[CrossRef][Web of Science][Medline].
Barrett, M.T., et al. (2004) Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. PNAS, 101, 1776517770
Bignell, G.R., et al. (2004) High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res, . 14, 287295
Brennan, C., et al. (2004) High-resolution global profiling of genomic alterations with long oligonucleotide microarray. Cancer Res, . 64, 47448
Johnson, D.B. and Mizoguchi, T. (1978) Selecting the kth element in x + y and x1 + x2 + ... + xm. SIAM J. Comput, . 7, 147153[CrossRef].
Kallioniemi, O.P., et al. (1993) Comparative genomic hybridization: a rapid new method for detecting and mapping DNA amplification in tumors. Semin Cancer Biol, . 4, 4146[Web of Science][Medline].
Kent, W.J., et al. (2002) The human genome browser at UCSC. Genome Res, . 12, 9961006
Li, F. and Stormo, G.D. (2002) Selection of optimal DNA oligos for gene expression arrays. Bioinformatics, 17, 10671076.
Lipson, D., et al. (2002) Designing specific oligonucleotide probes for the entire S. cerevisiae transcriptome. In Second Workshop on Algorithms in Bioinformatics (WABI 02). LNCS, 2452, 491505.
Lucito, R., et al. (2003) Representational oligonucleotide microarray analysis: s high-resolution method to detect genome copy number variation. Genome Res, . 13, 22912305
Mei, R., et al. (2002) Probe selection for high-density oligonucleotide arrays. PNAS, 100, 1123711242.
Mertens, F., et al. (2002) Chromosomal imbalance maps of malignant solid tumors: a cytogenetic survey of 3185 neoplasms. Cancer Res, . 57, 27652780.
Pinkel, D., et al. (1998) High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet, . 20, 207211[CrossRef][Web of Science][Medline].
Pollack, J.R., et al. (1999) Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet, . 23, 4146[Web of Science][Medline].
Rouillard, J., et al. (2003) Oligoarray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. Nucleic Acids Res, . 31, 30573062
Salowe, J.S. (1989) L-infinity interdistance selection by parametric search. Inform. Process. Lett, . 30, 914[CrossRef].
This article has been cited by other articles:
![]() |
S. Lemoine, F. Combes, and S. Le Crom An evaluation of custom microarray applications: the oligonucleotide design challenge Nucleic Acids Res., April 1, 2009; 37(6): 1726 - 1739. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

P is a whenever possible (WP) 

do
.
i + 1
.
do
]
is the number of 



then
then
)

