Bioinformatics Advance Access originally published online on January 18, 2007
Bioinformatics 2007 23(6):769-770; doi:10.1093/bioinformatics/btl655
Biskit—A software platform for structural bioinformatics
1 CRG, Systems Biology program, Dr Aiguader 88, E-08003 Barcelona, Spain, 2 Unité de Bio-informatique structurale, CNRS URA 2185, Institut Pasteur, F-75724 Paris CEDEX 15, France and 3 Swedish NMR Centre at Göteborg University, P.O. Box 465, SE-405 30 Göteborg, Sweden
*To whom correspondence should be addressed.
ABSTRACT
Summary: Biskit is a modular, object-oriented Python library that provides intuitive classes for many typical tasks of structural bioinformatics research. It facilitates the manipulation and analysis of macromolecular structures, protein complexes and molecular dynamics trajectories. At the same time, Biskit offers a software platform for the rapid integration of external programs and new algorithms into complex structural bioinformatics workflows. Calculations are thus often delegated to established programs like Xplor, Amber, Hex, Prosa, Hmmer and Modeller; interfaces to further software can easily be added. Moreover, Biskit simplifies the parallelization of time-consuming calculations via PVM (Parallel Virtual Machine).
Availability: The latest snapshot of Biskit, documentation and examples are freely available under the GNU General Public License at http://biskit.sf.net (alternate url http://biskit.pasteur.fr).
Contact: johan.leckner{at}nmr.se, raik.gruenberg{at}crg.es
1 INTRODUCTION
Typical structural bioinformatics projects combine independent external programs with home-made code. This involves shuttling data between different software packages, each of which has specific demands on input formats and, in turn, often yields results in proprietary output formats. For small projects, converting data back and forth and executing the various programs by hand is usually manageable, albeit tedious. Larger projects require automation and therefore often restrict themselves to in-house software. Biskit was created out of our need to automate and parallelize complex structural bioinformatics workflows (see Fig. 1 for some examples). It allows the rapid implementation of new algorithms and strategies without the need to reinvent the wheel for more established tasks.
By way of example, we have used Biskit for tasks like:
- analyzing molecular structures, complexes and molecular dynamics trajectories
- automated conformational sampling (Grünberg et al., 2004, 2006)
- automated homology modeling (to be published)
- protein-protein ensemble docking (Grünberg et al., 2004)
- quasiharmonic entropy calculations (Grünberg et al., 2006)
Biskit is a modular, object-oriented Python library. It currently runs on Unix-like systems (Unix, Linux, Mac OS X) but should also be portable to other platforms. The Python programming language allows rapid development, is highly readable yet concise, and comes with an extensive standard library. This is probably one reason why several related Python projects already exist, for example MMTK (Hinsen, 2000), MGLTools (Sanner, 1999) and Biopython (Hamelryck and Manderick, 2003), although these projects have a somewhat different focus. Biskit concentrates on the structure-centered integration of diverse data and algorithms, ranging from molecular dynamics simulations (Wang et al., 2001) and protein–protein docking (Ritchie and Kemp, 2000) to sequence searches and visualization. It seamlessly integrates about 15 popular external programs, such as Amber, Blast, Hex, Hmmer, Fold-X, Modeller, Prosa, Pymol, T-Coffee and Xplor. The complete list is given on the Biskit home page, and further software can easily be added through a uniform interface (see below).
Nevertheless, many calculations are performed within the Biskit modules themselves, and our data models (Fig. 2) were tailored for easy access and efficient number crunching: class hierarchies are kept flat, and coordinates are stored in two-dimensional (or, for trajectories, three-dimensional) Numeric arrays. The manipulation of structures and other objects is modeled on the efficient handling of arrays in the Numeric Python module. Extracting, concatenating, reordering or comparing selected atoms, residues, chains or time frames is thus a matter of simple commands that are easily combined.
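To illustrate the array-style handling described above, here is a minimal pure-Python sketch. `ToyModel` and its methods are hypothetical and only mimic the `take`/`compress` idiom that appears in the code example of Section 2; this is not Biskit's `PDBModel` class, and coordinates are kept in nested lists rather than Numeric arrays so that the sketch runs without extra packages.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class ToyModel:
    """Hypothetical structure model: one xyz row per atom, as in an N x 3 array."""
    xyz: List[List[float]]   # atom coordinates, one [x, y, z] row per atom
    names: List[str]         # atom names, aligned with the rows of xyz

    def take(self, indices: Sequence[int]) -> "ToyModel":
        """Return a new model with the atoms at `indices`, in that order."""
        return ToyModel([list(self.xyz[i]) for i in indices],
                        [self.names[i] for i in indices])

    def compress(self, mask: Sequence[int]) -> "ToyModel":
        """Return a new model with the atoms whose mask entry is non-zero."""
        return self.take([i for i, keep in enumerate(mask) if keep])

    def concat(self, other: "ToyModel") -> "ToyModel":
        """Return a new model with the atoms of `other` appended."""
        return ToyModel([list(r) for r in self.xyz + other.xyz],
                        self.names + other.names)

    def mask_names(self, wanted: Set[str]) -> List[int]:
        """0/1 mask that is 1 for atoms whose name is in `wanted`."""
        return [1 if n in wanted else 0 for n in self.names]


# Toy data: four atoms of a single residue
m = ToyModel([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.0, 0.0], [3.0, 1.0, 0.5]],
             ["N", "CA", "C", "CB"])
bb = m.compress(m.mask_names({"N", "CA", "C"}))   # backbone atoms only
rev = m.take([3, 0])                              # pick and reorder atoms
both = m.concat(bb)                               # 4 + 3 atoms
print(bb.names, rev.names, len(both.names))
```

Because every operation returns a new model, calls such as `m.take(...).compress(...)` can be chained in the same way the article describes for Biskit objects.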
External or internal calculations are readily wrapped into a consistent parallelization scheme based on the Parallel Virtual Machine (PVM) library and can thus be distributed across a large number of computers. Data objects can be saved as pickled Python objects, and links between objects persist across separate files. This avoids redundancy, speeds up data handling and facilitates complex multi-step workflows with planned or accidental interruptions.
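The idea of links that persist between separately pickled files can be sketched with the standard `pickle` module. `FileLink`, `dump` and `load` below are hypothetical names; the sketch only assumes that a link records the path of another pickle and loads it on demand, which is not necessarily how Biskit implements its links. PVM parallelization is not covered here.

```python
import os
import pickle
import tempfile


class FileLink:
    """Hypothetical link to an object pickled in another file."""

    def __init__(self, path):
        self.path = path      # only the path is meant to be saved
        self._cache = None    # target object, loaded on first use

    def get(self):
        """Load the linked object lazily and keep it in memory."""
        if self._cache is None:
            with open(self.path, "rb") as f:
                self._cache = pickle.load(f)
        return self._cache

    def __getstate__(self):
        # Pickle the path but drop the cached target, so the target's
        # data is stored once, in its own file, and not duplicated here.
        return {"path": self.path, "_cache": None}


def dump(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


workdir = tempfile.mkdtemp()
ref_path = os.path.join(workdir, "receptor.pickle")
dump({"name": "receptor", "n_atoms": 1234}, ref_path)    # toy data

link = FileLink(ref_path)
link.get()                                # target is now cached in memory
res_path = os.path.join(workdir, "result.pickle")
dump({"score": -42.0, "receptor": link}, res_path)

restored = load(res_path)                 # e.g. after an interruption
print(restored["receptor"].get()["n_atoms"])
```

The result file stores only the path of the receptor pickle, so a large object shared by many results is written once and reloaded only when it is actually needed.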
Biskit comes with various scripts (programs) for reproducing published workflows. These scripts are good starting points for anyone interested in exploring the package. The Biskit code is well documented, and a complete module reference can be found at http://biskit.sf.net. Moreover, almost every module contains a dedicated test class that further exemplifies its use. These tests are combined into package-wide quality-control suites for the Python unittest framework.
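A per-module test class that doubles as a usage example, and the combination of several such classes into one suite, might look as follows with the standard `unittest` module. The function `relative_exposure`, the class `Test_Exposure` and the helper `package_suite` are hypothetical names chosen for illustration, not part of Biskit.

```python
import unittest


def relative_exposure(area, reference_area):
    """Hypothetical function under test: exposed area in percent."""
    return 100.0 * area / reference_area


class Test_Exposure(unittest.TestCase):
    """Per-module test class that also documents typical use."""

    def test_half_exposed(self):
        self.assertAlmostEqual(relative_exposure(50.0, 100.0), 50.0)

    def test_fully_exposed(self):
        self.assertAlmostEqual(relative_exposure(80.0, 80.0), 100.0)


def package_suite(*test_classes):
    """Combine the test classes of several modules into one suite."""
    loader = unittest.TestLoader()
    return unittest.TestSuite(
        loader.loadTestsFromTestCase(cls) for cls in test_classes)


if __name__ == "__main__":
    unittest.TextTestRunner(verbosity=2).run(package_suite(Test_Exposure))
```

In a larger package, `package_suite` would receive the test classes of all modules, so a single run reports which wrapper or installation is broken.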
The use of third-party programs inevitably raises installation and maintenance issues (comparable to those of Biopython and similar packages). Several measures aim to minimize such problems: (i) we streamlined the wrapping of external programs under a common class (Executor); (ii) we moved command-line and run-time environment parameters into dedicated, customizable settings files; (iii) test suites help to quickly spot problems with wrappers or installations; and (iv) our web site provides detailed installation and troubleshooting instructions. Moreover, Biskit does not require any of these helper programs, and we expect that most installations will have only a subset of all possible external applications installed.
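The wrapping of an external program under a common base class, with its command line read from a settings file, can be sketched with `subprocess` and `configparser`. `ToyExecutor`, `EchoUpper` and the `[echo_tool]` settings section are hypothetical and do not reproduce the interface of Biskit's `Executor`; to keep the sketch self-contained, the "external program" is simply the running Python interpreter.

```python
import configparser
import shlex
import subprocess
import sys

# Hypothetical settings file content: one section per external program.
SETTINGS = """
[echo_tool]
bin = {python}
args = -c "import sys; print(sys.stdin.read().upper())"
"""


class ToyExecutor:
    """Hypothetical base class: run one external program, parse its output."""

    section = None  # settings section that describes the program

    def __init__(self, settings_text):
        cfg = configparser.ConfigParser(interpolation=None)
        cfg.read_string(settings_text)
        opts = cfg[self.section]
        self.command = [opts["bin"]] + shlex.split(opts["args"])

    def prepare_input(self):
        """Override: return the text sent to the program's stdin."""
        return ""

    def parse_output(self, stdout):
        """Override: turn raw program output into Python data."""
        return stdout

    def run(self):
        proc = subprocess.run(self.command, input=self.prepare_input(),
                              capture_output=True, text=True, check=True)
        return self.parse_output(proc.stdout)


class EchoUpper(ToyExecutor):
    """Toy wrapper; a real subclass would wrap e.g. a structure-analysis tool."""

    section = "echo_tool"

    def __init__(self, text, settings_text):
        super().__init__(settings_text)
        self.text = text

    def prepare_input(self):
        return self.text

    def parse_output(self, stdout):
        return stdout.strip()


tool = EchoUpper("hello biskit", SETTINGS.format(python=sys.executable))
print(tool.run())  # HELLO BISKIT
```

Keeping the binary path and arguments in the settings text, rather than in the class, means a local installation can be adapted by editing that file alone.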
2 CODE EXAMPLE
The following code calculates the root-mean-square deviation (rmsd) between the surface-exposed backbone atoms of two closely related, but not necessarily identical, protein structures. This very simple example demonstrates some typical structure manipulations as well as the interaction with Numeric methods and an external program (SurfaceRacer).
```python
import Biskit as B, Numeric as N

m1 = B.PDBModel("your/structure.pdb")
m2 = B.PDBModel("your/related/structure.pdb")

## match both models to the same residue and atom content
i1, i2 = m1.compareAtoms(m2)   # -> 2 lists of indices
m1 = m1.take(i1)               # take atoms common with m2
m2 = m2.take(i2)               # take atoms common with m1

## add surface & curvature data from the SurfaceRacer program
d = B.PDBDope(m1)
d.addSurfaceRacer()

## mask for atoms with a relative exposure of >50%
surf_mask = N.greater(m1.profile('relMS'), 50)

## get backbone-only, surface-exposed structures
## (m1 and m2 now share one atom order, so surf_mask fits both)
bb1 = m1.compress(m1.maskBB() * surf_mask)
bb2 = m2.compress(m2.maskBB() * surf_mask)

## calculate the rmsd, superimpose first
rms = bb1.rms(bb2, fit=1)
```
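The masking logic of this example can be replayed without Biskit or Numeric. In the pure-Python sketch below, multiplying two 0/1 masks acts as a logical AND, and the same mask can be applied to both coordinate sets because they were matched atom by atom beforehand. The helper names, coordinates and exposure values are hypothetical toy data, and `rmsd` here compares the coordinates as given, without the superposition that `fit=1` requests above.

```python
import math


def combine_masks(*masks):
    """Element-wise product of 0/1 masks, i.e. a logical AND."""
    return [int(all(values)) for values in zip(*masks)]


def compress(coords, mask):
    """Keep the coordinates whose mask entry is non-zero."""
    return [xyz for xyz, keep in zip(coords, mask) if keep]


def rmsd(a, b):
    """RMSD of two matched coordinate lists, without superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


# Toy data for four atoms that were already matched between two models
rel_ms = [72.0, 10.0, 55.0, 90.0]            # relative exposure in %
surf_mask = [int(v > 50) for v in rel_ms]    # like N.greater(..., 50)
bb_mask = [1, 1, 0, 1]                       # 1 = backbone atom
mask = combine_masks(bb_mask, surf_mask)     # -> [1, 0, 0, 1]

c1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
c2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 1.0)]
print(rmsd(compress(c1, mask), compress(c2, mask)))   # 1.0
```

Only atoms 0 and 3 pass both masks; each is displaced by 1 unit, so the rmsd is 1.0.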
The Biskit package and web site are a constant work in progress. We invite everyone to contribute corrections, improvements or their own modules and workflows.
ACKNOWLEDGEMENT
J.L. and R.G. were supported by fellowships from the Knut and Alice Wallenberg Foundation and the Boehringer Ingelheim Fonds, respectively. We thank Wolfgang Rieping, Michael Habeck, Olivier Perrin and David Giganti for discussions and code contributions.
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: Anna Tramontano
Received on October 20, 2006; revised on December 21, 2006; accepted on December 21, 2006
REFERENCES
Grünberg R, Leckner J, Nilges M. Complementarity of structure ensembles in protein-protein binding. Structure (2004) 12:2125–2136.
Grünberg R, Nilges M, Leckner J. Flexibility and conformational entropy in protein-protein binding. Structure (2006) 14:683–693.
Hamelryck T, Manderick B. PDB file parser and structure class implemented in Python. Bioinformatics (2003) 19:2308–2310.
Hinsen K. The molecular modeling toolkit: a new approach to molecular simulations. J. Comput. Chem. (2000) 21:79–85.
Ritchie DW, Kemp GJ. Protein docking using spherical polar Fourier correlations. Proteins (2000) 39:178–194.
Sanner MF. Python: a programming language for software integration and development. J. Mol. Graph. Model. (1999) 17:57–61.
Wang W, Donini O, Reyes CM, Kollman PA. Biomolecular simulations: recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions. Annu. Rev. Biophys. Biomol. Struct. (2001) 30:211–243.

