Bioinformatics Advance Access originally published online on January 18, 2007
Bioinformatics 2007 23(6):769-770; doi:10.1093/bioinformatics/btl655

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Biskit—A software platform for structural bioinformatics

Raik Grünberg 1,*, Michael Nilges 2 and Johan Leckner 3,*

1CRG, Systems Biology program, Dr Aiguader 88, E-08003 Barcelona, Spain, 2Unité de Bio–informatique structurale, CNRS URA 2185, Institut Pasteur, F-75724 Paris CEDEX 15, France and 3Swedish NMR Centre at Göteborg University, P.O. Box 465, SE-405 30 Göteborg, Sweden

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Biskit is a modular, object-oriented Python library that provides intuitive classes for many typical tasks of structural bioinformatics research. It facilitates the manipulation and analysis of macromolecular structures, protein complexes and molecular dynamics trajectories. At the same time, Biskit offers a software platform for the rapid integration of external programs and new algorithms into complex structural bioinformatics workflows. Calculations are thus often delegated to established programs like Xplor, Amber, Hex, Prosa, Hmmer and Modeller; interfaces to further software can be easily added. Moreover, Biskit simplifies the parallelization of time-consuming calculations via PVM (Parallel Virtual Machine).

Availability: The latest snapshot of Biskit, documentation and examples are freely available under the GNU General Public License at http://biskit.sf.net (alternate url http://biskit.pasteur.fr).

Contact: johan.leckner{at}nmr.se, raik.gruenberg{at}crg.es


    1 INTRODUCTION
 
Typical structural bioinformatics projects combine independent external programs with home-made code. This involves shuttling data between different software packages, each of which has specific demands on input formats and, in turn, often yields results in proprietary output formats. For small projects, converting data back and forth and executing the various programs by hand is usually manageable, albeit tedious. Larger projects require automation and hence often confine themselves to software developed in-house. Biskit was created out of our need to automate and parallelize complex structural bioinformatics workflows (see Fig. 1 for some examples). It allows the rapid implementation of new algorithms and strategies without reinventing the wheel for more established tasks.


Fig. 1. Example of some possible Biskit workflows. The starting point usually is a structure file or even a sequence. Structures are stored in PDBModel objects that can be subjected to many different fates; additional information can be added to the model via PDBDope; the structure can be passed on to molecular dynamics simulations, docking, etc.

 
By way of example, we have used Biskit for tasks like:
  1. analyzing molecular structures, complexes and molecular dynamics trajectories
  2. automated conformational sampling (Grünberg et al., 2004, 2006)
  3. automated homology modeling (to be published)
  4. protein-protein ensemble docking (Grünberg et al., 2004)
  5. quasiharmonic entropy calculations (Grünberg et al., 2006)

Biskit is a modular, object-oriented Python library. It currently runs on Unix-like systems (Unix, Linux, Mac OS X) but should also be portable to other platforms. The Python programming language allows for rapid development, is highly readable yet concise and comes with an extensive standard library. This is probably one reason why several related Python software projects already exist, for example MMTK (Hinsen, 2000), MGLTools (Sanner, 1999) and Biopython (Hamelryck and Manderick, 2003), although these projects have a somewhat different focus. Biskit concentrates on the structure-centered integration of diverse data and algorithms ranging from molecular dynamics simulations (Wang et al., 2001) and protein–protein docking (Ritchie and Kemp, 2000) to sequence searches and visualization. It seamlessly integrates about 15 popular external programs, such as Amber, Blast, Hex, Hmmer, Fold-X, Modeller, Prosa, Pymol, T-Coffee and Xplor. The complete list is given on the Biskit home page, and further software can be easily added using a uniform interface (also see subsequent text).

Nevertheless, many calculations are performed within the Biskit modules themselves and our data models (Fig. 2) were tailored for easy access and efficient number crunching: class hierarchies are kept flat, and coordinates are stored in two- or (in case of trajectories) three-dimensional Numeric arrays. The manipulation of structures and other objects is modeled on the efficient handling of arrays in the Numeric Python module. The extraction, concatenation, reordering, or comparison of selected atoms, residues, chains, or time frames is thus a matter of simple commands that are easily combined.
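The array-based selection style described above can be illustrated with plain arrays. The following is only a minimal sketch: it uses NumPy (the successor of Numeric) rather than Biskit itself, and the toy coordinates and atom names are invented for illustration.

```python
import numpy as np

# Hypothetical toy data: 5 atoms with xyz coordinates (atoms x 3 array),
# mirroring the two-dimensional coordinate layout described in the text.
xyz = np.array([
    [0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0],
    [1.5, 1.5, 0.0],
    [3.0, 1.5, 0.0],
    [3.0, 3.0, 0.0],
])
names = np.array(["N", "CA", "CB", "C", "O"])

# Boolean mask selecting backbone atoms, applied with compress.
bb_mask = np.isin(names, ["N", "CA", "C", "O"])
backbone = np.compress(bb_mask, xyz, axis=0)

# Index-based extraction and reordering with take.
reordered = np.take(xyz, [3, 1, 0], axis=0)

# Masks are easily combined, here with a logical AND.
far_mask = xyz[:, 0] > 1.0
far_backbone = np.compress(bb_mask & far_mask, xyz, axis=0)

# A trajectory adds a time axis: frames x atoms x 3.
traj = np.stack([xyz, xyz + 0.1, xyz + 0.2])
mean_structure = traj.mean(axis=0)  # average over all frames
```

Because every selection is just a mask or an index list, the same two operations (compress and take) cover extraction, reordering and filtering along any axis.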


Fig. 2. The four main data objects in Biskit. (A) A PDBModel encapsulates all atom and residue data from the pdb-file and a coordinate matrix. Any number of ‘profiles’ hold arbitrary additional atom- or residue centered data (e.g. surface area or conservation score). Profiles are automatically kept consistent with the underlying structure and follow, for example, the deletion or reordering of atoms, residues, or chains. (B) A Complex is basically two PDBModels with associated rotation/translation matrix, a matrix for intramolecular contacts and any number of additional fields. (C) A Trajectory is best described as a PDBModel with an additional time dimension. Hence, the 2D coordinate array of PDBModel turns into a 3D array. Profiles of arbitrary data can be assigned to the new time axis. (D) The simplest form of a ComplexList (used for rigid-body docking results) is a Complex with many different ligand orientations and contact matrices. There are more evolved forms of complex lists that handle complexes with changing coordinates.
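The idea of profiles that follow the deletion or reordering of atoms (Fig. 2A) can be sketched with a small container class. ToyModel and all of its methods are hypothetical names made up for this illustration, written with NumPy; they are not the PDBModel implementation.

```python
import numpy as np

class ToyModel:
    """Hypothetical stand-in for a structure with per-atom profiles."""

    def __init__(self, xyz, profiles=None):
        self.xyz = np.asarray(xyz, dtype=float)
        self.profiles = {}
        for name, values in (profiles or {}).items():
            self.set_profile(name, values)

    def set_profile(self, name, values):
        values = np.asarray(values)
        if len(values) != len(self.xyz):
            raise ValueError("profile %r must have one value per atom" % name)
        self.profiles[name] = values

    def take(self, indices):
        """Return a new model; every profile is re-indexed with the atoms."""
        idx = np.asarray(indices, dtype=int)
        sub = {k: v[idx] for k, v in self.profiles.items()}
        return ToyModel(self.xyz[idx], sub)

    def compress(self, mask):
        """Select atoms by boolean mask, implemented on top of take."""
        return self.take(np.flatnonzero(mask))

model = ToyModel(
    xyz=[[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]],
    profiles={"exposure": [10.0, 80.0, 65.0, 5.0]},
)
exposed = model.compress(model.profiles["exposure"] > 50)
reversed_model = model.take([3, 2, 1, 0])
```

Routing every selection through a single take method is one simple way to guarantee that coordinates and profiles can never get out of step.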

 
External or internal calculations are readily wrapped into a consistent parallelization scheme based on the Parallel Virtual Machine (PVM) library to be distributed across a large number of computers. Data objects can be saved as ‘pickled’ python objects and links between them persist between separate files. This avoids redundancies, speeds up data handling, and facilitates complex multi-step workflows with planned or accidental interruptions.
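How pickled intermediate results help a workflow survive interruptions can be shown with the standard pickle module alone. This is a generic sketch, not Biskit code: the helper names (dump, load, cached) and the fake docking result are assumptions made for the example, and the links between objects in separate files that Biskit maintains are not reproduced here.

```python
import os
import pickle
import tempfile

def dump(obj, path):
    """Save any picklable Python object to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load(path):
    """Restore a previously pickled object."""
    with open(path, "rb") as f:
        return pickle.load(f)

def cached(path, compute):
    """Reuse a saved result if present, otherwise compute and save it."""
    if os.path.exists(path):
        return load(path)
    result = compute()
    dump(result, path)
    return result

calls = []  # records how often the expensive step really runs

def expensive_step():
    calls.append(1)
    return {"step": "docking", "scores": [0.12, 0.57, 0.33]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docking.pickle")
    first = cached(path, expensive_step)   # computes and saves
    second = cached(path, expensive_step)  # loads from disk instead
```

Rerunning a script built from such steps after an interruption repeats only the steps whose result files are missing.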

Biskit comes with various scripts (programs) for reproducing published workflows. These scripts are good starting points for anyone interested in exploring the package. The Biskit code is well documented, and a complete module reference can be found at http://biskit.sf.net. Moreover, almost every module contains a dedicated test class that further exemplifies its use. The tests are combined into package-wide quality-control suites for the Python UnitTest framework.

The use of third-party programs inevitably raises installation and maintenance issues (comparable with those of Biopython and similar packages). Several measures aim to minimize such problems: (i) we streamlined the wrapping of external programs under a common class (Executor); (ii) we outsourced command line and run time environment parameters into dedicated and customizable settings files; (iii) test suites help to quickly spot problems with wrappers or installations; (iv) our web site provides detailed installation and troubleshooting instructions. Moreover, Biskit does not depend on any helper program and we expect that most installations will only have a subset of all possible external applications installed.
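The general pattern of wrapping external programs under a common base class can be sketched as follows. This is an illustration of the pattern only and does not reproduce the interface of Biskit's Executor: ExternalTool, LineCounter and their methods are hypothetical, and the Python interpreter itself stands in for a third-party binary so that the example runs anywhere.

```python
import shutil
import subprocess
import sys

class ExternalTool:
    """Hypothetical minimal wrapper base class (not Biskit's Executor)."""

    def __init__(self, binary, args=(), timeout=60):
        self.binary = binary
        self.args = list(args)
        self.timeout = timeout

    def is_installed(self):
        """Check that the program can be found before trying to run it."""
        return shutil.which(self.binary) is not None

    def parse(self, stdout):
        """Subclasses convert raw program output into Python data."""
        return stdout

    def run(self, stdin_text=""):
        if not self.is_installed():
            raise RuntimeError("program not found: %s" % self.binary)
        proc = subprocess.run(
            [self.binary] + self.args,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=self.timeout,
            check=True,  # raise if the program exits with an error
        )
        return self.parse(proc.stdout)

class LineCounter(ExternalTool):
    """Toy wrapper: a child Python process counts the lines it receives."""

    def __init__(self):
        code = "import sys; print(len(sys.stdin.readlines()))"
        super().__init__(sys.executable, ["-c", code])

    def parse(self, stdout):
        return int(stdout.strip())

n_lines = LineCounter().run("ATOM 1\nATOM 2\nATOM 3\n")
```

A new program then only needs a subclass that supplies its command line and a parse method, while the installation check and error handling stay in one place.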


    2 CODE EXAMPLE
 
The following code calculates the RMSD between the surface-exposed backbone atoms of two closely related, but not necessarily identical, protein structures. This very simple example demonstrates some typical structure manipulations as well as the interaction with Numeric methods and an external program.

import Biskit as B, Numeric as N

m1 = B.PDBModel("your/structure.pdb")
m2 = B.PDBModel("your/related/structure.pdb")

## align models to the same residue and atom content
i1, i2 = m1.compareAtoms(m2) # -> 2 lists of indices
m1 = m1.take(i1) # take atoms common with m2
m2 = m2.take(i2) # take atoms common with m1

## add surface & curvature data from SurfaceRacer program
d = B.PDBDope(m1)
d.addSurfaceRacer()

## mask for atoms with a relative exposure of >50%
surf_mask = N.greater(m1.profile('relMS'), 50)

## get backbone-only, surface-exposed structures
bb1 = m1.compress(m1.maskBB() * surf_mask)
bb2 = m2.compress(m2.maskBB() * surf_mask)

## calculate the rmsd, superimpose first
rms = bb1.rms(bb2, fit=1)
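For readers who want to see what an RMSD after superposition involves numerically, here is a self-contained sketch using the Kabsch algorithm with NumPy. It is not taken from Biskit, whose fitting routine may differ in detail; the function names and the random test coordinates are assumptions made for the illustration.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two (atoms x 3) arrays."""
    diff = a - b
    return float(np.sqrt((diff * diff).sum() / len(a)))

def superimpose(mobile, target):
    """Rigid-body fit of mobile onto target (Kabsch algorithm)."""
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    p = mobile - mob_center
    q = target - tgt_center
    # Singular value decomposition of the 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(p.T @ q)
    # Correct for a possible reflection to obtain a proper rotation
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + tgt_center

# Hypothetical test data: a random "structure" and a rotated, shifted copy
rng = np.random.default_rng(0)
coords_a = rng.normal(size=(20, 3))
angle = 0.7
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
coords_b = coords_a @ rot_z.T + np.array([5.0, -2.0, 1.0])

rms_raw = rmsd(coords_a, coords_b)                         # no fit
rms_fit = rmsd(superimpose(coords_a, coords_b), coords_b)  # after fit
```

Since coords_b is an exact rigid-body copy of coords_a, the fitted RMSD drops to numerical zero, whereas the unfitted value mostly reflects the translation.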

The Biskit package and web site are a constant work in progress. We invite everyone to contribute corrections, improvements or their own modules and workflows.


    ACKNOWLEDGEMENT
 
J.L. and R.G. were supported by fellowships from the Knut and Alice Wallenberg Foundation and the Boehringer Ingelheim Fonds, respectively. We thank Wolfgang Rieping, Michael Habeck, Olivier Perrin and David Giganti for discussions and code contributions.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Anna Tramontano

Received on October 20, 2006; revised on December 21, 2006; accepted on December 21, 2006

    REFERENCES
 

    Grünberg R, Leckner J, Nilges M. (2004) Complementarity of structure ensembles in protein-protein binding. Structure, 12, 2125–2136.

    Grünberg R, Nilges M, Leckner J. (2006) Flexibility and conformational entropy in protein-protein binding. Structure, 14, 683–693.

    Hamelryck T, Manderick B. (2003) PDB file parser and structure class implemented in Python. Bioinformatics, 19, 2308–2310.

    Hinsen K. (2000) The molecular modeling toolkit: a new approach to molecular simulations. J. Comput. Chem., 21, 79–85.

    Ritchie DW, Kemp GJ. (2000) Protein docking using spherical polar Fourier correlations. Proteins, 39, 178–194.

    Sanner MF. (1999) Python: a programming language for software integration and development. J. Mol. Graph Model, 17, 57–61.

    Wang W, Donini O, Reyes CM, Kollman PA. (2001) Biomolecular simulations: recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions. Annu. Rev. Biophys. Biomol. Struct., 30, 211–243.

