Bioinformatics Advance Access originally published online on September 14, 2007
Bioinformatics 2007 23(24):3391-3393; doi:10.1093/bioinformatics/btm459
EGEETomo: a user-friendly, fault-tolerant and grid-enabled application for 3D reconstruction in electron tomography
Department of Computer Architecture and Electronics, University of Almería, 04120 Almería, Spain
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Electron tomography is the leading technique to elucidate the structure of complex biological specimens. Due to the resolution needs, huge reconstructions are required. Grid computing has the potential to meet the significant computational demands involved. However, a number of key issues, such as stability or difficult user-grid interaction, currently preclude full exploitation of its potential. EGEETomo is a user-friendly application that facilitates interaction with the grid for the non-specialized user and automates job submission and supervision. In addition, EGEETomo includes an automated fault recovery mechanism, which is key to making all the work transparent to the user. EGEETomo significantly accelerates tomographic reconstruction by exploiting the computational resources in the EGEE grid with minimal user intervention.
Availability: http://www.ace.ual.es/~jrbcast/EGEETomo.tar.gz
Contact: jrbcast{at}ace.ual.es or jose{at}ace.ual.es
| 1 INTRODUCTION |
|---|
Electron tomography (ET) has a unique potential to elucidate the structure of complex biological specimens at molecular resolution (Lucic et al., 2005). In ET, projection images from an individual specimen are acquired at different orientations with an electron microscope. The structure can then be derived by 3D reconstruction algorithms. Due to the resolution needs, large projection images (typically 1K x 1K, 2K x 2K or even 4K x 4K pixels) are required. ET on this scale yields large reconstructed volumes and requires extensive computational resources and considerable processing time. Solving such huge reconstruction problems can take several days of computation on a state-of-the-art workstation. Other tasks in 3D electron microscopy carry a similar burden because of the large number of reconstructions they require, such as angular assignment (Ogura and Sato, 2006) or parameter optimization (Bilbao-Castro et al., 2007) in single particle reconstruction.
Grid computing emerges as a powerful infrastructure where computation and storage can be distributed across a myriad of geographically dispersed machines (Foster and Kesselman, 2003). The application of grid computing to ET allows the memory requirements to be fulfilled and the computations to be performed in parallel, resulting in a much shorter overall reconstruction time (Fernández et al., 2007). On the other hand, using the grid has a cost. Due to its distributed nature and huge dimensions, it is an inherently unstable infrastructure. Thus, human intervention in the reconstruction process is necessary in order to monitor job execution and make decisions when something fails. This extra effort can be difficult to justify in terms of productive work. Also, some prior training is required for users to use the grid efficiently.
Previous studies on tomographic reconstruction over a computational grid were mainly aimed at evaluating performance (Fernández et al., 2007), and obtained high speedup factors. They also showed that interaction with the grid is extremely difficult from the user's point of view, and confirmed that continuous manual supervision of the jobs is required because of grid failures.
EGEETomo has been developed to address the need for a user-friendly environment that facilitates interaction with the grid, hiding all the grid complexity from the user. EGEETomo has been implemented with special emphasis on stability, robustness and usability. EGEETomo works on the EGEE (Enabling Grids for E-sciencE) grid infrastructure (Gagliardi et al., 2005), but could be easily adapted to run on other grids. The application is provided with an intuitive, easy-to-use graphical user interface (GUI) and with modules for automatic job and data management as well as fault tolerance and recovery. With all these features implemented, the user only needs to enter the grid user password and everything else runs transparently, as if it were a locally run program.
For the development of EGEETomo, several technologies were studied. Grid portal tools were considered as an attractive alternative since they ease development of grid applications. Nevertheless, they do not provide some specific capabilities that were wanted, such as detailed execution statistics. Therefore, EGEETomo was finally implemented at a programming level using the C/C++ languages, Qt (Blanchette and Summerfield, 2006) as the GUI development toolkit due to its cross-platform compatibility, and lcg/edg commands. It has been tested to work on Unix/Linux systems where Qt is present and where a User Interface (UI, the machine that provides access to the grid) is properly configured. EGEETomo constitutes a base software platform to develop other grid-enabled applications that can take advantage of its reusable components. For example, grid job management and automatic data replication are implemented as independent and reusable modules. Some user interface components, like grid job monitoring, are also highly reusable.
| 2 METHODS |
|---|
2.1 Job submission and execution
From the user's point of view, using EGEETomo is as easy as providing the user's grid certificate (a kind of grid password) and the files representing the electron microscope projections (in the form of sinograms), and selecting some parameters such as the reconstruction algorithm. For example, the user can select between the WBP, SIRT or ART reconstruction algorithms. For the iterative algorithms (SIRT and ART), the number of iterations should also be provided.
The single-tilt axis data acquisition geometry used in ET allows a data decomposition consisting of dividing the whole volume to be reconstructed into independent sets of slices (slabs) orthogonal to the tilt axis. The slabs can then be reconstructed in parallel. The user can select the size of the slabs, defining how to divide the whole reconstruction job into multiple smaller reconstruction subtasks. This is done through the slab size parameter.
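The slab decomposition can be illustrated with a minimal sketch. It assumes the volume is described only by its number of slices along the tilt axis; the function name `split_into_slabs` and the use of half-open index ranges are illustrative choices, not part of EGEETomo.

```python
from typing import List, Tuple


def split_into_slabs(num_slices: int, slab_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) slice ranges, one per slab.

    Hypothetical helper: every slab holds `slab_size` slices except
    possibly the last one, which holds whatever remains.
    """
    if num_slices <= 0 or slab_size <= 0:
        raise ValueError("num_slices and slab_size must be positive")
    return [
        (start, min(start + slab_size, num_slices))
        for start in range(0, num_slices, slab_size)
    ]
```

Each returned range would correspond to one independent reconstruction subtask. A smaller slab size gives more, shorter jobs; a larger one gives fewer, longer jobs.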
For each slab, the application generates a grid job description file and an executable script. The job description file defines the job requirements and is needed by the grid Resource Brokers (RB, a grid machine configured to act as a broker of grid resources, finding out which of the available resources best fits the job) to assign resources to jobs. The executable script is what is actually executed on a Worker Node (WN, a grid machine configured to run jobs). Among other things, the script downloads the slab data and the reconstruction program to the WN, runs the reconstruction program with the correct parameters and, if everything goes well, replicates the results to several different Storage Elements (SE, a grid machine configured to act as grid user data storage).
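As a rough illustration of the per-slab job description file, the sketch below builds a small description in the Job Description Language (JDL) style used by EGEE middleware. The attribute names are common JDL attributes and should be checked against the middleware documentation; the file names and the function `job_description` are hypothetical and are not taken from EGEETomo.

```python
def job_description(script_name: str, slab_index: int) -> str:
    """Build a minimal JDL-style description for one slab job.

    Assumption: the job only ships its executable script and retrieves
    its standard output and error; real descriptions may also carry
    resource requirements for the Resource Broker.
    """
    stdout_name = f"slab_{slab_index:04d}.out"
    stderr_name = f"slab_{slab_index:04d}.err"
    lines = [
        f'Executable = "{script_name}";',
        f'StdOutput = "{stdout_name}";',
        f'StdError = "{stderr_name}";',
        f'InputSandbox = {{"{script_name}"}};',
        f'OutputSandbox = {{"{stdout_name}", "{stderr_name}"}};',
    ]
    return "\n".join(lines) + "\n"
```

Generating one such text per slab keeps the jobs independent, so a failed slab can be resubmitted without touching the others.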
As the execution evolves, the user gets run-time feedback on the progress of the reconstruction. Information is presented as a table that shows, for each job (a reconstruction subtask), its current status (submitted, running, done, etc.). When the execution is finished, the application recovers the reconstruction results from the grid and transfers them to the user machine, completing the whole reconstruction process.
2.2 Fault tolerance
Fault tolerance is crucial in a dynamic environment like the grid. A grid like EGEE is a megainfrastructure where tens of thousands of computers are distributed worldwide; many different organizations configure and maintain their own machines belonging to the grid; and thousands of network links and terabytes of distributed storage co-exist in a heterogeneous environment subject to occasional failures.
EGEETomo provides fault tolerance in different ways. Before a reconstruction task can be performed on a WN, the data representing ET projections must be uploaded from the UI to the grid SEs. Each data file, including the program that performs a reconstruction, is replicated across different SEs on the grid. Although EGEE, as well as other grid infrastructures, provides file replication services to allow more than one copy of the same file (Kunszt et al., 2005), the replication process must still be user-driven. EGEETomo automatically replicates the file to as many SEs as requested, taking care of possible failures and solving them (see Fig. 1). The number of replicas can be chosen by the user but, to reduce the odds of simultaneous failure of all SEs holding a file, a minimum of three per file is recommended. This allows data to remain accessible even when one or more SEs are down. Reconstruction results are also replicated back to different SEs so that they can still be recovered when SEs fail.
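The replication behaviour described above can be sketched as follows. This is not EGEETomo's code: the function `replicate`, the callback `copy_to` (standing in for whatever grid command performs the copy) and the convention that a failed copy either returns `False` or raises `OSError` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def replicate(
    candidates: Sequence[str],
    wanted: int,
    copy_to: Callable[[str], bool],
) -> List[str]:
    """Copy a file to `wanted` storage elements, skipping those that fail.

    `candidates` lists SE names in order of preference; `copy_to` is a
    hypothetical callback that performs one copy and reports success.
    """
    holders: List[str] = []
    for se in candidates:
        if len(holders) == wanted:
            break
        try:
            succeeded = copy_to(se)
        except OSError:
            succeeded = False  # treat an unreachable SE as a failed copy
        if succeeded:
            holders.append(se)
    if len(holders) < wanted:
        raise RuntimeError(
            f"only {len(holders)} of {wanted} replicas could be created"
        )
    return holders
```

With the recommended minimum of three replicas, a call such as `replicate(se_names, 3, copy_to)` would move on to the next candidate SE whenever one is down.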
Even with file replication, a file may become corrupted without this being detected during file transfer. To avoid the use of a corrupted file, MD5 checksumming is used (Rivest, 1992). The MD5 checksum of each file is calculated on the UI machine. Then, when the data is downloaded from an SE to a WN to be used, its MD5 checksum is calculated again and compared to the original one. If they are not equal, the file is downloaded from another SE.
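The checksum comparison with fallback to another SE can be sketched as below. The MD5 computation uses Python's standard `hashlib`; the function names and the `download` callback, which stands in for the actual grid transfer, are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(
    storage_elements: Iterable[str],
    expected_md5: str,
    download: Callable[[str, Path], None],
    destination: Path,
) -> str:
    """Download from each SE in turn until the checksum matches.

    `download(se, destination)` is an assumed callback that writes the
    file to `destination` and raises OSError when the SE is unreachable.
    Returns the name of the SE that delivered an intact copy.
    """
    for se in storage_elements:
        try:
            download(se, destination)
        except OSError:
            continue  # SE is down, try the next replica
        if md5_of_file(destination) == expected_md5:
            return se
    raise RuntimeError("no storage element returned an intact copy")
```

Here `expected_md5` plays the role of the checksum computed on the UI machine before upload.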
Fault tolerance for process recovery has also been included in EGEETomo. As described, the whole tomographic reconstruction is split into multiple smaller subtasks that are launched as independent grid jobs. Automatic job management on the grid is an important task. Job lifetimes can be long and many failures can arise during them. Anything from an erroneous job submission to a memory shortage on the WN can make a job fail. Although grids usually provide simple resubmission techniques, these are very limited. EGEE, for example, allows a job to be automatically resubmitted only a limited number of times. EGEETomo provides monitoring throughout the job lifetime, performing the tasks necessary to ensure that the job finishes correctly and the results are correctly published to SEs. Submission, queuing and running time-outs are provided to prevent a job from remaining stalled forever. Job resubmission is performed when needed, and status monitoring is done with a user-defined periodicity.
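The time-out logic can be pictured as a small decision function that a monitor evaluates once per polling period for every job. The sketch below is an assumption-laden illustration, not EGEETomo's implementation: the `Job` record, the status names "queued" and "failed", and the returned action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Job:
    status: str   # "submitted", "queued", "running", "done" or "failed"
    since: float  # time (in seconds) at which the job entered `status`


def next_action(job: Job, now: float, timeouts: Dict[str, float]) -> str:
    """Decide what a monitor should do with a job at time `now`.

    Returns "finish" for a completed job, "resubmit" for a failed job or
    one that has exceeded the time-out of its current status, and "wait"
    otherwise. `timeouts` maps a status name to its limit in seconds.
    """
    if job.status == "done":
        return "finish"
    limit = timeouts.get(job.status)
    stalled = limit is not None and now - job.since > limit
    if job.status == "failed" or stalled:
        return "resubmit"
    return "wait"
```

A monitor loop would call `next_action` for every job at the user-defined polling interval and act on the result, for instance with `timeouts = {"submitted": 600, "queued": 3600, "running": 7200}` (illustrative values).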
2.3 Statistical data
EGEETomo provides comprehensive execution statistics at the end of the whole reconstruction process. These statistics describe the behaviour of the grid infrastructure, including grid-introduced overheads, transfer times, job waiting and running times, and throughput. They may be used to implement advanced and adaptive scheduling techniques that should result in a more efficient process, with smaller overheads and reduced overall execution times.
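As an illustration of how such figures could be derived, the sketch below summarizes waiting time, running time and throughput from per-job timestamps. The record `JobTiming`, the function `summarize` and the exact definitions used (for example, throughput as jobs per second over the total elapsed time) are assumptions; the statistics reported by EGEETomo may be defined differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobTiming:
    submitted: float  # seconds since the start of the reconstruction
    started: float    # when the job began running on a worker node
    finished: float   # when the job completed


def summarize(timings: List[JobTiming]) -> Dict[str, float]:
    """Compute mean waiting/running times, total elapsed time and throughput."""
    if not timings:
        raise ValueError("at least one job timing is required")
    waits = [t.started - t.submitted for t in timings]
    runs = [t.finished - t.started for t in timings]
    elapsed = max(t.finished for t in timings) - min(t.submitted for t in timings)
    if elapsed <= 0:
        raise ValueError("timings must span a positive interval")
    return {
        "mean_wait": sum(waits) / len(waits),
        "mean_run": sum(runs) / len(runs),
        "elapsed": elapsed,
        "throughput": len(timings) / elapsed,  # jobs per second
    }
```

The ratio of mean waiting time to mean running time gives a rough idea of how much of each job's lifetime is grid-introduced overhead.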
| 3 CONCLUSION |
|---|
Grid technologies have emerged as promising alternatives to expensive supercomputers for distributed processing. However, a number of key issues, such as stability or the lack of intuitive interfaces, currently preclude full exploitation of their potential by scientists without grid training. EGEETomo provides a user-friendly interface that facilitates interaction with the grid, automates job submission and supervision, and is supplied with fault-tolerance mechanisms to deal with different sources of failure in the grid. EGEETomo allows significant acceleration of huge reconstruction processes in electron tomography by exploiting the computational resources in the EGEE grid, with minimal user intervention.
| ACKNOWLEDGEMENTS |
|---|
The authors thank the Biocomputing Unit at the Centro Nacional de Biotecnologia (CSIC) for technical support and access to the EGEE grid infrastructure. Grants: TIN2005-00447 (Spanish MEC), P06-TIC-01426 (Junta de Andalucia), IST-2003-508833 and LSHG-CT-2004-502828 (EU).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Anna Tramontano
Received on June 28, 2007; revised on August 16, 2007; accepted on September 3, 2007
| REFERENCES |
|---|
Bilbao-Castro JR, et al. Parameter optimization in 3D reconstruction on a large scale grid. Parallel Comput. (2007) 33:250–263.
Blanchette M, Summerfield M. C++ GUI Programming with Qt 4 (2006) Prentice Hall.
Fernández JJ, et al. Electron tomography of complex biological specimens on the Grid. Future Gener. Comput. Syst. (2007) 23:435–446.
Foster I, Kesselman C. The Grid 2: Blueprint for a New Computing Infrastructure. The Elsevier Series in Grid Computing (2003) 2nd edn. Morgan Kaufmann.
Gagliardi F, et al. Building an infrastructure for scientific Grid computing: status and goals of the EGEE project. Philos. Trans. R. Soc. A-Math. Phys. Eng. Sci. (2005) 363:1729–1742.
Kunszt P, et al. File-based replica management. Future Gener. Comput. Syst. (2005) 21:115–123.
Lucic V, et al. Structural studies by electron tomography: from cells to molecules. Annu. Rev. Biochem. (2005) 74:833–865.
Ogura T, Sato C. A fully automatic 3D reconstruction method using simulated annealing enables accurate posterioric angular assignment of protein projections. J. Struct. Biol. (2006) 156:371–386.
Rivest RL. The MD5 message digest algorithm. RFC 1321 (1992) http://dret.net/rfc-index/reference/RFC1321.
