Bioinformatics Advance Access originally published online on January 28, 2008
Bioinformatics 2008 24(5):727-728; doi:10.1093/bioinformatics/btn006

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Jane: suggesting journals, finding experts

Martijn J. Schuemie* and Jan A. Kors

Department of Medical Informatics, Erasmus University Medical Center Rotterdam, 3000 CA, Rotterdam, The Netherlands

*To whom correspondence should be addressed.


    ABSTRACT

Summary: With the number of articles published every year growing exponentially, scientists can benefit from help in deciding which journal is most appropriate for publishing their results, and which other scientists could be asked to review their work.

Jane (Journal/Author Name Estimator) is a freely available web-based application that, on the basis of a sample text (e.g. the title and abstract of a manuscript), can suggest journals and experts who have published similar articles.

Availability: http://biosemantics.org/jane

Contact: m.schuemie{at}erasmusmc.nl


    1 INTRODUCTION
PubMed (Wheeler et al., 2007) is growing exponentially: 520 148 articles were published in 1996 versus 793 919 in 2006. Interestingly, the number of different journals in which these articles were published did not grow at a similar rate: 5006 in 1996 versus 5100 in 2006. There is, however, a steady turnover of journals: according to the PubMed Journals database, 1707 journals were started between 1996 and 2006. The number of authors publishing one or more papers each year does increase rapidly: 543 974 in 1996 versus 867 919 in 2006.
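The contrast between articles, authors and journals can be made concrete with a little arithmetic on the counts above. This is a minimal Python sketch; the helper name `growth` is hypothetical, and the per-year figure assumes constant compound growth between 1996 and 2006, which two data points alone cannot confirm.

```python
# PubMed counts quoted in the text: (1996, 2006)
PUBMED_COUNTS = {
    "articles": (520_148, 793_919),
    "journals": (5_006, 5_100),
    "authors": (543_974, 867_919),
}


def growth(before: int, after: int, years: int = 10) -> tuple[float, float]:
    """Return (total % change, implied constant annual % change)."""
    total = 100.0 * (after - before) / before
    # Annual rate r such that before * (1 + r) ** years == after.
    annual = 100.0 * ((after / before) ** (1.0 / years) - 1.0)
    return total, annual


if __name__ == "__main__":
    for name, (y1996, y2006) in PUBMED_COUNTS.items():
        total, annual = growth(y1996, y2006)
        print(f"{name:>8}: {total:5.1f}% over 10 years, ~{annual:.1f}% per year")
```

Articles and authors each grow by more than 50% over the decade (roughly 4 to 5% per year), whereas the number of journals grows by less than 2%.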

For all these authors, finding the appropriate journal in which to publish their work becomes increasingly difficult: many journals cover a wide diversity of topics, and many articles are multidisciplinary, which leads, for instance, to computer scientists publishing in biomedical journals. At the same time, finding reviewers among the growing number of peers is also becoming more of a problem. We developed Jane (Journal/Author Name Estimator) to help with both tasks.


    2 USING JANE
2.1 Finding journals and authors
The user starts by entering a piece of text as a query (Fig. 1). Typically, this will be the title and abstract of the article for which the user wants to find a suitable journal or reviewer. The application returns an ordered list of results, with a confidence score for each item. It can also show the articles on which the score of a specific journal or author was based, as well as other similar articles. This helps users judge whether the journal is really a suitable medium for publishing their findings, or whether the selected author is really knowledgeable about the topic of the input article.


Figure 1
Fig. 1. Screenshots of Jane. From right to left: (1) Starting screen: you can enter the text of your title and abstract, select additional options, and choose whether you want to find journals or authors; (2) Results screen: the application returns an ordered list of journals or authors. For each item, a confidence score is given, and an option to show the articles on which the score is based; (3) Results screen showing the articles for a journal: The user can choose to view these and other similar articles in PubMed.

 
2.2 Extra features
Users can refine their search by selecting specific languages and types of publications. The search algorithm will then compare the input text only to those articles that meet these specifications. For instance, by selecting ‘Japanese’ and the publication type ‘review’, the system will return those journals containing the most similar Japanese review articles.
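Restricting the comparison set can be pictured as a filter applied before any similarity scoring. The sketch below is illustrative only: the `Article` record, its fields and the language codes are hypothetical, and it assumes an article qualifies if it has at least one of the selected publication types; the text does not say how multiple selections are combined.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record layout; not Jane's actual data model.
    pmid: str
    journal: str
    language: str
    pub_types: set[str] = field(default_factory=set)


def restrict_candidates(articles, languages=None, pub_types=None):
    """Keep only articles matching the user's language and publication-type choices.

    An empty or None selection means 'no restriction' for that facet.
    An article passes the type filter if it has at least one selected type (assumption).
    """
    selected = []
    for article in articles:
        if languages and article.language not in languages:
            continue
        if pub_types and not (article.pub_types & pub_types):
            continue
        selected.append(article)
    return selected
```

Only the articles that survive this filter would then be compared with the input text, so the resulting journal list reflects, for example, only Japanese review articles.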

Some authors may be hesitant to send an abstract of their latest research to an unknown server. Therefore, we have included an option to scramble the input before submission. Scrambling simply puts the words of the text in alphabetical order. This makes it next to impossible to reconstruct the original text, but has no effect on the search, because the similarity computation does not take word order into account.
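Scrambling leaves a bag-of-words search unaffected because such a similarity measure only sees which words occur and how often, not their order. A minimal Python sketch, assuming whitespace-delimited words and a simple regular-expression tokenizer in place of the Lucene tokenizer; the function names are hypothetical.

```python
import re
from collections import Counter


def scramble(text: str) -> str:
    """Return the words of `text` in alphabetical (case-insensitive) order."""
    return " ".join(sorted(text.split(), key=str.lower))


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; word order is discarded."""
    return Counter(re.findall(r"\w+", text.lower()))


original = "Jane suggests journals and experts for a manuscript."
scrambled = scramble(original)
print(scrambled)  # a and experts for Jane journals manuscript. suggests
# The term counts, and hence any TF*IDF vector built from them, are unchanged.
assert bag_of_words(scrambled) == bag_of_words(original)
```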


    3 IMPLEMENTATION
The open source search engine Lucene (Gospodnetic and Hatcher, 2005) is used to find articles that are similar to the input query. Texts are tokenized using the standard Lucene tokenizer, and are subsequently compared using the Lucene MoreLikeThis algorithm, a very efficient implementation of the traditional TF*IDF vector space model.
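For readers unfamiliar with the vector space model, the sketch below ranks documents by the cosine similarity of their TF*IDF vectors. It is the textbook formulation in plain Python, not Lucene's MoreLikeThis, which roughly selects the most informative terms of the query text and scores matching documents with Lucene's own similarity function; the tokenizer and the IDF smoothing used here are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lower-cased alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def inverse_document_frequencies(docs: list[list[str]]) -> dict[str, float]:
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # +1 keeps terms that occur in every document from getting zero weight.
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Term frequency times IDF; terms unseen in the corpus carry no weight.
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_similar(query: str, corpus: list[str]) -> list[tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    docs = [tokenize(d) for d in corpus]
    idf = inverse_document_frequencies(docs)
    q = tfidf(tokenize(query), idf)
    scores = [(i, cosine(q, tfidf(tokens, idf))) for i, tokens in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Rare terms get high IDF weights, so two abstracts sharing specialised vocabulary score higher than two sharing only common words.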

After retrieving the ordered list of most similar records, a weighted k-nearest neighbor approach is used to determine the journal or author list. For each item (i.e. a journal or author), we sum the Lucene similarity scores of the articles among the k top-ranking records that belong to this item. To produce confidence scores, these sums are then normalized so that they add up to 100%. Results are ordered by confidence score. A leave-one-out evaluation showed that the best performance was achieved with k = 50.
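The aggregation step can be sketched as follows. This is an illustrative Python version under stated assumptions: `ranked` is the list of (article, similarity) pairs returned by the search, already sorted by descending similarity, and `items_of` is a hypothetical lookup from an article to its journal (one item) or its authors (several items). Each article's score is credited to every item it belongs to.

```python
from collections import defaultdict


def confidence_scores(ranked, items_of, k=50):
    """Weighted k-nearest-neighbour vote, normalized to percentages.

    ranked   -- (article_id, similarity) pairs, most similar first
    items_of -- mapping article_id -> list of journals or authors
    """
    totals = defaultdict(float)
    for article_id, score in ranked[:k]:  # only the k top-ranking records vote
        for item in items_of[article_id]:
            totals[item] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    scores = [(item, 100.0 * s / grand_total) for item, s in totals.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


if __name__ == "__main__":
    ranked = [("a1", 3.0), ("a2", 2.0), ("a3", 1.0), ("a4", 0.5)]
    journal_of = {"a1": ["Bioinformatics"], "a2": ["Nucleic Acids Res"],
                  "a3": ["Bioinformatics"], "a4": ["BMC Bioinformatics"]}
    # Bioinformatics ~66.7%, Nucleic Acids Res ~33.3% (a4 falls outside the top 3)
    print(confidence_scores(ranked, journal_of, k=3))
```

Weighting by similarity rather than simply counting neighbours lets a few very close articles outweigh many loosely related ones.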

We indexed all 4 171 368 articles from 4513 journals in Medline that

  • contained an abstract,
  • were published in the last 10 years,
  • did not belong to one of these categories: comment, editorial, news, historical article, congresses, biography, newspaper article, practice guideline, interview, bibliography, legal cases, lectures, consensus development conference, addresses, clinical conference, patient education handout, directory, technical report, festschrift, retraction of publication, retracted publication, duplicate publication, scientific integrity review, published erratum, periodical index, dictionary, legislation or government publication and
  • belonged to a journal with at least 25 publications in the last 10 years, and at least one publication in the last 12 months.
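The selection criteria above amount to a per-record filter plus two per-journal checks. A minimal Python sketch, assuming a hypothetical `Record` type with lower-case publication-type names and a reference date `today`; the text does not say whether the 25-publication threshold is counted before or after the other criteria, so this sketch counts all of a journal's records from the last 10 years.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Publication types excluded from the index, as listed in the text.
# "legislation or government publication" is read here as two types (assumption).
EXCLUDED_TYPES = frozenset({
    "comment", "editorial", "news", "historical article", "congresses",
    "biography", "newspaper article", "practice guideline", "interview",
    "bibliography", "legal cases", "lectures",
    "consensus development conference", "addresses", "clinical conference",
    "patient education handout", "directory", "technical report",
    "festschrift", "retraction of publication", "retracted publication",
    "duplicate publication", "scientific integrity review",
    "published erratum", "periodical index", "dictionary", "legislation",
    "government publication",
})


@dataclass(frozen=True)
class Record:
    # Hypothetical, simplified MEDLINE record.
    pmid: str
    journal: str
    pub_date: date
    has_abstract: bool
    pub_types: frozenset = frozenset()


def years_before(d: date, n: int) -> date:
    try:
        return d.replace(year=d.year - n)
    except ValueError:  # 29 February mapped onto a non-leap year
        return d.replace(year=d.year - n, day=28)


def select_for_index(records, today: date, min_per_journal: int = 25):
    ten_years_ago = years_before(today, 10)
    one_year_ago = years_before(today, 1)
    recent = [r for r in records if r.pub_date >= ten_years_ago]
    per_journal = Counter(r.journal for r in recent)
    active = {r.journal for r in recent if r.pub_date >= one_year_ago}
    return [
        r for r in recent
        if r.has_abstract
        and not (r.pub_types & EXCLUDED_TYPES)
        and per_journal[r.journal] >= min_per_journal
        and r.journal in active
    ]
```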


    4 COMPARISON WITH OTHER TOOLS
PubMed itself offers the possibility to search for ‘similar articles’, but only existing Medline records can be used as queries. Many other systems offer some means of finding authors and/or journals, for instance GoPubMed (Doms and Schroeder, 2005) and HubMed (Eaton, 2006), but they all require a Boolean keyword-based query.

One system, called eTBLAST (Errami et al., 2007), does accept full abstracts as queries to search for journals and authors. It retrieves the 400 most similar articles using a vector-space approach, calculates a text-alignment score for each of these articles, and aggregates the scores per journal or author. We compared the performance of Jane with that of eTBLAST using a random set of 1000 citations that had been entered into PubMed in the 3 days before the test, and that were consequently not yet in the training sets of Jane and eTBLAST. For each citation, we tested how well the systems predicted the authors of the paper and the journal in which it was published.

Figure 2 shows that Jane outperforms eTBLAST (P < 0.001 and P = 0.010 for journals and authors, respectively, using a sign test to compare ranks). Furthermore, even though eTBLAST runs on a 20-CPU Linux cluster and Jane was tested on a dual-CPU system, eTBLAST searches were much slower than Jane searches: the average search times were 114.0 and 0.6 seconds, respectively. Because eTBLAST currently has more users than Jane, we simulated an extra average load of 100 000 queries per day on our server while measuring our search times.
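A sign test on paired ranks only asks, for each abstract, which system placed the correct journal (or author) higher, and then checks how plausible that split would be if both systems were equally good. The Python sketch below is an illustration, not the authors' analysis code: it assumes lower ranks are better, discards ties, represents an item missing from a result list by a rank of `math.inf`, and computes an exact two-sided binomial p-value (the text does not state whether the test was one- or two-sided).

```python
import math


def sign_test(ranks_a, ranks_b):
    """Exact two-sided sign test on paired ranks (lower rank = better).

    Returns (wins for A, wins for B, p-value); ties are discarded.
    """
    wins_a = sum(1 for a, b in zip(ranks_a, ranks_b) if a < b)
    wins_b = sum(1 for a, b in zip(ranks_a, ranks_b) if a > b)
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)
```

Because only the direction of each difference is used, the test makes no assumption about how rank differences are distributed.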


Figure 2
Fig. 2. Cumulative histogram of the rank of the correct journal and the highest ranking correct author in the result lists of eTBLAST and Jane for a test set of 1000 abstracts (e.g. for Jane, the correct journal appeared at the top of the list for 23% of the abstracts, it appeared in the top 2 for 36% of the abstracts, etc.).
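The curves in Fig. 2 can be computed from the rank of the correct item for each test abstract. A minimal Python sketch; the function name is hypothetical, and it assumes a rank of None when the correct item is absent from the result list, which still counts in the denominator.

```python
from typing import Optional, Sequence


def cumulative_rank_curve(ranks: Sequence[Optional[int]], max_rank: int = 10) -> list[float]:
    """Fraction of abstracts whose correct item appears at rank <= r, for r = 1..max_rank."""
    if not ranks:
        raise ValueError("need at least one rank")
    n = len(ranks)
    return [
        sum(1 for r in ranks if r is not None and r <= cutoff) / n
        for cutoff in range(1, max_rank + 1)
    ]
```

For example, ranks [1, 1, 2, None, 3] give 0.4, 0.6 and 0.8 for cut-offs 1 to 3; read the same way, Jane's 23% and 36% in the caption are the first two points of its journal curve.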

 

    5 DISCUSSION
Compared with other such tools, Jane is a simple, fast and accurate tool for finding journals and authors.

We tested how well Jane predicts the journal in which a paper was published, assuming that this journal was the most appropriate one. Obviously, this may not always be the case, since many journals overlap considerably and journal choice may be influenced by many factors. Based on a qualitative analysis of a small sample of the abstracts for which the correct journal did not appear in the top 10, we believe that these abstracts would also have been appropriate for many of the top-ranking journals returned by Jane. The same holds true for authors: although we can assume that authors are knowledgeable about the topic of a paper they wrote, other, more experienced authors might qualify as better experts.

Jane is freely available. The underlying database of indexed abstracts will be updated regularly.


    ACKNOWLEDGEMENTS
This study was supported by the Biorange project sp 4.1.1. of the Netherlands Bioinformatics Centre.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Jonathan Wren

Received on October 31, 2007; revised on December 15, 2007; accepted on January 2, 2008

    REFERENCES

    Doms A, Schroeder M. GoPubMed: exploring PubMed with the gene ontology. Nucleic Acids Res (2005) 33:W783–W786.

    Eaton AD. HubMed: a web-based biomedical literature search interface. Nucleic Acids Res (2006) 34:W745–W747.

    Errami M, et al. eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications. Nucleic Acids Res (2007) 35:W12–W15.

    Gospodnetic O, Hatcher E. Lucene in Action. Greenwich: Manning Publications (2005).

    Wheeler DL, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2007) 35:D5–D12.




This article has been cited by other articles:


Z. Lu, N. Xie, and W. J. Wilbur
Identifying related journals through log analysis
Bioinformatics, November 15, 2009; 25(22): 3038 - 3039.

