Career Advice on PubChase

We are launching a “Career Advice” forum on PubChase. There is an interface to anonymously submit questions and we assembled a panel of good mentors so that a single question can have 2-3 experts replying to it. This is not a Q&A forum for research questions. Rather, it is a space to get mentoring help for science life-related issues (problem with advisor, how to transition to industry, what to do with competitive labmate, etc.).

The motivation for it is a serious problem in academia. Academic faculty appointments at universities do not select for teaching or managing ability. We look for talented scientists, not mentors or teachers. And as with teaching, there is a natural distribution – some mentors are gems, most are mediocre, and some are nightmares.

If you are a group leader at Merck or Novartis, you will be trained on how to manage people and be a boss. Alas, no such mandatory training exists for professors. The tragedy of this is that a researcher at Merck, even if the boss is a disaster, can switch to Sanofi, Genentech, and so on (this researcher already has a PhD and likely has worked in the industry for many years). But a graduate student or postdoc is in a delicate relationship where a switch from an abusive mentor is far from easy. Advisors hold a power over their lab members that can be devastating when misapplied.

And even if you have a terrific mentor, sometimes you need advice from a biotech founder, from a pregnant graduate student, from an editor, from a junior professor in a university across the ocean, and so on. Therefore, the responses on this forum will be semi-crowdsourced, with a mixture of replies from our mentors and other users of PubChase.

We welcome all to contribute not just questions, but experiences and suggestions for dealing with difficult situations. So please, ask, answer, and comment!

P.S. Several scientists have asked why a new forum on PubChase instead of simply using Twitter. Here are the main differences:

1. If you don’t have a Twitter account, you can start one and ask a question. But you will have zero followers and will get zero answers. On PubChase, you will get answers right away.

2. I want to ask a question about a problem with my advisor. But my advisor follows my Twitter feed. Not a good idea. Twitter is not anonymous. In contrast, on PubChase, you will have the option of asking the question non-anonymously, but it will be up to you. Same in reverse – to answer, if it’s not controversial, you will display your name and be seen in the advising role. But if it’s a touchy subject, you will be able to post the reply anonymously.

3. There is no mechanism on Twitter for the good replies to float to the top. We will have StackExchange-like voting to make it easy to get the good replies instantly.

4. Twitter is ephemeral. Probably 80% of the time we all have the same questions and problems in science. On Twitter, there is no way to see answers to your question. On PubChase, it will be tagged, searchable and visible, even if the answer to your question was supplied a year ago.

5. When a student comes to you to talk about anti-gay slurs and jokes from a labmate, is your advice and response going to be 140 characters? True mentoring needs to be unconstrained by character counts.


Posted in Uncategorized | 3 Comments

PubChase and Mendeley Synchronization

Since October, we have been working with Mendeley to implement synchronization with the users’ PubChase libraries. In your “library settings” on PubChase, you can now turn on the Mendeley sync. Any article you add to PubChase will show up in your Mendeley library, and vice versa. Also, inside the PubChase settings on our mobile suite Bench Tools, you can sync from your iOS and Android devices.

It may seem like we are just announcing a “button”, but in reality, this is big news for the following reasons:

  • The synchronization of PubChase and Mendeley means that there is now a completely free system for discovering new biomedical papers, managing them, and then easily citing them while writing a manuscript. You can get our personalized recommendations based on your PubChase/Mendeley library. You can search PubMed in a smarter way, based on your library. And you can do it from any iOS or Android device (iPad, iPhone, HTC, Kindle…) in addition to the web.

  • In a matter of seconds, Mendeley users can open a free PubChase account, turn on the sync, and start getting personalized recommendations the very next day.

  • The forte of ZappyLab (the company that built PubChase) is mobile and web technology. We have no plans to build citation-management plugins. So, from the beginning, we wanted to make it easy for our users to import their libraries from Mendeley, Papers, Endnote, and other software. However, adding newly recommended papers to your PubChase library means that you then have to export your PubChase library to the other software when writing a paper. The Mendeley sync greatly simplifies this process.

  • Though our Endnote and Papers users keep asking for a similar integration, those tools do not have open APIs, so it is not an option for us. Therefore, we will be focusing our development efforts on deeper integration with Mendeley. Folder organization and PDF accessibility are the next steps.

*** Keep in mind that this is true synchronization – anything you delete from PubChase will also disappear from your Mendeley library.

*** Because PubChase is for biomedical literature, only articles for which we can find a PubMed id will make it from Mendeley into your PubChase library.

Posted in Uncategorized | 3 Comments

Blog and Tweet to Market Your Research!

Marketing your final product is essential, be it in a start-up or a research group. As a partner at a top venture capital firm emphasized to us, viral adoption of apps and software is a myth [1]. Entrepreneurs either fail or quickly learn this fact. Similarly, believing that your research paper will magically find the target readers just because you published it is naïve. Yet, when talking to scientists about the PubChase article-level blogging platform, half think that it is a great idea and half are very skeptical and resistant, arguing:

  1. Blogging is a waste of time
  2. This is self-promotion, akin to advertising
  3. The process of publishing a paper is demoralizing and once out, they don’t want to think about it again

The editor-in-chief of a famous academic science journal told me, “Lenny, blogging is the end of science. If my students have the time to blog about their work instead of doing the research itself, they must have run out of things to figure out.” Also, once their paper is out, scientists often no longer want to think about it and are eager to move on to the next exciting discovery. I understand this feeling very well, as the excruciatingly long delay between writing and publishing means that by the time the paper is in print, you are deep into another project. However, these attitudes strike me as akin to neglecting your child after birth. The reason we publish research is not because it looks pretty after typesetting, but because we want to advance knowledge and contribute to progress. And if no one knows about your new publication, you have advanced nothing. If blogging and tweeting help to bring readers to your research, then it is crazy not to do so (and judging by our data, blogging is a great way to increase exposure for the underlying research – depending on the essay, half or more of the essay readers go on to read the article itself [2]).


Essay readers also read the research article – counts of essay and corresponding article views for Jasper Rine’s (top) and Lenny Teytelman’s (bottom) posts.

The hesitation to use social media and blogging to promote one’s research because of the way it is perceived by peers is a bit harder to address. One professor who blogs a lot about her research to a huge following said to me that if she writes an essay for PubChase, she has to be careful to ensure that it does not come across as an advertisement. Another professor said that use of Twitter to announce publications is frowned upon by faculty as “self-promotion”. A few weeks later, I asked the same professor why he spends 20-30% of his time travelling, and he replied, “The main reason I travel so much is to give talks, and the reason for that is to get the word out about my research.” Similarly, one of my mentors from graduate school detests the idea of tweeting one’s research, but he advised me to forward my newly published papers directly to scientists who may find them useful. So for scientists who clearly see the value in advertising their findings, why is it okay to promote the research at a conference but not on Twitter?

The perception is that a talk at a conference targets the right audience, but a tweet goes to all of your followers and is more like an annoying television ad. This is incorrect. The beauty of Twitter is that users freely decide whether or not to follow someone. And if scientists subscribe to my feeds, it is reasonable to assume that many of them are interested in exactly the work that I do. Also, no matter how specialized the conference, you are almost guaranteed that many, if not most, of the people attending your talk are not particularly interested in what you have to say. The great aspect of Twitter is that over time, people not interested in your tweets unfollow you, and you end up with exactly the audience that is curious about your research. In the same way, when you write an essay on PubChase, it is likely to be seen by the right scientists. Most of the readers come to the essays via Twitter, which targets the appropriate audience. Moreover, since each essay is tied to a given publication and is visible whenever the article comes up in search results or recommendations, the essay is not randomly being forced on uninterested users.

Of course, the very act of publishing is marketing. The reason so many invest extraordinary efforts to publish research in top journals is to maximize the visibility of their work. This investment is costly to science in many ways and wastes a staggering amount of energy and time on behalf of scientists. The publication process is typically depressing and demoralizing [3][4][5]. As a consequence, once published, many authors are so scarred by the process that the “story behind the paper” becomes a tale of rejections and rebuttals. This is the third most common reason for declining an invitation to write an essay about an exciting publication. Hopefully, by treating blogging and social media promotion of research as a digital conference, scientists can be more effective in their marketing and can waste less of their time on struggling to publish in the glossy journals.

  1. VC Partner: “The Strava app for cyclists was flawlessly implemented. When you are on a bike, you are not carrying a computer. There was virtually no competition. Yet the app got nowhere until Lance Armstrong started promoting them.”
  2. Viewing counts are for Jasper Rine’s and Lenny Teytelman’s essays
  3. Professor at University of Washington: “Thanks for the invitation to share a story. I’ll see if I can come up with one, but to be honest, publishing has become such a war that once a paper is out, I think I try to forget the details of what went on as soon as I can.”
  4. Professor at Brandeis: “Not sure what dirt I want to dish about my papers. I could tell how it took 6 tries to publish our recent manuscript or how other of our most frequently cited papers were rejected from various journals.”
  5. Professor Arjun Raj, in his PubChase essay: “…But this publication process did leave some serious scars… It’s not the fact that it got rejected, it’s the way it went down. It really made me doubt myself as a scientist for many months, and it definitely somewhat tempered my personal excitement once we finally published it, along with the fact that many important results ended up in the supplement with only the scarcest of mentions after the paper got demoted to a “Brief Communication”. What’s even worse about it is that I think the process really turned Marshall [Arjun’s student] off of academic science. Perhaps he should have a thicker skin about it, and perhaps it’s a lesson about how the world is not always a fair place. Fine, but I think we should strive to make things better in the scientific process rather than just shrug and tell people to ‘suck it up’.”
Posted in Uncategorized | 5 Comments

What hurts science – rejection of good or acceptance of bad?

Yesterday, Science published a story by John Bohannon about acceptance of a fake and deeply-flawed paper at open access journals, despite peer review. Disturbingly, 157 journals accepted the bogus article and only 98 rejected it. Scientists and some journalists swiftly pointed out the grave problems with this attack on open access – this sting operation highlights problems with traditional peer review, but it says very little about open access, as the same experiment was not performed on subscription journals [1].

To me, the stunning part of this is that a journal with the title “SCIENCE” published a fake study, without a control, about the problem of accepting fake studies. But the bigger question is – how much damage is there from publication of poor science? Does anyone really read the journals that accepted this?

I took the 157 journals that accepted Bohannon’s fake paper and asked how many articles are in the libraries of the PubChase users from them. The answer is that out of over 75,000 articles of our users, only 5 are from this set (all five are from the single journal Bioinformation). In contrast, our users have 1,631 articles from 12 of the 98 journals that rejected the paper [2].

The real problem in science is not that bad papers get published; that has always been and will continue to be the case. The real problem is that good and important papers are rejected and delayed from publication by journals such as Science. These delays hurt the progress of science and they demoralize and ruin careers.

Finally, when it comes to publishing bad research, Science is not the journal that should be pointing fingers. The 2011 editorial “Retracted Science and the Retraction Index” showed unambiguously that the higher the journal’s impact factor, the higher its retraction rate. Not surprisingly, Science had the second worst retraction rate of all the journals considered in that editorial.

  1. Great list of the responses here.
  2. Of the 98 journals that rejected the fake paper, PubChase users have articles from 12 of them: PLOS One, mBio, Neurosurgical focus, International journal of biological sciences, Chinese medical journal, American journal of nuclear medicine and molecular imaging, Carcinogenesis, Yonsei medical journal, Current issues in molecular biology, Anti-cancer drugs, Immunome research, Environmental health perspectives.
Posted in PubChase | 20 Comments

Staying Current on Publications: PubChase or RSS?

We have just published an excellent guide by James Fraser on using RSS to find relevant new research papers. James does not believe that staying current on publications is really a problem that requires a service like PubChase to tackle it. His proposed solution relies on identifying about a dozen most pertinent journals and scanning new publications in them with RSS, along with keyword and citation-based searches.

Below are our responses.


[By Matt Davis]

Some people are both keenly observant of street signs and also have an exceptional sense of direction. When they go to a new city, they just need a quick initial glance at the city map, and then have no need for Google Maps on their iPhone to get around.

But if the goal is to keep people from being lost, then Google Maps certainly helps advance that goal.

So, one could do everything you’ve described, and if one had your reportedly well-above-average ability to read and absorb material, and the dedication to stare at his/her iPhone during each spare moment that you describe, then one would find it as easy as you do to keep up.

But most people don’t meet those prerequisites, so why not make it easier for them by using a little statistics to prioritize things for them?


[By Lenny Teytelman]

Though my opinion as a co-creator of PubChase is clearly very biased, I think there are a number of strong arguments against the RSS approach as proposed by James.

1. As Matt Davis (the original force behind PubChase creation and recommendations) has written before, he wanted to build PubChase to move scientists away from relying on tables of contents (TOC) of high impact factor journals. What’s wrong with focusing on the TOC of the most popular journals? The problem is that the average PubChase user has articles from 43 different journals in her library, and 15% of our users have over 100 journals. In my small personal library, the 248 papers that I have are from 69 journals.  Counting the total articles from my top dozen journals captures only 144 (58%) of the papers in my library.

2. To supplement the TOC feeds, James recommends setting up a series of author, citation, and keyword-based RSS feeds. To me, this is an onerous setup process. I have to think of all the relevant keywords, but not too many or I will be overwhelmed with matches. Which keywords? Which authors? Why do I need to spend time setting this up, if an automated algorithm can capture my interests automatically? I have not gone through the steps in Jaime’s guide to actually set up the feeds, but it does not look like a 5-minute process. In contrast, to quickly set up personalized PubChase recommendations, I just need to import my citation library as a .bib file from Endnote, Refworks, Zotero, or any other references manager. This can be done in 1-2 minutes.

3. Jaime writes, “I find it very easy to stay on top of the literature.” I think the “easy” may be due to how fast a reader Jaime is. He writes that when he reads a paper it takes him 3-15 minutes per paper. Knowing James, I am sure that this estimate is correct. However, when I read a paper, it is usually 30-180 minutes. So, if I read 10 times slower than James, then the 20 minutes of daily TOC scanning would be more than 3 hours for me.

4. There is also a moral argument against relying on the tables of contents of the top journals – doing so props up the reign of the impact factor. If articles become discoverable no matter where they are published, there will be much less reason to waste months and years on trying to publish in the journals with the highest impact factor. This would be a tremendous boon for reading, publishing, and in general, being a scientist.
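
The coverage figure in point 1 is easy to compute for any library. Below is a minimal Python sketch, not PubChase's actual code: the function name `top_n_coverage` and the toy library are made up for illustration, and only the 144-of-248 figure comes from the numbers above.

```python
from collections import Counter


def top_n_coverage(journal_per_article, n=12):
    """Fraction of articles that come from the n most common journals.

    journal_per_article: one journal name per article in the library.
    """
    if not journal_per_article:
        raise ValueError("library is empty")
    counts = Counter(journal_per_article)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(journal_per_article)


# Toy library with hypothetical journal names (not real data).
toy_library = ["Journal A"] * 5 + ["Journal B"] * 3 + ["Journal C", "Journal D"]

if __name__ == "__main__":
    print(f"Top-2 coverage of the toy library: {top_n_coverage(toy_library, n=2):.0%}")
    # The figure quoted above: 144 of 248 papers fall in the top dozen journals.
    print(f"Top-12 coverage quoted above: {144 / 248:.0%}")
```

Running the same function on an exported citation library (one journal name per article) would show how much a dozen TOC feeds could capture at best.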

Posted in Uncategorized | 1 Comment

RSS Guide by James Fraser

[Also published on the lab website of James Fraser]

Yesterday I got an email from a colleague:

“Jaime – should I use RSS or Twitter to stay tuned into what is being published in different journals?”

I was already primed to rant about this issue because I had participated in a follow-up interview with a staff member at PLoS about their new structured post-publication review tool earlier in the day. One of the ideas that emerged in the interview – an idea that got me a bit fired up – was that PLoS is motivated to create this tool to help researchers sort through the massive and overwhelming scientific literature. This is a widely held idea, but it rings completely false to me. This idea is even part of the motivation for a company started by some friends of mine (PubChase, which is part of ZappyLab).

I find it very easy to stay on top of the literature. I consider scanning and reading papers  to be one of my most important and enjoyable responsibilities as a scientist. It doesn’t take that much time (a few hours per week, most of which is while walking or waiting) and I almost never miss papers that I should have read.

First, you need to define a set of journals that have most of the papers you want to read. Therefore, the first level of finding most (but not all) of the relevant papers is using RSS to follow the big weekly journals, a few bimonthly journals, and about a dozen monthly journals. Staying on top of the literature leverages our current journal stratification system. Despite its flaws, I know that I’m more likely to want to read a paper in Nature than a paper in PLoS1 (sorry, Mike!). Second, to bring your attention to papers in the journals that you otherwise wouldn’t read, you can use a series of RSS feeds for keyword, author, and citation-based searches.

Ultimately if PLoS1 is successful in absorbing all of Pubmed, sorting through the literature will become a bit harder. When this happens, it might necessitate the type of post-publication ranking systems that PLoS is trying to implement now.  That day is not here, yet.  So, while I am happy that PLoS is thinking about how we will struggle, in the future, to find and prioritize papers worth reading, I thought it would be helpful to outline my strategy for using RSS feeds for journals, authors, keywords, and citations.

I use an RSS reader on my iPhone to scan through the titles of papers almost anytime I am walking within our building.  I read abstracts on my phone anytime I am sitting down waiting for something to start. I read papers on my computer in the morning when I’m drinking coffee. I probably spend 10-20 minutes a day engaged in the literature, almost all of it while waiting or walking. How do I set this up?  Read on…

What is RSS?

RSS is a web syndication format that allows publishers to automatically notify their audience of changes in content. For the audience (us) this eliminates the need to check the website for updates – any new content is delivered via an RSS feed. For scientific publishing, each journal generally has its own feed and each article is a separate item in the feed. By directing all of these feeds to the same place, we can aggregate all the articles you would want to read. Google Reader was a very popular aggregator, but Google killed it. I use Feedly now and find that it is pretty reliable and easy to use. It has a little more style than I would like, because I find the uneven formatting distracting. I’ll detail more on how to use Feedly and my general workflow below.
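
To make "each article is a separate item in the feed" concrete, here is a minimal Python sketch that reads a small RSS 2.0 document with the standard library. The feed content, the example.org links, and the `parse_feed` helper are all made up for illustration; real journal feeds may use other flavours (RSS 1.0/RDF or Atom, with XML namespaces) that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, invented for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal</title>
    <item>
      <title>First example article</title>
      <link>https://example.org/articles/1</link>
      <description>Abstract of the first article.</description>
    </item>
    <item>
      <title>Second example article</title>
      <link>https://example.org/articles/2</link>
      <description>Abstract of the second article.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return (feed_title, items) for a plain RSS 2.0 document.

    Each item is a dict with the article title, link, and summary.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    items = [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
    return channel.findtext("title", default=""), items


if __name__ == "__main__":
    feed_title, articles = parse_feed(SAMPLE_FEED)
    for article in articles:
        print(f"{feed_title}: {article['title']} ({article['link']})")
```

An aggregator such as Feedly does essentially this for every subscribed feed and merges the items into one list.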

Most newspaper sites, blogs, etc also have RSS feeds – but I keep my science and personal RSS feeds separate.

The RSS logo is this:

More information can be found on Wikipedia:

Why RSS instead of emailed TOC or Twitter?

You can use RSS feeds to automatically collect all the new articles in one place, read through the titles rapidly, save the titles that look interesting, and then read only a limited subset of those articles. Like receiving table of contents (TOC) emails, RSS feeds will notify you when a journal publishes a new issue and give you titles of all the articles.

There are two major advantages over email TOC: 1) in the RSS feed, articles are discretized (often with the abstract) – meaning you can quickly sort through the entire issue and save the relevant portions for later, 2) email is for back and forth communication – I am often overwhelmed by the amount of email that I get and have set up several filters to try to reserve my inbox to actual back and forth communication, leaving all non-urgent notifications to sort through at a later time. Separating the literature into RSS feeds (away from email) also creates a psychological divide between one of my most joyful activities as a scientist (reading the literature) from one of my most hated (email).

Why not Twitter? RSS feeds will give you the opportunity to (quickly) sort through every article. If you buy the premise that the literature is overwhelming, then Twitter might be good for you. I worry that the signal-to-noise on literature commenting/highlighting on Twitter will be very low – but I haven’t really used it much.

How to set up RSS feeds:

1) Create a feedly account

Feedly is a good RSS aggregator. Right now you will first have to add a feed. Search for “PLoS Biology” and click “add to my feedly” to be prompted to create an account. Link the account to your Google account.

2) Set up Chrome to add RSS feeds properly.
Install the Google Chrome extension:

After it is installed, alter the options for this extension in Chrome:

  • Window->Extensions
  • Find RSS Subscription Extension (by Google) and click Options
  • In “RSS Subscription options” click add
  • Enter:
  • Description: Feedly
  • URL:
  • Make sure this is the default selection, and click “Always use my default reader when subscribing to feeds.”

Now, anytime you are on a website with an RSS feed, the RSS logo will appear at the right of the location/search bar.  Clicking on the logo will take you to feedly to add the RSS feed to your list.

3) Add RSS feeds for journals.

This is just a first step – don’t worry, you will get a broader range of journals through keyword, author and citation based searches!

Now that the Chrome extension is set up, go to the websites of your favourite journals – if there is an RSS logo in the location bar, add it. If not, search for RSS in the text (this is especially bad for the Cell Press family). I suggest first making a quick list of the journals you want to hit. Below is my list, followed by the url for the RSS feed so you can just copy and paste it into the “add content” bar in Feedly.

The big/weekly/general interest journals (everyone cares):

biweekly/npg/baby plos/etc (most people probably care):

a few specific to my field (you and your colleagues care):

*note: for the Cell Press Journals, I use the ScienceDirect Feeds rather than the ones that go to the journal itself.  This is because Cell stopped updating their RSS feeds for a brief period earlier in 2013 and UCSF only has access through ScienceDirect anyway.

4) Add custom searches from PubMed

* For some journals, I really only care about a subset of the stuff that is published. For example, there is a lot of organic chemistry in JACS that I don’t care about, but I generally want to read anything in JACS that has the word protein in it. Narrower searches could be useful for PLoS1, etc. It just requires a little sleuthing to figure out the PubMed abbreviation for the journal. To set up the search described above, enter "J Am Chem Soc"[Journal] AND protein into the PubMed search bar and click on the RSS logo under the PubMed search bar (not on the location bar). Then, add the url for that search to Feedly:

* Similarly, you can use PubMed to set up RSS feeds to follow your favourite scientists and friends. Just search PubMed for their name, “Fraser JS” for example, and click on the RSS logo. I have feeds for most of the people in my department, my collaborators, people whose work I really admire, etc.

* Finally, keywords in PubMed. I recommend trying out a few searches first. But anything that has fewer than 20 new papers published per week is pretty manageable.
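
The search strings above follow a regular pattern, so they can be generated rather than typed by hand. The Python sketch below builds the journal-plus-keyword and author terms and wraps them in a URL for NCBI's E-utilities `esearch` endpoint, which is handy for checking how many papers a term matches. Note the assumptions: the helper names are hypothetical, the `[Author]` field tag is added here (the guide searches for the bare name), and the resulting URL is an E-utilities query, not the RSS feed URL that the PubMed website generates when you click the RSS logo.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (returns XML by default).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def journal_keyword_term(journal_abbrev, keyword):
    """Restrict a keyword search to one journal, e.g. JACS papers on proteins."""
    return f'"{journal_abbrev}"[Journal] AND {keyword}'


def author_term(author):
    """Search by author name in PubMed's 'Lastname Initials' form."""
    return f"{author}[Author]"


def esearch_url(term, max_results=20):
    """Build an esearch URL for a PubMed term (assumed use: a quick count check)."""
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for term in (journal_keyword_term("J Am Chem Soc", "protein"),
                 author_term("Fraser JS")):
        print(term)
        print(esearch_url(term))
```

Pasting the printed term into the PubMed search bar should give the same search as typing it by hand; the RSS feed itself still has to be created from the PubMed page.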

5) Add citation-based searches.

This is a bit trickier and relies on ISI Web of Knowledge.  I have these set up for most of my papers and papers published by others that I care a lot about. Instructions are here:

I wish Google would implement this for Google Scholar – they have email alerts, but not RSS.

6) Isn’t there some redundancy?

Yes – but that presents more opportunities to actually read an important paper.

7) How long does this take to set-up?

~15 minutes. Plus, anytime you want to add a new search for a new author or new journal, you can just add the feed.

 How to move through RSS feeds efficiently:

Separate reading titles, sorting abstracts, and reading papers into distinct tasks at distinct times

Titles, while walking around:

I scan through the titles from RSS feeds very quickly. I have a good RSS reader on my phone (Byline, which I prefer to Feedly’s native app) that syncs with Feedly. Whenever I am walking, I use Byline to swipe through article titles and star the ones with interesting sounding abstracts. I generally sneak a peek at the author list too and save based on authors.

Similarly, when I’m on my computer, I scan through titles very quickly using keyboard shortcuts (j for next, k for previous) and save articles for later (keyboard shortcut “s”).

Abstracts, while sitting and waiting:

When I have small pockets of time (usually waiting for someone or sitting somewhere), I go into my saved folder (on the phone or computer) and read the abstracts.  If I have a lot of time (more than 5 minutes) and am at my computer, I will open the interesting ones in the background

 (install this extension: and then the keyboard shortcut is “;”) –

This keeps the focus on the abstracts and doesn’t require touching the mouse or trackpad.  I get into a rhythm of hitting “j” (then “;”) then “s” to advance (, open) and unstar an article.  If I’m on my phone or don’t have a lot of time, I just unstar the ones that no longer interest me.

Reading, while you have >5 minutes at your computer:

When I have time (generally first thing in the AM and on weekends) – I open up background tabs for all the articles that have survived in my saved folder and start scanning through them. I’m a fairly fast reader and have learned to get a lot out of scanning articles. I don’t read many articles carefully, but I get a good idea of what is going on in the literature based on…

My 80-20 theory of the literature:

I scan through all the articles in my RSS feeds and save ~20% of the titles to read the abstracts (1-3 seconds per paper). This is done on my phone or on my computer.

Of the papers I read the abstracts for, I open the full article for ~20% (5-10 seconds per paper).  I read abstracts on my phone or my computer. If I’m on my computer, I will often go through the rest of the steps. If I’m on my phone, I simply unstar the ones I don’t want to read and come back to the full text when I’m at my computer.

Of the papers I open to scan through the figures and headings, I decide that ~20% of the articles are actually worth reading (15-45 seconds per paper).

Of the papers I read, I bail part way through most of them and read every word for about 20% of the papers (1-3 minutes when I bail, 3-15 minutes when I read the whole paper).

Of those, I study about 20% by reading them multiple times and looking through references, thinking critically about the figures/controls, etc (20 minutes-2 hours per paper – only 1-2 papers per week get to this stage). Occasionally I will print these papers and/or save the PDF to read later.

 Of the references I look up, I probably end up scanning 20%, reading 20% of that, etc. This allows me to engage with past literature and pick up on important papers I may have missed otherwise.

This strategy keeps me aware of what is being published and gives me a partial knowledge of what many of the papers look like, what kind of conclusions people are drawing, what techniques are emerging, etc. I think it helps me develop a broad knowledge of what is going on in the literature and allows me to focus on a small subset of papers to read deeply.

 Is the 80/20 number totally made up? Yes. Many weeks it might end up as 95/5 in some steps and 50/50 in others.  It’s just a guideline that makes me very comfortable in abandoning papers at any point in the process.
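
The funnel above is just repeated multiplication by 20%, which a few lines of Python can tabulate. This is a rough sketch under stated assumptions: the per-paper time ranges are the ones quoted above, the weekly title count passed in is a made-up input, and charging each stage's full time to every paper that reaches it slightly double-counts the papers that are read to the end.

```python
# (stage name, min seconds per paper, max seconds per paper), from the text above.
STAGES = [
    ("scan titles", 1, 3),
    ("read abstracts", 5, 10),
    ("scan figures and headings", 15, 45),
    ("start reading", 60, 180),
    ("read every word", 180, 900),
    ("study in depth", 1200, 7200),
]


def funnel(titles_per_week, keep_fraction=0.2):
    """Return one row per stage: (name, papers, min minutes, max minutes).

    Each stage passes keep_fraction of its papers on to the next stage.
    """
    rows = []
    papers = float(titles_per_week)
    for name, min_sec, max_sec in STAGES:
        rows.append((name, papers, papers * min_sec / 60, papers * max_sec / 60))
        papers *= keep_fraction
    return rows


if __name__ == "__main__":
    # 1000 titles per week is a hypothetical input, not a number from the guide.
    for name, papers, low, high in funnel(1000):
        print(f"{name:28s} {papers:8.1f} papers  {low:7.1f}-{high:7.1f} min")
```

Changing `keep_fraction` per stage would model the weeks where a step is closer to 95/5 or 50/50.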

Why I don’t use Mendeley or Papers or anything else.

I’ve never been the type to take notes on physical paper or even on PDFs. For this purpose, I imagine these applications are quite useful. I also think the amount of time people spend categorizing and organizing papers is better spent reading titles, abstracts, and papers. With PubMed searches and a good memory for author names, you are never more than 5 seconds from finding the paper anyway. I like the idea of PubChase using this information to prioritize a feed of articles – but wish they would embrace RSS feeds as a way to stay on top of the literature.

Ok – I actually do use Papers.

But only for citations (it is much better than EndNote). I don’t pay attention to my Library at all. I just search in a browser window for what I want, copy the PubMed ID, and then import into my library based on that. Then I generally use the PMID for finding and inserting the references.

Posted in Uncategorized | 1 Comment

Lots of Open Access publishers, but how many readers?

When I started graduate school in 2003, PLOS Biology was just about to launch. There was no PLOS One and the fraction of publications in open access (OA) journals was minuscule. It is remarkable what the progress has been – a few days ago, a study by Science-Metrix for the European Commission indicated that over 50% of all 2011 papers are open access, calling this moment a “tipping point” towards open access (though I can’t wait for the moment when 100% of research is open, the progress is truly astounding). However, the real measure of success for the open movement is not just a count of freely available papers from a year ago. The real question is – what fraction of papers that a researcher needs today are openly available?

Regulatory requirements force many non-OA journals to make articles freely available after 6 or 12 months. But no scientist is going to wait a year to read a key discovery in her field, so libraries must continue to pay subscriptions. Also, when I mention how successful the OA movement has been in transforming the publishing landscape, I often hear in reply, “PLoS One may publish a lot, but who reads those articles? How many of the important papers are open access?”

Valid question, and one that we can actually answer with the data in PubChase. Our users save articles to their libraries, and that lets us ask what fraction of the saved (and by implication important/read) papers are indeed OA. We do not consider delayed OA, where articles become freely available months after publication, to be actually open. Therefore, in this analysis, we counted as OA only journals that make all of their research articles available immediately upon publication.[1]

The answer for PubChase users is that 22% of the 2013 articles saved to libraries are open access.[2] We can also plot how this fraction has grown over the last decade.[3]



Perhaps not a tipping point yet, but a monumental achievement nevertheless.

As in our previous post, caveats apply. PubChase users may not be representative of biologists as a whole. For the current year, the number is still in flux because the year is not over. Also, due to hybrid journals that publish a mix of OA and non-OA articles, our approach underestimates instantly-OA publications.
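For readers who want to repeat this kind of count on their own data, here is a minimal sketch of the calculation as described in the post and its footnotes. It is not the actual PubChase code: the function names, the dictionary keys `access` and `participation`, and the toy journals are hypothetical. Only the rule itself (access type “Immediate” or “0 months”, participation level “Full”, and one count per save rather than per unique article) comes from the text.

```python
from collections import Counter


def is_instant_oa(journal_row):
    """Rule from the post: immediate (or 0-month) access and full participation."""
    return (journal_row["access"] in {"Immediate", "0 months"}
            and journal_row["participation"] == "Full")


def oa_fraction_by_year(saves, journal_list):
    """Fraction of library saves per publication year that are in instant-OA journals.

    saves: iterable of (publication_year, journal_name), one entry per save,
           so an article saved by 100 users contributes 100 entries.
    journal_list: dict mapping journal_name to its access/participation record
                  (hypothetical layout; journals missing from it count as non-OA).
    """
    total = Counter()
    open_access = Counter()
    for year, journal in saves:
        total[year] += 1
        row = journal_list.get(journal)
        if row is not None and is_instant_oa(row):
            open_access[year] += 1
    return {year: open_access[year] / total[year] for year in total}


# Toy data, invented purely to exercise the function.
toy_journals = {
    "Journal A": {"access": "Immediate", "participation": "Full"},
    "Journal B": {"access": "12 months", "participation": "Full"},
}
toy_saves = [(2013, "Journal A"), (2013, "Journal B"), (2013, "Journal B"),
             (2013, "Journal C"), (2012, "Journal B")]
fractions = oa_fraction_by_year(toy_saves, toy_journals)
```

Because the rule works at the journal level, an OA article in a hybrid journal is counted as non-OA here, which is the underestimate mentioned in the caveats.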

There are three conclusions we can draw from this analysis. First, to OA advocates – there is still a lot of work to do. Second, to the critics dismissing the importance and relevance of open access journals – you are wrong – the articles are important and read widely. Third, if you are a subscription journal clinging to the subscription model – now is the time to prepare for the imminent 100% open access future.

(Post by Lenny Teytelman. Analysis by Matt Davis, Alexei Stoliartchouk, and Lenny Teytelman.)

P.S. If you are curious about PLOS One articles specifically, yes they are widely read. In total, a fifth of all the OA articles in the libraries of our users are from PLOS One.

  1. Specifically, we count as OA journals from the PubMed Central list that have “Immediate” or “0 months” access type, and whose participation level is “Full”.
  2. Articles do not have to be unique; if 100 users save the same article to their libraries, the count for that journal is increased by 100.
  3. The PubChase by-year journal table used for this plot is available here.
Posted in PubChase

What makes a journal important?

The scientific community values citations, and with good reason. But, in other publishing arenas, the evaluation is based on how often the publication is read, not how often it is cited. This is true in traditional publishing with best-seller lists, and also in modern web publishing with metrics for visits and click-throughs. So, I was curious what relationship exists between how often works from a journal are cited, and how often articles from those journals are read by our users. Read on to find out:

A few years ago, I wrote a set of Python scripts to improve on the services offered by PubCrawler and MyNCBI. Using some enrichment statistics and the Entrez API, I began emailing myself article alerts once a week without the false positive results inherent when searching PubMed for an author name or keyword. And when I first considered expanding my little set of Python scripts into what is now PubChase, I began collecting a little polling data. At the department social hour, or at dinner with a seminar speaker, I asked how people found new literature to read. Answers involved RSS feeds and various heuristics for PubMed searches, journal club discussions, and dedicated blog reading. I suspect these days it would involve a fair amount of Twitter-gazing. But the most common answer in these informal surveys was to simply read the table of contents of a few journals each week. To me, the obvious follow-up question was how one should select which few journals of the thousands available to consider. In other words: what makes a journal important?

Now, it’s no mystery which three journals were invariably listed in the response. After all, there is a pervasive (if dubious) belief that the awards of fellowships, faculty positions, and tenure are all predicated on “SNC” publications.[1] And in fairness, a lot of folks told me that they also took time to look in one or two other journals, typically specific to their research interests or a “second tier” journal. But the lesson was that a frightening proportion of scientists were limiting themselves to Science, Nature, and Cell, maybe with a smattering of Genetics, Neuron, NSMB, etc.

And it’s also no mystery why these are the journals exalted to the lofty status of “worth-one’s-time-to-read-stuff-in-them.” Articles in these journals are, on average,[2] cited more often, and accordingly their Impact Factor is high. Citation count and Impact Factor are really easy statistics to understand: journals with high ones are the “winners.” But, if Impact Factor is the all-mighty reflection of journal quality,[3] then why is it that Science, Nature, and Cell are not the three top journals by this metric?

Any good graduate student knows the answer to this question too: the other journals with high impact factors can be dismissed for targeting a clinical audience, for being a “review” journal, or for being a physics journal that got lost in the heavily biology-biased list of “science” journals. Because clinicians don’t win Nobel prizes too much these days. And reviews are only c.v. fodder with no value to the scientific community. And physics has too much math. So, after these dismissals, Nature is indeed the top-ranked journal, with Cell and Science close behind (after stumbling through crags of Nature specialty journals). Voila. Justification attained. We can all resume a frenetic SNC-or-Bust pursuit of scientific external validation.

But, before we do, I thought I’d share with you a few observations from the data on our PubChase user base and the articles they chose to read.

The PubChase algorithm uses a “Bayesian-ish” model to predict which articles will be of interest before they accrue citations. This prognostication is required if you, like me, don’t want to wait around for a year or two to find out what to read. And it considers, basically, things that make sense to consider. These considerations include an estimate of the value of each journal, but unlike Impact Factor, which is based on citations, our model is based on who is reading the articles in a particular journal. And when I take a look at our internal rankings of journals in the model, a couple of things really stand out: 1) Science, Nature, and Cell are not the top 3 journals, and 2) maybe we should hit the brakes on dismissing all those reviews.

Here are, as of today, the top 20 journals according to our model:

1 Nature reviews. Genetics
2 Genetics
3 Genome biology
4 Genome research
5 PLoS biology
6 Genes & development
7 Cell
8 Molecular biology and evolution
9 Nature methods
10 Nature reviews. Molecular cell biology
11 Trends in genetics : TIG
12 Developmental biology
13 Journal of molecular biology
14 Science (New York, N.Y.)
15 Yeast (Chichester, England)
16 Development (Cambridge, England)
17 Current opinion in genetics & development
18 Molecular cell
19 Nature
20 Molecular systems biology

So, first things first: Bravo Cell! You made the top 10, unlike your SNC brethren.

Now, I would like to express this caveat: it’s clear to me from many observations that our users are enriched for the more quantitative sub-disciplines of biology, most notably genomics and population genetics. So, this likely explains the heavy representation of journals starting with the letters “G-E-N” in the list.

Interestingly, PLoS Biology takes one of the top spots, which agrees with an undercurrent of my informal polling suggesting that PLoS was on the cusp of joining the vaunted SNC trinity.[4]

You can also see that developmental genetics is alive and well as a field of research.[5]

And it’s also quite clear that those c.v.-cluttering review articles might actually have some value. I asked how many of the over 22,000,000 articles available in PubMed are from review journals. The answer is 0.92%.[6] Then, I asked what percentage of articles read by our users were from these journals. The answer is 4.18%. If you want to know if our users are reading reviews more often than they would be by chance, then you could compare the last two percentages.[7] And with this many events, it’s not surprising that a difference between roughly 1% and roughly 4% is super-significant.[8] So, it turns out that while major advances in science are not published first in review form, a lot of the articles read are reviews. This gets to a bigger question perhaps best saved for conversational pub fodder regarding exactly to whom rewards are given for what in academia, what the roles of discovery versus communication should be, and so forth. But to me at least, this is a clear rejection of the idea that writing a review is a valueless endeavor. Lots of people are reading reviews.
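To give a feel for why the p-value is effectively zero, here is a rough sketch of the comparison. Two things are assumptions: the number of read events is not given in the post, so `HYPOTHETICAL_READ_EVENTS` is a placeholder, and the sketch uses a one-sample z-test against the PubMed background rate as a simple stand-in for the Fisher’s Exact test mentioned in the footnote. The two percentages are the ones quoted above.

```python
import math


def enrichment_z_test(observed_hits, observed_total, background_rate):
    """One-sample z-test of an observed proportion against a background rate.

    Returns (fold_enrichment, z_score, two_sided_p_value).
    This is a normal approximation, not Fisher's Exact test.
    """
    observed_rate = observed_hits / observed_total
    standard_error = math.sqrt(background_rate * (1 - background_rate) / observed_total)
    z_score = (observed_rate - background_rate) / standard_error
    p_value = math.erfc(abs(z_score) / math.sqrt(2))  # two-sided tail area
    return observed_rate / background_rate, z_score, p_value


BACKGROUND_REVIEW_RATE = 0.0092     # share of PubMed articles in review journals (from the post)
OBSERVED_REVIEW_RATE = 0.0418       # share of articles read by users (from the post)
HYPOTHETICAL_READ_EVENTS = 100_000  # assumption: the real count is not stated

review_reads = round(OBSERVED_REVIEW_RATE * HYPOTHETICAL_READ_EVENTS)
fold, z_score, p_value = enrichment_z_test(
    review_reads, HYPOTHETICAL_READ_EVENTS, BACKGROUND_REVIEW_RATE)
print(f"fold enrichment {fold:.2f}, z = {z_score:.1f}, p = {p_value:.3g}")
```

Under that assumed event count the z-score is far beyond anything a double-precision p-value can represent. The z-score grows with the square root of the number of events, so a smaller user base would give a smaller but still very large value.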

So, one last thought: I compared our rankings to journal Impact Factors for last year. I expected some relationship between citation number, which principally drives Impact Factor, and the PubChase score, but I was not sure how strong this relationship would be. In essence, I’m asking what is the relationship between citation events and reading events. Again a caveat: keep in mind this is just a scatter plot of 20 data points, selected from the tail of a distribution. The correlation coefficient for this plot is about 0.2.[9][10] It could be that the PubChase user base is not representative of the biological sciences as a whole. Or, it could be that the relationship between what is read and what is cited is not particularly strong. Or most likely, these data are structured with respect to multiple factors. Nonetheless, the relationship between readership and Impact Factor is mild at best.
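For anyone who wants to run the same comparison on their own journal list, here is a minimal sketch of a correlation coefficient calculation. The post does not say which coefficient was used, so Pearson’s r is an assumption, and the two short lists are toy numbers invented to exercise the function, not PubChase scores or real Impact Factors.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / math.sqrt(spread_x * spread_y)


# Toy values only: stand-ins for a readership score and an Impact Factor per journal.
toy_readership_scores = [1, 2, 3, 4, 5]
toy_impact_factors = [2, 1, 4, 3, 5]
r = pearson_r(toy_readership_scores, toy_impact_factors)
print(f"r = {r:.2f}")
```

With only 20 points taken from the tail of a distribution, as in the plot described above, any such coefficient carries a lot of uncertainty, so it is best read as a rough description rather than an estimate for journals in general.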

Feedback and commentary are certainly welcome.

  1. I was a little horrified, too, to see that some very accomplished researchers actually split out these publications on their lab websites (e.g. this one). It makes me wonder, as a reader, what such authors think of the articles they have published in other journals.
  2. I’m actually quite mystified why Impact Factor, and basically all other metrics of journal quality, continue to use mean citations as the only estimator of quality. It seems obvious to me that each journal has a true- and false-positive rate for generating truly high-impact work, and one could, given access to the sort of data that Google Scholar and ISI have amassed, easily explore the other distribution characteristics of citations per article for each journal. It seems intuitively true that the citation counts of articles in any journal are distributed exponentially, thus assuming that the majority of articles are as good as the article of mean quality in a given journal is super duper wrong. I suspect it’s true that in journals of all tiers, there is a minority of good articles that allows the remainder of the articles to hitchhike on the reputation earned by that minority with good citation numbers.
  3. Even ISI doesn’t think this is true, by the way. In this article from 1994, they explicitly say as much: “Thomson Reuters does not depend on the impact factor alone in assessing the usefulness of a journal, and neither should anyone else.”
  4. <head spins while contemplating the nested acronym possibilities>
  5. Though I’ve heard it suggested recently that reading G&D was très passé.
  6. I took a quick proxy for how many journals were dedicated to reviews just by asking what percentage contained the word “review” in their title. I know this isn’t perfect, but it’s almost certainly an underestimate. The answer is 4.55%.
  7. Or, rather, their proportions.
  8. A Fisher’s Exact p-value on these proportions is very closely approximated by the number zero.
  9. I’d be happy to recreate this graph for thousands of journals, if I could get my hands on the full list of Impact Factors, or better yet, Google Scholar citation counts.
  10. Which I tried to do with some foul-minded HTML scraping, but quickly learned that Google isn’t keen on me scraping Scholar results.
Posted in PubChase | 10 Comments

PubChase is now a publisher!

No, do not worry, we are not adding one more drop to the bottomless journal sea (though if you use PubChase to find relevant literature, the number of journals should not matter much). Instead, we are inaugurating a weekly author’s essay series with a personal story from Meleah Hickman on her recent ground-breaking discovery of the haploid C. albicans.

We nicknamed it an “anti-journal club”. This is born out of the common frustration with journal clubs, which typically become paper-shredding exercises that focus almost exclusively on ripping the work apart. These meetings may be useful, but they are demoralizing. You come out of a journal club feeling that you once again wasted an hour of your life on a crappy paper.

So, instead of a journal club where the presenter eviscerates someone else’s work, we are doing the opposite – we are inviting the authors to tell us what it took to make this publication happen and why it is special to them.

You spend what feels like eternity on the research, in the end carefully packaging the years of your life into a few published pages. The publication is, as it should be – results and conclusions. But what about the journey to the publication? Did you anticipate the opposite results? Was this an accidental discovery? What was the most daring and risky experiment here? What personal drama lies behind this neat and formal publication?

We hope you will find our anti-journal club inspiring and exciting.

Posted in PubChase

Organize And Access Your Research Literature

You can now store your article PDFs for free, for easy access from anywhere through your PubChase library. You can save up to 300 articles at no cost, with a “pay what you wish” subscription if you need more space. Not only can you store the PDFs, but with the library search and the newly implemented tag organization, finding an article of interest should be instantaneous.

We have also listened carefully to the feedback from our users and made approximately 287 improvements to the website over the last month. So, test it, use it, enjoy, and e-mail us with comments and suggestions.

Posted in PubChase