Friday, March 27, 2009

Biologists Don't Know Shit About Computer Programs

I am watching ESPN, chatting with March Hare, petting Dormouse, blogging, and running a program to analyze sets of data all at the same time. How's that for multi-tasking?!

As has happened often recently, I am amazed tonight by what computers and software can do for biologists. Seriously, how else would I be able to align two DNA sequences, each of which is over 120,000 bases long, with each other without going completely blind?

Given the trend toward doing experiments that generate ginormous datasets--microarrays, mass sequencing, etc.--it seems a shame to me that so many biologists (including myself) don't really know much about the programs we use to analyze these data. We know just enough to get the programs to do what we need them to do, but not enough to understand how the algorithms work or how to tweak the parameters to generate better analyses. In this age of increasing reliance on computational biology, perhaps biology graduate programs ought to require some kind of remedial bioinformatics course, no?
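To make the bit about tweaking parameters concrete, here is a toy sketch of how a single setting can change an alignment result. It is a bare-bones global alignment score (Needleman-Wunsch style, with no traceback) and is not how any particular tool does it. The function name and the default scores for match, mismatch, and gap are made up for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b with a linear gap penalty.

    Toy illustration only; the scoring defaults are arbitrary assumptions.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # against the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against nothing costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Same two sequences, two different gap penalties.
    print(global_alignment_score("ACGT", "AGT", gap=-2))
    print(global_alignment_score("ACGT", "AGT", gap=-1))
```

The two calls score the same pair of sequences differently just because the gap penalty changed, which is the kind of knob most of us leave at whatever the default happens to be.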

This point was driven home earlier today by a paragraph in a paper I was reading. The paper compared the efficacy and sensitivity of several computer programs for detecting repeat elements, either in sequencing reads or in assembled contigs. In the discussion section, the authors addressed the fact that they had compared the programs using the default parameters instead of trying to optimize the settings:

Of note, each tool was evaluated using its default parameters. We chose not to conduct tool optimization because, in our experience, it is very common for biologists to operate bioinformatics tools using default parameters. Some likely reasons why optimization is often avoided are as follows:

  1. Many biologists have little or no understanding of the algorithms and programming behind computational tools and thus do not feel comfortable changing program parameters.

  2. Public domain tools rarely come with documentation that can be easily understood by those lacking experience in computational biology.

  3. Because software developers cannot anticipate every dataset and/or application on which their tool may be used, they often provide only vague suggestions as to how optimization might be conducted.

  4. The act of optimizing a tool for a particular dataset or application can be very difficult and time consuming.

  5. Program default settings often become ‘standards’ to which researchers adhere so that they can directly compare their results with those of researchers who have used the same program in default mode in the past.

  6. What constitutes an ‘optimal result’ differs from user to user prompting some scientists to use the default parameters as a way to limit introduction of their own biases into results/conclusions.

So...numbers 1, 2, 5 and 6 basically translate to (1) biologists don't know shit about computer programs and have absolutely no clue how changing various parameters will alter the result, (2) we could try reading the manual but we wouldn't understand it anyway, (3) um...we'll just do what that other guy did cuz we don't know any better, and (4) if we knew how to tweak the program, we could make it give us any result we wanted...just like with Photoshop!

I might consider being offended on behalf of biologists if all of these points weren't so true. I mean, I did just spend almost an entire afternoon annotating a genome by copying and pasting each ORF one-by-one into the query box at BLAST, right? Sigh.
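For the record, the one-by-one copying and pasting could probably be avoided. As far as I know, the BLAST query box accepts several FASTA records at once, so a few lines of Python can bundle all the ORFs into one multi-FASTA text. This is only a minimal sketch: the function name, the ORF names, and the sequences are hypothetical placeholders, not real data.

```python
def to_multi_fasta(orfs, width=60):
    """Format a {name: sequence} dict as multi-FASTA text.

    Whitespace inside each sequence is stripped, letters are upper-cased,
    and sequence lines are wrapped at `width` characters.
    """
    records = []
    for name, seq in orfs.items():
        seq = "".join(seq.split()).upper()
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">" + name + "\n" + "\n".join(lines))
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Hypothetical ORFs; in practice these would come from the annotation.
    example_orfs = {
        "orf_001": "atggctaaaggtgaactgtaa",
        "orf_002": "ATGCCC GGGTTT TAG",
    }
    # Paste the printed text into the query box, or save it to a file.
    print(to_multi_fasta(example_orfs), end="")
```

The output is a single block of text with one `>name` header per ORF, which can be pasted in as one batch query instead of dozens of separate ones.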

So instead, I laughed my ass off at seeing our shortcomings in this area so...um...tactfully described in an actual journal paper! Kudos to the authors for having the guts to be so candid. And the program they recommended for finding repeats had better damn well be idiot-proof and work like a charm or I'm going to need some serious help.

11 comments:

ScienceGirl said...

I wonder why the situation is so different in physics, where knowledge of computer science has become the norm. I am not saying that a physicist will know as much CS as a computer scientist (Ha! Job security ;), but they will know a lot, and will consult a CS person when they need to (or on a regular basis). Is this the case because physics moved toward computation several decades earlier than biology?

Gibbiex said...

I'm in the proteomics field. We don't need to know much about the parameters, but when we do we consult our chemistry colleagues. Frankly, the analysis of MS data really requires a lot of experience and training, not something you can easily pick up as a biologist. Still and all, we have a pretty good grasp on what we are seeing. This is critical since there is so much 'noise' it's hard to find the real data unless you know what you are looking for.

Anyway, for bioinformatics tools, there are seminars and courses you can take to learn about all the parameters. Mascot comes to mind, as does BLAST.

Unbalanced Reaction said...

All I can say to that is: oh, snap.

Mad Hatter said...

ScienceGirl--That's a good question. It's probably partly because physicists moved toward computation earlier. It may also be because physics experiments require computation more than biology experiments. I mean, it is entirely possible to get a PhD in biology without ever using anything much more complicated than the Microsoft Office suite of programs, depending on one's subfield.

Gibbiex--You're absolutely right about MS data. We do have core facilities at my institution that can help us with the analysis, but sometimes even communicating effectively can be a real challenge when the biologists don't understand the computational process and the computer people don't understand the biological experiment. Courses are a good idea, though. I should look into them.

UR--Yeah, right? I thought it was hilarious how delicately they phrased some rather uncomplimentary observations!

microbiologist xx said...

My husband is a software developer and finds this very amusing. Unfortunately he doesn't know shit about science, so when I try to get him to help me it is trying for both of us. All we can agree on is that we are both speaking English.

Mad Hatter said...

MXX--I know exactly what you mean. March Hare is in the computer industry too. Funny, I know a lot of female scientists whose partners are IT/comp sci people. Science nerds and computer geeks must make a good combination, huh? :-)

Thomas Joseph said...

Not too long ago I realized I needed to go a step above and beyond what I normally would with DNA sequencing, and that I needed to start looking at environmental variables to see if they correlate with changes in the bacterial populations. At any rate, I broke down and bought PC-ORD. For an additional $35 you can buy the TEXT BOOK which explains why you do what you do with the software.

Yah ... the TEXT BOOK. And for kicks, they offer week-long training seminars ... for the SOFTWARE.

*sigh*

Mad Hatter said...

Tom--I guess it's nice they offer week-long seminars on the software, but really, who has time to take a week off for that??? By the way, your comment about changes in bacterial populations made me wonder: are you working on all that cool microbiome stuff with the 16S sequencing?

Thomas Joseph said...

Mad Hatter,

I do a lot of 16S rDNA gene sequencing here at work (along with RISA, T-RFLP, and other means to determine bacterial density and diversity). I'm in agriculture, so I'm not a part of the most popular microbiome project (the human one), but we're in the process of digging up funds for a large-scale metagenomic/microbiome soil project.

Thomas Joseph said...

Oh, and no ... I don't have a week to take off for that. I did manage to free up a week so I could take a confocal microscopy course this summer though, and I am heading out to ASM a day early to take a DNA data processing/management workshop they're offering.

I try to fit it in where I can. Of course, as a gov't employee, I'm required to take such training each year as part of my IDP (Individual Development Plan). Usually it's just reading a book or two on various subjects (scientific, management, economics), but every so often I can wrangle a real live course to take.

Mad Hatter said...

Tom--That workshop at ASM sounds pretty interesting. It's nice that you get to do this as part of IDP. It always seems as if in academia, reading books or going to training sessions is something you're just supposed to do on your own time.
