Monday 13 September 2010

How to get into cheminformatics

A chemistry undergraduate from South America recently emailed me asking about how to get into cheminformatics:
My area is chemistry and I'm very interested in cheminformatics. Actually, I'm using Python to develop software for some analysis (image analysis applied to chemistry). Here in ----, the college chemistry course doesn't have any informatics-related disciplines.

Because of this, I have some questions; if you can answer them, I'll be very grateful:

Did you study chemistry at college, or something informatics-related?
If you studied chemistry, how did you start working with computation applied to chemistry?
Here in ----, I think that cheminformatics is actually not very well known, even in the scientific field. What about in other countries? Have most of the people working in cheminformatics studied chemistry, or something computation-related?


I answered as follows:
My own background is a degree in Chemistry, followed by a PhD in Inorganic Computational Chemistry (DFT calculations). In the field most people have chemistry degrees, although there are also a few computer scientists. The types of problems the two work on are often different; the computer scientists may be more interested in developing methods, while the chemists may be more interested in applying and interpreting the results. I think that most chemists would not do any informatics or programming during their degree; they would just teach themselves at the start of their PhD. It sounds like you have already done this.

When I was 12 or 13, I started programming in BASIC on my home computer and got involved in programming competitions for high school students. I didn't have a computer while in university, but during my PhD I started programming again, this time in Python. More recently, I've learnt C++ by working on Open Babel.

If you want to gain expertise in the field, I would very much encourage you to get involved in an open source cheminformatics project. You will learn a lot about programming, organising large projects, testing, how to work with other people, and so on. If you're interested in image analysis, you could look at projects such as OSRA. You may also want to subscribe to the blueobelisk mailing list or ask a question at blueobelisk.shapado.com.

Cheminformatics is not a very well known field - I didn't know what it was until I started doing it, even though I had done computational chemistry during my PhD. The main countries associated with cheminformatics are the UK, US and Germany, it seems to me; these are the countries where a lot of the pharmaceutical companies do drug design. But you can do cheminformatics anywhere - you just need a computer.


What advice would you give? I'm especially keen to hear from cheminformaticians from South America. (I'll point the student to this blog post.)

Image credit: Duncan Hull

13 comments:

Rajarshi said...

While programming is a core cheminformatics skill, I'd also suggest that it's useful to get familiar with chemical applications - if possible browse JCIM, JMGM, JCAMD. And I definitely second getting involved in OSS cheminformatics projects.

Noel O'Boyle said...

See also comments on Friendfeed.

Anonymous said...

Hi,
I am currently doing a degree in Computer Science and Engineering and I do not have a degree in Chemistry. But I am interested in Cheminformatics and want to get into this field. Do you think that I would be able to do it with just my Computer Science background or will I need to follow another degree in Chemistry?
Thanks in advance!

Noel O'Boyle said...

@Anonymous: There are many computer scientists in the field, particularly in the areas of methods development and scientific software development. Of course, the more chemistry you know the better; even a familiarity with interpreting 2D chemical diagrams and with common functional groups in organic chemistry will help a lot.

medical shiva said...

I am an organic chemistry postgraduate from Germany. I am very much interested in cheminformatics. Do I need to pursue a certification course to build a career in this field? I have strong theoretical knowledge of this science. Kindly provide me with support.

Anonymous said...

hi im sa freshman

Unknown said...

I think cheminformatics is a new platform with a bright future. One of its advantages is that there is no need to worry about ethical issues when doing experiments, or about getting false positive or false negative results. Once the computational results are proven, we can proceed to confirm them in the wet lab. This would be safer in terms of risk and cost.

Anonymous said...

Hi! I am a third-year undergrad student studying chemistry with a biology minor. I had been thinking of becoming a pharmacologist, but as time went by I discovered that I am more interested in computational chemistry. I have been doing undergrad research in computational chemistry since freshman year, which kind of made me fall in love with computational chemistry more than the wet lab. I figured cheminformatics could let me work on what I love, designing and discovering drugs, without becoming a pharmacologist, while still applying my computer and chemistry knowledge. I am now trying to figure out what to study in graduate school that could lead me to become a cheminformatician. Do you have any suggestions on what would be best to do a Ph.D. on?

Nils Weskamp said...

First of all: congratulations on this great career choice! There are plenty of interesting topics in this field, but the "hottest" ones seem to be around applying machine learning / deep learning to problems in chemistry. You should find plenty of articles on predicting molecular properties or generating structures with a given target profile in journals. If I had the chance to start a new PhD, I would probably look at predicting chemical reactions (regioselectivity, optimal conditions, yields) and how to apply this to retrosynthesis. There is a shortage of data on this topic in the public domain, but you might be able to generate your own training data by using classical computational chemistry approaches (e.g. DFT) to calculate what you need.

Pat Walters said...

I think Drew Conway's Data Science Venn Diagram provides solid guidance for a number of fields, including Cheminformatics. According to Conway, one needs three skills.

- Domain knowledge - In Cheminformatics, we work with chemical and biological data. In order to determine whether our models are meaningful, we need to understand the underlying science. For those coming from computer science, it's useful to gain some background in organic chemistry and molecular biology.

- Hacking skills - In order to do Cheminformatics, you need to know how to code. One way to get better at coding is to study and modify code from blog posts or papers. The Journal of Cheminformatics and The Journal of Chemical Information and Modeling have lots of papers with code.

- A knowledge of math and statistics - A lot of modern Cheminformatics involves building machine learning models. A background in statistics enables you to appropriately validate and compare models.

I've put together a set of Jupyter notebooks that I think could provide a good starting point for someone learning Cheminformatics. Finally, ask a lot of questions; there are many enthusiastic Cheminformaticians out there who would be happy to help.

Antonio de la Vega de Leon said...

I studied Biology for my undergraduate degree and it was not until my MSc in Bonn (Life Science Informatics) that I learnt about chemoinformatics. There are very few places that offer formal education in chemoinformatics; the University of Bonn is one of the few I know. Another great resource is the summer schools in Strasbourg organized by Prof. Varnek. There used to be an MSc in the UK but it no longer runs.

But as many people have said, formal training is not needed, and many start when they do a PhD that is related to the topic. Knowing Python already is a great start. If you haven't already, learn how to use RDKit; it is a great chemoinformatics toolkit. I also second @Nils's comment on machine learning; it is a great technique to know about in the current job market. If you are interested in a research career, doing a PhD is almost necessary whether you want to go into academia or industry.

Someone who would probably be great for you to get in touch with is Prof. Jose Medina-Franco, who does chemoinformatics at the National Autonomous University of Mexico (UNAM). He would probably know more about the chemoinformatics scene in Latin America. He is on Twitter at https://twitter.com/difacquim.

Chris said...

RSC CICAG have an expanding selection of YouTube videos describing open-source tools and resources that would be worth watching: https://www.youtube.com/c/RSCCICAG

Noel O'Boyle said...

Note also that computational chemistry is a broader topic than cheminformatics and is closer to the drug discovery process in some ways, in that it involves directly using protein structural information to help guide a medicinal chemistry team. This includes homology modelling, protein-ligand docking and molecular dynamics among other techniques. If this is of interest, you should reach out to an existing group that is publishing in these areas and chat about this.