Saturday 11 August 2007

Move over FORTRAN: Python makes inroads into computational chemistry

Old-school dyed-in-the-wool computational chemists like to use FORTRAN and C a lot, argue about which is better, and create programs whose names are written all in CAPITALS and will only compile on an SGI-IRIX. Although you gotta love them for trying, it's good to see that times are a-changing.

Looking at "Early View" for Wiley's Journal of Computational Chemistry, you will see two "Software News and Updates": pyVib and pyFrag. Yes, you've guessed it - they're both Python programs. And if that wasn't enough, I might as well announce here that a paper on the Python library cclib has just been accepted by the same journal.

It's likely that at least some of this interest in Python for computational chemistry is due to PyQuante, developed by Rick Muller. Who'd have believed that a scripting language could be used to carry out quantum chemical calculations? Of course, what people don't realise is that Python has an extension library for efficient numerical computation. Or maybe they're starting to realise it...

Librarians go Web 2.0

Yes, librarians are doing it too. To begin with, my Greasemonkey userscript for adding bloggers' quotes to journal pages has just gotten an enthusiastic write-up by Mark Rabnett, a hospital librarian and blogger.

He learned of this userscript by reading a recent paper in ACS Chemical Biology by D.P. Martinsen, "Scholarly Communication 2.0: Evolution or Design?". This was news to me, so I looked it up. It turns out that it's pretty much a review of the Spring ACS sessions on Web 2.0. Martinsen begins by giving a good description of what the term Web 2.0 means, and why scientists should know about it. Then he goes on to discuss the presentations by Nick Day, Henry Rzepa and Colin Batchelor among others (these are just the people I know or know of).

Then we come to the good bit. At the end of page 370 it says:
Two additional items, unrelated to the ACS meeting, are significant. Using Greasemonkey, a Firefox extension that allows anyone to write scripts that can change the way a web page looks, the Blue Obelisk group, a community of chemists who develop open source applications and databases in chemistry [ref to BO paper], has created several such scripts to enable chemistry-related features. One of these tools will insert links to blog stories about journal articles into the tables of contents of any ACS, RSC, Wiley, or NPG journal [ref to old BO wiki]. This enhancement to a journal’s table of contents is completely independent of the journal publisher.
That's a pretty lucid summing up of the userscript and its significance. Somewhere I suspect PMR's hand in this. :-)

Friday 10 August 2007

Access embedded molecular information in images

Recently Rich Apodaca has been discussing (here, here and here) embedding molecular information in images of molecules, such as a PNG file depicting a 2D structure.

I'm going to show how to extract this type of embedded metadata using Python.

First of all, you'll need an image to work with. Grab the PNG file, rosiglitazone.png, from Rich's post.

Next, you'll need the Python Imaging Library (PIL), a 3rd-party Python extension library available from Pythonware.

Here's the text of an interactive Python session showing how to access the image metadata:

C:\>python
ActivePython 2.4.1 Build 247 (ActiveState Corp.) based on
Python 2.4.1 (#65, Jun 20 2005, 17:01:55) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import Image
>>> myimage = Image.open("rosiglitazone.png")
>>> dir(myimage)
['_Image__transformer', '_PngImageFile__idat', '__doc__', '__init__', '__module__', '_copy', '_dump', '_expand', '_makesel
...
im', 'getpalette', 'getpixel', 'getprojection', 'histogram', 'im', 'info', 'load', 'load_end', 'load_prepare', 'load_read', 'mode', 'offset', 'palette', 'paste', 'png', 'point', 'putalpha', 'putdata', 'putpalette', 'putpixel', 'quantize', 'read
...
transform', 'transpose', 'verify']
>>> myimage.info
{'molfile': 'name\nparams\ncomments\n 25 27 0 0 0 0 0 0
0 0 0 V2000\n 1.6910 -6.1636 0.0000 C 0 0 0
0 0 0 0 0 0 0 0 0\n 2.5571 -6.6636 0.0000 C
0 0 0 0 0 0 0 0 0 0 0 0\n 3.4231 -6.1636
0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0\n 3.4231 -
5.1636 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0\n
2.5571 -4.6636 0.0000 C 0 0 0 0 0 0 0 0 0 0
n 7 8 1 0 0 0 0\n 8 9 1 0 0 0 0\n 7 10 1 0
...
...
0 0 0\n 9 11 1 0 0 0 0\n 11 12 1 0 0 0 0\n 12 13
2 0 0 0 0\n 13 14 1 0 0 0 0\n 14 15 2 0 0 0 0
\n 15 16 1 0 0 0 0\n 16 17 2 0 0 0 0\n 17 12 1 0
0 0 0\n 15 18 1 0 0 0 0\n 18 19 1 0 0 0 0\n 19 2
0 1 0 0 0 0\n 20 21 1 0 0 0 0\n 21 22 1 0 0 0
0\n 22 23 1 0 0 0 0\n 23 19 1 0 0 0 0\n 22 24 2 0
0 0 0\n 20 25 2 0 0 0 0\nM END', 'aspect': (1, 1)}
>>> moldata = myimage.info['molfile']


Cool. So now we could write this information to a file (print >> open("myoutputfile.mol", "w"), moldata), or convert it into an OpenBabel molecule using Pybel and calculate some properties:


>>> import pybel
>>> mymol = pybel.readstring("MOL", moldata)
>>> print mymol.molwt
357.42676
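
If you're curious where PIL finds that 'molfile' entry, here is a rough sketch that reads PNG text chunks using only the standard library. It assumes the molfile sits in an uncompressed tEXt chunk (keyword, a null byte, then the text), which is what the info dictionary above suggests but which I haven't checked against Rich's file. Unlike the sessions above it is written for Python 3, and the names make_chunk, read_text_chunks and demo_molfile are made up for this illustration. The demo builds its own tiny PNG in memory with a placeholder string instead of a real molfile, so nothing needs downloading.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG chunk: 4-byte length, 4-byte type, data, CRC of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_text_chunks(png_bytes):
    """Return {keyword: text} for every uncompressed tEXt chunk in a PNG."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    info = {}
    pos = 8  # skip the signature
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + data + CRC
        if chunk_type == b"tEXt":
            # tEXt payload is: keyword, null separator, Latin-1 text
            keyword, _, text = data.partition(b"\x00")
            info[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
    return info


# Placeholder text standing in for a real molfile (an assumption for the demo).
demo_molfile = "demo\n\n\nM  END"

# A 1x1 greyscale PNG carrying the placeholder in a tEXt chunk.
demo_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"tEXt", b"molfile\x00" + demo_molfile.encode("latin-1"))
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)

info = read_text_chunks(demo_png)
print(info["molfile"])
```

To try it on a real file, pass open("rosiglitazone.png", "rb").read() to read_text_chunks. The sketch skips CRC checks and ignores compressed (zTXt) and international (iTXt) text chunks, so PIL remains the more robust route.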