I assumed somewhat rashly that data in these files would be nicely linear as well. A tip from a colleague and some googling turned up the following:
The Amersham Biosciences' 16-bit GEL file data are stored in square root scale rather than a linear scale to preserve and provide more resolution at the low-end without sacrificing the accuracy of the high-end data.
Moreover:
Only the software packages that support the Amersham Biosciences' GEL file format can be used for quantitative data analysis (see the list of the software products listed in Chapter 4). (archived discussion, citing a GE manual)
Right. We'll see about that, then. Let's have a look at those private tags; luckily, the specs are online. We're interested in the first two:
- MD_FILETAG (Tag 33445 or 'TAG0x82a5'): sets the data format. Values are 128 for linear and 2 for square root scale.
- MD_SCALEPIXEL (Tag 33446 or 'TAG0x82a6'): a constant scaling prefactor, stored as a TIFF RATIONAL, i.e. a pair of long integers (numerator, denominator).
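To see what the square-root scale buys you, here is a quick back-of-the-envelope calculation. It assumes the decoding is value = k × pixel², with k = numerator/denominator from MD_SCALEPIXEL, which is how the script at the end of this post treats square-root files; p is the stored 16-bit pixel value.

$$
v(p) = k\,p^2, \qquad p \in \{0, 1, \dots, 65535\}
$$

$$
\Delta v(p) = v(p+1) - v(p) = k\,(2p + 1), \qquad v_{\max} = k \cdot 65535^2 = 4\,294\,836\,225\,k
$$

So the step between adjacent stored values is k at the bottom and 131069k at the top. A linear 16-bit scale covering the same range would need a uniform step of v_max/65535 = 65535k: the square-root scale is 65535 times finer at the low end and roughly 2 times coarser at the very top.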
All wrapped up in Python (needs PyLibTiff and puts the file data into a numpy array):
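from libtiff import TIFF, TIFFfile

filename = 'somefile.gel'
tif = TIFF.open(filename)       # used to read the pixel data
tiffile = TIFFfile(filename)    # used to read the raw tag entries

# MD_FILETAG: 2 = square root scale, 128 = linear scale
filetag = tiffile.IFD.entries_dict['TAG0x82a5'].value
# MD_SCALEPIXEL: RATIONAL, assumed to come back as a (numerator, denominator) pair
scalepixel = tiffile.IFD.entries_dict['TAG0x82a6'].value
scale = float(scalepixel[0]) / float(scalepixel[1])

# convert to float first, otherwise squaring 16-bit values overflows
data = tif.read_image().astype(float)
if filetag == 2:
    print("square root scaling, scale %.05g" % scale)
    data = scale * data**2
elif filetag == 128:
    print("linear scaling, scale %.05g" % scale)
    data = scale * data
else:
    print("Warning, no scaling detected, scale %.05g" % scale)
    data = scale * data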
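The conversion step itself can be pulled out into a small function and checked without a GEL file at hand. This is a minimal sketch that needs only numpy; `decode_gel` is a hypothetical helper name, the tag values 2 and 128 come from the MD_FILETAG description above, and unlike the script above it raises an error on an unknown tag value instead of silently falling back to linear scaling.

```python
import numpy as np

# MD_FILETAG values, as given in the tag description above
SQUARE_ROOT = 2
LINEAR = 128


def decode_gel(raw, filetag, scalepixel):
    """Convert stored GEL pixel values to linear intensities (sketch).

    raw        -- array of stored pixel values (e.g. uint16)
    filetag    -- value of MD_FILETAG (tag 33445)
    scalepixel -- (numerator, denominator) pair from MD_SCALEPIXEL (tag 33446)
    """
    numerator, denominator = scalepixel
    scale = float(numerator) / float(denominator)
    # Work in float64 so that squaring 16-bit values cannot overflow.
    data = np.asarray(raw, dtype=np.float64)
    if filetag == SQUARE_ROOT:
        return scale * data ** 2
    if filetag == LINEAR:
        return scale * data
    raise ValueError("unknown MD_FILETAG value: %r" % (filetag,))


raw = np.array([0, 1, 10, 65535], dtype=np.uint16)
print(decode_gel(raw, SQUARE_ROOT, (1, 4)))
```

Raising on an unknown tag is a deliberate choice here: a file that is neither linear nor square-root scaled is probably something you want to look at before running any quantitative analysis on it.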