Friday, 15 June 2012

Plotting spectral maps or spectrograms in Gnuplot

In chemistry, optics, laser physics, and so on, there is often a need to present spectra at a range of different conditions... for example, what does the emission from a laser look like at a range of different drive voltages? Or what does the optical absorption of a protein look like at different temperatures?

Spectra, waterfall plots and spectrograms

[Figure: four distinct lines on a graph, each representing the spectrum at a different voltage. Each line is offset vertically so that they do not overlap.]

Often, people present a plot that looks something like the one on the right. Here, I measured the emission from a laser, using four different drive voltages. The spectra are a bit different each time, but they are all centred around the same frequency (3.31 THz). I've plotted them all on the same axes by normalising each spectrum (i.e., setting the peak amplitude of each spectrum to 1 unit), and then showing each spectrum offset against the others.

I think most people are perfectly happy with reading data from figures like this one, but there are a couple of issues that I've never been entirely comfortable with:

  1. The vertical scale on the graph is fairly meaningless: the spacing between the 14 V and 16.4 V lines is identical to the spacing between the 16.4 V and 18 V lines, even though the actual step in voltage is bigger in the first case.
  2. The lines in the graph don't intrinsically tell you the full story... for example, you need to look at the legend to understand that "Red line = 16.4 V". Without the legend, the graph would be useless. With the legend, you have more clutter in the image, and an extra mental step to complete before understanding the data.
  3. It's sometimes a bit tricky to see trends in the data. I don't know much about the psychology of this... I guess it's something to do with reading each curve separately and then mentally "post-processing" it to spot a pattern.
One solution is to use a "waterfall plot" or "fence plot" (see MATLAB documentation). This represents the data in three dimensions, with the curves being stacked in front of each other at a position corresponding to the voltage/temperature/whatever. Pretty useful, but again there are some problems. Firstly, the figures contain lots of lines, making it quite difficult to represent the data neatly in a journal article where very small figure sizes are needed. Secondly, it's a bit tricky to read the data from a 3D plot: figuring out exactly what the frequency of a peak is, and which voltage was used, isn't easy!

So, I prefer using a kind of spectrogram to represent this sort of data, such as the image to the right. Here, the voltage is shown across the horizontal axis (in a normal spectrogram it would be time), and the frequency of the laser is shown vertically. The brightness of each region of the spectral map shows how intense the emission is at a given frequency, when the laser is driven at the specified voltage.

I'm not advocating this sort of visualisation as being intrinsically better than others, but (a) it's more colourful and can liven up a presentation a bit and (b) personally, I find it quicker to read the data and spot trends. On the downside, it is important to note that the horizontal scale only actually represents four voltage readings. It's easy to make wrong assumptions about the behaviour at other voltages because the colours are painted continuously across the chart. For example, the chart indicates that the brightest emission line at 15 V will be somewhere just below 3.3 THz. In reality, it could be entirely different; the graph is just filling in the gap in our knowledge with the data we acquired at 14 V.

How to plot spectrograms in Gnuplot

Right, enough waffle. How did I actually generate the image above using Gnuplot?

First, I arranged my data in a file called "spectra-vs-voltage.dat" in the following form:

14 4.49931 0.85137
14 4.49819 0.82508
14 4.49708 0.78786
14 4.49597 0.73671
14 4.49485 0.67282

16.4 4.49931 0.75230
16.4 4.49819 0.84287
16.4 4.49708 0.79419
16.4 4.49597 0.65894
16.4 4.49485 0.48719

So, it's in three columns, containing
  1. Bias voltage (or temperature, or time, or whatever you want on the horizontal scale)
  2. Frequency
  3. Spectral intensity
Note that I have placed a blank line between the data sets for each voltage. This is important! Gnuplot uses the blank lines to separate the individual scans. Note also that I have only shown the first five values for the first two data sets. You wouldn't want to read the thousand or so frequency points for each of the four spectra! Finally, note that the frequencies are in reverse order in the data sets (i.e., starting at the highest frequency of 4.49931 THz and working backwards). This is because our spectrometer measures in wavenumbers, and therefore the data appears in reverse in terms of frequency. This isn't important; the plotting method I use doesn't care which way round the data is presented.
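
If each spectrum sits in its own file, a few lines of shell could stitch them into the three-column layout above. This is only a sketch under some assumptions: the function name "build_map" and the file names (spectrum-14.dat, spectrum-16.4.dat and so on) are made up for illustration, and each input file is assumed to hold two columns (frequency, intensity).

```shell
#!/bin/bash
# Hypothetical helper: merge per-voltage spectra into one three-column table.
# Assumes input files named spectrum-<voltage>.dat with "frequency intensity" rows.
build_map() {
    local first=1 f v
    for f in "$@"; do
        # Recover the voltage from the file name: spectrum-16.4.dat -> 16.4
        v=$(basename "$f" .dat)
        v=${v#spectrum-}
        # Put a blank line between consecutive data sets
        if [ "$first" -eq 0 ]; then echo; fi
        first=0
        # Prefix every non-empty row with the voltage
        awk -v v="$v" 'NF >= 2 { print v, $1, $2 }' "$f"
    done
}
```

It might be used as "build_map spectrum-14.dat spectrum-16.4.dat > spectra-vs-voltage.dat", listing the files in order of increasing voltage.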

I then made a little Gnuplot script in a text editor (VIM), and saved it as "spectral-map.gnuplot". It reads as follows:
#! /usr/bin/gnuplot
set pm3d map
splot 'spectra-vs-voltage.dat'
set terminal png crop
set output 'spectral-map.png'
set xlabel 'Pulse generator voltage [V]'
set ylabel 'Frequency [THz]'
set cblabel 'Intensity [a.u.]'
set cbrange [0 to GPVAL_DATA_Z_MAX]
set xrange [GPVAL_DATA_X_MIN to GPVAL_DATA_X_MAX]
set yrange [3.2 to 3.45]
unset key
replot
unset output

I made the script executable using "chmod +x spectral-map.gnuplot", and then generated the plot by running the script: "./spectral-map.gnuplot". If you care how the script works, or want to modify it, read on. Otherwise, happy plotting :)

Explanation of script

  • The first line is a standard instruction (the "shebang" line), which tells UNIX that this file can be interpreted by Gnuplot.
  • The set pm3d map line sets the plotting style as a 2D colour map of some three-dimensional data.
  • The splot 'spectra-vs-voltage.dat' line generates a preliminary plot of the data, without any special formatting. By default, this will normally flash up on your screen, and then disappear when Gnuplot finishes. In fact, I only did this preliminary plot as a bit of a hack so that Gnuplot can figure out the range of the data in the input file. It works fine, but I'm sure there is a better way to do this... flashing unneeded windows around on the screen feels ugly!
  • The set terminal png crop line says that we want the final image to be rendered to a PNG image file, and for any whitespace around the edges to be cropped away.
  • The set output 'spectral-map.png' line instructs Gnuplot to open a file called "spectral-map.png", ready for us to write the image.
  • The set xlabel line (and the following ylabel and cblabel lines) set the labels on the x-axis, the y-axis and the colourbar.
  • The set cbrange line sets the limits of the data to appear in the colourbar. Any intensities lower than 0 will appear black in the image. The maximum value is obtained using the GPVAL_DATA_Z_MAX variable, which corresponds to the highest intensity value in the preliminary plot we drew in line 3.
  • Similarly, the xrange and yrange commands set the range of data on the x and y-axes.
  • The unset key command hides an annoying line of text in the image containing the data filename.
  • The replot command regenerates the plot, this time using the desired formatting (ranges, labels etc). Note that this time, the plot is written into our PNG output file, rather than to screen because we changed the terminal in line 4.
  • Finally, we have finished writing the image into our PNG output file, so we tell Gnuplot to close it by writing unset output.
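
One possible way to avoid the preliminary plot would be to work out the data ranges outside Gnuplot. The sketch below does this with awk; the function name "data_ranges" is made up for illustration, and it assumes the three-column layout described earlier, with blank lines between data sets.

```shell
#!/bin/bash
# Hypothetical helper: print "xmin xmax zmax" for a three-column data file.
# Blank separator lines have fewer than three fields and are skipped.
data_ranges() {
    awk 'NF >= 3 {
             if (!seen || $1 < xmin) xmin = $1
             if (!seen || $1 > xmax) xmax = $1
             if (!seen || $3 > zmax) zmax = $3
             seen = 1
         }
         END { if (seen) print xmin, xmax, zmax }' "$1"
}
```

The numbers could then be handed to Gnuplot (for example via its -e command-line option) and used in place of the GPVAL_ variables.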
That concludes the explanation. Let me know if anything needs clarification!

Friday, 8 June 2012

Converting lab equipment data into plain text

I find it mildly irritating that lots of lab equipment (oscilloscopes, network analysers etc) outputs its data in a complex format.  Usually, I just want the data in a plain text format so that I can read and plot it easily.  Often, the data files will come with a load of metadata at the start, and present the "useful bit" of data in a slightly obscure way.

Most plotting packages (Origin, QtiPlot, SciDaVis etc) will let you strip away the unwanted stuff in the file and will accept various data formats (like CSV etc). However, this is still a minor annoyance because each piece of equipment gives data in a different format and you need to configure the input parser differently whenever you read a file from a different machine.

For me, it's even more annoying because my favourite plotting package, xmgrace, only really likes plain text input so the data needs to be parsed into that format before I can open it for plotting.

In this post, I'll give an example of the kind of horrible data that is output from lab equipment, and show how it can be translated into something tidier by using a simple script.

Horrible data: an example

As an example, consider the following excerpt of data from one of our Tektronix oscilloscopes:

Record Length,2.500000e+03,,  -0.025000000000,   5.40000,
Sample Interval,2.000000e-05,,  -0.024980000000,   5.40000,
Trigger Point,1.250000000000e+03,,  -0.024960000000,   5.40000,
,,,  -0.024940000000,   5.40000,
,,,  -0.024920000000,   5.40000,
,,,  -0.024900000000,   5.40000,
Source,CH1,,  -0.024880000000,   5.40000,
Vertical Units,V,,  -0.024860000000,   5.40000,
Vertical Scale,1.000000e+00,,  -0.024840000000,   5.40000,
Vertical Offset,-3.400000e+00,,  -0.024820000000,   5.40000,
Horizontal Units,s,,  -0.024800000000,   5.40000,
Horizontal Scale,5.000000e-03,,  -0.024780000000,   5.40000,
Pt Fmt,Y,,  -0.024760000000,   5.40000,
Yzero,0.000000e+00,,  -0.024740000000,   5.40000,
Probe Atten,1.000000e+00,,  -0.024720000000,   5.40000,
Model Number,TDS2014B,,  -0.024700000000,   5.40000,
Serial Number,C030757,,  -0.024680000000,   5.40000,
Firmware Version,FV:v22.01,,  -0.024660000000,   5.36000,
,,,-00.024640000000,   5.40000,
,,,-00.024620000000,   5.40000,
,,,-00.024600000000,   5.36000,
,,,-00.024580000000,   5.36000,
,,,-00.024560000000,   5.36000,

Note a few things:
  1. The first 18 lines start with metadata rather than measurements: their first two columns describe how the scope was set up for the measurement, the model of the scope, and so on. This metadata is often very useful, but a lot of plotting packages won't like it being there! (In this excerpt, columns 4 and 5 of those lines also hold time and voltage samples, so skipping the lines entirely discards the first 18 points of the trace.)
  2. The lines containing nothing but data (time and corresponding voltage on the scope) only start on line 19. I've only shown 5 of them because the file goes on for 2500 lines. You get the idea, right?
  3. The data is in CSV format. Some plotting packages (e.g., xmgrace) can't handle this.
  4. The data is actually in columns 4 and 5. Columns 1, 2 and 3 are empty. Again, some plotting packages can't handle this.

Solution

The problem can be solved (in Linux/Unix) by writing a simple script, something like the following:

#! /bin/bash -e

# scope-to-dat - Convert Tektronix oscilloscope CSV data to a plottable table
# (c) Alex Valavanis, University of Leeds 2010



# * Chop off the first 18 lines (i.e. settings info)
# * Only use the actual data columns (cols 4 and 5)
# * Replace the comma separators with tabs

tail -n+19 "$1" | cut -f4,5 -d',' | tr ',' '\t'


To use the script on a given data file, you'd just say something like

scope-to-dat.sh my_data_file.CSV > plain_text_data.dat


This just takes the scope data from the file "my_data_file.CSV" and outputs it to a new plain-text file called "plain_text_data.dat".
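
In the excerpt above, columns 4 and 5 of the 18 header lines also appear to hold time and voltage samples. If you want to keep those points, a variant that skips the header-chopping step might look like the sketch below. The function name "scope_to_dat_all" is made up for illustration, and it assumes every line of the file has a sample in columns 4 and 5, as the excerpt suggests.

```shell
#!/bin/bash
# Hypothetical variant: keep every row instead of dropping the first 18.
# Assumes columns 4 and 5 hold a (time, voltage) sample on all lines.
scope_to_dat_all() {
    # Select columns 4 and 5, then swap the remaining comma for a tab
    cut -d',' -f4,5 "$1" | tr ',' '\t'
}
```

For example, "scope_to_dat_all my_data_file.CSV > plain_text_data.dat" would write all 2500 samples rather than the last 2482.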

If you don't understand how this script works, read on. It's easy to modify it to handle similar data formats.

Explanation

Only the last line of the script actually does anything.  It consists of three commands that are separated by pipes (the "|" character), meaning that the data is passed between the three commands and "tweaked" into a nicer format at each stage.  I'll explain each of the three commands in the script as follows...

tail -n+19 "$1"

This first command chops off the header of the file. The "tail" program gives you the last few lines of a file. By default it gives the last 10 lines, but the "-n+19" flag makes it display everything from line 19 onwards instead. The "$1" is the first argument that was given when the user ran the script. In this case, it's the CSV data filename.
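
To see the effect of the "+19" form on its own, here is a small illustrative check that uses numbered lines instead of real scope data (the variable name "kept" is just for the example):

```shell
#!/bin/bash
# Generate 25 numbered lines, then keep everything from line 19 onwards
kept=$(seq 1 25 | tail -n +19)
echo "$kept"
```

Only the lines numbered 19 to 25 survive; the first 18 are dropped.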

cut -f4,5 -d','

Having chopped off the header, we now throw away everything except for the 4th and 5th column. The "cut" program is used here... it selects columns from a multi-column file. The "-f4,5" flag means that we want the 4th and 5th columns. The "-d','" flag tells the "cut" program that the data is separated (delimited) by commas.
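
As a quick illustration, here is "cut" applied to a single row copied from the excerpt above (the variable names are just for the example):

```shell
#!/bin/bash
# One data row from the scope file: three empty columns, then time and voltage
row=',,,-00.024640000000,   5.40000,'
# Keep only the 4th and 5th comma-separated fields
fields=$(printf '%s\n' "$row" | cut -d',' -f4,5)
echo "$fields"
```

The result still has a comma between the two values, because "cut" joins the selected fields with the same delimiter it split on.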

tr ',' '\t'

Finally, we want to change the data from comma-separated to tab-separated text. The "tr" program swaps (or "translates") one character for another. In this case, we tell it to swap every comma in the data for a tab character.
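
A minimal illustration of this last step, using one time/voltage pair from the excerpt above (the variable names are just for the example):

```shell
#!/bin/bash
# A comma-separated time/voltage pair
fields='-00.024640000000,   5.40000'
# Swap the comma for a tab character
table_row=$(printf '%s\n' "$fields" | tr ',' '\t')
echo "$table_row"
```

The spaces that followed the comma are left in place after the tab, which is harmless for programs that split columns on any run of whitespace.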