Friday, 8 June 2012

Converting lab equipment data into plain text

I find it mildly irritating that lots of lab equipment (oscilloscopes, network analysers etc) outputs its data in a complex format.  Usually, I just want the data in a plain text format so that I can read and plot it easily.  Often, the data files will come with a load of metadata at the start, and present the "useful bit" of data in a slightly obscure way.

Most plotting packages (Origin, QtiPlot, SciDaVis etc) will let you strip away the unwanted stuff in the file and will accept various data formats (like CSV etc). However, this is still a minor annoyance because each piece of equipment gives data in a different format and you need to configure the input parser differently whenever you read a file from a different machine.

For me, it's even more annoying because my favourite plotting package, xmgrace, only really likes plain text input so the data needs to be parsed into that format before I can open it for plotting.

In this post, I'll give an example of the kind of horrible data that is output from lab equipment, and show how it can be translated into something tidier by using a simple script

Horrible data: an example

As an example, consider the following excerpt of data from one of our Tektronix oscilloscope:

Record Length,2.500000e+03,,  -0.025000000000,   5.40000,
Sample Interval,2.000000e-05,,  -0.024980000000,   5.40000,
Trigger Point,1.250000000000e+03,,  -0.024960000000,   5.40000,
,,,  -0.024940000000,   5.40000,
,,,  -0.024920000000,   5.40000,
,,,  -0.024900000000,   5.40000,
Source,CH1,,  -0.024880000000,   5.40000,
Vertical Units,V,,  -0.024860000000,   5.40000,
Vertical Scale,1.000000e+00,,  -0.024840000000,   5.40000,
Vertical Offset,-3.400000e+00,,  -0.024820000000,   5.40000,
Horizontal Units,s,,  -0.024800000000,   5.40000,
Horizontal Scale,5.000000e-03,,  -0.024780000000,   5.40000,
Pt Fmt,Y,,  -0.024760000000,   5.40000,
Yzero,0.000000e+00,,  -0.024740000000,   5.40000,
Probe Atten,1.000000e+00,,  -0.024720000000,   5.40000,
Model Number,TDS2014B,,  -0.024700000000,   5.40000,
Serial Number,C030757,,  -0.024680000000,   5.40000,
Firmware Version,FV:v22.01,,  -0.024660000000,   5.36000,
,,,-00.024640000000,   5.40000,
,,,-00.024620000000,   5.40000,
,,,-00.024600000000,   5.36000,
,,,-00.024580000000,   5.36000,
,,,-00.024560000000,   5.36000,

Note a few things:
  1. The first 18 lines don't contain any of the data I was measuring. Instead, it's metadata describing how the scope was set up for the measurement, the model of the scope, and so on. This metadata is often very useful, but a lot of plotting packages won't like it being there!
  2. The actual data only starts on line 19 (time and corresponding voltage on the scope). I've only shown 5 lines of the data because the file goes on for 2500 lines. You get the idea, right?
  3. The data is in CSV format. Some plotting packages (e.g., xmgrace) can't handle this.
  4. The data is actually in columns 4 and 5. Columns 1, 2 and 3 are empty. Again, some plotting packages can't handle this.

Solution

The problem can be solved (in linux/unix) by writing a simple script, something like the following:

#! /bin/bash -e

# scope-to-dat - Convert Tektonix oscilloscope CSV data to a plottable table
# (c) Alex Valavanis, University of Leeds 2010



# * Chop off the first 18 lines (i.e settings info)
# * Only use the actual data columns (cols 4 and 5)
# * Replace the comma separators with tabs

tail -n+19 $1 | cut -f4,5 -d',' | tr ',' '\t'


To use the script on a given data file, you'd just say something like

scope-to-dat.sh my_data_file.CSV > plain_text_data.dat


This just takes the scope data from the file "my_data_file.CSV" and outputs it to a new plain-text file called "plain_text_data.dat".

If you don't understand how this script works, read on. It's easy to modify it to handle similar data formats.

Explanation

Only the last line of the script actually does anything.  It consists of three commands that are separated by pipes (the "|" character), meaning that the data is passed between the three commands and "tweaked" into a nicer format at each stage.  I'll explain each of the three commands in the script as follows...

tail -n+19 $1

This first command chops off the header of the file. The "tail" program gives you the last few lines of a file. By default it gives the last 10 lines, but the "-n+19" flag makes it display everything from line 19 onwards instead. The "$1" is the option that was specified when the user runs the script. In this case, it's the CSV data filename.

cut -f4,5 -d','

Having chopped off the header, we now throw away everything except for the 4th and 5th column. The "cut" program is used here... it selects columns from a multi-column file. The "-f4,5" flag means that we want the 4th and 5th columns. The "-d','" flag tells the "cut" program that the data is separated (delimited) by commas.

tr ',' '\t'

Finally, we want to change the data from CSV format to plain text. The "tr" program swaps (or "translates") one character to another. In this case, we tell it to swap every comma in the data to a tab character.

No comments:

Post a Comment