tableutil.py

a tool to manipulate and print tables of numbers

Introduction

This script manipulates and pretty-prints tables of numbers. Such a table consists of a line that defines the names of the columns, followed by lines that contain numbers separated by spaces. This simple format can represent all kinds of number tables. Here is an example:

t   x   y
1   2   4
2   4   8
3   6  16
4   8  32

The first line defines the column names; all following lines define the content of the table.

The script can manipulate the table directly from the command line:

tableutil.py -t test.tab --calc 'sum=(x+y)'
t   x   y    sum
1.0 2.0 4.0  6.0
2.0 4.0 8.0  12.0
3.0 6.0 16.0 22.0
4.0 8.0 32.0 40.0

It can also perform more complex operations when given a command file. Assume that the command file "test.cmd" has this content:

tab= Table_from_File(fn)
ntab= tab.derive_add("t",["x","y"],["vx","vy"])
ntab.print_(formats=["%d","%d","%d","%.2f","%.2f"],
            justifications=["R"])

Then this command file can be used like this:

tableutil.py -c test.cmd --eval "fn='test.tab'"
t x  y   vx    vy
1 2  4 0.00  0.00
2 4  8 2.00  4.00
3 6 16 2.00  8.00
4 8 32 2.00 16.00

For more examples, see the Examples section.

Format of the table

The table must be ASCII text containing a heading followed by one or more data lines. A heading is a list of space-separated names; each name is the name of a column. Data lines consist of space-separated numbers. All common number literals known from C or Python are supported.
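The format described above is simple enough to parse by hand. The following is a minimal sketch in plain Python, not the parser used by tableutil.py; the function name parse_table is hypothetical, and float() is assumed as the number parser, which accepts literals such as "1e-3" but not every C literal (hexadecimal integers, for example).

```python
def parse_table(text, sep=None):
    """Split table text into (column_names, rows_of_floats).

    sep=None splits on any run of whitespace. This is an assumption
    of this sketch, not necessarily how tableutil.py treats --sep.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[0].split(sep)          # heading: the column names
    rows = []
    for ln in lines[1:]:
        values = [float(field) for field in ln.split(sep)]
        if len(values) != len(names):
            raise ValueError("expected %d values, got %d: %r"
                             % (len(names), len(values), ln))
        rows.append(values)
    return names, rows


example = """\
t   x   y
1   2   4
2   4   8
3   6  16
4   8  32
"""

names, rows = parse_table(example)
print(names)
print(rows[2])
```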

Format of the command file

As you may have noticed, the command file is simply Python code that is interpreted. It uses the numpy_table.py module to handle tables of numbers.

The only difference from an ordinary Python script is that there is no need to import numpy_table.py: all functions and classes from this module are already imported. For a reference of the functions and classes, see the documentation of numpy_table.py.
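To see why a command file can use names it never defines (such as the variable "fn" in the introduction), it helps to picture all code running in one shared namespace. The sketch below illustrates that idea with Python's built-in exec(); whether tableutil.py works this way internally is an assumption, and the command text is a stand-in that deliberately avoids the numpy_table API.

```python
# Shared namespace standing in for the interpreter state of the tool.
# In the real tool the numpy_table names would already be present here
# (assumption of this sketch).
namespace = {}

# What an option like --eval "fn='test.tab'" could contribute.
exec("fn='test.tab'", namespace)

# A stand-in command file body; it uses fn without defining it.
command_text = "message = 'would read table from ' + fn"
exec(compile(command_text, "test.cmd", "exec"), namespace)

print(namespace["message"])
```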

Reference of command line options

--version print the version number of the script
-h, --help print a short help
--summary print a one-line summary of the script function
--doc create online help in reStructuredText format. Use "./tableutil.py --doc | rst2html" to create HTML help.
--test perform a simple self-test of some internal functions
--math import the Python "math" module into the global namespace. This allows you to use all functions of this module for calculations in a "--calc" option.
-t TABLESPEC, --table TABLESPEC
 Read a table according to specification TABLESPEC. TABLESPEC is a comma-separated list of the name of the Python variable holding the table and the filename where the table data can be found. If TABLESPEC contains no comma, it is interpreted as a filename only, and the table variable name is created from the basename of the file without the file extension. All "-" signs in the filename are changed to "_" when the variable is created this way, so "mypath/table-1.txt" becomes "table_1". You may use this option several times in order to read more than one table.
-s SEPARATOR, --sep SEPARATOR
 The separator is the string that separates columns. The default is a single space.
-p PRINTSPEC, --printtab PRINTSPEC
 Print the table according to the given PRINTSPEC. PRINTSPEC is a comma-separated list of the table name, the formats and the justifications. The table name, the justifications, or both may be omitted. A single table name without formats and justifications is also allowed. If the table name is not specified, the first table read with the "-t" option is printed, but only if no command file (--cmdfile) is specified. You may use this option several times in order to print more than one table. Formats is a string of space-separated format strings as they are used in the C programming language. If there are fewer formats than columns, the last format is used for all remaining columns. Justifications is a space-separated list of the letters "L", "R" and "C", which stand for "justify left", "justify right" and "center". If there are fewer justifications than columns, the last justification is used for all remaining columns. Note that justification is done AFTER the format string is applied and that justification itself does not remove leading or trailing spaces from column values.
--calc CALCEXPRESSION
 Calculate additional columns by applying a Python expression to each line of the table. CALCEXPRESSION is a colon (":") separated list of the name of the table that is to be changed and a calculation expression in the form (result1,result2..resultn)=(expr1,expr2...exprn). Each "expr" must be an expression that is valid as the body of a Python lambda. The values within the line of the table must be addressed by their column names. The "result" strings define the names of the new columns that are created. The brackets around the result name list may be omitted. If the table name is empty or omitted, the calculation is applied to the first table read by the "-t" option. You may use this option several times in order to apply more than one calculation.
-c COMMANDFILE, --cmdfile COMMANDFILE
 Specify a command file that is to be interpreted by Python. All Python statements can be used. Note that all functions and classes from numpy_table.py are already imported. You may use this option several times in order to execute more than one command file.
--eval EXPRESSION
 Evaluate a Python expression. This may be used to set Python variables on the command line that are used by the command file (see --cmdfile).
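The naming rule for TABLESPEC described under "-t" can be sketched as follows. The helper parse_tablespec is hypothetical: it follows the rule as documented here and is not taken from the tool's source.

```python
import os


def parse_tablespec(spec):
    """Return (variable_name, filename) for a TABLESPEC string.

    Hypothetical helper that follows the documented rule only.
    """
    if "," in spec:
        # Explicit form: "<variable name>,<filename>"
        name, filename = spec.split(",", 1)
        return name.strip(), filename.strip()
    # Filename only: derive the name from the basename without its
    # extension and replace every "-" by "_".
    base = os.path.basename(spec)
    stem = os.path.splitext(base)[0]
    return stem.replace("-", "_"), spec


print(parse_tablespec("mypath/table-1.txt"))
print(parse_tablespec("tab,data/values.txt"))
```

With the derived name, a later option or command file can refer to the table as "table_1".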

Examples

Let test.tab have this content:

time    x        y
1       2        4.0
20      4.1      8.12
300    16.01     16.123
4000   181.001   32

Read from stdin and print

cat test.tab | ./tableutil.py -t -
time   x       y
1.0    2.0     4.0
20.0   4.1     8.12
300.0  16.01   16.123
4000.0 181.001 32.0

Calculate a new column that is the sum of columns x and y

./tableutil.py -t test.tab --calc 'sum=x+y' -p "%d %.3f,R"
time       x      y     sum
   1   2.000  4.000   6.000
  20   4.100  8.120  12.220
 300  16.010 16.123  32.133
4000 181.001 32.000 213.001
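In the PRINTSPEC "%d %.3f,R" only two formats and one justification are given for four columns; the last entry of each list is reused for the remaining columns. The sketch below illustrates that rule in plain Python on the first two data rows. The helpers extend_to and justify are hypothetical and only mirror the documented behaviour; they are not the tool's implementation.

```python
def extend_to(items, count):
    """Repeat the last item until the list has `count` entries."""
    items = list(items)
    return items + [items[-1]] * (count - len(items))


def justify(text, width, mode):
    """Pad text to width; "L", "R" and "C" as in PRINTSPEC."""
    if mode == "L":
        return text.ljust(width)
    if mode == "R":
        return text.rjust(width)
    return text.center(width)


names = ["time", "x", "y", "sum"]
rows = [[1.0, 2.0, 4.0, 6.0], [20.0, 4.1, 8.12, 12.22]]

formats = extend_to("%d %.3f".split(), len(names))  # "%d", then "%.3f" x3
modes = extend_to(["R"], len(names))                # "R" for every column

# Apply the format first, then justify to the widest cell per column.
cells = [names] + [[f % v for f, v in zip(formats, row)] for row in rows]
widths = [max(len(r[i]) for r in cells) for i in range(len(names))]
lines = [" ".join(justify(c, w, m) for c, w, m in zip(r, widths, modes))
         for r in cells]
print("\n".join(lines))
```

Column widths here are narrower than in the output above because only two of the four rows are used.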

Calculate a new column that is the square root of x

./tableutil.py -t test.tab --math --calc 'sq=sqrt(x)' -p "%d %.3f,R"
time       x      y     sq
   1   2.000  4.000  1.414
  20   4.100  8.120  2.025
 300  16.010 16.123  4.001
4000 181.001 32.000 13.454

Calculate two new columns that are the square roots of x and y

./tableutil.py -t test.tab --math --calc 'sq_x,sq_y=(sqrt(x),sqrt(y))' -p "%d %.3f,R"
time       x      y   sq_x  sq_y
   1   2.000  4.000  1.414 2.000
  20   4.100  8.120  2.025 2.850
 300  16.010 16.123  4.001 4.015
4000 181.001 32.000 13.454 5.657
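A "--calc" expression such as the one above can be thought of as a lambda whose parameters are the column names. The following sketch shows that idea on two rows of test.tab. The function apply_calc is hypothetical, ignores the optional "tablename:" prefix, assumes column names are valid Python identifiers, and uses eval(), so it should never be given untrusted input.

```python
from math import sqrt


def apply_calc(names, rows, calc, extra_globals=None):
    """Append the columns described by 'r1,r2=(e1,e2)' to every row.

    Illustration only; not the implementation used by tableutil.py.
    """
    results, expr = calc.split("=", 1)
    new_names = [n.strip() for n in results.strip("() ").split(",")]
    env = dict(extra_globals or {})
    # Build a lambda whose parameters are the existing column names.
    func = eval("lambda %s: %s" % (",".join(names), expr), env)
    out = []
    for row in rows:
        value = func(*row)
        # A single result is a plain number, several form a tuple.
        values = list(value) if isinstance(value, tuple) else [value]
        out.append(list(row) + values)
    return names + new_names, out


names = ["time", "x", "y"]
rows = [[1.0, 2.0, 4.0], [20.0, 4.1, 8.12]]
names2, rows2 = apply_calc(names, rows, "sq_x,sq_y=(sqrt(x),sqrt(y))",
                           {"sqrt": sqrt})
print(names2)
print(["%.3f" % v for v in rows2[0]])
```

Passing {"sqrt": sqrt} plays the role that "--math" plays on the command line: it makes the function visible to the expression.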

Calculate with a command file

Suppose we want to calculate a velocity v=dx/dt and a distance r=sqrt(x**2+y**2) using a file "test.cmd" with this content:

from math import *
tab= Table_from_File("test.tab")
tab= tab.derive_add("time",["x"],["velocity"])
tab= tab.map_add(["r"],lambda time,x,y,velocity:sqrt(x**2+y**2))
tab.print_(formats=["%.2f"],justifications=["R"])

Now execute the script with this command:

./tableutil.py -c test.cmd
   time      x     y velocity      r
   1.00   2.00  4.00     0.00   4.47
  20.00   4.10  8.12     0.11   9.10
 300.00  16.01 16.12     0.04  22.72
4000.00 181.00 32.00     0.04 183.81
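The numbers in this output can be checked by hand. The sketch below recomputes both new columns in plain Python without numpy_table.py. It assumes that the derivative is a backward difference with the first row set to 0; that assumption reproduces the values shown above but is not taken from the numpy_table.py documentation.

```python
from math import sqrt

time = [1.0, 20.0, 300.0, 4000.0]
x = [2.0, 4.1, 16.01, 181.001]
y = [4.0, 8.12, 16.123, 32.0]

# Backward difference dx/dt. The first row has no predecessor and is
# set to 0 (assumption that matches the first line of the output).
velocity = [0.0] + [(x[i] - x[i - 1]) / (time[i] - time[i - 1])
                    for i in range(1, len(x))]

# Row-wise distance from the origin.
r = [sqrt(a * a + b * b) for a, b in zip(x, y)]

for row in zip(time, x, y, velocity, r):
    print(" ".join("%.2f" % v for v in row))
```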