Comparing columns of numbers in two documents

Summary

Looking for numerical differences between lists of numbers output by the same application running on different platforms can be tedious. The delta script described in this article shows these differences and their magnitudes in an easy-to-scan format.

The delta script compares two text files consisting of columns of floating-point values. Such files are commonly output by HPC and other applications. When porting such applications to new platforms, it is useful to compare their outputs with those produced on a standard, or reference, platform. For each pair of corresponding values in the two input files, delta shows the magnitude of their difference if they are not identical, or a "-" character otherwise. This makes it easier to compare the outputs visually, skip quickly over sequences of identical values, and concentrate on the values that differ.
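
The comparison rule described above can be sketched in a few lines of Python. This is an illustration only, not the delta script itself (which is written in gawk). The function name delta_line, the numeric equality test, and the six-significant-digit formatting are all assumptions made for the sketch.

```python
def delta_line(left_line, right_line):
    """Build one detail line: "-" for equal values, else the difference."""
    cells = []
    for a, b in zip(left_line.split(), right_line.split()):
        x, y = float(a), float(b)
        # Assumption: values are compared numerically and the unsigned
        # difference is printed with six significant digits.
        cells.append("-" if x == y else "%.6g" % abs(x - y))
    return "  ".join(cells)


# One pair of records that differ only in the last column.
print(delta_line("100000 7 12 11 0.542018", "100000 7 12 11 0.542014"))
# prints: -  -  -  -  4e-06
```

Splitting on whitespace means the sketch does not depend on how the columns are aligned in the input files.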

Environment

Red Hat Enterprise Linux Workstation release 7.5 (Maipo)

Directions

You can obtain the script from https://github.com/cjantonelli/software/blob/master/delta .

Assume two input text files:

File "left":                                                 File "right":
10000    0       4       4      0.999986                               10000    0       4       4      0.999986
20000    3       5       4      1                                      20000    3       5       4      1
30000    2       6       4      1                                      30000    2       6       4      1
40000    0       6       2      0.850441                               40000    0       6       2      0.850441
50000    0       7       5      0.769175                               50000    0       7       5      0.769175
60000    0       9       4      0.62347                                60000    0       9       4      0.62347
70000    3       5       4      0.690913                               70000    3       5       4      0.690913
80000    2       9       7      0.732082                               80000    2       9       7      0.732082
90000    4       11      8      0.545357                               90000    4       11      8      0.545357
100000   7       12      11     0.542018                               100000   7       12      11     0.542014
110000   6       13      8      0.595288                               110000   6       13      8      0.595293
120000   6       9       7      0.596802                               120000   6       9       7      0.596802
130000   5       8       6      0.625419                               130000   5       8       6      0.625419
140000   5       6       5      0.593155                               140000   5       6       5      0.593155
150000   5       9       9      0.611278                               150000   5       9       9      0.611279

Then executing

delta left right

yields

 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  4e-06
 -  -  -  -  5e-06
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  -
 -  -  -  -  1e-06
72 matches 3 deltas 15 records
max delta 5e-06 at record 11 column 5
pen delta 4e-06 at record 10 column 5

The summary at the bottom shows how many data elements matched, how many did not (the deltas), and how many records were processed. In the example above, 15 records of 5 columns give 75 elements: 72 matches and 3 deltas. The max(imum) delta is the largest difference found between corresponding elements of the two files; the pen(ultimate) delta is the second-largest.
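
As a rough cross-check of the summary, the counts and the two largest differences can be recomputed from the sample data. The Python sketch below is an assumption-laden illustration and not the gawk implementation: the names LEFT, RIGHT, and summarize are hypothetical, and differences are taken as unsigned values, which is consistent with the sample output.

```python
LEFT = """\
10000 0 4 4 0.999986
20000 3 5 4 1
30000 2 6 4 1
40000 0 6 2 0.850441
50000 0 7 5 0.769175
60000 0 9 4 0.62347
70000 3 5 4 0.690913
80000 2 9 7 0.732082
90000 4 11 8 0.545357
100000 7 12 11 0.542018
110000 6 13 8 0.595288
120000 6 9 7 0.596802
130000 5 8 6 0.625419
140000 5 6 5 0.593155
150000 5 9 9 0.611278
""".splitlines()

# The "right" file differs only in column 5 of records 10, 11 and 15.
RIGHT = list(LEFT)
RIGHT[9] = "100000 7 12 11 0.542014"
RIGHT[10] = "110000 6 13 8 0.595293"
RIGHT[14] = "150000 5 9 9 0.611279"


def summarize(left, right):
    """Return (matches, deltas, records); deltas are sorted largest first."""
    matches, records = 0, 0
    deltas = []  # (difference, record, column), positions are 1-based
    for rec, (l, r) in enumerate(zip(left, right), start=1):
        records += 1
        for col, (a, b) in enumerate(zip(l.split(), r.split()), start=1):
            d = abs(float(a) - float(b))
            if d == 0:
                matches += 1
            else:
                deltas.append((d, rec, col))
    deltas.sort(reverse=True)
    return matches, deltas, records


matches, deltas, records = summarize(LEFT, RIGHT)
print("%d matches %d deltas %d records" % (matches, len(deltas), records))
for label, (d, rec, col) in zip(("max", "pen"), deltas):
    print("%s delta %.6g at record %d column %d" % (label, d, rec, col))
# prints:
# 72 matches 3 deltas 15 records
# max delta 5e-06 at record 11 column 5
# pen delta 4e-06 at record 10 column 5
```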


Additional notes:

Executing delta with no arguments gives usage information:

Usage: delta [-s] [-r] left-file right-file
	-s: just show the summary, not the detail lines
	-r: show record number


delta is implemented in gawk, which represents floating-point numbers internally in double precision.
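
Python floats are also double precision, so a short sketch can illustrate what this means for the displayed deltas. It assumes that delta prints differences with six significant digits (gawk's default output format is "%.6g"); that assumption is consistent with the sample output but has not been checked against the script.

```python
# Record 10, column 5 of the sample files.
left, right = 0.542018, 0.542014

diff = left - right
print(repr(diff))     # full double-precision value; trailing digits may carry rounding error
print("%.6g" % diff)  # six significant digits: 4e-06
```

Rounding to six significant digits hides the representation error of the subtraction, so the deltas read as clean values.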