.\" Nroff -man .de SY .sp .SH SYNTAX .PP .nf .. .de ES .SH EXAMPLES .PP .nf .. .de EE .fi .SH DESCRIPTION .. .TH graph+ ajk "31 January 1984" .SH NAME graph+ \- powerful graph generator program .SH SYNOPSIS .B graph+ [ commandfile | \- ] [ parameter ... ] .SH DESCRIPTION .PP .I Graph+ is a utility program for producing graphs. Its main advantage is that it allows easy manipulation of data held in dynamic in-memory tables, using a relational style language (project etc.), rather than requiring a number of separate programs piped together. .I Graph+ also allows functions to be defined returning values or tables. .PP By placing the line .PP .in +4 #! /usr/rmit/graph+ .in -4 .PP at the beginning of a .I graph+ command file, the file can then be executed directly like a shell script, assuming that the binary file is in /usr/rmit. Output is in the format for input to the .IR plot (1) filters. .PP The parameters can be referenced inside a plot program. The first parameter is given the number 1. This allows file names and graph titles to be passed to a graph+ program. (See strings). If parameters are to be supplied, but the actual program is on standard input, then \- must be specified for the program file name. 'parms' is set to the number of parameters. Note that numeric parameters can be handled by using the val() function. .PP In .IR graph+ , a .I table is a simple two dimensional array of floating point numbers. Tables can have any number of rows and columns. Data to be plotted is usually kept in files in a multicolumn format with each line representing one piece of related data (say results from one experiment). An .I awk program can be used to extract data from a complexly structured data file or from files containing text and reformat it into columns of data if necessary. .I Graph+ does not allow text to be read \- only floating point data. .PP Anywhere a .I string can appear, a string expression is allowed. 
A string expression can consist of a list of concatenated strings separated by + symbols. Literal strings are enclosed in double quotes ("). To embed a double quote in the string, place two double quotes immediately adjacent to each other. Alternatives for literal strings include the word .I parameter followed by the parameter number (the first parameter is number 1). The reserved word .I date returns a string which consists of the current time and date (UNIX format). The function .I val(string) returns the numeric value of a string. The function .I str(expr) returns a string containing the result of printf("%g",expr). The function .I str(format,expr) allows your own printf format to be used. .PP A .I graph+ program consists of a number of statements. .SH STATEMENTS .PP The ';' is treated as a null statement. It can be used between other statements if desired. When graph+ is used interactively (i.e. type graph+ with no parameters) commands can be entered but due to the look-ahead parser, the next statement must be entered before the existing one is processed. Adding a ; to the end of the command then causes the parser to realise that the end of a statement has been reached. Note also that due to problems with yacc(1), the include statement requires a ; after it (see below). .SY assume .ES assume parms=3 "3 parameters expected: "+str(parms)+" parameters given" assume count(data) > 0 "data table empty" .EE .PP .I Assume aborts a graph+ program if the expression evaluates to false. This allows simple checks for table sizes, numbers of parameters etc. to be inserted in a program and an error message to be printed. .SY = .ES data = read "data.file" .EE .PP Defines the identifier .I tab_ident to have the specified table of values. Table identifiers may be referred to in .I table expressions. In the declaration, .I tab_ident must not have been already defined unless it has already been defined as a table. 
.SY = .ES pi = 3.1415 .EE .PP Similar to a table declaration, except that the identifier has a single numeric value assigned to it instead of a complete table. .SY ( ) = ( ) =
.ES # this function returns the ratio of two numbers a = 0 b = 0 ratio(a,b) = a/b # this function limit returns a table of values where column # 1 is less than the specified limit limit_val = 0 # declare parameters limit_tab = {{0}} # declares a table limit ( limit_tab , limit_val ) = limit_tab where $1 < limit_val .EE .PP Declares a function which returns a simple value or a table of values. Function identifiers can never be redefined. The parameter lists consist of a comma separated list of .I tab_ident or .I var_ident identifiers .B "which must have been previously defined" (see above example). These identifiers will act as parameters to the functions in the following way. When the function is called, the identifiers are redeclared with the parameter values. Care must be taken with nested function calls: if they use the same identifier names for parameters, the parameters to function A might be changed after returning from a call to function B. It is common to declare special variables for parameters to functions immediately before the actual function to be certain of the parameter type. .SY include ; .ES include "bar_chart.g" ; .EE .PP The .I include statement allows another file of .I graph+ commands to be included. This is useful for including function declarations. The ; is needed only because of the current implementation of the parser. When an include is done, one token after the string is accidentally lost. This may not be consistent across implementations of yacc and lex. .SY shell .ES shell "awk -f complex.awk prelimfile > finalfile" data = read "finalfile" .EE .PP The .I shell command allows a command to be passed to the shell 'sh'. This can be used to generate a file to be read by including an output redirection. .SY print
|| print >
|| print >>
|| .ES print "saving data to file copy.of.data" # print message on screen print > "copy.of.data" data # save data on a file print >> "copy.of.data" more_data # append some more data .EE .PP The .I print command can have its output redirected (overwrite or append) to a file called .I and print out a table, the result of an expression or a simple string. If two prints to the same file are used, make sure the second is an append redirect (>>). .SY save
as .ES data = read "text.data" save data as "data.binary" print > "text.data" load "data.binary" # converts a binary file back to text .EE .PP The .I save command saves the table of data in binary format for faster loading with the .I load command. This is useful when a large datafile is to be read a number of times by different plot programs as the binary file can be loaded much more quickly than textual versions. It is also possible that the binary file will require less disk space. The format of the binary file is simply two integers (number of columns and number of rows) followed by all the data (doubles). The data is written out column by column so the first double is the first value in the first column, the second double is the second value in the first column and so on. Binary files may not be transportable across machines. .SY graph
[
can be an expression made up from any of the following sub-expressions. These usually appear directly in .I graph statements, or when assigned to a table variable. .SY read read by .ES data = read "data.file" graph read 2 by 200 "data.file" # I know the file is 2 columns by 200 lines data = read parameter 2 + ".dat" .EE .PP The .I read command reads a table from a file. Blank lines and all characters from # to the end of line are ignored (allowing comments to be put in data files). The number of columns in the table is determined by reading the first non-blank line in the file. The number of rows is determined by counting the number of non-blank lines in the file. To read a file, two passes are made \- the first to determine the table size so a table can be allocated enough memory, and a second to read the actual values. .PP The second form of the .I read command allows the dimensions of the table to be prespecified so that only one pass of the file is needed. This makes the command slightly faster but much less flexible to changes in the data file. The first .I expr specifies the number of columns (which must be correct) and the second specifies the number of rows (which can be too large \- the table is simply truncated although enough memory is allocated for the full table). .PP The ability to create strings from concatenations of sub-strings is very useful when specifying a filename. In particular, the ability to extract parameters to the command allows the same plot program to be used with several data files. .SY load .ES data = load "data.file" .EE .PP .I load is very similar to .I read except that it loads a binary file saved with the .I save command. The binary file format is described under the .I save command. .SY { { , , ... } { , , ... } ... 
} .ES const_table = { {0,0} {1,2} {3,10} {4,20} } # 2 cols, 4 rows calc_table = { {0,log(0)} {10,log(10)} {100,log(100)} } graph {{10,20}} label "label at 10,20 on graph" .EE .PP Braces allow a table of expressions to be specified. One set of braces surrounds the whole table, each row in the table being specified by a comma separated list of expressions surrounded by one set of braces per row. .SY generate from to = with intervals = interval size .ES tab_size = 10 log_table = generate from 1 to tab_size interval size 1 [ $1 , log($1) ] single_col_table = generate from 1 to 100 with 11 intervals .EE .PP The .I generate command generates a single column table of values in the specified from\-to range using the specified interval details. .SY
append
.ES total_dat = dat1 append dat2 .EE .PP The .I append command appends the second .I table to the end of the first .I table. Both tables must have the same number of columns. .SY
adjacent
.ES wide_tale = dat1 adjacent dat2 .EE .PP The .I adjacent command is similar to the .I append command, but it joins tables horizontally instead of vertically. Both tables can have different numbers of columns, but must have the same number of rows. .SY
[ , , ... ] .ES graph all_data [ $1 , $3 ] # plot columns 1 vs 3 graph all_data [ $0 , ($1+$3)/2 ] # plot the average of column 1 and 3 .EE .PP The projection command (square brackets) allows data to be extracted from a table, or new columns of data to be formed from expressions based on existing columns of data. See the .I expr for exact details of how to extract columns of data from a table. The number of rows in the new table is the same as the number of rows in the old table, but the number of columns is dependent on the number of expressions in the comma separated list of expressions inside the square brackets. .SY
where .ES some_data = data where $0 <= 100 # select first 100 entries in table some_data = data where $1 >= $2/2 # select lines where first column is # greater than or equal to half the second column .EE .PP The .I where (or selection) command forms a new table with the same number of columns as the old table, but only containing rows that satisfy the condition .I expr. The expression normally uses relational operators such as <, >= etc. but any expression can be used. A return value of 0.0 is considered as false, any other value being true (as in the 'C' programming language). .SY sort
by .ES data = sort data by $2 # sort table of two columns by the sum of the two columns sorted_data = sort ( data [ $1 , $2 , $1+$2 ] ) by $3 [ $1 , $2 ] .EE .PP The .I sort command allows a table to be sorted by a column in the table specified by .I attr. The column specified cannot be column zero (which has a special use). The sort key is an .I attr rather than an arbitrary expression for efficiency reasons. To sort by an expression, simply project an extra column into the table before sorting, and remove it (with another project) later. The sort algorithm used is a version of quicksort. .SY join , by , .ES data = join dat1 , dat2 by $1 , $1 .EE .PP The .I join command performs an equi-join on the two tables of data. and specify the fields on which the join is to be performed. If the data in the two tables is not found to be sorted on the join fields, the tables are sorted automatically. The join then creates a new table which contains all the columns of followed by all the columns of (the join field is NOT removed and so will appear in the resultant table twice). .SY cumulate
by .ES graph cumulate cost_per_unit by $2 .EE .PP The .I cumulate command takes a column in a table specified by .I attr and replaces each entry with the sum of the present value and all the previous values. This is useful for plotting cumulative graphs. .SY group
[ from to ] performing .ES graph group sort data by $1 with 10 intervals performing max [ ($1+$2)/2 , $3 ] data = group data from 10 to 20 with interval size 5 performing average .EE .PP The .I group command is fairly complex but allows graphs to be split into a number of intervals with the specified function .RI ( fvar_ident ) being called for each interval. The function must return a simple value and have a single parameter of a table (such as the predefined .IR sum , .IR min , .IR max , .I count and .I average functions). .PP The group table must have exactly two columns, the first column being used for grouping by and the second containing the values to pass to the function. The table must have already been sorted on the first column. Projects are often made to tables to get them into the correct form for a grouping. The optional range specification (if no range is specified, the minimum and maximum values are calculated from the table) defines the upper and lower bound of values to be considered in the group. The interval specification is used to determine the number of and size of the group intervals. Beware of intervals that have no values in them. These will cause the function to be called with an empty table. .PP The result of a group has exactly three columns. The first two columns contain the lower and upper bound values of the interval and the third column contains the result of the function when applied to all of the values within that interval. From this, it is fairly simple to produce different forms of graphs (e.g. bar graphs, line graphs). .SY ( , , ... ) .ES print min ( data_table ) print average ( data_table[$3] ) tab = {{0}} ftab(tab) = count ( tab[$1] ) / average ( tab[$2] ) print ftab({{1,1}{4,4}}) .EE .PP .I is a table function name which is called with the specified parameter list. .PP Table identifiers return the value of the table as last declared with "=" or when declared as a parameter. .SY (
) .ES sort ( data[ $1 , $3 ] ) by $2 .EE .PP Simple parentheses allow the priority of evaluation to be forced. Default priority of table operators listed from highest priority to lowest is select and project, adjacent, append, sort, group and cumulate. .SH EXPRESSIONS .PP In the following, a true value is considered to be any non-zero value, and false is considered to be the value 0. The operators are listed here from lowest to highest priority. .SY "?" ":" .ES data = data [ $1 , $2<=0 ? 0 : log($2) ] .EE .PP The .I '?' operator returns the value of the second expression if the first expression is true; otherwise the value of the third expression is returned. .SY || .ES data = data where $1<10 || $1>20 .EE .PP The logical OR operator ('||') returns true if either expression returns true, false otherwise. .SY && .ES data = dat where $1>10 && $2<20 .EE .PP The logical AND operator ('&&') returns true if both expressions return true, false otherwise. .SY =|==|<>|!=|<=|<|>|>= .ES data = data where $1>10 | $2 = 0 .EE .PP These relational operators return true or false based on the operator and the results of the two expressions. Note that equivalence can be specified with either '=' or '==' and inequivalence can be specified with either '<>' or '!='. .SY +|\- .ES print data[ $1+$2 , $1\-$2 ] .EE .PP These are the simple arithmetic addition and subtraction operators. .SY /|*|% .ES pi = 3.1415 print pi / 3 print pi * 3 print 100 % 3 .EE .PP Divide, multiply and modulus operators. .SY = \- ! ( ) ( |
, |
, ... ) parms .ES print -3 print !3 print data[ $2 ] pi = 3.1415 print pi tab={{0}} sizeof ( tab ) = count ( tab ) print "table size should be 2" print sizeof ( {{1}{2}} ) .EE .PP .I Primary expressions are separated in the syntax as an attribute specification (see .IR attr ) can only be a .I primary expression, typically a simple number. .PP The minus unary operator is for negation. .PP The NOT ("!") operator returns the opposite boolean value of the expression. .PP Parentheses allow the default operator precedence to be overridden. .PP .I Numbers are any floating point number. .PP .IR s are variable function names, the parameters being listed inside the parentheses. .PP .IR s are variables defined with the "=" operator or by parameter passing. .PP .I parms returns the number of command line parameters given to the program. This is really only of any use in an .I assume statement (assume parms = 3 "3 parameters expected"). .SY = $ .ES graph data[ $1 , $2 ] sumcol = 3 graph data[ $1 , $sumcol ] graph data[ $1 , $($1<100 ? 2 : 3 ) ] # choose column based on $1 print > "savefile" data[ $0 , $1 , $2 ] .EE .PP .IR 's specify which column of a table to use. Column numbers start from 1 and count upwards. In an expression, the value of that column is returned if used in a project or select type command for which the expression is re-evaluated for every row of a table. In expressions, column zero may be specified which returns the row number (starting from 1) of the row instead of data from the table. .SH "PREDEFINED FUNCTIONS" .PP There are a number of predefined functions, all of which at present return values. .TP .BI str (expr) .TP .BI str (format,expr) Return a string containing the value of the given expression. If no format is specified, "%g" is used. The format is passed straight to sprintf(). .TP .BI min (table) Return the minimum value in the first column of .IR table . If the table is empty, a warning is printed on stderr and a value of 0.0 is returned. 
.TP .BI max (table) Return the maximum value in the first column of .IR table . If the table is empty, a warning is printed on stderr and a value of 0.0 is returned. .TP .BI sum (table) Return the total of all the values in the first column of .IR table . If the table is empty, a warning is printed on stderr and a value of 0.0 is returned. .TP .BI count (table) Return the number of rows in .IR table . .TP .BI average (table) Return the average value in the first column of .IR table . If the table is empty, a warning is printed on stderr and a value of 0.0 is returned. This is like doing a .I sum divided by a .I count. .PP The following mathematical functions are straight from the math library and suffer from all their constraints (no checking is done). See section 3(M) in the Unix Programmer's Manual. .TP .BI sqrt (expr) Return the square root of the given expression. .TP .BI log (expr) Return log10 of the given expression. .TP .BI ln (expr) Return the natural log of the given expression. .TP .BI exp (expr) Return e to the power of expr. .TP .BI pow (expr1,expr2) Return .I expr1 to the power of .I expr2. .TP .BI floor (expr) Return the largest integer not greater than .I expr. .TP .BI ceil (expr) Return the smallest integer not less than .I expr. .TP .BI abs (expr) Return the absolute value of .I expr. .TP .BI sin (expr) Return sine of .I expr. .TP .BI cos (expr) Return cosine of .I expr. .TP .BI asin (expr) Return arc-sine of .I expr in the range from -pi/2 to +pi/2. .TP .BI acos (expr) Return arc-cos of .I expr in the range 0 to pi. .TP .BI atan (expr) Return arc-tangent of .I expr in the range -pi/2 to +pi/2. .TP .BI atan2 (expr1,expr2) Return arc-tangent of .I expr1/expr2 in the range -pi to +pi. .TP .BI sinh (expr) Return hyperbolic sine of .I expr. .TP .BI cosh (expr) Return hyperbolic cosine of .I expr. .TP .BI tanh (expr) Return hyperbolic tangent of .I expr. 
.TP .BI hypot (expr1,expr2) Returns the square root of the sum of the squares of .I expr1 and .I expr2. .TP .BI j0 (expr) .TP .BI j1 (expr) .TP .BI jn (expr1,expr2) .TP .BI y0 (expr) .TP .BI y1 (expr) .TP .BI yn (expr1,expr2) Bessel functions. .TP .BI val (string) Return the numeric value of the specified string. This is particularly useful for getting numeric parameters from the command line. .SH EXAMPLE .PP The following are some simple examples to try and help show how to actually use all these commands. .PP #! /usr/rmit/graph+ # some constant tables file1={ { 0, 1 } { 1, 20 } { 2, 30 } { 3, 16 } { 4, 19 } { 5, 29 } } file2={ { 0, 3 } { 1, 15 } { 2, 34 } { 3, 12 } { 4, 12 } { 5, 24 } } num = 6 final = sort (file1 append file2 ) by $1 # the output of graph can not be directly printed. # turn a graph table into a list of lines through # the middle of each interval by forming a table # with the first column being the middle of the group # interval (the average of the first two columns) and # the other column being the actual data lineg_tab={{0}} # dummy declaration for function parameter lineg(lineg_tab) = lineg_tab[ ($1+$2)/2 , $3 ] graph lineg( group final with num intervals performing max ) graph lineg( group final with 10 intervals performing min ) graph file1 label "file" graph cumulate(file1) label "cumulative" xaxis label "This is the x-axis" yaxis label "This is~the y-axis" # will appear # This # is the # y-axis .PP A more typical example is .PP data1 = read "results.1" data2 = read "results.2" graph data1[ $1 , $2 ] dotted line label "first results" graph data2[ $1 , $2 ] shortdashed line label "second results" graph ( data1[$1,$2] adjacent data2[$2] ) [ $1 , ($2+$3)/2 ] solid line label "average" .SH DIAGNOSTICS .PP Error recovery is poor. A syntax error can mean almost anything, the parser being a simple yacc program. Note that undefined identifiers (due to typing errors) can also come up as syntax errors. .SH BUGS .PP Almost infinite. 
But the program in general does work. It's just that I personally find the graphs are often not quite what I want. It is however MUCH better than the standard unix graph(1) command and allows good manipulation of data much faster than awk(1). .PP One problem that can occur is running out of memory. All tables are held in memory - never on disk. The program works by continually building new tables from the contents of old tables and then (if possible) freeing the old tables. After a function has been called, its parameters do not free the space consumed by the parameter declaration, so large data files can cause a lot of memory to be held by the graph+ program. .PP There would need to be several thousand more options to be able to print all forms of graphs to keep everyone happy. .PP If you have any problems, do not mail to ajk@goanna.oz ;-) .SH AUTHOR Alan Kent (ajk@goanna.oz) .SH SEE ALSO leplot(ajk), awk(1), graph(1), plot(1), plot(3x).
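.SH NOTES .PP The binary layout described under the .I save command (two integers giving the number of columns and the number of rows, then every value as a double, column by column) can be illustrated with a small Python sketch. This is not part of graph+. The function names pack_table and unpack_table are hypothetical, and the 4-byte integer width and native byte order are assumptions, since this page does not state them and warns that binary files may not be transportable across machines.

```python
import struct

# Assumed header layout: two 4-byte ints (columns, rows), native byte order,
# no padding. The real graph+ integer width is not documented in this page.
HEADER = "=ii"


def pack_table(rows):
    """Serialise a list of equal-length rows of floats, column by column."""
    nrows = len(rows)
    ncols = len(rows[0]) if rows else 0
    # Column-major order: all of column 1, then all of column 2, and so on.
    values = [rows[r][c] for c in range(ncols) for r in range(nrows)]
    return struct.pack(HEADER, ncols, nrows) + struct.pack("=%dd" % len(values), *values)


def unpack_table(blob):
    """Rebuild the list of rows from a blob produced by pack_table."""
    ncols, nrows = struct.unpack_from(HEADER, blob, 0)
    values = struct.unpack_from("=%dd" % (ncols * nrows), blob, struct.calcsize(HEADER))
    # values[c * nrows + r] is row r of column c in the column-major stream.
    return [[values[c * nrows + r] for c in range(ncols)] for r in range(nrows)]


if __name__ == "__main__":
    table = [[0.0, 1.0], [1.0, 20.0], [2.0, 30.0]]
    print(unpack_table(pack_table(table)))
```

Because the values are stored column by column, a reader needs the row count from the header before it can locate the start of the second column.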