Contents: Overview | Example 1 | Rowspan & Colspan | Errors | Regexps | Relative Addressing | Tables in Tables | Invocation | Input Syntax | Download | To Do | Questions & Comments

Overview
Wwwtable is a perl script that aims to make the production of HTML tables a little easier. In some cases, it actually succeeds.
This script works as a filter, reading stdin and writing to stdout (unless it finds some problem with your input, in which case you'll see something on stderr).
The input syntax is roughly as follows:
<wwwtable table-options...>
initial text (e.g. <caption> ... </caption>)
(X, Y) options for cell (X, Y)
text for cell (X, Y)...
((X, Y)) options for header cell (X, Y)
text for header cell (X, Y)...
</wwwtable>

Rows and cells may be specified in any order; numbering starts at 1. X and/or Y may be replaced by a regular expression to indicate several rows or columns. Options or text (or both) may be omitted for cells. Cells may be omitted completely if they are empty or fall under the rowspan/colspan specifications of another cell. Cells may contain arbitrary HTML text, including other wwwtable tables. And there's more...
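Wwwtable itself is written in Perl, and its parser isn't reproduced here. The following Python sketch is only a rough illustration of how one cell-definition line could be split into its parts. The names `CellLine` and `parse_cell_line` and the exact pattern are assumptions, based on the description above: `((X,Y))` marks a header cell, `(X, Y)` marks an ordinary one, and the rest of the line holds the options. Lines that don't match are treated as cell text.

```python
import re
from typing import NamedTuple, Optional


class CellLine(NamedTuple):
    header: bool   # True for ((X,Y)), i.e. a <th> cell
    row: str       # row specification as written (number, regexp, relative, ...)
    col: str       # column specification as written
    options: str   # attributes destined for the <td> or <th> tag


# Hypothetical pattern: one or two opening parentheses, "row , col",
# the closing parentheses, then the rest of the line as options.
CELL_RE = re.compile(r"^\s*(\(\(|\()\s*([^,()]*?)\s*,\s*([^,()]*?)\s*(\)\)|\))(.*)$")


def parse_cell_line(line: str) -> Optional[CellLine]:
    """Split a cell-definition line into its parts; None for ordinary text."""
    m = CELL_RE.match(line)
    if m is None:
        return None
    opening, row, col, closing, options = m.groups()
    if len(opening) != len(closing):
        return None  # e.g. "((1,2)": mismatched parentheses
    return CellLine(len(opening) == 2, row, col, options.strip())


print(parse_cell_line("((1,1)) colspan=4 bgcolor=yellow"))
print(parse_cell_line("No (add 7%)"))  # None: just cell text
```

Row and column are kept as strings because, as described above, they may be regular expressions rather than plain numbers.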
A First Example
As a simple example, you could provide this as input to wwwtable:

<wwwtable border=1>
(1,2) align=center width=200
Here's some text for the cell in row 1, column 2
</wwwtable>

and you would see the following, nicely formatted, on stdout (assuming you bothered to look):

<table border=1>
<tr>
<td>
<br>
</td>
<td align=center width=200>
Here's some text for the cell in row 1, column 2
</td>
</tr>
</table>

as you might expect/hope. Your browser renders this as

[Rendered table: an empty cell followed by a 200-pixel-wide centered cell containing "Here's some text for the cell in row 1, column 2".]

Notice that cell (1,1) in the table is empty, and that we didn't need to specify it in the wwwtable definition.
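This padding behaviour can be sketched in a few lines of Python. It is not wwwtable's code: `render` is a hypothetical name, its input is assumed to be a dictionary of already-parsed cells keyed by (row, column), and rowspan/colspan are ignored. For the one-cell example, it reproduces the output shown above.

```python
def render(table_opts, cells):
    """Render {(row, col): (options, text)} as an HTML table.

    Cells missing from the dictionary are filled in as empty cells holding
    a <br>. Rowspan and colspan are not handled in this sketch.
    """
    n_rows = max(r for r, _ in cells)
    n_cols = max(c for _, c in cells)
    out = [f"<table {table_opts}>" if table_opts else "<table>"]
    for r in range(1, n_rows + 1):
        out.append("<tr>")
        for c in range(1, n_cols + 1):
            opts, text = cells.get((r, c), ("", ""))
            out.append(f"<td {opts}>" if opts else "<td>")
            out.append(text if text else "<br>")  # pad empty cells
            out.append("</td>")
        out.append("</tr>")
    out.append("</table>")
    return "\n".join(out)


print(render("border=1", {(1, 2): ("align=center width=200",
                                   "Here's some text for the cell in row 1, column 2")}))
```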
Rowspan and Colspan
But wait, there's more.
Wwwtable understands ROWSPAN and COLSPAN too, and will do the Right Thing for you. For example, this table:
[Rendered table: a yellow heading cell, "This is the heading of my superb table.", spanning all four columns; a row of labels Item | Description | Price | Tax Included?; then "Rats" spanning two rows next to Small / $4.50 and Big / $7.25, with "No (add 7%)" spanning both rows in the last column.]

was produced from the following input:

<div align=center>
<wwwtable border=1 cellpadding=10>
((1,1)) colspan=4 bgcolor=yellow
This is the heading of my superb table.
(2,1)
Item
(2,2)
Description
((2,3)) width=150
Price
(2,4)
Tax Included?
(3,1) rowspan=2
Rats
(3,4) rowspan=2
No (add 7%)
(3,2) align=center
Small
(4,2) align=center
Big
(3,3) align=center
$4.50
(4,3) align=center
$7.25
</wwwtable>
</div>

Notice that I did not say what should go into cells (1,2), (1,3), (1,4), (4,1), or (4,4). Wwwtable figured it out from the rowspan and colspan specifications.
Notice too that I did not give the table cells in a left-to-right top-to-bottom order as you would need to do with normal HTML. It was more natural for me to specify cells (3,2) and (4,2) together because they are semantically related (rat sizes). The same goes for (3,3) and (4,3), the rat prices. Wwwtable doesn't care what order you give your row and cell information. The entire table specification is read before anything is output.
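Working out which cells a rowspan or colspan swallows amounts to walking a rectangle anchored at the spanning cell. The Python sketch below does this for the three spanning cells in the rat table above. `covered_cells` and `span_of` are hypothetical helpers, not part of wwwtable, and only plain `rowspan=N` and `colspan=N` attributes are recognized.

```python
import re


def span_of(options):
    """Return (rowspan, colspan) from an option string; each defaults to 1."""
    def get(name):
        m = re.search(rf'\b{name}\s*=\s*"?(\d+)', options, re.IGNORECASE)
        return int(m.group(1)) if m else 1
    return get("rowspan"), get("colspan")


def covered_cells(row, col, rowspan=1, colspan=1):
    """Cells, other than the anchor (row, col), covered by its span."""
    return {
        (r, c)
        for r in range(row, row + rowspan)
        for c in range(col, col + colspan)
        if (r, c) != (row, col)
    }


# The spanning cells from the rat table above.
spans = {(1, 1): "colspan=4 bgcolor=yellow", (3, 1): "rowspan=2", (3, 4): "rowspan=2"}
implicit = set()
for (r, c), opts in spans.items():
    implicit |= covered_cells(r, c, *span_of(opts))
print(sorted(implicit))  # [(1, 2), (1, 3), (1, 4), (4, 1), (4, 4)]
```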
Errors
Unless you have a diagram and a clear idea of what goes where, tables can be hard to get right. Even then, they're hard to typeset in HTML, and hard-to-find problems arise all the time. Well, they do if you edit in the cavalier cut & paste style that I prefer. Fortunately, you won't have to debug cryptic, dense HTML to find these problems anymore. Wwwtable does a reasonable job of making sure you don't try to do something bogus with your table.
For example, if I had made a mistake and specified some data for cell (1,4), having already said that cell (1,1) has a colspan of 4, I would have received a warning message like this:
wwwtable: Line 25: Cell (1,4) has already been specified, implicitly, in the rowspan/colspan specification of cell 1,1 (see line 3).

Cool. Wwwtable will also detect other attempts to put multiple things in one cell, and chastise you accordingly. For example, you might see messages that look like these:

wwwtable: Line 25: Cell (2,2) has already been specified (see line 7).
wwwtable: Cell (3,3)'s rowspan and colspan information (rowspan=1, colspan=1, conflicts with a previous definition of that cell's contents (line 21).
wwwtable: The intersection of the specifications of cell (1,3) on line 3 and cell (2,1) on line 5 is non-empty. Both include cell (2,3).

These cover some of the most obvious ways you can botch up a table, and the messages should be understandable.
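Detecting these clashes comes down to recording, for every cell, which specification claimed it first. Here is a Python sketch of that idea. The `check_spans` name, its input format (source line, row, column, rowspan, colspan) and the wording of its messages are all invented for illustration. The messages only loosely follow wwwtable's messages above.

```python
def check_spans(specs):
    """Report cells claimed by more than one specification.

    specs: list of (line_no, row, col, rowspan, colspan) tuples, one per
    explicitly given cell, in input order.
    """
    owner = {}    # (row, col) -> (line_no, anchor) of the first claimant
    errors = []
    for line_no, row, col, rowspan, colspan in specs:
        for r in range(row, row + rowspan):
            for c in range(col, col + colspan):
                if (r, c) in owner:
                    first_line, (ar, ac) = owner[(r, c)]
                    errors.append(f"Line {line_no}: cell ({r},{c}) is already "
                                  f"covered by cell ({ar},{ac}) (see line {first_line})")
                else:
                    owner[(r, c)] = (line_no, (row, col))
    return errors


# Cell (1,1) with colspan=4 on line 3, then data for cell (1,4) on line 25.
print(check_spans([(3, 1, 1, 1, 4), (25, 1, 4, 1, 1)]))
```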
Row & Column Regular Expressions
This table:
[Rendered table: a 4x4 grid of 140-pixel-wide cells whose middle four cells, (2,2), (2,3), (3,2) and (3,3), are yellow and contain the centered words "All yellow".]

was created with the simple input text:

<div align=center>
<wwwtable border=1>
(*, *) width=140 align=center
(2|3, 2|3) bgcolor=yellow
All yellow
(4, 4)
</wwwtable>
</div>

What's going on here? First of all, it's easy to see why the table is 4x4: the only cell given explicitly is (4,4) (and it was left empty), so wwwtable fills in all the missing cells up to it. It's the first two cell specifications that are new.
The first one (*, *) width=140 align=center uses a perl-style regular expression to indicate the applicable row and column numbers. Wwwtable allows the use of * by itself to mean the regular expression .*. So that line means that all cells whose row number matches the regular expression .* and whose column number matches the regular expression .* should have width=140 align=center appended to their <td> or <th> information. This explains the width of the table cells and why the words All yellow are centered.
The next line, (2|3, 2|3) bgcolor=yellow, also uses regular expressions. They indicate that some cells (those whose row and column numbers both match 2|3) should have a yellow background color (apologies to those who cannot see this!). The line after it gives some text that should be placed in the cells that match the regular expressions. The rows and columns that match those simple regular expressions are of course those numbered 2 or 3 (the | means OR in a regular expression). So the four cells where the middle two rows and the middle two columns intersect contain the words All yellow and (if you have the right browser) have a yellow background.
This is a very simple example of what you can do with regular expressions for row and column numbers. In fact, the numbers we saw before were also regular expressions - just ones that only match a single string.
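One plausible way to decide which cells a row/column pair selects is sketched below in Python. The details are assumptions, not wwwtable's actual rules. A lone `*` is treated as `.*`. A plain number must match exactly. Anything else is searched for anywhere in the number (an unanchored match), which is consistent with the `[^1]` behaviour described a little further on.

```python
import re


def selector_matches(spec, number):
    """Does a row or column specification select this row/column number?"""
    spec = spec.strip()
    if spec == "*":
        spec = ".*"                      # shorthand described above
    if spec.isdigit():
        return int(spec) == number       # assumed: plain numbers are exact
    # Assumed: other patterns may match anywhere in the number (unanchored).
    return re.search(spec, str(number)) is not None


def matching_cells(row_spec, col_spec, n_rows, n_cols):
    """All (row, col) pairs in an n_rows x n_cols table selected by the pair."""
    return [(r, c)
            for r in range(1, n_rows + 1)
            for c in range(1, n_cols + 1)
            if selector_matches(row_spec, r) and selector_matches(col_spec, c)]


print(matching_cells("2|3", "2|3", 4, 4))  # [(2, 2), (2, 3), (3, 2), (3, 3)]
```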
Here's another example.
The table of contents at the very top of this page is actually a simple table. For a while, I played with making it more complex. Here's one of the things I tried:
[Rendered table: a yellow "Contents" cell spanning three rows on the left, followed by three columns of links: Overview, Example 1, Rowspan & Colspan; Errors, Regexps, Tables in Tables; Invocation, Input Syntax, Download.]

OK, so perhaps this version does look better, but I wanted something more compact. Anyway, here's the source:

<div align=center>
<wwwtable border=1 cellpadding=10>
(*, [^1]) width=150 align=center
((1,1)) rowspan=3 bgcolor=yellow valign=center
Contents
(1,2)
<a href="main.html#overview">Overview</a>
(2,2)
<a href="main.html#example1">Example 1</a>
(3,2)
<a href="main.html#span">Rowspan & Colspan</a>
(1,3)
<a href="main.html#errors">Errors</a>
(2,3)
<a href="main.html#regexps">Regexps</a>
(3,3)
<a href="main.html#recursion">Tables in Tables</a>
(1,4)
<a href="main.html#invoke">Invocation</a>
(2,4)
<a href="main.html#syntax">Input Syntax</a>
(3,4)
<a href="main.html#getting">Download</a>
</wwwtable>
</div>

The interesting regular expression cell specification is the line (*, [^1]) width=150 align=center. It specifies that the cells in every row, in every column except the first, should be 150 pixels wide and horizontally centered.
Actually... the last section contains two lies.

The first is that the regular expression [^1] doesn't just exclude the number 1. It matches any number containing at least one digit other than 1, so it also fails on 11 (and 111, and so on). In our case, the table only had 4 columns and we got what we expected. But if we'd had 12 columns, then column 11 would have missed out on being 150 pixels wide and horizontally centered.
Let's take another look at that table [sound of a quick cut & paste going on in the wings]. This time, I've replaced the text with an x in each case, and set the row width to 20. To force the table to have 12 columns, I simply added a line containing (3,12). Here's the result:
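And, as you can see (I hope!), the 11th column is narrower than the rest. Amazing. Tying the match to the start and end of the column number with ^[^1]$ fixes column 11. But since that pattern only matches a single character, it would also leave out columns 10 and 12. What we really wanted was something like ^[2-9]$|^[0-9][0-9]+$, which matches every column number except 1. Let that be a lesson to you.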
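The three patterns can be compared quickly in Python. This assumes the column number is tested with an unanchored match, as the narrow 11th column suggests. For patterns this simple, Python's `re.search` behaves like Perl's matching.

```python
import re

COLUMNS = range(1, 13)  # the 12-column version of the table


def selected(pattern):
    """Columns whose number contains a match for pattern (unanchored search)."""
    return [c for c in COLUMNS if re.search(pattern, str(c))]


print(selected(r"[^1]"))                   # [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
print(selected(r"^[^1]$"))                 # [2, 3, 4, 5, 6, 7, 8, 9]
print(selected(r"^[2-9]$|^[0-9][0-9]+$"))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```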
The other lie is this: (shameless self-promotion warning ahead) my input didn't really look anything like the above. The wwwtable structure was the same, but I created the HTML links using my htm4l package. It's another story entirely, but one you might want to check out if you're on a UNIX machine.
This table and this table

[Rendered tables: two 9x9 grids captioned "This table" and "and this table". In both, every cell in an odd-numbered row holds y (yellow background) and every cell in an odd-numbered column holds r (red background). Cells in both an odd row and an odd column hold both letters: "y r" in the first table and "r y" in the second.]

were created with the simple input texts:

<wwwtable border=1>
<caption>This table</caption>
(*, *) width=25 height=25 align=center
(1|3|5|7|9, *) bgcolor=yellow
y
(*, 1|3|5|7|9) bgcolor=red
r
(9, 9)
</wwwtable>

<wwwtable border=1>
<caption>and this table</caption>
(*, *) width=25 height=25 align=center
(*, 1|3|5|7|9) bgcolor=red
r
(1|3|5|7|9, *) bgcolor=yellow
y
(9, 9)
</wwwtable>

These two tables illustrate something which (I imagine) is browser dependent. The only difference in their specifications is the order of the regular expression cell lines. When a cell is produced by wwwtable, all the information that applies to it is simply concatenated, in the order it appears in the table specification. If you consider cell (1,1) in the left-hand table, 3 of the 4 cell specification lines apply to it (the one that does not is (9,9) of course). First of all, this cell will get attributes width=25 height=25 align=center from the (*, *) specification. Then it gets attributes bgcolor=yellow and bgcolor=red (and text y and r), in that order, from the next two cell regular expressions.
As a result of all this, wwwtable ends up outputting the following HTML(?):
<td width=25 height=25 align=center bgcolor=yellow bgcolor=red> y r </td>
Your browser, supposing it allows the bgcolor specifier in a table cell, deals with this somehow. In the case of Netscape 3.0, you get the first color mentioned and so the cell is yellow.
In the right-hand table, things are done the other way around, so the cell is red and the text (which I inserted for people whose browsers cannot do table cell colors) is r y.
All of this shows you more or less how regular expression cell specifications are handled. You should be careful, because every regular expression that matches a (row,column) pair will be applied to that cell. This can give you unexpected results if you don't watch it. Very general regular expressions should be applied with some care, and are probably best used to set things like widths or backgrounds. At some point, I may work on wwwtable to let you specify how cells are built when there are many sources of information, but there are some tricky details that I don't (yet) know how to handle.
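The concatenation described above can be sketched in Python as follows. `selects` and `build_cell` are hypothetical names. The matching rules are assumptions: a lone `*` means `.*`, plain numbers must be equal, and anything else is an unanchored search. The output format simply mimics the `<td>` line shown above.

```python
import re


def selects(spec, n):
    """Assumed matching rule: '*' means '.*', plain numbers must be equal,
    anything else is an unanchored regular-expression search."""
    spec = spec.strip()
    if spec.isdigit():
        return int(spec) == n
    return re.search(".*" if spec == "*" else spec, str(n)) is not None


def build_cell(specs, row, col):
    """Concatenate, in input order, the options and text of every
    (row_spec, col_spec, options, text) entry that selects (row, col)."""
    opts, texts = [], []
    for row_spec, col_spec, options, text in specs:
        if selects(row_spec, row) and selects(col_spec, col):
            if options:
                opts.append(options)
            if text:
                texts.append(text)
    tag = f"<td {' '.join(opts)}>" if opts else "<td>"
    return f"{tag} {' '.join(texts)} </td>"


# The left-hand 9x9 table, in the order its lines appear in the input.
left = [
    ("*", "*", "width=25 height=25 align=center", ""),
    ("1|3|5|7|9", "*", "bgcolor=yellow", "y"),
    ("*", "1|3|5|7|9", "bgcolor=red", "r"),
    ("9", "9", "", ""),
]
print(build_cell(left, 1, 1))
# <td width=25 height=25 align=center bgcolor=yellow bgcolor=red> y r </td>
```

Swapping the second and third entries gives `bgcolor=red bgcolor=yellow` and the text `r y`, as in the right-hand table.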
Relative Addressing
So far, so good. One problem you'll soon run into though is the need to renumber a table's rows and columns. Under the current system, to insert a row or a column, you need to edit all subsequent row and column numbers to correct them. Even HTML isn't that awkward! Something had to be done, and it was.
This table:
[Rendered table: two rows. The first holds "This is in position (1,1)", "And here's (1,2)" and "And (1,3)"; the second holds "And (2,1)" and "And (2,2)".]

was generated with the following input:

<div align=center>
<wwwtable border=1 cellpadding=5>
(=,=)
This is in position (1,1)
(=,+)
And here's (1,2)
(,+)
And (1,3)
(+,-2)
And (2,1)
(,+)
And (2,2)
</wwwtable>
</div>

and it doesn't take much to see what's going on. All cell addresses are relative instead of absolute. Relative to what? you may ask. To the previous cell; for the first cell defined, the "previous" cell is (1,1) by fiat. A row or column specification of = means to use the previous number, and +X or -X means to add X to, or subtract X from, the previous number (if X is omitted it defaults to 1). An empty specification is the same as =.
Finally, a specification of $ indicates the last row or column. But it's the last row or column defined so far. You can't (yet) do arithmetic with $, so expressions like $-1 are illegal (and will be silently EATen for reasons I'm not about to divulge).
So you can use the shorthand (,+) to move one column to the right, (+,) to move one row down, (+,1) for newline and carriage return, (1,+) for the top of the next column, and so on.
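The relative-addressing rules just described fit in a small resolver. This Python sketch is an illustration, not wwwtable's code. `resolve` and `resolve_all` are made-up names. Only the forms described above are handled: `=`, empty, `+X`, `-X`, `$` and plain numbers. Regular-expression addresses are not handled. "The last row or column defined so far" is taken to mean the largest number seen so far, starting at 1.

```python
def resolve(spec, previous, last):
    """Turn one row or column specification into an absolute number.

    previous: the number used by the previous cell
    last:     the largest number defined so far (assumed meaning of "$")
    """
    spec = spec.strip()
    if spec in ("", "="):
        return previous                           # empty is the same as =
    if spec == "$":
        return last
    if spec[0] in "+-":
        step = int(spec[1:]) if spec[1:] else 1   # X defaults to 1
        return previous + step if spec[0] == "+" else previous - step
    return int(spec)                              # plain absolute number


def resolve_all(addresses):
    """Resolve (row_spec, col_spec) pairs in input order."""
    row = col = 1               # the "previous" cell of the first cell is (1,1)
    last_row = last_col = 1
    cells = []
    for row_spec, col_spec in addresses:
        row = resolve(row_spec, row, last_row)
        col = resolve(col_spec, col, last_col)
        last_row, last_col = max(last_row, row), max(last_col, col)
        cells.append((row, col))
    return cells


print(resolve_all([("=", "="), ("=", "+"), ("", "+"), ("+", "-2"), ("", "+")]))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
```

The five addresses from the example above come out as (1,1), (1,2), (1,3), (2,1) and (2,2), which matches the rendered table.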
Just when you were getting ready to edit all those absolute cell numbers, along comes relative addressing to save your day. Here's something resembling that old table of contents again:
[Rendered table: the same layout as the earlier table of contents, with plain text in place of the links.]

I made it with the following input:

<div align=center>
<wwwtable border=1 cellpadding=10>
(*, [^1]) width=150 align=center
((,)) rowspan=3 bgcolor=yellow valign=center
Contents
(1,+)
Overview
(+,)
Example 1
(+,)
Rowspan & Colspan
(1,+)
Errors
(+,)
Regexps
(+,)
Tables in Tables
(1,+)
Invocation
(+,)
Input Syntax
(+,)
Download
</wwwtable>
</div>

It could have been done in many other ways (at least 10!) with relative addressing; I just happened to move top-to-bottom, left-to-right as I traversed the cells. And now if I want to add a new column OR row, it's a cinch. Usually a little absolute addressing doesn't hurt either.
Tables in Tables
Wwwtable works as a simple recursive-descent parser, so you can have wwwtables in your wwwtables in your wwwtables...
For example, look at this wacky table:
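[Rendered table: two rows and three columns, captioned "Here is the caption of a fabulous table". Row 1 holds "The big hill: Albert Barstardus", "STOP", and a 10x10 colored subtable containing "*" and "Hi". Row 2 holds "Mark V. Shaney Talking Head", "Not an empty cell.", and a subtable captioned "A sub-caption." containing "cell 1,1 sub-text" and "Down here!".]

The main table has subtables in cells (1,3) and (2,3). The input looks like this: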
<wwwtable border=1 width="85%">
<caption align=bottom>
<strong>Here is the caption of a fabulous table</strong>
</caption>
(*, 1) bgcolor=white
(*, 2) align=left
(2,1) bgcolor=red valign=bottom
Mark V. Shaney<br>
Talking Head
(,+) bgcolor=green align=center valign=top
Not an empty cell.
(,+) bgcolor=pink align=right
<wwwtable cellpadding=10 border=3 bgcolor=yellow>
<caption align=top>
A sub-caption.
</caption>
(1, *) valign=middle
(1,1) bgcolor=orange
cell 1,1 sub-text
(+,+)
Down here!
</wwwtable>
(1,1) valign=center height=200 align=center
The big hill:<br>
Albert Barstardus
(,+) bgcolor=orchid align=right width=350
STOP
(,+)
<wwwtable border=1 bgcolor=blue>
(*, 4) bgcolor=red
(*, 6) bgcolor=white
(3, *) bgcolor=green height=50
(4, 5) colspan=3 rowspan=3 bgcolor=yellow align=middle valign=middle width=30
*
(10,10) bgcolor=orange
Hi
</wwwtable>
</wwwtable>

That's a lot of table for not much work. If you had to type this in yourself, and get it right, it might take a while. The HTML produced by wwwtable for this example is nearly 8K.
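Wwwtable's recursive descent isn't reproduced here. The central problem, finding where each table ends when tables nest, can be illustrated with a depth counter instead of actual recursion. This Python sketch locates the top-level <wwwtable> ... </wwwtable> blocks in a piece of text; each block's body could then be processed the same way. `split_top_level` is a hypothetical name, and case-insensitive tag matching is an assumption.

```python
import re

OPEN = re.compile(r"<wwwtable\b", re.IGNORECASE)
CLOSE = re.compile(r"</wwwtable\s*>", re.IGNORECASE)


def split_top_level(text):
    """Return the top-level <wwwtable>...</wwwtable> blocks in text, each
    including any tables nested inside it."""
    blocks, depth, start, pos = [], 0, None, 0
    while True:
        o, c = OPEN.search(text, pos), CLOSE.search(text, pos)
        if c is None:                       # no closing tags left
            if o is not None or depth:
                raise ValueError("unbalanced <wwwtable> tags")
            return blocks
        if o is not None and o.start() < c.start():
            if depth == 0:
                start = o.start()           # a new top-level table begins
            depth += 1
            pos = o.end()
        else:
            if depth == 0:
                raise ValueError("</wwwtable> without matching <wwwtable>")
            depth -= 1
            pos = c.end()
            if depth == 0:
                blocks.append(text[start:pos])


doc = "<wwwtable border=1> (1,1) <wwwtable> x </wwwtable> </wwwtable> <wwwtable> z </wwwtable>"
print(split_top_level(doc))
```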
As another example of a table in a table, the two tables in the previous section and the code I used to produce them, and the line that says "were created with the simple input texts:" were actually all part of one larger table. OK, so I lied to you again.
Here's what that section actually looks like, this time with a border around it:
[Rendered: the same two 9x9 tables, the line "were created with the simple input texts:" and the two input listings, this time laid out as the cells of one outer table with a border.]

And, you guessed it, here's the source for the whole thing:
<div align=center>
<wwwtable border=1>
(1,1) align=center
<wwwtable border=1>
<caption>This table</caption>
(*, *) width=25 height=25 align=center
(1|3|5|7|9, *) bgcolor=yellow
y
(*, 1|3|5|7|9) bgcolor=red
r
(9, 9)
</wwwtable>
(1,2) width=50
(1,3) align=center
<wwwtable border=1>
<caption>and this table</caption>
(*, *) width=25 height=25 align=center
(*, 1|3|5|7|9) bgcolor=red
r
(1|3|5|7|9, *) bgcolor=yellow
y
(9, 9)
</wwwtable>
(2,1) colspan=3 height=50 valign=center align=center
were created with the simple input texts:
(3,1)
<pre>
&lt;wwwtable border=1>
&lt;caption>This table&lt;/caption>
&#40;*, *) width=25 height=25 align=center
&#40;1|3|5|7|9, *) bgcolor=yellow
y
&#40;*, 1|3|5|7|9) bgcolor=red
r
&#40;9, 9)
&lt;/wwwtable>
</pre>
(3,3)
<pre>
&lt;wwwtable border=1>
&lt;caption>and this table&lt;/caption>
&#40;*, *) width=25 height=25 align=center
&#40;*, 1|3|5|7|9) bgcolor=red
r
&#40;1|3|5|7|9, *) bgcolor=yellow
y
&#40;9, 9)
&lt;/wwwtable>
</pre>
</wwwtable>
</div>

If you can be bothered reading all this, you'll see a table within a table in cells (1,1) and (1,3). Cell (1,2) is just an empty cell with width=50 to give a little separation between the outer cells.
But wait, what's all that &#40; crud doing there? It's there because I wanted to include lines in a table that looked like directives to wwwtable. If I had simply written (1|3|5|7|9, *) bgcolor=yellow in cell (3,3), wwwtable would have interpreted that as an attempt to define a new cell. Instead, I wanted to show some input to wwwtable without having it interpreted along the way. So I used the HTML character code for (, which I happen to know is &#40;. You can always prevent wwwtable from recognizing an input line as a cell definition by changing the leading parenthesis in this way. It's one of life's little annoyances, I agree. I could get around it by making the cell definition lines syntactically more obscure so the problem would arise less frequently, but you'd still run into the problem when you tried to put wwwtable input into a wwwtable cell. At some point I may add a way to tell wwwtable to stop interpreting lines temporarily, but for now you'll have to live with it. In any case, if I add that feature, what happens when you want to include IT in a table? And on and on.
Even more depressing is that every time I add another layer of special-case input processing, I have to document it here and go one layer further so you can see it. Try view source on the above wwwtable text and you'll see what I mean.
Phew. Enough of that.
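The &#40; trick described above is easy to automate if you often paste wwwtable input into a wwwtable cell. This minimal Python sketch assumes, as the text says, that a leading parenthesis is what makes wwwtable treat a line as a cell definition. `escape_cell_lines` is a hypothetical helper. Other markup, such as <wwwtable> tags, is left alone.

```python
import re


def escape_cell_lines(text):
    """Replace a leading "(" on any line with &#40;, the HTML character
    reference for "(", so wwwtable will not read the line as a cell
    definition while browsers still display an ordinary parenthesis."""
    return re.sub(r"(?m)^([ \t]*)\(", r"\1&#40;", text)


print(escape_cell_lines("(1|3|5|7|9, *) bgcolor=yellow\ny"))
# &#40;1|3|5|7|9, *) bgcolor=yellow
# y
```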
Invoking Wwwtable
If you are running on a UNIX machine, you will probably invoke wwwtable in a pipeline that eventually produces raw HTML. So you'll use it something like this:
cat file.in | ... | wwwtable [options] | ... > file.html

Wwwtable currently accepts the following command line options:
Option  Effect

-b  Turn off the insertion of a <br> tag in otherwise empty table cells. I think (bordered) tables look much better when empty cells have a <br> in them (this gives them the lowered 3D appearance of the other cells). If you use this option, empty cells will appear raised, like the border. You won't see any difference in tables with no border. Try it and see what you think.

-c  Turn on cell commenting. With this option, each table cell produced by wwwtable is preceded by an HTML comment that indicates its location in the table. These look like

<!-- Cell (3,4) -->

and are useful if you're looking at the output of wwwtable to try and figure out what it's doing.

-t  Turn on table transposition. With this truly silly option, all your widths become heights, your heights become widths, rows become columns, columns become rows, rowspans become colspans and colspans become rowspans. Some of your tables may actually look better too! For a sample, here is a transposed version of this page.

-w  Turn off the read-only warning messages. Normally, two kinds of message appear. The first is placed at the beginning of the output and warns the reader not to edit the file, as it was produced automatically and any changes will be lost the next time wwwtable is run. The second is of a similar nature, but it precedes each top-level table generated by wwwtable. Try view source on this page to see what these messages look like. You might want to turn these warnings off if you simply use wwwtable as a one-off generator to make HTML that you include in a page, rather than as a filter in an automated pipeline that creates HTML.
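Going by the description of -t above, transposing a single cell specification might look like the following Python sketch. It is a guess at the behaviour, not wwwtable's implementation. `transpose_cell` and `transpose_options` are hypothetical names, and only the width, height, rowspan and colspan attributes are swapped.

```python
import re

SWAP = {"width": "height", "height": "width",
        "rowspan": "colspan", "colspan": "rowspan"}


def transpose_options(options):
    """Swap width/height and rowspan/colspan in an attribute string.
    A single re.sub pass ensures nothing is swapped twice."""
    return re.sub(r"\b(width|height|rowspan|colspan)\b",
                  lambda m: SWAP[m.group(1).lower()], options,
                  flags=re.IGNORECASE)


def transpose_cell(row, col, options):
    """Cell (row, col) becomes cell (col, row) with its attributes swapped."""
    return col, row, transpose_options(options)


print(transpose_cell(3, 1, "rowspan=2 width=150 align=center"))
# (1, 3, 'colspan=2 height=150 align=center')
```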
Input Syntax
Wwwtable should be fairly robust with input syntax. You can use upper or lower case (where there's a choice) and should be able to insert whitespace just about anywhere.
Here's something approximating a BNF grammar for a wwwtable. Nonterminals are in uppercase, arbitrary text that you can enter is shown as your_text_1 to your_text_4, and a digit from 0-9 is given by \d.
WWWTABLE     = START INITIAL_TEXT CELLS END
START        = <wwwtable TABLE_OPTS >
TABLE_OPTS   = EMPTY | your_text_1
INITIAL_TEXT = your_text_2
CELLS        = EMPTY | CELL CELLS
CELL         = TH_CELL | TD_CELL
TH_CELL      = (( CELL_NUMBER , CELL_NUMBER )) CELL_OPTIONS \n CELL_CONTENT
TD_CELL      = ( CELL_REGEXP , CELL_REGEXP ) CELL_OPTIONS \n CELL_CONTENT
CELL_NUMBER  = \d+
CELL_REGEXP  = [.*?[]|\d^$-+=]*
CELL_OPTIONS = EMPTY | your_text_3
CELL_CONTENT = EMPTY | your_text_4
END          = </wwwtable>

The above isn't really a grammar, but you probably get the idea. In particular, CELL_REGEXP isn't BNF-ish at all and is in fact handled poorly by wwwtable (which will, for example, think that something like *[[[^^243 is valid as a cell-specifying regexp). Can someone give me a regexp for recognizing a valid regexp (including itself, of course :-))? Or, even simpler, a regexp that recognizes regexps that are supposed to match numbers?
The upshot of all this is that you'll wind up with HTML that looks like this:

<table your_text_1>
your_text_2
<tr>
<td your_text_3>
your_text_4
</td>
</tr>
</table>

This should give you an idea of where your text will wind up.
Notice that you cannot (yet) use (( CELL_REGEXP, CELL_REGEXP )) to turn a regular expression's worth of cells into header cells. You have to give a specific cell (e.g., ((4,5))). I will fix this at some point, when I decide how to deal with conflicts (like the ones I am about to describe).
In the special case where you use the regular expression * (or, more correctly, .*) to indicate a set of columns and give an exact row number, wwwtable will put the CELL_OPTIONS (if any) into the appropriate <tr> tag. For example, if you give (4, *) align=left then when wwwtable generates the 4th row, it will do so using <tr align=left> rather than by putting align=left into each cell specification in the row. This is an attempt to be nice to the browser that will end up displaying your table.
Unfortunately, this can lead to slight problems because the row specification can then be overridden by specifications given to individual cells within the row.
All of which leads me to say that there are some issues with precedence that I have not yet resolved. The problem is that information about what to put in a cell and what properties the cell should have can come from many places. For example, the alignment of text within cell (4,5) might come from a <tr> tag for the 4th row, from the individual cell's (4,5) specification, or from any number of regular expressions that might match this cell. It is hard to know what to do about this.
Should I take the most specific information only (but then how to decide which regexp of several is the most specific)? Should I not allow information to come from two places at once (and thereby rule out convenient things like (*,*) width=100)? Should I set up some inflexible default ordering of what specifications (and text) will be applied to each cell? Should I look at the content of the specifications and disallow conflicts? And there's more. More than I can handle cleanly right now. I've already spent more time wondering about this than it's probably going to be worth.
The solution at the moment is to include ALL cell specifications and text. Everything that matches a cell's (row,column) numbers will be appended to it. The order that this is done in is:
- cell-specific information (e.g., (4,1) align=center)
- column-specific information (e.g., (*,1) align=left)
- other regexp information (e.g., (*,*) align=right)
Plus, regexps like (4,*) align=baseline are applied to the whole row in the <tr> tag, as mentioned above.
In the above example for cell (4,1), we'd wind up with HTML output that looks like this:

<tr align=baseline> <!-- This is row 4 -->
<td align=center align=left align=right></td>

This, to my mind, is far too confusing for the average bear. When I get some feedback on this sort of thing, I imagine I'll rewrite that part of wwwtable to deal with all these a bit more gracefully. A good start would be to stop special-casing row specifications like (4,*) and putting them into <tr> tags. Another option would be to only include the information given by the first matching regexp.
Anyway, this is probably more than you wanted to know. I'm open to suggestions on how to clean this up (but still allow people to do most of what they might want to do). In the meantime, you should be careful about using too many regexps at once, particularly where row and column regexps will provide conflicting information.
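The current ordering rules can be summarized in a short Python sketch. It is one reading of the description above, not wwwtable's code. `assemble` and `matches` are hypothetical names, a plain number is taken to select exactly one row or column, and `*` is read as `.*`. For cell (4,1) and the four example specifications, it returns align=baseline for the <tr> tag and align=center align=left align=right for the <td> tag.

```python
import re


def matches(spec, n):
    """Hypothetical matcher: '*' means '.*', plain numbers must be equal,
    anything else is an unanchored regular-expression search."""
    spec = spec.strip()
    if spec.isdigit():
        return int(spec) == n
    return re.search(".*" if spec == "*" else spec, str(n)) is not None


def is_any(spec):
    return spec.strip() in ("*", ".*")


def assemble(specs, row, col):
    """Return (tr_options, td_options) for cell (row, col).
    specs holds (row_spec, col_spec, options) tuples in input order."""
    tr, cell, column, other = [], [], [], []
    for row_spec, col_spec, options in specs:
        if not (matches(row_spec, row) and matches(col_spec, col)):
            continue
        exact_row = row_spec.strip().isdigit()
        exact_col = col_spec.strip().isdigit()
        if exact_row and is_any(col_spec):
            tr.append(options)        # e.g. (4,*): goes into the <tr> tag
        elif exact_row and exact_col:
            cell.append(options)      # e.g. (4,1)
        elif is_any(row_spec) and exact_col:
            column.append(options)    # e.g. (*,1)
        else:
            other.append(options)     # everything else, in input order
    return " ".join(tr), " ".join(cell + column + other)


specs = [("*", "*", "align=right"), ("*", "1", "align=left"),
         ("4", "1", "align=center"), ("4", "*", "align=baseline")]
print(assemble(specs, 4, 1))
# ('align=baseline', 'align=center align=left align=right')
```

Under this reading, patterns that are neither plain numbers nor `*` (like the 1|3|5|7|9 patterns in the 9x9 tables) all fall into the last group and keep their input order.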
Getting Wwwtable
The conditions for getting hold of wwwtable are the same as for my htm4l package (and for the same reasons).
Non-Commercial Use. If you will use wwwtable for non-commercial purposes, you are free to download wwwtable for no charge and with no obligation.
Commercial Use. If you produce html for your company, or if you are a commercial web page designer, you do not fit this category and I ask that, within a month of downloading, you pay a one-time fee of US$25.
This fee licenses a single individual to produce HTML using wwwtable for commercial purposes. If you wish to acquire a license for many people at a site or working on a project, the license price, in US$, will be:
min(15x + 20, 1000)

where x > 1 is the number of people you wish to register. You can pay for your copy by sending me US$25 (or the equivalent), or by transferring it to my bank account (see the bank deposit details for how to do this).
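The site license formula is easy to check with a few values. The small Python sketch below uses a hypothetical function name. The formula is stated for x > 1; a single license is the flat US$25 given above.

```python
def site_license_usd(people):
    """Site license price in US$: min(15x + 20, 1000), for x > 1 people."""
    if people < 2:
        raise ValueError("the formula covers x > 1; a single license is US$25")
    return min(15 * people + 20, 1000)


for x in (2, 10, 65, 66):
    print(x, site_license_usd(x))
```

With these numbers, 2 people cost US$50 and 10 people cost US$170. The US$1000 cap takes over from 66 people onward, since 15 * 66 + 20 = 1010.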
To do
Here is a list of the things I will soon add:
- Allow command line options to also be specified in the definition of a table. This will allow tables to be processed differently. It's not too important right now since none of the options does terribly much.
Questions & Comments
If you have questions or comments about wwwtable, I'd be happy to hear them. Feel free to mail me. If you make improvements or produce any wonderful tables, let me know, I'd like to check them out.
If you really want to test the robustness of your browser, try going to my page with the 800K HTML output of this simple wwwtable:
<wwwtable border=1>
(200,200) bgcolor=yellow
Hi!
</wwwtable>

Don't click on this link if you expect to be able to continue using your machine in the next while.
This table has been downloaded 67 times in the last 45 days (it is now Oct 10, 96). I have had only one report of success, from someone using the ANT Fresco browser.
I had to kill X Windows to stop my Netscape 3.0 chewing on it (running under Linux with 32MB on a Pentium 100). I have had Microsoft's Internet Explorer 3.0 spinning the disk on another machine for over 2 hours (8MB, Pentium 75), and though I can still move the mouse around, not much happens when I click. Above all, don't say I didn't warn you.
Here it is: the super-duper 200x200 table page. You may wind up wishing you still used Lynx. In any case, have fun.