Why Does Firebug Add <tbody> to <table>?

Why do browsers insert a tbody element into table elements?

http://htmlhelp.com/reference/html40/tables/tbody.html:

The TBODY element defines a group of data rows in a table. A TABLE must have one or more TBODY elements, which must follow the optional TFOOT. The TBODY end tag is always optional. The start tag is optional when the table contains only one TBODY and no THEAD or TFOOT.

So there always is a tbody there (albeit sometimes with both the start and end tags optional and omitted), and the tools you are using are correct in showing it to you.

thead or tfoot, on the other hand, are never present unless you explicitly include them, and if you do that, the tbody(s) must be explicit too.

Tbody tag in XPath produced by Firebug

To handle tables both with and without a tbody element, use XPath expressions of the following kind:

 /locStep1/locStep2/.../table/YourSubExpression
|
/locStep1/locStep2/.../table/tbody/YourSubExpression

If the table doesn't have a tbody child, then the second argument of the union operator (|) selects no nodes and the first argument of the union selects the wanted nodes.

Alternatively, if the table has a tbody child, then the first argument of the union operator selects no nodes and the second argument of the union selects the wanted nodes.

The end result: in both cases the wanted nodes are selected.
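
The same fallback can be sketched in Python with only the standard library. This is a minimal illustration, not a drop-in scraper: xml.etree.ElementTree implements a limited XPath subset without the union operator (|), so the sketch emulates the union by running both location paths and concatenating the results. The helper name select_rows and the sample markup are made up for this example.

```python
import xml.etree.ElementTree as ET


def select_rows(table):
    """Return row elements whether or not the table has a tbody child."""
    # ElementTree has no "|" operator, so run both location paths and
    # concatenate the results; for a typical table only one path matches.
    return table.findall("./tr") + table.findall("./tbody/tr")


# Markup as it might appear in the raw page source (no tbody).
source_table = ET.fromstring("<table><tr><td>a</td></tr></table>")
# Markup as a browser's DOM would serialize it (tbody inserted).
dom_table = ET.fromstring("<table><tbody><tr><td>a</td></tr></tbody></table>")

rows_from_source = select_rows(source_table)
rows_from_dom = select_rows(dom_table)
```

Both calls return the single row, so the calling code does not need to know which form of the table it received.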

Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?



The Problem: DOM Requires <tbody/> Tags

Firebug, Chrome's Developer Tool, XPath functions in JavaScript and others work on the DOM, not the basic HTML source code.

The DOM for HTML requires that all table rows not contained in a table header or footer (<thead/>, <tfoot/>) are included in table body tags <tbody/>. Thus, browsers add this tag if it's missing while parsing (X)HTML. For example, Microsoft's DOM documentation says:

The tbody element is exposed for all tables, even if the table does not explicitly define a tbody element.

There is an in-depth explanation in another answer on Stack Overflow.

On the other hand, HTML does not necessarily require that tag to be used:

The TBODY start tag is always required except when the table contains only one table body and no table head or foot sections.

Most XPath Processors Work on raw XML

Excluding JavaScript, most XPath processors work on raw XML, not the DOM, and thus do not add <tbody/> tags. HTML parser libraries like tag-soup and htmltidy also only output XHTML, not "DOM-HTML".

This is a common problem posted on Stack Overflow for PHP, Ruby, Python, Java, C#, Google Docs (Spreadsheets) and lots of others. Selenium runs inside the browser and works on the DOM -- so it is not affected!

Reproducing the Issue

Compare the source shown by Firebug (or Chrome's Dev Tools) with the one you get by right-clicking and selecting "Show Page Source" (or whatever it's called in your browser) -- or by using curl http://your.example.org on the command line. The latter will probably not contain any <tbody/> elements (they're rarely used), while Firebug will always show them.
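
The comparison can also be scripted. The following is a rough sketch using Python's standard html.parser module, which reports tags exactly as they appear in the markup and does not insert implied elements. The class name TagCollector, the helper has_explicit_tbody, and the two sample strings are assumptions made for this example; in practice you would feed in the downloaded page source.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag exactly as it appears in the markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def has_explicit_tbody(html_source):
    """Return True if the markup itself contains a tbody start tag."""
    collector = TagCollector()
    collector.feed(html_source)
    collector.close()
    return "tbody" in collector.tags


# What the server sent (hypothetical sample).
raw_source = "<table id='example'><tr><td>1</td></tr></table>"
# What a browser's inspector would show for the same table.
serialized_dom = "<table id='example'><tbody><tr><td>1</td></tr></tbody></table>"

raw_has_tbody = has_explicit_tbody(raw_source)
dom_has_tbody = has_explicit_tbody(serialized_dom)
```

If the check is false for the raw source but your XPath contains a /tbody step copied from the browser, the query will not match outside the browser.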


Solution 1: Remove /tbody Axis Step

Check if the table you're stuck at really does not contain a <tbody/> element (see the previous section). If it does, you've probably got another kind of problem.

Now remove the /tbody axis step, so your query will look like

//table[@id="example"]/tr[2]/td[1]

Solution 2: Skip <tbody/> Tags

This is a rather dirty solution and likely to fail for nested tables (it can jump into inner tables). I would only recommend doing this in very rare cases.

Replace the /tbody axis step by a descendant-or-self step:

//table[@id="example"]//tr[2]/td[1]
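
To see why the descendant step is risky with nested tables, here is a small sketch using Python's xml.etree.ElementTree. The sample markup is invented for illustration; it compares a child step (./tr) with a descendant step (.//tr) on an outer table whose first cell contains an inner table.

```python
import xml.etree.ElementTree as ET

# Hypothetical outer table with a nested table in its first row.
outer = ET.fromstring(
    "<table id='example'>"
    "<tr><td><table><tr><td>inner</td></tr></table></td></tr>"
    "<tr><td>outer row 2</td></tr>"
    "</table>"
)

# Child step: only rows that belong directly to the outer table.
child_rows = outer.findall("./tr")

# Descendant step: also reaches rows of the nested table.
descendant_rows = outer.findall(".//tr")
```

Here child_rows holds the two outer rows, while descendant_rows additionally picks up the row of the inner table, which is exactly the "jump into inner tables" described above.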

Solution 3: Allow Both Input With and Without <tbody/> Tags

If you're not sure in advance whether your table contains a <tbody/> element, or you use the query in both "HTML source" and DOM contexts, and you don't want to or cannot use the hack from solution 2, provide an alternative query (for XPath 1.0) or use an "optional" axis step (XPath 2.0 and higher).

  • XPath 1.0:

    //table[@id="example"]/tr[2]/td[1] | //table[@id="example"]/tbody/tr[2]/td[1]

  • XPath 2.0:

    //table[@id="example"]/(tbody, .)/tr[2]/td[1]

Why do browsers still inject tbody in HTML5?

The answer of "backwards compatibility" makes absolutely zero sense because I specifically opted in for a HTML5 doctype.

However, browsers don't differentiate between versions of HTML. HTML documents with HTML5 doctype and with HTML4 doctype (with the small exception of HTML4 transitional doctype without URL in FPI) are parsed and rendered the same way.

I'll quote the relevant part of HTML5 parser description:

8.2.5.4.9 The "in table" insertion mode

...

A start tag whose tag name is one of: "td", "th", "tr"

Act as if a start tag token with the tag name "tbody" had been
seen, then reprocess the current token.
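
The effect of this rule can be imitated on a tree that was parsed without it. The sketch below is a deliberate simplification and not the HTML5 tree-construction algorithm: it only moves the direct tr children of a table into one new tbody, using Python's xml.etree.ElementTree. The function name wrap_rows_in_tbody is made up for this example.

```python
import xml.etree.ElementTree as ET


def wrap_rows_in_tbody(table):
    """Move direct tr children of a table into a single new tbody."""
    rows = [child for child in list(table) if child.tag == "tr"]
    if not rows:
        return table  # rows are already grouped, or there are none
    # Remember where the first row sat so the tbody takes its place.
    position = list(table).index(rows[0])
    tbody = ET.Element("tbody")
    for row in rows:
        table.remove(row)
        tbody.append(row)
    table.insert(position, tbody)
    return table


table = ET.fromstring("<table><tr><td>1</td></tr><tr><td>2</td></tr></table>")
wrap_rows_in_tbody(table)

# A path copied from the browser (with /tbody) now matches.
cells = [td.text for td in table.findall("./tbody/tr/td")]
```

Normalizing the tree this way is one option if you want to keep XPath expressions copied from the browser unchanged; the union queries shown earlier avoid modifying the tree at all.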

Browser sometimes adds in tbody

When creating a table element with JavaScript, you need to insert the tbody element in order to match the structure of the table element as created by the HTML parser. The HTML markup need not have tbody tags, but the tbody element is there.

So you simply need to modify the JavaScript code so that after creating the table element, it creates a tbody element, makes it a child of the table, and later makes all tr elements children of the tbody, not the table.

Trying to minimize the table height (TABLE, TBODY and offset)

If I understood your question correctly, it looks like you want to do this on all your table elements:

padding: 0px;
border-collapse: collapse;

http://www.w3schools.com/Css/pr_tab_border-collapse.asp

In general using a good reset.css should help you.

Colgroup tag in code but missing from view source or debugger tool

The following question, "Table caption does not show when it is runat=server", explains that:

A complex table model is not supported. You cannot have an HtmlTable control with nested caption, col, colgroup, tbody, thead, or tfoot elements. These elements are removed without warning and do not appear in the output HTML. MSDN

When I create the following HTML:

<table border="1">
<colgroup>
<col span="2" style="background-color:orange"></col>
</colgroup>
<tr>
<td>column 1</td>
<td>column 2</td>
<td>column 3</td>
</tr>
</table>

The colgroup/col tags are still there.

My best guess is that since your table tag is being used with runat="server", the server-side parser must be removing them. You can prove that by looking at the actual HTML source that is sent to the client; that is, use "view source" instead of looking at the generated DOM.

One of the reasons I can't stand this mix of server and client code writing my HTML for me....

Creating tables with jQuery without tbody

Why would you try to remove the tbody? Your browser is adding the part of valid HTML that you are missing. Isn't that nice?

jQuery says tbody.length = 1 even though no tbody tag is present

The tbody is added to the table automatically. You can see that it is there by right-clicking the table and inspecting the element in Chrome Developer Tools:

<table id="test">
<tbody>
<tr>
<td>some table cell</td>
</tr>
</tbody>
</table>
