Regex Matching Table Rows in HTML

Regular expression to pick a row in an HTML table containing desired text

I suggest you split up the string with strsplit and use contains for the filtering, which is a lot more readable and maintainable than a regex pattern:

htmlString = ['<tr><td><a href="blu">blu</a></td><td>value</td></tr><tr><td><a ',...
'href="bla">findme</a></td><td>value</td></tr><tr><td><a ',...
'href="ble">ble</a></td><td>value</td></tr>'];

keyword = 'findme';
splitStrings = strsplit(htmlString,'<tr>');
desiredRow = ['<tr>' splitStrings{contains(splitStrings,keyword)}]

The output is:

<tr><td><a href="bla">findme</a></td><td>value</td></tr>

Alternatively you may also combine extractBetween and contains:

allRows = extractBetween(htmlString,'<tr>','</tr>');
desiredRow = ['<tr>' allRows{contains(allRows,keyword)} '</tr>']

If you must use regex:

regexp(htmlString,['<tr><td>[^>]+>' keyword '.*?<\/tr>'],'match')
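
For readers working outside MATLAB, the same two ideas (split and filter, or a regex built around the keyword) can be sketched in JavaScript. This is only an illustration: the function names findRows, escapeRegExp and findRowsWithRegex are made up for this sketch, and the sample string is modelled on the one above. Escaping the keyword is an addition that the MATLAB one-liner does not do; it only matters when the keyword contains regex metacharacters.

```javascript
// Sample input modelled on the htmlString example above.
const htmlString =
  '<tr><td><a href="blu">blu</a></td><td>value</td></tr>' +
  '<tr><td><a href="bla">findme</a></td><td>value</td></tr>' +
  '<tr><td><a href="ble">ble</a></td><td>value</td></tr>';

// Split on the row opener, keep the pieces containing the keyword,
// and put the '<tr>' that split() consumed back in front.
function findRows(html, keyword) {
  return html
    .split('<tr>')
    .filter((piece) => piece.includes(keyword))
    .map((piece) => '<tr>' + piece);
}

// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Regex variant, same shape as the MATLAB pattern above.
function findRowsWithRegex(html, keyword) {
  const pattern = new RegExp(
    '<tr><td>[^>]+>' + escapeRegExp(keyword) + '.*?</tr>',
    'g'
  );
  return html.match(pattern) || [];
}

console.log(findRows(htmlString, 'findme'));
console.log(findRowsWithRegex(htmlString, 'findme'));
```

Both calls should return a single-element array holding the row that contains findme. The split version matches the keyword anywhere in the row, while the regex version only matches it as the text of the link in the first cell.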

Regex - can't get HTML table rows from string created from file, but works from string in code

Try to use

<tr>((.|\n)*)</tr>

or

<tr>((.|\n|\r)*)</tr>

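By default '.' does not match line-break characters, so a pattern built on .* cannot span the multiple lines of a string read from a file.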
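
A small JavaScript sketch of that behaviour, assuming a multi-line input such as one read from a file (the sample string is made up). It compares the plain dot, the alternation suggested above, and the s (dotAll) flag that JavaScript has supported since ES2018.

```javascript
// Made-up multi-line input, as it might look after reading a file.
const html = '<table>\n<tr>\n<td>one</td>\n</tr>\n</table>';

const dotOnly = /<tr>(.*)<\/tr>/;              // '.' stops at line breaks
const dotOrNewline = /<tr>((.|\n|\r)*)<\/tr>/; // the alternation suggested above
const dotAll = /<tr>(.*)<\/tr>/s;              // 's' flag lets '.' match line breaks

console.log(dotOnly.test(html));      // false
console.log(dotOrNewline.test(html)); // true
console.log(dotAll.test(html));       // true
```

In other regex flavours the equivalent switch usually goes by a name such as "single line" or "dotall" mode, so it is worth checking the documentation of the engine in use.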

Regex match just first result or second from HTML table

You could do a regular expression something like this which retrieves the text you expect...

[Image: regular expression run results]

Essentially matching the tags by excluding <> characters within the tag.
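
The pattern from the original answer is not reproduced here, so the following is only a sketch of the idea described: match each td by excluding < and > inside the tag and inside the cell text. The sample markup and the names cellPattern and cellTexts are assumptions made for illustration.

```javascript
// Made-up sample markup with three cells in one row.
const html =
  '<table><tr>' +
  '<td class="center">hello1</td>' +
  '<td class="center">hello2</td>' +
  '<td class="center">hello3</td>' +
  '</tr></table>';

// [^<>]* keeps each part of the match inside one tag or one text run.
const cellPattern = /<td[^<>]*>([^<>]*)<\/td>/g;

// Collect capture group 1 (the cell text) from every match.
const cellTexts = Array.from(html.matchAll(cellPattern), (m) => m[1]);

console.log(cellTexts[0]); // hello1
console.log(cellTexts[1]); // hello2
```

Picking the first or second result is then a matter of indexing into the array. Cells that contain nested tags are not matched by this pattern, which is one reason it is more fragile than a DOM query.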

However, I think an easier and less error-prone way to do this with JavaScript and the DOM is to use querySelectorAll instead, for example...

let c = document.querySelectorAll('td:nth-of-type(2)');

console.log(Array.from(c).map(i => i.innerText));
<table>
<td class="bar">
<td class="center">hello1</td>
<td class="center">hello2</td>
<td class="center">hello3</td>
</td>
</table>

I want to color HTML table rows based on regex and condition

The issue is that the firstCol variable holds an entire jQuery object. From the context of your question it appears that you expect it to hold the innerText of the td you select, in which case you need to call text().

In addition you need to use the this keyword within the each() loop to refer to the current tr only. Your current logic would add the class to every tr in the DOM.

$(function() {
  $("tr").each(function() {
    var firstCol = $(this).find("td:first").text();
    if (firstCol.match(/^[0-9]*[0369]$/)) {
      $(this).addClass('row1');
    } else if (firstCol.match(/^[0-9]*[147]$/)) {
      $(this).addClass('row2');
    } else if (firstCol.match(/^[0-9]*[258]$/)) {
      $(this).addClass('row3');
    }
  });
});
.row1 { background: #adcbe3; }
.row2 { background: #e7eff6; }
.row3 { background: #d0e1f9; }
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<table>
<tr><td>23433</td></tr>
<tr><td>23434</td></tr>
<tr><td>23435</td></tr>
<tr><td>23436</td></tr>
<tr><td>23444</td></tr>
</table>
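
The classification logic can also be pulled out of the jQuery callback into a plain function, which makes it easy to check without a page. The sketch below reuses the three regexes from the snippet above. The function name rowClassFor and the trim() call are additions for illustration, not part of the original answer.

```javascript
// Map the text of a row's first cell to a CSS class name based on
// its last digit, using the same three patterns as the jQuery code.
function rowClassFor(cellText) {
  const text = cellText.trim(); // tolerate whitespace around the number
  if (/^[0-9]*[0369]$/.test(text)) return 'row1';
  if (/^[0-9]*[147]$/.test(text)) return 'row2';
  if (/^[0-9]*[258]$/.test(text)) return 'row3';
  return null; // not a number, so no class is applied
}

// The five values from the example table.
const samples = ['23433', '23434', '23435', '23436', '23444'];
console.log(samples.map(rowClassFor));
// [ 'row1', 'row2', 'row3', 'row1', 'row2' ]
```

Inside the each() loop the call would then look something like $(this).addClass(rowClassFor(firstCol)), guarded against the null case.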

Regex pattern to get HTML table information


Rather than using RegExp to parse the HTML file, use a DOM parser.

The most straightforward way to do it is to add a reference to Microsoft HTML Object Library and use it. Getting to know the objects can be a little tricky, but not as tricky as trying to handle HTML using regular expressions!

The key is determining what rule you want to use to extract the value.

Here's an example that (hopefully) demonstrates the technique.

Public Sub SimpleParser()
Dim doc As MSHTML.HTMLDocument
Dim b As MSHTML.HTMLBody
Dim tr As MSHTML.HTMLTableRow, td As MSHTML.HTMLTableCell
Dim columnNumber As Long, rowNumber As Long
Dim trCells As MSHTML.IHTMLElementCollection
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = "<table><tr style='mso-yfti-irow:8' id=""row_65""> <td width=170 valign=top style='width:127.5pt;background:white; padding:3.75pt 3.75pt 3.75pt 3.75pt' id=""question_65""> <p class=MsoNormal><span style='mso-fareast-font-family:""Times New Roman""'>Shipment's weight<o:p></o:p></span></p> </td> <td style='background:white;padding:3.75pt 3.75pt 3.75pt 3.75pt' id=""value_65""> <p class=MsoNormal><span style='mso-fareast-font-family:""Times New Roman""'>40120<o:p></o:p></span></p> </td> </tr> <tr style='mso-yfti-irow:9' id=""row_116""> <td width=170 valign=top style='width:127.5pt;background:#F3F3F3; padding:3.75pt 3.75pt 3.75pt 3.75pt' id=""question_116""> <p class=MsoNormal><span style='mso-fareast-font-family:""Times New Roman""'>KG or LBS<o:p></o:p></span></p> </td> <td style='background:#F3F3F3;padding:3.75pt 3.75pt 3.75pt 3.75pt' id=""value_116""> <p class=MsoNormal><span style='mso-fareast-font-family:""Times New Roman""'>LBS<o:p></o:p></span></p> </td> </tr></table>"
Set b = doc.body
'Example of looping through elements
For Each tr In b.getElementsByTagName("tr")
    rowNumber = rowNumber + 1
    columnNumber = 0
    For Each td In tr.getElementsByTagName("td")
        columnNumber = columnNumber + 1
        Debug.Print rowNumber & "," & columnNumber, td.innerText
    Next
Next
'Go through each row; if the first cell is "Shipment's weight", display the next cell.
For Each tr In b.getElementsByTagName("tr")
    Set trCells = tr.getElementsByTagName("td")
    If trCells.Item(0).innerText = "Shipment's weight" Then Debug.Print "Weight: " & trCells.Item(1).innerText
Next

End Sub

Regex match table row of table have multi row

Regex quantifiers are greedy by default, so the pattern matches as much as possible before moving on to the </tr> portion of the regex. You can make a quantifier match reluctantly (lazily) by adding a ? after it, like:

<tr(?: class="odd")>[^]+?</tr>

That said, I absolutely agree with others that jQuery (among other tools) is almost certainly the better solution.
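
To see the difference between the greedy and the reluctant quantifier, here is a short JavaScript sketch on a made-up two-row input. Note that [^] (match any character, including line breaks) is specific to JavaScript; [\s\S] is a more portable way to write the same thing.

```javascript
// Made-up input: two rows on separate lines.
const html =
  '<tr class="odd"><td>a</td></tr>\n' +
  '<tr class="odd"><td>b</td></tr>';

const greedy = /<tr class="odd">[^]+<\/tr>/g; // runs on to the last </tr>
const lazy = /<tr class="odd">[^]+?<\/tr>/g;  // stops at the nearest </tr>

console.log(html.match(greedy).length); // 1
console.log(html.match(lazy).length);   // 2
```

The greedy pattern returns one match that swallows both rows, while the lazy one returns each row separately.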


