How Do HTML Parsers Work If They're Not Using Regexp

How do HTML parsers work if they're not using regexp?

Usually by using a tokeniser. The draft HTML5 specification has an extensive algorithm for handling "real world HTML".
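To make the idea of a tokeniser concrete, here is a minimal sketch in Python. It is not the HTML5 algorithm, which is far more elaborate; the function name `tokenize` and the token kinds `start`, `end` and `text` are made up for illustration, and the sketch assumes no comments, no `>` inside attribute values, and no special handling for script or style content.

```python
def tokenize(html):
    """Yield (kind, value) pairs, where kind is 'start', 'end' or 'text'.

    Illustrative only: attributes are discarded, and comments, doctypes
    and quoted '>' characters are not handled.
    """
    pos = 0
    n = len(html)
    while pos < n:
        if html[pos] == "<":
            close = html.find(">", pos)
            if close == -1:
                # Unterminated tag: treat the rest of the input as text.
                yield ("text", html[pos:])
                return
            body = html[pos + 1:close].strip()
            if body.startswith("/"):
                yield ("end", body[1:].strip().lower())
            else:
                # Keep only the tag name; drop attributes and a trailing "/".
                name = body.split()[0].rstrip("/").lower() if body else ""
                yield ("start", name)
            pos = close + 1
        else:
            # Everything up to the next "<" is a text token.
            nxt = html.find("<", pos)
            if nxt == -1:
                nxt = n
            yield ("text", html[pos:nxt])
            pos = nxt


tokens = list(tokenize("<p>Hi <b>there</b></p>"))
```

A real parser feeds tokens like these into a tree-construction stage that decides how elements nest and how to recover from malformed markup.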

Using regular expressions to parse HTML: why not?

Regular expressions cannot parse HTML in full, because parsing depends on matching each opening tag with its closing tag at arbitrary nesting depth, which regexps cannot do.

Regular expressions can only match regular languages, but HTML is a context-free language that is not regular. (As @StefanPochmann pointed out, regular languages are also context-free, so context-free does not by itself mean not regular.) The only thing you can do with regexps on HTML is apply heuristics, and these will not work in every case. For any given regular expression, it should be possible to construct an HTML file that it matches wrongly.
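A small Python sketch shows the difference. A regexp is used only to find individual tags, which is a regular task, while a stack tracks nesting. The names `TAG` and `is_balanced` are invented for this example, and the sketch assumes every element has an explicit closing tag, so void elements such as `<br>` would be reported as unbalanced.

```python
import re

# Finding a single tag is a regular task, so a regexp is fine for this part.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def is_balanced(html):
    """Return True if every opening tag is closed in the right order."""
    stack = []
    for match in TAG.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            # Closing tag with no matching opener on top of the stack.
            return False
    return not stack


nested = "<div>a<div>b</div>c</div>"

# A purely regexp-based attempt stops at the first "</div>" it sees,
# so it cuts the outer element short.
naive = re.search(r"<div>.*?</div>", nested).group(0)
```

Here `naive` is `<div>a<div>b</div>`, which is not a complete element, while `is_balanced(nested)` returns `True`. Making the pattern greedy does not help either: it would then merge two sibling elements into one match.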

RegEx match open tags except XHTML self-contained tags