Parse an HTML string with JS
Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.
var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements
Edit: adding a jQuery answer to please the fans!
var el = $( '<div></div>' );
el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");
$('a', el) // All the anchor elements
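Outside a browser there is no document object to attach the string to. As a rough comparison only, the same anchor lookup can be sketched in Python with the standard library's html.parser module. The class name AnchorCollector is a hypothetical helper made up for this illustration, not part of any library.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: records (href, text) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []     # finished (href, text) pairs
        self._inside = False  # True while an <a> element is open
        self._href = None     # href of the <a> currently open
        self._text = []       # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.anchors.append((self._href, "".join(self._text)))
            self._inside = False


html = ("<html><head><title>titleTest</title></head><body>"
        "<a href='test0'>test01</a><a href='test1'>test02</a>"
        "<a href='test2'>test03</a></body></html>")

collector = AnchorCollector()
collector.feed(html)
collector.close()
print(collector.anchors)
# [('test0', 'test01'), ('test1', 'test02'), ('test2', 'test03')]
```

Unlike the DOM approaches, this streams through the markup and keeps only what the handlers store, so it suits simple extraction rather than manipulation.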
Parsing an html tag with JavaScript
Parse the HTML and get value from the tag.
There are 2 methods:
- Using DOMParser:
var x = "<a>1234</a>";
var parser = new DOMParser();
var doc = parser.parseFromString(x, "text/html");
console.log(doc.querySelector('a').innerHTML); // logs "1234"
Parse HTML as a plain text via JavaScript
A pretty elegant solution is to use DOMParser.
const parser = new DOMParser()
const virtualDoc = parser.parseFromString(htmlString, 'text/html')
Then, treat virtualDoc like you'd treat any DOM document:
virtualDoc.getElementById('someid').value
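For readers working outside a browser, the getElementById lookup above can be approximated in Python with the standard library's html.parser. This is a minimal sketch under stated assumptions: the class ValueById and the sample markup in html_string are invented for the illustration, and only the id "someid" comes from the answer.

```python
from html.parser import HTMLParser


class ValueById(HTMLParser):
    """Hypothetical helper: stores the value attribute of the first element with a given id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False  # becomes True once the id has been seen
        self.value = None   # value attribute of that element, if it has one

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        attr_map = dict(attrs)
        if not self.found and attr_map.get("id") == self.wanted_id:
            self.found = True
            self.value = attr_map.get("value")


# Assumed sample input; any HTML string works here.
html_string = '<form><input id="someid" value="42"><input id="other" value="7"></form>'

finder = ValueById("someid")
finder.feed(html_string)
finder.close()
print(finder.value)  # 42
```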
How to parse html tag properties using bash and sed
Use sed:
sed -e 's/.*NAME-\([^"]*\)" value="\([^"]*\)".*/\1\t\2/' -e 's/\.//g' INPUT.HTML
- .* matches any character zero or more times
- [^"]* matches any character but ", repeated 0 or more times
- \(...\) captures the enclosed part; here, the substring after NAME- up to the double quote is remembered in \1 and the value is remembered in \2
- s/PATTERN/REPLACEMENT/ substitutes the pattern with the replacement; here, it extracts the part after NAME- and the value, and replaces the whole line with just the two captured parts separated by a tab (\t)
- s/\.//g deletes all dots (the /g means "global", i.e. all of them)
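To see the two substitutions at work on a concrete line, here is a minimal Python sketch of the same regular expression using the re module. The function name extract and the sample input line are assumptions made for the illustration; the real INPUT.HTML is not shown in the answer.

```python
import re

# Same pattern as the first sed expression; Python groups need no backslashes.
PATTERN = re.compile(r'.*NAME-([^"]*)" value="([^"]*)".*')


def extract(line):
    """Hypothetical helper mirroring both sed expressions for one line."""
    # First expression: keep only the two captured parts, separated by a tab.
    replaced = PATTERN.sub(r'\1\t\2', line, count=1)
    # Second expression: delete every dot.
    return replaced.replace(".", "")


# Assumed sample line, shaped like the input the sed command expects.
sample = '<input name="NAME-total" value="1.234.567">'
print(repr(extract(sample)))  # 'total\t1234567'
```

As with sed, a line that does not match the first pattern passes through unchanged except for the removal of dots.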
Is there a way to parse html tags with Python?
You can use a recursive generator function with BeautifulSoup:
import bs4
from bs4 import BeautifulSoup as soup
s = """
<div class="title">
<h1> Hello World </h1>
</div>
"""
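def get_tags(d):
    # Rebuild the attribute string; multi-valued attributes (e.g. class) come back as lists.
    ats = " ".join(a+"="+f'"{(b if not isinstance(b, list) else " ".join(b))}"' for a, b in d.attrs.items())
    h = f'<{d.name} {ats}>' if ats else f'<{d.name}>'
    # If the tag has child tags, yield its opening tag, recurse, then yield its closing tag.
    if (k:=[i for i in d.contents if isinstance(i, bs4.element.Tag)]):
        yield h
        yield from [j for l in k for j in get_tags(l)]
        yield f'</{d.name}>'
    else:
        yield f'{h}{d.text}</{d.name}>'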
print(list(get_tags(soup(s, 'html.parser').contents[1])))
Output:
['<div class="title">', '<h1> Hello World </h1>', '</div>']
How to parse text with html tags into plain text?
It is because tags in the form of <\/foo> are not HTML tags.
HTML tags have the form <foo> as the start tag and </foo> as the end tag.
Jsoup, being an HTML parser, only parses HTML tags. So, Jsoup only parses <foo> and </foo>.
You have 2 options:
- Find and replace e.g. <\/ by </ in your HTML string before feeding it to Jsoup.
- Or, if this HTML string does not come from a hardcoded source, figure out why it is corrupted like this and fix it there. I.e., step back and figure out how exactly this HTML string was entered, modified and/or saved in the place from where you obtained it. For example, if that process is unintentionally changing </foo> to <\/foo> because of e.g. a badly written HTML sanitizer, then that process needs to be fixed accordingly. Or, if the end-user is already literally specifying <\/foo> instead of </foo> during the user input entry, then you need to block the user input entry with a validation error and let the end-user fix it.
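The first option, find and replace before parsing, can be sketched as follows. The answer concerns Jsoup in Java; this Python version uses the standard library's html.parser instead, purely to illustrate the repair step followed by text extraction. The names TextExtractor and to_plain_text and the sample string are assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(raw):
    # Repair the escaped end tags (<\/foo> becomes </foo>) before parsing.
    repaired = raw.replace("<\\/", "</")
    extractor = TextExtractor()
    extractor.feed(repaired)
    extractor.close()
    return "".join(extractor.parts)


# Assumed sample input containing escaped end tags.
broken = "<p>Hello <b>world<\\/b>!<\\/p>"
print(to_plain_text(broken))  # Hello world!
```

The replacement is a workaround; if the escaping is introduced upstream, fixing that process as described in the second option remains the cleaner route.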
NSAttributedString: how to parse html tags and add attributes
OK, I found a solution; maybe it will help someone else.
Note that my strings are strictly multi-line, so it's easy to first split them, then add the needed font and size to each part, and then add paragraph styling.
I've played with the order of parsing tags, styling fonts, and styling paragraphs, and in every case something was missed. If you don't need to keep the lines separated in strict order, just skip the mapping. Otherwise, you can lose line breaks while styling or parsing tags.
descriptionLabel.attributedText = getAttributedDescriptionText(for: "Register and get <b>all</b>\n<b>rewards cards</b> of our partners\n<b>in one</b> universal <b>card</b>", fontDescription: "ProximaNova-Regular", fontSize: 15)
func getAttributedDescriptionText(for descriptionString: String, fontDescription: String, fontSize: Int) -> NSAttributedString? {
    let paragraphStyle = NSMutableParagraphStyle()
    paragraphStyle.lineSpacing = 1.0
    paragraphStyle.alignment = .center
    paragraphStyle.minimumLineHeight = 18.0
    let attributedString = NSMutableAttributedString()
    let splits = descriptionString.components(separatedBy: "\n")
    // Convert each line separately so the line breaks survive HTML parsing.
    for (index, string) in splits.enumerated() {
        let modifiedFont = String(format:"<span style=\"font-family: '\(fontDescription)'; font-size: \(fontSize)\">%@</span>", string)
        // Encode as UTF-8 so the data matches the .characterEncoding option below.
        let data = modifiedFont.data(using: String.Encoding.utf8, allowLossyConversion: true)
        let attr = try? NSMutableAttributedString(
            data: data ?? Data(),
            options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: String.Encoding.utf8.rawValue
            ],
            documentAttributes: nil
        )
        attributedString.append(attr ?? NSMutableAttributedString())
        // Re-insert the newline between lines, but not after the last one.
        if index < splits.count - 1 {
            attributedString.append(NSAttributedString(string: "\n"))
        }
    }
    attributedString.addAttribute(.paragraphStyle, value: paragraphStyle, range: NSRange(location: 0, length: attributedString.length))
    return attributedString
}