Parse an HTML String With JS

Parse an HTML string with JS

Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.

var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";

el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements

Edit: adding a jQuery answer to please the fans!

var el = $( '<div></div>' );
el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");

$('a', el) // All the anchor elements

Parse HTML as plain text via JavaScript

A pretty elegant solution is to use DOMParser.

const parser = new DOMParser()
const virtualDoc = parser.parseFromString(htmlString, 'text/html')

Then treat virtualDoc like you'd treat any other document:

virtualDoc.getElementById('someid').value
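getElementById returns null when no element has the given id, so reading .value directly can throw. Here is a minimal sketch of a guarded lookup; the helper name readValueById and the sample markup are assumptions made for illustration, not part of the original answer:

```javascript
// Hypothetical helper: read the value of a form control by id from a parsed document.
// Returns null when no element has that id instead of throwing a TypeError.
function readValueById(doc, id) {
  const el = doc.getElementById(id);
  return el ? el.value : null;
}

// DOMParser is a browser API, so this usage part only runs where it exists.
if (typeof DOMParser !== 'undefined') {
  const htmlString = '<input id="someid" value="42">'; // sample markup (assumption)
  const virtualDoc = new DOMParser().parseFromString(htmlString, 'text/html');
  console.log(readValueById(virtualDoc, 'someid'));  // "42"
  console.log(readValueById(virtualDoc, 'missing')); // null
}
```

Passing the document in as a parameter keeps the helper usable with any parsed document, not just one named virtualDoc.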

JavaScript: parse an HTML string and recognize parts

You can use DOMParser to parse the string into an HTML document. Create an instance, pass the string to parseFromString(), then iterate over the child nodes of the generated document's body with .forEach(). Note the check for "#text" as the node name: it identifies text nodes, as opposed to actual HTML elements:


let domparser = new DOMParser();
let doc = domparser.parseFromString('this is a string with <em>html</em> values', 'text/html');

doc.body.childNodes.forEach(function(node) {
    if (node.nodeName === "#text") {
        console.log(node.nodeValue);
    } else {
        console.log(node);
    }
});
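The same distinction can also be made with nodeType instead of nodeName. The sketch below labels each child node as text, element, or other; the helper name describeNodes and the label strings are assumptions for illustration only:

```javascript
// Numeric node types, equal to Node.ELEMENT_NODE and Node.TEXT_NODE in browsers.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: turn a list of nodes into short descriptive strings.
function describeNodes(nodes) {
  return Array.from(nodes, (node) => {
    if (node.nodeType === TEXT_NODE) return 'text: ' + node.nodeValue;
    if (node.nodeType === ELEMENT_NODE) return 'element: ' + node.nodeName.toLowerCase();
    return 'other'; // comments and other node types
  });
}

// DOMParser is a browser API, so this usage part only runs where it exists.
if (typeof DOMParser !== 'undefined') {
  const parsed = new DOMParser().parseFromString(
    'this is a string with <em>html</em> values',
    'text/html'
  );
  console.log(describeNodes(parsed.body.childNodes));
}
```

Array.from accepts a NodeList, so the helper returns a plain array that can be filtered or joined afterwards.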

Parse HTML String to DOM and convert it back to string

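DOMParser always gives you a document in return. Documents don't have an innerHTML property, but document.documentElement does, just like in a page's normal document object. Note that the result below includes the <head> and <body> elements the parser adds; body.innerHTML would return only the original fragment: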

const myHtmlString = '<p><span class="text">Hello World!</span></p>';
const htmlDom = new DOMParser().parseFromString(myHtmlString, 'text/html');
console.log(htmlDom.documentElement.innerHTML);

Parse Html string to specific Array without using DOM parser

Not sure if this is the most efficient way to do this but here is what I managed to do.

let data = "<p>Size: 5 cm</p><p>Weight: 30 g</p><p>Allows you to collect your hair easily.</p><p><br />Holds your hair, does not come out.</p><p>No more fussing with rubber buckles.</p>";

// regex to match content between the tags
const regex = /(?<=\>)(.*?)(?=\<)/g;

// found matches stored in array
let found = data.match(regex);

// final result will be stored here
let newData = {};

// removes empty strings
found = found.filter(item => item);

// check if index contains ":" then splits it and stores in a dictionary
for (let i = 0; i < found.length; i++) {
    if (found[i].includes(":")) {
        let temp = found[i].split(':');
        newData[temp[0].trimStart()] = temp[1].trimStart();
    }
}

console.log(newData);
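One caveat with the loop above: split(':') keeps only the part between the first and second colon, so a value such as "10:30" would be cut to "10". A possible variant that splits on the first colon only is sketched here; the function name extractPairs and the sample input are assumptions for illustration:

```javascript
// Hypothetical helper: collect "Label: value" pairs from the text between tags.
function extractPairs(html) {
  // Text that follows a ">" and runs up to the next "<" (same idea as the regex above).
  const chunks = html.match(/(?<=>)[^<]+(?=<)/g) || [];
  const pairs = {};
  for (const chunk of chunks) {
    const idx = chunk.indexOf(':');
    if (idx === -1) continue; // not a "Label: value" chunk
    const key = chunk.slice(0, idx).trim();
    const value = chunk.slice(idx + 1).trim(); // everything after the first colon
    pairs[key] = value;
  }
  return pairs;
}

console.log(extractPairs('<p>Size: 5 cm</p><p>Time: 10:30</p><p>No colon here.</p>'));
// { Size: '5 cm', Time: '10:30' }
```

Like the original, this relies on regex lookbehind, which older JavaScript engines do not support.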

Parse HTML String to Array

Use textContent to get the text out of an element. The word is in the strong child element; the definition is the rest of the text.

// data holds the HTML string to parse
var parser = new DOMParser();
var parsedHtml = parser.parseFromString(data, "text/html");
let pTags = parsedHtml.getElementsByTagName("p");
let vocab = [];
// getElementsByTagName returns an HTMLCollection, which has no forEach, so convert it to an array first
Array.from(pTags).forEach(function(item) {
    let word = item.getElementsByTagName("strong")[0].textContent.trim();
    let allText = item.textContent;
    let definition = allText.replace(word, "").trim();
    vocab.push({word: word, definition: definition});
});
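The snippet above throws if a paragraph has no strong child, because indexing the empty collection with [0] yields undefined. A more defensive variant might look like the sketch below; the function name extractVocab and the sample markup are assumptions, since the original input is not shown:

```javascript
// Hypothetical helper: build [{ word, definition }] from
// <p><strong>word</strong> definition</p> style markup.
function extractVocab(doc) {
  const vocab = [];
  // querySelectorAll returns a static NodeList, which works with for...of.
  for (const p of doc.querySelectorAll('p')) {
    const strong = p.querySelector('strong');
    if (!strong) continue; // skip paragraphs without a <strong> word
    const word = strong.textContent.trim();
    const definition = p.textContent.replace(strong.textContent, '').trim();
    vocab.push({ word: word, definition: definition });
  }
  return vocab;
}

// DOMParser is a browser API, so this usage part only runs where it exists.
if (typeof DOMParser !== 'undefined') {
  const sample = '<p><strong>apple</strong> a round fruit</p><p>no entry here</p>'; // assumed shape
  const parsed = new DOMParser().parseFromString(sample, 'text/html');
  console.log(extractVocab(parsed));
}
```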

Parse an HTML string into a document in JScript ES3

getElementById and getElementsByTagName seem to work:

var document = new ActiveXObject('htmlfile');
document.open();
document.write('<html><div id="div1" class="class1">test</div></html>');
document.close();

WScript.Echo(document.getElementById("div1").id);

WScript.Echo(document.getElementsByTagName("div")[0].id);

WScript.Echo(document.getElementsByTagName("div")[0].className);

test output


