Find All Text Nodes in HTML Page

Based on @kennebec's answer, a slightly tighter implementation of the same logic:

function textNodesUnder(node){
  var all = [];
  for (node=node.firstChild;node;node=node.nextSibling){
    if (node.nodeType==3) all.push(node);
    else all = all.concat(textNodesUnder(node));
  }
  return all;
}
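
The recursive version above builds a new array with concat at every level. As a minimal sketch (the name collectTextNodes and the out parameter are my own, not from the original answer), the same traversal can append to one shared array instead:

```javascript
// Hypothetical variant of textNodesUnder: same traversal order, but every
// call appends to one shared array instead of concatenating copies.
function collectTextNodes(node, out = []) {
  for (let child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === 3) {
      out.push(child);               // 3 === Node.TEXT_NODE
    } else {
      collectTextNodes(child, out);  // descend into any non-text node
    }
  }
  return out;
}
```

In a browser you would call it as collectTextNodes(document.body). It only relies on nodeType, firstChild and nextSibling, so it should behave the same on any DOM implementation.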

However, using createTreeWalker is far faster, tighter, and more elegant, because the browser filters out everything but the text nodes for you:

function textNodesUnder(el){
  var n, a=[], walk=document.createTreeWalker(el,NodeFilter.SHOW_TEXT,null,false);
  while(n=walk.nextNode()) a.push(n);
  return a;
}

How to select all text nodes after specific element

You can take the element's innerHTML, split it on <hr> to get the markup that follows the hr, and put that markup into a hidden div. You can then walk that div's child nodes to get your content:

var content = document.querySelector('.someclass').innerHTML;
content = content.split('<hr>');
content = content[1];
document.querySelector('.hide').innerHTML = content;

var nodes = document.querySelector('.hide').childNodes;
for (var i = 0; i < nodes.length; i++) {
  console.log(nodes[i].textContent);
}

.hide {
  display: none;
}

<div class="someclass">
  <h3>First</h3>
  <strong>Second</strong>
  <hr> Third
  <br> Fourth
  <br>
  <em></em> ...
</div>
<div class="hide"></div>
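
If you would rather not round-trip through innerHTML, a rough alternative is to walk the siblings that follow the hr directly. This is only a sketch: textNodesAfter is a hypothetical helper name, it assumes the hr and the text you want share the same parent, and it skips whitespace-only nodes.

```javascript
// Sketch: gather the non-blank text nodes that follow `marker` among its
// siblings. `marker` would typically be the <hr> element.
function textNodesAfter(marker) {
  const found = [];
  for (let node = marker.nextSibling; node; node = node.nextSibling) {
    // 3 === Node.TEXT_NODE; /\S/ rejects whitespace-only nodes
    if (node.nodeType === 3 && /\S/.test(node.nodeValue)) {
      found.push(node);
    }
  }
  return found;
}

// In a browser: textNodesAfter(document.querySelector('.someclass hr'))
```

Note that this only looks at direct siblings, so text nested inside a following element (such as the em in the sample markup) is not collected.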

Find all text nodes

Change this...

element.nodeValue != ''

to this...

/\S/.test(element.nodeValue)

This uses the /\S/ regex, which matches if the value contains at least one non-whitespace character.

You may need to define further what you mean by "words". I took it to mean that you're only excluding whitespace-only nodes.

In browsers that support String.prototype.trim, this would be an alternative...

element.nodeValue.trim() != ''
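
To make the difference concrete, here is a small sketch of the check wrapped in a predicate. The name hasVisibleText is hypothetical, and I'm assuming the argument is a DOM node (or anything with nodeType and nodeValue):

```javascript
// Hypothetical predicate: true only for text nodes that contain at least
// one non-whitespace character.
function hasVisibleText(node) {
  return node.nodeType === 3 && /\S/.test(node.nodeValue);
}

// A node holding only a line break and indentation passes the old
// `nodeValue != ''` test, but is rejected here.
```

You could then filter any list of nodes with nodes.filter(hasVisibleText).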

How can I get array of #text nodes of the html tree

You can recursively scan through the nodes and push the trimmed, non-empty value of each text node into an array.

const textNodes = []

function pushTextNode(node) {
  if (node.nodeName === "#text") {
    const nodeVal = node.nodeValue.trim();
    if (nodeVal) {
      textNodes.push(nodeVal);
    }
    return;
  }
  node.childNodes.forEach((childNode) => {
    pushTextNode(childNode)
  });
}

pushTextNode(document.querySelector("#root"));
console.log(textNodes);

<div id="root">
  <span>
    0
    <b>
      12<u>3</u>
    </b>
    <u>
      4<b>5</b>
    </u>
    <b>67</b>8<a href="#">9</a>
  </span>
</div>

How can I get only the content of text nodes (not nested tags) from an HTML tag element in JS?

You have a couple of options:

bob is in a Text node in the div. You can't select a Text node directly, but you can access it via the childNodes on its parent (or nextSibling on the span in front of it, etc.):

const div = document.getElementById("mydiv");
console.log("`nodeValue` of each text node in the div:");
for (const child of div.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
        console.log(child.nodeValue);
    }
}

<div id="mydiv">
     <span>foo</span>
     <span>bar</span>
     bob
</div>

JavaScript find all text nodes and return in a string

In newer browsers you can use textContent for this.

alert(document.getElementById('divText').textContent);

In older browsers you need to walk through the DOM with .childNodes and test whether each node has nodeType === 3 (a text node) or nodeType === 1 (an element, which you need to traverse recursively).

You will also need this approach when you have to filter out whitespace-only nodes, such as the line breaks between tags.
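
A minimal sketch of that manual walk might look like the following. The function name textOf is my own, and skipping whitespace-only nodes is an assumption about what you want; drop the regex test to keep them.

```javascript
// Sketch: concatenate the values of all text nodes under `node`,
// skipping whitespace-only ones. Written in ES5 style for older browsers.
function textOf(node) {
  var result = '';
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === 3) {
      // text node: keep it only if it has a non-whitespace character
      if (/\S/.test(child.nodeValue)) result += child.nodeValue;
    } else if (child.nodeType === 1) {
      // element: recurse into its children
      result += textOf(child);
    }
    // other node types (comments, etc.) are ignored
  }
  return result;
}

// In a browser: alert(textOf(document.getElementById('divText')));
```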

Find text (child text nodes) and wrap it in a paragraph

You can use replaceWith() to replace each text node with a paragraph element:

function filterTextNode(node) {
  var textNodes = [];
  for (node = node.firstChild; node; node = node.nextSibling) {
    if (node.nodeType == 3 && node.textContent.trim()) textNodes.push(node);
    else textNodes = textNodes.concat(filterTextNode(node));
  }
  return textNodes;
}

function wrapTextNode(text) {
  var p = document.createElement('p');
  // textContent (not innerHTML) so the text is never parsed as markup
  p.textContent = text.textContent;
  text.replaceWith(p);
}

const fn = (doc) => {
  var textNodes = filterTextNode(doc);
  textNodes.forEach(text => {
    // leave text that already sits directly inside a <p> alone
    if (text.parentNode.tagName != 'P') {
      wrapTextNode(text);
    }
  });
}

fn(document.body)

p {
  color: red;
}

Text1
<div>
  <div>
    Text2
    <div>Text3</div>
  </div>
  <div>Text4</div>
  <p>Text5</p>
</div>
<div>Text6</div>
<div>
  Text7
  <p>Text8</p>
  Text9
</div>

How to get the text node of an element?

var text = $(".title").contents().filter(function() {
  return this.nodeType == Node.TEXT_NODE;
}).text();

This gets the contents of the selected element, and applies a filter function to it. The filter function returns only text nodes (i.e. those nodes with nodeType == Node.TEXT_NODE).
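
If you are not using jQuery, roughly the same thing can be done with plain DOM methods. This is a sketch under the assumption that you want the direct child text nodes concatenated into one string; ownText is a hypothetical name.

```javascript
// Sketch: concatenate the values of the direct child text nodes of `el`,
// ignoring text inside nested elements.
function ownText(el) {
  return Array.from(el.childNodes)
    .filter((node) => node.nodeType === 3)   // 3 === Node.TEXT_NODE
    .map((node) => node.nodeValue)
    .join('');
}

// In a browser: var text = ownText(document.querySelector('.title'));
```

Unlike the jQuery version, this handles a single element rather than every element matching the selector.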

How can I find all text nodes between two element nodes with JavaScript/jQuery?

The following works in all major browsers using DOM methods and no library. It also ignores whitespace text nodes as mentioned in the question.

Obligatory jsfiddle: http://jsfiddle.net/timdown/a2Fm6/

function getTextNodesBetween(rootNode, startNode, endNode) {
    var pastStartNode = false, reachedEndNode = false, textNodes = [];

    function getTextNodes(node) {
        if (node == startNode) {
            pastStartNode = true;
        } else if (node == endNode) {
            reachedEndNode = true;
        } else if (node.nodeType == 3) {
            if (pastStartNode && !reachedEndNode && !/^\s*$/.test(node.nodeValue)) {
                textNodes.push(node);
            }
        } else {
            for (var i = 0, len = node.childNodes.length; !reachedEndNode && i < len; ++i) {
                getTextNodes(node.childNodes[i]);
            }
        }
    }

    getTextNodes(rootNode);
    return textNodes;
}

var x = document.getElementById("x"),
    y = document.getElementById("y");

var textNodes = getTextNodesBetween(document.body, x, y);
console.log(textNodes);