Truncate text containing HTML, ignoring tags

Assuming you are using valid XHTML, it's simple to parse the HTML and make sure tags are handled properly. You simply need to track which tags have been opened so far, and make sure to close them again "on your way out".

<?php
header('Content-type: text/plain; charset=utf-8');

function printTruncated($maxLength, $html, $isUtf8=true)
{
    $printedLength = 0;
    $position = 0;
    $tags = array();

    // For UTF-8, we need to count multibyte sequences as one character.
    // Tag names may contain digits (for example h1), so allow them after the first letter.
    $re = $isUtf8
        ? '{</?([a-z][a-z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;|[\x80-\xFF][\x80-\xBF]*}'
        : '{</?([a-z][a-z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}';

    while ($printedLength < $maxLength && preg_match($re, $html, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag.
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + strlen($str) > $maxLength)
        {
            print(substr($str, 0, $maxLength - $printedLength));
            $printedLength = $maxLength;
            break;
        }

        print($str);
        $printedLength += strlen($str);
        if ($printedLength >= $maxLength) break;

        if ($tag[0] == '&' || ord($tag) >= 0x80)
        {
            // Pass the entity or UTF-8 multibyte sequence through unchanged.
            print($tag);
            $printedLength++;
        }
        else
        {
            // Handle the tag.
            $tagName = $match[1][0];
            if ($tag[1] == '/')
            {
                // This is a closing tag.

                $openingTag = array_pop($tags);
                assert($openingTag == $tagName); // check that tags are properly nested.

                print($tag);
            }
            else if ($tag[strlen($tag) - 2] == '/')
            {
                // Self-closing tag.
                print($tag);
            }
            else
            {
                // Opening tag.
                print($tag);
                $tags[] = $tagName;
            }
        }

        // Continue after the tag.
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text.
    if ($printedLength < $maxLength && $position < strlen($html))
        print(substr($html, $position, $maxLength - $printedLength));

    // Close any open tags.
    while (!empty($tags))
        printf('</%s>', array_pop($tags));
}


printTruncated(10, '<b>&lt;Hello&gt;</b> <img src="world.png" alt="Sample Image" /> world!'); print("\n");

printTruncated(10, '<table><tr><td>Heck, </td><td>throw</td></tr><tr><td>in a</td><td>table</td></tr></table>'); print("\n");

printTruncated(10, "<em><b>Hello</b>w\xC3\xB8rld!</em>"); print("\n");

Encoding note: The above code assumes the XHTML is UTF-8 encoded. ASCII-compatible single-byte encodings (such as Latin-1) are also supported; just pass false as the third argument. Other multibyte encodings are not supported, though you may hack in support by using mb_convert_encoding to convert to UTF-8 before calling the function, then converting back again in every print statement.

(You should always be using UTF-8, though.)
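To illustrate why multibyte sequences have to be counted as one character, here is a minimal sketch in JavaScript (used purely for illustration, not PHP) that compares the UTF-8 byte length of a string with its character count. The helper names utf8ByteLength and codePointLength are hypothetical.

```javascript
// Illustrative only: compares UTF-8 byte length with code point count.
// utf8ByteLength and codePointLength are hypothetical helper names.
function utf8ByteLength(text) {
  // TextEncoder always encodes to UTF-8.
  return new TextEncoder().encode(text).length;
}

function codePointLength(text) {
  // Spreading a string iterates by Unicode code point, not by UTF-16 unit.
  return [...text].length;
}

// Same text as "w\xC3\xB8rld!" in the PHP test call above.
const sample = "w\u00f8rld!";
console.log(utf8ByteLength(sample));  // 7
console.log(codePointLength(sample)); // 6
```

A byte-based length check would treat the two-byte character as two characters, which is the miscount the UTF-8 branch of the pattern avoids.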

Edit: Updated to handle character entities and UTF-8. Fixed bug where the function would print one character too many, if that character was a character entity.

Truncate string to certain amount of characters, ignoring HTML

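Yes, character_limiter(strip_tags($text), 54); should work for you (character_limiter is part of CodeIgniter's text helper). Note that the result is plain text: all markup is removed.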
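A rough sketch of the same strip-then-limit idea, written in JavaScript for illustration. The stripTags and limitChars helpers are hypothetical stand-ins, not the PHP or CodeIgniter functions: the regex strip is only adequate for simple, well-formed markup, and this version cuts at an exact character count, which may differ from how character_limiter treats word boundaries.

```javascript
// Hypothetical helpers; a naive regex strip is only suitable for simple markup.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

function limitChars(text, maxLength, suffix = "…") {
  const chars = [...text]; // count by code point
  if (chars.length <= maxLength) {
    return text; // short enough, nothing to cut
  }
  return chars.slice(0, maxLength).join("") + suffix;
}

const html = '<p>I really like the <a href="#">Google</a> search engine.</p>';
console.log(limitChars(stripTags(html), 20)); // I really like the Go…
```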

Truncate string with HTML tags in it

How about a function? Here's mine: AbstractHTMLContents. It takes two parameters:

  • input HTML content,
  • limit.

Here's the code:

function AbstractHTMLContents($html, $maxLength=100){
    mb_internal_encoding("UTF-8");
    $printedLength = 0;
    $position = 0;
    $tags = array();
    $newContent = '';

    // Remove images before truncating.
    $html = preg_replace("/<img[^>]+\>/i", "", $html);

    while ($printedLength < $maxLength && preg_match('{</?([a-z][a-z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}', $html, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($tag, $tagPosition) = $match[0];
        // Collect text leading up to the tag. preg_match offsets are byte offsets, so use substr here.
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + mb_strlen($str) > $maxLength){
            // Cut by characters (not bytes), then drop the trailing partial word.
            $newstr = mb_substr($str, 0, $maxLength - $printedLength);
            $newstr = preg_replace('~\s+\S+$~', '', $newstr);
            $newContent .= $newstr;
            $printedLength = $maxLength;
            break;
        }
        $newContent .= $str;
        $printedLength += mb_strlen($str);
        if ($tag[0] == '&') {
            // Handle the entity.
            $newContent .= $tag;
            $printedLength++;
        } else {
            // Handle the tag.
            $tagName = $match[1][0];
            if ($tag[1] == '/') {
                // This is a closing tag.
                $openingTag = array_pop($tags);
                assert($openingTag == $tagName); // check that tags are properly nested.
                $newContent .= $tag;
            } else if ($tag[strlen($tag) - 2] == '/'){
                // Self-closing tag.
                $newContent .= $tag;
            } else {
                // Opening tag.
                $newContent .= $tag;
                $tags[] = $tagName;
            }
        }

        // Continue after the tag (byte offset).
        $position = $tagPosition + strlen($tag);
    }

    // Append any remaining text.
    if ($printedLength < $maxLength && $position < strlen($html))
    {
        $newstr = mb_substr(substr($html, $position), 0, $maxLength - $printedLength);
        $newstr = preg_replace('~\s+\S+$~', '', $newstr);
        $newContent .= $newstr;
    }

    // Close any open tags.
    while (!empty($tags))
    {
        $newContent .= sprintf('</%s>', array_pop($tags));
    }

    return $newContent;
}

It seems to give the result you expect.
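The function above uses the pattern \s+\S+$ to drop a trailing partial word after cutting. The following JavaScript sketch (the name dropTrailingWord is hypothetical) applies the same pattern in isolation to show what it removes.

```javascript
// Same pattern as the PHP preg_replace('~\s+\S+$~', '', ...) call.
// dropTrailingWord is a hypothetical name used for illustration.
function dropTrailingWord(text) {
  // Removes the last run of whitespace plus the non-space run that ends the string.
  return text.replace(/\s+\S+$/, "");
}

console.log(dropTrailingWord("Hello wonderful wor")); // Hello wonderful
console.log(dropTrailingWord("Hello wonderful"));     // Hello
console.log(dropTrailingWord("Hello"));               // Hello
```

Note the second case: the pattern cannot tell a partial word from a complete one, so the last word is removed even when the cut happens to fall on a word boundary.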

Truncate text without truncating HTML

Alright so this is what I put together and it seems to be working:

function truncate_html($string, $length, $postfix = '…', $isHtml = true) {
    $string = trim($string);
    $postfix = (strlen(strip_tags($string)) > $length) ? $postfix : '';
    $i = 0;
    $tags = []; // change to array() if php version < 5.4

    if($isHtml) {
        preg_match_all('/<[^>]+>([^<]*)/', $string, $tagMatches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
        foreach($tagMatches as $tagMatch) {
            if ($tagMatch[0][1] - $i >= $length) {
                break;
            }

            $tag = substr(strtok($tagMatch[0][0], " \t\n\r\0\x0B>"), 1);
            if ($tag[0] != '/') {
                $tags[] = $tag;
            }
            elseif (end($tags) == substr($tag, 1)) {
                array_pop($tags);
            }

            // Add the length of this tag to the running total of markup bytes.
            $i += $tagMatch[1][1] - $tagMatch[0][1];
        }
    }

    // Cut at the requested text length plus the length of the tags seen so far.
    $cut = min(strlen($string), $length + $i);
    // Close any tags that are still open, innermost first.
    $closing = count($tags) ? '</' . implode('></', array_reverse($tags)) . '>' : '';

    return substr($string, 0, $cut) . $closing . $postfix;
}

Usage:

truncate_html('<p>I really like the <a href="http://google.com">Google</a> search engine.</p>', 24);

The function was taken from the following page, with a small modification:

http://www.dzone.com/snippets/truncate-text-preserving-html

How can I truncate the text contents of an Element while preserving HTML?

It sounds like you'd like to be able to truncate the length of your HTML string as a text string, for example consider the following HTML:

'<b>foo</b> bar'

In this case the HTML is 14 characters in length and the text is 7. You would like to be able to truncate it to X text characters (for example 2) so that the new HTML is now:

'<b>fo</b>'

Disclosure: My answer uses a library I developed.

You could use the HTMLString library (Docs, GitHub).

The library makes this task pretty simple. To truncate the HTML as we've outlined above (e.g. to 2 text characters) using HTMLString, you'd use the following code:

var myString = new HTMLString.String('<b>foo</b> bar');
var truncatedString = myString.slice(0, 2);
console.log(truncatedString.html());

EDIT: After additional information from the OP.

The following truncate function truncates to the last full tag and caters for nested tags.

function truncate(str, len) {
    // Convert the string to a HTMLString
    var htmlStr = new HTMLString.String(str);

    // Check the string needs truncating.
    // Return the HTMLString so callers can always call .html() on the result.
    if (htmlStr.length() <= len) {
        return htmlStr;
    }

    // Find the closing tag for the character we are truncating to
    var tags = htmlStr.characters[len - 1].tags();
    var closingTag = tags[tags.length - 1];

    // Find the last character to contain this tag
    for (var index = len; index < htmlStr.length(); index++) {
        if (!htmlStr.characters[index].hasTags(closingTag)) {
            break;
        }
    }

    return htmlStr.slice(0, index);
}

var myString = 'This is an <b>example ' +
    '<a href="link">of a link</a> ' +
    'inside</b> another element';

console.log(truncate(myString, 23).html());
console.log(truncate(myString, 18).html());

This will output:

This is an <b>example <a href="link">of a link</a></b>
This is an <b>example <a href="link">of a link</a> inside</b>

Rails Truncate Method: Ignore html in a string in Length Count

For those interested, to get an accurate character count that ignores links and other HTML tags, one can do:

 count = strip_tags(string).length

(This is for a string that has HTML tags in it. If the string needs to be 'textilized' etc. first, then the code is count = strip_tags(textilize(string)).length.)

Rather than use truncate, I just limited this true count to 140 characters, i.e. I switched to a validation on the field.

Trim string to length ignoring HTML

Start at the first character of the post, stepping over each character. Every time you step over a character, increment a counter. When you find a '<' character, stop incrementing the counter until you hit a '>' character. Your position when the counter gets to 250 is where you actually want to cut off.

Take note that this will have another problem that you'll have to deal with when an HTML tag is opened but not closed before the cutoff.
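The counting approach described above, together with the unclosed-tag problem, can be sketched roughly as follows in JavaScript. This is an illustration under stated assumptions, not a robust parser: it assumes well-formed, XHTML-style markup (void elements written as self-closing tags), no "<" or ">" inside attribute values, and it counts an entity such as &amp; as several characters. The name truncateHtml is hypothetical.

```javascript
// A sketch under stated assumptions: well-formed XHTML-style markup,
// no "<" or ">" inside attribute values, entities counted as plain text.
// truncateHtml is a hypothetical name.
function truncateHtml(html, maxChars) {
  const openTags = []; // stack of currently open tag names
  let count = 0;       // visible characters copied so far (UTF-16 units)
  let out = "";
  let i = 0;

  while (i < html.length && count < maxChars) {
    if (html[i] === "<") {
      const end = html.indexOf(">", i);
      if (end === -1) break; // malformed tag: stop copying
      const tag = html.slice(i, end + 1);
      const name = (tag.match(/^<\/?([a-zA-Z0-9]+)/) || [])[1];
      if (tag.startsWith("</")) {
        openTags.pop();        // closing tag
      } else if (name && !tag.endsWith("/>")) {
        openTags.push(name);   // opening tag that is not self-closing
      }
      out += tag;              // tags are copied but never counted
      i = end + 1;
    } else {
      out += html[i];          // ordinary character: copy and count it
      count++;
      i++;
    }
  }

  // Close whatever is still open, innermost first.
  while (openTags.length > 0) {
    out += "</" + openTags.pop() + ">";
  }
  return out;
}

console.log(truncateHtml("<b>foo</b> bar", 2)); // <b>fo</b>
```

Keeping a stack of open tag names is what lets the function append the missing closing tags once the counter reaches the limit.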

How to truncate HTML with special characters?

I tried

function truncateIfNecessary($string, $length) {
    if(strlen($string) > $length) {
        $string = html_entity_decode(strip_tags($string));
        $string = substr($string, 0, $length).'...';
        $string = htmlentities($string);
        return $string;
    } else {
        return strip_tags($string);
    }
}

but for some reason it missed a few special characters. For now, I found that the solution at http://alanwhipple.com/2011/05/25/php-truncate-string-preserving-html-tags-words/ (linked at Shortening text tweet-like without cutting links inside) worked perfectly: it handles HTML tags, preserves whole words (or not), and handles HTML entities. Now it's just:

function truncateIfNecessary($string, $length) {
    if(strlen($string) > $length) {
        return truncateHtml($string, $length, "...", true, true);
    } else {
        return strip_tags($string);
    }
}

truncating text, ignoring child nodes javascript

Just iterate text nodes:

$(".row-title").each(function() {
  var leng = 25;
  [].forEach.call(this.childNodes, function(child) {
    if(child.nodeType === 3) { // text node
      var txt = child.textContent.trim();
      if(txt.length > leng) {
        child.textContent = txt.substr(0, leng) + "…";
      }
    }
  });
});

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="oo-roboto-override row-title">
  <span class="hidden-lg-up" itemprop="name">
    Title:
  </span>
  This is the text that I want to truncate
</div>

