Ideal Method to Truncate a String with Ellipsis

I like the idea of letting "thin" characters count as half a character. Simple and a good approximation.

The main issue with most ellipsizing approaches, however, is (imho) that they chop off words in the middle. Here is a solution that takes word boundaries into account (but does not dive into pixel math or the Swing API).

private final static String NON_THIN = "[^iIl1\\.,']";

private static int textWidth(String str) {
    return (int) (str.length() - str.replaceAll(NON_THIN, "").length() / 2);
}

public static String ellipsize(String text, int max) {

    if (textWidth(text) <= max)
        return text;

    // Start by chopping off at the word before max
    // This is an over-approximation due to thin-characters...
    int end = text.lastIndexOf(' ', max - 3);

    // Just one long word. Chop it off.
    if (end == -1)
        return text.substring(0, max-3) + "...";

    // Step forward as long as textWidth allows.
    int newEnd = end;
    do {
        end = newEnd;
        newEnd = text.indexOf(' ', end + 1);

        // No more spaces.
        if (newEnd == -1)
            newEnd = text.length();

    } while (textWidth(text.substring(0, newEnd) + "...") < max);

    return text.substring(0, end) + "...";
}

A test of the algorithm looks like this:

[Sample image: test output of the algorithm]
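As a rough illustration of the thin-character idea, the width heuristic can also be sketched in JavaScript. This is a hypothetical port, not part of the original answer: the name textWidth mirrors the Java method, and the THIN pattern is an assumption derived from NON_THIN above. Every two thin characters count as one.

```javascript
// Assumed set of "thin" characters, mirroring the Java NON_THIN pattern.
const THIN = /[iIl1.,']/g;

// Approximate display width: each pair of thin characters counts as one.
function textWidth(str) {
  const thinCount = (str.match(THIN) || []).length;
  return str.length - Math.floor(thinCount / 2);
}

console.log(textWidth("illicit")); // 7 characters, 5 of them thin
console.log(textWidth("abc"));     // no thin characters
```

Like the Java version, this only approximates proportional-font widths; it does no pixel measurement.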

Smart way to truncate long strings

Essentially, you check the length of the given string. If it's longer than a given length n, clip it (substr or slice) and append the HTML entity &hellip; (…) to the clipped string.

Such a method looks like

function truncate(str, n){
  return (str.length > n) ? str.slice(0, n-1) + '…' : str;
};

If by 'more sophisticated' you mean truncating at the last word boundary of a string then you need an extra check.
First you clip the string to the desired length; next you clip the result to its last word boundary.

function truncate( str, n, useWordBoundary ){
  if (str.length <= n) { return str; }
  const subString = str.slice(0, n-1); // the original check
  return (useWordBoundary
    ? subString.slice(0, subString.lastIndexOf(" "))
    : subString) + "…";
};
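One edge case worth noting: when the clipped substring contains no space at all, lastIndexOf returns -1 and slice(0, -1) silently drops the last character. A minimal sketch that guards against this, using a hypothetical helper name truncateAtWord that is not part of the original answer:

```javascript
// Truncate at the last word boundary, falling back to a plain clip
// when the clipped text contains no space to break on.
function truncateAtWord(str, n) {
  if (str.length <= n) return str;
  const clipped = str.slice(0, n - 1); // leave room for the ellipsis
  const lastSpace = clipped.lastIndexOf(" ");
  const head = lastSpace > 0 ? clipped.slice(0, lastSpace) : clipped;
  return head + "…";
}

console.log(truncateAtWord("hello world foo", 10));
console.log(truncateAtWord("abcdefghijkl", 5));
```

Whether a single long word should be clipped mid-word or dropped entirely is a design choice; this sketch clips it.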

You can extend the native String prototype with your function. In that case the str parameter should be removed and str within the function should be replaced with this:

String.prototype.truncate = String.prototype.truncate ||
function ( n, useWordBoundary ){
  if (this.length <= n) { return this; }
  const subString = this.slice(0, n-1); // the original check
  return (useWordBoundary
    ? subString.slice(0, subString.lastIndexOf(" "))
    : subString) + "…";
};

More dogmatic developers may chide you strongly for that ("Don't modify objects you don't own"). I wouldn't mind, though.

An approach without extending the String prototype is to create
your own helper object, containing the (long) string you provide
and the aforementioned method to truncate it. That's what the snippet
below does.

const LongstringHelper = str => {
  const sliceBoundary = str => str.substr(0, str.lastIndexOf(" "));
  const truncate = (n, useWordBoundary) =>
    str.length <= n ? str : `${ useWordBoundary
      ? sliceBoundary(str.slice(0, n - 1))
      : str.slice(0, n - 1)}…`;
  return { full: str, truncate };
};
const longStr = LongstringHelper(`Lorem ipsum dolor sit amet, consectetur
adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore
magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute
irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum`);

const plain = document.querySelector("#resultTruncatedPlain");
const lastWord = document.querySelector("#resultTruncatedBoundary");
plain.innerHTML =
  longStr.truncate(+plain.dataset.truncateat, !!+plain.dataset.onword);
lastWord.innerHTML =
  longStr.truncate(+lastWord.dataset.truncateat, !!+lastWord.dataset.onword);
document.querySelector("#resultFull").innerHTML = longStr.full;

body {
  font: normal 12px/15px verdana, arial;
}

p {
  width: 450px;
}

#resultTruncatedPlain:before {
  content: 'Truncated (plain) n='attr(data-truncateat)': ';
  color: green;
}

#resultTruncatedBoundary:before {
  content: 'Truncated (last whole word) n='attr(data-truncateat)': ';
  color: green;
}

#resultFull:before {
  content: 'Full: ';
  color: green;
}

<p id="resultTruncatedPlain" data-truncateat="120" data-onword="0"></p>
<p id="resultTruncatedBoundary" data-truncateat="120" data-onword="1"></p>
<p id="resultFull"></p>

How can I truncate my strings with a ... if they are too long?

Here is the logic wrapped up in an extension method:

public static string Truncate(this string value, int maxChars)
{
    return value.Length <= maxChars ? value : value.Substring(0, maxChars) + "...";
}

Usage:

var s = "abcdefg";

Console.WriteLine(s.Truncate(3)); // prints "abc..." (the dots are added on top of maxChars)

How to shorten a string by replacing some characters with an ellipsis

If I understand correctly, n represents the maximum length of the string. If the string is longer, you want to replace a substring by the ellipsis character, such that the resulting string has a length of n. If possible, you also want that ellipsis to be placed before the extension of the filename, leaving one character visible before the final dot.

Some things to consider:

  • Use lastIndexOf instead of indexOf, as the last occurrence really determines where the file extension starts.
  • Don't use the substr method, as it is considered a legacy feature in ECMAScript. I prefer slice.
  • Make the necessary calculations to ensure that after the insertion of the ellipsis you arrive at the correct length.
  • Deal with boundary cases, where the filename has no extension, or where the extension itself is longer than n.
  • Don't use the HTML entity &hellip;, as that will limit the use of this function to HTML output only. Instead put the actual character (…), which will work for both HTML and plain text.

Here is the code I would suggest:

function truncate(str, n) {
  if (str.length <= n) return str; // Nothing to do
  if (n <= 1) return "…"; // Well... not much else we can return here!
  let dot = str.lastIndexOf("."); // Where the extension starts
  // How many characters from the end should remain:
  let after = dot < 0 ? 1 : Math.max(1, Math.min(n - 2, str.length - dot + 2));
  // How many characters from the start should remain:
  let before = n - after - 1; // Account for the ellipsis
  return str.slice(0, before) + "…" + str.slice(-after);
}
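To see how the boundary cases listed above play out, here is a small usage sketch. The function is repeated unchanged so the snippet runs on its own; the sample filenames are made up for illustration.

```javascript
function truncate(str, n) {
  if (str.length <= n) return str;
  if (n <= 1) return "…";
  const dot = str.lastIndexOf(".");
  const after = dot < 0 ? 1 : Math.max(1, Math.min(n - 2, str.length - dot + 2));
  const before = n - after - 1;
  return str.slice(0, before) + "…" + str.slice(-after);
}

// Hypothetical inputs, one per boundary case.
const samples = [
  ["document.txt", 8],        // ordinary filename with an extension
  ["abcdefghij", 5],          // no extension at all
  ["a.verylongextension", 6], // extension longer than n
  ["short.md", 20],           // already short enough
];

for (const [name, n] of samples) {
  const result = truncate(name, n);
  console.log(`${name} (n=${n}) -> ${result} [length ${result.length}]`);
}
```

In each case the result should be no longer than n, which is the property the length calculations are meant to guarantee.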

Python truncate a long string

info = (data[:75] + '..') if len(data) > 75 else data

Ellipsis in the middle of a text (Mac style)

In the HTML, put the full value in a custom data-* attribute like

<span data-original="your string here"></span>

Then assign load and resize event listeners to a JavaScript function which will read the original data attribute and place it in the innerHTML of your span tag. Here is an example of the ellipsis function:

function start_and_end(str) {
  if (str.length > 35) {
    return str.substr(0, 20) + '...' + str.substr(str.length-10, str.length);
  }
  return str;
}

Adjust the values or, if possible, make them dynamic where different objects need different lengths. If your users are on different browsers, you can take a reference width from a piece of text in the same font and size elsewhere in your DOM, then interpolate to an appropriate number of characters.
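If you want the cut-off values to be dynamic, one option is to pass the maximum length in as a parameter and split the remaining budget between the start and the end. A minimal sketch, assuming a hypothetical helper named middleEllipsis (not part of the original answer):

```javascript
// Keep the start and end of the string and replace the middle with an
// ellipsis so that the result is at most maxLength characters long.
function middleEllipsis(str, maxLength, ellipsis = "…") {
  if (str.length <= maxLength) return str;
  if (maxLength <= ellipsis.length) return ellipsis.slice(0, maxLength);
  const keep = maxLength - ellipsis.length; // characters of str to keep
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  // slice(-0) would return the whole string, so guard the tail.
  return str.slice(0, head) + ellipsis + (tail > 0 ? str.slice(-tail) : "");
}

console.log(middleEllipsis("abcdefghij", 7));
```

The even split is an arbitrary choice; you could just as well weight it toward the start, as the fixed 20/10 split above does.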

Another tip is to put an abbr tag on the ... (or on the whole message) so that the user can get a tooltip with the full string.

<abbr title="simple tool tip">something</abbr>

Trim a string based on the string length

s = s.substring(0, Math.min(s.length(), 10));

Using Math.min like this avoids an exception in the case where the string is already shorter than 10.


Notes:

  1. The above does simple trimming. If you actually want to replace the last characters with three dots if the string is too long, use Apache Commons StringUtils.abbreviate; see @H6's solution. If you want to use the Unicode horizontal ellipsis character, see @Basil's solution.

  2. For typical implementations of String, s.substring(0, s.length()) will return s rather than allocating a new String.

  3. This may behave incorrectly [1] if your String contains Unicode codepoints outside of the BMP; e.g. Emojis. For a (more complicated) solution that works correctly for all Unicode code-points, see @sibnick's solution.


[1] A Unicode codepoint that is not on plane 0 (the BMP) is represented as a "surrogate pair" (i.e. two char values) in the String. By ignoring this, we might trim the string to fewer than 10 code points, or (worse) truncate it in the middle of a surrogate pair. On the other hand, String.length() is not a good measure of Unicode text length, so trimming based on that property may be the wrong thing to do.
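The same pitfall exists in JavaScript, whose strings are also sequences of UTF-16 code units. As a minimal sketch (the helper name truncateCodePoints is hypothetical), iterating the string by code point avoids cutting a surrogate pair in half:

```javascript
// Truncate by Unicode code points instead of UTF-16 code units.
function truncateCodePoints(str, maxCodePoints) {
  const codePoints = Array.from(str); // string iteration yields whole code points
  if (codePoints.length <= maxCodePoints) return str;
  return codePoints.slice(0, maxCodePoints).join("");
}

const sample = "a😀b";
console.log(sample.slice(0, 2) === "a😀");               // false: the emoji is cut in half
console.log(truncateCodePoints(sample, 2) === "a😀");    // true
```

This still does not account for combining marks or multi-code-point emoji sequences, so it is not a full answer to the question of "visible length".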


