A Better Way to Replace Emoticons in PHP

A better way to replace emoticons in PHP?

You can use the preg_replace function with word boundaries in the regular expression.

foreach($icons as $icon=>$image) {
    // pass the delimiter so a "~" inside an emoticon gets escaped too
    $icon = preg_quote($icon, '~');
    $text = preg_replace("~\b$icon\b~",$image,$text);
}

You need to use word boundaries and not white space because this will take care of the start and end of the string too. Requiring a space before the emoticon means that a message containing just a :) won't be found.
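One thing to watch out for: \b only matches between a word character and a non-word character, so an emoticon made entirely of punctuation, such as :), only matches with \b when a letter or digit sits directly next to it. A minimal sketch of an alternative, assuming you want emoticons to count only when they are delimited by whitespace or the string edges, uses lookarounds instead. The function name replaceIcons and the image markup are made up for illustration.

```php
<?php
// Hypothetical emoticon map; the image markup is illustrative only.
$icons = [
    ':)' => '<img src="smile.png" alt="smile">',
    ':(' => '<img src="sad.png" alt="sad">',
];

function replaceIcons(string $text, array $icons): string
{
    foreach ($icons as $icon => $image) {
        // (?<!\S): not preceded by a non-space character (start of string or whitespace)
        // (?!\S):  not followed by a non-space character (end of string or whitespace)
        $pattern = '~(?<!\S)' . preg_quote($icon, '~') . '(?!\S)~';
        // A callback avoids "$1"-style sequences in $image being read as backreferences.
        $text = preg_replace_callback($pattern, fn() => $image, $text);
    }
    return $text;
}

echo replaceIcons(':) hello :(', $icons), PHP_EOL;
```

Because the lookarounds do not consume the surrounding spaces, two emoticons separated by a single space are both replaced.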

Emoticon Replacement - PHP

If you want to use a regex:

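$pat = '#(^|\W)'.preg_quote($this->laugh,'#').'($|\W)#';
// preg_replace, not str_replace: $pat is a regex. ${1} and ${2} put back the surrounding characters.
$str = preg_replace($pat, '${1}'.$this->t_laugh.'${2}', $str);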

This basically means the emoticon can be at the start of the string or preceded by a non-word character, and must be followed by the end of the string or another non-word character. preg_quote is necessary in case your emoticon contains any special regex characters.

Also, a better format might be:

$emoticons = array(
    'smile' => array('<img src...', array('>:]',':-)',...)),
    'laugh' => array('<img src....', array(...)),
    ...
);

Then you can loop over everything.
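As a rough sketch of what that loop could look like, assuming each entry is a two-element array of image markup followed by a list of emoticon variants: the function name replaceEmoticons and the image markup below are placeholders, not part of the original answer.

```php
<?php
// Illustrative data only: the image markup and emoticon lists are placeholders.
$emoticons = [
    'smile' => ['<img src="smile.png" alt="smile">', ['>:]', ':-)', ':)']],
    'laugh' => ['<img src="laugh.png" alt="laugh">', [':-D', ':D']],
];

function replaceEmoticons(string $str, array $emoticons): string
{
    // Each entry unpacks into its image markup and its list of variants.
    foreach ($emoticons as [$image, $variants]) {
        foreach ($variants as $variant) {
            $pat = '#(^|\W)' . preg_quote($variant, '#') . '($|\W)#';
            // Put the captured surrounding characters back around the image.
            $str = preg_replace_callback(
                $pat,
                fn(array $m) => $m[1] . $image . $m[2],
                $str
            );
        }
    }
    return $str;
}

echo replaceEmoticons('hi :) there', $emoticons), PHP_EOL;
```

The array keys ('smile', 'laugh') are not used by the loop itself; they just keep the definitions readable.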


Update

It is better to use negative lookarounds instead, so that side-by-side emoticons match. The pattern then no longer consumes the surrounding spaces.

<?php
$smile = array(">:]", ":-)", ":)", ":o)", ":]", ":3", ":c)", ":>", "=]", "8)", "=)", ":}", ":^)");
$laugh = array(">:D", ":-D", ":D", "8-D", "x-D", "X-D", "=-D", "=D", "=-3", "8-)");
$sad = array(">:[", ":-(", ":(", ":-c", ":c", ":-<", ":-[", ":[", ":{", ">.>", "<.<", ">.<");
$wink = array(">;]", ";-)", ";)", "*-)", "*)", ";-]", ";]", ";D", ";^)");
$tongue = array(">:P", ":-P", ":P", "X-P", "x-p", ":-p", ":p", "=p", ":-Þ", ":Þ", ":-b", ":b", "=p", "=P");
$surprise = array(">:o", ">:O", ":-O", ":O", "°o°", "°O°", ":O", "o_O", "o.O", "8-0");
$annoyed = array(">:\\", ">:/", ":-/", ":-.", ":\\", "=/", "=\\", ":S");
$cry = array(":'(", ";'(");

$ary = array_merge($smile, $laugh, $sad, $wink, $tongue,$surprise,$annoyed,$cry);

foreach ($ary as $a)
{
$quoted[] = preg_quote($a, '#');
}

$regex = implode('|', $quoted);

$full = '#(?<!\w)(' . $regex .')(?!\w)#'; // (?<!\w) is the negative lookbehind; (?!<\w) would be a lookahead for a literal "<"
echo $full.PHP_EOL;
$str = "Testing :) emoticons :D :(";

preg_match_all($full, $str, $matches);
print_r($matches[0]);
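To go from matching to replacing, one option is preg_replace_callback with a lookup table. This is only a sketch: the $map contents, the bracketed keywords and the function name replaceWithMap are assumptions for the example, and the pattern is built the same way as above, with a negative lookbehind and lookahead around the alternation.

```php
<?php
// Hypothetical mapping from emoticon to replacement keyword.
$map = [
    ':)'  => '[smile]',
    ':-)' => '[smile]',
    ':D'  => '[laugh]',
    ':('  => '[sad]',
];

// Try longer emoticons first so a short one never shadows a longer one it is a prefix of.
$keys = array_keys($map);
usort($keys, fn($a, $b) => strlen($b) <=> strlen($a));

$quoted = array_map(fn($k) => preg_quote($k, '#'), $keys);
$full = '#(?<!\w)(' . implode('|', $quoted) . ')(?!\w)#';

function replaceWithMap(string $str, string $regex, array $map): string
{
    // $m[1] is the emoticon that matched; look up its replacement.
    return preg_replace_callback($regex, fn(array $m) => $map[$m[1]], $str);
}

$result = replaceWithMap('Testing :) emoticons :D :(', $full, $map);
echo $result, PHP_EOL;
```

A single pass like this also means replacement text is never re-scanned for further emoticons.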

Also, try to use single quotes when writing regex patterns. Double-quoted strings interpret escape sequences (and interpolate variables), while single-quoted strings leave everything except \\ and \' alone. With double quotes you sometimes need to double your backslashes.
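A small illustration of how the quoting can bite, using a backreference in the replacement string (the input 'foo' is just a throwaway example):

```php
<?php
// In single quotes the backreference \1 reaches preg_replace() untouched.
$single = preg_replace('/(o)/', '\1\1', 'foo');

// In double quotes "\1" is an octal escape, so PHP turns it into the byte 0x01
// before preg_replace() ever sees it.
$double = preg_replace('/(o)/', "\1\1", 'foo');

var_dump($single);
var_dump(bin2hex($double));
```

The first call doubles every "o"; the second silently inserts control characters instead.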

Single PHP Function To Replace Text With Emoticons Or Filter Words

[edit] sorry I couldn't help but do it the way I would if it were my project. A repeatable non-redundant process.

$array = [
'<img src="emoticons/{{value}}" height="18" width="18">' => [
':)' => 'smile.png',
';)' => 'wink.png'
],
'<br>' => ['\n', '\r'],
'****' => ['4lettercussword', '4lettercussword'],
'*****' => '5lettercussword'
];

function filterText($array, &$msg) {
foreach($array as $key => $value) {
if(is_array($value)) {
if(array_keys($value) !== range(0, count($value) - 1)) {
foreach($value as $k => $v) {
$msg = str_replace($k, str_replace('{{value}}', $v, $key), $msg);
}
} else {
for($i = 0;$i < count($value);$i++) {
$msg = str_replace($value[$i], $key, $msg);
}
}
} else {
$msg = str_replace($value, $key, $msg);
}
}
}

$msg = '4lettercussword :) \n';
filterText($array, $msg);
echo $msg;

output:

**** <img src="emoticons/smile.png" height="18" width="18"> <br>

The key in the array is what will replace the value. If the key includes a {{value}} identifier then it knows the array pointed to will be associative, and that it needs to take the value from that array and plug it into the {{value}} identifier in your key. If any key maps to a simple array of values, it will replace any of those values with the key. This allows you to have different HTML tags and replace only portions of them with a key/value str_replace.

Match and replace emoticons in string - what is the most efficient way?

If the $string in which you want to replace emoticons is provided by a visitor to your site (I mean user input such as a comment), then you should not rely on there being a space before or after the emoticon. Also, there are at least a couple of emoticons that are very similar but different, like :-) and :-)).
So I think you will achieve a better result if you define your emoticon array like this:

$emoticons = array(
':-)' => '[HAPPY]',
':)' => '[HAPPY]',
':o)' => '[HAPPY]',
':-(' => '[SAD]',
':(' => '[SAD]',
...
)

Once you have filled in all the find/replace definitions, you should reorder this array so that there is no chance of :-)) being replaced as :-). I believe sorting the array by key length, longest first, will be enough. This only matters if you are going to use str_replace(); strtr() tries the longest key first automatically!
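Here is a quick sketch of the difference. The [VERY HAPPY] token and the sample sentence are made up for the demo, and :-)) is deliberately listed after its prefix :-) to show the problem.

```php
<?php
// Illustrative map; ':-))' is deliberately listed after its prefix ':-)'.
$emoticons = [
    ':-)'  => '[HAPPY]',
    ':-))' => '[VERY HAPPY]',
];

$string = 'Great news :-))';

// str_replace() applies the pairs in array order, so ':-)' wins and a stray ')' is left behind.
$naive = str_replace(array_keys($emoticons), array_values($emoticons), $string);

// Sorting the keys longest-first avoids that for str_replace().
$sorted = $emoticons;
uksort($sorted, fn($a, $b) => strlen($b) <=> strlen($a));
$ordered = str_replace(array_keys($sorted), array_values($sorted), $string);

// strtr() with an array tries the longest key first, regardless of array order.
$viaStrtr = strtr($string, $emoticons);

var_export([$naive, $ordered, $viaStrtr]);
```

Another difference worth knowing: str_replace() with arrays works pair by pair, so later pairs can match text inserted by earlier ones, whereas strtr() never re-scans what it has already replaced.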

If you are concerned about performance, you can look up strtr vs str_replace comparisons, but I suggest doing your own testing (you may get different results depending on your $string length and find/replace definitions).

The easiest way will be if your "find definitions" don't contain trailing spaces:

$string = strtr( $string, $emoticons );
// build the alternation "HAPPY|SAD" from the unique replacement tokens
$tokens = str_replace( '][', '|', trim( join( array_unique( $emoticons ) ), '[]' ) );
$string = preg_replace( '/\s*\[(' . $tokens . ')\]\s*/', '[$1]', $string ); // stripping white space around word-styled emoticons

Remove non-text chars (like emoticons) from string

Code with sample input: Demo

$VideoTitles=[
'Kilian à Dijon #4 • Vlog #2 • Primark again !? - YouTube',
'Funfesty on Twitter: "Je commence à avoir mal à la tête à force',
'Sia 2017 Cheap Thrills 2017 live '
];

$VideoTitles=preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]/u','',$VideoTitles); // remove out of range characters and whitespace character on one side only

var_export($VideoTitles);

Output:

array (
0 => 'Kilian à Dijon #4 • Vlog #2 • Primark again !? - YouTube',
1 => 'Funfesty on Twitter: "Je commence à avoir mal à la tête à force',
2 => 'Sia 2017 Cheap Thrills 2017 live',
)

The above regex pattern uses a character range from \x20-\x2122 (space to trade-mark-sign). I have selected this range because it should cover the vast majority of word-related characters including letters with accents and non-English characters. (Admittedly, it also includes many non-word-related characters. You may like to use two separate ranges for greater specificity like: /[^\x{20}-\x{60}\x{7B}-\x{FF}]/ui -- this case-insensitively searches two ranges: space to grave accent and left curly bracket to latin small letter y with diaeresis)

If you find that this range is unnecessarily generous or takes too long to process, you can make your own decision about the appropriate character range.

For instance, you might like the much lighter but less generous /[^\x20-\x7E]/u (from space to tilde). However, if you apply it to either of my above French $VideoTitles then you will mangle the text by removing legitimate letters.
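To see that trade-off on a concrete string, here is a small comparison. The title is shortened from the sample above and a clapper board emoji is appended purely for the demo. The whitespace handling from the full pattern is left out to keep the focus on the ranges.

```php
<?php
// Shortened sample title with a hypothetical emoji appended for the demo.
$title = "Kilian à Dijon #4 \u{1F3AC}";

// Space through the trade mark sign: accented letters survive, the emoji does not.
$wide = preg_replace('/[^ -\x{2122}]/u', '', $title);

// Space through tilde (printable ASCII only): the emoji goes, but so does the "à".
$narrow = preg_replace('/[^\x20-\x7E]/u', '', $title);

var_export([$wide, $narrow]);
```

The narrow range leaves a gap where the accented letter used to be, which is the mangling described above.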

Here is a menu of characters and their unicode numbers to help you understand what is inside the aforementioned ranges and beyond.

*And remember to include a unicode flag u after your closing delimiter.


For completeness, I should say the literal/narrow solution for removing the two emojis would be:

$VideoTitles=preg_replace('/[\x{1F3A7}\x{1F3AC}]/u','',$VideoTitles);  // omit 2 emojis

These emojis are called "clapper board (U+1F3AC)" and "headphone (U+1F3A7)".

Remove emojis from string

I used this function, customized with additional Unicode ranges (including more emojis and country flags). You can do the same using the details on this page: http://unicode.org/emoji/charts/full-emoji-list.html.

Here is the result:

function remove_emoji($string)
{
// Match Enclosed Alphanumeric Supplement
$regex_alphanumeric = '/[\x{1F100}-\x{1F1FF}]/u';
$clear_string = preg_replace($regex_alphanumeric, '', $string);

// Match Miscellaneous Symbols and Pictographs
$regex_symbols = '/[\x{1F300}-\x{1F5FF}]/u';
$clear_string = preg_replace($regex_symbols, '', $clear_string);

// Match Emoticons
$regex_emoticons = '/[\x{1F600}-\x{1F64F}]/u';
$clear_string = preg_replace($regex_emoticons, '', $clear_string);

// Match Transport And Map Symbols
$regex_transport = '/[\x{1F680}-\x{1F6FF}]/u';
$clear_string = preg_replace($regex_transport, '', $clear_string);

// Match Supplemental Symbols and Pictographs
$regex_supplemental = '/[\x{1F900}-\x{1F9FF}]/u';
$clear_string = preg_replace($regex_supplemental, '', $clear_string);

// Match Miscellaneous Symbols
$regex_misc = '/[\x{2600}-\x{26FF}]/u';
$clear_string = preg_replace($regex_misc, '', $clear_string);

// Match Dingbats
$regex_dingbats = '/[\x{2700}-\x{27BF}]/u';
$clear_string = preg_replace($regex_dingbats, '', $clear_string);

return $clear_string;
}
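If the seven separate passes bother you, the same ranges can be merged into one character class. This is just a sketch of that idea: the name remove_emoji_single_pass is made up, and the fallback to the original string when preg_replace returns null (for example on invalid UTF-8) is my own assumption about what you would want.

```php
<?php
function remove_emoji_single_pass(string $string): string
{
    // Same seven ranges as above, merged into one character class.
    $regex = '/['
        . '\x{1F100}-\x{1F1FF}'  // Enclosed Alphanumeric Supplement (includes flag letters)
        . '\x{1F300}-\x{1F5FF}'  // Miscellaneous Symbols and Pictographs
        . '\x{1F600}-\x{1F64F}'  // Emoticons
        . '\x{1F680}-\x{1F6FF}'  // Transport and Map Symbols
        . '\x{1F900}-\x{1F9FF}'  // Supplemental Symbols and Pictographs
        . '\x{2600}-\x{26FF}'    // Miscellaneous Symbols
        . '\x{2700}-\x{27BF}'    // Dingbats
        . ']/u';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($regex, '', $string) ?? $string;
}

var_export(remove_emoji_single_pass("Hi \u{1F600} there \u{2708}"));
```

Like the original, this leaves the surrounding whitespace in place and does not cover every emoji.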

Replace emoticons in string with keywords

This should be enough, taking advantage of the fact that str_replace accepts arrays for any of its first two parameters:

foreach ($emoticons as $emot => $icons) {
$tweet = str_replace($icons, $emot, $tweet);
}
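For the loop to work, $emoticons needs to map each keyword to the list of emoticons it stands for. A minimal sketch with made-up data follows; the keywords, the sample tweet and the wrapper function emoticonsToKeywords are all assumptions for the example.

```php
<?php
// Hypothetical data: keyword => list of emoticons that should map to it.
$emoticons = [
    'happy' => [':-)', ':)', ':o)'],
    'sad'   => [':-(', ':('],
];

function emoticonsToKeywords(string $tweet, array $emoticons): string
{
    foreach ($emoticons as $emot => $icons) {
        // str_replace() takes the whole array of emoticons as its search argument.
        $tweet = str_replace($icons, $emot, $tweet);
    }
    return $tweet;
}

echo emoticonsToKeywords('Monday :( but coffee :-)', $emoticons), PHP_EOL;
```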

See it in action.

PHP : writing a simple removeEmoji function

I think the preg_replace function is the simplest solution.

As EaterOfCode suggests, I read the wiki page and wrote new regexes, since none of the answers on SO (or other websites) seemed to work for Instagram photo captions (in the format the API returns). Note: the /u modifier is mandatory to match \x{...} Unicode characters.

public static function removeEmoji($text) {

$clean_text = "";

// Match Emoticons
$regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
$clean_text = preg_replace($regexEmoticons, '', $text);

// Match Miscellaneous Symbols and Pictographs
$regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
$clean_text = preg_replace($regexSymbols, '', $clean_text);

// Match Transport And Map Symbols
$regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
$clean_text = preg_replace($regexTransport, '', $clean_text);

// Match Miscellaneous Symbols
$regexMisc = '/[\x{2600}-\x{26FF}]/u';
$clean_text = preg_replace($regexMisc, '', $clean_text);

// Match Dingbats
$regexDingbats = '/[\x{2700}-\x{27BF}]/u';
$clean_text = preg_replace($regexDingbats, '', $clean_text);

return $clean_text;
}

The function does not remove all emojis since there are many more, but you get the point.

Please refer to unicode.org - full emoji list (thanks Epoc)


