PHP: How to See "Invisible" Characters Like \n

PHP: Is there a way to see invisible characters like \n?

You can use the addcslashes function:

string addcslashes ( string $str, string $charlist )

It returns $str with a backslash added before every character listed in $charlist; ranges such as 'A..z' are allowed. An example would be:

<?php
echo addcslashes('foo[ ]', 'A..z');
// output: \f\o\o\[ \]
// All upper and lower-case letters will be escaped
// ... but so will the [\]^_`
?>
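To make control characters themselves visible, the same function can be given the range of control bytes. This is a minimal sketch assuming plain byte strings; the helper name showInvisibles is hypothetical. addcslashes() writes "\n", "\t", "\r" and similar characters as letter escapes and other listed bytes below 32 or above 126 as three-digit octal escapes.

```php
<?php
// Hypothetical helper: make control characters visible by escaping them C-style.
// The range "\0..\37" covers bytes 0x00-0x1F; "\177" adds DEL (0x7F).
// \n, \t, \r, \v, \f, \a, \b become letter escapes; other listed bytes become octal.
function showInvisibles(string $s): string
{
    return addcslashes($s, "\0..\37\177");
}

echo showInvisibles("a\tb\r\nc\x01"), "\n"; // prints a\tb\r\nc\001
```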

PHP/HTML display hidden characters

If you truly want to see everything, use a hex dump function. It is good for finding unusual UTF-8 sequences (for example, a UTF-8 non-breaking space, bytes c2 a0, is not the same as the ASCII space, byte 20) or a byte-order mark (BOM).

Its output looks like this:

0000  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f   ........ ........
0010  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f   ........ ........
0020  20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f    !"#$%&' ()*+,-./
0030  30 31 32 33 34 35 36 37  38 39 3a 3b 3c 3d 3e 3f   01234567 89:;<=>?
0040  40 41 42 43 44 45 46 47  48 49 4a 4b 4c 4d 4e 4f   @ABCDEFG HIJKLMNO
0050  50 51 52 53 54 55 56 57  58 59 5a 5b 5c 5d 5e 5f   PQRSTUVW XYZ[\]^_
0060  60 61 62 63 64 65 66 67  68 69 6a 6b 6c 6d 6e 6f   `abcdefg hijklmno
0070  70 71 72 73 74 75 76 77  78 79 7a 7b 7c 7d 7e 7f   pqrstuvw xyz{|}~
0080  80 81 82 83 84 85 86 87  88 89 8a 8b 8c 8d 8e 8f   €‚ƒ„…†‡ ˆ‰Š‹ŒŽ
0090  90 91 92 93 94 95 96 97  98 99 9a 9b 9c 9d 9e 9f   ‘’“”•–— ˜™š›œžŸ
00a0  a0 a1 a2 a3 a4 a5 a6 a7  a8 a9 aa ab ac ad ae af   ¡¢£¤¥¦§ ¨©ª«¬­®¯
00b0  b0 b1 b2 b3 b4 b5 b6 b7  b8 b9 ba bb bc bd be bf   °±²³´µ¶· ¸¹º»¼½¾¿
00c0  c0 c1 c2 c3 c4 c5 c6 c7  c8 c9 ca cb cc cd ce cf   ÀÁÂÃÄÅÆÇ ÈÉÊËÌÍÎÏ
00d0  d0 d1 d2 d3 d4 d5 d6 d7  d8 d9 da db dc dd de df   ÐÑÒÓÔÕÖ× ØÙÚÛÜÝÞß
00e0  e0 e1 e2 e3 e4 e5 e6 e7  e8 e9 ea eb ec ed ee ef   àáâãäåæç èéêëìíîï
00f0  f0 f1 f2 f3 f4 f5 f6 f7  f8 f9 fa fb fc fd fe ff   ðñòóôõö÷ øùúûüýþÿ
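A minimal sketch of such a hex dump function, assuming the layout shown above: an offset, two groups of eight hex bytes, and a text column. The name hexDump is hypothetical, and unlike the dump above this sketch prints every byte outside printable ASCII (0x20-0x7E) as '.' in the text column.

```php
<?php
// Hypothetical helper: 16 bytes per line as offset, two groups of 8 hex
// bytes, and a text column where non-printable bytes are shown as '.'.
function hexDump(string $data): string
{
    $out = '';
    foreach (str_split($data, 16) as $row => $chunk) {
        $hex = '';
        $ascii = '';
        for ($i = 0; $i < strlen($chunk); $i++) {
            $byte = ord($chunk[$i]);
            $hex .= sprintf('%02x ', $byte);
            $ascii .= ($byte >= 0x20 && $byte <= 0x7e) ? $chunk[$i] : '.';
            if ($i === 7) {
                $hex .= ' ';   // extra gap between the two groups of 8
                $ascii .= ' ';
            }
        }
        // 49 = width of a full row of hex bytes including the group gap
        $out .= sprintf("%04x  %-49s  %s\n", $row * 16, $hex, $ascii);
    }
    return $out;
}

// A BOM, a non-breaking space (c2 a0) and a CRLF line ending stand out at once.
echo hexDump("\xEF\xBB\xBFa b\xC2\xA0c\r\n");
```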

How to filter out invisible characters from a string

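Common invisible characters are "\n", "\r" and "\t". One way to remove them is to collapse each run of whitespace into a single space and trim the ends:

$string = trim(preg_replace('/\s+/', ' ', $string));

A pattern like '/\s\s+/' only matches runs of two or more whitespace characters, so a single "\n" or "\t" between two words would be left in place.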
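A small check of the two patterns: '/\s+/' collapses every whitespace run, while '/\s\s+/' only touches runs of two or more, so a lone "\n" survives. The wrapper name collapseWhitespace is hypothetical.

```php
<?php
// Hypothetical helper: collapse every run of whitespace
// (spaces, tabs, line breaks and similar) into one space, then trim.
function collapseWhitespace(string $s): string
{
    return trim(preg_replace('/\s+/', ' ', $s));
}

$input = "one\ntwo\r\n\tthree  ";
var_dump(collapseWhitespace($input));                  // string(13) "one two three"
var_dump(trim(preg_replace('/\s\s+/', ' ', $input)));  // the lone "\n" after "one" survives
```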

Strip hidden character from string

You may want to look at the related question "Remove control characters from php String".
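For reference, a minimal byte-level approach (not taken from the linked question; the name stripControlChars is hypothetical) removes the ASCII control range and DEL with a regex. Without the u modifier it works on bytes, and multibyte UTF-8 sequences are left alone because all of their bytes are 0x80 or higher.

```php
<?php
// Hypothetical helper: remove ASCII control characters (0x00-0x1F) and DEL (0x7F).
// Note: this also removes "\t", "\n" and "\r".
function stripControlChars(string $s): string
{
    return preg_replace('/[\x00-\x1F\x7F]/', '', $s);
}

echo stripControlChars("ab\x00c\x1Bd\te"), "\n"; // prints abcde
```

To keep tabs and line breaks, a narrower class such as '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/' can be used instead.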

Identifying hidden characters

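What I would do is split the string into single bytes and print the ASCII code of each one:

$str = str_split('your string here');
foreach ($str as $char) echo ord($char), ' ';

You then have the numeric code of every byte and can work backwards from there to the character. Note that str_split() works on bytes, so a multibyte UTF-8 character shows up as several values (for example, U+200E appears as 226 128 142).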
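A variation that reports only the suspicious bytes with their offsets, assuming anything outside printable ASCII (0x20-0x7E) is worth a look. The name findOddBytes is hypothetical.

```php
<?php
// Hypothetical helper: map offset => hex value for every byte
// outside printable ASCII (0x20-0x7E).
function findOddBytes(string $s): array
{
    $found = [];
    foreach (str_split($s) as $offset => $char) {
        $code = ord($char);
        if ($code < 0x20 || $code > 0x7E) {
            $found[$offset] = sprintf('0x%02X', $code);
        }
    }
    return $found;
}

// CR, LF and the three bytes of U+200E are reported; letters and the space are not.
print_r(findOddBytes("a b\r\n\xE2\x80\x8Ec"));
```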

Determining and removing invisible characters from a string in PHP (%E2%80%8E)

If the input is UTF-8 encoded, you can use a Unicode regex to match or strip invisible control characters such as e2808e (LEFT-TO-RIGHT MARK, U+200E). Use the u (PCRE_UTF8) modifier together with \p{C}, the Unicode "Other" category.

Strip out all invisibles:

$str = preg_replace('/\p{C}+/u', "", $str);
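Note that \p{C} also contains ordinary control characters, so this strips newlines and tabs as well. A small check, assuming PHP 7 or later for the \u{...} string escape:

```php
<?php
// \p{C} with the u modifier removes format characters such as
// U+200B and U+200E, but also control characters like "\n" and "\t".
$str = "a\u{200B}b\u{200E}c\nd\te";
$clean = preg_replace('/\p{C}+/u', '', $str);
var_dump($clean); // string(5) "abcde"
```

If the input is not valid UTF-8, preg_replace() with the u modifier returns null instead of a string.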

\p{C} covers the control (Cc), format (Cf), private-use (Co), surrogate (Cs) and unassigned (Cn) code points.


Detect/identify invisibles:

$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";

// get invisibles + offset
if (preg_match_all('/\p{C}/u', $str, $out, PREG_OFFSET_CAPTURE))
{
    echo "<pre>\n";
    foreach ($out[0] AS $k => $v) {
        echo "detected ".bin2hex($v[0])." @ offset ".$v[1]."\n";
    }
    echo "</pre>";
}

outputs:

detected e2808e @ offset 1
detected e2808b @ offset 5
detected e2808f @ offset 9
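Character lookup sites such as fileformat.info also index characters by code point (U+XXXX). A sketch that prints code points instead of raw bytes without relying on the mbstring extension; codePoint() is a hypothetical helper and assumes each match is a single valid UTF-8 sequence, which is what the u modifier guarantees.

```php
<?php
// Hypothetical helper: decode one UTF-8 encoded character into its code point.
function codePoint(string $c): int
{
    $bytes = array_values(unpack('C*', $c));
    $n = count($bytes);
    if ($n === 1) {
        return $bytes[0];
    }
    // Lead byte keeps its low (7 - n) bits; each continuation byte adds 6 bits.
    $cp = $bytes[0] & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";
if (preg_match_all('/\p{C}/u', $str, $out, PREG_OFFSET_CAPTURE)) {
    foreach ($out[0] as [$char, $offset]) {
        printf("U+%04X @ offset %d\n", codePoint($char), $offset);
    }
}
```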


To identify a character, look up its bytes, for example on fileformat.info with a Google query such as:

site:fileformat.info e2808e

PHP Invisible character(s) in a string taken from a txt file

As suggested in the comments, use var_dump() or a hex dump to see the real string, including any special characters or unwanted spaces.

I assume you are getting unwanted whitespace when reading a string from a file. Use trim() to remove it and see if that works:

// Alternatively, also try: $str2 = str_replace(array("\n", "\r"), '', $str2);
similar_text(trim($str1),trim($str2),$percent1);
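trim() strips " \t\n\r\0\x0B" by default but not a UTF-8 byte-order mark, which often sits at the start of a text file. A sketch, assuming the string is a first line read from a file saved with a BOM; normalizeLine is a hypothetical name.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 BOM (EF BB BF), which trim()
// does not remove, then trim the default whitespace set.
function normalizeLine(string $s): string
{
    return trim(preg_replace('/^\xEF\xBB\xBF/', '', $s));
}

$fromFile = "\xEF\xBB\xBFabc\r\n"; // e.g. a first line read with fgets()
var_dump($fromFile);               // string(8): the hidden bytes show up in the length

similar_text($fromFile, 'abc', $raw);
similar_text(normalizeLine($fromFile), 'abc', $clean);
echo $raw, ' vs ', $clean, "\n";   // the normalized comparison reaches 100
```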


How to display hidden characters in PhpStorm, especially line separators

Based on your update, it is now clear which character you have in mind: "U+2028 : LINE SEPARATOR" (identified with http://www.babelstone.co.uk/Unicode/whatisit.html).

Install and use the Zero Width Characters locator 2 plugin: it can detect quite a few invisible characters (e.g. the UTF-8 BOM sequence, non-breaking space, Unicode line separator (your case), etc.).

It is implemented as a separate inspection with the highest (Error) severity, so these characters are easy to spot, and you can also run it over a whole folder or project to check for just these issues.


There is a ticket (Feature Request) to have an option to show invisible characters in the editor.

https://youtrack.jetbrains.com/issue/IDEA-115572 -- watch this ticket (star/vote/comment) to get notified of any progress. Update: implemented in version 2020.2.

Other related tickets:

  • https://youtrack.jetbrains.com/issue/IDEA-99899 (your case, as I understand)
  • https://youtrack.jetbrains.com/issue/IDEA-140567
  • https://youtrack.jetbrains.com/issue/WEB-13506

UPDATE 2021-11-10:

As of version 2020.2, the IDE can show invisible/special symbols right in the editor.


UTF-8 string: remove all invisible characters except newline

Use a "double negation":

$string = preg_replace('/[^\P{C}\n]+/u', '', $string);

Explanation:

  • \P{C} is the same as [^\p{C}].
  • Therefore [^\P{C}] is the same as \p{C}.
  • Since we now have a negated character class, we can subtract other characters like \n from it.
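A quick check of the double-negation pattern, assuming PHP 7 or later for the \u{200B} escape; the wrapper name stripInvisiblesKeepNewlines is hypothetical.

```php
<?php
// Hypothetical wrapper: [^\P{C}\n] means "any \p{C} character except \n".
function stripInvisiblesKeepNewlines(string $s): string
{
    return preg_replace('/[^\P{C}\n]+/u', '', $s);
}

// Keeps both "\n", drops "\t", U+200B (zero width space) and "\r".
var_dump(stripInvisiblesKeepNewlines("a\tb\nc\u{200B}d\r\n"));
```

To keep Windows line endings intact, "\r" can be subtracted the same way: '/[^\P{C}\r\n]+/u'.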

