How do I convert Unicode escape sequences to Unicode characters in a .NET string?
The answer is simple and works well with strings up to at least several thousand characters.
Example 1:
Regex rx = new Regex( @"\\[uU]([0-9A-F]{4})" );
result = rx.Replace( result, match => ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString() );
Example 2:
Regex rx = new Regex( @"\\[uU]([0-9A-F]{4})" );
result = rx.Replace( result, delegate (Match match) { return ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString(); } );
The first example makes the replacement with a lambda expression (C# 3.0); the second uses an anonymous method written with the delegate keyword, which should work with C# 2.0.
To break down what's going on here, first we create a regular expression:
new Regex( @"\\[uU]([0-9A-F]{4})" );
Then we call Replace() with the string 'result' and an anonymous function (a lambda expression in the first example and a delegate in the second; the delegate could also be a regular method) that converts each match the regular expression finds in the string.
The Unicode escape is processed like this:
((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString()
Get the string representing the number part of the escape (skip the first two characters).
match.Value.Substring(2)
Parse that string using Int32.Parse(), which takes the string and the number format that Parse() should expect, in this case a hex number.
NumberStyles.HexNumber
Then we cast the resulting number to a Unicode character:
(char)
And finally we call ToString() on the Unicode character, which gives us its string representation; that is the value passed back to Replace():
.ToString()
Note: Instead of grabbing the text to be converted with a Substring call, you could use the match parameter's GroupCollection and a subexpression in the regular expression to capture just the number ('2320'), but that's more complicated and less readable.
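As a rough sketch of the capture-group variant mentioned in the note: group 1 holds just the four hex digits, so the Substring call goes away. The names EscapeDecoder and Decode are made up for illustration, and the pattern is widened to accept lowercase hex digits, which is an assumption beyond the examples above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the original answer.
public static class EscapeDecoder
{
    // Group 1 captures only the four hex digits after \u or \U.
    // Assumption: lowercase a-f is accepted as well as uppercase A-F.
    private static readonly Regex Rx = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");

    public static string Decode(string input)
    {
        // m.Groups[1].Value is the hex text, so no Substring(2) is needed.
        return Rx.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        Console.WriteLine(Decode(@"sa\u0040example.com")); // sa@example.com
    }
}
```

Whether this reads better than Substring(2) is a matter of taste; the group version mainly helps if the prefix ever changes length.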
Replace Unicode escape sequences in a string
You could use a regular expression to parse the file:
private static Regex _regex = new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);
public string Decoder(string value)
{
    // [0-9a-fA-F] keeps non-hex letters such as 'z' from reaching int.Parse
    return _regex.Replace(
        value,
        m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
    );
}
And then:
string data = Decoder(File.ReadAllText("test.txt"));
convert unicode escape sequences to string
Your escape sequences do not start with a \ like "\u00fd", so your Regex should be only
"[uU]([0-9A-F]{4})"
...
VB.NET, I can't convert Unicode escape sequences to text
You can use Regex.Unescape.
For example,
Dim s = "sa3444444d4ds\u0040outllok.com"
Console.WriteLine(Regex.Unescape(s))
outputs:
sa3444444d4ds@outllok.com
Credit to Tim Patrick for showing this in the Visual Studio Magazine article Overcoming Escape Sequence Envy in Visual Basic and C#.
Convert non-escaped unicode string to unicode
These are essentially UTF-16 code units, so this would do (this approach is not very efficient, but I assume optimization isn't the main goal):
Regex.Replace(
    "u0393u03a5u039du0391u0399u039au0391",
    "u[0-9a-f]{4}",
    m => "" + (char) int.Parse(m.Value.Substring(1), NumberStyles.AllowHexSpecifier)
)
This can't deal with the ambiguity of un-escaped "regular" characters in the string: dufface would effectively get turned into d + \uffac + e, which is probably not right. It will correctly handle surrogates, though (ud83dudc96 is 💖).
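To make both behaviours concrete, here is a small sketch that wraps the same Regex.Replace call in a method and checks the ambiguous input and the surrogate pair. The class name BareEscapeDecoder and the method name Decode are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the Regex.Replace call shown above.
public static class BareEscapeDecoder
{
    public static string Decode(string input)
    {
        // Matches a literal 'u' followed by four lowercase hex digits (no backslash).
        return Regex.Replace(
            input,
            "u[0-9a-f]{4}",
            m => ((char)int.Parse(m.Value.Substring(1), NumberStyles.AllowHexSpecifier)).ToString());
    }

    public static void Main()
    {
        // "uffac" inside "dufface" is treated as an escape: d + U+FFAC + e.
        Console.WriteLine(Decode("dufface") == "d" + "\uffac" + "e"); // True

        // Two consecutive code units form one surrogate pair (U+1F496).
        string heart = Decode("ud83dudc96");
        Console.WriteLine(heart.Length);                     // 2
        Console.WriteLine(char.IsSurrogatePair(heart, 0));   // True
    }
}
```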
Using the technique in this answer is another option:
Regex.Unescape(@"u0393u03a5u039du0391u0399u039au0391".Replace(@"\", @"\\").Replace("u", @"\u"))
The extra \ escaping is there just in case the string should contain any backslashes already, which could be wrongly interpreted as escape sequences.
HOWTO : convert unicode character representation in string to the actual unicode character
You want to have a Char holding the value of the private-use code point U+F641.
You can do so by parsing it as the hexadecimal value it represents:
var input = "f641";
int p = int.Parse(input, System.Globalization.NumberStyles.HexNumber); // 63041
And then convert it to a Char:
char c = (char)p;
Depending on the range of possible code points, you may not have enough space in a char to store the code point, so as @Panagiotis indicates, use Char.ConvertFromUtf32(int):
string chars = Char.ConvertFromUtf32(p);
But then you'll have a string, not a single char.
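A minimal sketch of the difference, assuming the input is a bare hex string as above. The names CodePointDemo and FromHex are illustrative only. A code point above U+FFFF comes back as a two-char surrogate pair, which is why the result has to be a string.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; names are not from the original answer.
public static class CodePointDemo
{
    public static string FromHex(string hex)
    {
        int codePoint = int.Parse(hex, NumberStyles.HexNumber);
        // Returns one char for BMP code points, a surrogate pair otherwise.
        return char.ConvertFromUtf32(codePoint);
    }

    public static void Main()
    {
        string bmp = FromHex("f641");      // private-use code point, fits in one char
        string astral = FromHex("1f496");  // beyond U+FFFF, needs two chars

        Console.WriteLine(bmp.Length);                      // 1
        Console.WriteLine(astral.Length);                   // 2
        Console.WriteLine(char.IsSurrogatePair(astral, 0)); // True
    }
}
```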
how to parse string containing Unicode ID's as well as plain text for display in datagrid view
You can use Regex.Unescape() to convert the Unicode-escaped char (\uXXXX) to a string.
If you receive \U instead of \u, you also need to perform that substitution, since \U is not recognized as a valid escape sequence.
Dim input As String = "Castle: \Ud83d\Udc40Jerusal\U00e9m.Miles"
Dim result As String = Regex.Unescape(input.Replace("\U", "\u"))
This prints (it may depend on the Font used):
Castle: 👀Jerusalém.Miles
As a note, you might also have used the wrong encoding when you decoded the input stream.
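For C# callers, the same substitution might look like the sketch below. The class name UpperEscapeDemo and the method name Decode are hypothetical, and the code assumes every \U in the input introduces a four-digit escape, since the replacement rewrites all of them blindly.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical C# counterpart of the VB.NET snippet above.
public static class UpperEscapeDemo
{
    public static string Decode(string input)
    {
        // Regex.Unescape does not accept \U, so rewrite it to \u first.
        return Regex.Unescape(input.Replace(@"\U", @"\u"));
    }

    public static void Main()
    {
        string result = Decode(@"Castle: \Ud83d\Udc40Jerusal\U00e9m.Miles");
        // Compare against the expected string rather than relying on the console font.
        Console.WriteLine(result == "Castle: \ud83d\udc40Jerusal\u00e9m.Miles"); // True
    }
}
```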
How to convert string with Unicode literal characters in it to a Unicode string
There are a number of ways to do this; however, this might work for you.
Disclaimer: it's assumed your string looks like this in your db: Universidad de M\u00e1laga
var test1 = "Universidad de M\\u00e1laga";
var test2 = Regex.Unescape(test1);
Console.WriteLine(test1);
Console.WriteLine(test2);
Output
Universidad de M\u00e1laga
Universidad de Málaga
Note: This may be pointing to an overall structural or design problem with this entire situation, though who knows what APIs give you back.