Replace Unicode Characters in a String Using C#

Replace the string of special characters in C#

I believe the best approach here is a regular expression, as below:

s/[*'",_&#^@]/ /g

You can use the Regex class for this purpose:

Regex reg = new Regex("[*'\",_&#^@]");
str1 = reg.Replace(str1, string.Empty);

Regex reg1 = new Regex("[ ]");
str1 = reg1.Replace(str1, "-");
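As a rough sketch of how the two replacements might be combined into one helper (the names `SlugHelper` and `ToSlug` are invented here for illustration and are not part of the original answer):

```csharp
using System.Text.RegularExpressions;

public static class SlugHelper
{
    // Same character set as in the answer above.
    private static readonly Regex Special = new Regex("[*'\",_&#^@]");
    // Matches a single space character.
    private static readonly Regex Space = new Regex("[ ]");

    public static string ToSlug(string input)
    {
        // Step 1: drop the special characters entirely.
        string withoutSpecial = Special.Replace(input, string.Empty);
        // Step 2: turn each remaining space into a hyphen.
        return Space.Replace(withoutSpecial, "-");
    }
}
```

Caching the two `Regex` instances in static fields avoids rebuilding them on every call. Note that consecutive spaces each become their own hyphen.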

Replace special character with unicode character in a string c#

You can try using Linq:

using System.Linq;

...

string source = "İntersport";

// Change 255 to 127 if you want to keep only the standard ASCII range
string target = string.Concat(source
    .Select(c => c < 32 || c > 255
        ? "\\u" + ((int)c).ToString("x4") // control character, or above the [32..255] range
        : c.ToString()));                 // within [32..255], kept as-is

// \u0130ntersport
Console.Write(target);

Edit: No Linq solution:

string source = "İntersport";

StringBuilder sb = new StringBuilder();

foreach (char c in source)
    if (c < 32 || c > 255)
        sb.Append("\\u" + ((int)c).ToString("x4"));
    else
        sb.Append(c);

string target = sb.ToString();

Replace Unicode characters in a string using C#

Use regexp:

var unicodeRegexp = new Regex(@"\x1f");
var testWord = "our guests will experience \u001favor in an area";
var newWord = unicodeRegexp.Replace(testWord, "text for replacement");

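\x1f matches the same character as \u001f. In a .NET regular expression, \x takes exactly two hexadecimal digits and \u takes exactly four, so the leading zeros are dropped when switching to the \x form.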
https://www.regular-expressions.info/unicode.html#codepoint
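For comparison, here is a small sketch showing the same control character matched with both the two-digit `\x` escape and the four-digit `\u` escape. The class and method names are made up for this example:

```csharp
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Replaces U+001F using the two-digit \x escape.
    public static string ReplaceWithX(string input, string replacement) =>
        Regex.Replace(input, @"\x1f", replacement);

    // Replaces U+001F using the four-digit \u escape.
    public static string ReplaceWithU(string input, string replacement) =>
        Regex.Replace(input, @"\u001f", replacement);
}
```

Because the patterns are verbatim strings (`@"..."`), the backslash escapes reach the regex engine untouched and are interpreted there rather than by the C# compiler.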

String.Replace does not replace Unicode Character 'ARABIC LETTERS'

.Replace returns a new string, as System.String is immutable.

Consider reassigning.

stringValue = stringValue.Replace(Convert.ToString(currentCharacter), replaceValue);
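To make the difference concrete, here is a minimal sketch. The helper name `ReplaceArabicComma` is hypothetical, and U+060C (ARABIC COMMA) is just an example character picked for the demo:

```csharp
public static class ReplaceDemo
{
    public static string ReplaceArabicComma(string input)
    {
        // The return value is discarded here, so 'input' is left unchanged.
        input.Replace('\u060C', ',');

        // Reassigning keeps the new string that Replace produced.
        input = input.Replace('\u060C', ',');
        return input;
    }
}
```

The first call compiles without error but has no visible effect, which is usually what is behind "Replace does not work" reports.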

Is there a way to replace a character based on its Unicode code?

Ended up solving it like this:

foreach (var (key, value) in characterSet)
{
    var hexValue = int.Parse(key.Substring(1), System.Globalization.NumberStyles.HexNumber);
    str = str.Replace(((char)hexValue).ToString(), value);
}
return str;
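The snippet above does not show how `characterSet` is declared, so the following is only a sketch under assumptions: the map is taken to be a dictionary whose keys are one prefix character followed by hex digits (for example "U0130"), and the class and method names are invented for the example.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class CodePointReplacer
{
    public static string ReplaceByCode(string str, IDictionary<string, string> characterSet)
    {
        foreach (var pair in characterSet)
        {
            // Assumed key format: one prefix character, then hex digits, e.g. "U0130".
            int hexValue = int.Parse(pair.Key.Substring(1), NumberStyles.HexNumber);
            // Replace every occurrence of that character with the mapped value.
            str = str.Replace(((char)hexValue).ToString(), pair.Value);
        }
        return str;
    }
}
```

The cast to `char` only covers code points up to U+FFFF; anything above that would need `char.ConvertFromUtf32` instead.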

Replacing unicode characters in string in C#

Converting to a JSON string like that is more cumbersome than it should be, mainly because you need to work with Unicode code points which in practice means calling char.ConvertToUtf32. In order to do that, you need to somehow handle surrogate pairs; System.Globalization.StringInfo can help with that.

Here's a function that uses these building blocks to perform the conversion:

// Requires: using System.Globalization; using System.Text;
string str = "ĄĆŹ - ćwrą";

public string ToJsonString(string s)
{
    var enumerator = StringInfo.GetTextElementEnumerator(s);
    var sb = new StringBuilder();

    while (enumerator.MoveNext())
    {
        var unicodeChar = enumerator.GetTextElement();
        // Note: only the first code point of each text element is inspected.
        var codePoint = char.ConvertToUtf32(unicodeChar, 0);
        if (codePoint < 0x80) {
            sb.Append(unicodeChar);
        }
        else if (codePoint <= 0xffff) {
            sb.Append("\\u").Append(codePoint.ToString("x4"));
        }
        else {
            // Above U+FFFF: JSON expects the UTF-16 surrogate pair, high surrogate first.
            sb.Append("\\u").Append(((int)unicodeChar[0]).ToString("x4"));
            sb.Append("\\u").Append(((int)unicodeChar[1]).ToString("x4"));
        }
    }

    return sb.ToString();
}
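Since JSON `\u` escapes are UTF-16 code units, a simpler variant is to escape each `char` directly and skip the code point conversion. This is a sketch of that alternative, not the answer's own method; `JsonEscaper` and `EscapeNonAscii` are names invented here, and like the function above it does not escape quotes or backslashes:

```csharp
using System.Text;

public static class JsonEscaper
{
    public static string EscapeNonAscii(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c < 0x80)
                sb.Append(c); // ASCII passes through unchanged
            else
                sb.Append("\\u").Append(((int)c).ToString("x4")); // one escape per UTF-16 code unit
        }
        return sb.ToString();
    }
}
```

With this approach a character above U+FFFF is stored as two `char` values, so it comes out as two escapes (high surrogate, then low surrogate) without any special handling.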

Remove unwanted unicode characters from string

testString = Regex.Replace(testString, @"[\u0000-\u0008\u000A-\u001F\u0100-\uFFFF]", "");

or, to keep only tab, CR, LF, and printable ASCII:

testString = Regex.Replace(testString, @"[^\t\r\n -~]", "");
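The two patterns do not remove the same set of characters, which a small side-by-side sketch can show. The wrapper names below are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class UnicodeFilter
{
    // Removes U+0000-U+0008, U+000A-U+001F and U+0100-U+FFFF.
    // Tab (U+0009) and everything from U+0020 to U+00FF is kept.
    public static string StripControlAndAboveLatin1(string s) =>
        Regex.Replace(s, @"[\u0000-\u0008\u000A-\u001F\u0100-\uFFFF]", "");

    // Keeps only tab, CR, LF and printable ASCII (space through tilde).
    public static string KeepPrintableAscii(string s) =>
        Regex.Replace(s, @"[^\t\r\n -~]", "");
}
```

The first pattern strips line breaks but leaves accented Latin-1 letters such as "é" in place, while the second keeps line breaks and drops every non-ASCII character.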
