How to Output Unicode String to RTF (Using C#)

How to output unicode string to RTF (using C#)

Provided that all the characters that you're catering for exist in the Basic Multilingual Plane (it's unlikely that you'll need anything more), then a simple UTF-16 encoding should suffice.

Wikipedia:

"All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800–U+DFFF (which are not characters), are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use."

The following sample program illustrates doing something along the lines of what you want:

using System;
using System.IO;
using System.Text;

static void Main(string[] args)
{
    // ë (U+00EB), decoded from its UTF-16 little-endian bytes
    char[] ca = Encoding.Unicode.GetChars(new byte[] { 0xeb, 0x00 });
    // The using block closes the file even if a write throws
    using (var sw = new StreamWriter(@"c:/helloworld.rtf"))
    {
        sw.WriteLine(@"{\rtf1
{\fonttbl {\f0 Times New Roman;}}
\f0\fs60 H" + GetRtfUnicodeEscapedString(new String(ca)) + @"llo, World!
}");
    }
}

static string GetRtfUnicodeEscapedString(string s)
{
    var sb = new StringBuilder();
    foreach (var c in s)
    {
        if (c <= 0x7f)
            sb.Append(c); // ASCII passes through unchanged
        else
            sb.Append("\\u" + Convert.ToUInt32(c) + "?"); // \uN followed by an ASCII fallback character
    }
    return sb.ToString();
}

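The important bit is Convert.ToUInt32(c), which returns the numeric value of the character's UTF-16 code unit; for characters in the Basic Multilingual Plane that is the code point itself. The RTF escape for Unicode takes a decimal value, and the trailing ? is the fallback character shown by readers that cannot display Unicode. Note that the RTF specification treats the value as a signed 16-bit number, so values above 32767 must be written as negative numbers; the sample above does not handle that case. The System.Text.Encoding.Unicode encoding corresponds to UTF-16 (little-endian) as per the MSDN documentation.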
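As a rough sketch of how the escaping could be made a little more robust, the variant below also escapes the RTF syntax characters and writes each code unit as a signed 16-bit value. The class and method names (RtfEscaper, Escape) are made up for illustration. Characters outside the BMP come out as two consecutive \u escapes, one per surrogate half, which is an assumption about what your RTF reader accepts. Control characters such as line breaks are not handled here.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class RtfEscaper
{
    // Escapes plain text so it can be placed in the body of an RTF document.
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF syntax characters need a leading backslash
                sb.Append('\\').Append(c);
            }
            else if (c <= 0x7f)
            {
                // Plain ASCII passes through unchanged
                sb.Append(c);
            }
            else
            {
                // The cast wraps values above 32767 into the negative range,
                // giving the signed 16-bit form; '?' is the ASCII fallback.
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
            }
        }
        return sb.ToString();
    }
}
```

For example, RtfEscaper.Escape("Hëllo") gives H\u235?llo, since ë is U+00EB, which is 235 in decimal.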

Easiest way to format rtf/unicode/utf-8 in a RichTextBox?

You could use either

rtb.Rtf = Regex.Replace(rtb.Rtf, @"\\'02\s*(.*?)\s*\\'02", @"\b $1 \b0");

or

rtb.Rtf = Regex.Replace(rtb.Rtf, @"\\'02\s*(.*?)\s*\\'02", @"\'02 \b $1 \b0 \'02");

depending on whether you want to keep the \u0002 characters (which appear as \'02 in the RTF) in there.

The \b and \b0 turn the bold on and off in RTF.
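To see what the first replacement does without a RichTextBox at hand, here is a small sketch that applies the same pattern to a plain string. The BoldMarkup class name and the sample input are invented for the example. In .NET replacement strings only $ is special, so the backslashes in \b and \b0 are emitted literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex shown above.
public static class BoldMarkup
{
    // Turns each \'02 ... \'02 pair in raw RTF into \b ... \b0 (bold on, bold off).
    public static string Apply(string rtf)
    {
        return Regex.Replace(rtf, @"\\'02\s*(.*?)\s*\\'02", @"\b $1 \b0");
    }
}
```

With the input plain \'02bold\'02 tail, BoldMarkup.Apply returns plain \b bold \b0 tail. Because . does not match line breaks by default, a marked span that crosses a line break in the RTF source is left untouched.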

Output RTF special characters to Unicode

Any of the following should help:

  • Rich Text Format (RTF) Specification, version 1.6, Special Characters. According to this, \'d3\'d6 are hexadecimal values based on the specified character set, so you could write your own processing function: split the string at each \', convert the hex digits to bytes, rejoin them, and convert the result to Unicode.
  • How to: Convert RTF to Plain Text
  • NRTFTree - A class library for RTF processing in C#
  • Questions on SO tagged rtf and c#
  • Google search "rtf c#"
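The do-it-yourself approach from the first bullet could look roughly like the sketch below. The RtfHexDecoder name is invented, and the caller has to supply the encoding that matches the document's character set (the \ansicpg or \fcharset value); the code does not detect it. It is only a fragment decoder: it does not understand the rest of RTF, so an escaped backslash directly before an apostrophe would be misread.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; decodes only \'hh escapes and ignores all other RTF syntax.
public static class RtfHexDecoder
{
    public static string Decode(string rtfText, Encoding codePage)
    {
        // Match runs of consecutive \'hh escapes so the bytes of a
        // multi-byte character are decoded together.
        return Regex.Replace(rtfText, @"(?:\\'[0-9a-fA-F]{2})+", m =>
        {
            var bytes = new List<byte>();
            // Each escape is four characters long: \ ' h h
            for (int i = 0; i < m.Value.Length; i += 4)
                bytes.Add(Convert.ToByte(m.Value.Substring(i + 2, 2), 16));
            return codePage.GetString(bytes.ToArray());
        });
    }
}
```

For instance, RtfHexDecoder.Decode(@"caf\'e9", Encoding.GetEncoding(28591)) returns café, because byte E9 is é in ISO-8859-1. On .NET Core and .NET 5 or later, Windows and East Asian code pages such as 1252 or 936 are only available after registering CodePagesEncodingProvider.Instance from the System.Text.Encoding.CodePages package.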

How to convert a string to RTF in C#?

Doesn't RichTextBox always produce the same header and footer? You could read the content at a fixed offset and keep using the control to do the conversion. (I think; please correct me if I'm wrong.)

There are libraries available, but I've never had good luck with them personally (though I always found another method before fully exhausting the possibilities). In addition, most of the better ones charge a nominal fee.


EDIT
Kind of a hack, but this should get you through what you need to get through (I hope):

RichTextBox rich = new RichTextBox();
Console.Write(rich.Rtf);

String[] words = { "Européen", "Apple", "Carrot", "Touché", "Résumé", "A Européen eating an apple while writing his Résumé, Touché!" };
foreach (String word in words)
{
    rich.Text = word;
    // The content starts right after the "\f0\fs17" control words (8 characters)
    Int32 offset = rich.Rtf.IndexOf(@"\f0\fs17") + 8;
    // ...and ends at the final "\par"
    Int32 len = rich.Rtf.LastIndexOf(@"\par") - offset;
    Console.WriteLine("{0,-15} : {1}", word, rich.Rtf.Substring(offset, len).Trim());
}

EDIT 2

The RTF control codes break down as follows:

  • Header
    • \f0 - Use the font at index 0, the first font in the font table, which is typically Microsoft Sans Serif (declared in the header: {\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}})
    • \fs17 - Set the font size to 17 half-points, i.e. 8.5 pt
  • Footer
    • \par - Marks the end of a paragraph

Hopefully that clears some things up. ;-)
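Since the font size in the header depends on the control's font, hard-coding \fs17 is fragile. One possible variation, sketched below, works on the RTF string alone and accepts any \fsN value. RtfBody.Extract is a made-up name, and the sample header in the usage note is an assumption about what RichTextBox emits; other versions of the control may produce a different header.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; relies on the simple header layout described above.
public static class RtfBody
{
    // Returns the text between the first "\f0\fsN" and the last "\par",
    // or null when either marker is missing.
    public static string Extract(string rtf)
    {
        Match start = Regex.Match(rtf, @"\\f0\\fs\d+");
        if (!start.Success) return null;

        int offset = start.Index + start.Length;
        int end = rtf.LastIndexOf(@"\par", StringComparison.Ordinal);
        if (end < offset) return null;

        return rtf.Substring(offset, end - offset).Trim();
    }
}
```

Given the string {\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}\viewkind4\uc1\pard\f0\fs20 Touch\'e9\par}, Extract returns Touch\'e9. Inside the loop above it would be called as RtfBody.Extract(rich.Rtf).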

Specify utf-8 character encoding in RTF? The text (in UTF-8) format is correctly shown in Sqlite

I read in many places that RTF doesn't have a UTF-8 standard solution.

So, I created my own converter after scanning half the internet. If you have a standard/better solution, please let me know!

After studying this book, I created a converter based on these character mappings. Both are great resources.

This solved my problem. I would rather reuse an existing solution for this kind of feature but, alas, I was not able to find one.

The converter could be something like:

public static String convertHtmlToRtf(String html) {
    // Collapse line breaks, then escape the RTF syntax characters \ { }
    String tmp = html.replaceAll("\\R", " ")
            .replaceAll("\\\\", "\\\\\\\\")
            .replaceAll("\\{", "\\\\{")
            .replaceAll("}", "\\\\}");
    // Links become RTF HYPERLINK fields
    tmp = tmp.replaceAll("<a\\s+target=\"_blank\"\\s+href=[\"']([^\"']+?)[\"']\\s*>([^<]+?)</a>",
            "{\\\\field{\\\\*\\\\fldinst HYPERLINK \"$1\"}{\\\\fldrslt \\\\plain \\\\f2\\\\b\\\\fs20\\\\cf2 $2}}");
    tmp = tmp.replaceAll("<a\\s+href=[\"']([^\"']+?)[\"']\\s*>([^<]+?)</a>",
            "{\\\\field{\\\\*\\\\fldinst HYPERLINK \"$1\"}{\\\\fldrslt \\\\plain \\\\f2\\\\b\\\\fs20\\\\cf2 $2}}");

    // Headings, bold and italic
    tmp = tmp.replaceAll("<h3>", "\\\\line{\\\\b\\\\fs30{");
    tmp = tmp.replaceAll("</h3>", "}}\\\\line\\\\line ");
    tmp = tmp.replaceAll("<b>", "{\\\\b{");
    tmp = tmp.replaceAll("</b>", "}}");
    tmp = tmp.replaceAll("<strong>", "{\\\\b{");
    tmp = tmp.replaceAll("</strong>", "}}");
    tmp = tmp.replaceAll("<i>", "{\\\\i{");
    tmp = tmp.replaceAll("</i>", "}}");
    // HTML entities
    tmp = tmp.replaceAll("&amp;", "&");
    tmp = tmp.replaceAll("&quot;", "\"");
    tmp = tmp.replaceAll("&copy;", "{\\\\'a9}");
    tmp = tmp.replaceAll("&lt;", "<");
    tmp = tmp.replaceAll("&gt;", ">");
    // Line breaks and paragraphs
    tmp = tmp.replaceAll("<br/?><br/?>", "{\\\\pard \\\\par}\\\\line ");
    tmp = tmp.replaceAll("<br/?>", "\\\\line ");
    tmp = tmp.replaceAll("<BR>", "\\\\line ");
    tmp = tmp.replaceAll("<p[^>]*?>", "{\\\\pard ");
    tmp = tmp.replaceAll("</p>", " \\\\par}\\\\line ");
    tmp = convertSpecialCharsToRtfCodes(tmp);
    return "{\\rtf1\\ansi\\ansicpg0\\uc0\\deff0\\deflang0\\deflangfe0\\fs20{\\fonttbl{\\f0\\fnil Tahoma;}{\\f1\\fnil Tahoma;}{\\f2\\fnil\\fcharset0 Tahoma;}}{\\colortbl;\\red0\\green0\\blue0;\\red0\\green0\\blue255;\\red0\\green255\\blue0;\\red255\\green0\\blue0;}" + tmp + "}";
}

// Maps selected non-ASCII characters to \'hh escapes; unmapped ones are logged and passed through.
private static String convertSpecialCharsToRtfCodes(String input) {
    char[] chars = input.toCharArray();
    StringBuffer sb = new StringBuffer();
    int length = chars.length;
    for (int i = 0; i < length; i++) {
        switch (chars[i]) {
            case '’':
                sb.append("{\\'92}");
                break;
            case '`':
                sb.append("{\\'60}");
                break;
            case '€':
                sb.append("{\\'80}");
                break;
            case '…':
                sb.append("{\\'85}");
                break;
            case '‘':
                sb.append("{\\'91}");
                break;
            case '̕':
                sb.append("{\\'92}");
                break;
            case '“':
                sb.append("{\\'93}");
                break;
            case '”':
                sb.append("{\\'94}");
                break;
            case '•':
                sb.append("{\\'95}");
                break;
            case '–':
            case '‒':
                sb.append("{\\'96}");
                break;
            case '—':
                sb.append("{\\'97}");
                break;
            case '©':
                sb.append("{\\'a9}");
                break;
            case '«':
                sb.append("{\\'ab}");
                break;
            case '±':
                sb.append("{\\'b1}");
                break;
            case '„':
                sb.append("\"");
                break;
            case '´':
                sb.append("{\\'b4}");
                break;
            case '¸':
                sb.append("{\\'b8}");
                break;
            case '»':
                sb.append("{\\'bb}");
                break;
            case '½':
                sb.append("{\\'bd}");
                break;
            case 'Ä':
                sb.append("{\\'c4}");
                break;
            case 'È':
                sb.append("{\\'c8}");
                break;
            case 'É':
                sb.append("{\\'c9}");
                break;
            case 'Ë':
                sb.append("{\\'cb}");
                break;
            case 'Ï':
                sb.append("{\\'cf}");
                break;
            case 'Í':
                sb.append("{\\'cd}");
                break;
            case 'Ó':
                sb.append("{\\'d3}");
                break;
            case 'Ö':
                sb.append("{\\'d6}");
                break;
            case 'Ü':
                sb.append("{\\'dc}");
                break;
            case 'Ú':
                sb.append("{\\'da}");
                break;
            case 'ß':
            case 'β':
                sb.append("{\\'df}");
                break;
            case 'à':
                sb.append("{\\'e0}");
                break;
            case 'á':
                sb.append("{\\'e1}");
                break;
            case 'ä':
                sb.append("{\\'e4}");
                break;
            case 'è':
                sb.append("{\\'e8}");
                break;
            case 'é':
                sb.append("{\\'e9}");
                break;
            case 'ê':
                sb.append("{\\'ea}");
                break;
            case 'ë':
                sb.append("{\\'eb}");
                break;
            case 'ï':
                sb.append("{\\'ef}");
                break;
            case 'í':
                sb.append("{\\'ed}");
                break;
            case 'ò':
                sb.append("{\\'f2}");
                break;
            case 'ó':
                sb.append("{\\'f3}");
                break;
            case 'ö':
                sb.append("{\\'f6}");
                break;
            case 'ú':
                sb.append("{\\'fa}");
                break;
            case 'ü':
                sb.append("{\\'fc}");
                break;
            default:
                // Character.isSpaceChar is the JDK method; the original call was unqualified
                if (chars[i] != ' ' && Character.isSpaceChar(chars[i])) {
                    System.out.print(".");
                    //sb.append("{\\~}");
                    sb.append(" ");
                } else if (chars[i] == 8218) {
                    System.out.println("Strange comma ... ");
                    sb.append(",");
                } else if (chars[i] > 132) {
                    System.err.println("Special code that is not translated in RTF: '" + chars[i] + "', nummer=" + (int) chars[i]);
                    sb.append(chars[i]);
                } else {
                    sb.append(chars[i]);
                }
        }
    }
    return sb.toString();
}

RichTextBox and special chars c#

You can simply use the Text property:

richTextBox1.Text = "╔═══This is only an example, the special characters may change═══╗";

If you want to use the RTF property:
Take a look at this question: How to output unicode string to RTF (using C#)

You need to use something like this to convert the special characters to rtf format:

static string GetRtfUnicodeEscapedString(string s)
{
    var sb = new StringBuilder();
    foreach (var c in s)
    {
        if (c == '\\' || c == '{' || c == '}')
            sb.Append(@"\" + c);
        else if (c <= 0x7f)
            sb.Append(c);
        else
            sb.Append("\\u" + Convert.ToUInt32(c) + "?");
    }
    return sb.ToString();
}

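Then wrap the result in a minimal RTF header, because the Rtf property expects a complete RTF document rather than a bare fragment:

richTextBox1.Rtf = @"{\rtf1\ansi " + GetRtfUnicodeEscapedString(TextString) + "}";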

