How to Detect "​" (Combination of Unicode) in C++ String

I have this Unicode string, Param�tres; the è is converted into an unknown character. Why?

std::string str(ws.begin(), ws.end()) simply copies each wchar_t as-is, narrowing it to a char and discarding the high bits. That is not what you want, as it only works without data loss for ASCII characters.

You need to convert the wchar_t data from UTF-16/32 (depending on what encoding your compiler uses for wchar_t data) to whatever charset you want the std::string to hold (ANSI/MBCS, UTF-8, ISO-8859-X, etc).

The C++ standard library has minimal built-in support for such conversions (std::wstring_convert, std::wcstombs(), etc), so you may have to resort to 3rd party Unicode libraries (ICONV, ICU, etc) or platform-specific APIs (WideCharToMultiByte(), etc).

Since you want to not only convert Unicode strings but also compare them, a third-party Unicode library is probably your best bet. Unicode is not trivial to work with, so leverage the hard work that has already been done for it.
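
For reference, here is a minimal sketch of the std::wstring_convert route mentioned above (deprecated since C++17, but available in C++11/14); it converts the wchar_t data to a UTF-8 encoded std::string instead of truncating each element:

#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    std::wstring ws = L"Param\u00E8tres"; // "Paramètres"

    // codecvt_utf8 treats the wchar_t data as UCS-2/UCS-4; for UTF-16
    // wchar_t strings that may contain surrogate pairs (e.g. non-BMP text
    // on Windows), use std::codecvt_utf8_utf16<wchar_t> instead.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    std::string utf8 = conv.to_bytes(ws);

    std::cout << utf8 << " (" << utf8.size() << " bytes)\n"; // the è takes 2 bytes
    return 0;
}

On Windows, WideCharToMultiByte(CP_UTF8, ...) does the same job without the deprecated facet.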

How to convert ’ to apostrophe in C#?

The string "’" is what you get when the UTF-8 bytes for the right single quote ’ (U+2019) are decoded with the ANSI (Windows-1252) encoding, so re-encode the characters back to bytes and decode those bytes as UTF-8:

var bytes = Encoding.Default.GetBytes("’");
var text = Encoding.UTF8.GetString(bytes);
Console.WriteLine(text);
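
For comparison, the same repair expressed as a C++ sketch, assuming the mojibake came from UTF-8 bytes that were decoded as Windows-1252: map each character back to its Windows-1252 byte, and the resulting byte sequence is the UTF-8 encoding of the apostrophe. The reverse table below covers only the characters needed for this example.

#include <iostream>
#include <map>
#include <string>

int main()
{
    // The mojibake "’" as code points: â (U+00E2), € (U+20AC), ™ (U+2122).
    std::u32string mojibake = U"\u00E2\u20AC\u2122";

    // Windows-1252 maps most characters to their Latin-1 byte value; the
    // exceptions live in 0x80-0x9F and need a small reverse table.
    std::map<char32_t, unsigned char> cp1252_reverse = {
        {U'\u20AC', 0x80}, {U'\u2019', 0x92}, {U'\u2122', 0x99},
    };

    std::string bytes;
    for (char32_t c : mojibake) {
        auto it = cp1252_reverse.find(c);
        bytes += (it != cp1252_reverse.end())
                     ? static_cast<char>(it->second)
                     : static_cast<char>(static_cast<unsigned char>(c));
    }

    // bytes is now 0xE2 0x80 0x99, i.e. the UTF-8 encoding of ’ (U+2019).
    std::cout << bytes << "\n"; // prints ’ on a UTF-8 terminal
}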

’ showing on page instead of '

Ensure the browser and editor are using UTF-8 encoding instead of ISO-8859-1/Windows-1252.

Or use the HTML entity &rsquo; in the markup.

How to make the python interpreter correctly handle non-ASCII characters in string operations?

Python 2 uses ASCII as the default encoding for source files, which means you must declare another encoding at the top of the file in order to use non-ASCII Unicode characters in literals. Python 3 uses UTF-8 as the default source encoding, so this is less of an issue there.

See:
http://docs.python.org/tutorial/interpreter.html#source-code-encoding

To declare UTF-8 source encoding, put this on one of the top two lines of the file:

# -*- coding: utf-8 -*-

The above is in the docs, but this also works:

# coding: utf-8

Additional considerations:

  • The source file must be saved using the correct encoding in your text editor as well.

  • In Python 2, the unicode literal must have a u before it, as in s.replace(u"Â ", u""). In Python 3, just use plain quotes. In Python 2 you can also from __future__ import unicode_literals to get the Python 3 behavior, but be aware this affects the entire module.

  • s.replace(u"Â ", u"") will also fail if s is not a unicode string.

  • string.replace returns a new string and does not edit in place, so make sure you use the return value as well.

How to convert std::string to std::u32string in C++11?

Thanks, everybody, for the help!

Using these two links, I was able to find some relevant functions:

  • https://en.cppreference.com/w/cpp/string/multibyte/mbrtoc32

  • How to convert a Unicode code point to characters in C++ using ICU?

I tried using codecvt functions, but I got the error:

fatal error: codecvt: No such file or directory
#include <codecvt>
^
compilation terminated.

So I skipped that and, on further searching, found the mbrtoc32() function, which works. :)

This is the working code:

#include <iostream>
#include <string>
#include <clocale>
#include <locale>
#include "unicode/unistr.h"
#include "unicode/ustream.h"
#include <cassert>
#include <cwchar>
#include <uchar.h>

int main()
{
    constexpr char locale_name[] = "";
    setlocale(LC_ALL, locale_name);
    std::locale::global(std::locale(locale_name));
    std::ios_base::sync_with_stdio(false);
    std::wcin.imbue(std::locale());
    std::wcout.imbue(std::locale());

    std::string str;
    std::cin >> str;
    // For example, the input string is "hello☺☺"
    std::mbstate_t state{}; // zero-initialized to the initial conversion state
    char32_t c32;
    const char *ptr = str.c_str(), *end = str.c_str() + str.size() + 1;

    icu::UnicodeString ustr;

    while(std::size_t rc = mbrtoc32(&c32, ptr, end - ptr, &state))
    {
        icu::UnicodeString temp((UChar32)c32);
        ustr += temp;
        assert(rc != (std::size_t)-3); // no surrogates in UTF-32
        if(rc == (std::size_t)-1) break;
        if(rc == (std::size_t)-2) break;
        ptr += rc;
    }

    std::cout << "Unicode string is: " << ustr << std::endl;
    std::cout << "Size of unicode string = " << ustr.countChar32() << std::endl;
    std::cout << "Individual characters of the string are:" << std::endl;
    for(int i = 0; i < ustr.countChar32(); i++)
        std::cout << icu::UnicodeString(ustr.char32At(i)) << std::endl;

    return 0;
}

The output on entering the input hello☺☺ is as expected:

Unicode string is: hello☺☺
Size of unicode string = 7
Individual characters of the string are:
h
e
l
l
o
☺
☺
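
Since the question asks for a std::u32string specifically, here is a variant sketch of the same mbrtoc32() loop that appends to a std::u32string directly instead of an icu::UnicodeString. It assumes a UTF-8 locale has been installed as in the code above, and to_u32string() is just an illustrative name:

#include <string>
#include <uchar.h> // mbrtoc32(), as in the code above

std::u32string to_u32string(const std::string& str)
{
    std::u32string out;
    mbstate_t state{};
    char32_t c32;
    const char* ptr = str.data();
    const char* end = str.data() + str.size();

    while (ptr < end) {
        size_t rc = mbrtoc32(&c32, ptr, end - ptr, &state);
        if (rc == (size_t)-1 || rc == (size_t)-2) break; // invalid or truncated sequence
        if (rc == (size_t)-3) { out += c32; continue; }  // leftover unit; not expected for char32_t
        if (rc == 0) rc = 1;                             // converted an embedded null byte
        out += c32;
        ptr += rc;
    }
    return out;
}

On toolchains that ship it, the C++ header <cuchar> declares the same function as std::mbrtoc32.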

C++, Additional characters in string declaration

You have a Unicode character, U+202D (LEFT-TO-RIGHT OVERRIDE), in your array that cannot be represented in the current code page, hence the displayed ? character.
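
If the goal is to detect such characters before they cause trouble, a simple scan over a char32_t string is enough. This is only a sketch; is_directional_override() is a made-up helper that covers the explicit bidi formatting characters U+202A..U+202E (which include U+202D), not a library function:

#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical helper: the explicit bidi formatting characters U+202A..U+202E.
bool is_directional_override(char32_t c)
{
    return c >= U'\u202A' && c <= U'\u202E';
}

int main()
{
    std::u32string s = U"abc\u202Ddef"; // an invisible U+202D hidden in the text
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_directional_override(s[i]))
            std::cout << "bidi override character at index " << i << "\n";
    }
    return 0;
}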

Replace non-ASCII characters with a single space

Your ''.join() expression filters out anything non-ASCII; you could use a conditional expression instead:

return ''.join([i if ord(i) < 128 else ' ' for i in text])

This handles characters one by one and would still use one space per character replaced.

Your regular expression should just replace consecutive non-ASCII characters with a space:

re.sub(r'[^\x00-\x7F]+',' ', text)

Note the + there.
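
The same byte-level idea carried over to C++, as a rough sketch that treats every run of bytes outside 0x00-0x7F as one unit to collapse (collapse_non_ascii() is just an illustrative name):

#include <iostream>
#include <string>

std::string collapse_non_ascii(const std::string& text)
{
    std::string out;
    bool in_run = false;
    for (unsigned char b : text) {
        if (b < 0x80) {                 // plain ASCII byte: copy it through
            out += static_cast<char>(b);
            in_run = false;
        } else if (!in_run) {           // first byte of a non-ASCII run
            out += ' ';
            in_run = true;
        }                               // later bytes of the run are skipped
    }
    return out;
}

int main()
{
    // "Paramètres café" in UTF-8; each accented letter becomes a single space.
    std::cout << collapse_non_ascii("Param\xC3\xA8tres caf\xC3\xA9") << "\n";
    // prints "Param tres caf "
}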


