Convert a Single Character to a String

How do I convert a single char to a string?

Use the .ToString() Method

String myString = "Hello, World";
foreach (Char c in myString)
{
    String cString = c.ToString();
}

Convert a single character to a string?

Off the top of my head, if you're using STL then do this:

string firstLetter(1, str[0]);

Pulling a single char from a string and converting it to int

You can write:

std::string s = "#/5";
std::string substring = s.substr(2, 1);
int value = std::stoi(substring);

This uses the substr method of std::string to pull out the substring that you want to parse as an integer, and then stoi (which takes a std::string) rather than atoi (which takes a const char *).

Convert single-character string to char

If you want to convert a single-character string to a char, do this:

char.Parse("a");

If you want to get the character code, do this:

char.ConvertToUtf32("a", 0);  // returns 97

C++ convert from 1 char to string?

All of

std::string s(1, c); std::cout << s << std::endl;

and

std::cout << std::string(1, c) << std::endl;

and

std::string s; s.push_back(c); std::cout << s << std::endl;

worked for me.

Implicit conversion from char to single character string

First off, as I always say when someone asks a "why not?" question about C#: the design team doesn't have to provide a reason not to do a feature. Features cost time, effort and money, and every feature you do takes time, effort and money away from better features.

But I don't want to just reject the premise out of hand; the question might be better phrased as "what are the design pros and cons of this proposed feature?"

It's an entirely reasonable feature, and there are languages which allow you to treat single characters as strings. (Tim mentioned VB in a comment, and Python also treats chars and one-character strings as interchangeable IIRC. I'm sure there are others.) However, were I pitched the feature, I'd point out a few downsides:

  • This is a new form of boxing conversion. Chars are cheap value types. Strings are heap-allocated reference types. Boxing conversions can cause performance problems and produce collection pressure, and so there's an argument to be made that they should be more visible in the program, not less visible.
  • The feature will not be perceived as "chars are convertible to one-character strings". It will be perceived by users as "chars are one-character strings", and now it is perfectly reasonable to ask lots of knock-on questions, like: can I call .Length on a char? If I can pass a char to a method that expects a string, and I can pass a string to a method that expects an IEnumerable<char>, can I pass a char to a method that expects an IEnumerable<char>? That seems... odd. I can call Select and Where on a string; can I on a char? That seems even more odd. All the proposed feature does is move your question; had it been implemented, you'd now be asking "why can't I call Select on a char?" or some such thing.

  • Now combine the previous two points together. If I think of chars as one-character strings, and I convert a char to an object, do I get a boxed char or a string?

  • We can also generalize the second point a bit further. A string is a collection of chars. If we're going to say that a char is convertible to a collection of chars, why stop with strings? Why not also say that a char can also be used as a List<char>? Why stop with char? Should we say that an int is convertible to IEnumerable<int>?
  • We can generalize even further: if there's an obvious conversion from char to sequence-of-chars-in-a-string, then there is also an obvious conversion from char to Task<char> -- just create a completed task that returns the char -- and to Func<char> -- just create a lambda that returns the char -- and to Lazy<char>, and to Nullable<char> -- oh, wait, we do allow a conversion to Nullable<char>. :-)

All of these problems are solvable, and some languages have solved them. That's not the issue. The issue is: all of these problems are problems that the language design team must identify, discuss and resolve. One of the fundamental problems in language design is how general should this feature be? In two minutes I've gone from "chars are convertible to single-character strings" to "any value of an underlying type is convertible to an equivalent value of a monadic type". There is an argument to be made for both features, and for various other points on the spectrum of generality. If you make your language features too specific, it becomes a mass of special cases that interact poorly with each other. If you make them too general, well, I guess you have Haskell. :-)

Suppose the design team comes to a conclusion about the feature: all of that has to be written up in the design documents and the specification, and the code, and tests have to be written, and, oh, did I mention that any time you make a change to convertibility rules, someone's overload resolution code breaks? Convertibility rules you really have to get right in the first version, because changing them later makes existing code more fragile. There are real design costs, and there are real costs to real users if you make this sort of change in version 8 instead of version 1.

Now compare these downsides -- and I'm sure there are more that I haven't listed -- to the upsides. The upsides are pretty tiny: you avoid a single call to ToString or + "" or whatever you do to convert a char to a string explicitly.

That's not even close to a good enough benefit to justify the design, implementation, testing, and backwards-compat-breaking costs.

Like I said, it's a reasonable feature, and had it been in version 1 of the language -- which did not have generics, or an installed base of billions of lines of code -- then it would have been a much easier sell. But now, there are a lot of features that have bigger bang for smaller buck.

Replace a character at a specific index in a string?

Strings are immutable in Java. You can't change them.

You need to create a new string with the character replaced.

String myName = "domanokz";
String newName = myName.substring(0, 4) + 'x' + myName.substring(5);

Or you can use a StringBuilder:

StringBuilder myName = new StringBuilder("domanokz");
myName.setCharAt(4, 'x');

System.out.println(myName);
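
And if you then need the result back as a plain String (say, to pass to an API that takes a String), StringBuilder#toString does that; a small follow-up to the snippet above:

StringBuilder myName = new StringBuilder("domanokz");
myName.setCharAt(4, 'x');
String newName = myName.toString();  // "domaxokz"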

How do I convert a single character code to a `char` given a character set?

So, a couple of things.

First of all, the page you linked to says this about the code point range in question:

The extended ASCII codes (character code 128-255)

There are several different variations of the 8-bit ASCII table. The table below is according to ISO 8859-1, also called ISO Latin-1. Codes 128-159 contain the Microsoft® Windows Latin-1 extended characters.

This is incorrect, or at least, to me, misleadingly worded: ISO 8859-1 / Latin-1 does not define code point 146, so that's already asking for trouble. You can also see this if you do the conversion through String:

String s = new String(new byte[] {(byte)146}, "iso-8859-1");
System.out.println(s);

This outputs the same "unexpected" result. It appears that what they are actually referring to is the Windows-1252 set (aka "Windows Latin-1", though that name is almost completely obsolete these days), which does define code point 146 as a right single quote; a number of other encodings also place this character at 0x92. We can verify this as such:

String s = new String(new byte[] {(byte)146}, "windows-1252");
System.out.println(s);

So the first mistake is that the page is confusing.

But the big mistake is that you can't do what you're trying to do in the way you are doing it. A char in Java is a UTF-16 code unit (or half of one, for the supplementary characters above 0xFFFF): a single char corresponds to a BMP code point, while a pair of them, or an int, covers the full range, including the supplementary ones.
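
To make concrete what the direct cast actually does, here is a minimal illustration (the aside further down covers what U+0092 is):

// Reinterprets the number 146 as a UTF-16 code unit; no charset decoding is involved.
char c = (char) 146;
System.out.printf("0x%x\n", (int) c);  // outputs 0x92, i.e. U+0092, not a quote character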

Unfortunately, Java doesn't really expose a lot of API for single-character conversions. Even Character doesn't have any readily available ways to convert from the charset of your choice to UTF-16.

So one option is to do it via String as hinted at in the examples above, e.g. express your code points as a raw byte[] array and convert from there:

String s = new String(new byte[] {(byte)146}, "windows-1252");
System.out.println(s);
char c = s.charAt(0);
System.out.println(c);

You could grab the char again via s.charAt(0). Note that you have to be mindful of your character set when doing this. Here we know that our byte sequence is valid for the specified encoding, and we know that the result is only one char long, so we can do this.

However, you have to watch out for things in the general case. For example, perhaps your byte sequence and character set yield a result that is in the UTF-16 supplementary character range. In that case s.charAt(0) would not be sufficient and s.codePointAt(0) stored in an int would be required instead.
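
For instance, here is a minimal sketch of that caveat; the byte sequence and the character (U+1F600, encoded as UTF-8) are illustrative choices of mine, not anything from the question:

// U+1F600 decodes to a supplementary character, which takes two chars (a surrogate pair) in UTF-16.
String t = new String(new byte[] {(byte)0xF0, (byte)0x9F, (byte)0x98, (byte)0x80}, "UTF-8");
System.out.println(t.length());                  // 2 -- two UTF-16 code units
System.out.printf("0x%x\n", (int) t.charAt(0));  // 0xd83d -- only the high surrogate
System.out.printf("0x%x\n", t.codePointAt(0));   // 0x1f600 -- the full code point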

As an alternative, with the same caveats, you could use Charset to decode, although it's just as clunky, e.g.:

Charset cs = Charset.forName("windows-1252");
CharBuffer cb = cs.decode(ByteBuffer.wrap(new byte[] {(byte)146}));
char c = cb.get(0);
System.out.println(c);

Note that I am not entirely sure how Charset#decode handles supplementary characters and can't really test right now (but anybody, feel free to chime in).


As an aside: in your case, 146 (0x92) cast directly to char corresponds to the UTF-16 character "PRIVATE USE TWO", and all bets are off for what you'll end up displaying there. This character is classified by Unicode as a control character, and seems to fall in the range of characters reserved for ANSI terminal control (although AFAIK it isn't actually used, it's in that range regardless). I wouldn't be surprised if perhaps browsers in some locales rendered it as a right single quote for compatibility, but terminals did something weird with it.

Also, fyi, the Unicode code point for the right single quote is U+2019 (0x2019). You could reliably store that in a char by using that value, e.g.:

System.out.println((char)0x2019);

You can also see this for yourself by looking at the value after the conversion from windows-1252:

String s = new String(new byte[] {(byte)146}, "windows-1252");
char c = s.charAt(0);
System.out.printf("0x%x\n", (int)c); // outputs 0x2019

Or, for completeness:

String s = new String(new byte[] {(byte)146}, "windows-1252");
int cp = s.codePointAt(0);
System.out.printf("0x%x\n", cp);  // outputs 0x1f600? No -- outputs 0x2019

