How can I iterate through the Unicode code points of a Java String?
Yes, Java uses a UTF-16-esque encoding for internal representations of Strings, and, yes, it encodes characters outside the Basic Multilingual Plane (BMP) using surrogate pairs.
If you know you'll be dealing with characters outside the BMP, then here is the canonical way to iterate over the code points of a Java String:
final int length = s.length();
for (int offset = 0; offset < length; ) {
    final int codepoint = s.codePointAt(offset);
    // do something with the codepoint
    offset += Character.charCount(codepoint);
}
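On Java 8 or later, the same iteration can also be written with the String.codePoints() stream. The following is a minimal sketch rather than a replacement for the loop above; the class name CodePointStreamDemo and the helper codePointsOf are made up for illustration.

```java
public class CodePointStreamDemo {
    // Hypothetical helper: returns the code points of s as an int array,
    // using the Java 8+ String.codePoints() stream.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // "a", then U+1F600 written as a surrogate pair, then "b"
        String s = "a\uD83D\uDE00b";
        for (int cp : codePointsOf(s)) {
            System.out.printf("U+%04X%n", cp);
        }
        // prints:
        // U+0061
        // U+1F600
        // U+0062
    }
}
```

The surrogate pair is reported as a single code point, so the string of four chars yields three values.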
Iterate through Unicode characters dynamically
The range U+1200 to U+137F is the Ethiopic block, which covers Amharic. It lies within the BMP (Basic Multilingual Plane), so each character can be represented by a single 16-bit value.
doing "(char)i" converts it to an ASCII character [???]
False. Unlike in some other languages, a char in Java is 2 bytes (16 bits) wide, so it is sufficient for your purposes.
For more information see: Comparing a char to a code-point?
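To make the point about char width concrete, here is a minimal sketch that walks the Ethiopic range. The class name EthiopicRangeDemo and the helper range are assumptions for illustration. StringBuilder.appendCodePoint is used so the same helper would also work for ranges above U+FFFF.

```java
public class EthiopicRangeDemo {
    // Hypothetical helper: builds a string containing every code point
    // in [first, last], inclusive.
    static String range(int first, int last) {
        StringBuilder sb = new StringBuilder();
        for (int cp = first; cp <= last; cp++) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The cast keeps all 16 bits; nothing is truncated to ASCII.
        char c = (char) 0x1200;
        System.out.println((int) c == 0x1200); // prints "true"

        // Every code point in this range fits in one char.
        String ethiopic = range(0x1200, 0x137F);
        System.out.println(ethiopic.length()); // prints "384"
    }
}
```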
What is the easiest/best/most correct way to iterate through the characters of a string in Java?
I use a for loop to iterate the string and use charAt() to get each character to examine it. Since the String is implemented with an array, the charAt() method is a constant time operation.
String s = "...stuff...";
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    // Process char
}
That's what I would do. It seems the easiest to me.
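As far as correctness goes, I don't believe that exists here. It is all based on your personal style. Note, though, that charAt() returns UTF-16 code units, so a character outside the BMP shows up as two separate surrogate chars.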
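One caveat with the charAt() loop is worth seeing in practice: it visits UTF-16 code units, not code points. The sketch below, with the made-up class name CharVsCodePointDemo and made-up helper names, compares the two counts for a string containing one supplementary character.

```java
public class CharVsCodePointDemo {
    // Number of UTF-16 code units, which is what a charAt() loop visits.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a", then U+1F600 written as a surrogate pair, then "b"
        String s = "a\uD83D\uDE00b";
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("%04X%n", (int) s.charAt(i));
        }
        // prints 0061, D83D, DE00, 0062 (one per line)
        System.out.println(charCount(s));      // prints "4"
        System.out.println(codePointCount(s)); // prints "3"
    }
}
```

For text that stays inside the BMP the two counts agree, and the charAt() loop is fine.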
How to iterate over all Unicode characters?
According to the docs, the parameter passed to String.fromCharCode(a) is converted by calling ToUint16, and the corresponding character is then returned. You may call it with any number you want, but ToUint16 reduces the value modulo 2^16, so the result always falls between 0 and 2^16 - 1 (65535).
const highNumber = 500; // This could go very high
let out = "";
for (let i = 0; i < highNumber; i++) {
    out += String.fromCharCode(i);
}
console.log(out);
Danger note: if you run this code with highNumber set to 2^16, you may freeze your tab or browser, because the output is far too big. This answer assumes you want to iterate over all characters, not over the characters in a given string, which is quite a different thing.
A sample output for a more reasonable highNumber (i.e. 500) is the following:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
stuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæç
èéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺ
ĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍ
ƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠ
ǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳ
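The ToUint16 conversion mentioned above amounts to keeping the low 16 bits of the number. Here is a minimal sketch of that arithmetic, written in Java only for consistency with the other examples on this page. The class name ToUint16Demo and the helper toUint16 are assumptions, and the helper covers only int inputs, not the full JavaScript number range. Java's narrowing cast from int to char happens to behave the same way.

```java
public class ToUint16Demo {
    // Hypothetical helper: keeps only the low 16 bits of n,
    // which equals n modulo 2^16 for any int.
    static int toUint16(int n) {
        return n & 0xFFFF;
    }

    public static void main(String[] args) {
        int n = 65536 + 65; // 2^16 + 65, one full wrap past 'A'
        System.out.println(toUint16(n)); // prints "65"
        System.out.println((char) n);    // prints "A"
    }
}
```

This is why looping past 2^16 with a 16-bit conversion only repeats the same characters instead of reaching new ones.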
Iterating through Unicode codepoints character by character
Use the ICU library:
http://site.icu-project.org/
For example,
http://icu-project.org/apiref/icu4c/classUnicodeString.html#ae3ffb6e15396dff152cb459ce4008f90
is the function that returns the character at a particular character offset in a string.
How to iterate through Unicode characters and print them on the screen with printf in C?
If the __STDC_ISO_10646__ macro is defined, wide characters correspond to Unicode codepoints. So, assuming a locale that can represent the characters you are interested in, you can just printf() wide characters via the %lc format conversion:
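#include <stdio.h>
#include <locale.h>
#include <wchar.h> /* wint_t, the type that %lc expects */

#ifndef __STDC_ISO_10646__
#error "Oops, our wide chars are not Unicode codepoints, sorry!"
#endif

int main(void)
{
    int i;

    /* Pick up the locale from the environment so wide chars can be encoded. */
    setlocale(LC_ALL, "");
    for (i = 0; i < 0xffff; i++) {
        /* Surrogates (U+D800..U+DFFF) are not characters and cannot be encoded. */
        if (i >= 0xd800 && i <= 0xdfff)
            continue;
        printf("%x - %lc\n", (unsigned)i, (wint_t)i);
    }
    return 0;
}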