Get UTF-8 Input with X11 Display

You have to do this:

    if (XFilterEvent(&ev, win))
        continue;

in your event loop. This runs the input method machinery; without it you will get only raw X events. For example, when you press a dead accent key followed by a letter key and do not call XFilterEvent, you get two KeyPress events as usual. If you do make the call, you get three events: two raw events, for which XFilterEvent(&ev, win) returns True, and then one event synthesized by the input method, for which XFilterEvent(&ev, win) returns False. It is this third event that contains the accented character.

If you want both raw events and those synthesized by the input method, you can of course do your own raw event processing instead of continue.

Note that Xutf8LookupString doesn't null-terminate its output, so you will need buf[count] = 0; in order to print buf correctly (or explicitly use a length).
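One detail worth spelling out: buf[count] = 0 is only safe when count is smaller than the buffer, so the terminator needs its own byte. The sketch below is a minimal illustration of that, not Xlib code; terminate_bytes is a hypothetical helper name, and count stands for the value returned by the lookup call.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Xlib): null-terminate a buffer that
 * currently holds `count` valid bytes. The count is clamped so that the
 * terminator always fits. Returns the number of bytes kept. */
size_t terminate_bytes(char *buf, size_t buf_size, int count)
{
    size_t n;

    if (buf_size == 0)
        return 0;
    n = count > 0 ? (size_t)count : 0;
    if (n > buf_size - 1)
        n = buf_size - 1;   /* leave room for the terminator */
    buf[n] = '\0';
    return n;
}
```

Clamping can cut a multi-byte UTF-8 sequence in half, so in practice it is simpler to pass sizeof buf - 1 as the buffer length to the lookup call, which makes the clamp a pure safety net.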

Finally, as mentioned in the comments, with recent versions of X11 you will need to pass a modifier to XSetLocaleModifiers, such as XSetLocaleModifiers("@im=none"); otherwise the extra events won't be generated.

Here is a corrected version of the code:

#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <X11/Xresource.h>
#include <X11/Xlocale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int screen_num, width, height;
    int i;
    unsigned long background, border;
    Window win;
    XEvent ev;
    Display *dpy;
    XIM im;
    XIC ic;
    char *failed_arg;
    XIMStyles *styles;

    /* Set up the locale first; the input method depends on it. */
    if (setlocale(LC_ALL, "") == NULL) {
        return 9;
    }

    if (!XSupportsLocale()) {
        return 10;
    }
    if (XSetLocaleModifiers("@im=none") == NULL) {
        return 11;
    }

    /* Connect to the display server, as specified in the DISPLAY
       environment variable. */
    dpy = XOpenDisplay(NULL);
    if (!dpy) {
        fprintf(stderr, "unable to connect to display\n");
        return 7;
    }
    /* these are macros that pull useful data out of the display object */
    /* we use these bits of info enough to want them in their own variables */
    screen_num = DefaultScreen(dpy);
    background = BlackPixel(dpy, screen_num);
    border = WhitePixel(dpy, screen_num);

    width = 400; /* start with a small window */
    height = 200;

    win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), /* display, parent */
                              0, 0, /* x, y: the window manager will place the window elsewhere */
                              width, height, /* width, height */
                              2, border, /* border width & colour, unless you have a window manager */
                              background); /* background colour */

    /* tell the display server what kind of events we would like to see */
    XSelectInput(dpy, win, ButtonPressMask|StructureNotifyMask|KeyPressMask|KeyReleaseMask);

    /* okay, put the window on the screen, please */
    XMapWindow(dpy, win);

    im = XOpenIM(dpy, NULL, NULL, NULL);
    if (im == NULL) {
        fputs("Could not open input method\n", stderr);
        return 2;
    }

    failed_arg = XGetIMValues(im, XNQueryInputStyle, &styles, NULL);

    if (failed_arg != NULL) {
        fputs("XIM Can't get styles\n", stderr);
        return 3;
    }

    for (i = 0; i < styles->count_styles; i++) {
        printf("style %d\n", (int)styles->supported_styles[i]);
    }
    XFree(styles); /* the style list is allocated by Xlib */

    ic = XCreateIC(im, XNInputStyle, XIMPreeditNothing | XIMStatusNothing, XNClientWindow, win, NULL);
    if (ic == NULL) {
        fputs("Could not open IC\n", stderr);
        return 4;
    }

    XSetICFocus(ic);

    /* as each event that we asked about occurs, we respond. In this
     * case we note if the window's shape changed, and exit if a button
     * is pressed inside the window */
    while (1) {
        XNextEvent(dpy, &ev);
        /* let the input method consume the raw events it needs */
        if (XFilterEvent(&ev, win))
            continue;
        switch (ev.type) {
        case MappingNotify:
            XRefreshKeyboardMapping(&ev.xmapping);
            break;
        case KeyPress:
        {
            int count = 0;
            KeySym keysym = 0;
            char buf[20];
            Status status = 0;
            count = Xutf8LookupString(ic, (XKeyPressedEvent*)&ev, buf, sizeof buf, &keysym, &status);

            printf("count: %d\n", count);
            /* on overflow, count is the required size and buf must not be read */
            if (status == XBufferOverflow)
                printf("BufferOverflow\n");
            else if (count)
                printf("buffer: %.*s\n", count, buf);

            if (status == XLookupKeySym || status == XLookupBoth) {
                printf("status: %d\n", status);
            }
            printf("pressed KEY: %d\n", (int)keysym);
        }
        break;
        case KeyRelease:
        {
            int count = 0;
            KeySym keysym = 0;
            char buf[20];
            count = XLookupString((XKeyEvent*)&ev, buf, sizeof buf, &keysym, NULL);

            if (count)
                printf("in release buffer: %.*s\n", count, buf);

            printf("released KEY: %d\n", (int)keysym);
        }
        break;
        case ConfigureNotify:
            if (width != ev.xconfigure.width
                    || height != ev.xconfigure.height) {
                width = ev.xconfigure.width;
                height = ev.xconfigure.height;
                printf("Size changed to: %d by %d\n", width, height);
            }
            break;
        case ButtonPress:
            XDestroyIC(ic);
            XCloseIM(im);
            XCloseDisplay(dpy);
            return 0;
        }
        fflush(stdout);
    }
}

XLookupString returning a UTF-8 code (Latin-1 to UTF-8)

You need to read chapter 11 of the Xlib Programming Manual. You are looking for XmbLookupString() or XwcLookupString(), but they are not drop-in substitutes for XLookupString(). I am not an expert in this, but this should point you in the right direction.

How to map a X11 KeySym to a Unicode character?

Is there a simple function in X11/Xlib that will map a KeySym to its
Unicode equivalent?

The definitive answer is no.

Because Unicode was invented years after Xlib and no one ever went
back to add such a thing? Most of the Xlib API is codeset
independent since it was written in the days when every locale used a
different character set (ISO 8859-*, Big5, JIS, etc.), so you get a
char buffer appropriate to the current locale. There were a few UTF-8
specific additions in later years, but mostly we've been trying to let
Xlib rest in peace since then, pushing new API design towards xcb
instead.
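Although Xlib offers no such function, part of the mapping is fixed by the keysym encoding itself, so it can be written by hand. The sketch below covers only the ranges that the X protocol keysym encoding defines as direct: the Latin-1 keysyms, whose values equal their code points, and the Unicode keysyms, which are 0x01000000 plus the code point. The function name keysym_to_codepoint_direct is hypothetical, and every other keysym (legacy Latin-2, Greek, function keys and so on) would need a lookup table that is not shown here.

```c
/* Hypothetical helper (not part of Xlib): map a keysym to a Unicode
 * code point for the directly mapped ranges only.
 * Returns -1 for keysyms that would need a lookup table. */
long keysym_to_codepoint_direct(unsigned long keysym)
{
    /* Latin-1 keysyms have the same value as their code point. */
    if ((keysym >= 0x0020UL && keysym <= 0x007eUL) ||
        (keysym >= 0x00a0UL && keysym <= 0x00ffUL))
        return (long)keysym;

    /* Unicode keysyms: 0x01000000 + code point, for U+0100..U+10FFFF. */
    if (keysym >= 0x01000100UL && keysym <= 0x0110ffffUL)
        return (long)(keysym - 0x01000000UL);

    return -1;
}
```

For example, keysym 0x010020ac yields U+20AC (the euro sign), while 0xff0d (Return) yields -1 because it is not a character keysym.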

X11 WM_NAME type is 'UTF-8' rather than UTF8_STRING

It looks like, besides the standard types STRING, COMPOUND_TEXT and UTF8_STRING (the latter is an XFree86 extension), it is also acceptable to have any multibyte encoding.

When XTextStyle is passed, XmbTextListToTextProperty simply takes the encoding from the current locale. In the en_US.UTF-8 locale, that would be UTF-8. To get the standardized (by XFree86) UTF8_STRING type for the property, we need to pass XUTF8StringStyle to XmbTextListToTextProperty instead of XTextStyle.

Using fgetws after setting a UTF-8 locale?

Luckily my system produces the same error and the same backtrace as yours (same files, same line numbers) so I have been able to investigate a bit.

Here's what causes the segmentation fault: after the return from main, all the internal structures are freed, one of them being stdin. The same cleanup runs first for stdout, up to the point of _IO_wfile_sync, and does not cause any problems, so it is intended to happen. The difference is that the delta at line 508 is zero for stdout, which leads to skipping most of the function's code, but nonzero for stdin. At this point, fp->_wide_data->_IO_read_end points (understandably) to the end of the input string L"5555555555\n", while fp->_wide_data->_IO_read_ptr is at the third character (after reading two), and the difference is -9.

Now, if you ask me, storing a negative difference in some type called _IO_ssize_t smells, and sure enough this does cause trouble. Line 531 calls the function do_length, which expects an argument max for a buffer size and receives -9 (or, presumably, 2^word_size - 9). Among the first lines in this function is the declaration

wchar_t to_buf[max];

which results in increasing the stack pointer instead of decreasing it, and data that should have been safely stored there (among them the pointer fp of _IO_wfile_sync(), as it ends up stored in a register rbx) is overwritten at the first opportunity.

After the return from the function fp is overwritten with something that does not make sense (NULL, in my case), even though it has never been exposed to it, and dereferencing it on line 534 causes a SIGSEGV, as the backtrace tells us.
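The guess that -9 arrives as 2^word_size - 9 follows from C's rule for converting a negative signed value to an unsigned type. The sketch below illustrates only that conversion and a possible guard; it is not glibc code, and the names as_unsigned_length and checked_length are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* What happens when a signed difference is used as a size:
 * a negative value wraps around to a huge unsigned one. */
size_t as_unsigned_length(ptrdiff_t delta)
{
    return (size_t)delta;
}

/* Hypothetical guard: reject negative differences before they are
 * used as a length. Returns 1 and stores the length on success,
 * 0 if delta is negative. */
int checked_length(ptrdiff_t delta, size_t *out)
{
    if (delta < 0)
        return 0;
    *out = (size_t)delta;
    return 1;
}
```

Here as_unsigned_length(-9) equals SIZE_MAX - 8, which is 2^64 - 9 on a 64-bit system, so an array sized from it cannot fit on any real stack.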

I haven't read enough of the code to make an educated guess on whether line 508 should have maybe said

delta = fp->_wide_data->_IO_read_end - fp->_wide_data->_IO_read_ptr;

instead of the opposite, or if -delta should have been passed on for max, or if it is unexpected behaviour that the _ptr points before the _end, but certainly anything that results in passing a negative value to a variable length array is not OK. As both of the files referenced here are part of glibc, I think it's safe to assume that would be the right place to direct a bug report to. This goes along well with the negative confirmations from non-glibc systems.

PS. This does not happen for non-UTF locales because the call leading to do_length is only executed for variable-length encodings (it's wrapped in an if-else on line 518). If it's an 8-bit or fixed 16- or 32-bit UCS encoding (supposedly), delta only gets multiplied by a constant. If the encoding can have varying byte lengths per character, the corresponding calculation must look inside the buffer to figure out how many characters it represents, or construct the representation in order to determine how many characters it will take.
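The contrast between the two cases can be sketched as follows. This is an illustration of the general idea only, not the glibc implementation: for a fixed-width encoding the character count is plain arithmetic on the byte count, while for UTF-8 the buffer itself has to be scanned. The function names are hypothetical, and chars_utf8 assumes well-formed input.

```c
#include <stddef.h>

/* Fixed-width encoding: the count follows from the byte length alone. */
size_t chars_fixed_width(size_t nbytes, size_t bytes_per_char)
{
    return nbytes / bytes_per_char;
}

/* UTF-8: every character starts with a byte that is NOT a continuation
 * byte (10xxxxxx), so the buffer contents must be inspected.
 * Assumes the input is well-formed UTF-8. */
size_t chars_utf8(const unsigned char *buf, size_t nbytes)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < nbytes; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For instance, the six bytes of "héllo" in UTF-8 (68 C3 A9 6C 6C 6F) hold five characters, which no constant factor applied to the byte count could produce in general.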


