How to Render Android's Yuv-Nv21 Camera Image on the Background in Libgdx with Opengles 2.0 in Real-Time

How to render Android's YUV-NV21 camera image on the background in libgdx with OpenGLES 2.0 in real-time?

The short answer is to load the camera image channels (Y,UV) into textures and draw these textures onto a Mesh using a custom fragment shader that will do the color space conversion for us. Since this shader will be running on the GPU, it will be much faster than CPU and certainly much much faster than the Java code. Since this mesh is part of GL, any other 3D shapes or sprites can be safely drawn over or under it.

I solved the problem starting from this answer https://stackoverflow.com/a/17615696/1525238. I understood the general method using the following link: How to use camera view with OpenGL ES, it is written for Bada but the principles are the same. The conversion formulas there were a bit weird so I replaced them with the ones in the Wikipedia article YUV Conversion to/from RGB.

The following are the steps leading to the solution:

YUV-NV21 explanation

Live images from the Android camera are preview images. The default color space (and one of the two guaranteed color spaces) is YUV-NV21 for camera preview. The explanation of this format is very scattered, so I'll explain it here briefly:

The image data is made of (width x height) x 3/2 bytes. The first width x height bytes are the Y channel, 1 brightness byte for each pixel. The following (width / 2) x (height / 2) x 2 = width x height / 2 bytes are the UV plane. Each two consecutive bytes are the V,U (in that order according to the NV21 specification) chroma bytes for the 2 x 2 = 4 original pixels. In other words, the UV plane is (width / 2) x (height / 2) pixels in size and is downsampled by a factor of 2 in each dimension. In addition, the U,V chroma bytes are interleaved.

Here is a very nice image that explains the YUV-NV12, NV21 is just U,V bytes flipped:

YUV-NV12

How to convert this format to RGB?

As stated in the question, this conversion would take too much time to be live if done inside the Android code. Luckily, it can be done inside a GL shader, which runs on the GPU. This will allow it to run VERY fast.

The general idea is to pass the our image's channels as textures to the shader and render them in a way that does RGB conversion. For this, we have to first copy the channels in our image to buffers that can be passed to textures:

byte[] image;
ByteBuffer yBuffer, uvBuffer;

...

yBuffer.put(image, 0, width*height);
yBuffer.position(0);

uvBuffer.put(image, width*height, width*height/2);
uvBuffer.position(0);

Then, we pass these buffers to actual GL textures:

/*
 * Prepare the Y channel texture
 */

//Set texture slot 0 as active and bind our texture object to it
Gdx.gl.glActiveTexture(GL20.GL_TEXTURE0);
yTexture.bind();

//Y texture is (width*height) in size and each pixel is one byte; 
//by setting GL_LUMINANCE, OpenGL puts this byte into R,G and B 
//components of the texture
Gdx.gl.glTexImage2D(GL20.GL_TEXTURE_2D, 0, GL20.GL_LUMINANCE, 
    width, height, 0, GL20.GL_LUMINANCE, GL20.GL_UNSIGNED_BYTE, yBuffer);

//Use linear interpolation when magnifying/minifying the texture to 
//areas larger/smaller than the texture size
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_MIN_FILTER, GL20.GL_LINEAR);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_MAG_FILTER, GL20.GL_LINEAR);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_WRAP_S, GL20.GL_CLAMP_TO_EDGE);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_WRAP_T, GL20.GL_CLAMP_TO_EDGE);

/*
 * Prepare the UV channel texture
 */

//Set texture slot 1 as active and bind our texture object to it
Gdx.gl.glActiveTexture(GL20.GL_TEXTURE1);
uvTexture.bind();

//UV texture is (width/2*height/2) in size (downsampled by 2 in 
//both dimensions, each pixel corresponds to 4 pixels of the Y channel) 
//and each pixel is two bytes. By setting GL_LUMINANCE_ALPHA, OpenGL 
//puts first byte (V) into R,G and B components and of the texture
//and the second byte (U) into the A component of the texture. That's 
//why we find U and V at A and R respectively in the fragment shader code.
//Note that we could have also found V at G or B as well. 
Gdx.gl.glTexImage2D(GL20.GL_TEXTURE_2D, 0, GL20.GL_LUMINANCE_ALPHA, 
    width/2, height/2, 0, GL20.GL_LUMINANCE_ALPHA, GL20.GL_UNSIGNED_BYTE, 
    uvBuffer);

//Use linear interpolation when magnifying/minifying the texture to 
//areas larger/smaller than the texture size
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_MIN_FILTER, GL20.GL_LINEAR);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_MAG_FILTER, GL20.GL_LINEAR);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_WRAP_S, GL20.GL_CLAMP_TO_EDGE);
Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, 
    GL20.GL_TEXTURE_WRAP_T, GL20.GL_CLAMP_TO_EDGE);

Next, we render the mesh we prepared earlier (covers the entire screen). The shader will take care of rendering the bound textures on the mesh:

shader.begin();

//Set the uniform y_texture object to the texture at slot 0
shader.setUniformi("y_texture", 0);

//Set the uniform uv_texture object to the texture at slot 1
shader.setUniformi("uv_texture", 1);

mesh.render(shader, GL20.GL_TRIANGLES);
shader.end();

Finally, the shader takes over the task of rendering our textures to the mesh. The fragment shader that achieves the actual conversion looks like the following:

String fragmentShader = 
    "#ifdef GL_ES\n" +
    "precision highp float;\n" +
    "#endif\n" +

    "varying vec2 v_texCoord;\n" +
    "uniform sampler2D y_texture;\n" +
    "uniform sampler2D uv_texture;\n" +

    "void main (void){\n" +
    "   float r, g, b, y, u, v;\n" +

    //We had put the Y values of each pixel to the R,G,B components by 
    //GL_LUMINANCE, that's why we're pulling it from the R component,
    //we could also use G or B
    "   y = texture2D(y_texture, v_texCoord).r;\n" + 

    //We had put the U and V values of each pixel to the A and R,G,B 
    //components of the texture respectively using GL_LUMINANCE_ALPHA. 
    //Since U,V bytes are interspread in the texture, this is probably 
    //the fastest way to use them in the shader
    "   u = texture2D(uv_texture, v_texCoord).a - 0.5;\n" +
    "   v = texture2D(uv_texture, v_texCoord).r - 0.5;\n" +

    //The numbers are just YUV to RGB conversion constants
    "   r = y + 1.13983*v;\n" +
    "   g = y - 0.39465*u - 0.58060*v;\n" +
    "   b = y + 2.03211*u;\n" +

    //We finally set the RGB color of our pixel
    "   gl_FragColor = vec4(r, g, b, 1.0);\n" +
    "}\n";

Please note that we are accessing the Y and UV textures using the same coordinate variable v_texCoord, this is due to v_texCoord being between -1.0 and 1.0 which scales from one end of the texture to the other as opposed to actual texture pixel coordinates. This is one of the nicest features of shaders.

The full source code

Since libgdx is cross-platform, we need an object that can be extended differently in different platforms that handles the device camera and rendering. For example, you might want to bypass YUV-RGB shader conversion altogether if you can get the hardware to provide you with RGB images. For this reason, we need a device camera controller interface that will be implemented by each different platform:

public interface PlatformDependentCameraController {

    void init();

    void renderBackground();

    void destroy();
}

The Android version of this interface is as follows (the live camera image is assumed to be 1280x720 pixels):

public class AndroidDependentCameraController implements PlatformDependentCameraController, Camera.PreviewCallback {

    private static byte[] image; //The image buffer that will hold the camera image when preview callback arrives

    private Camera camera; //The camera object

    //The Y and UV buffers that will pass our image channel data to the textures
    private ByteBuffer yBuffer;
    private ByteBuffer uvBuffer;

    ShaderProgram shader; //Our shader
    Texture yTexture; //Our Y texture
    Texture uvTexture; //Our UV texture
    Mesh mesh; //Our mesh that we will draw the texture on

    public AndroidDependentCameraController(){

        //Our YUV image is 12 bits per pixel
        image = new byte[1280*720/8*12];
    }

    @Override
    public void init(){

        /*
         * Initialize the OpenGL/libgdx stuff
         */

        //Do not enforce power of two texture sizes
        Texture.setEnforcePotImages(false);

        //Allocate textures
        yTexture = new Texture(1280,720,Format.Intensity); //A 8-bit per pixel format
        uvTexture = new Texture(1280/2,720/2,Format.LuminanceAlpha); //A 16-bit per pixel format

        //Allocate buffers on the native memory space, not inside the JVM heap
        yBuffer = ByteBuffer.allocateDirect(1280*720);
        uvBuffer = ByteBuffer.allocateDirect(1280*720/2); //We have (width/2*height/2) pixels, each pixel is 2 bytes
        yBuffer.order(ByteOrder.nativeOrder());
        uvBuffer.order(ByteOrder.nativeOrder());

        //Our vertex shader code; nothing special
        String vertexShader = 
                "attribute vec4 a_position;                         \n" + 
                "attribute vec2 a_texCoord;                         \n" + 
                "varying vec2 v_texCoord;                           \n" + 

                "void main(){                                       \n" + 
                "   gl_Position = a_position;                       \n" + 
                "   v_texCoord = a_texCoord;                        \n" +
                "}                                                  \n";

        //Our fragment shader code; takes Y,U,V values for each pixel and calculates R,G,B colors,
        //Effectively making YUV to RGB conversion
        String fragmentShader = 
                "#ifdef GL_ES                                       \n" +
                "precision highp float;                             \n" +
                "#endif                                             \n" +

                "varying vec2 v_texCoord;                           \n" +
                "uniform sampler2D y_texture;                       \n" +
                "uniform sampler2D uv_texture;                      \n" +

                "void main (void){                                  \n" +
                "   float r, g, b, y, u, v;                         \n" +

                //We had put the Y values of each pixel to the R,G,B components by GL_LUMINANCE, 
                //that's why we're pulling it from the R component, we could also use G or B
                "   y = texture2D(y_texture, v_texCoord).r;         \n" + 

                //We had put the U and V values of each pixel to the A and R,G,B components of the
                //texture respectively using GL_LUMINANCE_ALPHA. Since U,V bytes are interspread 
                //in the texture, this is probably the fastest way to use them in the shader
                "   u = texture2D(uv_texture, v_texCoord).a - 0.5;  \n" +                                   
                "   v = texture2D(uv_texture, v_texCoord).r - 0.5;  \n" +

                //The numbers are just YUV to RGB conversion constants
                "   r = y + 1.13983*v;                              \n" +
                "   g = y - 0.39465*u - 0.58060*v;                  \n" +
                "   b = y + 2.03211*u;                              \n" +

                //We finally set the RGB color of our pixel
                "   gl_FragColor = vec4(r, g, b, 1.0);              \n" +
                "}                                                  \n"; 

        //Create and compile our shader
        shader = new ShaderProgram(vertexShader, fragmentShader);

        //Create our mesh that we will draw on, it has 4 vertices corresponding to the 4 corners of the screen
        mesh = new Mesh(true, 4, 6, 
                new VertexAttribute(Usage.Position, 2, "a_position"), 
                new VertexAttribute(Usage.TextureCoordinates, 2, "a_texCoord"));

        //The vertices include the screen coordinates (between -1.0 and 1.0) and texture coordinates (between 0.0 and 1.0)
        float[] vertices = {
                -1.0f,  1.0f,   // Position 0
                0.0f,   0.0f,   // TexCoord 0
                -1.0f,  -1.0f,  // Position 1
                0.0f,   1.0f,   // TexCoord 1
                1.0f,   -1.0f,  // Position 2
                1.0f,   1.0f,   // TexCoord 2
                1.0f,   1.0f,   // Position 3
                1.0f,   0.0f    // TexCoord 3
        };

        //The indices come in trios of vertex indices that describe the triangles of our mesh
        short[] indices = {0, 1, 2, 0, 2, 3};

        //Set vertices and indices to our mesh
        mesh.setVertices(vertices);
        mesh.setIndices(indices);

        /*
         * Initialize the Android camera
         */
        camera = Camera.open(0);

        //We set the buffer ourselves that will be used to hold the preview image
        camera.setPreviewCallbackWithBuffer(this); 

        //Set the camera parameters
        Camera.Parameters params = camera.getParameters();
        params.setFocusMode(Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO);
        params.setPreviewSize(1280,720); 
        camera.setParameters(params);

        //Start the preview
        camera.startPreview();

        //Set the first buffer, the preview doesn't start unless we set the buffers
        camera.addCallbackBuffer(image);
    }

    @Override
    public void onPreviewFrame(byte[] data, Camera camera) {

        //Send the buffer reference to the next preview so that a new buffer is not allocated and we use the same space
        camera.addCallbackBuffer(image);
    }

    @Override
    public void renderBackground() {

        /*
         * Because of Java's limitations, we can't reference the middle of an array and 
         * we must copy the channels in our byte array into buffers before setting them to textures
         */

        //Copy the Y channel of the image into its buffer, the first (width*height) bytes are the Y channel
        yBuffer.put(image, 0, 1280*720);
        yBuffer.position(0);

        //Copy the UV channels of the image into their buffer, the following (width*height/2) bytes are the UV channel; the U and V bytes are interspread
        uvBuffer.put(image, 1280*720, 1280*720/2);
        uvBuffer.position(0);

        /*
         * Prepare the Y channel texture
         */

        //Set texture slot 0 as active and bind our texture object to it
        Gdx.gl.glActiveTexture(GL20.GL_TEXTURE0);
        yTexture.bind();

        //Y texture is (width*height) in size and each pixel is one byte; by setting GL_LUMINANCE, OpenGL puts this byte into R,G and B components of the texture
        Gdx.gl.glTexImage2D(GL20.GL_TEXTURE_2D, 0, GL20.GL_LUMINANCE, 1280, 720, 0, GL20.GL_LUMINANCE, GL20.GL_UNSIGNED_BYTE, yBuffer);

        //Use linear interpolation when magnifying/minifying the texture to areas larger/smaller than the texture size
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_MIN_FILTER, GL20.GL_LINEAR);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_MAG_FILTER, GL20.GL_LINEAR);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_WRAP_S, GL20.GL_CLAMP_TO_EDGE);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_WRAP_T, GL20.GL_CLAMP_TO_EDGE);

        /*
         * Prepare the UV channel texture
         */

        //Set texture slot 1 as active and bind our texture object to it
        Gdx.gl.glActiveTexture(GL20.GL_TEXTURE1);
        uvTexture.bind();

        //UV texture is (width/2*height/2) in size (downsampled by 2 in both dimensions, each pixel corresponds to 4 pixels of the Y channel) 
        //and each pixel is two bytes. By setting GL_LUMINANCE_ALPHA, OpenGL puts first byte (V) into R,G and B components and of the texture
        //and the second byte (U) into the A component of the texture. That's why we find U and V at A and R respectively in the fragment shader code.
        //Note that we could have also found V at G or B as well. 
        Gdx.gl.glTexImage2D(GL20.GL_TEXTURE_2D, 0, GL20.GL_LUMINANCE_ALPHA, 1280/2, 720/2, 0, GL20.GL_LUMINANCE_ALPHA, GL20.GL_UNSIGNED_BYTE, uvBuffer);

        //Use linear interpolation when magnifying/minifying the texture to areas larger/smaller than the texture size
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_MIN_FILTER, GL20.GL_LINEAR);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_MAG_FILTER, GL20.GL_LINEAR);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_WRAP_S, GL20.GL_CLAMP_TO_EDGE);
        Gdx.gl.glTexParameterf(GL20.GL_TEXTURE_2D, GL20.GL_TEXTURE_WRAP_T, GL20.GL_CLAMP_TO_EDGE);

        /*
         * Draw the textures onto a mesh using our shader
         */

        shader.begin();

        //Set the uniform y_texture object to the texture at slot 0
        shader.setUniformi("y_texture", 0);

        //Set the uniform uv_texture object to the texture at slot 1
        shader.setUniformi("uv_texture", 1);

        //Render our mesh using the shader, which in turn will use our textures to render their content on the mesh
        mesh.render(shader, GL20.GL_TRIANGLES);
        shader.end();
    }

    @Override
    public void destroy() {
        camera.stopPreview();
        camera.setPreviewCallbackWithBuffer(null);
        camera.release();
    }
}

The main application part just ensures that init() is called once in the beginning, renderBackground() is called every render cycle and destroy() is called once in the end:

public class YourApplication implements ApplicationListener {

    private final PlatformDependentCameraController deviceCameraControl;

    public YourApplication(PlatformDependentCameraController cameraControl) {
        this.deviceCameraControl = cameraControl;
    }

    @Override
    public void create() {              
        deviceCameraControl.init();
    }

    @Override
    public void render() {      
        Gdx.gl.glViewport(0, 0, Gdx.graphics.getWidth(), Gdx.graphics.getHeight());
        Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT | GL20.GL_DEPTH_BUFFER_BIT);

        //Render the background that is the live camera image
        deviceCameraControl.renderBackground();

        /*
         * Render anything here (sprites/models etc.) that you want to go on top of the camera image
         */
    }

    @Override
    public void dispose() {
        deviceCameraControl.destroy();
    }

    @Override
    public void resize(int width, int height) {
    }

    @Override
    public void pause() {
    }

    @Override
    public void resume() {
    }
}

The only other Android-specific part is the following extremely short main Android code, you just create a new Android specific device camera handler and pass it to the main libgdx object:

public class MainActivity extends AndroidApplication {

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        AndroidApplicationConfiguration cfg = new AndroidApplicationConfiguration();
        cfg.useGL20 = true; //This line is obsolete in the newest libgdx version
        cfg.a = 8;
        cfg.b = 8;
        cfg.g = 8;
        cfg.r = 8;

        PlatformDependentCameraController cameraControl = new AndroidDependentCameraController();
        initialize(new YourApplication(cameraControl), cfg);

        graphics.getView().setKeepScreenOn(true);
    }
}

How fast is it?

I tested this routine on two devices. While the measurements are not constant across frames, a general profile can be observed:

Samsung Galaxy Note II LTE - (GT-N7105): Has ARM Mali-400 MP4 GPU.
- Rendering one frame takes around 5-6 ms, with occasional jumps to around 15 ms every couple of seconds
- Actual rendering line (mesh.render(shader, GL20.GL_TRIANGLES);) consistently takes 0-1 ms
- Creation and binding of both textures consistently take 1-3 ms in total
- ByteBuffer copies generally take 1-3 ms in total but jump to around 7ms occasionally, probably due to the image buffer being moved around in the JVM heap
Samsung Galaxy Note 10.1 2014 - (SM-P600): Has ARM Mali-T628 GPU.
- Rendering one frame takes around 2-4 ms, with rare jumps to around 6-10 ms
- Actual rendering line (mesh.render(shader, GL20.GL_TRIANGLES);) consistently takes 0-1 ms
- Creation and binding of both textures take 1-3 ms in total but jump to around 6-9 ms every couple of seconds
- ByteBuffer copies generally take 0-2 ms in total but jump to around 6ms very rarely

Please don't hesitate to share if you think that these profiles can be made faster with some other method. Hope this little tutorial helped.

Render camera in high resolution, with lower quality of image in onPreviewFrame

No it isn't. Most image processing libraries know how to downscale their input. If yours is not the case, you should resample the byte[] in your onPreviewFrame() callback for it. Note that often AR processes only need grayscale, and thus you can save 33% of CPU cycles of downscaling.

LibGDX + ARCore: Usage of multiple cameras and viewports

Problem solved. It turned out, that it is not even necessary to work with Viewports etc, we can simply restrict the drawing area on the surface by using

HdpiUtils.glViewport(0, 0, Gdx.graphics.getWidth() / 2, Gdx.graphics.getHeight());

However you need to be aware, that you now also need to convert screen taps by e.g.:

int x = Gdx.input.getX() * 2;
int y = Gdx.input.getY();

Because of shrinking the viewport to a portion of the original screen, but all input handling methods expect the screen taps to originate from a fullscreen.

How to Render Android's Yuv-Nv21 Camera Image on the Background in Libgdx with Opengles 2.0 in Real-Time