Hardware Acceleration Without X

Hardware acceleration without X

The answer will depend on your user application. If everything is bare metal and your application team is writing everything, the DirectFB API can be used as Fredrik suggests. This might be especially interesting if you use the framebuffer version of GTK.

However, if you are using Qt, then this is not the best way forward. Qt 5.0 does away with QWS (Qt embedded acceleration). Qt is migrating to Lighthouse, now known as QPA. If you write a QPA plug-in that uses your graphics acceleration through whatever kernel mechanism you expose, then you have accelerated Qt graphics. Also of interest might be the Wayland architecture; there are QPA plug-ins for Wayland. QPA is supported in Qt 4.8+ and Qt 5.0+. Skia is also an interesting graphics API with support for an OpenGL backend; Skia is used by Android devices.

Getting graphics acceleration is easy. Do you want compositing? What is your memory footprint? Who is the developer audience that will program to the API? Do you need object functionality or just drawing primitives? There is a big difference between Skia, PegUI, WindML and full-blown graphics frameworks (Gtk, Qt) with all the widgets and dynamic effects that people expect today. Programming to the OpenGL ES API might seem fine at first glance, but if your application has any complexity you will need a richer graphics framework; this is mostly reiterating Mats Petersson's comment.

Edit: from the Qt embedded acceleration link, the levels of drawing support are:

  1. CPU blitter - slowest.
  2. Hardware blitter - e.g. DirectFB; fast memory movement, usually with bit operations as opposed to machine words, similar to DMA.
  3. 2D vector - OpenVG; stick-figure drawing, with bit manipulation.
  4. 3D drawing - OpenGL (ES); polygon fills, etc.

This is the type of drawing you wish to perform. Frameworks like Qt and Gtk give an API to put a radio button, checkbox, editbox, etc. on the screen. They also handle styling of the text and interaction with a keyboard, mouse and/or touch screen, among other elements. A framework uses the drawing engine to put the objects on the screen.

Graphics acceleration is just putting algorithms like Bresenham's line algorithm on a separate CPU or dedicated hardware. If the framework you choose doesn't support 3D objects, it is unlikely to need OpenGL support and may not perform any better with it.
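
To make that concrete, here is a minimal sketch of Bresenham's line algorithm in Java; the plot callback is a hypothetical stand-in for whatever framebuffer pixel write your platform uses. This per-pixel stepping is exactly the kind of work a blitter or GPU takes off the main CPU.

import java.util.function.BiConsumer;

public class Bresenham {
    // Draws a line from (x0, y0) to (x1, y1), calling plot for each pixel.
    public static void drawLine(int x0, int y0, int x1, int y1,
                                BiConsumer<Integer, Integer> plot) {
        int dx = Math.abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
        int dy = -Math.abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
        int err = dx + dy;                      // error term tracks the ideal line
        while (true) {
            plot.accept(x0, y0);                // set the current pixel
            if (x0 == x1 && y0 == y1) break;
            int e2 = 2 * err;
            if (e2 >= dy) { err += dy; x0 += sx; }   // step in x
            if (e2 <= dx) { err += dx; y0 += sy; }   // step in y
        }
    }
}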

The final piece of the puzzle is a window manager. Many embedded devices do not need one. However, many handsets use compositing and alpha values to create transparent windows and allow multiple apps to be seen at the same time. This may also influence your choice of graphics API.

Additionally: DRI without X gives some compelling reasons why this might not be a good thing to do; for the case of a single user task, the DRI is not even needed.

The following is a diagram of the Wayland graphics stack, from a blog on Wayland:

Wayland diagram

Vt-x/AMD-V hardware acceleration is not available on your system

It depends on what tool you're using to run your VM.

For example, in VirtualBox there is a setting to enable hardware virtualization for the guest VM.

Edit:

If you don't have hardware virtualization enabled, you can try software virtualization instead, but this will impact performance.
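
If you prefer the command line, the same toggle can be changed with VBoxManage while the VM is powered off ("My VM" below is a placeholder for your VM's name):

VBoxManage modifyvm "My VM" --hwvirtex on     # use VT-x/AMD-V
VBoxManage modifyvm "My VM" --hwvirtex off    # fall back to software virtualization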


Java Hardware Acceleration

1)
So far, hardware acceleration is never enabled by default, and to my knowledge that has not changed. To activate rendering acceleration, pass the argument -Dsun.java2d.opengl=true to the Java launcher at program startup, or set the property before using any rendering libraries: System.setProperty("sun.java2d.opengl", "true"); It is an optional parameter.
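
For illustration, here is a minimal sketch of setting the property programmatically; the class name and the Swing window are placeholders, and the only essential point is that the property is set before any Java2D/Swing rendering is initialized. Passing "True" (capital T) instead will additionally print to the console whether the OpenGL pipeline could be enabled.

import javax.swing.JFrame;
import javax.swing.SwingUtilities;

public class AcceleratedLauncher {
    public static void main(String[] args) {
        // Equivalent to launching with -Dsun.java2d.opengl=true.
        // Must run before any rendering classes are touched.
        System.setProperty("sun.java2d.opengl", "true");

        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("OpenGL pipeline demo");
            frame.setSize(640, 480);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}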

2)
Yes. BufferedImage encapsulates some of the details of managing volatile memory, because when the BufferedImage is accelerated a copy of it is stored in VRAM as a VolatileImage.

The upside of a BufferedImage is that as long as you are not modifying the pixels it contains, only copying them (for example with a call to graphics.drawImage()), the BufferedImage will be accelerated after a certain unspecified number of copies, and it will manage the VolatileImage for you.

The downside of a BufferedImage is that if you are doing image editing, i.e. changing the pixels in the BufferedImage, in some cases it will give up trying to accelerate it. At that point, if you are looking for performant rendering for your editing, you need to consider managing your own VolatileImage. I do not know which operations make the BufferedImage give up on trying to accelerate rendering for you.
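
For reference, here is a minimal sketch of managing a VolatileImage by hand, following the validate/contentsLost loop described in the VolatileImage javadoc; the class and method names are only illustrative.

import java.awt.Graphics2D;
import java.awt.GraphicsConfiguration;
import java.awt.Transparency;
import java.awt.image.VolatileImage;

public class VolatileBackBuffer {
    private VolatileImage vimg;

    // (Re)creates and redraws the buffer until its contents are valid.
    public VolatileImage render(GraphicsConfiguration gc, int w, int h) {
        do {
            if (vimg == null
                    || vimg.validate(gc) == VolatileImage.IMAGE_INCOMPATIBLE) {
                // The surface was lost or is incompatible with the current
                // device, so recreate it in VRAM.
                vimg = gc.createCompatibleVolatileImage(w, h, Transparency.OPAQUE);
            }
            Graphics2D g = vimg.createGraphics();
            try {
                // ... draw your edited content into the accelerated surface ...
            } finally {
                g.dispose();
            }
            // contentsLost() can become true at any time (for example if
            // another application claims the VRAM), so loop until it survives.
        } while (vimg.contentsLost());
        return vimg;
    }
}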

3)
The advantage of using createCompatibleImage()/createCompatibleVolatileImage() is that ImageIO.read() does not do any conversion to the default supported image data model. So if you import a PNG, it will be represented in the format built by the PNG reader, which means that every time it is rendered by a GraphicsDevice it must first be converted to a compatible image data model.

BufferedImage image = ImageIO.read(url);
GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
GraphicsDevice gd = ge.getDefaultScreenDevice();
GraphicsConfiguration gc = gd.getDefaultConfiguration();
// Allocate a BufferedImage whose layout matches the screen's pixel format.
BufferedImage convertedImage = gc.createCompatibleImage(image.getWidth(),
        image.getHeight(),
        image.getTransparency());
Graphics2D g2d = convertedImage.createGraphics();
g2d.drawImage(image, 0, 0, image.getWidth(), image.getHeight(), null);
g2d.dispose();

The above process converts an image read in with the ImageIO API to a BufferedImage that has an image data model compatible with the default screen device, so that the conversion does not need to take place every time it is rendered. This is most advantageous when you will be rendering the image very frequently.

4)
You do not need to make an effort to batch your image rendering, because for the most part Java will attempt to do this for you. There is no reason why you can't attempt it, but in general it is better to profile your application and confirm that there is a bottleneck in the image rendering code before you carry out a performance optimization such as this. The main disadvantage is that it may be implemented slightly differently in each JVM, and then the enhancements might be worthless.

5)
To the best of my knowledge, the design you have outlined is one of the better strategies out there for doing double buffering manually and actively rendering an application.
http://docs.oracle.com/javase/7/docs/api/java/awt/image/BufferStrategy.html
At this link you will find a description of BufferStrategy. The description includes a code snippet showing the recommended way to do active rendering with a BufferStrategy object. I use this particular technique for my active rendering code. The only major difference is that in my code, like you, I have created the BufferStrategy on an instance of a Canvas which I put on a JFrame.
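
For completeness, here is a minimal sketch of that setup: a Canvas placed on a JFrame with a two-buffer BufferStrategy, using the loss/restore loop from the BufferStrategy javadoc. Names and sizes are just placeholders.

import java.awt.Canvas;
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.image.BufferStrategy;
import javax.swing.JFrame;

public class ActiveRenderingDemo {
    public static void main(String[] args) {
        JFrame frame = new JFrame("Active rendering");
        Canvas canvas = new Canvas();
        canvas.setPreferredSize(new Dimension(640, 480));
        canvas.setIgnoreRepaint(true);          // we paint ourselves, not Swing
        frame.add(canvas);
        frame.pack();
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);

        canvas.createBufferStrategy(2);         // double buffering
        BufferStrategy strategy = canvas.getBufferStrategy();

        while (true) {                          // render loop
            // Buffers can be lost or restored at any time, hence the nested loops.
            do {
                do {
                    Graphics g = strategy.getDrawGraphics();
                    try {
                        g.setColor(Color.BLACK);
                        g.fillRect(0, 0, canvas.getWidth(), canvas.getHeight());
                        // ... draw the frame here ...
                    } finally {
                        g.dispose();
                    }
                } while (strategy.contentsRestored());
                strategy.show();                // make the back buffer visible
            } while (strategy.contentsLost());
        }
    }
}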

Which OpenGL functions are not GPU-accelerated?

Boy, is this a big subject.

First, I'll start with the obvious: Since you're calling the function (any function) from the CPU, it has to run at least partly on the CPU. So the question really is, how much of the work is done on the CPU and how much on the GPU.

Second, in order for the GPU to execute some command, the CPU has to prepare a command description to pass down. The minimal set here is a command token describing what to do, as well as the data for the operation to be executed. How the CPU triggers the GPU to execute the command also matters. Since this is usually expensive, the CPU does not do it often, but rather batches commands in command buffers and sends a whole buffer for the GPU to handle.
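
To make the batching idea concrete, here is a purely conceptual sketch (a toy model, not any real driver's interface) of commands being recorded cheaply on the CPU and submitted to the GPU one buffer at a time:

import java.util.ArrayList;
import java.util.List;

class CommandBuffer {
    // In this toy model a command is just a token plus its operands.
    record Command(int token, int[] args) {}

    private final List<Command> pending = new ArrayList<>();
    private final int capacity;

    CommandBuffer(int capacity) { this.capacity = capacity; }

    // Recording a command is cheap: it only appends to host memory.
    void record(int token, int... args) {
        pending.add(new Command(token, args));
        if (pending.size() >= capacity) {
            flush();                   // the expensive part happens rarely
        }
    }

    // Submitting to the GPU is the expensive step, so it is done once per
    // batch rather than once per command.
    void flush() {
        if (pending.isEmpty()) return;
        submitToGpu(pending);
        pending.clear();
    }

    private void submitToGpu(List<Command> batch) {
        System.out.println("Submitting " + batch.size() + " commands to the GPU");
    }
}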

All this to say that passing work down to the GPU is not a free exercise. That cost has to be pitted against just running the function on the CPU (no matter what we're talking about).

Taking a step back, you have to ask yourself why you need a GPU at all. The fact is, a pure CPU implementation does the job (as AshleysBrain mentions). The power of the GPU comes from its design to handle:

  • specialized tasks (rasterization, blending, texture filtering, blitting, ...)
  • heavily parallel workloads (DeadMG points to that in his answer), whereas a CPU is designed more for single-threaded work.

And those are the guiding principles to follow in order to decide what goes in the chip. Anything that can benefit from those ought to run on the GPU. Anything else ought to be on the CPU.

It's interesting, by the way, that some functionality of the GL (mostly pre-deprecation) is really not clearly delineated. Display lists are probably the best example of such a feature. Each driver is free to push as much as it wants from the display list stream to the GPU (typically in some command-buffer form) for later execution, as long as the semantics of GL display lists are kept (and that is somewhat hard in general). So some implementations choose to push only a limited subset of the calls in a display list to a computed format, and simply replay the rest of the command stream on the CPU.

Selection is another one where it's unclear whether there is value to executing on the GPU.

Lastly, I have to say that in general, there is little correlation between the API calls and the amount of work on either the CPU or the GPU. A state-setting API tends to only modify a structure somewhere in the driver data. Its effect is only visible when a Draw, or some such, is called.

A lot of the GL API works like that. At that point, asking whether glEnable(GL_BLEND) is executed on the CPU or GPU is rather meaningless. What matters is whether the blending will happen on the GPU when Draw is called. So, in that sense, most GL entry points are not accelerated at all.

I could also expand a bit on data transfer but Danvil touched on it.

I'll finish with the little "s/w path". Historically, GL had to work to spec no matter what the hardware special cases were. This meant that if the h/w was not handling a specific GL feature, then it had to be emulated, or implemented fully in software. There are numerous cases of this, but one that struck a lot of people is when GLSL started to show up.

Since there was no practical way to estimate the code size of a GLSL shader, it was decided that the GL should accept any shader length as valid. The implication was fairly clear: either implement h/w that could take arbitrary-length shaders (not realistic at the time), or implement a s/w shader emulation (or, as some vendors chose to, simply fail to be compliant). So, if you triggered this condition in a fragment shader, chances were the whole of your GL ended up being executed on the CPU, even with a GPU sitting idle, at least for that draw.

VS Code on my Mac after update to v1.69.2: appearance problem

This is related to hardware acceleration. Try disabling it in the file /Users/<your_username>/.vscode/argv.json by uncommenting the line "disable-hardware-acceleration": true. Then make sure you restart the IDE by quitting it first.

It would become the following:

// This configuration file allows you to pass permanent command line arguments to VS Code.
// Only a subset of arguments is currently supported to reduce the likelihood of breaking
// the installation.
//
// PLEASE DO NOT CHANGE WITHOUT UNDERSTANDING THE IMPACT
//
// NOTE: Changing this file requires a restart of VS Code.
{
    // Use software rendering instead of hardware accelerated rendering.
    // This can help in cases where you see rendering issues in VS Code.
    "disable-hardware-acceleration": true,

    // Enabled by default by VS Code to resolve color issues in the renderer
    // See https://github.com/microsoft/vscode/issues/51791 for details
    "disable-color-correct-rendering": true,

    // Allows to disable crash reporting.
    // Should restart the app if the value is changed.
    "enable-crash-reporter": true,

    // Unique id used for correlating crash reports sent from this instance.
    // Do not edit this value.
    "crash-reporter-id": "81f0b16e-4c41-4fd3-b37c-b04c643b5f79"
}


