Is Asynchronous Jdbc Call Possible

Is asynchronous jdbc call possible?

I don't understand how any of the proposed approaches that wrap JDBC calls in actors, executors or anything else can help here. Can someone clarify?

Surely the basic problem is that JDBC operations block on socket IO. When a call does this, it blocks the thread it's running on, end of story. Whatever wrapping framework you choose, you're going to end up with one thread being kept busy/blocked per concurrent request.
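To make that point concrete, here is a minimal sketch in Java. It uses no real database: `fakeJdbcCall` is a hypothetical stand-in that just sleeps, on the assumption that a blocking JDBC call behaves the same way from the thread's point of view. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BlockingCallDemo {
    // Hypothetical stand-in for a JDBC call: it simply blocks the calling thread.
    static void fakeJdbcCall(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    // Runs `requests` blocking calls on a pool of `threads` workers and
    // returns the highest number of calls that were in flight at once.
    static int peakConcurrency(int threads, int requests, long millisPerCall) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < requests; i++) {
                pending.add(pool.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        fakeJdbcCall(millisPerCall);
                    } finally {
                        inFlight.decrementAndGet();
                    }
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every call to finish
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // 8 "requests" on 2 threads: never more than 2 run at the same time.
        System.out.println("peak concurrency = " + peakConcurrency(2, 8, 50));
    }
}
```

However the calls are wrapped, the number that can be in flight at once is capped by the number of threads available to block on them.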

If the underlying database driver (MySQL?) offers a means to intercept socket creation (see SocketFactory), then I imagine it would be possible to build an async, event-driven database layer on top of the JDBC API. But we'd have to encapsulate the whole of JDBC behind an event-driven facade, and that facade wouldn't look like JDBC (after all, it would be event driven). The database processing would happen asynchronously on a different thread from the caller, and you'd have to work out how to build a transaction manager that doesn't rely on thread affinity.

Something like the approach I mention would allow even a single background thread to process a load of concurrent JDBC executions. In practice you'd probably run a pool of threads to make use of multiple cores.

(Of course, I'm not commenting on the logic of the original question, just on the responses that imply that concurrency in a scenario with blocking socket IO is possible without the use of a selector pattern. It's simpler just to work out your typical JDBC concurrency and put in a connection pool of the right size.)


Looks like MySQL probably does something along the lines I'm suggesting:
http://code.google.com/p/async-mysql-connector/wiki/UsageExample

Is it possible to access a database asynchronously through Java NIO non-blocking sockets?

It's possible, but not with JDBC. Unless you want to use the raw SocketChannel interface and parse the results yourself, the only async database drivers on the JVM I'm aware of are https://github.com/mauricio/postgresql-async (which despite the name also supports MySQL). They're written in Scala but it should be possible to call them from Java (since it's a JVM language), though I can't say how Java-friendly the API will be.

JDBC calls wrapped in Scala Future

Does the underlying thread idle while waiting for the database response?
Yes, the thread is blocked until the JDBC call finishes. It's not a good thing, but until ADBA is ready there is probably no better option.

It is a common pattern to use Future for blocking IO such as JDBC calls. There are some things to consider, though. There's a great article on that topic on GitHub.

Some points to sum up things described in the article:

  • Wrap your blocking calls inside a blocking block, like this:

    import scala.concurrent.{Future, blocking}

    // Assumes an implicit ExecutionContext is in scope.
    def fetchUser(id: Long): Future[User] = Future {
      blocking { // mark this operation as blocking
        ...
        preparedStatement.execute()
        ...
      }
    }
  • You shouldn't use scala.concurrent.ExecutionContext.Implicits.global for futures that do any blocking, because you might starve the thread pool. Instead, create a separate thread pool for your blocking operations:

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, ExecutionContextExecutor}

    object BlockingIOExecutionContext {
      // Separate thread pool for our blocking operations
      implicit val ec: ExecutionContextExecutor = ExecutionContext.fromExecutor(
        Executors.newCachedThreadPool()
      )
    }

The best option for you would be to use a mature Scala library that does these things for you, such as Slick or doobie.
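The same two ideas (run blocking calls as futures, and give them their own thread pool) carry over to plain Java. The following is only a sketch under stated assumptions: `BlockingIo`, `async` and `fetchUserName` are hypothetical names, the sleep stands in for a real JDBC query, and the pool size of 4 is an arbitrary figure meant to match an assumed connection pool size.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class BlockingIo {
    // Assumption: 4 matches the size of a (hypothetical) connection pool.
    static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "blocking-io");
        t.setDaemon(true); // do not keep the JVM alive for idle workers
        return t;
    });

    // Runs a blocking call on the dedicated pool instead of the common pool.
    static <T> CompletableFuture<T> async(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, POOL);
    }

    // Hypothetical stand-in for a method that would run a JDBC query.
    static String fetchUserName(long id) {
        try {
            Thread.sleep(20); // pretend to wait on the database
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "user-" + id;
    }

    public static void main(String[] args) {
        String greeting = async(() -> fetchUserName(42))
                .thenApply(name -> "hello, " + name)
                .join();
        System.out.println(greeting); // prints "hello, user-42"
    }
}
```

A bounded pool is used here rather than a cached one so that the number of blocked threads cannot grow past the number of connections available; that is a design choice of this sketch, not something the article prescribes.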

Async database API for Java

There is no standard API like JDBC that would allow you to call any DB asynchronously. However, there is this Google Project which tries to do exactly this for PostgreSQL and MySQL.

You may also take a look at this question, which addresses similar stuff:

Is asynchronous jdbc call possible?

How to handle transactions in Async calls

  • Don't try to interact with the same DB connection from multiple threads at once. JDBC's connection system isn't specced to let you do this.
  • A transaction belongs to a single connection. You can't smear it out over multiples.
  • The obvious way to ensure that 'it is all rolled back' is to have a single long-lived transaction (but see later).

Combine these 3 facets and you end up with: Do all work in the async block. At least, all work that either needs to all happen, or none of it happens (i.e. the one transaction).
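As a rough sketch of what "do all work in the async block" can look like in Java: the helper below runs one unit of work on one thread with one connection, committing on success and rolling back on failure. `AsyncTransaction`, `Work` and `inTransaction` are hypothetical names made up for this illustration, and the `DataSource` and `Executor` are assumed to be supplied by the surrounding application.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import javax.sql.DataSource;

class AsyncTransaction {
    // Hypothetical callback type: the unit of work that must fully succeed or fully fail.
    interface Work<T> {
        T run(Connection connection) throws SQLException;
    }

    // Runs the whole transaction on ONE thread with ONE connection.
    static <T> CompletableFuture<T> inTransaction(DataSource dataSource, Executor executor, Work<T> work) {
        return CompletableFuture.supplyAsync(() -> {
            try (Connection connection = dataSource.getConnection()) {
                connection.setAutoCommit(false);
                try {
                    T result = work.run(connection);
                    connection.commit();
                    return result;
                } catch (SQLException | RuntimeException e) {
                    connection.rollback(); // undo everything done so far
                    throw e;
                }
            } catch (SQLException e) {
                // Supplier cannot throw checked exceptions, so wrap it.
                throw new CompletionException(e);
            }
        }, executor);
    }
}
```

A caller would pass every statement that belongs to the transaction inside the single `work` callback, rather than issuing some of them from the calling thread and some from the async task.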

Any other basic approach wouldn't work or wouldn't be useful; there's no point freezing the main thread to wait for the async task (just do the async task on the spot; moving code to another thread doesn't magically make it go any faster. On the contrary, in fact).

However, transactions that aren't just long lived but also make a ton of changes to a DB are their own problem, and now we're getting into the performance characteristics of your specific batch of queries and your particular DB engine, version, indices, and data. Kinda hard to answer with specifics, what with all those unknowns.

There are ways to design your DB to deal with this, mostly involving a table that represents a calculation, with a row indicating whether the calculation is complete. As long as you aren't done, don't set it to 'completed', and have all your queries ignore non-complete results. Upon bootup, delete any non-complete results (and let that cascade): those must be half-baked work done right before your server crashed, and now you've restarted it. It's probably not the right answer here; I'm just making sure you're aware that such options exist.

As a general rule of thumb, countering a problem of "Our code has been observed to run too slowly" with "let's make it all async" doesn't work. Async makes code harder to read, way harder to debug, and doesn't make stuff go faster. All you can really do with async is soothe the user by playing them some elevator music or, slightly more pragmatically, showing a progress bar or whatnot whilst they wait. And that's actually generally easier to do by spawning off the bits that tell the user what's happening into a separate thread, instead of asyncing the work itself. That, and make your algorithm better and/or fix your DB index definitions. You can search the web for that too; run EXPLAIN variants of your queries to make the DB tell you whether it is using any table sweeps (full table scans: that's where it goes through the entire dataset before it can answer a query. You want to avoid those).

If you need help with either of those parts (showing the user what is going on instead of freezing the webpage or the GUI, or optimizing a DB query), search the web for this information; there are tons of tutorials. Make sure to include the frontend tech: Java can be used for Swing apps, JavaFX, Android, and there are at last count something like 100 web frameworks.

Is it good to put jdbc operations in actors?

Whether putting JDBC access in actors is 'good' or not greatly depends upon the rest of your application.

Most web applications today are synchronous, thanks to the Servlet API that underlies most Java (and Scala) web frameworks. While we're now seeing support for asynchronous servlets, that support hasn't worked its way up to all frameworks. Unless you start with a framework that supports asynchronous processing, your request processing will be synchronous.

As for JDBC, JDBC is synchronous. Realistically there's never going to be anything done about that, given the burden that would place on modifying the gazillion JDBC driver implementations that are out in the world. We can hope, but don't hold your breath.

And the JDBC implementations themselves don't have to be thread safe, so invoking an operation on a JDBC connection prior to the completion of some other operation on that same connection will result in undefined behavior. And undefined behavior != good.

So my guess is that you won't see quite the same capacity improvements that you see with NIO.

Edit: Just discovered adbcj, an asynchronous database driver API. It's a very early, experimental project written for a master's thesis. It's a worthy experiment, and I hope it succeeds. Check it out!

But if you are building an asynchronous, actor-based system, I really like the idea of having data access or repository actors, much in the same way you would have data access or repository objects in a layered OO architecture.

Actors guarantee that messages are processed one at a time, which is ideal for accessing a single JDBC connection. (One word of caution: most connection pools default to handing out connection-per-thread, which does not play well with actors. Instead you'll need to make sure that you are using a connection-per-actor. The same is true for transaction management.)

This allows you to treat the database like the asynchronous remote system we ought to have been treating it as all along. This also means that results from your data access/repository actors are futures, which are composable. This makes it easier to coordinate data access with other asynchronous activities.
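The one-message-at-a-time guarantee and the future-returning results can be approximated in plain Java without an actor library. This is a sketch, not an actor implementation: `SerialOwner` and `ask` are hypothetical names, a single-thread executor plays the role of the mailbox, and a `StringBuilder` stands in for the JDBC connection, since both are unsafe to use from several threads at once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical "repository actor": owns one resource and one thread.
class SerialOwner<R> implements AutoCloseable {
    private final R resource; // e.g. a java.sql.Connection; never shared directly
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    SerialOwner(R resource) {
        this.resource = resource;
    }

    // Queues a request; requests run one at a time, in submission order.
    <T> CompletableFuture<T> ask(Function<R, T> request) {
        return CompletableFuture.supplyAsync(() -> request.apply(resource), mailbox);
    }

    @Override
    public void close() {
        mailbox.shutdown(); // already queued requests still run
    }
}

class SerialOwnerDemo {
    public static void main(String[] args) {
        // A StringBuilder stands in for the connection: like one, it is not thread safe.
        try (SerialOwner<StringBuilder> owner = new SerialOwner<>(new StringBuilder())) {
            CompletableFuture<Integer> first = owner.ask(sb -> sb.append("a").length());
            CompletableFuture<Integer> second = owner.ask(sb -> sb.append("b").length());
            // Results are futures, so they compose with other asynchronous work.
            System.out.println(first.thenCombine(second, Integer::sum).join()); // prints 3
        }
    }
}
```

Because the owner holds its own resource for its whole lifetime, this corresponds to the connection-per-actor arrangement rather than connection-per-thread. The thread behind the mailbox still blocks during each call, so this buys safe serialization and composability, not extra capacity.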

So, is it good? Probably, if it fits within the architecture of the rest of your system. Will it improve capacity? That will depend on your overall system, but it sounds like a very worthy experiment.


