Correct Use of Flush() in JPA/Hibernate

The exact details of em.flush() are probably implementation-dependent.
In general, though, JPA providers such as Hibernate can queue the SQL statements they are supposed to send to the database, often until you actually commit the transaction.
For example, when you call em.persist(), Hibernate remembers that it has to issue a database INSERT but does not necessarily execute it until you commit the transaction. As far as I know, this is done mainly for performance reasons.

In some cases, however, you want the SQL statements to be executed immediately, generally when you need the result of some side effect, such as an autogenerated key or a database trigger.

What em.flush() does is empty the internal queue of SQL statements and execute them against the database immediately.

Bottom line: no harm is done; you may just take a (minor) performance hit, since you are overriding the JPA provider's decision about the best time to send SQL statements to the database.
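
To make the queue-then-execute idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Hibernate's implementation: the class WriteBehindSketch and all of its methods are hypothetical names invented for illustration, and "executing" a statement just means moving it to a second list.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a write-behind persistence context (hypothetical, not a real JPA class). */
final class WriteBehindSketch {
    /** Statements remembered in memory but not yet sent to the database. */
    private final List<String> pending = new ArrayList<>();
    /** Statements that have been sent to the database in the open transaction. */
    private final List<String> executed = new ArrayList<>();

    /** Stand-in for em.persist(): only remembers that an INSERT is due. */
    void persist(String tableName) {
        pending.add("INSERT INTO " + tableName);
    }

    /** Stand-in for em.flush(): sends everything queued so far and empties the queue. */
    void flush() {
        executed.addAll(pending);
        pending.clear();
    }

    /** Stand-in for a commit: anything still queued is flushed first. */
    void commit() {
        flush();
    }

    int pendingCount() {
        return pending.size();
    }

    int executedCount() {
        return executed.size();
    }

    public static void main(String[] args) {
        WriteBehindSketch ctx = new WriteBehindSketch();
        ctx.persist("person");
        System.out.println(ctx.executedCount()); // prints 0: nothing sent yet
        ctx.flush();
        System.out.println(ctx.executedCount()); // prints 1: the INSERT was sent
    }
}
```

In this model an explicit flush() only changes when the statements are sent, not whether they are sent, which is why the cost is limited to timing.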

What is exact purpose of flush in JPA

In theory, you (as a user of JPA) should never, or only in very rare situations, need to call flush().

Flushing is the process of synchronizing the underlying persistent
store with persistable state held in memory

In other words, a flush() is the point at which the INSERT, UPDATE, DELETE, or other statements are actually executed against the database; before a flush(), nothing happens on your database. Flushing is triggered by committing your transaction or by some kinds of database reads. For example, if you execute a JPQL query, a flush() has to be done first to get correct results from the database. This is nice to know, but it is handled completely by your JPA implementation.

There may be some situations in which you want to control flushing yourself, and then you can invoke it with flush().

Edit to answer the questions in comment:

Not every read requires a flush. Consider this scenario (one transaction):

1. Read a person: Person p = em.find(Person.class, 234)
2. Update the person: p.setAge(31)
3. Read a building: Building b = em.find(Building.class, 123)
4. Read a building with a JPQL query: select b from Building b where b.id = 123

Automatic flush occurs only before 4., because EclipseLink can't determine what you are going to read, so the person's age must be up to date in the database before this read can occur. Before 3. no flush is needed, because EclipseLink knows that the update on a person cannot affect a building.
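
The four steps can be mimicked with a small plain-Java model. This is only a sketch of the behaviour described above, not EclipseLink code: AutoFlushSketch, findBuilding and queryBuilding are hypothetical names, and a real provider may decide differently about when a query needs a flush.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of automatic flushing before a query (hypothetical, not a real JPA class). */
final class AutoFlushSketch {
    /** Person id -> age as currently stored in the database. */
    private final Map<Integer, Integer> databaseAges = new HashMap<>();
    /** Person id -> age changed in memory but not yet flushed. */
    private final Map<Integer, Integer> pendingAges = new HashMap<>();
    private int flushCount = 0;

    AutoFlushSketch() {
        databaseAges.put(234, 30); // the person from step 1, before the update
    }

    /** Step 2: p.setAge(31) only changes in-memory state. */
    void setAge(int personId, int age) {
        pendingAges.put(personId, age);
    }

    /** Step 3: a primary-key lookup, which in this model never triggers a flush. */
    String findBuilding(int buildingId) {
        return "Building#" + buildingId;
    }

    /** Step 4: a query, which in this model flushes pending changes first. */
    String queryBuilding(int buildingId) {
        if (!pendingAges.isEmpty()) {
            flush();
        }
        return "Building#" + buildingId;
    }

    void flush() {
        databaseAges.putAll(pendingAges);
        pendingAges.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    int databaseAge(int personId) {
        return databaseAges.get(personId);
    }
}
```

Calling setAge(234, 31) and then findBuilding(123) leaves flushCount() at 0, while a following queryBuilding(123) raises it to 1 and brings the stored age up to date.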

To work with optimistic locking, you have to implement it. Read about the @Version annotation here: https://blogs.oracle.com/carolmcdonald/entry/jpa_2_0_concurrency_and. Without that your entity will not use optimistic locking and the "last update wins".
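
The idea behind a version column can be sketched without any JPA classes. The following is a simplified model of the check that a versioned UPDATE performs, assuming the usual "UPDATE ... WHERE version = ?" approach; OptimisticLockSketch and its methods are hypothetical names. With a real @Version field the provider does the bookkeeping and throws an OptimisticLockException instead of returning false.

```java
/** Toy model of a versioned row (hypothetical, not a real JPA class). */
final class OptimisticLockSketch {
    private int age = 30;
    private int version = 0;

    int currentVersion() {
        return version;
    }

    int age() {
        return age;
    }

    /**
     * Mimics "UPDATE person SET age = ?, version = version + 1 WHERE version = ?".
     * Returns false when the caller read the row at an older version.
     */
    boolean updateAge(int newAge, int versionReadByCaller) {
        if (versionReadByCaller != version) {
            return false; // stale read: someone else updated the row in between
        }
        age = newAge;
        version++;
        return true;
    }
}
```

If two callers both read version 0, the first updateAge call succeeds and the second is rejected, so the last update no longer silently wins.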

When does flush and clear commit?

Entities are synchronized to the connected database at transaction commit time. If you only have n = 1 ongoing transaction (here: JTA/container managed), changes on one or more entities get written to the DB the moment you call flush() on the EntityManager instance.

However, changes become "visible" only after the transaction has been properly committed by the container (here: Glassfish), which is responsible for transaction handling. For reference, see section 7.6.1 (p. 294) of JPA Spec 2.0, which defines:

A new persistence context begins when the container-managed entity manager is invoked (Specifically, when one of the methods of the EntityManager interface is invoked) in the scope of an active JTA transaction, and there is no current persistence context already associated with the JTA transaction. The persistence context is created and then associated with the JTA transaction.

The persistence context ends when the associated JTA transaction commits or rolls back, and all entities that were managed by the EntityManager become detached.

In section 3.2.4 (Synchronization to the Database) of the JPA Spec 2.0 we find:

The state of persistent entities is synchronized to the database at transaction commit.

[..]

The persistence provider runtime is permitted to perform synchronization to the database at other times as well when a transaction is active. The flush method can be used by the application to force synchronization.

It applies to entities associated with the persistence context. The EntityManager and Query setFlushMode methods can be used to control synchronization semantics. The effect of FlushModeType.AUTO is defined in section 3.8.7. If FlushModeType.COMMIT is specified, flushing will occur at transaction commit; the persistence provider is permitted, but not required, to perform to flush at other times. If there is no transaction active, the persistence provider must not flush to the database.

Most likely in your scenario, the container (Glassfish) and/or your application is configured for FlushModeType.COMMIT(*1). In case FlushModeType.AUTO is in place, it is up to the Persistence Provider (EclipseLink) which "is responsible for ensuring that all updates to the state of all entities in the persistence context which could potentially affect the result of the query are visible to the processing of the query." (Section 3.8.7, p. 122)

By contrast, the clear() method does NOT commit anything by itself. It simply detaches all managed entities from the current persistence context, so any changes to entities that have not yet been flushed are lost. For reference, see p. 70 of the linked JPA Spec.

With respect to the OutOfMemoryError, it's hard to tell what's causing it and under which circumstances, as you did not provide much detail. However, I would:

1. read the aforementioned sections of the JPA specification,
2. check how your environment is configured, and
3. reevaluate how your application is written/implemented, as it may be making false assumptions about the transaction handling of the container it is running in.

Related to 2., you might check your persistence.xml whether it configures

<property name="eclipselink.persistence-context.flush-mode" value="COMMIT" />

and change it to AUTO to see if there is any difference.

Hope it helps.

Footnotes

*1: But that's only a guess, as you did not provide much detail on your setup/environment.

What does EntityManager.flush do and why do I need to use it?

A call to EntityManager.flush() forces the data to be written to the database immediately, which EntityManager.persist() on its own does not necessarily do. This depends on how the EntityManager is configured: FlushModeType (AUTO or COMMIT) is set to AUTO by default, in which case a flush will be done automatically. If it is set to COMMIT, writing the data to the underlying database is delayed until the transaction is committed.

Calling flush() in @Transactional method in Spring Boot application

It should not make anything permanent before the transaction is committed, whether explicitly (for example with em.getTransaction().commit()) or when the transaction ends. The best explanation I found is from here. Below is the essential excerpt:

This operation will cause DML statements (insert/update/delete etc) to be executed to the database but the current transaction will not be committed. That means flush() will not make current changes visible to other EntityManager instances or other external database clients; that will only happen at the transaction commit. In other words flush() operation will only flush the current memory cache from EntityManager to the database session.

So the flush might raise some JPA exceptions, but nothing is actually committed to the database before the transaction ends.

Hibernate: flush() and commit()

In the Hibernate Manual you can see this example

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(...);
    session.save(customer);
    if (i % 20 == 0) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();

Without the calls to flush() and clear(), the first-level cache would keep a reference to every saved Customer, and the loop could eventually fail with an OutOfMemoryError.
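
The memory effect can be shown with a plain-Java stand-in for the first-level cache. This is a sketch under simplifying assumptions: BatchClearSketch and peakManaged are hypothetical names, a list of plain objects replaces the session, and the batch boundary is checked with (i + 1) % batchSize so that each batch holds exactly batchSize entities, which differs slightly from the manual's i % 20.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of first-level cache growth during a bulk insert (hypothetical names). */
final class BatchClearSketch {
    /**
     * Simulates saving `total` entities and returns the largest number of
     * entities the simulated first-level cache held at any one time.
     */
    static int peakManaged(int total, int batchSize, boolean flushAndClear) {
        List<Object> firstLevelCache = new ArrayList<>();
        int peak = 0;
        for (int i = 0; i < total; i++) {
            firstLevelCache.add(new Object()); // stands in for session.save(customer)
            peak = Math.max(peak, firstLevelCache.size());
            if (flushAndClear && (i + 1) % batchSize == 0) {
                // session.flush() would send the queued INSERTs here;
                // session.clear() is the call that actually drops the references.
                firstLevelCache.clear();
            }
        }
        return peak;
    }
}
```

With these definitions, peakManaged(100000, 20, true) returns 20, while peakManaged(100000, 20, false) returns 100000: without clearing, every saved entity stays referenced until the session ends.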

Also you can look at this post about flushing

JPA use in about save(flush) in play framework 1.x

The entity manager's flush operation sends the SQL to the database, but remember that while a transaction is in progress, the data sent to the database through SQL is persisted only when the database is told to commit.

This is general database behaviour during a transaction; you use a transaction to set a boundary within which database operations are atomic.

Even when using plain JDBC instead of an ORM you will see this behaviour, unless of course auto-commit is enabled for each SQL statement sent. Under the hood, the ORM also uses JDBC. So for a single-resource transaction (e.g., a single database), the usual idiom is to first set autoCommit to false on the JDBC connection, then send multiple inserts/updates through SQL, and then call commit on the connection. If an exception is reported, call rollback instead. So the data is persisted only when the final call to commit is sent to the database.
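
The JDBC idiom described above might look like the following sketch. It uses only the standard java.sql interfaces; the class name JdbcTransactionSketch and the helper runAtomically are hypothetical, and restoring the previous auto-commit setting afterwards is an extra step assumed here, not something the text above requires.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

/** Sketch of the plain-JDBC transaction idiom (hypothetical helper class). */
final class JdbcTransactionSketch {
    /** Runs all statements in one transaction: commit on success, rollback on failure. */
    static void runAtomically(Connection conn, List<String> sqlStatements) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // from here on nothing is persisted until commit()
        try (Statement stmt = conn.createStatement()) {
            for (String sql : sqlStatements) {
                stmt.executeUpdate(sql); // sent to the database, but not yet committed
            }
            conn.commit(); // only now do the changes become permanent
        } catch (SQLException e) {
            conn.rollback(); // undo everything sent since setAutoCommit(false)
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Each executeUpdate call plays the role that flush() plays in JPA: the statement reaches the database, yet it only becomes permanent with the final commit().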

UPDATE

You need to understand that flush() without an active transaction is not going to work. There is also a setting called FlushMode, which controls when the flush happens.
You can have a look at the answer here.
The key to understanding all this is that data sent to the database as inserts/updates does not become persistent until the transaction is committed, although the same transaction does have access to the changed (but not yet persisted) data. On the database side you can picture this as a separate area for each transaction: each transaction can change the data within its own area, but the data reaches the underlying table only after the transaction is told to commit. The provider also calls flush implicitly before running a query whose results may be affected by the state of the persistence context. For example, if you load an entity, change one of its properties, and then run a query for that entity, the provider sees that the change must first be sent to the database within its transaction, and only then runs the query, so that the loaded properties reflect the change. However, the changed data will not be persisted to the actual table row until the transaction is committed.
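
The "separate area for each transaction" picture can be written down as a small plain-Java model. It is a deliberately simplified sketch: VisibilitySketch and its methods are hypothetical names, and it assumes an isolation level at which other clients never see uncommitted data.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of flush versus commit visibility (hypothetical, not a real JPA class). */
final class VisibilitySketch {
    /** The underlying table: what other clients can read. */
    private final Map<String, String> committed = new HashMap<>();
    /** Changes held in memory by the entity manager, not yet flushed. */
    private final Map<String, String> inMemory = new HashMap<>();
    /** The transaction's own area: flushed, but not yet committed. */
    private final Map<String, String> transactionArea = new HashMap<>();

    void change(String key, String value) {
        inMemory.put(key, value);
    }

    /** Sends in-memory changes to the database, into the transaction's own area. */
    void flush() {
        transactionArea.putAll(inMemory);
        inMemory.clear();
    }

    /** Flushes what is left, then makes the transaction's area part of the table. */
    void commit() {
        flush();
        committed.putAll(transactionArea);
        transactionArea.clear();
    }

    /** Discards everything the transaction did. */
    void rollback() {
        inMemory.clear();
        transactionArea.clear();
    }

    /** A database read issued inside the same transaction sees flushed changes. */
    String readInsideTransaction(String key) {
        return transactionArea.containsKey(key) ? transactionArea.get(key) : committed.get(key);
    }

    /** A read by any other client sees only committed data. */
    String readFromOtherClient(String key) {
        return committed.get(key);
    }
}
```

After change("age", "31") and flush(), readInsideTransaction("age") returns "31" while readFromOtherClient("age") still returns null; only commit() makes the value visible to both.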