Duplicate Entry Exception: Spring Hibernate/Jpa Cascade Save Many to One

Duplicate Entry Exception: Spring Hibernate/JPA cascade save Many To One

I prepared the working solution: Cepr0/greyhound-demo.
I reworked your project "a little" - did it with Spring-Boot, Lombok and H2 database, just for demo purposes and to simplify it.

So, if I'm not mistaken, the task is to transform 'assignments' (from the greyhound site):

{
    "results": [
        {
            "oper_nbr": 1,
            "carrier_cd": "GLX ",
            "last_name": "JOHN",
            "first_name": "SMITH",
            "middle_init": null,
            "home_loc_6": 12345,
            "home_loc_3": "NLX",
            "oper_class": "T"
        },
        {
            "oper_nbr": 2,
            "carrier_cd": "GLX ",
            "last_name": "JOHN",
            "first_name": "DOE",
            "middle_init": null,
            "home_loc_6": 67890,
            "home_loc_3": "NLX",
            "oper_class": "T"
        }
    ]
}

to three entities: Driver, Location, and Carrier with the relations:

Location -1---*- Driver -*---1- Carrier

i.e. Driver has 'many-to-one' relation with Location and Carrier.

The main problem of this task is that while saving the Driver entity, we need to use the already persisted Location and Carrier entities, or use new ones. So to solve it we have to:

Prepare 3 repositories for those entities.
For each 'assignment' find related Location and Carrier.
If Location and Carrier are not found then create new ones.
Create a new Driver and set the found Location and Carrier or new ones having been created.
Persist the Driver (and cascaded persist Location and Carrier if they are not found).

The final code of method GreyhoundService.process():

@Transactional
public void process() {
  client.getAssignments()
      .stream()
      .forEach(a -> {
        log.debug("[d] Assignment: {}", a);

        Driver driver = new Driver();

        driver.setId(a.getDriverId());
        driver.setFirstName(a.getFirstName());
        driver.setLastName(a.getLastName());
        driver.setMiddleName(a.getMiddleName());

        driver.setLocation(
            locationRepo.findById(new Location.PK(a.getLocationId(), a.getLocationName()))
                .orElse(new Location(a.getLocationId(), a.getLocationName()))
        );

        driver.setCarrier(
            carrierRepo.findById(a.getCarrierId().trim())
                .orElse(new Carrier(a.getCarrierId().trim()))
        );

        driverRepo.saveAndFlush(driver);

        log.debug("[d] Driver: {}", driver);
      });
}

To minimize the size of the data in the database and the number of SQL selects I transformed the initial entities as follows:

Driver

@Getter
@Setter
@ToString
@EqualsAndHashCode(of = "id")
@Entity
@Table(name = "drivers")
public class Driver implements Persistable<Long> {

    @Id private Long id;

    private String firstName;
    private String lastName;
    private String middleName;

    @ManyToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    @JoinColumn(name = "carrierId", foreignKey = @ForeignKey(name = "drivers_carriers"))
    private Carrier carrier;

    @ManyToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    @JoinColumns(
            value = {@JoinColumn(name = "locationId"), @JoinColumn(name = "locationName")},
            foreignKey = @ForeignKey(name = "drivers_locations")
    )
    private Location location;

    @Override
    public boolean isNew() {
        return true;
    }
}

Location

@Data
@NoArgsConstructor
@Entity
@Table(name = "locations")
@IdClass(Location.PK.class )
public class Location {

    @Id private Long locationId;
    @Id private String locationName;

    public PK getId() {
        return new PK(locationId, locationName);
    }

    public void setId(PK id) {
        this.locationId = id.getLocationId();
        this.locationName = id.getLocationName();
    }

    public Location(final Long locationId, final String locationName) {
        this.locationId = locationId;
        this.locationName = locationName;
    }

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class PK implements Serializable {
        private Long locationId;
        private String locationName;
    }
}

Carrier

@Data
@NoArgsConstructor
@Entity
@Table(name = "carriers")
public class Carrier {
    @Id private String carrierId;

    public Carrier(final String carrierId) {
        this.carrierId = carrierId;
    }
}

As you can see I used natural identifiers for Location and Carrier (and a composite one in Carrier). This made it possible not only to reduce the size of the data but also reduce the number of additional SQL queries that Hibernate performs when storing complex entities. When Location and Carrier tables are filled, Hibernate does not perform unnecessary queries to find them but takes their data from its own cache (you can see this in the app log).

P.S. Note that this solution is not optimal. IMO to make it better you can split the main process into two parts: the first one persists distinct Locations and Carriers and the second one just persists Drivers without finding Locations and Carriers. Both parts perform with batch insert.

UPDATE

Branch with the optimal solution: Cepr0/greyhound-demo:async_and_batch_insert

Due to the asynchronous persisting of Locations and Carriers and with batch insert, processing takes only about 5 seconds.

Spring JPA Hibernate many to many insertion duplicate entry for primary key

The problem you are probably facing is that since you have @JoinTable declaration on both entities so you have two insertions with the same composite primary keys in the Join Table. You can just remove the annotation from Authority class to this:

@ManyToMany(mappedBy="authorities")
private Set<User> users = new HashSet<>();

Keep everything else the same. Hopefully this will work.

How to avoid saving duplicates in @ManyToMany, but insert into mapping table?

If I understand well your predicament, your problem is that you are trying to insert new PostTag when saving your Post entity.

Since you are doing a cascade save due to CascadeType.ALL, your EntityManager is roughly doing this:

Saving Post
Saving PostTag which cause your exception
Saving Post <-> PostTag

You should

Have a service (for example: PostTag findOrCreateTagByName(String)) that fetch existing PostTag by name and eventually create them. Thus returning existing PostTag.
Save the Post after with the association with said existing tags.

Edit (as answer to comment):

The JPA is only a mapping to a relational database.

In your code, you only shows the mapping which says that a Post is linked to several PostTag (and that PostTag are linked to several Post).

You added a unique constraint which apply on all tags: in all your database, there must be one tag "A", one tag "B", and so on.

If you populate your object like this (I don't use lombok, so I assume a minimal constructor here):

Post post = new Post();
post.setXXX(...); 
post.getPostTagSet().add(new PostTag("A"));
post.getPostTagSet().add(new PostTag("B"));

This means that you create two new tags named A and B.

The JPA Implementation (Hibernate, EclipseLink) are not magic: they won't fetch for you the existing tag and that where it will fail. If you violate the unicity constraint on table post_tag, this means you are inserting twice the same value. Either in the same transaction, either because the tag is already present in the table.

For example:

post.getPostTagSet().add(new PostTag("A"));
post.getPostTagSet().add(new PostTag("A"));

If you did not define correctly the hashCode(), then only the object identity hashCode would be used and there would be an attempt to add (insert) two tag A.

The only thing you can do here, is to restrict the PostTag by implementing correctly hashCode()/equals so that the PostTagSet ensure unicity only for the related Post.

Let's say now that you first fetch them and have a new tag C:

Post post = new Post();
post.setXXX(...); 
for (String tagName : asList("A", "B", "C")) {
  post.getPostTagSet().add(tagRepository.findByName(tagName)
                                     .orElseGet(() -> new PostTag(tagName ));
}
postRepository.save(post);

The tagRepository is simply a Spring JPA Repository - which I think you are using - and the findByName signature is:

Optional<String> findByName(String tagName);

The code will do:

Find tag A: it is in database as in PostTag(1, "A")
Find tag B: it is in database as in PostTag(2, "B")
Find tag C: it is not in database, create it.

This should then work because the cascade will perform a save on the Post, then on the PostTag, then on the relation Post <-> PostTag.

In term of SQL query, you should normally see something like this:

insert into post_tag (tag_id, name) (3, "C")
insert into post (post_id, ...) (<some id>, ...)
insert into post_tag_mapping (tag_id, post_id) (1, <some id>)
insert into post_tag_mapping (tag_id, post_id) (2, <some id>)
insert into post_tag_mapping (tag_id, post_id) (3, <some id>)

The other problem here is with the hashCode() and equals() provided by PostTag which ensure unicity of PostTag for one single Post:

If you use the id in hashCode() (and equals use id and name):

If you use the id, then the set will have PostTag(1, "A"), PostTag(2, "B") and PostTag("C")
When you save, PostTag("C") will have an id assigned -> PostTag(3, "C")
With standard HashSet, the PostTag("C") will no longer be in its valid bucket and you will fail to find it again.

This may not be problematic if you don't use the object after the set but I think it is best to first save the PostTag (assigning it an id) then add it to the set.

If you use the name in hashCode() and equals: as long as you don't update the name after insertion in the set, you won't have a problem.

Spring JPA @ManyToOne save fails with duplicate key

Ok, to answer my own question:

it is caused by the CascadeType @ManyToOne(cascade = CascadeType.ALL) on the Child property of the Parent. Removing that fixes it.

Apparently JPA cascades the PERSIST operation to the Child as well, even though it could have easily known that the Child is already a managed and persisted entity.

How to persist @ManyToMany relation - duplicate entry or detached entity

For above problem I would say your entity relationship cascade is wrong. Consider this: A user can have multiple roles but there can be fixed number of roles that can exist in the system. So CASCADE ALL from User entity does not make any sense, since life cycle of UserRoles should not depend on User entity life cycle. E.g. when we remove User, UserRoles should not get removed.

detached entity to persist exception will only occur when you are passing object which has primary key already set to persist.

Remove cascade and your problem will be solved now only thing you will need to decide is how you are going to insert User roles. According to me there should be separate functionality to do so.

Also do not use ArrayList, use HashSet. ArrayList allows duplicates.

Duplicate Entry Exception: Spring Hibernate/Jpa Cascade Save Many to One