Spring Data JPA and loading large volumes of read-only data.

Question

I have a Person table with 10M rows in it, and this data is read-only. My application reads the whole table into a List via Spring Data JPA at startup, and then uses this List for the lifetime of the app without making any more Person queries.

I'm using Postgres 9.6, Java 8, Spring Data JPA 1.11, and Hibernate 5.2. There are a bunch of other tables which are smaller, have updates, etc., and overall everything works great.

The issue I have is that loading these 10M Person objects needs 2-3 times the memory required to hold them once they are loaded. During the load, JPA will download the whole result set and then convert it into my Person objects, so the data sits in memory twice. Hibernate's first-level cache is also holding on to these objects.

Hibernate has a StatelessSession which can help me with the caching issue (https://gist.github.com/jelies/5181262), and I can do paging queries of 500k rows at a time or something like that to not duplicate the whole dataset on load, but is there a simpler way of doing this with Spring Data JPA in 2018?

I.e. can I stream the Person table into my Person objects N rows at a time, and disable all caching in the process?

1. For streaming query results, see here, here, here, and here. 2. "The level one cache of Hibernate is also holding on to these objects": why would that even matter?
– crizzis, Commented Apr 18, 2018 at 23:05
What was your solution? I'm trying to stream 200M rows, and it's not going so well...
– Josh C., Commented Jan 8, 2019 at 4:32
@JoshC. I used a StatelessSession, which lets me read in batches of 500k. I posted an answer with some code.
– kozyr, Commented Jan 19, 2019 at 2:08

kozyr · Accepted Answer · 2019-01-19 02:12:46Z

I ended up doing something like the code below. Make the fetch size a config parameter and test different values.

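    import java.util.ArrayList;
    import java.util.List;
    import org.hibernate.LockMode;
    import org.hibernate.ScrollMode;
    import org.hibernate.ScrollableResults;
    import org.hibernate.Session;
    import org.hibernate.StatelessSession;
    import org.hibernate.query.Query;

    // Inside a method that has access to the EntityManager `em`.
    // A StatelessSession has no first-level cache, so Hibernate does not hold on to loaded entities.
    StatelessSession session = ((Session) em.getDelegate()).getSessionFactory().openStatelessSession();
    // wherever you want to store them
    List<MyObject> output = new ArrayList<>();
    ScrollableResults results = null;

    try {
        Query query = session.createQuery("SELECT a FROM MyObject a");
        query.setFetchSize(250_000); // JDBC fetch-size hint; make this configurable
        query.setReadOnly(true);
        query.setCacheable(false);
        query.setLockMode("a", LockMode.NONE);
        results = query.scroll(ScrollMode.FORWARD_ONLY);
        while (results.next()) {
            MyObject o = (MyObject) results.get(0);
            output.add(o);
        }
    } finally {
        // always release the cursor and the session
        if (results != null) {
            results.close();
        }
        session.close();
    }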
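
To make the fetch size a config parameter as suggested above, one option is to read it from a JVM system property. This is only a minimal sketch: the class name `FetchSizeConfig` and the property name `person.fetchSize` are made up for illustration, and 250,000 is simply the value from the snippet above, not a recommendation.

```java
/** Hypothetical helper that resolves the scroll fetch size from configuration. */
final class FetchSizeConfig {
    // Assumed property name; pick whatever fits your application.
    static final String PROPERTY = "person.fetchSize";
    // Default taken from the snippet above.
    static final int DEFAULT_FETCH_SIZE = 250_000;

    private FetchSizeConfig() {
    }

    /** Parses a raw setting, falling back when it is missing, malformed, or not positive. */
    static int parse(String raw, int fallback) {
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Reads -Dperson.fetchSize=... or returns the default. */
    static int fromSystemProperty() {
        return parse(System.getProperty(PROPERTY), DEFAULT_FETCH_SIZE);
    }
}
```

It could then be used as `query.setFetchSize(FetchSizeConfig.fromSystemProperty());` and tuned at startup with something like `-Dperson.fetchSize=50000` while comparing load time and peak memory.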


Josh C. Over a year ago

I ended up switching to Spring Batch and used a JDBC cursor.
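
For reference, the JDBC cursor idea can be sketched without Spring Batch using plain `java.sql`. This is not the Spring Batch reader itself, just an illustration under stated assumptions: `CursorReader`, `RowMapper`, and `readAll` are hypothetical names, and the caller is assumed to own (and later close) the `Connection`. The PostgreSQL JDBC driver only fetches through a server-side cursor when autocommit is off, the statement is forward-only, and a fetch size is set, which is why the sketch toggles autocommit.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads a query through a forward-only JDBC cursor. */
final class CursorReader {

    /** Maps the current row of the ResultSet to an object. */
    interface RowMapper<T> {
        T map(ResultSet rs) throws SQLException;
    }

    private CursorReader() {
    }

    static <T> List<T> readAll(Connection conn, String sql, int fetchSize, RowMapper<T> mapper)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        // With PostgreSQL, cursor-based fetching needs autocommit to be off.
        conn.setAutoCommit(false);
        List<T> out = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Hint: fetch this many rows per round trip instead of the whole result set.
            ps.setFetchSize(fetchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    out.add(mapper.map(rs));
                }
            }
            conn.commit();
        } finally {
            // Restore whatever mode the caller had.
            conn.setAutoCommit(previousAutoCommit);
        }
        return out;
    }
}
```

A call might look like `CursorReader.readAll(conn, "SELECT id, name FROM person", 50_000, rs -> new Person(rs.getLong(1), rs.getString(2)))`, where the `Person` constructor and column names are assumptions. Only one fetch-size batch of raw rows is buffered by the driver at a time, so peak memory stays close to the size of the final list.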
