I have a Person table with 10M rows in it, and this data is read-only. My application reads the whole table into a List via Spring Data JPA on its startup, and then uses this List throughout the lifetime of the app without making any more Person queries.

I'm using Postgres 9.6, Java 8, Spring Data JPA 1.11, and Hibernate 5.2. There are a bunch of other tables which are smaller, receive updates, etc., and overall everything works great.

The issue I have is that loading these 10M Person objects needs 2-3 times the memory required to hold them once they are loaded. During the load, JPA will download the whole result set and then convert it into my Person objects, duplicating the memory. The level one cache of Hibernate is also holding on to these objects.

Hibernate has a StatelessSession which can help me with the caching issue (https://gist.github.com/jelies/5181262), and I can do paging queries of 500k rows at a time or something like that to not duplicate the whole dataset on load, but is there a simpler way of doing this with Spring Data JPA in 2018?

I.e. can I stream the Person table into my Person objects N rows at a time, and disable all caching in the process?

  • 1. For streaming query results, see here, here, here, and here 2. 'The level one cache of Hibernate is also holding on to these objects' - why would that even matter? Commented Apr 18, 2018 at 23:05
  • What was your solution? I'm trying to stream 200M rows, and it's not going so well... Commented Jan 8, 2019 at 4:32
  • @JoshC. I used a StatelessSession - this lets me read in batches of 500k. I posted an answer with some code. Commented Jan 19, 2019 at 2:08

1 Answer

I ended up doing something similar to this. Make the fetch size a configuration parameter and test different values.

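    import java.util.ArrayList;
    import java.util.List;
    import org.hibernate.LockMode;
    import org.hibernate.ScrollMode;
    import org.hibernate.ScrollableResults;
    import org.hibernate.Session;
    import org.hibernate.StatelessSession;
    import org.hibernate.query.Query;

    // A StatelessSession has no first-level cache, so Hibernate does not hold on to loaded entities
    StatelessSession session = ((Session) em.getDelegate()).getSessionFactory().openStatelessSession();
    // wherever you want to store them
    List<MyObject> output = new ArrayList<>();
    ScrollableResults results = null;

    try {
        Query query = session.createQuery("SELECT a FROM MyObject a");
        query.setFetchSize(250_000); // JDBC fetch size hint; make this configurable
        query.setReadOnly(true);
        query.setCacheable(false);
        query.setLockMode("a", LockMode.NONE);
        results = query.scroll(ScrollMode.FORWARD_ONLY);
        while (results.next()) {
            MyObject o = (MyObject) results.get(0);
            output.add(o);
        }
    } finally {
        // release the cursor and the underlying JDBC connection
        if (results != null) {
            results.close();
        }
        session.close();
    }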
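
For comparison, the paging alternative mentioned in the question (loading N rows at a time) can be sketched without any Hibernate types. This is only a minimal sketch: `PagedLoader`, `loadAll`, and the `fetchPage` callback are hypothetical names made up for illustration, not part of Spring Data JPA or Hibernate. The callback is assumed to return the rows for a given offset and limit, in a stable order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper for illustration; not a Spring Data JPA or Hibernate class.
final class PagedLoader {

    private PagedLoader() {}

    /**
     * Calls fetchPage(offset, limit) repeatedly and appends each page to one list.
     * Stops as soon as a page comes back with fewer than pageSize rows.
     */
    static <T> List<T> loadAll(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<T> output = new ArrayList<>();
        int offset = 0;
        while (true) {
            // Only one page is held in addition to the output list at any time
            List<T> page = fetchPage.apply(offset, pageSize);
            output.addAll(page);
            if (page.size() < pageSize) {
                break; // a short (or empty) page means there is nothing left to read
            }
            offset += page.size();
        }
        return output;
    }
}
```

With a stateless session, the callback could plausibly run the same query with `setFirstResult(offset)` and `setMaxResults(limit)` and an `ORDER BY` on the primary key, so that pages do not overlap. I have not benchmarked this against the scroll approach above; large offsets tend to get slower in Postgres, so test it on your own data.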

1 Comment

I ended up switching to Spring Batch and used a JDBC cursor.
