7

I have a table with ID and name. I want to go through every row of this table. The ID is a primary key and auto_increment.

I can't use(?) a single query to get all rows because the table is huge. I am doing something with every result. I want the possibility to stop this task and continue with it later.

I thought I could do something like this:

for (int i = 0; i < 90238529; i++) {
  System.out.println("Current ID :" + i);
  query = "SELECT name FROM table_name WHERE id = " + i;
  ...
}

But that does not work because the auto_increment skipped some numbers.

As mentioned, I need an option to stop this task in a way that would allow me to start again where I left off. As with the example code above, I know the ID of the current entry, and if I want to start it again, I just set int i = X.

3 Answers

6

Use a single query to fetch all the records after a given ID:

query = "SELECT name FROM table_name WHERE id > ? ORDER BY id";

Then iterate over the ResultSet and read as many records as you wish (you don't have to read all the rows returned by the ResultSet).

Next time you run the query, pass the last ID you got in the previous execution.
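A minimal sketch of how this could look with JDBC. The class name, the helper methods, and the idea of keeping the last ID in a small checkpoint file are assumptions made for illustration, not part of the answer above. The LIMIT clause is also an addition: it caps how many rows one run reads, instead of abandoning the ResultSet partway through.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class; names are illustrative only.
class ResumableScan {

    /** Reads the last processed ID from the checkpoint file, or 0 if the file does not exist. */
    static long loadLastId(Path checkpoint) {
        try {
            if (!Files.exists(checkpoint)) {
                // Assumes IDs are positive, so "id > 0" matches every row.
                return 0L;
            }
            return Long.parseLong(Files.readString(checkpoint).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stores the last processed ID so that a later run can continue after it. */
    static void saveLastId(Path checkpoint, long lastId) {
        try {
            Files.writeString(checkpoint, Long.toString(lastId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Processes up to maxRows rows whose id is greater than lastId, in id order.
     * Returns the ID of the last row handled (or lastId unchanged if no rows were left).
     */
    static long processBatch(Connection conn, long lastId, int maxRows) throws SQLException {
        String query = "SELECT id, name FROM table_name WHERE id > ? ORDER BY id LIMIT ?";
        try (PreparedStatement ps = conn.prepareStatement(query)) {
            ps.setLong(1, lastId);
            ps.setInt(2, maxRows);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lastId = rs.getLong("id");
                    System.out.println("Current ID: " + lastId + ", name: " + rs.getString("name"));
                }
            }
        }
        return lastId;
    }
}
```

A run would then call loadLastId, pass the result to processBatch, and hand the returned value to saveLastId. Because the query filters on id > ?, gaps left by auto_increment do not matter.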

4

You mention this is a big table. It's important to note, then, that the MySQL Connector/J API Implementation Notes say:

ResultSet

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, create a Statement instance in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

So, I think you need to do that, and I would use a try-with-resources Statement. Next, I suggest you let the database help you iterate the rows:

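import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// conn is assumed to be an open java.sql.Connection
String query = "SELECT id, name FROM table_name ORDER BY id";
try (PreparedStatement ps = conn.prepareStatement(query,
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
    // Integer.MIN_VALUE tells Connector/J to stream rows one at a time
    ps.setFetchSize(Integer.MIN_VALUE);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            int id = rs.getInt("id");
            String name = rs.getString("name");
            System.out.printf("id=%d, name=%s%n", id, name);
        }
    }
} catch (SQLException e) {
    e.printStackTrace();
}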

1 Comment

Apparently the table is not that big (yet). But thanks, this might be useful later.

0

I can't use a single query to get all rows because the table is huge and I am doing something with every result. Also I want the possibility to stop this task and continue with it later.

Neither of these reasons rules out using a single query. They only affect performance (keeping one connection alive for a long time vs. constantly opening and closing a connection, which can be mitigated by using a connection pool).

As mentioned I need a option to stop this task but so that I could start again where I left. Like with the example code above I know the ID of the current entry and if I want to start it again I just set the int i = X

If you think about it, this wouldn't work either, as you said yourself:

But that does not work because the auto_increment skipped some numbers.

More importantly, rows could have been inserted or deleted since the last time you queried the DB.

First of all, this sounds like a classic XY problem (you are describing a problem with your solution to the problem, rather than the actual problem). Secondly, you seem to be using an RDBMS for something (a queue) that it was never really designed for.

If you really want to do this rather than use a better-suited database, there are a number of approaches you can use. Your first problem is that you want to resume from a certain point or state, but that state is not stored in the database, so it will not work in a scenario where there are multiple DB connections. The first way to fix this is to introduce a "processed" field in your table (which you can clear with an UPDATE statement if you want to resume from an arbitrary point). Depending on which problem you're actually trying to solve and on your requirements, this can be a simple true/false field, a unique identifier of the currently processing thread, or a relational table.

Then you can go back to using SQL to get the data you want.
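A rough sketch of the simple true/false variant, under several assumptions: a hypothetical column named processed that defaults to 0, autocommit left on, and a default (non-streaming) ResultSet so that the UPDATE can run on the same connection while rows are being read. The class, method, and constant names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class; assumes table_name has an extra "processed" column (0 = pending).
class ProcessedFlagScan {

    static final String SELECT_PENDING =
            "SELECT id, name FROM table_name WHERE processed = 0 ORDER BY id LIMIT ?";
    static final String MARK_DONE =
            "UPDATE table_name SET processed = 1 WHERE id = ?";

    /** Handles up to maxRows pending rows and marks each one as processed. Returns the count. */
    static int processPending(Connection conn, int maxRows) throws SQLException {
        int handled = 0;
        try (PreparedStatement select = conn.prepareStatement(SELECT_PENDING);
             PreparedStatement mark = conn.prepareStatement(MARK_DONE)) {
            select.setInt(1, maxRows);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    long id = rs.getLong("id");
                    String name = rs.getString("name");
                    System.out.println("Processing id=" + id + ", name=" + name);
                    // Record progress in the database itself, so any connection can resume.
                    mark.setLong(1, id);
                    mark.executeUpdate();
                    handled++;
                }
            }
        }
        return handled;
    }
}
```

Calling processPending repeatedly until it returns 0 walks the whole table, and stopping between calls loses no progress. Resetting the flag with an UPDATE makes the rows eligible again.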
