
I've been assigned a task to unmarshal an XML file using JAXB, generate the corresponding SQL statements, and fire them at the database. I've used the following method to generate the list of SQL statements.

public List<String> getSqlOfNationalityList(File file)
        throws JAXBException, FileNotFoundException, UnsupportedEncodingException {
    List<String> unNationalityList = new ArrayList<String>();
    JAXBContext jaxbContext = JAXBContext.newInstance(ObjectFactory.class);
    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

    CONSOLIDATEDLIST consolidate = (CONSOLIDATEDLIST) unmarshaller.unmarshal(file);
    // accessing individuals properties
    INDIVIDUALS individuals = consolidate.getINDIVIDUALS();
    List<INDIVIDUAL> list = individuals.getINDIVIDUAL();

    for (INDIVIDUAL individual : list) {
        NATIONALITY nationality = individual.getNATIONALITY();
        if (nationality != null) {
            List<String> values = nationality.getVALUE();
            if (values != null) {
                for (String value : values) {
                    StringBuilder builder = new StringBuilder();
                    builder.append("INSERT INTO LIST_UN_NATIONALITY");
                    builder.append("(\"DATAID\",\"VALUE\")");
                    builder.append(" VALUES(");
                    String columnValues = "'" + individual.getDATAID() + "',";
                    // a null value becomes "null" when concatenated, so this
                    // catches both null and the literal string "null"
                    if ("null ".contentEquals(value + " ")) {
                        columnValues = columnValues + "' ',";
                    } else {
                        columnValues = columnValues + "'" + value.replace("'", "/") + "',";
                    }
                    // drop the trailing comma
                    builder.append(columnValues.substring(0, columnValues.length() - 1));
                    builder.append(");");
                    builder.append("\r\n");
                    unNationalityList.add(builder.toString());
                }
            }
        }
    }
    return unNationalityList;
} // end of nationality list method

I have used the following method to read from the list and insert into the database.

private void readListAndInsertToDb(List<String> list) {

    int duplicateCount = 0;
    int totalCount = 0;

    for (String sql : list) {
        try {
            jdbcTemplate.update(sql);
        } catch (DuplicateKeyException dke) {
            // dke.printStackTrace();
            duplicateCount++;
        } catch (DataAccessException e) {
            e.printStackTrace();
        }
        totalCount++;
    } // end of for

    System.out.println("\r\nTotal : " + totalCount);
    System.out.println("Total duplicate : " + duplicateCount);
}

Now the issue is that I have about 13-14 similar lists, and the XML file contains records which may already exist in the database.

  1. How can I fire the queries without creating duplicate entries in the PostgreSQL database?
  2. What is the most optimal way to do it? It would be great if it could be executed in a batch.

1 Answer


No, no, don't generate a list of SQL statements, and especially don't build them by interpolating values as strings! Awoogah, awoogah, SQL injection alert.

Don't use a try/catch approach for duplicate handling either.

The improvements below run from simple and easy to harder but best:

  • At bare minimum, use a PreparedStatement with bind parameters. Prepare it once, then execute it for each input row with that row's values (first sketch after this list).

    You cannot rely on drivers throwing DuplicateKeyException; you should also catch SQLException and check the SQLSTATE. Unless, of course, you plan on using one specific DBMS and your code checks that you're running against the expected driver and version.

  • Better, use PostgreSQL's INSERT ... ON CONFLICT DO NOTHING feature (available from 9.5) to handle conflicts without any exception handling. This also lets you batch your inserts, doing many per transaction for better performance (second sketch below).

  • Further improve performance by using a multi-row VALUES list with INSERT ... ON CONFLICT DO NOTHING (third sketch below).

  • Even better, COPY all the data, duplicates included, into a TEMPORARY table using PgJDBC's CopyManager interface (see PGConnection.getCopyAPI()), create an index on the key used for duplicate detection, then LOCK the destination table and do a bulk insert along the lines of

     INSERT INTO real_table
     SELECT ...
     FROM temp_table
     WHERE NOT EXISTS (SELECT 1 FROM real_table WHERE temp_table.key = real_table.key)
    

    or similar. This will be far faster. On a new enough PostgreSQL you can use INSERT ... ON CONFLICT DO NOTHING instead of the NOT EXISTS anti-join (the last sketch below shows the COPY route in Java).
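
A minimal sketch of the bare-minimum PreparedStatement option, assuming the table and columns from the question and a hypothetical NationalityRow holder with getDataId()/getValue(). Duplicates are detected by checking SQLSTATE 23505 (unique_violation) rather than trusting a driver-specific exception class:

private static final String INSERT_SQL =
        "INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") VALUES (?, ?)";

void insertRows(Connection conn, List<NationalityRow> rows) throws SQLException {
    int duplicateCount = 0;
    // note: this relies on autocommit; within a single transaction a failed
    // insert aborts the whole transaction in PostgreSQL
    try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
        for (NationalityRow row : rows) {
            ps.setString(1, row.getDataId());   // bound, not interpolated: no injection
            ps.setString(2, row.getValue());
            try {
                ps.executeUpdate();
            } catch (SQLException e) {
                if ("23505".equals(e.getSQLState())) {   // unique_violation
                    duplicateCount++;                    // duplicate row, skip it
                } else {
                    throw e;                             // anything else is a real error
                }
            }
        }
    }
    System.out.println("Total duplicate : " + duplicateCount);
}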
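
A sketch of the ON CONFLICT DO NOTHING variant with JDBC batching. The conflict target ("DATAID", "VALUE") is an assumption; use whatever unique constraint your table actually has:

private static final String UPSERT_SQL =
        "INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") VALUES (?, ?) " +
        "ON CONFLICT (\"DATAID\", \"VALUE\") DO NOTHING";

void batchInsertRows(Connection conn, List<NationalityRow> rows) throws SQLException {
    conn.setAutoCommit(false);                 // many inserts per transaction
    try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
        int pending = 0;
        for (NationalityRow row : rows) {
            ps.setString(1, row.getDataId());
            ps.setString(2, row.getValue());
            ps.addBatch();
            if (++pending % 500 == 0) {        // flush in chunks
                ps.executeBatch();
            }
        }
        ps.executeBatch();                     // flush the remainder
        conn.commit();
    } catch (SQLException e) {
        conn.rollback();
        throw e;
    }
}

The chunk size of 500 is arbitrary; tune it, the point is simply to avoid holding an unbounded batch in memory while still amortising round trips.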
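
And a sketch of the multi-row VALUES variant. The statement text is built from placeholders only, never from data, so it stays injection-safe; the rows-per-statement cap is arbitrary, but note PostgreSQL allows at most 65535 bind parameters per statement:

void multiRowInsert(Connection conn, List<NationalityRow> rows) throws SQLException {
    final int rowsPerStatement = 100;          // 2 parameters per row
    for (int from = 0; from < rows.size(); from += rowsPerStatement) {
        List<NationalityRow> chunk =
                rows.subList(from, Math.min(from + rowsPerStatement, rows.size()));
        StringBuilder sql = new StringBuilder(
                "INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") VALUES ");
        for (int i = 0; i < chunk.size(); i++) {
            sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");  // placeholders only, never data
        }
        sql.append(" ON CONFLICT (\"DATAID\", \"VALUE\") DO NOTHING");
        try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
            int idx = 1;
            for (NationalityRow row : chunk) {
                ps.setString(idx++, row.getDataId());
                ps.setString(idx++, row.getValue());
            }
            ps.executeUpdate();
        }
    }
}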
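
Finally, a sketch of the COPY route (needs org.postgresql.PGConnection and org.postgresql.copy.CopyManager, plus java.io.Reader/IOException). The temp table name, the CSV format, and the Reader supplying the rows are all illustrative:

void copyThenMerge(Connection conn, Reader csvRows) throws SQLException, IOException {
    conn.setAutoCommit(false);
    try (Statement st = conn.createStatement()) {
        // staging table vanishes at commit
        st.execute("CREATE TEMPORARY TABLE temp_un_nationality " +
                   "(\"DATAID\" text, \"VALUE\" text) ON COMMIT DROP");

        // stream everything in, duplicates and all
        CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
        copyManager.copyIn(
                "COPY temp_un_nationality FROM STDIN WITH (FORMAT csv)", csvRows);

        // index the key, lock out concurrent writers, then merge
        st.execute("CREATE INDEX ON temp_un_nationality (\"DATAID\", \"VALUE\")");
        st.execute("LOCK TABLE LIST_UN_NATIONALITY IN EXCLUSIVE MODE");
        st.execute(
                "INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") " +
                "SELECT DISTINCT t.\"DATAID\", t.\"VALUE\" " +
                "FROM temp_un_nationality t " +
                "WHERE NOT EXISTS (SELECT 1 FROM LIST_UN_NATIONALITY r " +
                "WHERE r.\"DATAID\" = t.\"DATAID\" AND r.\"VALUE\" = t.\"VALUE\")");
        conn.commit();
    } catch (SQLException | IOException e) {
        conn.rollback();
        throw e;
    }
}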


4 Comments

I know my code smells. The query-formation logic wasn't written by me; I'm looking for the optimal solution. I'm using PostgreSQL 9.2.15. Will INSERT ... ON CONFLICT DO NOTHING solve this issue?
No, because it's not available in PostgreSQL 9.2. That's why you include the version in your questions ;). You can use the COPY approach on 9.2, or at bare minimum use a PreparedStatement with bind parameters instead of string interpolation to build the queries. If you're working with some legacy monstrosity that simply won't let you do it the correct way, you must implement proper literal quoting, interpolate the parameters safely yourself, and live with it being horrifically slow.
I'm allowed to, and I have already changed the query to use a PreparedStatement. But are there any other techniques besides the COPY approach? How do I eliminate duplicate entries?
LOCK the table in EXCLUSIVE mode so nobody else can insert/update/delete, then do an INSERT ... WHERE NOT EXISTS (SELECT ...). If you want it faster, create a temp table with the data, duplicates included, then do a bulk insert with de-duplication, like I suggested above. For how to merge two tables and remove duplicates, there are plenty of other questions with answers.
