1

I will have a CSV file (say, ids.csv) that I need to ETL into a SQL script (say, update_products.sql). The CSV will be headerless and will consist of comma-delimited numbers (product IDs in a database), for instance:

29294848,29294849,29294850,29294851,29294853,29294857,29294858,29294860,29294861,29294863,29294887,29294888,
29294889,29294890,29294891,29294892,29294895,29294897,29294898,29294899,29294901,29294903,29294912,29294916

Starting with a SQL "template" file (template.sql) that looks something like this:

UPDATE products SET quantity = 0 WHERE id = %ID%;

I'm looking for a way via bash, awk, sed (or any other type of shell scripting tool), to templatize %IDS% with the values in the CSV, hence turning the generated SQL into something like:

UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
... etc, for all the IDs in the CSV...

Super flexible here:

  • Don't care which tool gets the job done (awk, sed, bash, whatever...as long as I can run it from the command line)
  • Don't necessarily NEED a template file (template.sql) to start with, perhaps the solution can just "inject" this template into the script as an argument
  • Ideally it would read the input CSV but this is not a hard requirement, if the solution requires me pasting the contents of the CSV file into the script as an argument, I'm OK with that, but not thrilled...
  • Ideally it would generate an actual SQL file (update_products.sql) for me, but if we're limited to console output thats OK to (just not preferred)

Any ideas how I might be able to accomplish this?

1
  • 1
    What database are your targeting? Most have ways to import csv files directly. Commented Aug 17, 2019 at 0:43

4 Answers 4

1

I'd probably start with

$: sed "s/ *= %ID%/ IN ( $(echo $(<ids.csv) ) )/" template.sql > update_products.sql

but if it's a lot of id's I'm not sure what your limits are, and I honestly don't remember whether that's an ANSI standard structure...

SO...

$: while IFS=, read -a ids
> do for id in ${ids[@]}
>    do echo "UPDATE products SET quantity = 0 WHERE id = $id;"
>    done
> done < ids.csv > update_products.sql
$: cat update_products.sql
UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
UPDATE products SET quantity = 0 WHERE id = 29294851;
UPDATE products SET quantity = 0 WHERE id = 29294853;
UPDATE products SET quantity = 0 WHERE id = 29294857;
UPDATE products SET quantity = 0 WHERE id = 29294858;
UPDATE products SET quantity = 0 WHERE id = 29294860;
UPDATE products SET quantity = 0 WHERE id = 29294861;
UPDATE products SET quantity = 0 WHERE id = 29294863;
UPDATE products SET quantity = 0 WHERE id = 29294887;
UPDATE products SET quantity = 0 WHERE id = 29294888;
UPDATE products SET quantity = 0 WHERE id = 29294889;
UPDATE products SET quantity = 0 WHERE id = 29294890;
UPDATE products SET quantity = 0 WHERE id = 29294891;
UPDATE products SET quantity = 0 WHERE id = 29294892;
UPDATE products SET quantity = 0 WHERE id = 29294895;
UPDATE products SET quantity = 0 WHERE id = 29294897;
UPDATE products SET quantity = 0 WHERE id = 29294898;
UPDATE products SET quantity = 0 WHERE id = 29294899;
UPDATE products SET quantity = 0 WHERE id = 29294901;
UPDATE products SET quantity = 0 WHERE id = 29294903;
UPDATE products SET quantity = 0 WHERE id = 29294912;
UPDATE products SET quantity = 0 WHERE id = 29294916;
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks @Paul (+1) - is there any way to specify output to the update_products.sql with your sed example?
Simple output redirection - c.f.
1

Don't need to use %ID% - the ids.txt only needs to be prefixed with SQL like so, writing the output to product_updates.sql output file:

awk -F, '{printf "%s (%s)\n", "UPDATE products SET quantity = 0 WHERE id IN ", $0}' ids.txt > product_updates.sql

4 Comments

Thanks @suspectus (+1) -- is there any way to specify the output gets sent to an "output file", such as product_updates.sql? Thanks again!
@hotmeatballsoup Yes that's easy to do - will update post
This is interesting because it makes an update statement per line of the CSV. Would it be more efficient to make just one big one, or does this add efficiency at scale by breaking it into manageable chunks?
Yes, it is more efficient to minimise the number of SQL statements and include many columns in an IN clause list.
0

I propose to be safe rather than sorry.

May be deemed pedantic, but working with business database is serious matter.

So here it is based on @Paul Hodges's answer

#!/usr/bin/env bash

{
  # Use the prepared statements `zeroproduct` 
  # to protect against SQL injections
  printf 'PREPARE zeroproduct FROM '\''%s'\'';\n' \
    'UPDATE products SET quantity = 0 WHERE id = ?'

  # Work inside a transaction, so if something goes wrong,
  # like the sql file is incomplete, it can be rolled-back.
  printf 'START TRANSACTION;\n'

  while IFS=, read -r -a ids; do
    for id in "${ids[@]}"; do
      # Set the value of the @id argument in SQL
      # And execute the SQL statement with the @id argument
      # that will replace the '?'
      printf 'SET @id='\''%8d'\''; EXECUTE zeroproduct USING @id;\n' \
        "$((id))" # Ensure id is an integer
    done
  done <ids.csv

  # Now commit all these changes since we are finally here
  printf 'COMMIT;\n'

  # Deallocate the prepared statement once we are done
  printf 'DEALLOCATE PREPARE zeroproduct;\n'

} >update_products.sql

# Good to have if this is transmitted remotely
sha512sum update_products.sql >update_products.sql.sha512sum

# can later check with:
sha512sum -c update_products.sql.sha512sum

From the provided sample csv, here is the content of update_products.sql:

PREPARE zeroproduct FROM 'UPDATE products SET quantity = 0 WHERE id = ?';
START TRANSACTION;
SET @id='29294848'; EXECUTE zeroproduct USING @id;
SET @id='29294849'; EXECUTE zeroproduct USING @id;
SET @id='29294850'; EXECUTE zeroproduct USING @id;
SET @id='29294851'; EXECUTE zeroproduct USING @id;
SET @id='29294853'; EXECUTE zeroproduct USING @id;
SET @id='29294857'; EXECUTE zeroproduct USING @id;
SET @id='29294858'; EXECUTE zeroproduct USING @id;
SET @id='29294860'; EXECUTE zeroproduct USING @id;
SET @id='29294861'; EXECUTE zeroproduct USING @id;
SET @id='29294863'; EXECUTE zeroproduct USING @id;
SET @id='29294887'; EXECUTE zeroproduct USING @id;
SET @id='29294888'; EXECUTE zeroproduct USING @id;
SET @id='29294889'; EXECUTE zeroproduct USING @id;
SET @id='29294890'; EXECUTE zeroproduct USING @id;
SET @id='29294891'; EXECUTE zeroproduct USING @id;
SET @id='29294892'; EXECUTE zeroproduct USING @id;
SET @id='29294895'; EXECUTE zeroproduct USING @id;
SET @id='29294897'; EXECUTE zeroproduct USING @id;
SET @id='29294898'; EXECUTE zeroproduct USING @id;
SET @id='29294899'; EXECUTE zeroproduct USING @id;
SET @id='29294901'; EXECUTE zeroproduct USING @id;
SET @id='29294903'; EXECUTE zeroproduct USING @id;
SET @id='29294912'; EXECUTE zeroproduct USING @id;
SET @id='29294916'; EXECUTE zeroproduct USING @id;
COMMIT;
DEALLOCATE PREPARE zeroproduct;

Comments

0

In addition to the answer by @suspectus which provides a nice use of printf to output each line wanted, a slightly more procedural use of awk incorporating a for loop over the fields would be:

awk -F, '{
    for (i=1;i<=NF;i++)
        print "UPDATE products SET quantity = 0 WHERE id = " $i ";"
}' file.csv

Where the single rule simply loops over each of the comma-separated fields using string-concatenation to form the desired output. In detail the awk command:

  • awk -F, sets the field-separator (FS) equal to a comma to split the input,
  • for (i=1;i<=NF;i++) simply loops over each field, and
  • print "UPDATE products SET quantity = 0 WHERE id = " $i ";" outputs the wanted text incorporating the field within using string-concatenation.

Example Use/Output

With your data in file.csv (presumed to be a single line, but it really doesn't matter) your output would be:

$ awk -F, '{
>     for (i=1;i<=NF;i++)
>         print "UPDATE products SET quantity = 0 WHERE id = " $i ";"
> }' file.csv
UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
UPDATE products SET quantity = 0 WHERE id = 29294851;
UPDATE products SET quantity = 0 WHERE id = 29294853;
UPDATE products SET quantity = 0 WHERE id = 29294857;
UPDATE products SET quantity = 0 WHERE id = 29294858;
UPDATE products SET quantity = 0 WHERE id = 29294860;
UPDATE products SET quantity = 0 WHERE id = 29294861;
UPDATE products SET quantity = 0 WHERE id = 29294863;
UPDATE products SET quantity = 0 WHERE id = 29294887;
UPDATE products SET quantity = 0 WHERE id = 29294888;
UPDATE products SET quantity = 0 WHERE id = 29294889;
UPDATE products SET quantity = 0 WHERE id = 29294890;
UPDATE products SET quantity = 0 WHERE id = 29294891;
UPDATE products SET quantity = 0 WHERE id = 29294892;
UPDATE products SET quantity = 0 WHERE id = 29294895;
UPDATE products SET quantity = 0 WHERE id = 29294897;
UPDATE products SET quantity = 0 WHERE id = 29294898;
UPDATE products SET quantity = 0 WHERE id = 29294899;
UPDATE products SET quantity = 0 WHERE id = 29294901;
UPDATE products SET quantity = 0 WHERE id = 29294903;
UPDATE products SET quantity = 0 WHERE id = 29294912;
UPDATE products SET quantity = 0 WHERE id = 29294916;

Look things over and let me know if you have further questions.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.