Marcus Müller
  1. never pipe find into xargs unless you're using -print0 with find and -0 with xargs; otherwise, file names containing spaces or newlines will break your pipeline.
  2. not a use case for cat "$1" | …, but simply for … < "$1". Could be written more compactly, but as you like.
  3. no, cat won't "mix up" anything, since there's not one cat process but 4, each with a completely isolated argument list and its own output.
  4. yes, it really runs in parallel.
  5. your db client of course runs multiple times in parallel as well, but the whole point of a database system is that it keeps things consistent; so unless you misdesigned your INSERT statement to be multiple statements instead of one atomic one, this is safe.
  6. Parallelism doesn't help at all here: neither your CSV file reads nor your database writes are bounded by CPU, but by inherently serialized I/O underneath, so you're not really solving a problem. Since insertion is a write operation and must be synchronized with other concurrent writes, parallelization probably makes things slower, not faster, unless you know for sure that your database allows sharded writes and that the bandwidth into your database server is wider than your storage read bandwidth; considering your question, this is unlikely.
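A minimal sketch of points 1 and 4 together, assuming GNU (or BSD) xargs for -P; wc -c here is just a stand-in for whatever per-file import command you actually run:

```shell
# Create files whose names contain spaces, to show -print0/-0 handles them.
tmp=$(mktemp -d)
touch "$tmp/a.csv" "$tmp/b c.csv" "$tmp/d  e.csv"

# -print0 emits NUL-delimited names, -0 consumes them; -n 1 hands each child
# exactly one file, and -P 4 runs up to four children in parallel.
# `wc -c` is a stand-in for the real per-file import command.
n=$(find "$tmp" -name '*.csv' -print0 \
      | xargs -0 -n 1 -P 4 sh -c 'wc -c < "$1"' sh \
      | wc -l)
echo "$n files processed"   # all three names arrive intact

rm -rf "$tmp"
```

With newline-delimited output (plain -print piped to plain xargs), the name with spaces would be split into separate words.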
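For point 2, a sketch of why the cat is redundant, with wc -l standing in for the hypothetical database-import command:

```shell
f=$(mktemp)
printf '1,foo\n2,bar\n' > "$f"

# Useless use of cat: a whole extra process just to copy the file to stdin.
lines_cat=$(cat "$f" | wc -l)

# Plain redirection gives the command the same stdin, minus the extra process.
lines_redir=$(wc -l < "$f")

rm -f "$f"
```

Both forms feed the command identical input; the redirection form just skips spawning cat.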