Marcus Müller
  1. never pipe find into xargs unless you're using -print0 with find and -0 with xargs; otherwise, file names containing spaces or newlines will break your pipeline.
  2. not a use case for cat "$1" | …, but simply for … < "$1". Could be written more compactly, but as you like.
  3. no, cat won't "mix up" anything, since there's not one cat process but 4, each with a completely isolated argument list and its own output.
  4. yes, it really runs in parallel.
  5. your db client of course runs multiple times in parallel as well, but the whole point of a database system is that it keeps things consistent; so unless you misdesigned your INSERT statement to be multiple statements instead of one atomic one, this is safe.
  6. Parallelism doesn't help at all here: neither your CSV file reads nor your database writes are bounded by CPU, but by inherently serialized I/O underneath, so you're not really solving a problem. Since insertion is a write operation and must be synchronized with other concurrent writes, parallelization probably makes things slower, not faster, unless you know for sure that your database allows sharded writes and that the bandwidth into your database server is wider than your storage read bandwidth; considering your question, this is unlikely.
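A minimal sketch of points 1 and 4 together, assuming GNU (or BSD) xargs for -P; wc -c here is just a stand-in for whatever per-file import command you actually run:

```shell
# Create files whose names contain spaces, to show -print0/-0 handles them.
tmp=$(mktemp -d)
touch "$tmp/a.csv" "$tmp/b c.csv" "$tmp/d  e.csv"

# -print0 emits NUL-delimited names, -0 consumes them; -n 1 hands each child
# exactly one file, and -P 4 runs up to four children in parallel.
# `wc -c` is a stand-in for the real per-file import command.
n=$(find "$tmp" -name '*.csv' -print0 \
      | xargs -0 -n 1 -P 4 sh -c 'wc -c < "$1"' sh \
      | wc -l)
echo "$n files processed"   # all three names arrive intact

rm -rf "$tmp"
```

With newline-delimited output (plain -print piped to plain xargs), the name with spaces would be split into separate words.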
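For point 2, a sketch of why the cat is redundant, with wc -l standing in for the hypothetical database-import command:

```shell
f=$(mktemp)
printf '1,foo\n2,bar\n' > "$f"

# Useless use of cat: a whole extra process just to copy the file to stdin.
lines_cat=$(cat "$f" | wc -l)

# Plain redirection gives the command the same stdin, minus the extra process.
lines_redir=$(wc -l < "$f")

rm -f "$f"
```

Both forms feed the command identical input; the redirection form just skips spawning cat.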