Due to a power failure, I have to clean up jobs that are driven by text files. The problem: I have a text file with strings like so (they are uuids):
out_file.txt (~300k entries)
<some_uuidX>
<some_uuidY>
<some_uuidZ>
...
and a csv like so:
in_file.csv (~500k entries)
/path/to/some/location1/,<some_uuidK>.json.<some_string1>
/path/to/some/location2/,<some_uuidJ>.json.<some_string2>
/path/to/some/location3/,<some_uuidX>.json.<some_string3>
/path/to/some/location4/,<some_uuidY>.json.<some_string4>
/path/to/some/location5/,<some_uuidN>.json.<some_string5>
/path/to/some/location6/,<some_uuidZ>.json.<some_string6>
...
I would like to remove lines from in_file.csv whose uuid matches an entry in out_file.txt. The end result:
/path/to/some/location1/,<some_uuidK>.json.<some_string1>
/path/to/some/location2/,<some_uuidJ>.json.<some_string2>
/path/to/some/location5/,<some_uuidN>.json.<some_string5>
...
Since the files are fairly large, I was wondering if there is an efficient way to do this in bash.
Any tips would be great.
grep -vFwf out_file.txt in_file.csv?

Perhaps awk -F'[,.]' 'FNR==NR { a[$1]; next } !($2 in a)' out_file.txt in_file.csv?

grep -vFwf out_file.txt in_file.csv > out.csv worked, and the line counts add up correctly :)
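For reference, a self-contained sketch of the awk approach on toy data (the uuids and suffixes below are invented placeholders standing in for the real ~300k/~500k-line files):

```shell
# Toy inputs mirroring the question's format (uuids are placeholders).
cat > out_file.txt <<'EOF'
uuidX
uuidY
uuidZ
EOF

cat > in_file.csv <<'EOF'
/path/to/some/location1/,uuidK.json.s1
/path/to/some/location3/,uuidX.json.s3
/path/to/some/location4/,uuidY.json.s4
/path/to/some/location5/,uuidN.json.s5
/path/to/some/location6/,uuidZ.json.s6
EOF

# Split each line on ',' and '.': in the csv, $2 is the uuid.
# Pass 1 (FNR==NR, reading out_file.txt): collect uuids to drop in array a.
# Pass 2 (reading in_file.csv): print only lines whose uuid is not in a.
awk -F'[,.]' 'FNR==NR { a[$1]; next } !($2 in a)' out_file.txt in_file.csv > out.csv

cat out.csv
```

Both the awk and grep variants make a single pass over each file with a constant-time hash lookup per line, so ~800k total lines is cheap. One caveat: the `-F'[,.]'` separator assumes the path portion contains no dots; if it might, `grep -vFwf` is the safer choice since it matches the uuid anywhere on the line as a whole word.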