
After a power failure, I have to clean up jobs that are run from text files. I have a text file containing strings (they are UUIDs), one per line:

out_file.txt (~300k entries)

<some_uuidX>
<some_uuidY>
<some_uuidZ>
...

and a csv like so:

in_file.csv (~500k entries)

/path/to/some/location1/,<some_uuidK>.json.<some_string1>
/path/to/some/location2/,<some_uuidJ>.json.<some_string2>
/path/to/some/location3/,<some_uuidX>.json.<some_string3>
/path/to/some/location4/,<some_uuidY>.json.<some_string4>
/path/to/some/location5/,<some_uuidN>.json.<some_string5>
/path/to/some/location6/,<some_uuidZ>.json.<some_string6>
...

I would like to remove the lines from in_file.csv whose UUID appears in out_file.txt. The end result:

/path/to/some/location1/,<some_uuidK>.json.<some_string1>
/path/to/some/location2/,<some_uuidJ>.json.<some_string2>
/path/to/some/location5/,<some_uuidN>.json.<some_string5>
...

Since the files are fairly large, I was wondering if there is an efficient way to do this in bash.

Any tips would be great.

Comments

  • Please add to your question (no comment): What have you searched for, and what did you find? What have you tried, and how did it fail? Commented Mar 24, 2022 at 22:41
  • Does it need to be faster than grep -vFwf out_file.txt in_file.csv? Perhaps awk -F"[,.]" 'FNR==NR { a[$1]; next } !($2 in a)' out_file.txt in_file.csv? Commented Mar 24, 2022 at 22:44
  • @jared_mamrot OMG! how elegant. Please put this as an answer. Thank you :) Commented Mar 24, 2022 at 23:42
  • @jared_mamrot just to clarify, I just did as per your suggestion grep -vFwf out_file.txt in_file.csv >out.csv and the numbers add up correctly :) Commented Mar 24, 2022 at 23:44

1 Answer

Here is a potential grep solution:

grep -vFwf out_file.txt in_file.csv

And a potential awk solution (likely faster):

awk -F"[,.]" 'FNR==NR { a[$1]; next } !($2 in a)' out_file.txt in_file.csv
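As a rough check, both commands can be run against a tiny stand-in for the two files. This is only a sketch: the short strings such as uuidX and s1 are placeholders for the real UUIDs and suffixes, and the temporary directory is an assumption made so that nothing touches the real data.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the real files are never touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Placeholder "UUIDs" to filter out, one per line.
printf '%s\n' uuidX uuidY uuidZ > out_file.txt

# Placeholder CSV in the same shape as in_file.csv.
cat > in_file.csv <<'EOF'
/path/to/some/location1/,uuidK.json.s1
/path/to/some/location2/,uuidJ.json.s2
/path/to/some/location3/,uuidX.json.s3
/path/to/some/location4/,uuidY.json.s4
/path/to/some/location5/,uuidN.json.s5
/path/to/some/location6/,uuidZ.json.s6
EOF

# grep: -f reads patterns from out_file.txt, -F treats them as fixed strings,
# -w requires whole-word matches, -v keeps only the lines that do NOT match.
grep_result=$(grep -vFwf out_file.txt in_file.csv)

# awk: remember every line of the first file, then print lines of the
# second file whose second field (split on "," or ".") was not remembered.
awk_result=$(awk -F"[,.]" 'FNR==NR { a[$1]; next } !($2 in a)' out_file.txt in_file.csv)

printf '%s\n' "$awk_result"
```

On this sample both commands should keep only the uuidK, uuidJ and uuidN lines, which matches the end result shown in the question.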

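NB there are caveats to each of these approaches: grep -Fw matches a UUID anywhere on the line, including inside the path, and the awk field separator splits on every comma and dot, so a path containing either character moves the UUID out of $2. Both appear to be suitable for your intended purpose (as indicated by your comment "the numbers add up correctly"), but posting a minimal, reproducible example in future questions is the best way to help us help you.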
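To illustrate one such caveat, here is a sketch of how the awk one-liner can miss a match when the path contains a dot, along with a more defensive variant. The dotted path (v1.2) and the placeholder names are hypothetical, and the variant assumes the file name column never contains a comma and that the UUID is everything before its first dot.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' uuidX > out_file.txt
cat > in_file.csv <<'EOF'
/path/to/v1.2/location1/,uuidX.json.s1
/path/to/v1.2/location2/,uuidK.json.s2
EOF

# Original one-liner: the dot in "v1.2" shifts the fields, so $2 is
# "2/location1/" rather than the UUID and the uuidX line is kept.
naive=$(awk -F"[,.]" 'FNR==NR { a[$1]; next } !($2 in a)' out_file.txt in_file.csv)

# Defensive variant: extract the UUID explicitly instead of relying on
# field positions.
defensive=$(awk '
  FNR==NR { seen[$0]; next }   # first file: remember each UUID
  {
    name = $0
    sub(/.*,/, "", name)       # drop everything up to the last comma
    sub(/\..*/, "", name)      # drop the first dot and all that follows
  }
  !(name in seen)              # print lines whose UUID was not remembered
' out_file.txt in_file.csv)

printf 'naive:\n%s\n' "$naive"
printf 'defensive:\n%s\n' "$defensive"
```

On this sample the original command keeps both lines, while the variant keeps only the uuidK line. If the real paths never contain dots or commas, the shorter one-liner is likely fine as it is.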
