Looking for a suggestion that would be much faster. I have a large (232 GB) MongoDB export file and want to pull out only the April 24th lines (or any single date of my choosing) into a new file. The grep statement below works when fed from cat, but it takes a long time, about 1.5 hours; I pipe through cat because I run other commands ahead of the grep. The file contains one log entry per line, so grepping for the specific date string works. The command runs on a 56-core machine with 500 GB of RAM, but old spinning disks. Sadly, I don't have access to the daily backups for a single day once the monthly file is built. Can anyone suggest a better way to accomplish this?
cat /mnt/backup/mongoexport/logs_2025-04.json | grep -E '.*"entrytimestamp":{"\$date":"2025-02-24' >> /tmp/logs_2025-04-24.json
Sample input file (one JSON log entry per line):
{"_id":{"$oid":"1"},"SEID":"bf2abd4c","entrytimestamp":{"$date":"2025-01-05T00:00:00.000Z"}}
{"_id":{"$oid":"2"},"SEID":"bf2abd4c","entrytimestamp":{"$date":"2025-01-07T00:00:00.000Z"}}
{"_id":{"$oid":"3"},"SEID":"bf2abd4c","entrytimestamp":{"$date":"2025-01-27T00:00:00.000Z"}}
{"_id":{"$oid":"4"},"SEID":"613200b325f2","entrytimestamp":{"$date":"2025-02-24T00:00:00.000Z"}}
{"_id":{"$oid":"5"},"SEID":"613200b325f2","entrytimestamp":{"$date":"2025-02-24T00:00:00.000Z"}}
{"_id":{"$oid":"6"},"SEID":"83ba","entrytimestamp":{"$date":"2025-03-06T00:00:00.000Z"}}
{"_id":{"$oid":"7"},"SEID":"83ba","entrytimestamp":{"$date":"2025-03-08T00:00:00.000Z"}}
{"_id":{"$oid":"8"},"SEID":"83ba","entrytimestamp":{"$date":"2025-03-29T00:00:00.000Z"}}
{"_id":{"$oid":"9"},"SEID":"2302","entrytimestamp":{"$date":"2025-05-07T00:00:00.000Z"}}
{"_id":{"$oid":"10"},"SEID":"2302","entrytimestamp":{"$date":"2025-05-07T00:00:00.000Z"}}
Expected output file:
{"_id":{"$oid":"4"},"SEID":"613200b325f2","entrytimestamp":{"$date":"2025-02-24T00:00:00.000Z"}}
{"_id":{"$oid":"5"},"SEID":"613200b325f2","entrytimestamp":{"$date":"2025-02-24T00:00:00.000Z"}}
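Since the match is a literal substring on each line, one likely speedup is a fixed-string, C-locale grep reading the file directly, with no cat in front. This is a sketch using the paths and date from the question; whether it helps much on spinning disks depends on whether the job is CPU-bound or I/O-bound:

```shell
# -F treats the pattern as a fixed string (no regex engine),
# LC_ALL=C disables multibyte character handling, and reading the
# file directly avoids the extra cat process and pipe copy.
# Note: with a fixed-string pattern in single quotes, the $ needs
# no backslash escape.
LC_ALL=C grep -F '"entrytimestamp":{"$date":"2025-02-24' \
    /mnt/backup/mongoexport/logs_2025-04.json \
    > /tmp/logs_2025-04-24.json
```

If the regex engine was the bottleneck, this can be dramatically faster; if the disks are the bottleneck, it will make little difference.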
Comments:
"takes a long time" ... 5 mins? 30 mins? 3 hrs? 4 days?
Are you running grep on the host where the data file resides?
You could use awk (or perl, python, etc.) to halt processing once you're 'past' the date of interest; obviously the biggest time savings would come with dates early in the month; for dates later in the month you could try tac file | awk '{scan_for_24th; exit_on_seeing_23rd}' | tac
Why cat and grep? Simply use grep '"entrytimestamp":{"\$date":"2025-02-24' /mnt/backup/mongoexport/logs_2025-04.json
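The early-exit idea from the comments could be sketched in awk as below. It assumes the export is ordered by entrytimestamp (which the comment also presumes), so that all lines for the target date are contiguous; once the block of matches has passed, awk stops reading instead of scanning the rest of the 232 GB:

```shell
# Early-exit scan (ASSUMES lines are sorted by entrytimestamp):
# print every line containing the target date, then exit on the
# first non-matching line after the matches.
awk -v pat='"entrytimestamp":{"$date":"2025-02-24' '
    index($0, pat) { print; found = 1; next }  # in the target block
    found          { exit }                    # block ended: stop reading
' /mnt/backup/mongoexport/logs_2025-04.json > /tmp/logs_2025-04-24.json
```

For a date early in the month this can skip most of the file; for a late date, the reversed variant with tac suggested in the comments applies the same trick from the other end. If the export is not actually in timestamp order, this approach would silently miss lines and a full-file grep is the safe choice.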