Efficient way to parse txt file in bash/perl

Question

I have a large number of text files, each 300k+ lines long.

The files are in this general format:

Username <user> filename <file>
<some large amount of text on one line>
...

The text file has this strict format: one line of formatted header text, followed by one really long line, which is the meat and potatoes of the file.

What I want to do is go through the file and, for every set of lines (a set consisting of the header and the one long line), look for some matching string within the long line.

If the string is there, then I want to print user and file. If not, we continue on and print nothing. And for those who will ask, the point of this exercise is just to print this out; I will do some manipulation at a later point.

I know how to do this, but it is sort of brute force: store the user and file when you detect them, and if the matching string is detected, print user and file. If not, just continue. However, this is extremely inefficient:

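#!/bin/bash
## not exact, just roughly what I am doing
## ([[ ... =~ ... ]] is a bash feature, so this needs bash rather than sh)
re='Username ([^ ]+) filename ([^ ]+)'
while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        # store our variables
        user=${BASH_REMATCH[1]}
        file=${BASH_REMATCH[2]}
        continue
    fi
    if [[ $line =~ "string" ]]; then
        # print user and file
        printf '%s, %s\n' "$user" "$file"
    fi
done < inputfile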

Basically, is there some efficient way to detect the string I am looking for, then look back x number of lines (x corresponding to number of header lines) and then pull out the info I need? Thanks

PS: I'm not set on doing this in bash; perl works too.

EDIT: DESIRED OUTPUT

 <user>, <file>
 <user>, <file>
 ...

Are there a fixed number of <more header text> lines between the Username line and the line you want to match? Can you also include some example data of what to match and what to not match?
– Mr. Llama, Commented Nov 6, 2014 at 23:10
I made a small edit: let's just assume that there is only one header line, and the matching string really doesn't matter... just important to know that it matches some $string
– user3979986, Commented Nov 6, 2014 at 23:13
@user3979986: That's very hazy! You want to print the user and file fields if the line immediately following matches any $string. Meaning any random string anywhere? How odd.
– Borodin, Commented Nov 7, 2014 at 0:13
The way to explain this is to give a short example of input which includes some user and file values that are printed and some that are not. Even a complete working version of your shell script would help enormously.
– Borodin, Commented Nov 7, 2014 at 0:14
Any time you write a loop in shell just to transform text you have the wrong approach. Show some actual sample input, not just a description of your input format, along with the desired output given that input. And your sample input line does NOT need to be "really long" to demonstrate your problem.
– Ed Morton, Commented Nov 7, 2014 at 3:41

glenn jackman · Accepted Answer · 2014-11-07 01:39:25Z

1

For really heavy text processing like this, perl is a good choice:

perl -nE '
  if ($. % 2 == 1) {
    ($user, $file) = (split ' ')[1,3];
  } 
  elsif (/search string/) {
    say "$user, $file";
  }
' file1 file2 ...

That can be "golfed" down to a more terse one-liner, if you like that kind of thing.


Thomas Foster · Accepted Answer · 2014-11-07 01:46:03Z

1

Awk solution, relying on each record being two lines (and the first line of the file being the header for the first record):

NR%2 { name = $2; file = $4; next }   # odd line: header, remember user and file
/string/ { print name ", " file }     # even line: print "<user>, <file>" on a match
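
A minimal sketch of how the two-line-record awk approach above might be wrapped for reuse from the shell. The function name print_matches and the awk variable pat are hypothetical and not part of the answer. The sketch uses FNR rather than NR so that the odd/even count restarts for each input file, and it joins the fields with ", " to match the desired output in the question. It assumes every record is exactly two lines, and that the search string is meant as an awk regular expression.

```shell
#!/bin/bash
# print_matches PATTERN FILE...
# Hypothetical helper: prints "<user>, <file>" for every two-line record
# whose second line matches PATTERN (treated as an awk regular expression).
# Assumes each record is a header line of the form
#   Username <user> filename <file>
# followed by exactly one line of text.
print_matches() {
    local pattern=$1
    shift
    awk -v pat="$pattern" '
        FNR % 2  { user = $2; file = $4; next }  # odd line: remember header fields
        $0 ~ pat { print user ", " file }        # even line: print on a match
    ' "$@"
}
```

It would be called as print_matches 'string' file1 file2 ... and reads each file once, so the cost is a single pass per file with no look-back needed.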
