I have a large number of text files, each 300k+ lines long.

The files are in this general format:

Username <user> filename <file>
<some large amount of text on one line>
...

Each file follows this strict format: one line of formatted header text, followed by one really long line, which is the meat and potatoes of the file.

What I want to do is go through the file and, for every set of lines (a set consisting of the header and the one long line), look for some matching string within the long line.

If the string is there, I want to print the user and file. If not, we continue on and don't print anything. For those who will ask, the point of this exercise is just to print this out; I will do some manipulation at a later point.

I know how to do this, but it is sort of brute force: store the user and file when you detect them, and if the matching string is detected, print the user and file. If not, just continue. However, this is extremely inefficient:

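#!/bin/bash
## not exact, just roughly what I am doing
## ([[ ]] and =~ are bash features, so this needs bash rather than sh)
header_re='Username ([^ ]+) filename ([^ ]+)'
while IFS= read -r line; do
    if [[ $line =~ $header_re ]]; then
        # store our variables
        user=${BASH_REMATCH[1]}
        file=${BASH_REMATCH[2]}
        continue
    fi
    if [[ $line =~ "string" ]]; then
        # print user and file
        printf '%s, %s\n' "$user" "$file"
    fi
done < inputfile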

Basically, is there some efficient way to detect the string I am looking for, then look back x lines (x being the number of header lines) and pull out the info I need? Thanks.

PS: I'm not set on doing this in bash; Perl works too.

EDIT: DESIRED OUTPUT

 <user>, <file>
 <user>, <file>
 ...
  • Is there a fixed number of <more header text> lines between the Username line and the line you want to match? Can you also include some example data of what to match and what not to match? Commented Nov 6, 2014 at 23:10
  • I made a small edit. Let's just assume that there is only one header line, and the matching string really doesn't matter; it's just important to know that it matches some $string. Commented Nov 6, 2014 at 23:13
  • @user3979986: That's very hazy! You want to print the user and file fields if the line immediately following matches any $string. Meaning any random string anywhere? How odd. Commented Nov 7, 2014 at 0:13
  • The way to explain this is to give a short example of input which includes some user and file values that are printed and some that are not. Even a complete working version of your shell script would help enormously. Commented Nov 7, 2014 at 0:14
  • Any time you write a loop in shell just to transform text you have the wrong approach. Show some actual sample input, not just a description of your input format, along with the desired output given that input. And your sample input line does NOT need to be "really long" to demonstrate your problem. Commented Nov 7, 2014 at 3:41

2 Answers

For really heavy text processing like this, perl is a good choice:

perl -nE '
  if ($. % 2 == 1) {
    ($user, $file) = (split ' ')[1,3];
  } 
  elsif (/search string/) {
    say "$user, $file";
  }
' file1 file2 ...

That can be "golfed" down to a more terse one-liner, if you like that kind of thing.


Awk solution, relying on each record being two lines (and the first line of the file being the header for the first record):

NR % 2 { name = $2; file = $4; next }   # odd line: header, remember user and file
/string/ { print name ", " file }       # even line: print "<user>, <file>" on a match
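
As a rough illustration of how a two-line-record awk script like the one above might be run end to end, here is a minimal sketch. The file name `sample.txt`, its contents, the search term `needle`, and the helper function `find_matches` are all hypothetical and not from the question; the sketch also assumes the search term is passed as an awk regular expression.

```shell
#!/bin/sh
# Hypothetical sample data: header lines alternate with long text lines.
cat > sample.txt <<'EOF'
Username alice filename a.txt
some long text containing needle somewhere
Username bob filename b.txt
some long text without the search term
Username carol filename c.txt
another needle in another long line
EOF

# find_matches PATTERN FILE (hypothetical helper name)
# Odd lines are headers: remember fields 2 and 4, then skip to the next line.
# Even lines are the long text: print "<user>, <file>" when PATTERN matches.
find_matches() {
    awk -v pat="$1" '
        NR % 2   { name = $2; file = $4; next }
        $0 ~ pat { print name ", " file }
    ' "$2"
}

find_matches needle sample.txt
```

With the sample data above this prints `alice, a.txt` and `carol, c.txt`. If several files are passed to awk in one invocation, using `FNR` (the per-file line counter) in place of `NR` keeps the odd/even test aligned even when a file has an odd number of lines.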
