awk inline command and full script have different output

Question

I want to count the number of leading spaces at the beginning of each line. My sample text file is the following:

aaaa bbbb cccc dddd
  aaaa bbbb cccc dddd
    aaaa bbbb cccc dddd
aaaa bbbb cccc dddd

Now, when I write a simple script to count them, I notice a difference between the output of the inline awk command and that of the full awk script.

First try

#!/bin/bash
while IFS= read -r line; do
    echo "$line" | awk '
        {
            FS="[^ ]"
            print length($1)
        }
    '
done < "tmp"

The output is

4
4
4
4

Second try

#!/bin/bash
while IFS= read -r line; do
    echo "$line" | awk -F "[^ ]" '{print length($1)}'
done < "tmp"

The output is

0
2
4
0

I want to write a full script that gives the same output as the inline command.
Could anyone explain this difference to me? Thank you very much.

Hint: Try awk 'BEGIN { FS="[^ ]" } { print length($1) }' in your first one.
– Shawn, Commented Oct 22, 2020 at 4:03
@Shawn Thank you. But how can I change FS later in my script?
– quyleanh, Commented Oct 22, 2020 at 4:06
@rowboat By "inline type output" I mean the output of the second try. I just noticed that the shell loop is slow, so I want to improve it by changing the pipeline subprocess. Is there any workaround? Should I use Perl?
– quyleanh, Commented Oct 22, 2020 at 4:48

James Brown · Accepted Answer · 2020-10-22 05:03:21Z

3

Fixed your first try:

$ while IFS= read -r line; do
    echo "$line" | awk '
                   BEGIN {              # you forgot the BEGIN
                       FS="[^ ]"        # gotta set FS before record is read
                   }
                   {
                       print length($1)
                   }' 
  done < file

Output now:

0
2
4
0
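
To see the timing that the "set FS before record is read" comment refers to, here is a small sketch that feeds several lines to a single awk process while still setting FS inside the main rule. It assumes a POSIX-conforming awk (gawk and mawk behave this way), where a new FS only applies to records read after the assignment. The function name fs_in_main_rule and the inline sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sets FS inside the main rule, as in the first try,
# but keeps one awk process alive for all input lines.
fs_in_main_rule() {
    awk '{
        FS = "[^ ]"        # only affects records read AFTER this assignment
        print length($1)   # the current record was split with the previous FS
    }'
}

# Illustrative sample: lines with 0, 2 and 4 leading spaces.
printf '%s\n' 'aaaa bbbb' '  aaaa bbbb' '    aaaa bbbb' | fs_in_main_rule
```

With gawk or mawk this should print 4 for the first line (split with the default whitespace FS, so $1 is "aaaa") and then 2 and 4 for the following lines. In the original loop every line gets a fresh awk process, so every line is a "first record", which is why the first try never sees the leading spaces.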

And to speed it up, just use awk for it:

$ awk '
BEGIN {
    FS="[^ ]"
}
{
    print length($1)
}' file
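
On the follow-up question in the comments about changing FS later in a script: one common approach, sketched here under the assumption of a POSIX awk, is to assign FS and then assign $0 to itself, which makes awk split the current record again with the new FS. The function name count_then_words and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the leading spaces with FS="[^ ]", then switch
# to whitespace splitting for the same record to reach the second word.
count_then_words() {
    awk '
    BEGIN { FS = "[^ ]" }      # leading spaces end up in $1
    {
        indent = length($1)
        FS = " "               # a new FS is only used for later splits...
        $0 = $0                # ...so force a re-split of the current record
        print indent, $2
        FS = "[^ ]"            # restore FS before the next record is read
    }'
}

printf '%s\n' 'aaaa bbbb' '  cccc dddd' | count_then_words
```

The `$0 = $0` line is what makes the change visible immediately; without it, $2 would still come from the split done with the old FS.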


RavinderSingh13 · Accepted Answer · 2020-10-22 04:44:02Z

3

Could you please try the following, which does not change FS? Written and tested at https://ideone.com/N8QcC8

awk '{if(match($0,/^ +/)){print RSTART+RLENGTH-1} else{print 0}}' Input_file

OR try:

awk '{match($0,/^ */); print RLENGTH}' Input_file

Output will be:

0
2
4
0

Explanation: the first solution uses a simple if/else. In the if part, awk's match function is given a regex that matches the leading spaces of the line, and the script then prints RSTART+RLENGTH-1, which is the number of leading spaces. This works because RSTART and RLENGTH are built-in awk variables that match sets on every call: RSTART is the position where the match starts and RLENGTH is its length.

The second solution, following rowboat's suggestion, simply prints RLENGTH. Because /^ */ also matches zero spaces, it prints 0 for lines without leading spaces and needs no if/else.
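
To make the role of RSTART and RLENGTH more concrete, the sketch below prints both variables for each line with the two patterns used above. The function name show_match_vars and the inline sample lines are illustrative assumptions; the behaviour relied on (RSTART is 0 and RLENGTH is -1 when nothing matches) is what POSIX specifies for match.

```shell
#!/bin/sh
# Hypothetical helper: show RSTART,RLENGTH for /^ +/ and for /^ */.
show_match_vars() {
    awk '{
        match($0, /^ +/)             # needs at least one leading space
        plus = RSTART "," RLENGTH    # no match gives RSTART=0, RLENGTH=-1
        match($0, /^ */)             # also matches the empty string
        star = RSTART "," RLENGTH    # an empty match gives RLENGTH=0
        print plus, star
    }'
}

printf '%s\n' 'aaaa bbbb' '  aaaa bbbb' | show_match_vars
```

The first column is the /^ +/ result and the second the /^ */ result. The -1 in the no-match case is why the first command needs its else branch, while the second can print RLENGTH directly.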

edited Oct 22, 2020 at 4:44

answered Oct 22, 2020 at 4:31


3 Comments

quyleanh Over a year ago

Thanks. It's fine now. But could you add a more detailed explanation of your script?

RavinderSingh13 Over a year ago

@rowboat, thank you for letting me know. I have now added that solution too. Cheers.

RavinderSingh13 Over a year ago

@quyleanh, please check now; a detailed explanation has been added. Let me know if you have any questions.

stack0114106 · Accepted Answer · 2020-10-24 04:53:30Z

0

You can try Perl. Simply capture the leading spaces in a group and print its length. "a"=~/a/ is just to reset the regex captures at the end of each line.

perl -nle ' /(^\s+)/; print length($1)+0; "a"=~/a/ '  count_space.txt
0
2
4
0

