1

I want to count the number of leading spaces at the beginning of each line. My sample text file is as follows:

aaaa bbbb cccc dddd
  aaaa bbbb cccc dddd
    aaaa bbbb cccc dddd
aaaa bbbb cccc dddd

When I wrote a simple script to count them, I noticed a difference between the output of the inline awk command and the output of the full awk script.

First try

#!/bin/bash
while IFS= read -r line; do
    echo "$line" | awk '
        {
            FS="[^ ]"
            print length($1)
        }
    '
done < "tmp"

The output is

4
4
4
4

Second try

#!/bin/bash
while IFS= read -r line; do
    echo "$line" | awk -F "[^ ]" '{print length($1)}'
done < "tmp"

The output is

0
2
4
0

I want to write a full script that produces the same output as the inline version.
Could anyone explain this difference? Thank you very much.

  • Hint: Try awk 'BEGIN { FS="[^ ]" } { print length($1) }' in your first one. Commented Oct 22, 2020 at 4:03
  • @Shawn thank you. But how can I change FS later in my script? Commented Oct 22, 2020 at 4:06
  • @rowboat "Inline type output" means the output of the second try. I just noticed that the shell loop is slow, so I want to speed it up by getting rid of the per-line pipeline subprocess. Is there any workaround? Should I use Perl? Commented Oct 22, 2020 at 4:48

3 Answers

3

Fixed your first try:

$ while IFS= read -r line; do
    echo "$line" | awk '
                   BEGIN {              # you forgot the BEGIN
                       FS="[^ ]"        # gotta set FS before record is read
                   }
                   {
                       print length($1)
                   }' 
  done < file

Output now:

0
2
4
0

And to speed it up, just use awk for it:

$ awk '
BEGIN {
    FS="[^ ]"
}
{
    print length($1)
}' file
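
On the follow-up question about changing FS later in a script: a new FS value applies to records read after the assignment, so the record already in $0 keeps its old split. One common workaround is to assign $0 to itself, which makes awk re-split the current record with the FS in effect at that point. A minimal sketch, assuming a POSIX-style awk such as gawk or mawk; the function name count_leading_spaces and the inline sample data are only for illustration:

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): reads lines on stdin and
# prints the number of leading spaces on each one.
count_leading_spaces() {
    awk '{
        FS = "[^ ]"      # too late for the record that was already read...
        $0 = $0          # ...so force a re-split of the current record
        print length($1)
    }'
}

# Sample lines copied from the question.
printf '%s\n' 'aaaa bbbb cccc dddd' '  aaaa bbbb cccc dddd' \
              '    aaaa bbbb cccc dddd' 'aaaa bbbb cccc dddd' |
    count_leading_spaces
```

For the sample lines this prints 0, 2, 4, 0. Setting FS in BEGIN is still the simpler choice when the separator never changes; the re-split is only worth it when FS really has to change partway through.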

3

You could try the following, which does not change FS. Written and tested at https://ideone.com/N8QcC8

awk '{if(match($0,/^ +/)){print RSTART+RLENGTH-1} else{print 0}}' Input_file

Or try:

awk '{match($0,/^ */); print RLENGTH}' Input_file

Output will be:

0
2
4
0

Explanation: the first solution uses an if/else condition. In the if branch, awk's match function is given a regex that matches the leading spaces of the line, and the script then prints RSTART+RLENGTH-1, which is the number of leading spaces (the regex is anchored at the start of the line, so RSTART is 1). This works because RSTART and RLENGTH are built-in awk variables that are set whenever match is called. If there are no leading spaces, the regex does not match and the else branch prints 0.

The second solution, following rowboat's suggestion, simply prints RLENGTH. Because /^ */ also matches zero spaces, it matches on every line, so RLENGTH is 0 for lines without leading spaces and no if/else is needed.
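
To see what match leaves behind in these variables, here is a small sketch that prints its return value together with RSTART and RLENGTH for a line without and a line with leading spaces. The function name show_match_vars is hypothetical and the input lines are taken from the sample file:

```shell
#!/bin/bash
# Hypothetical helper for illustration: for each input line, print the return
# value of match() followed by RSTART and RLENGTH.
show_match_vars() {
    awk '{
        hit = match($0, /^ +/)   # requires at least one leading space
        print hit, RSTART, RLENGTH
    }'
}

printf '%s\n' 'aaaa bbbb' '  aaaa bbbb' | show_match_vars
```

The first line gives 0 0 -1 (no match, so RLENGTH is -1 rather than 0, which is why the /^ +/ version needs the else branch) and the second gives 1 1 2.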

3 Comments

Thanks. It's fine now. But could you add a more detailed explanation of your script?
@rowboat, thank you for letting me know. I have added that solution too now. Cheers.
@quyleanh, please check now; a detailed explanation has been added. Let me know in case of any queries.
0

You can try Perl. Simply capture the leading spaces in a group and print its length. "a"=~/a/ is just to reset the regex captures at the end of each line.

perl -nle ' /(^\s+)/; print length($1)+0; "a"=~/a/ '  count_space.txt
0
2
4
0
