
I'm processing a file in awk.

I want to pass along, unchanged, the rows in the file that have blanks in column positions 25 through 34, and I want to do work on the rows that have blanks in column positions 10 through 19. Specifically, I want to replace the blanks in column positions 10 through 19 with 0s. That way the output file will have the original rows with blanks in 25-34 untouched, and the rows with blanks in 10-19 will have had those blanks replaced with '0's. So the output file will be the same as the input file, only with zeros in positions 10-19 of the relevant rows. The file looks like this:

###########################################
#########          ########################
###########################################
###########################################
###########################################
###########################################
###########################################
###########################################
#########          #####          #########
###########################################
###########################################
###########################################

I know I have to use an if block, but I've never used one before in awk. The syntax below is what I think I need, but please help me with the details. Specifically, I'm unsure how to specify 'blanks' in the if statements.

I apologize ahead of time for the bad syntax. This is my first time using an if block in awk. I know the syntax doesn't work, which is one of the reasons I'm posting this.

cat scr2 | awk 'BEGIN {
    pos1=substr($0,25,10); 
    pos2=substr($0,10,10);

      if (pos1 = ^[[:blank:]]$) 
         printf $0 
      else if (pos2 == ^[[:blank:]]$)
         {val=substr($0,25,10)} 
         gsub(/ /,0,val){$0=substr($0,1,24) val substr($0,35)} 1}'`

The sample output would be :

###########################################
#########0000000000########################
###########################################
###########################################
###########################################
###########################################
###########################################
###########################################
#########          #####          #########
###########################################
###########################################
###########################################

So the row with blanks only at positions 10-19 gets changed, and the row with blanks at both 10-19 and 25-34 gets left alone.

  • Please do post expected sample output in your question. Commented Jul 15, 2021 at 16:52
  • How do you expect anyone to read that script? For pity's sake — use newlines; use newlines liberally. You seem to be missing /…/ delimiters around some regexes, and to be using == where you need ~ (the regex match operator). Or you're missing double quotes around strings. There is nothing in $0 inside the BEGIN block of an Awk script — nothing has been read when the BEGIN block is executed. Commented Jul 15, 2021 at 16:55
  • As an aside, get rid of the useless cat. Commented Jul 15, 2021 at 18:57
  • Follow-up question: stackoverflow.com/questions/68412741/… Commented Jul 16, 2021 at 19:54
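
Putting the points from the comments together, here is a minimal sketch of how the if/else attempt in the question might look once the logic is moved out of BEGIN, the regexes get their / delimiters, and ~ is used for matching. The function name fix_blanks and the two sample rows are made up for illustration, and the sketch assumes every row is at least 34 characters wide.

```shell
# fix_blanks is a hypothetical helper name; it reads the files given as
# arguments, or standard input when there are none.
fix_blanks() {
  awk '{
    left  = substr($0, 10, 10)   # columns 10-19
    right = substr($0, 25, 10)   # columns 25-34

    if (right ~ /^ +$/) {
      # blanks in 25-34: pass the row through unchanged
      print
    } else if (left ~ /^ +$/) {
      # blanks only in 10-19: splice ten zeros into those columns
      print substr($0, 1, 9) "0000000000" substr($0, 20)
    } else {
      # everything else is printed as-is
      print
    }
  }' "$@"
}

# Sample rows (assumed): the first has blanks only in 10-19,
# the second has blanks in both 10-19 and 25-34.
printf '%s\n' \
  '#########          ########################' \
  '#########          #####          #########' |
  fix_blanks
# prints:
# #########0000000000########################
# #########          #####          #########
```

The main block runs once per input line, so $0 holds the current row there, which is not the case inside BEGIN.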

4 Answers


With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.

awk '
substr($0,10,10) ~ /^ +$/ && substr($0,20) !~ / / {
  $0=substr($0,1,9) "0000000000" substr($0,20)
}
1
' Input_file

Explanation: the main program checks two conditions. The first makes sure positions 10 through 19 contain only spaces, AND the second makes sure the rest of the line (position 20 onward) has no spaces in it. If both hold, the spaces are replaced with zeros. The trailing 1 then prints every line, edited or not.
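
To look at the two tests separately, the sketch below prints each row's line number followed by 1 or 0 for each condition. The function name show_tests and the three sample rows are assumptions chosen for illustration; they are not part of the answer's code.

```shell
# show_tests is a hypothetical helper: for each input row it prints
#   NR  <columns 10-19 all spaces?>  <no space from column 20 onward?>
show_tests() {
  awk '{
    blank10 = (substr($0, 10, 10) ~ /^ +$/)   # 1 if columns 10-19 are only spaces
    clean20 = (substr($0, 20) !~ / /)         # 1 if no space from column 20 onward
    print NR, blank10, clean20
  }' "$@"
}

printf '%s\n' \
  '###########################################' \
  '#########          ########################' \
  '#########          #####          #########' |
  show_tests
# prints:
# 1 0 1
# 2 1 1
# 3 1 0
```

Only the second row has 1 for both tests, so it is the only one that would be rewritten.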


10 Comments

@DavidC.Rankin, Thank you sir.
Always an interesting read, awk is getting less of a riddle to me now :-)
@Thefourthbird, you're welcome. You are a champ in regex, and I have the same feeling about your answers too, cheers.
Hi Guys, The answers are all great. I see some of the answers are 'seeing' both blocks of space and only taking action on the first. I think I should have been more specific. The data I'm working with is Bank data so I couldn't put actual numbers down. I thought excluding blanks anywhere would be fine. In my case, on the lines we don't want, most of the columns are blank so I tried to just use '#' for the 'generic' data. In reality the rows we want to screen out are mostly blank. They contain marco info. I'll add the new example rows in the main section.
@Carbon, Request you to please revert your question's latest update to previous one as many users had replied as per your previous question and it will be waste of their efforts as well as it will confuse users. You could open a fresh question for same and it could be discussed there.

I'd use sed here:

sed -E 's/^(.{9}) {10}(.{5}[^ ]{10})/\10000000000\2/' file

1 Comment

Very nicely done. (and much shorter than mine) I'll give it a nod.

Another option is using match() to fill the RSTART built-in variable specifying the start of the block of spaces. You can then use substr() in a regex comparison to verify the remainder of the line is comprised only of '#' characters. For example:

awk '{if (match($0,/[ ]{10}/) && RSTART == 10 && substr($0,20) ~ /^#*$/) sub(/[ ]{10}/,"0000000000")}1' file

The above uses match() to find lines with a run of 10 spaces beginning at column 10, checks that the rest of the line contains only '#' characters, and replaces the spaces with 10 '0's.

Example Use/Output

With your input in the file named file, you would have:

$ awk '{if (match($0,/[ ]{10}/) && RSTART == 10 && substr($0,20) ~ /^#*$/) sub(/[ ]{10}/,"0000000000")}1' file
###########################################
#########0000000000########################
###########################################
###########################################
###########################################
###########################################
###########################################
###########################################
#########          #####          #########
###########################################
###########################################
###########################################
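
For readers new to match(), here is a small sketch of what it leaves in RSTART and RLENGTH. The function name show_match and the two sample strings are assumptions for illustration. The regex is / +/ rather than an interval expression, since some older awks do not support {10}.

```shell
# show_match is a hypothetical helper: for each input row it prints
#   <return value of match()>  RSTART  RLENGTH
show_match() {
  awk '{
    n = match($0, / +/)        # position of the first run of spaces, or 0
    print n, RSTART, RLENGTH   # RSTART equals n; RLENGTH is the run length
  }' "$@"
}

printf '%s\n' '#########          #####' '####' | show_match
# prints:
# 10 10 10
# 0 0 -1
```

When nothing matches, match() returns 0 and sets RLENGTH to -1, which is why the answer can use the call itself as a condition.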


You can do it in awk without any if statement:

awk '{print gensub(/^(.{9}) {10}([^ ]{24})/, "\\10000000000\\2", "g")}' file

This will replace the 10 blanks in positions 10 to 19 with 10 zeros, but only on the lines where there are no blanks in positions 20 to 43, which is what you want, I guess.

1 Comment

The only note would be that gensub() is a gawk extension and may not be available in other awks.
