I could do this easily in R with grepl and row indexing, but I wanted to try it in the shell. I have a text file that looks like the one below. I would like to find the rows that match TWGX and, for each matching row, concatenate column 1 and column 2 separated by _ and use the result as the value of both column 1 and column 2.

text:

NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   1   1
TWGX-MAP    10064-8036056040    0   0   0   -9
TWGX-MAP    11570-8036056502    0   0   0   -9
TWGX-MAP    11680-8036055912    0   0   0   -9

This is the result I want:

NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   1   1
TWGX-MAP_10064-8036056040   TWGX-MAP_10064-8036056040   0   0   0   -9
TWGX-MAP_11570-8036056502   TWGX-MAP_11570-8036056502   0   0   0   -9
TWGX-MAP_11680-8036055912   TWGX-MAP_11680-8036055912   0   0   0   -9
  • What's your field separator? Commented Jul 22, 2020 at 23:00
  • @Cyrus It is \t Commented Jul 22, 2020 at 23:00

1 Answer

The regex /TWGX/ selects the lines containing that string and applies the action that follows to them. The trailing 1 is awk shorthand that prints every line, whether or not it was modified.

$ awk 'BEGIN{FS=OFS="\t"} /TWGX/ {tmp = $1 "_" $2; $1 = $2 = tmp}1' file
NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   1   1
TWGX-MAP_10064-8036056040   TWGX-MAP_10064-8036056040   0   0   0   -9
TWGX-MAP_11570-8036056502   TWGX-MAP_11570-8036056502   0   0   0   -9
TWGX-MAP_11680-8036055912   TWGX-MAP_11680-8036055912   0   0   0   -9

BEGIN { FS = OFS = "\t" }
# Just once, before processing the file, set FS (field separator) and OFS (output field separator) to be the tab character
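
To see why OFS matters here and not just FS, the following sketch may help. It uses a made-up three-column line (not the asker's data) and pipes the result through tr so the tabs show up as |. When a field is assigned, awk rebuilds the line using OFS, which defaults to a single space.

```shell
# Hypothetical three-column sample line, tab separated.
sample=$(printf 'a\tb\tc')

# FS only: assigning to $1 rebuilds the record with the default OFS (one space).
fs_only=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = "\t" } { $1 = "x" } 1' | tr '\t' '|')

# FS and OFS: the rebuilt record keeps its tab separators (shown as | by tr).
fs_and_ofs=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "\t" } { $1 = "x" } 1' | tr '\t' '|')

printf '%s\n' "$fs_only" "$fs_and_ofs"
```

The first line printed is x b c and the second is x|b|c, so leaving OFS unset would silently turn the tabs of every modified row into spaces.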

/TWGX/ {tmp = $1 "_" $2; $1 = $2 = tmp}
# For every line that contains a match for TWGX create a mashup of the first two columns, and assign it to each of columns 1 and 2. (Note that in awk string concatenation is done by simply putting expressions next to one another)
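
As a small standalone illustration of the two pieces of syntax used in that action, here is a sketch with ordinary variables instead of fields. The names a, b, x and y are placeholders chosen for this example only.

```shell
joined=$(awk 'BEGIN {
    a = "TWGX-MAP"; b = "10064-8036056040"
    tmp = a "_" b      # adjacent expressions are concatenated
    x = y = tmp        # chained assignment: y gets tmp first, then x gets the same value
    print x
    print y
}')

printf '%s\n' "$joined"
```

Both printed lines read TWGX-MAP_10064-8036056040, which is the value that ends up in columns 1 and 2 of the first matching row.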

1
# This is an awk idiom that consists of the pattern 1, which is always true. By not explicitly specifying an action to go with that pattern, the default action of printing the whole line will be executed.
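
One possible refinement, offered as a sketch and not as part of the original answer: /TWGX/ matches the string anywhere on the line. If only rows whose first column starts with TWGX should change, the pattern can be restricted to $1. The input below is invented for the example (its second row has TWGX only in column 2), and tr again makes the tabs visible as |.

```shell
# Hypothetical input: row 1 has TWGX in column 1, row 2 only in column 2.
input=$(printf 'TWGX-MAP\t10064\t0\nNIALOAD\tTWGX-note\t0')

# $1 ~ /^TWGX/ tests the first field only, anchored at its start.
result=$(printf '%s\n' "$input" |
    awk 'BEGIN { FS = OFS = "\t" } $1 ~ /^TWGX/ { tmp = $1 "_" $2; $1 = $2 = tmp } 1' |
    tr '\t' '|')

printf '%s\n' "$result"
```

This prints TWGX-MAP_10064|TWGX-MAP_10064|0 followed by NIALOAD|TWGX-note|0, so the second row is left alone. With the sample data in the question both patterns give the same result.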
