I have a file such as

head testSed.fastq
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA
NGTCACTN
+
#>AAAAF#
@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:NATCAGCN+TAGATCGCCAAGTTAA
NATCAGCN
+
#>>AA?C#
@M01551:51:000000000-BCB7H:1:1101:15557:1332 1:N:0:NCAGCAGN+TATCTTCTATAAATAT
NCAGCAGN

I am trying to replace the string after the final colon with 0 on every header line (lines 1, 5 and 9 in this example, but throughout the whole file) using a regular expression.

I have checked my regex using egrep '[ATGCN]{8}\+[ATGCN]{16}$' testSed.fastq, which returns all the lines I would expect.

However when I try to use sed -i 's/[ATGCN]{8}\+[ATGCN]{16}$/0/g' testSed.fastq the original file is unchanged and no replacement occurs.

How can I fix this? Is my regex not specific enough?

  • You need to escape {} or use -E/-r. Commented Oct 24, 2017 at 16:38
  • Can you elaborate? Commented Oct 24, 2017 at 16:45
  • @skurp, user 123 is suggesting sed -E -i ... to enable egrep-style extended regular expressions. Commented Oct 24, 2017 at 16:45

2 Answers

Do you need a regex for this?

awk -F: -v OFS=: '/^@/ {$NF = "0"} 1' testSed.fastq

That prints the result to standard output; it won't edit the file in place. If you have GNU awk, you can use

gawk -F: -v OFS=: -i inplace '...' file

ref: https://www.gnu.org/software/gawk/manual/html_node/Extension-Sample-Inplace.html
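
If GNU awk is not available, one portable approach is to write the output to a temporary file and then move it over the original. The following is only a sketch: the sample file is created purely for illustration, and the variable names are hypothetical.

```shell
# Sample input: one FASTQ record in a temporary file (illustrative only).
f=$(mktemp)
printf '%s\n' \
  '@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA' \
  'NGTCACTN' '+' '#>AAAAF#' > "$f"

# Write the edited copy to a second temp file, and replace the
# original only if awk exited successfully.
tmp=$(mktemp)
awk -F: -v OFS=: '/^@/ {$NF = "0"} 1' "$f" > "$tmp" && mv "$tmp" "$f"

# Show the result, then clean up the sample file.
awk_out=$(cat "$f")
printf '%s\n' "$awk_out"
rm -f "$f"
```

Only lines starting with @ have their last colon-separated field replaced; all other lines are printed unchanged.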

1 Comment

Note that -i inplace requires gawk 4.1.0 or later.

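Your regex is written as an ERE (extended regular expression), but sed interprets patterns as BREs (basic regular expressions) by default, and in a BRE an unescaped { or } is a literal character. Not all sed implementations support ERE, so check man sed in your environment for a -r or -E option. Alternatively, stay with BRE and write the bounds with backslash-escaped braces, \{8\} and \{16\}. In that case leave the + unescaped: a bare + is literal in a BRE, whereas GNU sed treats \+ as a "one or more" operator.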
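
As a rough sketch of both options, assuming a sed that accepts -E (GNU and BSD sed both do, but check your man page), here they are run against a small sample file created only for illustration:

```shell
# Sample input: a header line and a sequence line (illustrative only).
tmp=$(mktemp)
printf '%s\n' \
  '@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA' \
  'NGTCACTN' > "$tmp"

# Option 1: ERE mode. Braces are interval bounds; \+ is a literal plus.
ere_out=$(sed -E 's/[ATGCN]{8}\+[ATGCN]{16}$/0/' "$tmp")

# Option 2: default BRE mode. Escape the braces and leave + unescaped
# (a bare + is literal in BRE; GNU sed reads \+ as "one or more").
bre_out=$(sed 's/[ATGCN]\{8\}+[ATGCN]\{16\}$/0/' "$tmp")

printf '%s\n' "$ere_out"
rm -f "$tmp"
```

Both commands print to standard output, so the result can be inspected first. Once it looks right, add -i to edit the file in place (BSD sed expects -i '' instead).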

That said, rather than matching the precise text in the last field, why not match the final colon and everything after it, that is, a colon followed only by non-colon characters up to the end of the line? The /^@/ address restricts the substitution to header lines. The following RE is both BRE and ERE compatible.

$ sed '/^@/s/:[^:]*$/:0/' testSed.fastq
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:0
NGTCACTN
+
#>AAAAF#
@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:0
NATCAGCN
+
#>>AA?C#
@M01551:51:000000000-BCB7H:1:1101:15557:1332 1:N:0:0
NCAGCAGN
