
I want to add a header to my input strings. The header should be > directly followed by the string, then an _, then the number that follows the string. To add a header I used awk '{print ">"$0;print}', but I don't know how to add the number after the string.

input:

CTTCTATGATGAATTTGATTGCATTGATCGTCTGACATGATAATGTATTT  2
AGAACGAAAGTCGGAGGTTCGAAGACGATC   14
TACCCTGTAGAACCGAANTTGT   1
TCCCTGTGGTCTAGTGGTTAGGATTCTGCGCTCTCACCGCCGCGGCCCGGG     2
GGGCCAGGATGAAACCTAATTTGAGTGGCCATCCATGGATGAGAAATGCGG 4
TAATACGGCCGGGTAATGATGGA 0
CCAGATGATGAACTTATTGACGGGCGGACAGAAACTGTGTGCTGATTGTCA 7240
CGCCCGATCTCGTCTGATCTCG  34
GCAGGGGTGGTTCAGTGGTAGAATTCTCGCC 3

output:

>CTTCTATGATGAATTTGATTGCATTGATCGTCTGACATGATAATGTATTT_2
 CTTCTATGATGAATTTGATTGCATTGATCGTCTGACATGATAATGTATTT
>AGAACGAAAGTCGGAGGTTCGAAGACGATC_14
 AGAACGAAAGTCGGAGGTTCGAAGACGATC
....

1 Answer

$ awk '{printf ">%s_%s\n %s\n",$1,$2,$1;}' file
>CTTCTATGATGAATTTGATTGCATTGATCGTCTGACATGATAATGTATTT_2
 CTTCTATGATGAATTTGATTGCATTGATCGTCTGACATGATAATGTATTT
>AGAACGAAAGTCGGAGGTTCGAAGACGATC_14
 AGAACGAAAGTCGGAGGTTCGAAGACGATC
>TACCCTGTAGAACCGAANTTGT_1
 TACCCTGTAGAACCGAANTTGT
>TCCCTGTGGTCTAGTGGTTAGGATTCTGCGCTCTCACCGCCGCGGCCCGGG_2
 TCCCTGTGGTCTAGTGGTTAGGATTCTGCGCTCTCACCGCCGCGGCCCGGG
>GGGCCAGGATGAAACCTAATTTGAGTGGCCATCCATGGATGAGAAATGCGG_4
 GGGCCAGGATGAAACCTAATTTGAGTGGCCATCCATGGATGAGAAATGCGG
>TAATACGGCCGGGTAATGATGGA_0
 TAATACGGCCGGGTAATGATGGA
>CCAGATGATGAACTTATTGACGGGCGGACAGAAACTGTGTGCTGATTGTCA_7240
 CCAGATGATGAACTTATTGACGGGCGGACAGAAACTGTGTGCTGATTGTCA
>CGCCCGATCTCGTCTGATCTCG_34
 CGCCCGATCTCGTCTGATCTCG
>GCAGGGGTGGTTCAGTGGTAGAATTCTCGCC_3
 GCAGGGGTGGTTCAGTGGTAGAATTCTCGCC

How it works

The awk script consists of a single command:

printf ">%s_%s\n %s\n",$1,$2,$1

By default, awk splits each input line into fields on whitespace (any run of spaces or tabs). For the first line, for example, field 1 is CTTCTATGATGAATTTGATTGCATTGATCGTCTGACATGATAATGTATTT and field 2 is 2. The printf statement rearranges those fields into the desired format, writing two output lines for each input line. The first, with format >%s_%s\n, writes > followed by field 1, then _, then field 2, then a newline. The second, with format " %s\n" (note the leading space), writes a space followed by field 1 and a newline.
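The same result can be sketched with print and plain string concatenation instead of printf. This variant is a minimal sketch under two assumptions that go beyond the question: the space in front of the sequence line is not actually wanted (standard FASTA sequence lines have none), and lines with fewer than two fields, such as blank lines, should be skipped. The function name add_headers is hypothetical, used only for illustration.

```shell
# add_headers: hypothetical helper name. Reads "SEQUENCE COUNT" lines on stdin.
# Default field splitting treats any run of spaces/tabs as one separator,
# so "ACGT   14" still gives $1 = ACGT and $2 = 14.
add_headers() {
    # NF >= 2 skips blank or malformed lines (an assumption, not in the question).
    # Adjacent expressions after print are concatenated: ">" $1 "_" $2.
    # The sequence line is printed without a leading space (also an assumption).
    awk 'NF >= 2 { print ">" $1 "_" $2; print $1 }'
}

# Usage: two lines from the question's input, one with several spaces before the count
printf '%s\n' 'AGAACGAAAGTCGGAGGTTCGAAGACGATC   14' 'TAATACGGCCGGGTAATGATGGA 0' | add_headers
```

If the leading space in the question's expected output is really required, change the second print to print " " $1.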
