I have a text file with several lines of codons. Each line holds one codon: a sequence of exactly three nucleotides, each of which can be A, T, G, or C (e.g. ATC). I want to write a while loop that reads these lines, counts them, and outputs each codon with the number of times it occurs in the file, ordered from highest to lowest.

You can't use awk in this loop, only grep and uniq. Thanks.

  • Why no awk? Is this homework? Also, sort would be convenient. Commented Nov 3, 2019 at 21:08
  • "I want to write": Then do it. You can find plenty of help online on how to read a file line by line or count unique lines. If you want others to do the job for you, try freelancing sites, where you offer money for others' work. Commented Nov 3, 2019 at 21:11
  • Why use only grep and uniq? Why do you even need grep? Commented Nov 3, 2019 at 21:39
  • That's how I was asked to do it. So only grep and uniq. Commented Nov 3, 2019 at 22:40
  • From your reply, plus the comments below the dash-o answer, your question now seems more complex. Could you please (a) show a simple example of the input (codons, other text) and the output you need, and (b) give some more detail on why exactly someone would use only grep and uniq when other, simpler and equally common tools exist? Any solution with grep + uniq would probably be less efficient and harder for maintainers of your code than sort + uniq (which are very common). Or do you simply need to filter with grep -P '^[ACGT]{3}$' before sort | uniq -c? Commented Nov 4, 2019 at 2:57

1 Answer

You can combine grep (to keep only lines made up of A, T, G, and C), sort and uniq (to count the distinct lines), and a final sort to order the counts from highest to lowest:

grep '^[ATGC]\+$' | sort | uniq -c | sort -k1nr

This works well for files of a reasonable size (certainly for under 1M lines). For larger files, consider an awk, Perl, or Python solution to avoid the overhead of sorting the complete file.
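As a rough sketch of both approaches, here is the pipeline wrapped in a function, plus one possible awk variant for larger inputs. The function names count_codons and count_codons_awk and the sample data are made up for illustration. This sketch also assumes each valid line is exactly three nucleotides, as the question describes, so the pattern uses {3} rather than \+.

```shell
#!/bin/sh

# count_codons (hypothetical name): read lines on stdin, keep only lines
# that are exactly three of A, C, G, T, then count and rank them.
count_codons() {
    grep -E '^[ACGT]{3}$' |   # drop anything that is not a 3-letter codon
        sort |                # group identical codons together for uniq
        uniq -c |             # prefix each distinct codon with its count
        sort -k1,1nr          # order by count, highest first
}

# count_codons_awk (hypothetical name): same result, but awk tallies the
# codons in memory, so only the distinct codons (at most 64) get sorted.
count_codons_awk() {
    awk '/^[ACGT][ACGT][ACGT]$/ { n[$0]++ }
         END { for (c in n) print n[c], c }' |
        sort -k1,1nr
}

# Made-up sample input: six valid codons and two lines that get filtered out.
printf '%s\n' ATC GGA ATC TTT ATC GGA 'not a codon' ATCG | count_codons
```

With this sample input, ATC is listed first with a count of 3, then GGA with 2, then TTT with 1. The exact padding in front of each count depends on your uniq implementation.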

3 Comments

Thanks for the reply. I know I can use sort and uniq. I don't know how to use grep to search. Usually, if it's a word or pattern, I can use grep -c 'xx'. In my case each character could be an A, T, G, or C, and there can be only three of them per line.
Do you mean that there are other lines in the file that need to be filtered from the sort?
Yes. It's a text file with several lines. I need to parse these and rank the words by the number of times they are repeated.