I have a DNA sequence:

ACCAGAGCGGCACAGCAGCGACATCAGCACTAGCACTAGCATCAGCATCAGCATCAGC
CTACATCATCACAGCAGCATCAGCATCGACATCAGCATCAGCATCAGCATCGACGACT
ACACCCCCCCCGGTGTGTGTGGGGGGTTAAAAATGATGAGTGATGAGTGAGTTGTGTG
CTACATCATCACAGCAGCATCAGCATCGACATCAGCATCAGCATCAGCATCGACGACT
TTCTATCATCATTCGGCGGGGGGATATATTATAGCGCGCGATTATTGCGCAGTCTACG
TCATCGACTACGATCAGCATCAGCATCAGCATCAGCATCGACTAGCATCAGCTACGAC

I need to count the bases.

Also, for some reason the sequence can sometimes alternate between uppercase and lowercase within the same string.

  • Please mark homework with the [homework] tag. Commented Nov 16, 2009 at 3:11
  • I think the question is perfectly fine. Even if it was homework, it's an interesting problem. Why not ask it here? +1 from me. Commented Dec 2, 2012 at 12:07

1 Answer

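# thesequence is the string holding the DNA sequence
for base in 'ACGT':
    # add the uppercase and lowercase counts so mixed case is handled
    print(base, thesequence.count(base) + thesequence.count(base.lower()))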

3 Comments

Out of curiosity, is there a reason you don't do thesequence.lower().count(base.lower()), instead? I'm guessing it's to make it faster, but I'm not 100% sure.
It's not necessarily faster this way, but it takes less memory. Since DNA sequences can be long, this can be important.
Yep, as you need to do two passes anyway, it's better to have both be counting ones (memory-thrifty) rather than have one take up O(N) extra temporary memory. If you do have memory to spare, a single tmp = sequence.lower() outside the loop (then loop over 'acgt' in lowercase doing just tmp.count(base)) is going to be faster. A single pass with a finditer on a case-insensitive RE might be fastest, but a lot less simple than these approaches;-).
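For reference, the two alternatives mentioned in the last comment might look roughly like the sketch below. The function names `count_bases_lowered` and `count_bases_regex` and the `sample` string are made up for illustration, and this is only one way to write either approach, not a claim about which is fastest on real data.

```python
import re
from collections import Counter


def count_bases_lowered(sequence):
    """Lowercase once (O(N) extra memory), then do one count() pass per base."""
    tmp = sequence.lower()
    return {base.upper(): tmp.count(base) for base in "acgt"}


def count_bases_regex(sequence):
    """Walk the string once with finditer and a case-insensitive pattern."""
    counts = Counter({base: 0 for base in "ACGT"})
    for match in re.finditer(r"[acgt]", sequence, flags=re.IGNORECASE):
        counts[match.group().upper()] += 1
    return dict(counts)


# Hypothetical mixed-case sample, not the sequence from the question
sample = "ACCAgagcGGCAcagT"
print(count_bases_lowered(sample))
print(count_bases_regex(sample))
```

Both functions return a dict keyed by the uppercase base, so either can be swapped in for the loop in the answer. Timing them on a realistic sequence (for example with timeit) would be the way to check the speed guesses above.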
