
I have a character array list and want to count the occurrences of a substring within each index held in a numeric vector chr:

list =
CCNNCCCNNNCNNCN

chr =

     1
     1
     1
     1
     2
     2
     2
     2
     2
     2
     2
     2
     2
     2
     2

Ordinarily I search for adjacent letter pairs, e.g. 'NN', using this method:

Count(:,1) = accumarray(chr(intersect([strfind(list,'CC')],find(~diff(chr)))),1);

I use ~diff(chr) to ensure that a match does not cross an index boundary.
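To make the boundary check concrete, here is a small sketch of the same idea on the example data. Two choices are assumptions for illustration only: the pattern is 'NC', picked because one of its matches (positions 4 and 5) straddles the change from index 1 to index 2, and an explicit size [max(chr) 1] is passed to accumarray so that an index with no matches still gets a row of zeros.

```matlab
list = 'CCNNCCCNNNCNNCN';
chr  = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

pat = 'NC';                          % two-letter pattern (illustrative choice)
starts  = strfind(list, pat);        % start positions of every match: [4 10 13]
sameIdx = find(~diff(chr));          % positions i where chr(i) == chr(i+1)
valid   = intersect(starts, sameIdx); % drop matches whose two letters lie in different indices

% Tally the surviving matches per index; the size argument keeps one row per index
pairCount = accumarray(chr(valid), 1, [max(chr) 1]);   % [0; 2]
```

The match starting at position 4 is discarded because chr(4) and chr(5) differ, which is exactly why the same filter drops the last letter of each index when the pattern is a single character.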

Now, however, I want to match single-letter strings, e.g. 'N'. How can I do this? With the method above, the last letter of each index is missed and not counted, because find(~diff(chr)) never includes the final position of an index.

For the example above, the desired result is a two-column matrix giving the number of 'C's and 'N's within each index:

C     N
2     2
5     6

That is, index 1 (as stored in chr) contains 2 Cs and 2 Ns; the count then restarts from 0 for index 2, which contains 5 Cs and 6 Ns.
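A single letter occupies one position, so it can never straddle two indices and the ~diff(chr) test is not needed for this case. The following is a minimal sketch of one alternative (it is neither the asker's method nor the accepted answer's): build a logical mask for each letter and accumulate the chr values at the masked positions. The letter set 'CN' is hard-coded as an assumption, which also fixes the column order.

```matlab
list = 'CCNNCCCNNNCNNCN';
chr  = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

letters = 'CN';                    % letters to count, in the desired column order (assumed)
nIdx = max(chr);                   % one output row per index value 1..max(chr)
counts = zeros(nIdx, numel(letters));
for k = 1:numel(letters)
    mask = (list == letters(k));   % true at every position holding this letter
    % chr(mask) lists the index of each occurrence; accumarray tallies them per index
    counts(:, k) = accumarray(chr(mask), 1, [nIdx 1]);
end
% counts is [2 2; 5 6]
```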

1 Answer

[u, ~, v] = unique(list);          %// get unique labels for list in variable v
result = full(sparse(chr, v, 1));  %// accumulate combinations of chr and v

This works for any number of distinct letters in list and any number of indices in chr, and chr does not need to be sorted. The values in chr must be positive integers, because sparse uses them as row subscripts.
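To see why the two lines work: the third output of unique maps every character of list to a column number (here 1 for 'C' and 2 for 'N'), and sparse(i, j, 1) sums the 1s of repeated (i, j) pairs, so entry (r, c) ends up as the number of positions with chr == r and letter u(c). The sketch below uses the example data; the permutation p is an arbitrary assumption, used only to check that an unsorted chr gives the same counts, and the accumarray line is an equivalent formulation rather than part of the answer.

```matlab
list = 'CCNNCCCNNNCNNCN';
chr  = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

[u, ~, v] = unique(list);              % u = 'CN'; v(k) is the column number for list(k)
v = v(:);                              % force a column (v may be a row in some versions)

viaSparse = full(sparse(chr, v, 1));   % duplicate (row, col) pairs are summed
viaAccum  = accumarray([chr v], 1);    % the same tally written with accumarray

% Permute the positions arbitrarily (list and chr together): chr(p) is no longer sorted
p = [15 3 8 1 12 5 10 2 14 7 4 11 6 13 9];
shuffled = full(sparse(chr(p), v(p), 1));
```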

With the data from your example,

list = 'CCNNCCCNNNCNNCN';
chr = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

which produces

result =
     2     2
     5     6

The letter associated with each column of result is given by u:

u =
CN
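Because u is sorted and contains only the letters that actually occur in list, a column's position can change with the data. A short sketch of looking a column up by its letter instead of hard-coding its position (the names nCount and cCount are illustrative):

```matlab
list = 'CCNNCCCNNNCNNCN';
chr  = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

[u, ~, v] = unique(list);
result = full(sparse(chr, v(:), 1));

nCount = result(:, u == 'N');   % column labelled 'N' in u: [2; 6]
cCount = result(:, u == 'C');   % column labelled 'C' in u: [2; 5]
```

If a letter does not occur in list at all, the lookup returns an empty matrix with size(result, 1) rows and 0 columns rather than a column of zeros.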

2 Comments

Very concise +1. What is the ~ doing within [u,~,v]?
~ is just a way of discarding the second output of unique. It would be functionally equivalent to [u, x, v] = unique(list); clear x
