2

I have a character array, list (it could also be stored as a cell array if that is more useful), and wish to tally the number of occurrences of each letter against two different indexes held in two separate variables, type and ind.

list =
C C N N C U C N N N C N U N C N C

ind =
1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4 

type = 
15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16

The character array itself contains no spaces; they are added here for clarity.

Using the above example, the desired output would tally all instances of each unique letter in list, for each ind and for each type, creating three columns (for C/N/U), each with 4 rows (one per ind), per type. Entries are matched by position: the k-th letter in list belongs to the k-th entries of ind and type.

Desired output of above example (the labels are added for clarity only):

            Type 15              Type 16
   Ind  C      N      U      C      N      U
    1   2      0      0      1      1      0
    2   1      2      0      0      1      0
    3   1      1      1      1      1      1
    4   0      1      0      1      1      0

I am only aware of how to do this with a single index (using unique, full and sparse).

How can I best go about doing this with a dual index?

5
  • Does list always contain only two letters? Commented Aug 22, 2015 at 10:00
  • No - as shown in this example, it always contains three (N,C and U). Commented Aug 22, 2015 at 10:01
  • Sorry, I had only seen C and N:-) Commented Aug 22, 2015 at 10:02
  • Then accumarray seems to be the best approach Commented Aug 22, 2015 at 10:03
  • @AnnaSchumann I added a solution using crosstab, seems to be the most appropriate for me. Commented Aug 22, 2015 at 10:46

2 Answers

3

One possibility is to transform your letters to doubles by subtracting 64, which maps the letter C to the number 3 (and N to 14, U to 21).

Then you can use unique with 'rows' and 'stable', to get the following result:

list = char('CCNNCUCNNNCNUNCNC')
ind = [1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4]
type = [15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16]

data = [type(:) ind(:) (list(:) - 64)]
[a,~,c] = unique(data,'rows','stable')
occ = accumarray(c,ones(size(c)),[],@numel)

output = [a, occ]

output =

    15     1     3     2
    15     2    14     2
    15     2     3     1
    15     3    21     1
    15     3     3     1
    15     3    14     1
    15     4    14     1
    16     1    14     1
    16     1     3     1
    16     2    14     1
    16     3    21     1
    16     3    14     1
    16     3     3     1
    16     4    14     1
    16     4     3     1
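The long-format result above can be pivoted into the wide layout from the question using only base MATLAB. The following is a minimal sketch and not part of the original answer: the names tIdx, lIdx, counts and wide are illustrative, and it assumes ind already holds the integers 1 to 4 so it can be used directly as a row subscript.

```matlab
list = 'CCNNCUCNNNCNUNCNC';
ind  = [1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4];
type = [15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16];

% Long format as above: one row per unique (type, ind, letter code) triple
data = [type(:) ind(:) (list(:) - 64)];
[a,~,c] = unique(data,'rows','stable');
occ = accumarray(c,1);            % accumarray sums by default, so this counts rows

% Map type values and letter codes onto consecutive integers (sorted order)
[~,~,tIdx] = unique(a(:,1));      % 15 -> 1, 16 -> 2
[~,~,lIdx] = unique(a(:,3));      % 3 (C) -> 1, 14 (N) -> 2, 21 (U) -> 3

% Scatter the counts into an ind x letter x type array,
% then lay the type pages side by side
counts = accumarray([a(:,2) lIdx tIdx], occ);
wide = reshape(counts, size(counts,1), []);
```

Here wide has one row per ind and the columns C, N, U for type 15 followed by C, N, U for type 16. Combinations that never occur are left at zero by accumarray.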

If you have the Statistics Toolbox you should consider using grpstats.


If you don't mind a somewhat mind-twisting output, then crosstab is by far the easiest solution:

output = crosstab(type(:),ind(:),list(:)-64)

%// type runs downwards, ind runs to the right
output(:,:,1) =   %// 'C'

     2     1     1     0
     1     0     1     1


output(:,:,2) =   %// 'N'

     0     2     1     1
     1     1     1     1


output(:,:,3) =  %// 'U'

     0     0     1     0
     0     0     1     0

The following one-liner comes close to your desired output:

output2 = reshape(crosstab(ind(:),list(:)-64,type(:)),4,[],1)

output2 =

     2     0     0     1     1     0
     1     2     0     0     1     0
     1     1     1     1     1     1
     0     1     0     1     1     0
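The one-liner depends on the argument order: reshape reads the array column by column, so the counts have to be arranged as ind x letter x type before flattening. As a rough sketch of the same rearrangement starting from the first crosstab result (type x ind x letter), the array below is typed in by hand from the printed pages so that the example runs without the toolbox; the name wide is illustrative.

```matlab
% Counts as printed earlier: rows = type (15, 16), columns = ind (1..4),
% pages = letter (C, N, U)
output = cat(3, [2 1 1 0; 1 0 1 1], ...
                [0 2 1 1; 1 1 1 1], ...
                [0 0 1 0; 0 0 1 0]);

% Reorder the dimensions to ind x letter x type ...
byInd = permute(output, [2 3 1]);

% ... then place the two type pages side by side
wide = reshape(byInd, size(byInd,1), []);
```

The result matches output2 above: columns 1 to 3 are C, N, U for type 15 and columns 4 to 6 are C, N, U for type 16.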

The same toolbox also provides the tabulate function, which offers another option in combination with accumarray:

[~,~,c] = unique([type(:) ind(:)],'rows','stable')
output = accumarray(c(:),list(:),[],@(x) {tabulate(x)} )


This also allows the following output, where the last two columns are the count and the percentage within each type/ind group:

d = unique([type(:) ind(:) list(:)-64],'rows','stable')
output2 = [num2cell(d(:,[1,2])) vertcat(output{:})]

output2 = 

    [15]    [1]    'C'    [2]    [    100]
    [15]    [2]    'N'    [2]    [66.6667]
    [15]    [2]    'C'    [1]    [33.3333]
    [15]    [3]    'U'    [1]    [33.3333]
    [15]    [3]    'C'    [1]    [33.3333]
    [15]    [3]    'N'    [1]    [33.3333]
    [15]    [4]    'N'    [1]    [    100]
    [16]    [1]    'N'    [1]    [     50]
    [16]    [1]    'C'    [1]    [     50]
    [16]    [2]    'N'    [1]    [    100]
    [16]    [3]    'U'    [1]    [33.3333]
    [16]    [3]    'N'    [1]    [33.3333]
    [16]    [3]    'C'    [1]    [33.3333]
    [16]    [4]    'N'    [1]    [     50]
    [16]    [4]    'C'    [1]    [     50]

1 Comment

Brilliant answer. +1 for crosstab and the one-liner. Very concise and simple. Thank you very much.
0

Use accumarray:

Output = accumarray([type',ind'],list');

You may need to convert type and list to numbers first using str2num, then use accumarray, and convert the result back to characters using num2str.
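For what it is worth, accumarray expects positive integer subscripts and sums its values by default, so the letters and the type values need to be mapped to indices first, and the value to accumulate should be 1 in order to count. Below is a minimal sketch of one way this could look; it is not the answerer's code. The fixed alphabet 'CNU' and the names letters, lIdx, types, tIdx and counts are assumptions made for illustration.

```matlab
list = 'CCNNCUCNNNCNUNCNC';
ind  = [1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4];
type = [15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16];

letters = 'CNU';                             % assumed fixed alphabet, in column order
[~, lIdx] = ismember(list(:), letters(:));   % 'C' -> 1, 'N' -> 2, 'U' -> 3
[types, ~, tIdx] = unique(type(:));          % 15 -> 1, 16 -> 2

% A letter outside the assumed alphabet would give lIdx == 0,
% which accumarray rejects as a subscript

% Count occurrences per (ind, letter, type); the explicit size keeps
% columns for letters that happen not to occur
sz = [max(ind) numel(letters) numel(types)];
counts = accumarray([ind(:) lIdx tIdx], 1, sz);

% Flatten to one row per ind, with C/N/U per type side by side
Output = reshape(counts, sz(1), []);
```

Because the letters are mapped with ismember rather than str2num, list can stay a character array, and no toolbox is needed.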

1 Comment

I'm having problems implementing this approach due to the data types at hand. I've tried to simplify this by using simple vectors for ind and type that contain purely numerical data. However 'list' must either be a cell array or character array. str2num returns an empty cell array when I attempt to convert it for use with accumarray.
