
I have a folder in which there are many files and I want to create a matrix that holds filenames with a specific pattern. For example: The folder contains files with names starting with a subject number (e.g. 03T1A.xxx.nii, 03T1A.yyy.nii) as well as filenames with specific patterns in the middle (e.g. 03T1A.c100.nii, 03T1A.c200.nii, 03T1A.c300.nii). In this specific case I am looking to extract all the filenames with the pattern c1 and c2 in the middle (e.g. 03T1A.c100.nii and 03T1A.c200.nii but not 03T1A.c300.nii).

To this point I have used the following code to create a pattern matching variable in 'pattern' which I would like to apply to the cell array of filenames I have extracted into the variable 'all_files' via the dir call.

func_path = char(strcat(input_dir, '/', subs(files), '/Func'));
pattern = 'c[12]*.nii'
all_files = dir(func_path); 
all_files = {all_files.name};

I'd like to use (read: practice) regexp. Doing it with string input seems easy, but I am 100% stumped as to how to do it with cell input. I started trying to do something like this:

files = all_files(cellfun(@(x)regexp(x, pattern));

But it doesn't work, obviously. Could someone help me figure out what to do here if my ultimate goal is to get a matrix output with just the relevant filenames? I've been searching MATLAB Answers and other Stack Overflow posts, but part of my problem is that I don't understand what's happening in their code snippets. I took the above line (or at least the beginning of it) from another post, but I don't know what, for example, 'x' is (an output variable?) or what's going on in a larger command such as

fin = cellfun(@(x)regexp(x, '\.', 'split'), res, 'UniformOutput', false)

Which I found in another thread. So basically, can someone help me figure out a command that will work while explaining it to me?

  • regexp works natively on cell arrays; there is no need to use cellfun. See also: Regex 101 as a playground for building the expression. Commented Jan 27, 2017 at 14:30
  • I had originally tried the command x = regexp(all_files, pattern, 'match'), but it returns a cell array of empty cells, the same size as 'all_files'. Commented Jan 27, 2017 at 14:37
  • Then your pattern didn't match anything. Commented Jan 27, 2017 at 14:39
  • Holy crap, you were absolutely right. If I had files c1004.EXAMPLE.nii and c2004.EXAMPLE.nii, for some reason the pattern 'c[12]*.nii' returns nothing but the pattern 'c[12]' returns both correctly. Is there any reason why this is? Commented Jan 27, 2017 at 14:43
  • @chainhomelow Yes, because c[12]*.nii will only match a "c" followed by only 1's and 2's before the .nii. You would need c[12].*\.nii so that you can match everything else in the string between c1 or c2 and the extension. Commented Jan 27, 2017 at 14:53
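
To make the difference between these patterns concrete, here is a small sketch that runs three of them against filenames taken from the question. The `names` list and the variable names `bad`, `prefix`, and `good` are made up for illustration; only `regexp` and `cellfun` come from the thread.

```matlab
% Sample filenames from the question (illustrative only)
names = {'03T1A.c100.nii', '03T1A.c200.nii', '03T1A.c300.nii', '03T1A.xxx.nii'};

% Original pattern: "c", zero or more 1s/2s, any ONE character, then "nii"
bad = ~cellfun('isempty', regexp(names, 'c[12]*.nii', 'once'));

% Prefix only: "c" followed by a 1 or a 2
prefix = ~cellfun('isempty', regexp(names, 'c[12]', 'once'));

% Suggested pattern: "c1" or "c2", anything in between, then a literal ".nii"
good = ~cellfun('isempty', regexp(names, 'c[12].*\.nii', 'once'));

% One row per pattern, one column per filename
disp([bad; prefix; good])
```

The first row comes out all zeros because the unescaped `.` only leaves room for a single character between the digits and "nii"; the other two rows flag the c100 and c200 files and nothing else.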

1 Answer


A few recommendations for doing this sort of thing:

  1. Do not use strcat and '/' characters to construct file paths. strcat strips trailing whitespace from character array inputs before concatenating, and filenames may legitimately contain trailing whitespace. Also, rather than hard-coding a file separator such as '/', use filesep or, better yet, fullfile to construct the path so that it works across platforms.

    func_path = fullfile(input_dir, subs(files), 'Func');
    
  2. regexp works directly on cell arrays, so you can simply do:

    all_files = dir(func_path); 
    
    % Search for the pattern in all filenames
    matches = regexp({all_files.name}, pattern);
    
    % Get the filenames of those that matched
    all_files = {all_files(~cellfun('isempty', matches)).name};
    
  3. Your pattern isn't matching any files because c[12]*.nii matches a "c", then zero or more 1's or 2's, then exactly one arbitrary character (the unescaped .), then "nii". In 03T1A.c100.nii there are too many characters between the "c1" and "nii" for that to succeed. Instead, use .* to match anything between the "c1" or "c2" and the file extension. Drop the * after [12] as well: because it allows zero 1's or 2's, it would let c3 match once the .* is added. Finally, escape the . in .nii so that it is treated as a literal dot rather than a wildcard. For your pattern I would use something like:

    pattern = 'c[12].*\.nii';
    
  4. If you really don't want to work with regular expressions, you could avoid all of this by simply using wildcards in your dir call:

    c1_files = dir(fullfile(func_path, '*c1*.nii'));
    c2_files = dir(fullfile(func_path, '*c2*.nii'));
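
Putting the steps above together, a minimal sketch might look like the following. It uses a hard-coded list of names in place of the dir output so that it runs without the asker's folder; `names`, `is_match`, `is_match2`, and `selected` are hypothetical names introduced here. The last assignment shows the cellfun form from the question: `@(x) ...` defines an anonymous function, and `x` is its input argument (not an output), which cellfun fills with the contents of each cell in turn.

```matlab
% Stand-in for {all_files.name}; in real use this would come from dir(func_path)
names = {'.', '..', '03T1A.c100.nii', '03T1A.c200.nii', '03T1A.c300.nii'};
pattern = 'c[12].*\.nii';

% regexp accepts the cell array directly and returns one cell per filename,
% holding the start index of each match (empty when there is no match)
matches = regexp(names, pattern);

% Logical mask of the names that matched, then index with it
is_match = ~cellfun('isempty', matches);
selected = names(is_match);

% Equivalent cellfun form: x is the anonymous function's input, i.e. one filename
is_match2 = cellfun(@(x) ~isempty(regexp(x, pattern, 'once')), names);

disp(selected)
```

In the other snippet quoted in the question, `'UniformOutput', false` tells cellfun to collect the results in a cell array, which is needed there because each call to regexp with 'split' returns a cell array rather than a scalar.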
    

1 Comment

Thank you very much, this does the trick for sure. I was looking at the regexp usage and yeah the wildcard thing threw me off. Apparently you can also use 'c[12]\S*nii' as well but the syntax change threw me. The strcat comment was very helpful as well, and the 'isempty' line works for completeness to give me exactly the new array I need. Really appreciate it.
