I have a file that contains repeating strings. The file is very large, so here is a simple example:

a    b    c
w    a    g
b    v    f

I want to extract a b to an array. How can I do this in MATLAB?

  • Is there a rule or pattern for how the string you want to extract (a b) relates to the repeating strings in your document? Commented May 14, 2013 at 17:45
  • So basically, you are looking for a specific string, given as input, in a file. Does strfind do that for you? Commented May 14, 2013 at 17:56

2 Answers

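Try using textscan. You can split the file into lines with the '\n' delimiter and then stack those lines into a character matrix with cell2mat. Note that cell2mat only works here if every line has the same length.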

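fid = fopen('your_string_file.ext');
% Read each line as one string (the name "input" would shadow a built-in function)
data = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
% Stack the equal-length lines into a character matrix
cellmatrix = cell2mat(data{1});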


cellmatrix =
     a b c
     d f a
     b v f

Then, if there is a specific pattern you want, you can walk the cellmatrix. Assuming you want the a b pattern within a single row, you could do the following:

pattern = ['a', 'b'];
patindex = 1;
dims = size(cellmatrix);
for i=1:dims(1)
    patindex = 1;
    for j=1:dims(2)
        if strcmp(cellmatrix(i,j), ' ')
            continue
        end
        if strcmp(cellmatrix(i,j), pattern(patindex))
            patindex = patindex+1;
            if patindex > length(pattern)
                % FOUND: the match ends at (i,j); store the location or do what you want
                patindex = 1;
            end
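        else
            % Mismatch: start over, but let this character begin a new match
            patindex = 1 + strcmp(cellmatrix(i,j), pattern(1));
        end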
    end
end

You can change your check to find whatever pattern you want from the matrix.
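
As a rough alternative to the explicit loop over columns, the same per-row check can be sketched with strfind after dropping the spaces from each row. This assumes the character matrix holds single-character tokens separated by spaces; the sample data and the name matchRows are illustrative, not part of the original answer.

```matlab
% Small character matrix standing in for the one built from the file.
cellmatrix = ['a b c'; 'w a g'; 'b v f'];
pattern = 'ab';          % tokens to find in order, with spaces removed

matchRows = [];          % indices of rows that contain the pattern
for i = 1:size(cellmatrix, 1)
    % Keep only the non-space characters of row i.
    row = cellmatrix(i, cellmatrix(i, :) ~= ' ');
    % strfind returns the start positions of pattern in row (empty if none).
    if ~isempty(strfind(row, pattern))
        matchRows(end+1) = i;
    end
end
```

This only reports which rows match; the positions returned by strfind can be kept instead if the column is needed.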

This assumes your file will fit into memory -- if it's too large to fit in half your memory you'll need to do something much trickier with incremental passes and file writing.

4 Comments

But this just reads the file - I think @newzad wanted to find a string in that file.
Well, he didn't specify the pattern he wanted, so I interpreted it as wanting to convert the text to arrays. I'll add how to grab a pattern from this read.
The for-loop is kind of overkill here. strcmp already works natively on cell arrays.
True, if he wants to check for a single character or a simple pattern. But since he didn't really specify the pattern very well, I don't think this is really overkill.

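Once you have the cellmatrix from the first answer, you can use strcmp to create a true/false matrix for your pattern. Note that strcmp compares element by element only when cellmatrix is a cell array of strings; for a character matrix, cellmatrix == 'a' gives the equivalent logical matrix.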

strcmp(cellmatrix,'a')
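
A minimal sketch of the cell-array case, assuming the file contents have been parsed into one token per cell (the names tokens, mask, rows and cols are illustrative, and the data is the example from the question):

```matlab
% Hypothetical token grid standing in for the parsed file contents.
tokens = {'a', 'b', 'c'; ...
          'w', 'a', 'g'; ...
          'b', 'v', 'f'};

% strcmp compares every cell against 'a' and returns a logical matrix
% with the same size as tokens.
mask = strcmp(tokens, 'a');

% Row and column subscripts of every match.
[rows, cols] = find(mask);
```

Here mask is true at positions (1,1) and (2,2), so find can be used to turn the logical matrix into locations.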

If your file is so large that it doesn't fit into memory, try reading it line by line using fgets:

fid = fopen('VERYBIGFILE');
tline = fgets(fid);
while ischar(tline)
    disp(tline)
    % Do some stuff with the current line here, before reading the next one
    tline = fgets(fid);
end
fclose(fid);
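
To make the line-by-line idea concrete, here is a hedged sketch that records the numbers of the lines where the token a is directly followed by the token b. That reading of "a b" is an assumption, since the question does not define the pattern. The demo file, fname, hits and lineNo are placeholders, and fgetl is used instead of fgets so the trailing newline is dropped.

```matlab
% Write a tiny demo file so the sketch is self-contained; in practice
% fname would point at the large file instead.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'a    b    c\nw    a    g\nb    v    f\n');
fclose(fid);

hits = [];       % line numbers where 'a' is directly followed by 'b'
lineNo = 0;
fid = fopen(fname, 'r');
tline = fgetl(fid);                      % fgetl strips the newline
while ischar(tline)                      % fgetl returns -1 at end of file
    lineNo = lineNo + 1;
    tokens = strsplit(strtrim(tline));   % split on runs of whitespace
    isA = strcmp(tokens, 'a');
    isB = strcmp(tokens, 'b');
    % Compare each token with its right-hand neighbour.
    if any(isA(1:end-1) & isB(2:end))
        hits(end+1) = lineNo;            % keep only the line number
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);                           % remove the demo file
```

Only one line is held in memory at a time, so the memory needed grows with the number of matches rather than with the file size.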
