Loading text file as a 2D array of strings without specifying the number of columns

Question

Suppose I have a plaintext file test.dat:

foo bar baz
qux ham spam

I now want to load this into Octave (or Matlab if necessary) as a two-dimensional cell array, preserving the structure encoded in whitespace and newlines. According to my understanding of the documentation, the following should be the way to go:

format = '%s';
file = fopen('test.dat');
data = textscan(file,format);
fclose(file);
disp(data);

However this only loads the data as a one-dimensional array:

{
  [1,1] = 
  {
    [1,1] = foo
    [2,1] = bar
    [3,1] = baz
    [4,1] = qux
    [5,1] = ham
    [6,1] = spam
  }
}

Explicitly specifying Delimiter, Whitespace, and EndOfLine does not help (what’s the point of the latter then?); neither does using other loading functions like textread or dlmread. What does work is using format = '%s%s%s' in the above but this requires that I somehow identify the number of columns, which the function should be able to do itself.

Thus I ask: Is there any built-in function that does what I want? I am not interested in ways to write such a function myself – I am confident that I can do this, but that’s exactly what I want to avoid (as I need to use this for demonstrating good practice, and thus not re-inventing the wheel).

Related Q&As (that all work with knowing the number of columns):

If you use %s as a format, textscan will treat the whole line as one string, so yes, you do need to know the number of columns. Your only other option is to read one line at a time using fgetl and then split each line into separate strings on whatever separator you have.
– am304, Commented Jan 26, 2018 at 14:27
@am304: "If you use %s as a format, textscan will treat the whole line as one string": No, it doesn't. It loads each of the six elements individually; just the arrangement gets lost.
– Wrzlprmft, Commented Jan 26, 2018 at 14:29

Wolfie · Accepted Answer · 2018-01-26 14:57:09Z

You can use readtable

data = readtable('test.dat', 'ReadVariableNames', false, 'Delimiter', ' ')

Output:

Var1     Var2      Var3 
_____    _____    ______

'foo'    'bar'    'baz' 
'qux'    'ham'    'spam'

If you wanted a cell, not a table, you could use

data = table2cell( data );

>> data = {'foo'    'bar'    'baz' 
           'qux'    'ham'    'spam'}

I'm not sure whether readtable is available in Octave; there seems to be an implementation on GitHub, but I have no installation to check. It was introduced in Matlab R2013b.

You could use lower level actions, reading the lines one by one

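fid = fopen('test.dat','r');
data = {};
tline = fgetl(fid);              % Read first line (fgetl drops the newline, fgets keeps it)
while ischar(tline)              % fgetl returns -1 at end of file
    A = strsplit(tline, ' ');    % Split on spaces
    data(end+1, :) = A;          % Append to output
    tline = fgetl(fid);          % Read next line
end
fclose(fid);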

>> data = {'foo'    'bar'    'baz' 
           'qux'    'ham'    'spam'}

This method assumes that every row has the same number of elements (the same number of delimiters in each line). If you can't assume that, a safer way is to do data{end+1,1} = A, which stores each line's tokens as a nested cell, and to deal with the differing row lengths afterward.

The only function used in this method which isn't low level file I/O is strsplit. This is a built-in for Octave and Matlab.
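For the unequal-row case mentioned above, one possible approach is a two-pass sketch: collect each line's tokens in a nested cell, then pad the shorter rows. The sample file contents, the variable names, and the choice to pad with empty strings are assumptions made for illustration and are not part of the original answer.

```octave
% Hypothetical sample file with rows of different lengths (not from the question).
fname = [tempname() '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'foo bar baz\nqux ham\n');
fclose(fid);

% Pass 1: store each line's tokens in its own nested cell.
fid = fopen(fname, 'r');
rows = {};
tline = fgetl(fid);                  % fgetl drops the trailing newline
while ischar(tline)                  % fgetl returns -1 at end of file
    rows{end+1, 1} = strsplit(strtrim(tline), ' ');
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Pass 2: pad shorter rows with empty strings (an arbitrary choice)
% so that the result is a rectangular cell array.
ncols = max(cellfun(@numel, rows));
data = repmat({''}, numel(rows), ncols);
for k = 1:numel(rows)
    data(k, 1:numel(rows{k})) = rows{k};
end
```

With this sample input, data ends up as a 2-by-3 cell array whose last element is an empty string. An empty file is not handled by this sketch.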

rahnema1 · Accepted Answer · 2018-01-26 15:41:54Z

In Octave you can use csv2cell from the package io:

pkg load io
result = csv2cell('test.dat',' ')

Aristotelis · Accepted Answer · 2018-01-26 14:35:21Z

I would suggest that you have a look at the fgetl() or fgets() functions. Basically, you read the file line by line and then apply textscan() to each line to get the "columns".
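A minimal sketch of this suggestion, assuming every line has the same number of whitespace-separated tokens. The temporary file stands in for test.dat so that the snippet runs on its own; the variable names are illustrative and not taken from the answer.

```octave
% Recreate the question's sample data in a temporary file (illustrative only).
fname = [tempname() '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'foo bar baz\nqux ham spam\n');
fclose(fid);

fid = fopen(fname, 'r');
data = {};
tline = fgetl(fid);                    % fgetl drops the trailing newline
while ischar(tline)                    % fgetl returns -1 at end of file
    tokens = textscan(tline, '%s');    % textscan also accepts a character vector
    data(end+1, :) = tokens{1}.';      % tokens{1} is a column; transpose to a row
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

The append step fails with a dimension mismatch if a line has a different number of tokens than the previous ones, so this only fits rectangular data.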

Wrzlprmft Over a year ago

This does not address my question, which was explicitly about not implementing this myself. @Wolfie: This is not a valid comment either (it does nothing comments are for).

Wolfie Over a year ago

@Wrzlprmft You're right. I've deleted the comment as I planned to; I was just trying to help Ari learn how the site works. I've also updated my answer to use Octave built-in functions.

NZMark · Accepted Answer · 2020-07-17 09:52:09Z

I had the same problem. readtable.m was slow for me in Matlab, and the fgetl examples resize the array in a loop. But perhaps an acceptable solution is based on this forum post: https://de.mathworks.com/matlabcentral/answers/476483-how-to-use-textscan-on-a-cell-array-without-a-loop

So, at least in newer Matlab:

fid=fopen(file,'r');
data=textscan(fid,'%s','Delimiter','\r\n');
fclose(fid);
data=split(data{1},';',1);

I haven't tested split.m for speed with large data though.
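For the space-delimited file from the question, a similar read-everything-then-split idea can be sketched without an explicit loop. Note that this swaps in fileread, regexp and strsplit in place of textscan and split, on the assumption that these are available in both Octave and Matlab; the temporary file and the variable names are illustrative, and speed has not been measured.

```octave
% Recreate the question's sample data in a temporary file (illustrative only).
fname = [tempname() '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'foo bar baz\nqux ham spam\n');
fclose(fid);

txt = fileread(fname);                                  % whole file as one string
delete(fname);
textlines = regexp(strtrim(txt), '\r?\n', 'split');     % one cell per line
rows = cellfun(@(s) strsplit(s, ' '), textlines, ...
               'UniformOutput', false);                 % one nested cell per line
data = vertcat(rows{:});                                % stack rows into an N-by-M cell
```

The final vertcat requires every line to have the same number of tokens; it raises an error otherwise.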

NZMark Over a year ago

Sorry, I forgot to add a cell array transpose to bring the data back to col/row shape.
