3

I have a comma-separated file with 182 rows and 501 columns. The first 500 columns are numeric (features), and the last column contains strings (labels).

Example: 182x501 dimension

1,3,4,6,.........7, ABC
4,5,6,4,.........9, XYZ
3,4,5,3,.........2, ABC 

How can I load this file into a data set with a matrix, B, containing the numbers as my features, and a vector, C, containing the strings as my labels?

d = dataset(B, C);

3 Answers

4

Build a format specifier for textscan based on the number and types of columns, and have it read the file for you.

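nNumberCols = 500;
% One '%f,' per numeric column, then '%s' for the trailing label
fmt = [repmat('%f,', [1 nNumberCols]) '%s'];  % named fmt so it does not shadow the built-in format
fid = fopen(file);  % file holds the path to the CSV file
x = textscan(fid, fmt);
fclose(fid);
B = cat(2, x{1:nNumberCols});  % numeric matrix, one column per feature
C = x{end};                    % cell array of label strings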
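To try the idea without a file, here is a minimal sketch that runs the same format specifier on a small character vector. The sample text and the 5-column width are assumptions made for illustration; they are not the asker's real data.

```matlab
% Hypothetical sample standing in for the file contents
txt = sprintf('1,3,4,6,7, ABC\n4,5,6,4,9, XYZ\n3,4,5,3,2, ABC\n');

nNumberCols = 5;                              % numeric columns in the sample
fmt = [repmat('%f,', [1 nNumberCols]) '%s'];  % '%f,%f,%f,%f,%f,%s'

x = textscan(txt, fmt);        % textscan also accepts text instead of a file ID
B = cat(2, x{1:nNumberCols});  % one row per line, one column per number
C = x{end};                    % column cell array of labels
```

For the real file, only nNumberCols and the input source should need to change.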

3

You could use the textscan function. For example:

fid = fopen('test.dat');

% Read numbers and string into a cell array
data = textscan(fid, '%s %s');

% Then extract the numbers and strings into their own cell arrays
nums = data{1};
str  = data{2};

% Convert each comma-separated string of numbers to a numeric row vector
for i = 1:numel(nums)
    nums{i} = str2num(nums{i}); %#ok<ST2NM>
end

% Finally, convert cell array of numbers to a matrix
nums = cell2mat(nums);

fclose(fid);

Note that I have made a number of assumptions here, based on the file format you have specified. For example, I assume that there are no spaces after the commas following a number, but that there is a space immediately preceding the string at the end of each line.

You can make the above code more flexible by using a more considered format specifier (the second argument to textscan). See the section Basic Conversion Specifiers in the textscan documentation.
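As one possible sketch of a more specific format, the numeric columns can be read directly as '%f' with the comma declared as the delimiter, so no string-to-number conversion is needed. The sample text and the 5-column width below are assumptions for illustration only. The 'CollectOutput' option asks textscan to merge adjacent columns of the same type into one array.

```matlab
% Hypothetical sample standing in for the file contents
txt = sprintf('1,3,4,6,7, ABC\n4,5,6,4,9, XYZ\n3,4,5,3,2, ABC\n');

nNumberCols = 5;                             % numeric columns in the sample
fmt = [repmat('%f', [1 nNumberCols]) '%s'];  % '%f%f%f%f%f%s'

x = textscan(txt, fmt, 'Delimiter', ',', 'CollectOutput', true);
B = x{1};           % numeric columns merged into a single matrix
C = strtrim(x{2});  % labels, with any surrounding spaces removed
```

The strtrim call is a precaution, since spaces around the label may otherwise end up in the strings when the comma is the only delimiter.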


3

For example, if you have the following data in a file named data.txt:

1,3,4,6,7, ABC
4,5,6,4,9, XYZ
3,4,5,3,2, ABC 

you can read it into a matrix B and a cell array C using the code

N = 5; % Number of numeric data to read
fid = fopen('data.txt');
B = []; C = {};
while ~feof(fid)  % repeat until end of file is reached
  b = fscanf(fid, '%f,', N); % read N numeric data separated by a comma
  c = fscanf(fid, '%s', 1);  % read a string
  B = [B, b];
  C = [C, c];
end
C
B
fclose(fid);

to give

C = 
  'ABC'    'XYZ'    'ABC'
B =
 1     4     3
 3     5     4
 4     6     5
 6     4     3
 7     9     2
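Note that in the output above each line of the file ends up as a column of B, and C is a row. If you want one row per line of the file, as in the question, a transpose should be enough. A small sketch, with the values copied from the output above:

```matlab
% Values as printed above: one column per line of the file
B = [1 4 3; 3 5 4; 4 6 5; 6 4 3; 7 9 2];
C = {'ABC', 'XYZ', 'ABC'};

B = B.';   % now one row per line of the file, one column per feature
C = C(:);  % labels as a column, matching the rows of B
```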

3 Comments

I think you have an error at C=[C,c]... It should probably either be C={C,c} or, more likely, something like C(end+1)={c}. I don't remember the exact syntax, sorry.
@eykanal. It is not an error. The array concatenation operator [] works for cell arrays as well, as you can confirm using `C = {}; c = 'a'; C = [C, c], C = [C, c]`.
You mean c={'a'}; c=[c c];, but yeah, I just tested it, you're right.
