4

I have a text file with the following structure:

1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605
37
1999-01-05 
1,122.50
1,087.50
1,122.50
0
3,250
712,175
14
...

The file contains repeated sets of eight values (a date followed by seven numbers, each on their own line).

I want to read it into MATLAB and get the values into different vectors. I've tried to accomplish this with several different methods, but none have worked - all output some sort of error.

In case it's important, I'm doing this on a Mac.

3
  • Another point that is unclear... Does val 1 represent the first set of 8 lines, val 2 the second set of 8 lines, etc.? Commented Sep 29, 2009 at 16:55
  • Wait a minute... I just realized that there is probably just one number on each line, as opposed to a comma-separated set of values. I'm guessing 712,175 represents "seven-hundred twelve thousand, one-hundred seventy-five"? Commented Sep 29, 2009 at 17:09
  • Sorry. There should be 8 values in each cycle. Thanks! Commented Sep 29, 2009 at 17:13

6 Answers

9

EDIT: This is a shorter version of the code I previously had in my answer...

If you'd like to read your data file directly, without having to preprocess it first as dstibbe suggested, the following should work:

fid = fopen('datafile.txt','rt');
data = textscan(fid,'%s %s %s %s %s %s %s %s','Delimiter','\n');
fclose(fid);
data = [datenum(data{1}) cellfun(@str2double,[data{2:end}])]';

The above code places each set of 8 values into an 8-by-N matrix, with N being the number of 8 line sets in the data file. The date is converted to a serial date number so that it can be included with the other double-precision values in the matrix. The following functions (used in the above code) may be of interest: TEXTSCAN, DATENUM, CELLFUN, STR2DOUBLE.
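
Since the question asks for separate vectors, the rows of that matrix can be pulled apart afterwards. Below is a minimal sketch, assuming a small hand-built matrix in the same 8-by-N layout. The names `col2` and `col7` are placeholders, because the question does not say what the seven numbers mean:

```matlab
% Hypothetical stand-in for the 8-by-N matrix produced above (N = 2),
% built from the two sample records in the question.
data = [730124 1100.00 1060.00 1092.50 0 6225 1336605 37;
        730125 1122.50 1087.50 1122.50 0 3250  712175 14]';

% Each row of the matrix holds one field for all N records.
dates = data(1,:);      % serial date numbers
col2  = data(2,:);      % placeholder names: the question does not
col7  = data(7,:);      % say what the seven numeric fields mean

% Convert a serial date number back to text when needed.
firstDate = datestr(dates(1), 'yyyy-mm-dd');
```

The remaining rows can be extracted the same way, by indexing `data(k,:)` for k from 3 to 8.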


7 Comments

This is exactly what I wanted to do! Thanks a bunch. But - why is reading data so troublesome in matlab?... thanks again
@Fifth: The thing that made your case difficult was the usage of commas within the format of the number. Normally, commas would be used to separate numbers from one another, not to denote separation between thousands and millions within numbers. As you can see from Amro's example, the MATLAB code is trivial for a case with better-formatted numbers.
@Fifth: Actually, I was able to come up with an even shorter version of my code, comparable to Amro's compact answer without needing any preprocessing of the data file.
@Fifth: It should be noted that C-language scanf and C++ iostreams both choke on the commas in your example file. You would have to do a two-pass operation in those languages as well. C# Double.Parse() handles the comma, and I don't know about Java.
This is shorter indeed :) BTW, you can use the 'CollectOutput' option of textscan to avoid the call to cellfun: data = textscan(%...%, 'CollectOutput',1); M = [datenum(data{1}(:,1)) str2double(data{1}(:,2:end))];
4

I propose yet another solution. This one is the shortest in MATLAB code. First, using sed, we format the file as a CSV file (comma separated, with each record on one line):

cat a.dat | sed -e 's/,//g ; s/[ \t]*$/,/g' -e '0~8 s/^\(.*\),$/\1\n/' | 
            sed -e :a -e '/,$/N; s/,\n/,/; ta' -e '/^$/d' > file.csv

Explanation: First we get rid of the thousands separator (the comma) and replace any trailing whitespace on each line with a comma. Then we remove that trailing comma on every 8th line and add a newline instead. Finally we join the lines that end in a comma and remove the empty ones.

The output will look like this:

1999-01-04,1100.00,1060.00,1092.50,0,6225,1336605,37
1999-01-05,1122.50,1087.50,1122.50,0,3250,712175,14

Next, in MATLAB, we simply use textscan to read each line, with the first field as a string (to be converted to a serial date number), and the rest as numbers:

fid = fopen('file.csv', 'rt');
a = textscan(fid, '%s %f %f %f %f %f %f %f', 'Delimiter',',', 'CollectOutput',1);
fclose(fid);

M = [datenum(a{1}) a{2}]

and the resulting matrix M is:

  730124     1100     1060   1092.5    0   6225   1336605    37
  730125   1122.5   1087.5   1122.5    0   3250    712175    14
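
The `0~8` address in the sed command is a GNU sed extension, so that command may not run with the sed shipped on a Mac. A rough MATLAB-only sketch of the same reformatting follows. The file names are made up for illustration, and the sample file is written inline so the snippet is self-contained:

```matlab
% Hypothetical file names, used only for this illustration.
inFile  = [tempname '.dat'];
outFile = [tempname '.csv'];

% Write the first record from the question so the sketch is self-contained.
fid = fopen(inFile, 'wt');
fprintf(fid, '%s\n', '1999-01-04', '1,100.00', '1,060.00', '1,092.50', ...
                     '0', '6,225', '1,336,605', '37');
fclose(fid);

% Read all lines, trim whitespace, and drop the thousands separators.
txtLines = regexp(strtrim(fileread(inFile)), '\r?\n', 'split');
txtLines = strrep(strtrim(txtLines), ',', '');

% Group every 8 lines into one column, then write one CSV row per column.
% Assumes the number of lines is an exact multiple of 8.
records = reshape(txtLines, 8, []);
fid = fopen(outFile, 'wt');
fprintf(fid, '%s,%s,%s,%s,%s,%s,%s,%s\n', records{:});
fclose(fid);

csvText = fileread(outFile);
```

The resulting file can then be read with the same textscan call shown above.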


3

Use a script to modify your text file into something that Matlab can read.

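e.g. make it a matrix, with one record per row, the thousands separators stripped, and the date quoted and passed to datenum:

M = [
datenum('1999-01-04') 1100.00 1060.00 1092.50 0 6225 1336605 37;   % <-- notice the ';'
datenum('1999-01-05') 1122.50 1087.50 1122.50 0 3250 712175 14;    % <-- notice the ';'
...
]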

Import this into MATLAB and read the various vectors from the matrix.

Note: my MATLAB is a bit rusty. This might contain errors.

1 Comment

Thanks, I solved it that way. I created a Python script to build the matrix the way you suggested!

2

It isn't entirely clear what form you want the data to be in once you've read it. The code below puts it all in one matrix, with each row representing a group of 8 rows in your text file. You may wish to use different variables for different columns, or (if you have access to the Statistics toolbox) use a dataset array.

% Read file as text (txt avoids shadowing the built-in text function)
txt = fileread('c:/data.txt');

% Split by line; strtrim drops the final newline so no empty element is left
x = regexp(strtrim(txt), '\r?\n', 'split');

% Remove stray whitespace and the commas from numbers
x = regexprep(strtrim(x), ',', '');

% Number of items per object
n = 8;

% Get dates
index = 1:length(x);
dates = datenum(x(rem(index, n) == 1));

% Get other numbers
nums = str2double(x(rem(index, n) ~= 1));
nums = reshape(nums, (n-1), length(nums)/(n-1))';

% Combine dates and numbers
thedata = [dates nums];

You could also look into the function textscan for alternative ways of solving the problem.


0

Similar to Richie's. Using str2double to convert the file strings to doubles. This implementation processes line by line instead of breaking the file up with a regular expression. The output is a cell array of individual vectors.

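function vectors = readdata(filename)
% READDATA Read 8-line records; return the 7 numeric fields as row vectors.

fid = fopen(filename, 'rt');

tline = fgetl(fid);
counter = 0;            % 0 marks the date line, 1-7 the numeric lines
vectors = cell(7,1);
while ischar(tline)
    if counter > 0
        % str2double accepts commas as thousands separators
        vectors{counter} = [vectors{counter} str2double(tline)];
    end
    counter = counter + 1;
    if counter > 7
        counter = 0;    % next line starts a new record
    end
    tline = fgetl(fid);
end

fclose(fid);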


0

This has regular expression checking to make sure your data is formatted well.

fid = fopen('data.txt','rt');

%these will be your 8 value arrays
val1 = [];
val2 = [];
val3 = [];
val4 = [];
val5 = [];
val6 = [];
val7 = [];
val8 = [];

linenum = 0; % line number in file
valnum = 0; % number of value (1-8)

while 1
   line = fgetl(fid);
   linenum = linenum+1;
   if valnum == 8
      valnum = 1;
   else
      valnum = valnum+1;
   end

   %-- if reached end of file, end (fgetl returns -1 at end of file)
   if ~ischar(line) || isempty(line)
      fclose(fid);
      break;
   end


   switch valnum
      case 1
         pat = '(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})'; % val1 (e.g. 1999-01-04)
      case 2
         pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val2 (e.g. 1,100.00)  [valid up to 1billion-1]
      case 3
         pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val3 (e.g. 1,060.00)  [valid up to 1billion-1]
      case 4
         pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val4 (e.g. 1,092.50)  [valid up to 1billion-1]
      case 5
         pat = '(?<val>\d+)'; % val5 (e.g. 0)
      case 6
         pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val6 (e.g. 6,225)  [valid up to 1billion-1]
      case 7
         pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val7 (e.g. 1,336,605)  [valid up to 1billion-1]
      case 8
         pat = '(?<val>\d+)'; % val8 (e.g. 37)
      otherwise
         error('bad linenum')
   end

   l = regexp(line,pat,'names'); % l is for line
    if length(l) == 1 % match
      if valnum == 1
         serialtime = datenum(str2num(l.yr),str2num(l.mo),str2num(l.dy)); % convert to matlab serial date
         val1 = [val1;serialtime];
      else
         this_val = strrep(l.val,',',''); % strip out comma and convert to number
         eval(['val',num2str(valnum),' = [val',num2str(valnum),';',this_val,'];']) % save this value into appropriate array
      end
   else
      warning(['line number ',num2str(linenum),' skipped! [didnt pass regexp]: ',line]);
   end
end
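
The eval call above builds variable names at run time, which can be hard to debug. One possible alternative is sketched here: it validates a single line with a named token and converts it with str2double. The names `pat`, `tok`, `value`, and `bad` are illustrative, and the anchored pattern is a stricter variant, not the one used in the answer:

```matlab
% Anchored pattern: groups of three digits separated by commas,
% optionally followed by two decimals (illustrative, stricter than above).
pat = '^(?<val>\d{1,3}(?:,\d{3})*(?:\.\d{2})?)$';

tok = regexp('1,336,605', pat, 'names');
if ~isempty(tok)
    % Strip the thousands separators, then convert without eval.
    value = str2double(strrep(tok.val, ',', ''));
else
    value = NaN;    % line did not match the expected format
end

% A malformed line yields no match.
bad = regexp('12,34', pat, 'names');
```

In the loop above, each converted value could then be appended to a cell array indexed by valnum instead of to eight separately named variables.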
