
I have a CSV file with the following data:

20180101 170000;1.200370;1.201000;1.200370;1.201000;0
20180101 170100;1.200830;1.200950;1.200170;1.200300;0
20180101 170200;1.200350;1.200430;1.200350;1.200430;0
20180101 170300;1.200410;1.200500;1.200310;1.200460;0
20180101 170400;1.200490;1.200490;1.200460;1.200480;0
20180101 170500;1.200500;1.200500;1.200480;1.200480;0
20180101 170600;1.200500;1.200690;1.200320;1.200480;0
20180101 170700;1.200480;1.200540;1.200270;1.200500;0
20180101 170800;1.200510;1.200870;1.200470;1.200870;0
20180101 170900;1.200820;1.200970;1.200760;1.200940;0
20180101 171000;1.200940;1.200950;1.200760;1.200770;0
20180101 171100;1.200840;1.200880;1.200840;1.200880;0
20180101 171200;1.200880;1.200880;1.200790;1.200790;0
20180101 171300;1.200800;1.200800;1.200800;1.200800;0
20180101 171400;1.200770;1.200930;1.200770;1.200930;0
20180101 171500;1.200920;1.201050;1.200360;1.200360;0
20180101 171600;1.200380;1.200380;1.200380;1.200380;0
20180101 171700;1.200390;1.200390;1.200380;1.200380;0
20180101 171800;1.200420;1.200450;1.200400;1.200450;0
20180101 171900;1.200410;1.200500;1.200410;1.200500;0
20180101 172000;1.200530;1.200530;1.200440;1.200450;0
20180101 172200;1.200450;1.200450;1.200450;1.200450;0
20180101 172300;1.200450;1.200550;1.200450;1.200550;0

The first column is a date in the form YYYYMMDD HHmmss.
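In MATLAB's datetime format syntax this pattern has to be written as yyyyMMdd HHmmss. Uppercase YYYY means the ISO week-based year and DD means day of year, so the notation above cannot be passed to datetime verbatim. The following small sketch parses one timestamp taken from the file. The display format chosen here is arbitrary.

```matlab
% Parse one timestamp from the file.
% yyyy = year, MM = month, dd = day, HH = hour (0-23), mm = minutes, ss = seconds
t = datetime('20180101 170000', 'InputFormat', 'yyyyMMdd HHmmss');

% Change only how the value is displayed, not the value itself
t.Format = 'yyyy-MM-dd HH:mm:ss';
disp(t)   % shows 2018-01-01 17:00:00
```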

I want to import the CSV file into MATLAB with the readmatrix command. This is what I get:

data = readmatrix("\Data.csv");
>> data(1:20,:)

ans =

       NaN    1.2004    1.2010    1.2004    1.2010         0
       NaN    1.2008    1.2009    1.2002    1.2003         0
       NaN    1.2004    1.2004    1.2004    1.2004         0
       NaN    1.2004    1.2005    1.2003    1.2005         0
       NaN    1.2005    1.2005    1.2005    1.2005         0
       NaN    1.2005    1.2005    1.2005    1.2005         0
       NaN    1.2005    1.2007    1.2003    1.2005         0
       NaN    1.2005    1.2005    1.2003    1.2005         0
       NaN    1.2005    1.2009    1.2005    1.2009         0
       NaN    1.2008    1.2010    1.2008    1.2009         0
       NaN    1.2009    1.2009    1.2008    1.2008         0
       NaN    1.2008    1.2009    1.2008    1.2009         0
       NaN    1.2009    1.2009    1.2008    1.2008         0
       NaN    1.2008    1.2008    1.2008    1.2008         0
       NaN    1.2008    1.2009    1.2008    1.2009         0
       NaN    1.2009    1.2010    1.2004    1.2004         0
       NaN    1.2004    1.2004    1.2004    1.2004         0
       NaN    1.2004    1.2004    1.2004    1.2004         0
       NaN    1.2004    1.2005    1.2004    1.2005         0
       NaN    1.2004    1.2005    1.2004    1.2005         0

The date is not imported correctly.

The file is pretty big and I'd prefer not to modify it. It's in a public repository, so I would have to create a local copy, and besides being big, there are plenty of these files. Is there a way to use readmatrix so that the datetime column is read correctly as well?

If that is not possible, how should I modify the file (I'll create a local copy) so that readmatrix can import it?

1 Answer


You may want to use a different approach with textscan. This is what MATLAB's uiopen GUI uses if you let it create code.


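The problem with readmatrix is that it returns a single array whose elements all share one data type (double by default). It therefore cannot return a datetime column next to numeric columns. Text that it cannot convert to a number ends up as NaN, which is what you see in the first column. This becomes obvious if you try to concatenate a datetime vector with a numeric matrix.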

So you need to use a table or a cell array, or split the reading (read the dates and the numbers separately).
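One way to take the table route without touching the file is readtable with explicit import options. This is a different technique from the generated textscan code. The sketch below makes several assumptions: the file has no header line, it has exactly the six semicolon-separated columns shown in the question, and the variable names date, a, ..., e are arbitrary. The temporary demo file exists only so the snippet runs on its own. With real data you would pass the path to your CSV instead.

```matlab
% Demo file with two rows copied from the question (stand-in for the real CSV)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '20180101 170000;1.200370;1.201000;1.200370;1.201000;0\n');
fprintf(fid, '20180101 170100;1.200830;1.200950;1.200170;1.200300;0\n');
fclose(fid);

% Describe the layout: 6 fields separated by ';', data starting on line 1
opts = delimitedTextImportOptions('NumVariables', 6, 'Delimiter', ';');
opts.VariableNames = {'date', 'a', 'b', 'c', 'd', 'e'};   % hypothetical names
opts.VariableTypes = {'datetime', 'double', 'double', 'double', 'double', 'double'};
% Tell readtable how the first field is written in the file
opts = setvaropts(opts, 'date', 'InputFormat', 'yyyyMMdd HHmmss');

T = readtable(fname, opts);   % first variable is datetime, the rest are double
delete(fname);
disp(T)
```

Because the options object is built once, it can be reused for every file with the same layout.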

function Dat = importfile(filename, startRow, endRow)
%IMPORTFILE Import numeric data from a text file as a matrix.
%   DAT = IMPORTFILE(FILENAME) Reads data from text file FILENAME for the
%   default selection.
%
%   DAT = IMPORTFILE(FILENAME, STARTROW, ENDROW) Reads data from rows
%   STARTROW through ENDROW of text file FILENAME.
%
% Example:
%   Dat = importfile('CSVFILE.csv', 1, 23);
%
%    See also TEXTSCAN.

% Auto-generated by MATLAB on 2020/03/07 21:59:43

%% Initialize variables.
delimiter = ';';
if nargin<=2
    startRow = 1;
    endRow = inf;
end

%% Read columns of data as text:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%s%s%s%s%[^\n\r]';

%% Open the text file.
fileID = fopen(filename,'r');

%% Read columns of data according to the format.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines', startRow(1)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for block = 2:length(startRow)
    frewind(fileID);
    dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines', startRow(block)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end

%% Close the text file.
fclose(fileID);

%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col = 1:length(dataArray)-1
    raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));

for col = [2,3,4,5,6]
    % Converts text in the input cell array to numbers. Replaced non-numeric
    % text with NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1)
        % Create a regular expression to detect and remove non-numeric prefixes and
        % suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData(row), regexstr, 'names');
            numbers = result.numbers;

            % Detected commas in non-thousand locations.
            invalidThousandsSeparator = false;
            if numbers.contains(',')
                thousandsRegExp = '^[-/+]*\d+?(\,\d{3})*\.{0,1}\d*$';
                if isempty(regexp(numbers, thousandsRegExp, 'once'))
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric text to numbers.
            if ~invalidThousandsSeparator
                numbers = textscan(char(strrep(numbers, ',', '')), '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
        catch
            raw{row, col} = rawData{row};
        end
    end
end

% Convert the contents of columns with dates to MATLAB datetimes using the
% specified date format.
try
    % InputFormat must match the timestamps in the file (e.g. 20180101 170000)
    dates{1} = datetime(dataArray{1}, 'Format', 'dd-MMM-yyyy HH:mm:ss', 'InputFormat', 'yyyyMMdd HHmmss');
catch
    try
        % Handle dates surrounded by quotes (dataArray{1} is a string array)
        dataArray{1} = strip(dataArray{1}, '"');
        dates{1} = datetime(dataArray{1}, 'Format', 'dd-MMM-yyyy HH:mm:ss', 'InputFormat', 'yyyyMMdd HHmmss');
    catch
        dates{1} = repmat(datetime([NaN NaN NaN]), size(dataArray{1}));
    end
end

dates = dates(:,1);

%% Split data into numeric and string columns.
rawNumericColumns = raw(:, [2,3,4,5,6]);

%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),rawNumericColumns); % Find non-numeric cells
rawNumericColumns(R) = {NaN}; % Replace non-numeric cells

%% Create output variable
Dat = table;
Dat.date = dates{:, 1};
Dat.a = cell2mat(rawNumericColumns(:, 1));
Dat.b = cell2mat(rawNumericColumns(:, 2));
Dat.c = cell2mat(rawNumericColumns(:, 3));
Dat.d = cell2mat(rawNumericColumns(:, 4));
Dat.e = cell2mat(rawNumericColumns(:, 5));

end

The function basically reads everything in as strings (formatSpec = '%s%s%s%s%s%s%[^\n\r]';) and stores the result in the cell array dataArray. Then it separates the numeric values from the other types, creating the variable numericData. After that, the "Convert the contents of columns with dates" block converts the non-numeric date strings to datetimes. The InputFormat has to match how the dates are written in the file:

% Convert the contents of columns with dates to MATLAB datetimes using the
% specified date format.
dates{1} = datetime(dataArray{1}, 'Format', 'dd-MMM-yyyy HH:mm:ss', 'InputFormat', 'yyyyMMdd HHmmss');
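The same read-as-text-then-convert idea fits in a few lines. The sketch below assumes the lines have already been read as text. Here, two sample rows from the question are hard-coded as a stand-in. Each line is split at the semicolons, the first field is parsed with InputFormat yyyyMMdd HHmmss, and the remaining fields are converted with str2double.

```matlab
% Two sample lines from the question (stand-ins for text read from the file)
txt = ["20180101 170000;1.200370;1.201000;1.200370;1.201000;0"
       "20180101 170100;1.200830;1.200950;1.200170;1.200300;0"];

parts  = split(txt, ';');                                    % 2x6 string array, one column per field
dates  = datetime(parts(:, 1), 'InputFormat', 'yyyyMMdd HHmmss');
values = str2double(parts(:, 2:6));                          % 2x5 double; unparsable text becomes NaN
```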

Finally, everything is wrapped up in a table, with the variables named according to my specifications:

%% Create output variable
Dat = table;
Dat.date = dates{:, 1};
Dat.a = cell2mat(rawNumericColumns(:, 1));
Dat.b = cell2mat(rawNumericColumns(:, 2));
Dat.c = cell2mat(rawNumericColumns(:, 3));
Dat.d = cell2mat(rawNumericColumns(:, 4));
Dat.e = cell2mat(rawNumericColumns(:, 5));
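If you already have the dates as a datetime column and the numbers as an N-by-5 matrix, the same kind of table can also be assembled in one statement. The sketch below uses hypothetical sample values. The names dates and values stand for whatever your reading step produced.

```matlab
% Hypothetical inputs: a datetime column vector and an N-by-5 numeric matrix
dates  = datetime({'20180101 170000'; '20180101 170100'}, 'InputFormat', 'yyyyMMdd HHmmss');
values = [1.200370 1.201000 1.200370 1.201000 0
          1.200830 1.200950 1.200170 1.200300 0];

% One table variable for the dates, one per numeric column
Dat = [table(dates, 'VariableNames', {'date'}), ...
       array2table(values, 'VariableNames', {'a', 'b', 'c', 'd', 'e'})];
```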

I hope that helps. If memory efficiency is a concern, do the conversion while scanning the CSV file rather than as a separate step afterwards, which is what the auto-generated function does.
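A minimal sketch of converting while scanning, assuming the file layout shown in the question. textscan's %{fmt}D specifier turns the first field into a datetime as it is read, so you avoid keeping a separate string copy of the date column. The five %f fields are read as doubles, and 'CollectOutput' merges them into one matrix. The temporary two-row file is only there to make the snippet self-contained. For the real data you would open your own CSV, and for very large files you could read in chunks by passing a repeat count as the third argument to textscan.

```matlab
% Demo file with two rows copied from the question (stand-in for the real CSV)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '20180101 170000;1.200370;1.201000;1.200370;1.201000;0\n');
fprintf(fid, '20180101 170100;1.200830;1.200950;1.200170;1.200300;0\n');
fclose(fid);

% Parse the date while reading; the remaining fields are read as doubles
fid = fopen(fname, 'r');
C = textscan(fid, '%{yyyyMMdd HHmmss}D%f%f%f%f%f', ...
    'Delimiter', ';', 'CollectOutput', true);
fclose(fid);
delete(fname);

dates  = C{1};   % N-by-1 datetime
values = C{2};   % N-by-5 double (consecutive %f columns merged by CollectOutput)
```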
