
I have a large CSV file of postcode data (~1.1 GB). I am trying to filter out the data I need and then write an array of values to a JS file.

The issue is that I'm always using too much memory and receiving this error:

Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

I have tried increasing the memory using this command: node --max-old-space-size=4096 fileName.js, but I still hit my memory limit; it just takes longer!

Here is my code to write to the JS file:

const csvFilePath = "./data/postcodes.csv";
const csv = require("csvtojson");
const fs = require("fs");

csv()
    .fromFile(csvFilePath)
    .then((jsonArray) => {

        const inUsePostcodes = jsonArray.filter((x) => x["In Use?"] === "Yes").map((y) => y.Postcode);

        fs.writeFileSync("postcodes.js", inUsePostcodes);

    });

Here is a sample of postcodes.csv:

Postcode,In Use?,Latitude,Longitude,Easting,Northing,Grid Ref,County,District,Ward,District Code,Ward Code,Country,County Code,Constituency,Introduced,Terminated,Parish,National Park,Population,Households,Built up area,Built up sub-division,Lower layer super output area,Rural/urban,Region,Altitude,London zone,LSOA Code,Local authority,MSOA Code,Middle layer super output area,Parish Code,Census output area,Constituency Code,Index of Multiple Deprivation,Quality,User Type,Last updated,Nearest station,Distance to station,Postcode area,Postcode district,Police force,Water company,Plus Code,Average Income
AB1 0AA,No,57.101474,-2.242851,385386,801193,NJ853011,"","Aberdeen City","Lower Deeside",S12000033,S13002843,Scotland,S99999999,"Aberdeen South",1980-01-01,1996-06-01,"","",,,"","","Cults, Bieldside and Milltimber West - 02","Accessible small town",,46,,S01006514,,S02001237,"Cults, Bieldside and Milltimber West",,S00090303,S14000002,6808,1,0,2020-02-19,"Portlethen",8.31408,AB,AB1,"Scotland","Scottish Water",9C9V4Q24+HV,
AB1 0AB,No,57.102554,-2.246308,385177,801314,NJ851013,"","Aberdeen City","Lower Deeside",S12000033,S13002843,Scotland,S99999999,"Aberdeen South",1980-01-01,1996-06-01,"","",,,"","","Cults, Bieldside and Milltimber West - 02","Accessible small town",,61,,S01006514,,S02001237,"Cults, Bieldside and Milltimber West",,S00090303,S14000002,6808,1,0,2020-02-19,"Portlethen",8.55457,AB,AB1,"Scotland","Scottish Water",9C9V4Q33+2F,
AB1 0AD,No,57.100556,-2.248342,385053,801092,NJ850010,"","Aberdeen City","Lower Deeside",S12000033,S13002843,Scotland,S99999999,"Aberdeen South",1980-01-01,1996-06-01,"","",,,"","","Cults, Bieldside and Milltimber West - 02","Accessible small town",,45,,S01006514,,S02001237,"Cults, Bieldside and Milltimber West",,S00090399,S14000002,6808,1,0,2020-02-19,"Portlethen",8.54352,AB,AB1,"Scotland","Scottish Water",9C9V4Q22+6M, 

How can I write to the JS file from this CSV, without hitting my memory limit?

  • Perhaps a silly question, and your use case may not allow for this... but have you considered parsing the postcode CSV a section at a time in a loop (maybe give it three or four passes), so that you don't need to worry about hitting your memory limit? Then just append to the JSON file or whatever you need to do. Commented Apr 12, 2020 at 22:18
  • I only need to generate this data once, so that would be OK. However, when I did increase my memory limit to 4 GB (I need more RAM), it only got through to postcodes beginning with "C", so it would take 8-9 passes. I think this issue may crop up again, though, and it would be great to have a programmatic solution (a rough sketch of the multi-pass idea follows below). Commented Apr 12, 2020 at 22:27
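
For what it's worth, a rough sketch of that multi-pass idea (an illustration only, not the approach the answers below end up using; the prefix ranges and output filename are made up):

const csv = require('csv-parser');
const fs = require('fs');

// one pass per postcode prefix range, so only a slice of the data is in memory at once
const passes = [/^[A-C]/, /^[D-L]/, /^[M-R]/, /^[S-Z]/];

async function run() {
    for (const pattern of passes) {
        const matches = [];
        await new Promise((resolve, reject) => {
            fs.createReadStream('./data/postcodes.csv')
                .pipe(csv())
                .on('data', (row) => {
                    if (row["In Use?"] === "Yes" && pattern.test(row.Postcode)) {
                        matches.push(row.Postcode);
                    }
                })
                .on('end', resolve)
                .on('error', reject);
        });
        // append this pass's slice; matches can be garbage-collected before the next pass
        fs.appendFileSync('postcodes.txt', matches.join('\n') + '\n');
    }
}

run();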

2 Answers


You need a streaming CSV parser that parses the file and emits output one line at a time, letting you stream the results to a file.

Here's one way to do it using the csv-reader module:

const fs = require('fs');
const csvReader = require('csv-reader');
const { Transform } = require('stream');

const myTransform = new Transform({
    readableObjectMode: true,
    writableObjectMode: true,
    transform(obj, encoding, callback) {
        let data = JSON.stringify(obj);
        if (this.tFirst) {
            this.push("[");           // beginning of transformed data
            this.tFirst = false;
        } else {
            data = "," + data;        // add comma separator if not first object
        }
        this.push(data);
        callback();
    },
    flush(callback) {
        this.push("]");               // end of transformed data
        callback();
    }
});
myTransform.tFirst = true;

// All of these arguments are optional.
const options = { 
    skipEmptyLines: true,
    asObject: true,             // convert data to object
    parseNumbers: true, 
    parseBooleans: true, 
    trim: true 
};

const csvStream = new csvReader(options);
const readStream = fs.createReadStream('example.csv', 'utf8');
const writeStream = fs.createWriteStream('example.json', {autoClose: false});

readStream.on('error', err => {
     console.log(err);
     csvStream.destroy(err);
}).pipe(csvStream).pipe(myTransform).pipe(writeStream).on('error', err => {
    console.error(err);
}).on('finish', () => {
    console.log('done');
});
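
Note that the transform above serializes every column of every row. If, as in the question, you only want the postcodes that are in use, you could filter and map inside the transform. A minimal sketch, assuming the same column names as the question's CSV and the Transform import above:

const inUseTransform = new Transform({
    readableObjectMode: true,
    writableObjectMode: true,
    transform(obj, encoding, callback) {
        if (obj["In Use?"] !== "Yes") {
            return callback();                       // drop rows that are not in use
        }
        let data = JSON.stringify(obj.Postcode);     // keep only the postcode
        if (this.tFirst) {
            this.push("[");
            this.tFirst = false;
        } else {
            data = "," + data;
        }
        this.push(data);
        callback();
    },
    flush(callback) {
        this.push("]");
        callback();
    }
});
inUseTransform.tFirst = true;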


The issue is that the csvtojson node module tries to hold the entire parsed result (the massive jsonArray) in memory!

I found a different solution using the csv-parser node module, which parses one row at a time instead of the whole CSV!

Here is my solution:

const csv = require('csv-parser');
const fs = require('fs');
const stream = fs.createWriteStream("postcodes.js", {flags: 'a'});
let first = true;
fs.createReadStream('./data/postcodes.csv')
  .pipe(csv())
  .on('data', (row) => {
      // "In Use?" is "Yes" or "No" — compare against the value,
      // since both strings are truthy
      if (row["In Use?"] === "Yes") {
          if (first) {
              first = false;
              stream.write(`const postcodes = [\n"${row.Postcode}",\n`);
          } else {
              stream.write(`"${row.Postcode}",\n`);
          }
      }
  })
  .on('end', () => {
      stream.write("];");
      console.log('CSV file successfully processed');
  });

It's not very pretty to hand-write JavaScript syntax like const postcodes = as strings, but it performs the desired function.
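
As an aside (an alternative, not part of the original answer): if the stream wrote a valid JSON array to postcodes.json instead, there would be no JavaScript syntax to hand-write, because Node's require() parses .json files natively:

// consumers of the generated file could then simply do:
const postcodes = require('./postcodes.json');   // Node parses .json files natively
console.log(postcodes.length);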

2 Comments

You need flow control on your stream.write() calls. If they return false, you have to wait until the drain event before calling them again.
Interesting link to the docs for the drain event: nodejs.org/api/stream.html#event-drain
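
To illustrate the flow control these comments describe, here is a minimal sketch (not part of the answer itself) that pauses the parser whenever write() reports a full buffer and resumes on drain:

const csv = require('csv-parser');
const fs = require('fs');

const out = fs.createWriteStream('postcodes.js', { flags: 'a' });
const rows = fs.createReadStream('./data/postcodes.csv').pipe(csv());

rows.on('data', (row) => {
    if (row["In Use?"] === "Yes") {
        // write() returns false once the internal buffer is full
        if (!out.write(`"${row.Postcode}",\n`)) {
            rows.pause();                               // stop emitting rows
            out.once('drain', () => rows.resume());     // resume once flushed
        }
    }
});
rows.on('end', () => out.end());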
