
I have a CSV file that contains 21k records (one alphanumeric word per line). I need to read these records and send them to an API as a JSON key-value payload for processing, but the API accepts only 500 elements at a time. I have a solution in mind, but I wanted to know whether there is a better or more efficient solution/algorithm for this.

Algorithm:

  1. Load the CSV into an array
  2. Split this 1D array into N arrays with a fixed length of 500 elements each
  3. For each of these N 500-element arrays, prepare a JSON payload and send it to the API

Code:

const fs = require('fs');

fs.readFile(inputPath, 'utf8', function (err, data) {
  if (err) throw err;

  // split the file into one record per line
  const dataArray = data.split(/\r?\n/);

  // walk the records in batches of up to 500
  for (let i = 0; i < dataArray.length;) {
    const temp = [];
    for (let j = 0; j < 500 && i < dataArray.length; j++) {
      temp.push(dataArray[i]);
      i++;
    }
    // make API call with the current batch
    makeCallToAPI(temp);
  }
});
  • 'Better' is relative; if efficiency is the goal, then I'd suggest you look at streaming the file as opposed to loading the whole thing into memory. Also, given that you want to process the file in chunks, streaming seems a more natural fit anyway. Commented Apr 19, 2021 at 18:47
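
For illustration, here is a minimal sketch of that streaming approach, assuming Node's built-in readline module and the same hypothetical makeCallToAPI(batch) helper as in the question. The file is read line by line, and each batch of 500 records is sent as soon as it fills, so the whole file never has to sit in memory at once:

const fs = require('fs');
const readline = require('readline');

async function streamInBatches(inputPath, batchSize = 500) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath, 'utf8'),
    crlfDelay: Infinity,               // treat \r\n as a single line break
  });

  let batch = [];
  for await (const line of rl) {
    if (line.length === 0) continue;   // skip blank lines
    batch.push(line);
    if (batch.length === batchSize) {
      await makeCallToAPI(batch);      // send the full batch, then start a new one
      batch = [];
    }
  }
  if (batch.length > 0) {
    await makeCallToAPI(batch);        // flush the final partial batch
  }
}

Awaiting each batch keeps a single request in flight at a time, which is usually the safer default against a rate-limited API; if the API tolerates parallel calls, the batches could instead be collected and dispatched with Promise.all.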

1 Answer


I'd use lodash's or underscore's _.chunk(). Also note that both the fs read and the API calls are better handled asynchronously.

const fs = require('fs');
const _ = require('lodash');

async function callApi(chunk) {
  // return a promise that resolves with the result of the api
}

async function readFS(inputPath) {
  return new Promise((resolve, reject) => {
    fs.readFile(inputPath, 'utf8', function (err, data) {
      if (err) reject(err);
      else resolve(data.split(/\r?\n/));
    });
  });
}

async function doTheWork(inputPath) {
  const data = await readFS(inputPath);
  const chunks = _.chunk(data, 500);
  const promises = chunks.map(callApi);
  return _.flatten(await Promise.all(promises));
}

Also note the use of _.flatten(): Promise.all() resolves to an array of per-chunk results (an array of arrays), which _.flatten() collapses into a single array.
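
The callApi() body is left as a stub above. As a purely illustrative sketch, assuming Node 18+'s global fetch and a hypothetical https://example.com/process endpoint that accepts a JSON body, it might look something like this:

async function callApi(chunk) {
  // POST one 500-element chunk as a JSON payload (endpoint and body shape are assumptions)
  const response = await fetch('https://example.com/process', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ records: chunk }),
  });
  if (!response.ok) {
    throw new Error(`API call failed with status ${response.status}`);
  }
  return response.json();   // resolves with the parsed result for this chunk
}

The { records: chunk } shape is an assumption; the payload should match whatever key-value structure the real API expects.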


3 Comments

Unless I've misinterpreted this, it looks like it splits the file into chunks of 500 lines and then calls the API for each of those individual lines? I think the requirement was to post each chunk to the API as a single call (the code example demos that as well). Personal opinion, but I'm not really sure you need lodash here; it seems an unnecessary dependency (unless you already have it).
@James, you've interpreted correctly, and on second read, perhaps I've misinterpreted the OP. Those libraries are small, well tested, and full of useful goodies; I think chunk by itself is worth the import. I've fixed my answer (simplified it) to assume the API is a POST of an array of data.
I wouldn't introduce a library dependency for a single simple function like this, though. (And I'm the author of a popular utility library.) Note that a simpler chunk could look like this: const chunk = (n) => (xs) => xs.length <= n ? [xs] : [xs.slice(0, n), ...chunk(n)(xs.slice(n))] and a non-recursive version wouldn't be much harder; see the sketch below.
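
For completeness, here is a minimal sketch of the non-recursive version alluded to above: a plain loop over Array.prototype.slice, with no library needed (the chunk name is just illustrative).

function chunk(array, size) {
  const chunks = [];
  // step through the array in strides of `size`, slicing out one chunk per step
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

// e.g. chunk(dataArray, 500) yields the same batches as _.chunk(dataArray, 500)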
