
I am loading 2 csv files using d3.js and would like to merge them. I am stuck, however, on something much more basic.

I have the following function that works fine:

function loadData(file) {
   d3.csv(file, function (d){...}, 
        function (data) {displayData(data);});
}

Now I am trying to refactor the code so that loadData() returns the data object. That way I could call it twice, merge the two data arrays, and call displayData() with the merged array.

I tried returning data:

function loadData(file) {
   d3.csv(file, function (d){...}, 
        function (data) {return data});

   return data;
}

using a global variable

 var gdata1 = {};
 var gdata2 = {};
 function loadData(file) {
   d3.csv(file, function (d){...}, 
        function (data) {gdata = data});

   gdata2 =  data;
}

among many other things, nothing seems to work.

Surprisingly, assigning to a global variable and calling displayData() from inside the callback

 var gdata1 = {};
 var gdata2 = {};
 function loadData(file) {
   d3.csv(file, function (d){...}, 
        function (data) {gdata = data; displayData(gdata)});


}

works fine.

Can anyone please explain the best/right way of getting my data array out of the displayData function, and how to merge two data arrays? (I expect data to be an array of maps, e.g. data[0] is a map.)

4 Comments
  • Asynchronous functions can't return anything. Either make the second call from the callback function of the first one, and do the merge in the second callback, or use promises. Commented Feb 1, 2017 at 23:32
  • Thanks, I am now trying to call the functions recursively, seems to be working. Any idea how can I merge the data array (faster than going through the loop)? Commented Feb 2, 2017 at 0:15
  • This is much easier with d3-queue. Commented Feb 2, 2017 at 0:24
  • Seems like I need to install a plugin, also I am not clear that I can preserve data from each function - having a problem at the moment even calling functions recursively Commented Feb 2, 2017 at 1:24
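
To make the first comment concrete, here is a minimal sketch of why `return data` cannot work. `fakeCsv` is a hypothetical stand-in for d3.csv (it is not part of d3) that hands over its rows on a later tick via setTimeout, roughly the way a network request would.

```javascript
// Hypothetical stand-in for d3.csv: passes the rows to the callback later,
// after the calling function has already returned.
function fakeCsv(file, callback) {
    setTimeout(function () {
        callback(null, [{ file: file, value: 1 }]);
    }, 0);
}

function brokenLoad(file) {
    var result;                       // still undefined when we return
    fakeCsv(file, function (err, data) {
        result = data;                // runs only after brokenLoad has returned
    });
    return result;
}

var returned = brokenLoad("a.csv");   // undefined: the callback has not run yet
```

Anything that needs the rows therefore has to run inside the callback, or be chained off it.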

4 Answers


loadData() should take a callback. Then you can load the second file in the callback of the first file.

function loadData(file, callback) {
    d3.csv(file, function(d) { ...}, callback);
}

loadData(file1, function(err1, data1) {
    loadData(file2, function(err2, data2) {
        // code to combine data1 and data2 and display result
    });
});

This has the disadvantage that it serializes the file accesses, so it's not as performant as using promises with Promise.all(), as in Thomas's answer.
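
For the "combine data1 and data2" step no explicit loop is needed: Array.prototype.concat returns a new array holding the rows of both inputs. A minimal sketch with made-up rows (the field names are assumptions, not taken from the question):

```javascript
// Made-up rows standing in for what the two callbacks receive.
var data1 = [{ name: "a", value: 1 }, { name: "b", value: 2 }];
var data2 = [{ name: "c", value: 3 }];

// concat() leaves data1 and data2 untouched and returns a new array.
var merged = data1.concat(data2);
```

Inside the inner callback this would amount to displayData(data1.concat(data2)). Note that concat copies only the array, not the row objects, so merged[0] and data1[0] are the same object.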

To deal with an arbitrary number of files, you can pull them from an array, using a variable that you increment each time.

function loadNextFile(files, i, dataArray) {
    if (i >= files.length) {
        // merge dataArray and display it
    } else {
        loadData(files[i], function(err, data) {
            dataArray.push(data);
            loadNextFile(files, i+1, dataArray);
        });
    }
}
var filesToLoad = [...];
loadNextFile(filesToLoad, 0, []);
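
The "merge dataArray" step in loadNextFile has to flatten an array of arrays into one array. One way to sketch it, using only ES5 so it also runs in older browsers (`mergeAll` is a hypothetical helper name, not part of d3):

```javascript
function mergeAll(dataArray) {
    // concat.apply passes the inner arrays as separate arguments,
    // i.e. [].concat(first, second, third, ...)
    return Array.prototype.concat.apply([], dataArray);
}

var merged = mergeAll([[{ id: 1 }], [{ id: 2 }, { id: 3 }], []]);
```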

5 Comments

Just something small: if you pass the callback function to d3.csv, then its signature should be (err1, data1) and (err2, data2).
And for further readers, this approach has one major downside: the second request waits for the first one to finish. It would be faster to run both requests in parallel. In my opinion the easiest way to handle that would be promises, which Barmar also mentioned in his first comment.
Since he commented on your answer that he doesn't want to use a plugin to turn this into a promise, and I didn't want to work out all the details to create a promise from scratch, I felt this answer would be good enough.
I didn't intend to say that it isn't good. It is easy to understand and easy to implement. I just wanted to mention for less experienced readers that there is a drawback to this approach or, more precisely, that it is not as fast as it could be (and response time is often a major factor for applications). I really appreciate simple (easily understandable) solutions because they tend to have fewer bugs.
Since I need to load multiple (> 2) files, is there a way to call the callback function recursively? So far, I am not able to do it, as the data element seems to be preserved from the first call.

Promises help you handle several things that are a bit unpleasant with callbacks. You should take a look at them.

And this tiny d3 plugin will make it work: https://github.com/kristw/d3.promise

Promise.all([
    d3.promise.csv(url1, formatter),
    d3.promise.csv(url2, formatter)
]).then(function(results){
    console.log('all files have been loaded');

    var formatedDataFromUrl1 = results[0];
    var formatedDataFromUrl2 = results[1];

    //proceed/combine/render the results as you wish
});

So basically d3.promise.csv replaces your loadData function.

Or you wrap it up as follows to always use the same formatter:

function loadData(file) {
    return d3.promise.csv(file, function (d){...});
}

Edit:

unfortunately I cannot use any plugins, only "core" d3

Then you can basically copy-paste the whole plugin into your code, it's not that much, really ;)

For this special case, the core functionality can be boiled down to:

function loadCsv(url){
    return new Promise(function(resolve, reject){
        d3.csv(url, function (d){...}, function(err, data){
            if(err) reject(Error(err));
            else resolve(data);
        });
    });
}

The plugin pretty much just wraps a few more methods (like json, xml, ...) the same way, and is therefore a tiny bit more generic. You should take a look at the source code.
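
The same wrapping works for any loader that takes an (err, data) callback as its last argument. A rough sketch, where `promisifyLoader` and `fakeLoader` are hypothetical names used for illustration only and are not part of d3 or the plugin:

```javascript
// Turns loader(url, callback) into a function that returns a Promise.
function promisifyLoader(loader) {
    return function (url) {
        return new Promise(function (resolve, reject) {
            loader(url, function (err, data) {
                if (err) reject(err);
                else resolve(data);
            });
        });
    };
}

// Hypothetical loader, used only to exercise the wrapper without a network.
function fakeLoader(url, callback) {
    if (typeof url !== "string") callback(new Error("bad url"));
    else callback(null, [{ source: url }]);
}

var load = promisifyLoader(fakeLoader);

// Both loads start immediately; then() runs once both have resolved.
var mergedPromise = Promise.all([load("a.csv"), load("b.csv")])
    .then(function (results) {
        return results[0].concat(results[1]);
    });
```

With the callback-style d3.csv used in this question, the loader passed in would presumably be something like function(url, cb){ return d3.csv(url, cb); }.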

1 Comment

thanks Thomas, unfortunately I cannot use any plugins, only "core" d3

Managing the state of multiple concurrent requests and then syncing the results can be quite some work.

Managing state is one of the main purposes of Promises, and Promise.all is syncing and merging the results.

That's also the main purpose of the following code. Two things left to say:

  • this code ain't tested, it may contain some errors

  • I've commented pretty much everything in this code for you to understand its purpose and mechanics, what it is capable of, and how to approach different use cases for this monster. That's why this answer ended up so darn long.

Since the actual code to load a single file was so short and isolated, I decided to put that into an external function so you can reuse this whole code by only passing a different utility-function to do the actual request.

And because I prefer named mappings over plain arrays accessed by index (it's easier to not confuse names than indices), I've integrated this possibility too. If you don't know exactly what I mean by that, take a look at the examples after the main function.

And as additional sugar, and since it took only a minor tweak, I've made the returned function recursive, so it can deal with pretty much everything you pass to it as a "list" of urls.

function loadFilesFactory( loadFile ){
    function isNull(value){ return value === null }

    //takes an array of keys and an array of values and returns an object mapping keys to values.
    function zip(keys, values){
        return keys.reduce(function(acc, key, index){
            acc[key] = values[index];
            return acc;
        }, Object.create(null));  //if possible
        //}, {});  //if Object.create() doesn't work on the browser you need to support
    }

    //a recursive function that can take pretty much any composition as "url"
    //and will "resolve" to a similar composition of results, 
    //while loading everything in parallel
    //see the examples
    var recursiveLoadFilesFunction = function(arg, callback){
        if(arg !== Object(arg)){
            //arg is a primitive
            return loadFile(arg, callback);
        }

        if(!Array.isArray(arg)){
            //arg is an object
            var keys = Object.keys(arg);
            return recursiveLoadFilesFunction(keys.map(function(key){
                return arg[key];
            }), function(error, values){
                if(error) callback(error)
                else callback(null, zip(keys, values));
            });
        }

        //arg contains an array
        var length = arg.length;
        var values = Array(length);
        //one slot per request; `true` means "still pending", `null` means "done".
        //filled up front, because every() skips the holes of a sparse array
        var pending = [];
        for(var j = 0; j<length; ++j) pending[j] = true;

        //nothing to load, so report the empty result right away
        if(length === 0){
            pending = null;
            callback(null, values);
        }

        //If there is no request-array anymore, 
        //then some (sync) request has already finished and thrown an error
        //no need to proceed 
        for(var i = 0; pending && i<length; ++i){
            //I'd prefer if I'd get the request-object to be able to abort this, in case I'd need to
            //but if I don't get a sufficient value, I need at least to make sure that this is not null/undefined
            var request = recursiveLoadFilesFunction(arg[i], createCallbackFor(i)) || true;

            //a callback that fired synchronously has already marked this slot as done; keep that
            if(pending && pending[i] !== null) pending[i] = request;
        }

        //function declarations (unlike `var f = function(){}`) are hoisted,
        //so the loop above can already call createCallbackFor() and abort()
        function createCallbackFor(index){
            return function(error, data){
                //I'm done, this shouldn't have been called anymore
                if(!pending || pending[index] === null) return;

                //this request is done, don't need this request-object anymore
                pending[index] = null;
                if(error){
                    //if there is an error, I'll terminate early 
                    //the assumption is, that all these requests are needed
                    //to perform, whatever the reason was you've requested all these files.
                    abort();
                    values = null; 
                }else{
                    //save this result
                    values[index] = data;
                }

                if(error || pending.every( isNull )){
                    pending = null; //says "I'm done"
                    callback(error, values);
                }
            }
        }

        function abort(){
            if(pending){
                //abort all pending requests
                pending.forEach(function(request){
                    if(request && typeof request.abort === "function") 
                        request.abort();
                });
                //cleanup
                pending = null;
            }
        }

        return { 
            //providing the ability to abort this whole batch.
            //either manually, or recursive
            abort: abort 
        }
    }

    return recursiveLoadFilesFunction;
}

This is the only part that would change if you wanted to reuse this whole thing for, let's say, JSON files, a different CSV formatting, or whatever:

var loadCsvFiles = loadFilesFactory(function(url, callback){

    if(!url || typeof url !== "string"){
        callback(JSON.stringify(url) + ' is no valid url');
        return;
    }

    return d3.csv(url, function(d){ ... }, callback);
});

What can this code handle?

//plain urls, sure
loadCsvFiles('url', function(err, result){ ... })

//an array of urls, its initial purpose
loadCsvFiles(['url1', 'url2', 'url3'], function(err, results){
    console.log(results[0], results[1], results[2]);
});

//urls mapped by property names, I've already mentioned that I prefer that over array indices
loadCsvFiles({
    foo: 'file1.csv',
    bar: 'file2.csv'
}, function(err, results){
    //where `results` resembles the structure of the passed mapping
    console.log(results.foo, results.bar);
})

//and through the recursive implementation, 
//pretty much every imaginable (non-circular) composition of the examples before
//that's where it gets really crazy/nice
loadCsvFiles({
    //mapping a key to a single url (and therefore result)
    data: 'data.csv',

    //or one key to an array of results
    people: ['people1.csv', 'people2.csv'],

    //or a key to a sub-structure
    clients: {
        jim: 'clients/jim.csv',
        //no matter how many levels deep
        joe: {
            sr: 'clients/joe.sr.csv',
            jr: 'clients/joe.jr.csv',
        },
        //again arrays
        harry: [
            'clients/harry.part1.csv', 
            'clients/harry.part2.csv', 
            //and nested arrays are also possible
            [
                'clients/harry.part3a.csv',
                'clients/harry.part3b.csv'
            ]
        ]
    },

    //of course you can also add objects to Arrays
    images: [
        {
            thumbs: 'thumbs1.csv',
            full: 'full1.csv'
        },
        {
            thumbs: 'thumbs2.csv',
            full: 'full2.csv'
        }
    ]
}, function(err, results){
    //guess what you can access on the results object:
    console.log(
        results.data,
        results.people[0],
        results.people[1],
        results.clients.jim,
        results.clients.joe.sr,
        results.clients.joe.jr,
        results.clients.harry[0],
        results.clients.harry[1],
        results.clients.harry[2][0],
        results.clients.harry[2][1],
        results.images[0].thumbs,
        results.images[0].full,
        results.images[1].thumbs,
        results.images[1].full
    )
});

This last example in particular may not make any sense to you as a structure for csv files, but that's not the point. The point is that it is completely up to you how you structure your data. Just pass it to this file loader and it will handle that.


And if you want this to support multiple file formats at once, it is also possible with a simple tweak:

var loadDifferentFiles = loadFilesFactory(function(url, callback){
    if(!url || typeof url !== "string"){
        callback(JSON.stringify(url) + ' is no valid url');
        return;
    }

    if(url.endsWith('.csv')){
        return d3.csv(url, callback);
    }

    if(url.endsWith('.json')){
        return d3.json(url, callback);
    }

    //return d3.text(url, callback);
    callback('unsupported filetype: ' + JSON.stringify(url));
});

or something like this:

var loadDifferentFiles = loadFilesFactory(function(value, callback){
    if(typeof value === "string"){
        if(value.endsWith('.csv')){
            return d3.csv(value, callback);
        }

        if(value.endsWith('.json')){
            return d3.json(value, callback);
        }
    }

    //but in this case, if I don't know how to handle a value
    //instead of resolving to an error, just forwarding the passed value to the callback, 
    //implying that it probably wasn't meant for this code.
    callback(null, value);
});
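
For readers who only need the flat-array case, the bookkeeping at the heart of the factory above can be reduced to a counter. This is a minimal sketch and not the answer author's code: `loadAllParallel` and `fakeLoad` are hypothetical names, and recursion and request aborting are deliberately left out.

```javascript
// Loads every url in parallel via loadFile(url, cb) and calls done(err, results)
// exactly once, with results in the same order as urls.
function loadAllParallel(urls, loadFile, done) {
    var results = new Array(urls.length);
    var remaining = urls.length;
    var finished = false;

    if (remaining === 0) { done(null, results); return; }

    urls.forEach(function (url, index) {
        loadFile(url, function (err, data) {
            if (finished) return;            // ignore callbacks after an error
            if (err) { finished = true; done(err); return; }
            results[index] = data;
            remaining -= 1;
            if (remaining === 0) { finished = true; done(null, results); }
        });
    });
}

// Hypothetical synchronous loader, only to show the call shape.
function fakeLoad(url, cb) { cb(null, [{ source: url }]); }

var output;
loadAllParallel(["a.csv", "b.csv"], fakeLoad, function (err, results) {
    output = results;
});
```

Storing each result at its own index keeps the output order stable even when the responses arrive out of order.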



Thanks everyone. I ended up calling the function recursively:

var files = ['file1', 'file2', ...]
var alldata = [];


function loadData(files) {
    if (files.length == 0) {
        displayData('', alldata);
        return;
    }
    d3.csv(files[0], function(error, data) {
        ....
        alldata = alldata.concat(data);
        files.shift();
        loadData(files);
    });
}

I am sure the other solutions work too.

