I was thinking the same a few weeks ago but didn't ask the question, as the distinction between software and data sources is still unresolved: my last question on that side was too data-source-oriented and got closed, so I was waiting for a bit more clarification. I guess the "upload source code from the web interface" feature should make the question on topic. Personally I wasn't even looking for custom-made programs to be uploaded (as this narrows down the options) but, to begin with, simply a database of benchmarks of commonly used libraries, instead of having to redo the work each time as in Fastest Python library to read a CSV file or What are the fastest libraries to compute continuous 1-D wavelet transform (like Matlab's cwt())?
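For what it's worth, the harness I end up rewriting for each of those questions looks roughly like the sketch below; the file path and the choice of pandas as a contender are just placeholder assumptions, not part of any existing platform:

```python
import csv
import timeit

import pandas as pd  # assumption: pandas is one of the libraries under test


def read_with_csv(path):
    # Baseline: the standard-library csv module.
    with open(path, newline="") as f:
        return list(csv.reader(f))


def read_with_pandas(path):
    # Contender: pandas' C-based CSV parser.
    return pd.read_csv(path)


if __name__ == "__main__":
    path = "data.csv"  # placeholder: any reasonably large CSV file
    for fn in (read_with_csv, read_with_pandas):
        t = timeit.timeit(lambda: fn(path), number=5) / 5
        print(f"{fn.__name__}: {t:.3f}s per read (mean of 5 runs)")
```

A shared, regularly updated database of exactly this kind of result would save everyone from rerunning it.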
Anyway, the only platform I'm aware of that lets users upload their code and benchmark it is http://mlcomp.org/ (free, web interface, I don't know about JavaScript support). Another interesting example is http://www.wordvectors.org/, which focuses on a single task, namely training word vectors, and compares algorithms not directly but by benchmarking their output.
Now, one aspect that raises the complexity of such a platform is that an algorithm's performance can be measured in several ways: RAM use, disk IO, runtime, and also how good the results are (as you know, there is often a trade-off between result quality and runtime). Sometimes even the standard quality metrics are pretty off, e.g. BLEU ("The system which was ranked highest by the human judges was only ranked 6th by BLEU"), which makes me wonder whether, ideally, there should be some crowdsourcing module to assess the accuracy of certain types of algorithms.
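To make that concrete, here is a rough sketch of how a platform could capture runtime and peak RAM in a single run, assuming the submission is a plain Python callable (the `profile` helper is made up for illustration); result quality would still need a task-specific scorer on top:

```python
import time
import tracemalloc


def profile(fn, *args, **kwargs):
    """Run fn once, returning its result plus wall time and peak RAM.

    Note: tracemalloc only sees Python-level allocations; disk IO and
    native-extension memory would need OS-level tooling instead.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Toy workload standing in for a submitted algorithm.
    result, elapsed, peak = profile(sorted, list(range(10**6, 0, -1)))
    print(f"runtime: {elapsed:.3f}s, peak RAM: {peak / 1e6:.1f} MB")
```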
Another aspect is that users might want to try the algorithms on their own data, since performance can depend heavily on what the data look like, and it's hard to provide a benchmark suite that covers all cases. Also, in well-explored fields some datasets have become the established benchmarks over time, such as in some subfields of natural language processing, and one may suspect that some of the "state-of-the-art" algorithms are a bit too customized to yield high accuracy on them (e.g. Google had some surprises when applying the same NLP algorithms to web data). Just as compilers, web browsers, GPUs and so on always pay close attention to how they compare on the most used benchmarks, people sometimes wonder to what extent the published results apply to their own situation.
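One way to accommodate both is to keep the harness separate from the datasets, so that a curated reference set and a user upload go through exactly the same code path; below is a sketch with made-up registry names and toy data standing in for real corpora:

```python
from typing import Callable, Dict, List

# Hypothetical registry: curated reference sets live alongside user
# uploads, so both are benchmarked through the identical harness.
DATASETS: Dict[str, Callable[[], List[int]]] = {
    "standard/reference-v1": lambda: list(range(10**5, 0, -1)),  # stand-in corpus
    "user/my-upload": lambda: [3, 1, 4, 1, 5, 9, 2, 6],          # stand-in upload
}


def run_benchmark(algorithm: Callable, dataset_name: str):
    """Apply a submitted algorithm to the named dataset."""
    data = DATASETS[dataset_name]()
    return algorithm(data)


if __name__ == "__main__":
    for name in DATASETS:
        out = run_benchmark(sorted, name)  # `sorted` stands in for a submission
        print(f"{name}: first items {out[:3]}")
```

That way the standard leaderboard and the "how does this do on *my* data" question are answered by the same machinery, which should also make it harder to over-tune against the reference sets alone.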
OK, this was mostly a set of random notes (a bit biased toward machine learning) in case one day I decide to launch such a website, if nobody has done it by then...