Your example doesn't show many details, such as the data types of the join columns.
data.table join is currently single threaded. Some small parts of it use multiple cores, but AFAIR that is only for finding the order of the join columns; computing the matches runs in a single thread.
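If you want to check how many threads data.table is using on your machine, a quick sketch (using data.table's own `getDTthreads`/`setDTthreads` helpers; the exact default depends on your build and OMP settings):

```r
library(data.table)

getDTthreads()   # how many threads data.table is currently allowed to use
setDTthreads(2)  # optionally cap thread usage, e.g. on a shared machine
getDTthreads()
```

Note that even with many threads available, only the parallelized parts of the join pipeline will use them.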
Keep in mind that parallelizing a join is non-trivial and will not scale as well as many other operations, so the potential gains are much smaller than for, say, grouping.
Anyway, computing the matches is still very fast.
We run a benchmark that compares joins; one of the questions (question 5) is a "big to big join", which seems to correspond to your scenario: https://h2oai.github.io/db-benchmark/
Below are results for the 100M-row join task. Q5 joins a 100M-row LHS to a 100M-row RHS:

You can see that data.table is pretty much at the top. Note that the benchmark joins on a single integer column, so there is likely to be a difference from your scenario, where you join on two columns.
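For reference, a two-column join in data.table looks like the sketch below (table and column names `X`, `Y`, `a`, `b` are made up here, since your example doesn't show them; `nomatch = NULL` makes it an inner join):

```r
library(data.table)

# tiny stand-ins for your two large tables
X <- data.table(a = c(1L, 1L, 2L, 3L), b = c("x", "y", "x", "z"), valx = 1:4)
Y <- data.table(a = c(1L, 2L, 4L),     b = c("x", "x", "q"),      valy = 11:13)

# join on both columns; nomatch = NULL drops Y rows with no match in X
res <- X[Y, on = .(a, b), nomatch = NULL]
res
```

The `on = .(a, b)` form joins ad hoc, without requiring the tables to be keyed first.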
The benchmark does not take into account the possibility of pre-sorting the data. Try using setkey (instead of setorder) on your tables to sort them by the join columns.
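A minimal sketch of the keyed setup (again with hypothetical names): `setkey` physically sorts the table by the given columns and marks them as the key, so a subsequent `X[Y]` joins on the keys without an `on=` argument.

```r
library(data.table)

X <- data.table(a = c(3L, 1L, 2L), b = c(30L, 10L, 20L), valx = c("c", "a", "b"))
Y <- data.table(a = c(1L, 2L),     b = c(10L, 20L),      valy = c("A", "B"))

setkey(X, a, b)  # sorts X by a then b, and marks (a, b) as the key
setkey(Y, a, b)

res <- X[Y]      # keyed join: join columns are taken from the keys
res
```

If you join the same tables repeatedly, paying the sort cost once up front with setkey can pay off.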
To be fair, I believe this kind of setup might not be easy to beat.
In a future version, computing the matches in a join will be parallelized as well; a draft of that is already in the repository.
setkey is also appreciated there.