I have heard that Google App Engine can run any programming language that can be compiled to Java bytecode, since it runs applications on a JVM. I wondered whether it would be possible to convert LLVM bitcode to Java bytecode, as it would be interesting to run languages that LLVM supports in the Google App Engine JVM.

  • AFAIK LLVM is a hardware/OS abstraction layer library rather than a bytecode virtual machine. It provides some of the same advantages, but code needs to be compiled from source for each target platform. Commented Feb 8, 2011 at 15:09
  • @Peter: No, you can interpret it and JIT-compile it (lli). But yes, the instructions are way more low-level, and it's not really similar to other virtual machines. Commented Feb 8, 2011 at 15:15
  • @Ben, please reconsider the accepted answer in light of what I mention in stackoverflow.com/a/13540256/304330, thanks. Commented Jul 4, 2013 at 15:38

4 Answers

It now appears possible to convert LLVM IR bitcode to Java bytecode using the LLJVM toolchain.

There is an interesting Disqus comment (21/03/11) from Grzegorz of kraytracing.com which explains, with code, how he modified LLJVM's Java class output routine to emit one Java class per input C/C++ module instead of a single monolithic class. He reports that this avoids the excessively long 'compound' Java constructor method argument signatures that LLJVM usually generates, and he provides links to his modifications and examples.

Although LLJVM doesn't look like it has been in active development for a couple of years, it's still hosted on GitHub, and some documentation can still be found at its former repository on Google Code:

LLJVM @ GitHub
LLJVM documentation @ Google Code

I also came across the 'Proteuscc' project, which likewise uses LLVM to output Java bytecode. It says it is specifically for C/C++, although I assume the project could be modified or fed LLVM intermediate representation (IR) directly. From http://proteuscc.sourceforge.net:

The general process of producing a Java executable with Proteus then can be summarised as below.

  1. Generate human readable representation of the LLVM intermediate representation (ll file)
  2. Pass this ll file as an argument to the proteus compilation system
  3. The above will produce a Java jar file which can be executed or used as a library

I've extended a bash script to compile the latest versions of LLVM and Clang on Ubuntu; it can be found as a GitHub Gist here.

[UPDATE 31/03/14] LLJVM seemed to have been dead for some time; however, Howard Chu (https://github.com/hyc) appears to have made LLJVM compatible with the then-latest version of LLVM (3.3). See Howard's LLJVM-LLVM3.3 branch on GitHub, here.

Comments

I doubt you can, at least not without significant effort and run-time abstractions (e.g. building half a Von Neumann machine to execute certain opcodes). LLVM bitcode allows the full range of low-level, unsafe "do what you want but we won't clean up the mess" features, from direct, raw, constructor-free memory allocation up to completely unchecked casts (real casts, not conversions): you can take an i32 and turn it into a %stuff* (with inttoptr) if you wish. Also, JVMs are heavily geared towards objects and methods, while the LLVM guys are lucky they have function pointers and structs.
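To make the "run-time abstractions" point concrete, here is a minimal sketch, assuming a translator that represents all of LLVM's memory as one flat little-endian byte array and every pointer as an int offset into it. The class and method names (FlatMemory, malloc, storeI32, intToPtr, etc.) are hypothetical and do not come from LLJVM or any other project. Under that assumption, raw allocation and inttoptr become trivial, and a genuine bitcast such as float to i32 maps onto Float.floatToRawIntBits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: LLVM-style untyped memory modelled as one flat byte array.
// Pointers are plain int offsets into that array; nothing here comes from LLJVM.
public final class FlatMemory {
    private final ByteBuffer mem;   // the entire simulated address space
    private int next = 8;           // bump-pointer allocator; address 0 stays free to act as null

    public FlatMemory(int sizeInBytes) {
        mem = ByteBuffer.allocate(sizeInBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Raw, constructor-free allocation: hands back an address, never an object.
    public int malloc(int size) {
        int aligned = (size + 7) & ~7;  // round up to keep 8-byte alignment
        if (next + aligned > mem.capacity()) {
            throw new IllegalStateException("simulated heap exhausted");
        }
        int addr = next;
        next += aligned;
        return addr;
    }

    public void storeI32(int addr, int value) { mem.putInt(addr, value); }
    public int loadI32(int addr) { return mem.getInt(addr); }

    // LLVM inttoptr / ptrtoint: since a "pointer" is already an int, both are the identity.
    public static int intToPtr(int i) { return i; }
    public static int ptrToInt(int p) { return p; }

    // LLVM "bitcast float to i32": reinterpret the same 32 bits, no numeric conversion.
    public static int bitcastFloatToI32(float f) { return Float.floatToRawIntBits(f); }

    public static void main(String[] args) {
        FlatMemory m = new FlatMemory(1 << 16);
        // Treat %stuff = type { i32, i32 } as 8 raw bytes; no constructor runs.
        int p = m.malloc(8);
        m.storeI32(p, 7);        // field 0 at byte offset 0
        m.storeI32(p + 4, 35);   // field 1 at byte offset 4
        // Launder the pointer through a plain integer, as ptrtoint/inttoptr would.
        int q = intToPtr(ptrToInt(p));
        System.out.println(m.loadI32(q) + m.loadI32(q + 4));               // 42
        System.out.println(Integer.toHexString(bitcastFloatToI32(1.0f)));  // 3f800000
    }
}
```

The price of this design is that every load and store goes through the buffer, so the JVM's garbage collector, type checks, and object model are bypassed rather than used.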

On the other hand, it seems that C can be compiled to Java bytecode and LLVM bitcode can be compiled to JavaScript (although many features, e.g. dynamic loading and stdlib functions, are lacking), so it should be possible, given enough effort.

4 Comments

So basically LLVM bitcode is far closer to assembly than Java bytecode is, so if I wanted to run it in a JVM I would somehow have to 'reclaim' all the information 'lost' when a program is converted to the lower-level representation, which I guess is pretty much impossible.
@Ben: Yes, it's pretty much portable (well, kind of) assembly, in an even more low-level fashion than C. Not only would you have to do quite a lot of work reverse-engineering, e.g., Ada code compiled with llvm-gcc; even C and C++ can do many things Java bytecode simply doesn't permit (for better or worse). Likewise, LLVM permits these things but the JVM doesn't.
The classic example I go to: char *vga = (char *) 0xB8000. LLVM can handle that just fine. Pretty sure JVM bytecode cannot.
You can actually do any raw memory operations (and raw casts) you like in Java via sun.misc.Unsafe. Any bytecode calling Unsafe methods is, of course, accessing the raw memory via native code (JNI), so any LLVM constructs translated to bytecode that calls Unsafe methods are essentially doing the memory operations via C. Doing some things via Unsafe would be extremely clunky, but one can imagine a more extensive library of functions than sun.misc.Unsafe, specifically designed to support LLVM (or other C-like) memory ops and also implemented by cross-platform native calls.
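As a rough illustration of the sun.misc.Unsafe route, the sketch below allocates raw off-heap memory, stores and loads ints at computed addresses, and frees it manually. It assumes a JDK where sun.misc.Unsafe is still reachable (through the jdk.unsupported module on JDK 9+) and uses the common but unsupported trick of reading the private theUnsafe field reflectively; javac warns that this is internal proprietary API, and JDK 23 onward deprecates these memory-access methods for removal. The class name RawMemoryDemo is hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: raw, unchecked off-heap memory from Java via sun.misc.Unsafe.
public final class RawMemoryDemo {
    // Unsafe.getUnsafe() may reject application callers, so the conventional
    // workaround is to read the private static "theUnsafe" field reflectively.
    static Unsafe unsafe() throws ReflectiveOperationException {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    // Rough analogue of: int *p = malloc(8); p[0] = a; p[1] = b; return p[0] + p[1];
    static int sumViaRawMemory(int a, int b) throws ReflectiveOperationException {
        Unsafe u = unsafe();
        long p = u.allocateMemory(8);  // raw bytes: no constructor, no zeroing, no GC
        try {
            u.putInt(p, a);            // store at absolute address p
            u.putInt(p + 4, b);        // pointer arithmetic is plain long arithmetic
            return u.getInt(p) + u.getInt(p + 4);
        } finally {
            u.freeMemory(p);           // manual free, just like C
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(sumViaRawMemory(40, 2));  // 42
    }
}
```

Nothing stops you from passing a bad address to putInt, which is exactly the "we won't clean up the mess" behaviour LLVM code expects, and also why this API is so risky on the JVM.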

Late to the discussion: Sulong executes LLVM IR on the JVM. It creates executable nodes (which are Java objects) from the LLVM IR instead of converting the LLVM IR to Java bytecode. These executable nodes form an AST interpreter. You can check out the project at https://github.com/graalvm/sulong or read a paper about it at http://dl.acm.org/citation.cfm?id=2998416. Disclaimer: I'm working on this project.
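The AST-interpreter approach can be illustrated with a toy sketch: each IR instruction becomes a Java object whose execute method computes its result, and a basic block runs by walking those objects in order. Everything here (TinyAstInterpreter, Node, AddI32, and so on) is a hypothetical illustration of the general technique, not Sulong's or Truffle's actual API, and it covers only straight-line i32 arithmetic.

```java
import java.util.List;

// Hypothetical sketch of an AST interpreter for a tiny LLVM IR subset.
// Instructions become Java objects with execute(); this is NOT Sulong's or Truffle's API.
public final class TinyAstInterpreter {
    // Storage for one activation's virtual registers (%0, %1, ...).
    static final class Frame {
        final int[] regs;
        Frame(int registerCount) { regs = new int[registerCount]; }
    }

    // Every executable node evaluates to an i32 (modelled as a Java int).
    abstract static class Node {
        abstract int execute(Frame f);
    }

    // An integer constant operand.
    static final class Const extends Node {
        final int value;
        Const(int value) { this.value = value; }
        @Override int execute(Frame f) { return value; }
    }

    // A use of a previously defined register, e.g. %0.
    static final class ReadReg extends Node {
        final int reg;
        ReadReg(int reg) { this.reg = reg; }
        @Override int execute(Frame f) { return f.regs[reg]; }
    }

    // Shared shape of "%dest = <op> i32 lhs, rhs": evaluate operands, store, return.
    abstract static class BinaryI32 extends Node {
        final int dest;
        final Node lhs, rhs;
        BinaryI32(int dest, Node lhs, Node rhs) { this.dest = dest; this.lhs = lhs; this.rhs = rhs; }
        abstract int apply(int a, int b);
        @Override int execute(Frame f) {
            int v = apply(lhs.execute(f), rhs.execute(f));
            f.regs[dest] = v;
            return v;
        }
    }

    static final class AddI32 extends BinaryI32 {
        AddI32(int dest, Node lhs, Node rhs) { super(dest, lhs, rhs); }
        @Override int apply(int a, int b) { return a + b; }  // wraps like LLVM add
    }

    static final class MulI32 extends BinaryI32 {
        MulI32(int dest, Node lhs, Node rhs) { super(dest, lhs, rhs); }
        @Override int apply(int a, int b) { return a * b; }  // wraps like LLVM mul
    }

    // Runs a straight-line basic block and returns the last instruction's value
    // (standing in for a final "ret i32").
    static int runBlock(List<Node> block, Frame f) {
        int last = 0;
        for (Node n : block) last = n.execute(f);
        return last;
    }

    public static void main(String[] args) {
        // %0 = add i32 2, 5
        // %1 = mul i32 %0, 6
        List<Node> block = List.of(
                new AddI32(0, new Const(2), new Const(5)),
                new MulI32(1, new ReadReg(0), new Const(6)));
        System.out.println(runBlock(block, new Frame(2)));  // 42
    }
}
```

Java's int addition and multiplication wrap on overflow, which matches LLVM's plain add and mul (without nsw/nuw flags), so this subset needs no extra overflow checks.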

Comments

Read this: http://vmkit.llvm.org/. I am not sure that it will help you but it seems to be relevant.

Note: This project is no longer maintained.

2 Comments

It's the reverse: VMKit allows building LLVM-based VMs that run e.g. Java/JVM languages on LLVM, whereas the OP wants to run LLVM languages on the JVM.
Fwiw, following that link: "The VMKit project is retired."
