6

In my ongoing effort to quench my undying thirst for more programming knowledge, I have come up with the idea of attempting to write a simple (at least for now) programming language that compiles into bytecode. The problem is that I don't know the first thing about language design. Does anyone have advice on a methodology for building a parser, and on what basic features every language should have? What reading would you recommend for language design? How high-level should I be shooting for? Is it unrealistic to hope to include a feature that lets one inline bytecode, similar to the way gcc allows inline assembler? Seeing as I primarily code in C and Java, which would be better for compiler writing?

4
  • Dupe of stackoverflow.com/questions/479013/… among many others. Also, you are asking too many questions - one at a time is a good rule. Commented Jul 30, 2009 at 18:18
  • And this stackoverflow.com/questions/1669/learning-to-write-a-compiler is the definitive SO answer on the subject area. Commented Jul 30, 2009 at 18:19
  • OK, I'm sorry, I didn't see it was a duplicate. Should it just be closed as a duplicate, or should I delete the question? Commented Jul 30, 2009 at 18:23
  • possible duplicate of creating-your-own-language Commented Jul 21, 2014 at 12:48

3 Answers

3

There are so many ways...

You could look into stack languages such as Forth. They won't teach you much about designing other kinds of languages, but one can be implemented very quickly.
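To give a feel for how little a stack language needs, here is a minimal sketch of a Forth-like evaluator in Python. It is not real Forth: the function name `run_forth` and the handful of words it supports are made up for illustration, and it only understands integer literals.

```python
def run_forth(source):
    """Evaluate a whitespace-separated, Forth-like program and return the final stack."""
    stack = []

    def binary(op):
        # Build a word that pops two operands and pushes op(a, b).
        def word():
            b = stack.pop()  # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(op(a, b))
        return word

    def swap():
        b = stack.pop()
        a = stack.pop()
        stack.extend([b, a])

    # Hypothetical word table; real Forth has many more words and lets you define new ones.
    words = {
        "+": binary(lambda a, b: a + b),
        "-": binary(lambda a, b: a - b),
        "*": binary(lambda a, b: a * b),
        "dup": lambda: stack.append(stack[-1]),
        "drop": lambda: stack.pop(),
        "swap": swap,
    }

    for token in source.split():
        if token in words:
            words[token]()
        else:
            stack.append(int(token))  # anything that is not a word must be an integer
    return stack
```

There is no grammar at all here: the "parser" is `source.split()`, which is why this kind of language can be up and running so quickly.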

You could look into functional languages. Most of them are based on a few simple concepts, and have simple parsing. And, yet, they are very powerful.

And, then, the traditional languages. They are the hardest. You'll need to learn about lexical analysers, parsers, LALR grammars, LL grammars, EBNF and regular languages just to get past the parsing.
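As a rough illustration of how these pieces fit together, the sketch below pairs a tiny lexer with a hand-written recursive-descent (LL-style) parser for arithmetic expressions. The grammar, the tuple-based tree format, and the names `tokenize`, `Parser` and `parse` are all assumptions chosen for this example, not anything prescribed by a particular tool.

```python
import re

# Grammar in EBNF (an assumption for this sketch):
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;

def tokenize(text):
    """Lexer: turn the input into (kind, value) pairs, skipping whitespace."""
    tokens = []
    for lexeme in re.findall(r"\d+|\S", text):
        if lexeme.isdigit():
            tokens.append(("NUM", int(lexeme)))
        elif lexeme in "+-*/()":
            tokens.append((lexeme, lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    return tokens

class Parser:
    """Recursive descent: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()[0]
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()[0]
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == "NUM":
            return self.advance()[1]
        if self.peek() == "(":
            self.advance()
            node = self.expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return node
        raise SyntaxError(f"unexpected token {self.peek()!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() != "EOF":
        raise SyntaxError("trailing input")
    return tree
```

With this grammar, `parse("1 + 2 * 3")` yields the tree `("+", 1, ("*", 2, 3))`: operator precedence falls out of how the rules are layered. A parser generator would produce equivalent code from the grammar, but writing one by hand first makes the theory much less abstract.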

Targeting a bytecode is not just a good idea: in a learning exercise, doing otherwise (generating real machine code) is just insane, and mostly useless.
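To show why a bytecode target keeps things manageable, here is a minimal sketch that compiles an expression tree into instructions for a made-up stack machine and then executes them. The tree format (nested tuples such as `("+", 1, ("*", 2, 3))`), the opcode names and the functions `compile_expr` and `run` are all invented for this example, and the "bytecode" is a list of tuples rather than actual encoded bytes.

```python
# Hypothetical opcodes for a tiny stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def compile_expr(node, code=None):
    """Post-order walk of the tree: operands first, then the operator."""
    code = [] if code is None else code
    if isinstance(node, int):
        code.append(("PUSH", node))
    else:
        op, left, right = node
        compile_expr(left, code)
        compile_expr(right, code)
        code.append((OPCODES[op],))
    return code

def run(code):
    """Interpret the instruction list on an operand stack and return the result."""
    stack = []
    for instruction in code:
        opcode = instruction[0]
        if opcode == "PUSH":
            stack.append(instruction[1])
            continue
        right = stack.pop()
        left = stack.pop()
        if opcode == "ADD":
            stack.append(left + right)
        elif opcode == "SUB":
            stack.append(left - right)
        elif opcode == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()
```

For the tree `("+", 1, ("*", 2, 3))` this emits `PUSH 1, PUSH 2, PUSH 3, MUL, ADD`. Note that there is no register allocation or instruction selection to worry about, which is exactly what a real machine target would add.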

Do yourself a favour, and look up books and tutorials about compilers.

Either C or Java will do. Java probably has an advantage, as object orientation is a good match for this type of task. My personal recommendation is Scala. It's a good language to do this type of thing, and it will teach you interesting things about language design along the way.


4 Comments

"Targeting a bytecode is not simply a good idea" As opposed to targeting a real machine (e.g. x86), writing an interpreter, or something else? On the subject, does writing a compiler that targets even an "ideal" virtual machine (as opposed to a CPU where you have to worry about register allocation et al.) tend to be significantly more difficult than writing an interpreter? I'd imagine one could make compilation fairly easy by compiling to a tree instead of a flat byte string, but I've never done it before, and I'd like to know what other people have to say about it.
@Joey As opposed to targeting a real machine, indeed. Even compilers that generate machine code often generate an intermediate bytecode first (though high-end compilers may avoid doing so for maximum gains in compilation speed and available optimizations). Writing an interpreter is indeed easier, particularly if you choose to write a dynamic language. As for compiling into a tree, a tree is the output of parsing, so it is definitely easier, though not really all that much.
Oops, I got confused by your sentence (looked like you said targeting bytecode is insane and useless, but you said the opposite). Also, I was talking more along the lines of a tree tuned for execution, distinct from basic parser output (though in simple settings, they might be pretty much the same structure).
@Joey The point I was making is that the execution tree can be obtained through a transformation on the AST.
1

You might want to read a book on compilers first.

For really understanding what's going on, you'll likely want to write your code in C.

Java wouldn't be a bad choice if you wanted to write an interpreted language, such as Jython. But since it sounds like you want to compile down to machine code, it might be easier in C.


1

I recommend reading the following books:

ANTLR

Language Design Patterns

These will give you tools and techniques for creating parsers, lexers, and compilers for custom languages.

