Yes, the two things you identified are pretty much the two critical components. Well, the third one is of course the language itself.
Take Language Workbenches for Language-Oriented Programming like the Intentional Domain Workbench by Intentional Software or MPS (Meta Programming System) by JetBrains, for example. Both of these work very similarly: the actual program is represented as a semantic graph of objects; it is viewed and edited through projectional editors, i.e. editors which show you a projection of that semantic graph, as text, as a table, as a graph, whatever, and the editing actions you perform are then interpreted as graph transformations on the semantic graph. Both of them also have their own version control system which stores graphs and graph transformations instead of text files and diffs.
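To make the "projection" idea a bit more concrete, here is a toy sketch in Python. It is not how MPS or the Intentional Domain Workbench are implemented; the node classes `Num` and `Add` and the functions `project_infix`, `project_prefix` and `replace_right` are all made up for illustration. The point is only that the program is the graph, the text is a view of it, and an edit is a graph transformation.

```python
from dataclasses import dataclass


@dataclass
class Num:
    """Leaf node of the (hypothetical) semantic graph: a number literal."""
    value: int


@dataclass
class Add:
    """Inner node of the (hypothetical) semantic graph: an addition."""
    left: object
    right: object


def project_infix(node):
    """One projection: render the graph as infix text."""
    if isinstance(node, Num):
        return str(node.value)
    return f"({project_infix(node.left)} + {project_infix(node.right)})"


def project_prefix(node):
    """Another projection of the same graph: prefix notation."""
    if isinstance(node, Num):
        return str(node.value)
    return f"(+ {project_prefix(node.left)} {project_prefix(node.right)})"


def replace_right(node, new_right):
    """An 'edit', expressed as a transformation on the graph, not on text."""
    return Add(node.left, new_right)


program = Add(Num(1), Add(Num(2), Num(3)))
infix_view = project_infix(program)    # "(1 + (2 + 3))"
prefix_view = project_prefix(program)  # "(+ 1 (+ 2 3))"

edited = replace_right(program, Num(10))
edited_view = project_infix(edited)    # "(1 + 10)"
```

Neither view is "the source code"; both are generated from the same objects, and a version control system for such a tool could store the `replace_right` step itself rather than a text diff.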
Typical Smalltalks are another example: everything is an object; even classes, methods, stack frames, local variables, the debugger, the editor, the IDE and the compiler are objects. If you want to create a new subclass of String with some methods, you don't fire up the editor to write a new class, no, you call the subclass: method on String and it will return to you a class which is a subclass of String. In fact, you couldn't write a class if you wanted to: there is no syntax for writing classes in Smalltalk. Now, to add a method, you open a class browser and click on "Add method", or alternatively, you just call the method, and in the NoMethodError message that comes up (actually, Smalltalk uses the OO messaging metaphor much more pervasively than other languages, so the error is actually called MessageNotUnderstood), there will be a button that says "add method". And since Smalltalk exceptions are resumable, unlike Java's, C♯'s, Ruby's, Python's, ECMAScript's, etc., when you have added the method, you can just resume the program at the point right before the exception was raised. (Really, in Smalltalk you debug in the editor and you code in the debugger.) Again, you couldn't write a method even if you wanted to: there is no syntax for method definitions in Smalltalk. Instead, you define a method by calling a method to define a method and passing it the bytecode, which you in turn got by calling the compiler and passing it the method body. (Or, well, the IDE does that for you.)
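If you don't have a Smalltalk at hand, you can get a rough feel for this in Python. This is only an analogy: Python does have class and method syntax, its exceptions are not resumable (so the second call below is a plain retry, not a resume), and the names `Shouty` and `shout` are invented for the example. What it shows is a class obtained by calling something, and a method compiled from its source and attached to the live class.

```python
# Create a subclass of str by calling type(), without a `class` statement.
Shouty = type("Shouty", (str,), {})

# An instance that exists before the method does.
greeting = Shouty("hello")

try:
    greeting.shout()
    missing = False
except AttributeError:
    # Roughly where Smalltalk would raise MessageNotUnderstood.
    missing = True

# Hand the method body to the compiler, then attach the resulting function
# to the class while the "program" (and the instance) are still alive.
source = "def shout(self):\n    return self.upper() + '!'\n"
namespace = {}
exec(compile(source, "<added-at-runtime>", "exec"), namespace)
setattr(Shouty, "shout", namespace["shout"])

# The already existing instance now understands the message.
result = greeting.shout()
```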
There is no textual representation of a Smalltalk program. A Smalltalk system is just an object graph. You don't start or stop Smalltalk programs, they are always running. When you "stop" a Smalltalk program, what you are really doing is just serializing the entire object graph to disk (this is called "creating an image" of the object memory), and the other way around for "starting" a program. It's actually the same as hibernating your laptop. You never "stop" a Smalltalk program, and you never create a new version. You edit the running program while it is running; there is no distinction between design time and runtime, programming and debugging, IDE and program. And again, Smalltalks have their own version control systems, or more recently, different Smalltalk dialects have begun to standardize on Monticello.
Lisp is of course another example: the Lisp programming language is defined in terms of data structures, not in terms of text. A function definition is not defined as "the letter d followed by the letter e followed by the letter f followed by space followed by an identifier denoting the name followed by the character ( followed by …". A function definition is defined as "a list with four elements, the first element being the symbol (a built-in datatype similar to an interned string in Java) def, the second element being a symbol denoting the name, the third element being a list of symbols denoting the parameters and the fourth element being a list denoting the body of the function". All code is defined in terms of data structures. A function call is defined as a list of n+1 elements, the first being a symbol denoting the name of a variable which references a function object, followed by n arguments.
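Here is a minimal sketch of that idea in Python rather than in any real Lisp. The assumptions are loud: Python lists stand in for Lisp lists, plain strings stand in for symbols (so this toy has no string literals), and the `evaluate` function is made up for the example. The definition is literally a four-element list and the call is a list of n+1 elements; no text is parsed anywhere.

```python
import operator


def evaluate(form, env):
    """Evaluate a program given as nested lists (a toy, not a real Lisp)."""
    if isinstance(form, str):        # a "symbol": look up the variable
        return env[form]
    if not isinstance(form, list):   # a literal such as a number
        return form
    if form[0] == "def":             # definition: a list with four elements
        _, name, params, body = form

        def function(*args):
            # Bind parameters to arguments on top of the current environment.
            return evaluate(body, {**env, **dict(zip(params, args))})

        env[name] = function
        return name
    # Function call: first element names the function, the rest are arguments.
    function = evaluate(form[0], env)
    args = [evaluate(arg, env) for arg in form[1:]]
    return function(*args)


env = {"+": operator.add, "*": operator.mul}

definition = ["def", "square", ["x"], ["*", "x", "x"]]
evaluate(definition, env)

value = evaluate(["square", ["+", 1, 2]], env)  # square of (1 + 2)
```

Because the program is just a list, other code can build or rewrite it with ordinary list operations before it is evaluated.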
One difference between Lisp and the other examples is that Lisp has a standardized textual representation for parentheses-delimited, space-separated lists, quote-delimited strings, numbers, symbols, etc.
Graphical languages like Thyrd are also relevant.
Actually, if you think about it, modern IDEs also work this way: they work very hard to construct a full semantic model from your flat boring text files. Then, they work very hard to turn this rich, powerful, expressive semantic model back into a flat, slightly less boring (because it's now colored, yay!) text editor. And when you edit something in that editor, they then again work very hard to infer, from the textual input you made, the actual semantic transformations of the semantic graph. So, a modern IDE basically does what you want, except blindfolded and with its hands tied behind its back. You can sort-of see this in IDEA, which actually uses the same semantic tree for all languages, and where you can e.g. copy some code from a Scala file and paste it into a Java file, and it will actually appear as Java code.
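You can play with that text-to-model-to-text round trip yourself using Python's standard `ast` module (`ast.unparse` needs Python 3.9 or later). To be clear, this is a stand-in and says nothing about how IDEA is implemented; the class name `RenamePrice` and the variable names are invented for the example. It parses flat text into a tree, applies a "rename" as a tree transformation, and projects the tree back to text.

```python
import ast

source = "total = price * quantity"

# Step 1: build a structured model from the flat text.
tree = ast.parse(source)


class RenamePrice(ast.NodeTransformer):
    """A 'rename variable' refactoring expressed as a tree transformation."""

    def visit_Name(self, node):
        if node.id == "price":
            node.id = "unit_price"
        return node


# Step 2: transform the model, not the text.
new_tree = RenamePrice().visit(tree)

# Step 3: project the model back to text.
new_source = ast.unparse(new_tree)
```

A plain search-and-replace on the text would also hit the word "price" inside strings and comments; the tree transformation only touches actual variable references.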