Good algorithm for generating call graphs?

Question

I am writing some code to generate call graphs for a particular intermediate representation without executing it by statically scanning the IR code. The IR code itself is not too complex and I have a good understanding of what function call sequences look like so all I need to do is trace the calls. I am currently doing it the obvious way:

Keep track of where we are
If we encounter a function call, branch to that location, execute and come back
While branching put an edge between the caller and the callee

I am satisfied with where I am getting at but I want to make sure that I am not reinventing the wheel here and face corner cases. I am wondering if there are any accepted good algorithms (and/or design patterns) that do this efficiently?

UPDATE: The IR code is a byte-code disassembly from a homebrewn Java-like language and looks like the Jasmine specification.

Could you say a bit more about the IR that you want to analyze? What does it look like, and what its features related to binding and function calls? — phooji
– phooji, Commented Mar 6, 2011 at 17:16
@phooji: Sure. It is an IR generated by byte-code disassembly produced by JReveal Jasmine disassembler. It does not seem to describe anything about re-binding but I will post back if I find anything new. But I can definitely say there are no pointers because it is disassembling a Java programs. — Legend
– Legend, Commented Mar 6, 2011 at 21:49
I have taken a quick look at the Jasmine bytecode spec. The only thing that immediately caught my attention is the virtual method invocation; see updated answer below. — phooji
– phooji, Commented Mar 6, 2011 at 22:31
@phooji: Thank you for your time. I just accepted your answer and added a comment. — Legend
– Legend, Commented Mar 6, 2011 at 22:41

phooji · Accepted Answer · 2011-03-06 22:24:58Z

4

From an academic perspective, here are some considerations:

Do you care about being conservative / correct? For example, suppose the code you're analyzing contains a call through a function pointer. If you're just generating documentation, then it's not necessary to deal with this. If you're doing a code optimization that might go wrong, you will need to assume that 'call through pointer' means 'could be anything.'
Beware of exceptional execution paths. Your IR may or may not abstract this away from you, but keep in mind that many operations can throw both language-level exceptions as well as hardware interrupts. Again, it depends on what you want to do with the call graph later.
Consider how you'll deal with cycles (e.g. recursion, mutual recursion). This may affect how you write code for traversing the graphs later on (i.e., they will need some sort of 'visited' set to avoid traversing cycles forever).

Cheers.

Update March 6:

Based on extra information added to the original post:

Be careful about virtual method invocations. Keep in mind that, in general, it is unknowable which method will execute. You may have to assume that the call will go to any of the subclasses of a particular class. The standard example goes a bit like this: suppose you have an ArrayList<A>, and you have class B extends A. Based on a random number generator, you will add instances of A and B to the list. Now you call x.foo() for all x in the list, where foo() is a virtual method in A with an override in B. So, by just looking at the source code, there is no way of knowing whether the loop calls A.foo, B.foo, or both at run time.

edited Mar 6, 2011 at 22:24

answered Mar 6, 2011 at 6:18

phooji

10.4k3 gold badges41 silver badges46 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Ira Baxter Over a year ago

OP also has to be aware of run-time binding of symbols; he may not get a true picture of the actual call graph. Many langauges, including Python I think, will (re)bind a symbol when new code is loaded. So a static analysis of a call in function F to X interpreted by itself might not reflect that X is bound to a hairball later (or worse, bound to something that eventually calls F again).

phooji Over a year ago

@Ira Baxter: Great point. I'm not sure what language being analyzed looks like; I will ask the OP for clarification.

Legend Over a year ago

@phooji: +1 for the tips. I have accepted this as an answer. If you have some time, could you please elaborate a little more on virtual method invocations? There are three types currently in the code: invoke-static, invoke-direct and invoke-virtual. While I understood the part where you mentioned the random number generator example, I am still a bit unclear on when byte-code is diassembled to invoke-virtual and how dynamic re-binding helps. If all I am interested is constructing approximate call graphs, could you suggest any steps that I should be taking to solve the problem partially?

phooji Over a year ago

@Legend: Thanks for the upvote. I'm not familiar with the details of the JVM (or Jasmin)specification. However, if you can do a full roundtrip (compile-disassemble), then you can probably find out by doing some experimentation. Here is my guess for the foo virtual method if your code looks like A x = new B(); x.foo(): "push object ref x; push args (none); invokevirtual A.foo()." Note that, if this is indeed the case, then you will have to mimick virtual method lookups in your analysis code.

phooji Over a year ago

(Oh and 'dynamic re-binding' is just another example of a situation in which a (variable, method, class, module) name can be modified at run time.)

|

Russ · Accepted Answer · 2011-03-06 05:30:21Z

2

I don't know the algorithm, but pycallgraph does a decent job. It is worth checking out the source for it. It is not long and should be good for checking out existing design patterns.

answered Mar 6, 2011 at 5:30

Russ

11.4k13 gold badges46 silver badges57 bronze badges

1 Comment

Legend Over a year ago

+1 Thank you. Sorry about not clarifying properly. I updated my question. I came across pycallgraph but I do not want to execute the code but like to generate the call graph by parsing my intermediate code.

Collectives™ on Stack Overflow

Good algorithm for generating call graphs?

2 Answers 2

6 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related