I want to write a program which is able to search in source code files for specific patterns ... in other words: the input is a piece of code for example:
int fib (int i) {
int pred, result, temp;
pred = 1;
result = 0;
while (i > 0) {
temp = pred + result;
result = pred;
pred = temp;
i = i-1;
}
return(result);
}
The output are files that contain this piece of code or similar code.
In the Open Source World code is reused in other projects. Especially libraries are often copied into projects. To make bug fixing easier I need to be able to know in which projects specific libraries or code is used.
Therefore I want to try to use apache solr. I don't know if its a good idea (I am would be happy about everything that could help me)
My plan is to index my source code files ... therefore I need some tools? to tokenize source code files. Like give me all names of functions, variables etc. The output I can use to feed the solr index. But I am not sure maybe there are already tokenizer or dataimporthandler in apache solr that do the trick?