I have a process that takes input data, processes it, and outputs the result. While it runs, it generates two logs: IN.log and OUT.log.
IN.log records when each piece of data came in, along with the id of that data. OUT.log records when the data was processed, along with its id. So:
IN.log contains: in-time id
OUT.log contains: out-time id
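One common way to get both logs into a single Hadoop Streaming job is a reduce-side join: the mapper re-keys every line by its id and tags it with the log it came from. The sketch below assumes each line is exactly two whitespace-separated fields, `<time> <id>`, and that the input file names start with `IN` or `OUT`. It reads the current file name from the `mapreduce_map_input_file` environment variable (older Hadoop releases use `map_input_file`); the helper names `source_tag` and `map_lines` are hypothetical.

```python
import os
import sys


def source_tag(path):
    """Return 'IN' or 'OUT' from the (assumed) log file name, else None."""
    name = os.path.basename(path)
    if name.startswith("IN"):
        return "IN"
    if name.startswith("OUT"):
        return "OUT"
    return None


def map_lines(lines, tag):
    """Re-key each '<time> <id>' line as '<id>\t<tag>\t<time>'."""
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        time_str, rec_id = parts
        out.append("%s\t%s\t%s" % (rec_id, tag, time_str))
    return out


def main():
    # Hadoop Streaming exposes job settings as environment variables with
    # dots replaced by underscores; older releases set map_input_file.
    path = (os.environ.get("mapreduce_map_input_file")
            or os.environ.get("map_input_file", ""))
    tag = source_tag(path)
    if tag is None:
        # Cannot tell IN from OUT (e.g. not running under Hadoop): emit nothing.
        sys.stderr.write("mapper: unrecognised input file %r\n" % path)
        return
    for record in map_lines(sys.stdin, tag):
        print(record)


if __name__ == "__main__":
    main()
```

Putting the id first matters: Hadoop Streaming treats the text before the first tab as the key, so all IN and OUT records for one id reach the same reducer call, adjacent to each other.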
Now, as part of processing with Hadoop Streaming in Python, I would like to join these two files and output the difference between the in-time and the out-time, together with the id of the data.
For example:
2seconds id123
3seconds id112
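On the reduce side, the join can be done by grouping consecutive lines that share an id and subtracting the IN time from the OUT time. This is a minimal sketch, assuming a mapper that emits `<id>\t<IN|OUT>\t<time>` lines and, as a labelled assumption, timestamps in Unix epoch seconds (swap out `parse_time` for the real log format). Ids present in only one log are skipped; the helper names are hypothetical.

```python
import sys
from itertools import groupby


def parse_time(text):
    """ASSUMPTION: timestamps are Unix epoch seconds, e.g. '1367400000'."""
    return float(text)


def format_seconds(seconds):
    # Print whole numbers without a trailing '.0' to match '2seconds'.
    if seconds.is_integer():
        return "%dseconds" % seconds
    return "%sseconds" % seconds


def reduce_lines(lines):
    """Join IN and OUT records that share an id.

    Expects '<id>\t<IN|OUT>\t<time>' lines already sorted by id, which is
    what Hadoop Streaming's shuffle delivers to the reducer.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    out = []
    for rec_id, group in groupby(rows, key=lambda r: r[0]):
        times = {}
        for _, tag, time_str in group:
            times[tag] = parse_time(time_str)  # a repeated tag keeps the last value
        # Only ids seen in both logs have a processing time.
        if "IN" in times and "OUT" in times:
            diff = times["OUT"] - times["IN"]
            out.append("%s %s" % (format_seconds(diff), rec_id))
    return out


if __name__ == "__main__":
    for record in reduce_lines(sys.stdin):
        print(record)
```

Because `groupby` only merges adjacent lines, the reducer relies on the input being sorted by id; when testing locally outside Hadoop, pipe the mapper output through `sort` first.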
Any pointers on how this can be achieved in Python?