For a small project of my own, I'm writing a parser that parses event logs from a certain application. Normally I'd have little issue with handling such a thing, but the problem is that strings from these logs do not always have the same parameters. For example, one such string could be:

DD/MM HH:MM:SS.MSEC TYPE_OF_EVENT SOURCE, SOURCE_FLAGS, TARGET, TARGET_FLAGS, PARAM1

On another occasion the string could carry a whole series of parameters: one has up to 27 of them, another has 16. Reading through the documentation, there is some logic to the parameters; for example, the 17th parameter will always hold an integer. While that is good, unfortunately the 17th parameter might be the 7th thing on the string. The only things that are really constant on every string are the timestamp and the first 6 parameters.

How would I go about parsing strings like these? I'm sorry if my question is a tad unclear; I find it difficult to word my problem.

  • Is there anything 'fixed' in the rest of the string that would let you figure out how many parameters are present? E.g. is the number of parameters dependent on the TYPE_OF_EVENT field? Commented Feb 16, 2011 at 16:01
  • Do you have any code to show us? Since your strings always begin with the same fields (timestamp and 6 parameters), you should start with that. Commented Feb 16, 2011 at 16:07
  • @Marc Yes, the parameters are added on a basis of TYPE_OF_EVENT, with source and source-flags (idem for target) being the only guaranteed fields. After it adds more parameters depending on the event. Commented Feb 16, 2011 at 16:09
  • @soju Unfortunately the log is created by a closed-source program, and I'm still in the phase where I'm trying to think of a solution to my problem before I really start coding anything (I only use conceptual code for this). Commented Feb 16, 2011 at 16:10

4 Answers

Ok, followup for my comment up at the top.

If the log's format is "constant" based on the TYPE_OF_EVENT field, you'll just have to do some simple pre-parsing, after which the rest should follow easily.

  1. read a line
  2. extract the universally common fields: timestamp, type of event, source/target
  3. based on type_of_event, do further analysis

    switch ($eventType) {
        case 'a':
            // parse out the 'a' event parameters
            break;
        case 'b':
            // parse out the 'b' event parameters
            break;
        default:
            // log the unknown event type for future analysis
    }

and so on.
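The three steps above could be sketched in PHP roughly as follows. This is only an illustration under assumptions: the function name `parseLine`, the event names `EVENT_A` and `EVENT_B`, and the fields they carry are all made up, since the real event types come from the application's documentation.

```php
<?php
// Sketch only: the event names and per-event fields below are hypothetical.

function parseLine(string $line): ?array
{
    // Step 2: pull out the fields every line shares.
    $pattern = '#^(\d+/\d+) (\d+:\d+:\d+\.\d+)\s+(\w+)\s+(.*)$#';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // not a log line we understand
    }
    $params = array_map('trim', explode(',', $m[4]));
    if (count($params) < 4) {
        return null; // source/target fields are missing
    }

    $event = [
        'date'         => $m[1],
        'time'         => $m[2],
        'type'         => $m[3],
        'source'       => $params[0],
        'source_flags' => $params[1],
        'target'       => $params[2],
        'target_flags' => $params[3],
    ];
    $rest = array_slice($params, 4);

    // Step 3: the event type decides how the remaining parameters are read.
    switch ($event['type']) {
        case 'EVENT_A': // hypothetical: one integer parameter
            $event['amount'] = (int) ($rest[0] ?? 0);
            break;
        case 'EVENT_B': // hypothetical: a name followed by an integer
            $event['name']   = $rest[0] ?? '';
            $event['amount'] = (int) ($rest[1] ?? 0);
            break;
        default:
            // Unknown type: keep the raw parameters for later analysis.
            $event['unparsed'] = $rest;
    }
    return $event;
}
```

Keeping the raw parameters in the `default` branch means an undocumented event type is not lost; it can be inspected later and given its own `case`.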

1 Comment

Thanks, that's the solution I came to as well. Having someone else arrive at the same idea is a good help. Cheers. :)

I would use a different logging solution, or find a way to modify it so that you have empty place holders, item,,item3,,,item6 etc.

Just my opinion without knowing too much about this app: it doesn't sound too good. I usually judge apps by factors like this. If there is no good reason for the log file to be non-standardized, then what do you think the rest of the code looks like? :)

1 Comment

It's thoroughly documented, but the logging file is a little weird, the parameters change depending on the event type. There is some logic, like, all events starting with DAMAGE_ share a few parameters, then PERIODIC adds a few parameters. I could explain how the log works in full, but then I'd basically make this a far too long question. :P --- tl;dr: I know the log isn't great, but I got no alternative options.

That's not an input that can be "parsed" as such, because there are no fixed keywords to look out for. But regular expressions seem sufficient to extract and split up the contents.

http://regular-expressions.info/ has a good introduction, and https://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world lists a few cool tools that help in designing regular expressions.

In your case you would need \d+ for matching decimals, use the delimiters literally, and you can probably get away with .*? separated by the , comma delimiters to find the individual parts. Maybe:

preg_match('#(\d+/\d+) (\d+:\d+:\d+\.\d+) (\w+) (.*?),(.*),(.*),...#', $line, $matches);

If there is a variable length of attributes, then you should prefer two regexps (though it can be done in one). First get the .* remainder of each line, then split it afterwards.
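A minimal sketch of that two-step approach, assuming the line layout shown in the question. The sample `$line` is invented for illustration, and the split is naive: it would break if a field could itself contain a comma.

```php
<?php
// Sketch only: $line is a made-up sample in the format from the question.
$line = '16/02 16:01:02.345 SOME_EVENT Alice, 0x1, Bob, 0x2, 42';

// First regexp: fixed prefix, then everything else as one capture.
$ok = preg_match('#^(\d+/\d+) (\d+:\d+:\d+\.\d+)\s+(\w+)\s+(.*)$#', $line, $m);

// Second regexp: split the remainder on commas, swallowing nearby spaces.
$params = $ok ? preg_split('/\s*,\s*/', trim($m[4])) : [];
```

Here `$m[1]` to `$m[3]` hold the date, time and event type, while `$params` holds however many attributes the line happens to have.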

3 Comments

This got me off to a good start, not great, but not bad either. Unfortunately it seems I really have no choice but to first parse the EVENT_TYPE and, on the basis of that, start pulling other data. The answer is fine, the event log just isn't :(
@Jesse: That's a common problem with text based lists. They are easy to write out, and suitable for human consumption; but automated evaluation is always an issue due to its flexibility.
I suppose I would have to write a small library for parsing this kind of log then. It would be kind of messy, as I'd have to use a different function for each event, or massive if/else statements (or a switch). At least you got me started, thanks. :)

How about splitting the string by the ", " separator and putting everything in an array? That way you'll have a numeric index to check whether a parameter exists or not.

1 Comment

That was my initial assumption, but this doesn't work. For example, parameter 'AMOUNT' is the 16th parameter. But it isn't always there on the string, as there are no empty parts for things that are missing. Admittedly, the event log is a little crude, but I don't have an alternative.
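One possible way around the missing placeholders is to look fields up by name instead of by position, using a per-event list of field names. The sketch below is only an assumption about how that could look: `COMMON_FIELDS`, `EXTRA_FIELDS`, `nameParams`, and the event types and field names in them are hypothetical and would have to be filled in from the application's documentation.

```php
<?php
// Sketch only: the event types and field names are hypothetical placeholders.
const COMMON_FIELDS = ['source', 'source_flags', 'target', 'target_flags'];

const EXTRA_FIELDS = [
    'EVENT_A' => ['amount'],
    'EVENT_B' => ['name', 'amount'],
];

function nameParams(string $type, array $params): ?array
{
    if (!array_key_exists($type, EXTRA_FIELDS)) {
        return null; // unknown event type
    }
    $names = array_merge(COMMON_FIELDS, EXTRA_FIELDS[$type]);
    if (count($names) !== count($params)) {
        return null; // parameter count does not match the expected layout
    }
    // Pair each positional value with its name for this event type.
    return array_combine($names, $params);
}
```

With this, `nameParams('EVENT_B', $params)['amount']` finds the amount regardless of which position it occupies for that event type.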
