String Parsing in PHP

Question

For a small project of my own, I'm writing a parser that parses event logs from a certain application. Normally I'd have little issue with handling such a thing, but the problem is that strings from these logs do not always have the same parameters. For example, one such string could be:

DD/MM HH:MM:SS.MSEC TYPE_OF_EVENT SOURCE, SOURCE_FLAGS, TARGET, TARGET_FLAGS, PARAM1

Other strings carry a longer series of parameters: some have as many as 27, others 16. According to the documentation there is some logic to the parameters; for example, the 17th parameter always holds an integer. While that is good, unfortunately the 17th parameter might be the 7th item on the string. The only things that are really constant on every string are the timestamp and the first 6 parameters.

How would I go about parsing strings like these? I'm sorry if my question is a tad unclear; I find it difficult to word my problem.

Is there anything 'fixed' in the rest of the string that would let you figure out how many parameters are present? E.g. is the number of parameters dependent on the TYPE_OF_EVENT field?
– Marc B, Commented Feb 16, 2011 at 16:01
Any piece of code to show us? Since your strings always begin with the same fields (timestamp and 6 parameters), you should start with that.
– soju, Commented Feb 16, 2011 at 16:07
@Marc Yes, the parameters are added on the basis of TYPE_OF_EVENT, with source and source flags (and likewise target and target flags) being the only guaranteed fields. After those, more parameters are added depending on the event.
– Jesse Brands, Commented Feb 16, 2011 at 16:09
@soju Unfortunately the log is created by a closed-source program, and I'm still in the phase where I'm trying to think of a solution to my problem before I really start coding anything (I only use conceptual code for this).
– Jesse Brands, Commented Feb 16, 2011 at 16:10

Marc B · Accepted Answer · 2011-02-16 16:18:56Z

1

Ok, followup for my comment up at the top.

If the log's format is "constant" based on the TYPE_OF_EVENT field, you'll just have to do some simple pre-parsing, after which the rest should follow easily.

read a line
extract the universally common fields: timestamp, type of event, source/target
based on type_of_event, do further analysis

switch (event type) {
    case 'a':
        parse out 'a' event parameters
    case 'b':
        parse out 'b' event parameters
    default:
        log unknown event type for future analysis
}

and so on.
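
As a rough illustration, the steps above could look like this in PHP. This is only a sketch: the function name parseLogLine, the event names EVENT_A and EVENT_B, and the extra field names are hypothetical placeholders, and the line layout is assumed from the example in the question.

```php
<?php
// Sketch only: event names and extra field names are hypothetical;
// the real ones come from the application's documentation.
function parseLogLine(string $line): ?array
{
    // Step 1: split off the fields every line shares.
    // Assumed layout: "DD/MM HH:MM:SS.MSEC TYPE_OF_EVENT rest-of-line"
    $head = explode(' ', trim($line), 4);
    if (count($head) < 4) {
        return null; // not a line in the expected layout
    }
    [$date, $time, $type, $rest] = $head;

    // Step 2: split the comma-separated remainder and trim each piece.
    $params = array_map('trim', explode(',', $rest));
    if (count($params) < 4) {
        return null; // source/target fields are missing
    }

    $event = [
        'date'         => $date,
        'time'         => $time,
        'type'         => $type,
        'source'       => $params[0],
        'source_flags' => $params[1],
        'target'       => $params[2],
        'target_flags' => $params[3],
    ];
    $extra = array_slice($params, 4);

    // Step 3: interpret the remaining parameters based on the event type.
    switch ($type) {
        case 'EVENT_A': // hypothetical event with one integer parameter
            $event['amount'] = isset($extra[0]) ? (int) $extra[0] : null;
            break;
        case 'EVENT_B': // hypothetical event with a name and a count
            $event['name']  = $extra[0] ?? null;
            $event['count'] = isset($extra[1]) ? (int) $extra[1] : null;
            break;
        default:
            // Keep the raw parameters and note the type for later analysis.
            $event['unparsed'] = $extra;
            error_log("Unknown event type: $type");
    }

    return $event;
}
```

Each case only has to know the parameter order for its own event type, so supporting a new event means adding one case.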


1 Comment

Jesse Brands Over a year ago

Thanks, that's the solution I came to as well. Having someone else arrive at the same idea is a good help. Cheers. :)

Payson Welch · Accepted Answer · 2011-02-16 15:59:21Z

1

I would use a different logging solution, or find a way to modify it so that you have empty place holders, item,,item3,,,item6 etc.

Just my opinion without knowing too much about this app: it doesn't sound too good. I usually judge apps by factors like this. If there is no good reason for the log file to be non-standardized, then what do you think the rest of the code looks like? :)


1 Comment

Jesse Brands Over a year ago

It's thoroughly documented, but the logging file is a little weird, the parameters change depending on the event type. There is some logic, like, all events starting with DAMAGE_ share a few parameters, then PERIODIC adds a few parameters. I could explain how the log works in full, but then I'd basically make this a far too long question. :P --- tl;dr: I know the log isn't great, but I got no alternative options.

Community · Accepted Answer · 2017-05-23 12:04:23Z

1

That's not an input that can be "parsed" as such, because there are no fixed keywords to look out for. But regular expressions seem sufficient to extract and split up the contents.

http://regular-expressions.info/ has a good introduction, and https://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world lists a few cool tools that help in designing regular expressions.

In your case you would need \d+ for matching digits, use the delimiters literally, and you can probably get away with .*? separated by the , comma delimiters to find the individual parts. Maybe:

preg_match('#(\d+/\d+) (\d+:\d+:\d+\.\d+) (\w+) (.*?),(.*),(.*),...#', $line, $matches);

If there is a variable length of attributes, then you should prefer two regexps (though it can be done in one). First get the .* remainder of each line, then split it afterwards.
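
A minimal sketch of that two-step approach, assuming the line layout from the question. The function name splitLogLine and the sample line in the usage note are made up for illustration.

```php
<?php
// Sketch only: splitLogLine is a hypothetical helper name.
function splitLogLine(string $line): ?array
{
    // First regex: capture the fixed prefix and keep the remainder as one group.
    $pattern = '#^(\d+/\d+) (\d+:\d+:\d+\.\d+) (\w+) (.*)$#';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // line does not match the assumed layout
    }
    [, $date, $time, $type, $rest] = $m;

    // Second regex: split the remainder on commas, dropping surrounding whitespace.
    $params = preg_split('/\s*,\s*/', trim($rest));

    return ['date' => $date, 'time' => $time, 'type' => $type, 'params' => $params];
}
```

Calling splitLogLine('16/02 16:01:02.123 SOME_EVENT Alice, 0x1, Bob, 0x2, 42') yields the three fixed fields plus a params array of five strings, however many parameters the event happens to carry.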

edited May 23, 2017 at 12:04

CommunityBot

answered Feb 16, 2011 at 16:02

mario


3 Comments

Jesse Brands Over a year ago

This got me off to a good start, not great, but not bad either. Unfortunately it seems I really have no choice but to first parse the EVENT_TYPE and, on the basis of that, start pulling out the other data. The answer is fine, the event log just isn't :(

mario Over a year ago

@Jesse: That's a common problem with text based lists. They are easy to write out, and suitable for human consumption; but automated evaluation is always an issue due to its flexibility.

Jesse Brands Over a year ago

I suppose I would have to write a small library for parsing this kind of log then. Would be kind of messy, as I'd have to use different kind of functions for each event, or massive if/else statements (or a switch). At least you got me started, thanks. :)

Cristian Radu · Accepted Answer · 2011-02-16 15:59:29Z

0

How about splitting the string by the ", " separator and putting everything in an array? That way you'll have a numeric index to check whether a parameter exists or not.
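
For illustration, a small sketch of that idea. The helper name paramAt and the sample values are made up. Note that the index only identifies a position on the line, not which documented parameter occupies it.

```php
<?php
// Sketch only: paramAt is a hypothetical helper name.
// Returns the parameter at a zero-based position, or null if the line is shorter.
function paramAt(string $paramString, int $index): ?string
{
    $params = explode(', ', $paramString);
    return $params[$index] ?? null;
}
```

With 'Alice, 0x1, Bob, 0x2, 42' as input, paramAt($s, 4) gives '42' and paramAt($s, 5) gives null because there is no sixth item.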


1 Comment

Jesse Brands Over a year ago

That was my initial assumption, but this doesn't work. For example, parameter 'AMOUNT' is the 16th parameter, but it isn't always present on the string, and there are no empty parts for things that are missing. Admittedly, the event log is a little crude, but I don't have an alternative.
