
While supporting a legacy system, I'm dealing with a field data collector that stores data in the following format:

# This is a comment <- because it starts at the beginning of the file
# This is a comment <- see above
# 1. Item one <- not a comment because it starts with 1.
# Description of Item 1 <- not a comment as it is after a line that starts with a number
data point 1
data point 2
data point etc
3 <-- represents number of data points under Item one

# 2. Item two <-- not a comment
# Description of item 2 <-- not a comment
data point 1
data point ..
data point 100
100
#3. Item three <--- not a comment
# Item three description
0

I'm not sure of the correct way to parse this file so that each item ends up in its own list. Note that sometimes, but not always, the data has an extra blank line between two items.

What is the correct way to parse such a file?

2 Answers


I would do this in three steps:

  1. Remove all comments from the start of the file
  2. Split the rest of the file with a regular expression that matches the numbered item headers (such as "# 1. Item one"), so that each item becomes its own chunk
  3. Parse the lines within each chunk
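The three steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the answerer's code, and it rests on assumptions taken from the question's sample: item headers look like "#", an optional space, a number and a dot; each header is followed by exactly one description line; and the last non-blank line of each item is its count. The names SAMPLE, HEADER and parse_items are hypothetical.

```python
import re

# Hypothetical input: the question's sample with the "<-" annotations removed.
SAMPLE = """\
# This is a comment
# This is a comment
# 1. Item one
# Description of Item 1
data point 1
data point 2
data point etc
3

# 2. Item two
# Description of item 2
data point 1
data point ..
data point 100
100
#3. Item three
# Item three description
0
"""

# Assumed header shape: "#", an optional space, a number and a dot ("# 1.", "#3.").
HEADER = re.compile(r"^(?=# ?\d+\.)", re.MULTILINE)


def parse_items(text):
    """Return one dict per numbered item (a sketch; parse_items is a made-up name)."""
    # Steps 1 and 2: split just before every header line. Everything before the
    # first header (the leading comments) lands in chunk 0, which is dropped.
    # Splitting on a zero-width match requires Python 3.7 or later.
    chunks = HEADER.split(text)[1:]
    items = []
    for chunk in chunks:
        # Step 3: ignore blank lines, since the separator between items is optional.
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        items.append({
            "title": lines[0].lstrip("#").strip(),        # e.g. "1. Item one"
            "description": lines[1].lstrip("#").strip(),  # assumes exactly one line
            "points": lines[2:-1],                        # the data point lines
            "count": int(lines[-1]),                      # trailing declared count
        })
    return items


for item in parse_items(SAMPLE):
    print(item["title"], item["count"], len(item["points"]))
```

For a real file, pass the contents in with something like parse_items(open(path).read()). Because the trailing number is the declared count, comparing item["count"] with len(item["points"]) is a cheap integrity check; on this sample it flags item two, whose data points were abbreviated in the question.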

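You could use a regular expression and split on ^(?=\# ?\d+\.) with the multiline flag enabled, so that ^ matches at the start of every line rather than only at the start of the file. The lookahead matches zero characters, so each "# N." header stays at the top of its own chunk.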

Explained example here: http://regex101.com/r/gB3xD1
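One way to try this pattern is Python's re.split with the re.MULTILINE flag (the answer does not name a language, so Python is an assumption here). The short sample string and the names ITEM_START and split_items are hypothetical; the sketch shows that the leading comment ends up alone in the first chunk and that each item chunk begins with its header.

```python
import re

# The pattern from the answer: a zero-width lookahead anchored at a line start,
# so the split keeps each "# N." header at the top of its own chunk.
ITEM_START = r"^(?=\# ?\d+\.)"


def split_items(text):
    # re.MULTILINE makes "^" match after every newline, not just at offset 0.
    # Splitting on a zero-width match needs Python 3.7 or later.
    return re.split(ITEM_START, text, flags=re.MULTILINE)


# Hypothetical sample: one leading comment, then two items separated by a blank line.
sample = "# comment\n# 1. One\n# about one\nx\n1\n\n#2. Two\n# about two\n0\n"
for chunk in split_items(sample):
    print(repr(chunk))
```

Without re.MULTILINE, ^ only matches at the very start of the string, where "# comment" does not satisfy the lookahead, so re.split would return the whole text as a single chunk.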
