
I have a text file made up of blocks that look like this:

first thing:    content 1
second thing:   content 2
third thing:    content 3
fourth thing:   content 4

This pattern repeats throughout the entire text file. However, sometimes one of the rows is missing entirely, like so:

first thing:    content 1
second thing:   content 2
fourth thing:   content 4

How could I search the document for these missing rows and add them back with a value of "NA" (or some other filler) to produce a new text file like this:

# 'third thing' was not there, so re-adding it with NA as content
first thing:    content 1
second thing:   content 2
third thing:    NA 
fourth thing:   content 4

Current code boilerplate:

with open('original.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        # Search file for pattern (maybe regex?)
        # If pattern does not exist, add the line
        pass  # placeholder so the skeleton parses

Thanks for any help you all can offer!

  • Is there an identifier on the lines that we could use to detect the missing ones? Commented Mar 23, 2016 at 11:46
  • Unfortunately no. Any row in this file could be missing, so I will have to account for that. What I can tell you is that the format of the text files is always the same: a block of 4 or fewer rows, then a blank line before the next block. This pattern repeats anywhere from 5 to 50 times. Thanks. Commented Mar 23, 2016 at 11:49
  • Is there a delimiter between the blocks? If not, line 1, 2, 3, 4 could really be two blocks: line 1 & 2 with missing 3 & 4 plus missing 1 & 2 followed by line 3 & 4... Commented Mar 23, 2016 at 11:51
  • In raw text it would look like line1\n line2\n line3\n line4\n \n line 1 \n line 2 \n line 4 \n, etc. Commented Mar 23, 2016 at 11:55
  • I don't know if that's an answer to my question (include @Username in answers ;), but does that mean that there's a blank line between the blocks? (the extra \n between the blocks) Commented Mar 23, 2016 at 12:03

2 Answers

1

You need to look for blocks of 1-3 lines (fewer than 4) that are preceded and followed by a blank line:

^\n([^\n]*\n){1,3}\n

Demo: https://regex101.com/r/rL3eA5/2
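As a rough sketch of how this pattern could be applied from Python: with `re.MULTILINE`, `^` matches at the start of every line, so the pattern matches a blank line, then one to three lines, then another blank line. The sample text and the names `SHORT_BLOCK` and `short_blocks` are assumptions made for illustration; they are not part of the answer.

```python
import re

# Pattern from the answer above. re.MULTILINE makes ^ match at each line start.
SHORT_BLOCK = re.compile(r"^\n([^\n]*\n){1,3}\n", re.MULTILINE)

# Hypothetical sample: a full block, a block missing "third thing", a full block.
sample = (
    "first thing:    content 1\n"
    "second thing:   content 2\n"
    "third thing:    content 3\n"
    "fourth thing:   content 4\n"
    "\n"
    "first thing:    content 1\n"
    "second thing:   content 2\n"
    "fourth thing:   content 4\n"
    "\n"
    "first thing:    content 1\n"
    "second thing:   content 2\n"
    "third thing:    content 3\n"
    "fourth thing:   content 4\n"
)

# Each match covers the leading blank line, the short block, and the
# trailing blank line; strip the blank lines and split into rows.
short_blocks = [
    match.group(0).strip("\n").split("\n")
    for match in SHORT_BLOCK.finditer(sample)
]

for rows in short_blocks:
    print(f"block with only {len(rows)} rows:")
    for row in rows:
        print("   ", row)
```

Two limitations to keep in mind: the pattern needs a blank line before the block, so a short block at the very start of the file is not reported, and because each match consumes the trailing blank line, a short block that immediately follows another short block is skipped. The pattern also only says that a block is short, not which row is missing.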


1

This isn't pretty, but it works. Here's a regex to detect where lines are missing:

(?:^|\n)(second thing:\s*[^\n]+\n)|(first thing:\s*[^\n]+\n(?!second thing:))|(second thing:\s*[^\n]+\n(?!third thing:))|(third thing:\s*[^\n]+\n(?!fourth thing:))|(third thing:\s*[^\n]+\n\n)

regex101 demo here

Notice the Single Line flag.

When you've got a match, check which capture group matched. If it's the first one, the first line is missing. If it's the second one, the second line is missing, and so on for the third and fourth.

Here's an example of how to replace when the 1st group matches.

Here's an example of how to replace when the 3rd group matches.

Here's an example of how to replace when the 4th group matches.

You'll probably have to do some tweaking, but it should get you on your way ;)

Regards.
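If the regex route needs too much tweaking, a simpler alternative (not what either answer proposes) is to split the file on blank lines and rebuild each block from a fixed list of labels. The sketch below assumes the four labels and the 16-character column width seen in the question's sample; `EXPECTED_LABELS`, `fill_block`, `fill_missing`, and `rewrite_file` are hypothetical names chosen for illustration.

```python
import re

# Assumed label order and filler, taken from the question's sample data.
EXPECTED_LABELS = ["first thing", "second thing", "third thing", "fourth thing"]
FILLER = "NA"
LABEL_WIDTH = 16  # "first thing:" plus padding, as in the sample


def fill_block(block):
    """Return the rows of one block, re-adding any missing label with FILLER."""
    found = {}
    for line in block.splitlines():
        label, sep, content = line.partition(":")
        if sep:  # ignore lines that have no "label: content" shape
            found[label.strip()] = content.strip()
    return [
        (label + ":").ljust(LABEL_WIDTH) + found.get(label, FILLER)
        for label in EXPECTED_LABELS
    ]


def fill_missing(text):
    """Split text into blank-line-separated blocks and complete each one."""
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    return "\n\n".join("\n".join(fill_block(b)) for b in blocks) + "\n"


def rewrite_file(src_path, dst_path):
    """Read src_path, fill in missing rows, and write the result to dst_path."""
    with open(src_path, "r") as infile, open(dst_path, "w") as outfile:
        outfile.write(fill_missing(infile.read()))
```

Calling `rewrite_file('original.txt', 'output.txt')` would then produce the completed file. Because every block is rebuilt from the label list, this handles any number of missing rows per block, but it relies on the blank line between blocks being present, as described in the comments on the question.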
