I have a problem using a regular expression to match lines in a text file; I'm new to Python. My file looks like this:

epg_slo3.txt:10346224:        Service_ID: 1 (0x0001)  [=  --> refers to PMT program_number]
epg_slo3.txt:10346236:            Start_time: 0xdce0112500 [= 2013-09-09 11:25:00 (UTC)]
epg_slo3.txt:10346237:            Duration: 0x0001000 [=  00:10:00 (UTC)]
epg_slo3.txt:10346246:                  event_name: "..©port"  -- Charset: ISO/IEC 8859  special table

What I need is something like this:

Service_ID: 1 (0x0001)  [=  --> refers to PMT program_number]: --> Program 1
Start_time: 0xdce0112500 [= 2013-09-09 11:25:00 (UTC)]: --> Start 2013-09-09 11:25:00 (UTC)
Duration: 0x0001000 [=  00:10:00 (UTC)] --> Duration 00:10:00 (UTC)
event_name: "..©port"  -- Charset: ISO/IEC 8859  --> Category ©port

My code looks like:

#!/usr/bin/python
import codecs
import re

BLOCKSIZE = 1048576

with codecs.open('epg_slo10.txt', "r", "iso-8859-2") as sourceFile:
    with codecs.open('epg_slo.txt', "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)


# Open both files in one with-statement so they are closed automatically.
with open('epg_slo.txt', "r") as input_file, \
        open('epg_slo_kategorije.txt', "w") as output_file:
    for line in input_file:
        # Rename the field labels.
        line = line.replace("Service_ID:","Program")
        line = line.replace("Start_time:","Start")
        line = line.replace("event_name:","Title")
        output_file.write(line)

Can you help me with this?

Thanks for reading.

  • Are you just trying to get rid of all the blocks like epg_slo3.txt:10346224: ? Commented Sep 11, 2013 at 7:14
  • Yes, I want to get rid of all of them in the file. Commented Sep 11, 2013 at 7:37

2 Answers

1

Before the first line = line.replace(...) call in your code, add this line:

line = re.sub(r'^epg_slo3\.txt:\d{8}:\s*', '', line)

For example, if

line = "epg_slo3.txt:10346224:        Service_ID: 1 (0x0001)  [=  --> refers to PMT program_number]"

then after calling re.sub:

line = "Service_ID: 1 (0x0001)  [=  --> refers to PMT program_number]"
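Here is a minimal sketch of how the substitution could fit together with the replacements from the question. The helper names strip_prefix and rename_fields are hypothetical, and the dot in the file name is escaped so that it matches only a literal ".":

```python
import re

# Matches the "epg_slo3.txt:<8 digits>:" prefix plus the whitespace after it.
PREFIX = re.compile(r'^epg_slo3\.txt:\d{8}:\s*')


def strip_prefix(line):
    """Return the line without its leading prefix (hypothetical helper)."""
    return PREFIX.sub('', line)


def rename_fields(line):
    """Strip the prefix, then apply the same replacements as in the question."""
    line = strip_prefix(line)
    line = line.replace("Service_ID:", "Program")
    line = line.replace("Start_time:", "Start")
    line = line.replace("event_name:", "Title")
    return line
```

Inside the loop from the question this would be used as output_file.write(rename_fields(line)). Lines that do not start with the prefix pass through strip_prefix unchanged.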

1

Replace every match of the regex below with an empty string "":

/^epg_slo3\.txt:\d{8}:\s*/
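To go one step further toward the output format shown in the question, here is a rough sketch rather than a definitive solution. It assumes that every prefix has the form name:digits:, that the readable value for Start_time and Duration sits inside [= ...], and that leading dots in the event name should be dropped. The names PREFIX, RULES and convert are hypothetical, and the labels are taken from the desired output in the question:

```python
import re

# Assumed prefix shape: "<name>:<digits>:" followed by optional whitespace.
PREFIX = re.compile(r'^[^:\s]+:\d+:\s*')

# (pattern, label) pairs; group 1 of each pattern is the value to keep.
RULES = [
    (re.compile(r'^Service_ID:\s*(\d+)'), 'Program'),
    (re.compile(r'^Start_time:.*\[=\s*(.*?)\]'), 'Start'),
    (re.compile(r'^Duration:.*\[=\s*(.*?)\]'), 'Duration'),
    (re.compile(r'^event_name:\s*"\.*(.*?)"'), 'Category'),
]


def convert(line):
    """Strip the prefix and rewrite known fields as '<label> <value>'."""
    body = PREFIX.sub('', line.rstrip('\n'))
    for pattern, label in RULES:
        match = pattern.search(body)
        if match:
            return '%s %s' % (label, match.group(1))
    # Unknown fields are returned with only the prefix removed.
    return body
```

convert returns the line without a trailing newline, so the caller would write convert(line) + "\n" to the output file.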
