Python Regular expressions to return line starting with specific string

Question

I have this file (output.txt):

Username:traider

domain:domain.net

 
TECH-1366


Username:traider1

domain:domain.net

 
TECH-1367

I can get values after Username and domain

 traider,domain.net
 traider1,domain.net

but I don't know how to get TECH-XXX.

desired output:

traider,domain.net,TECH-1366
traider1,domain.net,TECH-1367

Code:

import re

with open("output.txt", "r") as myfile:
    data = myfile.read()

# Capture the values that follow "Username:" and "domain:"
people = re.findall(r'\bUsername:(\S+)\s+domain:(\S+)\s', data)

for personinfo in people:
    print(','.join(personinfo))

I can only return [TECH], but it's incomplete and has brackets:

tech = re.findall(r'TECH-*', data)

Just finish your current pattern: \bUsername:(\S+)\s+domain:(\S+)\s+(.+)
– Wiktor Stribiżew, Commented Apr 12, 2018 at 20:04
Tried it yet? Is it working? Or should there be an explicit check for TECH-\d+? If so, replace .+ above with that.
– Wiktor Stribiżew, Commented Apr 12, 2018 at 20:15
I tried re.findall(r'\bUsername:(\S+)\s+domain:(\S+)\s+(TECH-\d+)', data) and got nothing. I need a match for the line starting with TECH.
– Milister, Commented Apr 12, 2018 at 20:19
See rextester.com/AGF36233. What do you mean by "got nothing"?
– Wiktor Stribiżew, Commented Apr 12, 2018 at 20:23
Do not add anything. Just update the regex in your current code: people = re.findall(r'\bUsername:(\S+)\s+domain:(\S+)\s+(TECH-\d+)', data)
– Wiktor Stribiżew, Commented Apr 12, 2018 at 20:27
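For reference, the pattern suggested in the comments can be tried against the sample from the question roughly like this. This is only a sketch: the sample text is inlined as a string (an assumption, so the snippet runs without output.txt), and the variable names `pattern`, `lines` and `tech_ids` are made up for illustration.

```python
import re

# Sample data inlined from the question (assumption: same layout as output.txt).
data = """Username:traider

domain:domain.net


TECH-1366


Username:traider1

domain:domain.net


TECH-1367
"""

# \s+ matches any run of whitespace, including the blank lines between fields.
pattern = r'\bUsername:(\S+)\s+domain:(\S+)\s+(TECH-\d+)'
people = re.findall(pattern, data)

lines = [','.join(personinfo) for personinfo in people]
print('\n'.join(lines))

# Matching only the ticket ids: \d+ requires one or more digits after "TECH-".
tech_ids = re.findall(r'TECH-\d+', data)
print(tech_ids)
```

Note that `TECH-*` means "TECH followed by zero or more hyphens", so it never reaches the digits; `TECH-\d+` is what captures the full id.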

RishiG · Accepted Answer · 2018-04-12 20:06:31Z

1

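Try the following. The re.DOTALL flag lets . match newlines, and the lazy .*? stops at the first TECH id after each domain:

people = re.findall(r'\bUsername:(\S+)\s+domain:(\S+).*?(TECH-\d+)', data, re.DOTALL)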
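One detail worth checking with a `.*`-based pattern: by default `.` does not match a newline, so `.*` cannot bridge the blank lines between `domain:` and the `TECH-` line unless `re.DOTALL` is passed, and a greedy `.*` with `re.DOTALL` runs on to the last `TECH-` id in the file. The sketch below compares the three variants on an inlined copy of the sample (the names `base`, `no_dotall`, `greedy` and `lazy` are illustrative only).

```python
import re

# Inlined sample (assumption: mirrors the layout of output.txt in the question).
data = (
    "Username:traider\n\ndomain:domain.net\n\n\nTECH-1366\n\n\n"
    "Username:traider1\n\ndomain:domain.net\n\n\nTECH-1367\n"
)

base = r'\bUsername:(\S+)\s+domain:(\S+)'

# Without re.DOTALL, "." stops at the newline, so nothing matches.
no_dotall = re.findall(base + r'.*(TECH-\d+)', data)

# With re.DOTALL but a greedy ".*", the first record swallows the second one.
greedy = re.findall(base + r'.*(TECH-\d+)', data, re.DOTALL)

# With re.DOTALL and a lazy ".*?", each record stops at its own TECH id.
lazy = re.findall(base + r'.*?(TECH-\d+)', data, re.DOTALL)

print(no_dotall)
print(greedy)
print(lazy)
```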

Campbell McDiarmid · Accepted Answer · 2018-04-12 21:17:24Z

0

This can be done by splitting the text into items, further splitting to obtain the useful text within each item, followed by some simple conditional formatting:

txt="""Username:traider

domain:domain.net


TECH-1366


Username:traider1

domain:domain.net


TECH-1367"""

out = ''
for item in txt.split():
    desired_value = item.split(':')[-1]
    out += desired_value
    # 'key:value' tokens are followed by a comma; the TECH token ends the record
    if ':' in item:
        out += ','
    else:
        out += '\n'

Or using comprehension:

''.join('%s,' % item.split(':')[-1] if ':' in item else '%s\n' % item for item in txt.split())

Output:

traider,domain.net,TECH-1366
traider1,domain.net,TECH-1367

handle · Accepted Answer · 2018-04-13 17:51:14Z

0

You don't need a regular expression for this; you can use the built-in str.split() and then, for example, a list comprehension to "bundle" your data:

txt="""Username:traider

domain:domain.net


TECH-1366


Username:traider1

domain:domain.net


TECH-1367"""

l = txt.split()

#udt = [ l[i:i + 3] for i in range(0, len(l), 3)]
# equivalent to list-comprehension above
udt = []
for i in range(0, len(l), 3):
    udt.append( l[i:i + 3] )

print(udt)

prints

[['Username:traider', 'domain:domain.net', 'TECH-1366'], ['Username:traider1', 'domain:domain.net', 'TECH-1367']]

To print that as desired:

for e in udt:
    print(",".join(map(lambda f:f.split(":")[-1], e)))

prints

traider,domain.net,TECH-1366
traider1,domain.net,TECH-1367

and combined

d = [e.split(":")[-1] for e in txt.split()]
for i in range(0, len(d), 3):
    print( ",".join(d[i:i+3]) )
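The fixed-size slicing above assumes every record contributes exactly three tokens. If a record might be missing its TECH line, one possible variation is to start a new group whenever a `Username:` token appears. This is a sketch only: `parse_records` is a hypothetical helper and the sample string is made up.

```python
def parse_records(text):
    """Group whitespace-separated tokens into records, starting a new
    record at every 'Username:' token (hypothetical helper)."""
    records = []
    for token in text.split():
        if token.startswith('Username:'):
            records.append([])
        if records:  # ignore any tokens that appear before the first Username
            records[-1].append(token.split(':')[-1])
    return records


# Made-up sample in which the second record has no TECH line.
sample = "Username:a\ndomain:x.net\nTECH-1\nUsername:b\ndomain:y.net\n"
rows = [','.join(record) for record in parse_records(sample)]
print('\n'.join(rows))
```

Records without a TECH id simply come out shorter, so the caller can decide whether to skip or flag them.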

edited Apr 13, 2018 at 17:51

answered Apr 12, 2018 at 20:28

6 Comments

Wiktor Stribiżew Over a year ago

It won't work if the input can contain blocks of text without TECH-\d+ pattern at its end.

handle Over a year ago

@Wiktor It won't work for a lot of cases, but that one is not part of the question, nor the sample data...

Milister Over a year ago

I only get TECH-1366/67

handle Over a year ago

@Milister Can you be more specific? And please also provide proper sample data. Please see minimal reproducible example and update your question.

Milister Over a year ago

[[u'TECH-1366']] [[u'TECH-1367']] is all I got. I posted the sample file in the question.

Milister · Accepted Answer · 2018-04-17 11:46:28Z

0

I finally found why nothing above worked: the file contained ^M (carriage return) characters. They are visible only when the file is opened in vim; cat does not show them. Once I removed them with

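import sys

# Redirect printed output to out.txt
sys.stdout = open('out.txt', 'wt')
with open("output.txt", "r") as myfile:
    data = myfile.read()
# Strip the carriage returns (shown as ^M in vim)
print(data.replace('\r', ''))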

and used @Wiktor Stribiżew's code:

people = re.findall(r'\bUsername:(\S+)\s+domain:(\S+)\s+First Name:(\S+)\s+Last Name:(\S+)\s+(TECH-\d+)', data)

I got the desired results. Thanks, everyone!
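To check whether a file carries hidden carriage returns without opening it in vim, one option is to read it in binary mode and look at the repr of the bytes. The sketch below writes a small temporary file with Windows-style line endings (made-up content, not the real output.txt) and then counts and strips the `\r` bytes; `cr_count` and `cleaned` are illustrative names.

```python
import os
import tempfile

# Write a small file with Windows-style line endings (made-up sample content).
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as f:
    f.write(b'Username:traider\r\ndomain:domain.net\r\nTECH-1366\r\n')
    path = f.name

# Binary mode returns the bytes exactly as stored, so '\r' is not translated.
with open(path, 'rb') as f:
    raw = f.read()
os.remove(path)

print(repr(raw))  # the \r bytes are visible in the repr

cr_count = raw.count(b'\r')
cleaned = raw.replace(b'\r', b'').decode('utf-8')
print(cr_count)
```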
