
I need some help with regexp in Python. I have a string such as:

17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19 
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19

How can I get this list?

['http://example1.com/viewtopic.php?f=8&t=189', 'http://example2.com', 'http://example3.com/threads/example-text-in-url.27304/']

3 Answers


You don't need a regex here; use a CSV parser.

Assuming your data is in a file called data.csv:

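import csv

# Fields are separated by ";"; the URL is the second field (index 1).
with open("data.csv", newline="") as f:
    reader = csv.reader(f, delimiter=";")
    referers = [line[1] for line in reader if len(line) > 1]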
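To try the CSV approach without creating data.csv, the sample lines from the question can be passed to csv.reader through io.StringIO, which behaves like an open text file. This is a minimal sketch; the name `sample` is just for illustration:

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for data.csv.
sample = """17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19
"""

reader = csv.reader(io.StringIO(sample), delimiter=";")
# The URL is the second semicolon-separated field; rows with fewer fields are skipped.
referers = [row[1] for row in reader if len(row) > 1]
print(referers)
```

The URLs contain `?`, `&` and `=` but no double quotes, so the CSV quoting rules don't get in the way here.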

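I'm going to give a regex solution, since that is what you asked for. Basically, all you need to do is capture the text that starts with http:// and ends just before the next ;. The non-greedy .+? makes the match stop at the first semicolon instead of the last one on the line. Below is a demonstration: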

from re import findall

mystr = """
17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19
"""

# Capture from "http://" up to, but not including, the next ";".
print(findall(r"(http://.+?);", mystr))

Output:

['http://example1.com/viewtopic.php?f=8&t=189', 'http://example2.com', 'http://example3.com/threads/example-text-in-url.27304/']
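An equivalent pattern replaces the lazy .+? with a negated character class, [^;\s]+, which by construction cannot run past a semicolon or any whitespace (including a newline), and it needs no capture group because the whole match is the URL. The optional s in https? is an assumption beyond what the question asks for: it also accepts https:// URLs. A sketch:

```python
import re

mystr = """
17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19
"""

# [^;\s]+ matches a run of characters that are neither ';' nor whitespace,
# so each match ends at the semicolon after the URL without a lazy quantifier.
# "https?" (optional 's') is an added assumption so https:// URLs also match.
url_re = re.compile(r"https?://[^;\s]+")
urls = url_re.findall(mystr)
print(urls)
```

Unlike (http://.+?);, this version does not require a trailing semicolon, so it would also pick up a URL at the very end of a line.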


Just try this; maybe it fits your needs. :)

Regex

/^(.*;)/gm

String

17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19 
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19

Matches

1.  [0-66]    `17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;`
2.  [87-129]  `17:22:32;http://example2.com;example2.com;`
3.  [151-228] `20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;`
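The /.../gm notation is JavaScript/PCRE style. In Python's re module, the m flag corresponds to re.MULTILINE (so ^ matches at the start of every line), and the global g behaviour comes from using re.findall. Because .*; is greedy, each match runs to the last semicolon on its line and still contains the timestamp and domain, so the URL has to be pulled out afterwards; splitting on ";" is one way to do it. A sketch, with `text`, `prefixes` and `urls` as illustrative names:

```python
import re

text = """17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19
"""

# re.MULTILINE lets ^ match at the start of each line (the "m" flag);
# findall returns every non-overlapping match (the "g" flag).
prefixes = re.findall(r"^(.*;)", text, re.MULTILINE)

# Each prefix looks like "time;url;domain;", so the URL is field 1.
urls = [p.split(";")[1] for p in prefixes]
print(prefixes)
print(urls)
```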

3 Comments

While a link to a tester is handy, it's a good idea to put your regex in there too. Links expire, for example, and if the link breaks your answer won't be useful anymore.
The {1} is completely redundant.
Yep, sorry about that. ;)
