How to combine string | REGEX

Question

import re

def tst():
  text = '''
  <script>
  '''
  if proxi := re.findall(r"(?:<td\s[^>]*?><font\sclass\=spy14>(.*?)<script.*?\"\+(.*?)\)<\/script)", text):
    for proxy, port in proxi:
      yield f"{proxy}:{''.join(port)}"
    
    if dtt := re.findall(r"<td colspan=1><font class\=spy1><font class\=spy14>(.*?)</font> (\d+[:]\d+) <font class\=spy5>([(]\d+ \w+ \w+[)])", text):
      for date, time, taken in dtt:
        yield f"{date} {' '.join([time, taken])}"
   
    return None
  return None

for proxy in tst():
  print(proxy)

Output that I get:

51.155.10.0:8000
178.128.96.80:7497
98.162.96.41:4145
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)

So I use the regex below to capture groups from the output:

(\w+[.]\w+[.]\w+[.]\w+[:]\w+)|(\w+.*)

I want the result to look like this. How can I combine it from the output?

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

What is dynamic about your input? The number of lines? The order? What is the common pattern to all possible inputs?
– trincot, Commented Oct 29, 2022 at 7:59
Sorry for my bad English. I don't know how to explain it in English, but if you take a look at the full code, maybe it can answer your question.
– xnoob, Commented Oct 29, 2022 at 11:12
A question should contain all the information needed to understand it; it should not be behind a link.
– trincot, Commented Oct 29, 2022 at 11:24

trincot · Accepted Answer · 2022-10-29 11:30:09Z

1

Assuming that the regular expressions in the code at the top of your (edited) question work correctly and both produce the same number of matches, you could use zip to pair them up:

import re

def tst():
    text = '''
    <script>
    '''
    proxi = re.findall(r"(?:<td\s[^>]*?><font\sclass\=spy14>(.*?)<script.*?\"\+(.*?)\)<\/script)", text)
    dtt = re.findall(r"<td colspan=1><font class\=spy1><font class\=spy14>(.*?)</font> (\d+[:]\d+) <font class\=spy5>([(]\d+ \w+ \w+[)])", text)
    if proxi and dtt:
        for (proxy, port), (date, time, taken) in zip(proxi, dtt):
            # " - " separates the proxy from its timestamp, as in the desired output.
            yield f"{proxy}:{port} - {date} {time} {taken}"
   
for proxy in tst():
    print(proxy)
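
The zip approach relies on both lists having the same length; zip stops at the shorter input without any warning, so a missed match would silently drop rows. A minimal sketch of an explicit length check follows. The helper name pair_matches is hypothetical, and the sample values are copied from the question's expected output rather than produced by the real regular expressions.

```python
def pair_matches(proxies, dates):
    """Pair each proxy with the date at the same position.

    Raises ValueError when the two lists differ in length, because
    zip() would otherwise silently drop the unmatched tail.
    """
    if len(proxies) != len(dates):
        raise ValueError(
            f"{len(proxies)} proxies but {len(dates)} dates; cannot pair them"
        )
    return [f"{proxy} - {date}" for proxy, date in zip(proxies, dates)]


# Sample values copied from the question's expected output.
proxies = ["157.245.247.84:7497", "184.190.137.213:8111", "202.149.89.67:7999"]
dates = [
    "27-oct-2022 11:05 (49 mins ago)",
    "27-oct-2022 11:04 (50 mins ago)",
    "27-oct-2022 11:03 (51 mins ago)",
]

combined = pair_matches(proxies, dates)
for line in combined:
    print(line)
```

On Python 3.10 or later, zip(proxies, dates, strict=True) gives a similar guarantee without the manual check.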


Tim Biegeleisen · Accepted Answer · 2022-10-29 07:55:26Z

1

This approach reads all lines into a list, then iterates the IP lines and date lines in tandem to generate the output.

text = '''157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)'''
lines = text.split('\n')
output = []
half = len(lines) // 2  # integer division: range() and list indices need ints in Python 3
for i in range(half):
    val = lines[i] + ' - ' + lines[i + half]
    output.append(val)

print('\n'.join(output))

This prints:

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

Note that this answer assumes each IP line would always have exactly one matching date line. It also assumes that the lines are ordered, and that all IP lines come before the date lines.
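
If the ordering assumption is too strict, one possible variation is to classify each line by its shape instead of its position. This is only a sketch: the IP_PORT pattern and the combine helper are illustrative names, the pattern is based solely on the sample lines in the question, and the code still assumes that the n-th IP line belongs to the n-th date line.

```python
import re

# Assumed shape of an "ip:port" line, based only on the question's sample.
IP_PORT = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}:\d+$")


def combine(text):
    """Split lines into IP lines and other (date) lines, then pair them in order."""
    ips, stamps = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if IP_PORT.match(line):
            ips.append(line)
        else:
            stamps.append(line)
    return [f"{ip} - {stamp}" for ip, stamp in zip(ips, stamps)]


# Interleaved input, to show that position in the text no longer matters.
sample = """157.245.247.84:7497
27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111
27-oct-2022 11:04 (50 mins ago)"""

result = combine(sample)
print('\n'.join(result))
```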


2 Comments

xnoob Over a year ago

I just edited the post. That is the question.

Tim Biegeleisen Over a year ago

@xnoob The preface of how you end up with the lines doesn't affect the validity of my answer.

Ramesh · Accepted Answer · 2022-10-29 08:21:45Z

1

Using regex:

import re

text = '''
157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)
'''
# The port is matched as 1-5 digits so that ports such as 80 or 10808 are not missed.
ip_regex = r"(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}"
time_regex = r"\d{2}-\w+-\d{4}\s\d{2}:\d{2}\s\(.+\)"

ip_list = re.findall(ip_regex, text)
time_list = re.findall(time_regex, text)

for i in range(len(ip_list)):
    print(f'{ip_list[i]} - {time_list[i]}')


>>> 157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
>>> 184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
>>> 202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)


jackal · Accepted Answer · 2022-10-29 08:05:41Z

0

Providing the text is guaranteed to contain N lines of IP addresses followed by N lines of "timestamps" then you could do this:

text = '''157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)'''

lines = text.splitlines()

for ip, t in zip(lines, lines[len(lines)//2:]):
    print(f'{ip} - {t}')

Output:

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)


1 Comment

xnoob Over a year ago

I just edited the post. That is the question.
