I am extracting some data from an HTML page. My output is:

 0
0
0
0

131997
https://www.google.com.ar/
google.com.ar
 0
0
0
0

134930
https://www.a-a.com/
a-a.com

And I'm looking for this kind of output:

[['0','0','0','0','131997','https://www.google.com.ar/','google.com.ar'],['0','0','0','0','134930','https://www.a-a.com/','a-a.com']]

Here is my Python code:

import requests
from requests.auth import HTTPBasicAuth
from bs4 import BeautifulSoup

sitios = requests.get(url_sitios, auth=HTTPBasicAuth(user, passwd))
html = sitios.text
soup = BeautifulSoup(html, 'lxml')  # use the 'lxml' parser
for item in soup.find_all(['nombre', 'url', 'sitio_id', 'ultimas24hrs']):
    a = item.text + ','
    print(a)

3 Answers

This can be done in two lines using list comprehensions.

Suppose you have your output as a string:

string = '''
 0
0
0
0

131997
https://www.google.com.ar/
google.com.ar
 0
0
0
0

134930
https://www.a-a.com/
a-a.com'''

# Split into lines, strip stray whitespace, and drop empty lines
parts = [line.strip() for line in string.splitlines() if line.strip()]
# Group the flat list into chunks of 7 fields, one chunk per site
list_of_links = [parts[i:i+7] for i in range(0, len(parts), 7)]
print(list_of_links)
[['0', '0', '0', '0', '131997', 'https://www.google.com.ar/', 'google.com.ar'], ['0', '0', '0', '0', '134930', 'https://www.a-a.com/', 'a-a.com']]

This solution may look confusing at first, but it shows that your problem can be solved in two lines.

Read this for details on what the lines above do.
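
If it helps, the two steps can also be wrapped in small functions. This is only a sketch: the names `clean_lines` and `chunk` are made up for illustration, and it assumes every site really has 7 values, as in the sample output from the question.

```python
def clean_lines(text):
    """Return the non-empty lines of `text` with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def chunk(items, size):
    """Split a flat list into consecutive sublists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# One record taken from the question's sample output
sample = " 0\n0\n0\n0\n\n131997\nhttps://www.google.com.ar/\ngoogle.com.ar\n"
rows = chunk(clean_lines(sample), 7)
print(rows)
# [['0', '0', '0', '0', '131997', 'https://www.google.com.ar/', 'google.com.ar']]
```

Note that if the number of values is not a multiple of 7, the last sublist will simply be shorter, so it may be worth checking the lengths before using the result.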

1 Comment

Thanks bro! It is all I need!

You can try something like this:

data = ['0','0','0','0','131997','https://www.google.com.ar/','google.com.ar','0','0','0','0','134930','https://www.a-a.com/','a-a.com']
a = []  # finished groups, one per site
b = []  # group currently being filled
for item in data:
    b.append(item)
    if len(b) == 7:  # a complete site record has 7 fields
        a.append(b)
        b = []
print(a)

a = []
sitios = requests.get(url_sitios, auth=HTTPBasicAuth(user, passwd))
html = sitios.text
soup = BeautifulSoup(html, 'lxml')  # use the 'lxml' parser
for item in soup.find_all(['nombre', 'url', 'sitio_id', 'ultimas24hrs']):
    # Split each tag's text on newlines and collect the pieces as one sublist
    a.append(item.text.split('\n'))
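
The loop above gives one sublist per tag, not one per site. As a rough sketch of how the tag texts could be flattened and regrouped, assuming each site contributes exactly 7 non-empty values (as in the question's sample output): `rows_from_texts` is a hypothetical helper, and the `texts` list below is a stand-in for `[item.text for item in soup.find_all(...)]`, since the real HTML is not shown.

```python
from itertools import chain


def rows_from_texts(texts, fields_per_row=7):
    """Flatten newline-separated tag texts and group them into rows."""
    # A single tag's text may hold several values, one per line
    lines = chain.from_iterable(t.splitlines() for t in texts)
    values = [v.strip() for v in lines if v.strip()]
    if len(values) % fields_per_row:
        raise ValueError("number of values is not a multiple of fields_per_row")
    return [values[i:i + fields_per_row]
            for i in range(0, len(values), fields_per_row)]


# Assumed stand-in for the texts returned by the scraping loop
texts = [" 0\n0\n0\n0\n", "131997", "https://www.google.com.ar/", "google.com.ar",
         " 0\n0\n0\n0\n", "134930", "https://www.a-a.com/", "a-a.com"]
print(rows_from_texts(texts))
```

Raising an error on an incomplete row is a design choice here; silently returning a short last row would hide a site with a missing field.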

1 Comment

Please add some explanation. Code-only answers leave the impression that SO is a code-writing service.
