
I'm scraping a website and have managed to reduce a variable called "gender" to this:

[<span style="text-decoration: none;">
                        Lass Christian, du Danemark, à Yverdon-les-Bains, avec 200 parts de CHF 100
                    </span>, <span style="text-decoration: none;">associé gérant </span>]

Now I'd like the variable to contain only "associé", but I can't find a way to split this HTML.

The reason is that I want to know whether it is "associé" (male) or "associée" (female).

Does anyone have any ideas?

Cheers

Edit: here is my code, which produces the HTML output above:

import requests
from bs4 import BeautifulSoup

url = "http://www.rc2.vd.ch/registres/hrcintapp-pub/companyReport.action?rcentId=5947621600000055031025&lang=FR&showHeader=false"

r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")
# select_one returns only the first tag that matches the selector
table = soup.select_one("#adm").find_next("table")
table2 = soup.select_one("#adm").find_all_next("table")

# the attribute value is quoted because it contains a colon
output = table.select('td span[style^="text-decoration:"]', limit=2)  #.text.split(",", 1)[0].strip()

print(output)

  • Please show the code that produced this output. Thanks. Commented Sep 15, 2016 at 20:25
  • Yes, sure, I'll edit it right now. Commented Sep 15, 2016 at 21:06

1 Answer

Whatever the parent of the two elements is, you can use the selector span:nth-of-type(2) to get the second span and then check its text:

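from bs4 import BeautifulSoup

html = """<span style="text-decoration: none;">
                        Lass Christian, du Danemark, à Yverdon-les-Bains, avec 200 parts de CHF 100
                    </span>
           <span style="text-decoration: none;">associé gérant </span>"""

# name the parser explicitly; "lxml" matches the one used in the question
soup = BeautifulSoup(html, "lxml")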

text = soup.select_one("span:nth-of-type(2)").text

Or, if it is not always the second span, you can search for the span by the partial text associé:

import re
# the ur"" prefix is Python 2 only; string= is the current name of the older text= argument
text = soup.find("span", string=re.compile(r"associé")).text
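Note that find returns None when no span matches, in which case calling .text on the result raises an AttributeError. A minimal sketch that guards against this, assuming a reduced copy of the markup from the question; the helper name find_role is hypothetical, and html.parser is used here only to avoid the lxml dependency:

```python
import re
from bs4 import BeautifulSoup

# Reduced copy of the markup from the question (assumed structure)
html = """<td><span style="text-decoration: none;">
    Lass Christian, du Danemark, à Yverdon-les-Bains, avec 200 parts de CHF 100
</span>, <span style="text-decoration: none;">associé gérant </span></td>"""

soup = BeautifulSoup(html, "html.parser")


def find_role(soup):
    """Return the stripped text of the first span containing "associé", or None."""
    # The pattern "associé" also matches "associée", since it is a prefix of it
    span = soup.find("span", string=re.compile("associé"))
    if span is None:
        return None
    return span.get_text(strip=True)


role = find_role(soup)
print(role)
```

Returning None instead of raising lets the caller decide how to handle pages where the role is missing.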

For your edit, all you need to do is take the text of the last element and use .split(None, 1)[1] to get the gendered word:

# the attribute value is quoted because it contains a colon
text = table.select('td span[style^="text-decoration:"]', limit=2)[-1].text
gender = text.split(None, 1)[1]  # "gérant " (index 0 would give "associé")
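Once the role text is extracted, deciding between male and female comes down to comparing its first word. A minimal sketch using only the standard library, assuming the text always starts with "associé" or "associée" as in the question; the function name gender_from_role and the "male"/"female" labels are hypothetical:

```python
import unicodedata


def gender_from_role(role_text):
    """Return "male", "female", or None based on the first word of role_text."""
    # NFC-normalize so "e" followed by a combining accent compares equal to "é"
    words = unicodedata.normalize("NFC", role_text).split()
    if not words:
        return None
    first = words[0].lower()
    if first == "associée":
        return "female"
    if first == "associé":
        return "male"
    return None


print(gender_from_role("associé gérant "))
print(gender_from_role("associée gérante"))
```

Comparing the whole first word, rather than testing for the substring "associé", avoids classifying "associée" as male.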

2 Comments

It gives me an error: TypeError: expected string or buffer
@J.jaques, nothing in my code would do that if you used it correctly. What exactly are you passing?
