
I am new to web development and scraping in general, and I am challenging myself by scraping websites like LinkedIn. Since LinkedIn is built with Ember and its element ids (e.g. ember1620) are generated dynamically, it is harder to scrape reliably.

I am trying to scrape the "Experience" section of a LinkedIn profile using the following code:

experience = driver.find_element_by_xpath('//section[@id = "experience-section"]/ul/li[@class="position"]')

The driver has already loaded the entire LinkedIn profile page. I would like to get all the positions under the "experience-section". The error message is:

Unable to locate element: {"method":"xpath","selector":"//section[@id = "experience-section"]/ul/li/div[@class="position"]"}

I am able to scrape other parts of LinkedIn, but the experience section is a big struggle for me. Is the XPath wrong? If so, what should I change?

Thank you

<section id="experience-section" class="pv-profile-section experience-section ember-view"><header class="pv-profile-section__card-header">
  <h2 class="pv-profile-section__card-heading t-20 t-black t-normal">
    Experience
  </h2>

<!----></header>

<ul id="ember1620" class="pv-profile-section__section-info section-info pv-profile-section__section-info--has-no-more ember-view"><li id="ember1622" class="pv-profile-section__sortable-item pv-profile-section__section-info-item relative pv-profile-section__list-item sortable-item ember-view"><div id="ember1623" class="pv-entity__position-group-pager ember-view">            <li id="392598211" class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity ember-view"><!----><a data-control-name="background_details_company" href="/company/8736/" id="ember1626" class="ember-view">      <div class="pv-entity__logo company-logo">
  <img class="lazy-image pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 loaded" alt="Bill &amp; Melinda Gates Foundation" src="https://media.licdn.com/dms/image/C560BAQHvFIyUvuKtQA/company-logo_400_400/0?e=1556755200&amp;v=beta&amp;t=Qhh8_KnrE-OiuXAutFyeI69tgUF3c1ptC9N12siDO4o">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section ">
  <h3 class="t-16 t-black t-bold">Co-chair</h3>

  <h4 class="t-16 t-black t-normal">
    <span class="visually-hidden">Company Name</span>
    <span class="pv-entity__secondary-title">Bill &amp; Melinda Gates Foundation</span>
  </h4>

    <div class="display-flex">
    <h4 class="pv-entity__date-range t-14 t-black--light t-normal">
      <span class="visually-hidden">Dates Employed</span>
      <span>2000 – Present</span>
    </h4>
      <h4 class="t-14 t-black--light t-normal">
        <span class="visually-hidden">Employment Duration</span>
        <span class="pv-entity__bullet-item-v2">19 yrs</span>
      </h4>
  </div>

<!---->
</div>

</a>
<!---->
</li>


</div>
</li><li id="ember1630" class="pv-profile-section__sortable-item pv-profile-section__section-info-item relative pv-profile-section__list-item sortable-item ember-view"><div id="ember1631" class="pv-entity__position-group-pager ember-view">            <li id="392599749" class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity ember-view"><!----><a data-control-name="background_details_company" href="/company/1035/" id="ember1634" class="ember-view">      <div class="pv-entity__logo company-logo">
  <img class="lazy-image pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 loaded" alt="Microsoft" src="https://media.licdn.com/dms/image/C4D0BAQEko6uLz7XylA/company-logo_400_400/0?e=1556755200&amp;v=beta&amp;t=XQhwV5ruWfGBfjgQylV9gkeXD8VnQRBHGd1bOfTs2tw">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section ">
  <h3 class="t-16 t-black t-bold">Co-founder</h3>

  <h4 class="t-16 t-black t-normal">
    <span class="visually-hidden">Company Name</span>
    <span class="pv-entity__secondary-title">Microsoft</span>
  </h4>

    <div class="display-flex">
    <h4 class="pv-entity__date-range t-14 t-black--light t-normal">
      <span class="visually-hidden">Dates Employed</span>
      <span>1975 – Present</span>
    </h4>
      <h4 class="t-14 t-black--light t-normal">
        <span class="visually-hidden">Employment Duration</span>
        <span class="pv-entity__bullet-item-v2">44 yrs</span>
      </h4>
  </div>

<!---->
</div>

</a>
<!---->
</li>


</div>
</li>
</ul>
<!----></section>

Update: I used the solution provided by Sers:

driver.get('https://www.linkedin.com/in/williamhgates/')
experience = driver.find_elements_by_xpath('//section[@id = "experience-section"]/ul//li')
for item in experience:
    print(item.text)
    print("")

but I somehow get every result twice:

Co-chair
Company Name
Bill & Melinda Gates Foundation
Dates Employed
2000 – Present
Employment Duration
19 yrs

Co-chair
Company Name
Bill & Melinda Gates Foundation
Dates Employed
2000 – Present
Employment Duration
19 yrs

Co-founder
Company Name
Microsoft
Dates Employed
1975 – Present
Employment Duration
44 yrs

Co-founder
Company Name
Microsoft
Dates Employed
1975 – Present
Employment Duration
44 yrs

  • Can you please post the HTML where you are having the problem? Commented Jan 27, 2019 at 20:24
  • Added the HTML code. Commented Jan 27, 2019 at 20:32
  • I don't see any elements with @class value equal to "position". Which node are you targeting? Did you mean to test whether it contains "position" in the @class? Commented Jan 28, 2019 at 1:18
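To illustrate the last comment: in XPath, @class="position" compares the whole attribute string, so it can never match class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity ember-view". Below is a minimal offline sketch using Python's standard-library xml.etree.ElementTree on a trimmed, well-formed copy of the posted HTML (the snippet and the has_class helper are illustrative assumptions, not part of the original post). ElementTree's limited XPath has no contains(), so the class-token test is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the posted markup (one position only);
# just enough structure to demonstrate the point, not the real page.
HTML = """
<section id="experience-section">
  <ul>
    <li class="pv-profile-section__sortable-item sortable-item ember-view">
      <div class="pv-entity__position-group-pager ember-view">
        <li class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity ember-view">
          <h3>Co-chair</h3>
        </li>
      </div>
    </li>
  </ul>
</section>
"""

section = ET.fromstring(HTML.strip())

# Like XPath's @class="position": compares the WHOLE attribute string,
# so no <li> matches.
exact = section.findall(".//li[@class='position']")


def has_class(elem, name):
    """Hypothetical helper: True if `name` is one of the whitespace-separated
    tokens in the element's class attribute."""
    return name in elem.get("class", "").split()


# Token test; the usual XPath 1.0 equivalent for Selenium would be
# //li[contains(concat(' ', normalize-space(@class), ' '), ' pv-position-entity ')]
tokens = [li for li in section.iter("li") if has_class(li, "pv-position-entity")]

print(len(exact))                 # 0
print(tokens[0].find("h3").text)  # Co-chair
```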

1 Answer

The problem with your XPath is that the position li is not directly under the ul. Try the XPath below:

//section[@id = "experience-section"]/ul//li
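A short sketch of the difference between "/" (direct children only) and "//" (descendants at any depth), again using the standard-library ElementTree on a trimmed, well-formed copy of the posted markup instead of a live Selenium session (the snippet and the visible_text helper are illustrative assumptions). In the posted HTML each position sits in an inner li nested inside an outer "sortable-item" li, so /ul//li matches both, which is consistent with every position being printed twice in the question's update.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the posted markup: each position is an inner
# <li> nested inside an outer "sortable-item" <li> that is a direct child of <ul>.
HTML = """
<html>
  <section id="experience-section">
    <ul>
      <li class="pv-profile-section__sortable-item sortable-item">
        <div class="pv-entity__position-group-pager">
          <li class="pv-profile-section pv-position-entity"><h3>Co-chair</h3></li>
        </div>
      </li>
      <li class="pv-profile-section__sortable-item sortable-item">
        <div class="pv-entity__position-group-pager">
          <li class="pv-profile-section pv-position-entity"><h3>Co-founder</h3></li>
        </div>
      </li>
    </ul>
  </section>
</html>
"""

doc = ET.fromstring(HTML.strip())

# "/" selects direct children only: the two outer wrapper <li> elements.
children = doc.findall("section[@id='experience-section']/ul/li")

# "//" selects descendants at any depth: both wrappers AND both inner <li>.
descendants = doc.findall("section[@id='experience-section']/ul//li")


def visible_text(elem):
    """Rough stand-in for Selenium's .text: all nested text, whitespace-joined."""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


print(len(children), len(descendants))  # 2 4
for li in descendants:
    print(visible_text(li))  # each title appears twice (wrapper, then inner li)
```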

Update

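from selenium.webdriver.common.by import By

driver.get('https://www.linkedin.com/in/williamhgates/')
# Only the inner position <li> elements carry the class token "pv-profile-section"
experience = driver.find_elements(By.CSS_SELECTOR, '#experience-section .pv-profile-section')
for item in experience:
    print(item.text)
    print("")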
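Why the CSS selector in the update avoids the duplicates: a CSS class selector such as .pv-profile-section matches a whole whitespace-separated class token, and in the posted HTML only the inner position li carries that exact token (the ul and the outer li only have longer names like pv-profile-section__list-item). The sketch below reproduces that selection offline with the standard-library ElementTree on a trimmed copy of the posted markup; the snippet and the has_class helper are illustrative assumptions, and the list comprehension only approximates what the browser's selector engine does.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the posted markup (class lists shortened).
HTML = """
<section id="experience-section" class="pv-profile-section experience-section">
  <ul class="pv-profile-section__section-info section-info">
    <li class="pv-profile-section__sortable-item pv-profile-section__list-item">
      <div class="pv-entity__position-group-pager">
        <li class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity">
          <h3>Co-chair</h3>
          <span class="pv-entity__secondary-title">Bill &amp; Melinda Gates Foundation</span>
        </li>
      </div>
    </li>
    <li class="pv-profile-section__sortable-item pv-profile-section__list-item">
      <div class="pv-entity__position-group-pager">
        <li class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity">
          <h3>Co-founder</h3>
          <span class="pv-entity__secondary-title">Microsoft</span>
        </li>
      </div>
    </li>
  </ul>
</section>
"""

section = ET.fromstring(HTML.strip())


def has_class(elem, name):
    """Hypothetical helper mirroring CSS ".name": matches a whole class token,
    so "pv-profile-section__list-item" does NOT count as "pv-profile-section"."""
    return name in elem.get("class", "").split()


# Rough equivalent of "#experience-section .pv-profile-section": descendants of
# the section (the section itself is excluded, although it has the class too).
positions = [
    e for e in section.iter()
    if e is not section and has_class(e, "pv-profile-section")
]

for li in positions:
    title = li.find(".//h3").text
    company = li.find(".//span[@class='pv-entity__secondary-title']").text
    print(title, "|", company)
# Co-chair | Bill & Melinda Gates Foundation
# Co-founder | Microsoft
```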

I knew it was something basic. It worked! Thank you very much. However, if I use it the following way: experience = driver.find_elements_by_xpath('//section[@id = "experience-section"]/ul//li') I get duplicates when I print the results: for item in experience: print(item.text)
I got the Co-Chair role twice in my results. What could be the problem?
@meecrob share your code. Also, feel free to accept the answer.
I updated the original post. I'll mark your answer as accepted.
@meecrob check my update, I changed xpath to css selector.