
I'm trying to get information from the Experience and Education sections.

For example, from this LinkedIn profile: https://www.linkedin.com/in/kendra-tyson/

I want to get all the information in the Experience section and the Education section.

For now I've been working on the Experience section. I want to get the container that holds all the different jobs in the Experience section, so that I can iterate through it and get the individual jobs (i.e. Talent Acquisition & Human Resources Manager, Technical Recruiter).

I'm finding elements by XPath with Selenium, but it times out / doesn't find the XPath.

   experience = wait.until(EC.visibility_of_all_elements_located((By.XPATH, make_xpath_experience)))

The XPaths that I have tried are:

make_xpath_experience = "//div[@id='experience']/div[.//h2[text()='Experience']]//ul[contains(@class, 'pvs-list')]"
make_xpath_experience = "//section[@id='experience']//li[contains(@class, 'pvs-list__outer-container')]"

I also tried a CSS selector per this Stack Overflow question (Linkedin Webscrape w Selenium), updating it because the parameters used in that answer are no longer available:

experience = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#experience . pvs-list__outer-container')))

I also tried following this GeeksforGeeks tutorial with BeautifulSoup (https://www.geeksforgeeks.org/scrape-linkedin-using-selenium-and-beautiful-soup-in-python/), but the information is outdated and does not work.

How can I target the Experience section of the profile and then extract the individual jobs and their information (i.e. full time, timeline, location)?

  • Can't you do this more easily using the API instead of web scraping? Commented Feb 22, 2023 at 23:42
  • I cannot find good enough documentation/examples to make the API work for me so I am using selenium and python. Commented Feb 23, 2023 at 2:12

1 Answer


The following code creates a dictionary and populates it with each job's name, company, date, location and description.

from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# Assumes `driver` is an existing webdriver instance with the profile page already loaded
exp = {key: [] for key in ['job', 'company', 'date', 'location', 'description']}

# Each <li> inside the section containing #experience is one job entry
jobs = driver.find_elements(By.CSS_SELECTOR, 'section:has(#experience)>div>ul>li')
for job in jobs:
    exp['job']     += [job.find_element(By.CSS_SELECTOR, 'span[class="mr1 t-bold"] span').text]
    exp['company'] += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal"] span').text]
    exp['date']    += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"] span').text]
    # Location and description are not present on every profile, hence the try/except
    try:
        exp['location'] += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"]:nth-child(4) span').text]
    except NoSuchElementException:
        exp['location'] += ['*missing value*']
    try:
        exp['description'] += [job.find_element(By.CSS_SELECTOR, 'ul li ul span[aria-hidden=true]').text]
    except NoSuchElementException:
        exp['description'] += ['*missing value*']

import pandas as pd
pd.DataFrame(exp)


If you want, you can export the table to a CSV file.
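For example, a minimal export with pandas (the file name experience.csv is just an example):

pd.DataFrame(exp).to_csv('experience.csv', index=False)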

Update 3

Using JavaScript, we can avoid the try/except blocks. If location or description is missing, the value will be None instead of *missing value*.

exp = {key: [] for key in ['job', 'company', 'date', 'location', 'description']}
jobs = driver.find_elements(By.CSS_SELECTOR, 'section:has(#experience)>div>ul>li')
for job in jobs:
    exp['job']     += [job.find_element(By.CSS_SELECTOR, 'span[class="mr1 t-bold"] span').text]
    exp['company'] += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal"] span').text]
    exp['date']    += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"] span').text]
    # JavaScript optional chaining (?.) makes execute_script return None when the element is missing
    exp['location']    += [driver.execute_script('return arguments[0].querySelector("span[class=\'t-14 t-normal t-black--light\']:nth-child(4) span")?.innerText', job)]
    exp['description'] += [driver.execute_script('return arguments[0].querySelector("ul li ul span[aria-hidden=true]")?.innerText', job)]
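As in the first snippet, the dictionary can be turned into a DataFrame; rows with a missing location or description will contain None:

import pandas as pd
pd.DataFrame(exp)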


The JavaScript command

driver.execute_script('return arguments[0].querySelector(css_selector)?.innerText', arg0)

works as follows:

  • arguments[0] is a placeholder for arg0. In our case arg0 is a WebElement.

  • arguments[0].querySelector(css_selector) searches for the element matching css_selector inside the element arg0. If you replace arguments[0] with document, it searches the whole HTML document instead (see the short example after this list).

  • .innerText extracts the text contained inside the node found by querySelector.

  • ?.innerText means that .innerText is evaluated only if querySelector finds something.

  • execute_script returns None if querySelector doesn't find anything; thanks to this we can avoid the try/except blocks.
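For instance, a minimal sketch of the document variant (the h1 selectors here are arbitrary and only for illustration):

# Search the whole page instead of a single element; ?. makes the call return None when nothing matches
heading_text = driver.execute_script('return document.querySelector("h1")?.innerText')
no_match     = driver.execute_script('return document.querySelector("h1.does-not-exist")?.innerText')  # None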


Comments

This works and it's what I'm looking for, but could you explain [job.text.split('\n')[0] for job in jobs] further? Specifically the inline for loop. Also, I understand that the [0] in that code says to give the first position of the job section, but if I wanted to modify it to get, for example, the years in that job or the second position from the job section, how can I modify it to also get that information? Does the inline for loop make a difference? Would getting the second line of the job description simply be [job.text.split('\n')[1] for job in jobs] under the first line?
@JesperEzra (1/2) That kind of loop is called a "list comprehension"; it is the one-line equivalent of a for loop that collects job.text.split('\n')[0] for every job (see the sketch after these comments). About getting the second line: yes, in principle with [i] you get the i-th line, but that is not always true, see for example the code I added in the answer. However, using .text.split('\n') is the worst thing one can do.
@JesperEzra (2/2) The good practice would be to select each element by a unique attribute, but LinkedIn doesn't use good class names, and moreover elements are nested inside many other elements, so it's a bit annoying to find a good XPath. For example the company is inside a <span class="t-14 t-normal">, year and location are inside <span class="t-14 t-normal t-black--light"> and so on, so from the class names you don't understand what's inside them. Anyway, I will try to do it.
@JesperEzra Look at the code in update 2.
This is what I was looking for, thank you! I have a question: some profiles don't include a location or description. To account for that, I included a try/except for those two fields. Is there a more efficient/better way to handle both cases, where a profile might or might not include a location/description?
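A minimal sketch of the equivalence mentioned in the comments above (first_lines is a hypothetical name; job.text.split('\n')[0] takes the first line of each job entry's visible text):

# List comprehension: collect the first line of each job's visible text
first_lines = [job.text.split('\n')[0] for job in jobs]

# Equivalent explicit loop
first_lines = []
for job in jobs:
    first_lines.append(job.text.split('\n')[0])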
