0
import requests
from bs4 import BeautifulSoup
import pandas as pd
start_url="https://www.indeed.co.in/jobs?q=teacher&l=India"
page_data=requests.get(start_url)
soup=BeautifulSoup(page_data.content,"html.parser")
Title=[]
Company=[]
Salary=[]
Summary=[]
Location=[]
link_to_apply=[]

for job_tag in soup.find_all("div",class_="jobsearch-SerpJobCard unifiedRow row result"):
    title=job_tag.find("h2",class_="title")
    company=job_tag.find("span",class_="company")
    location=job_tag.find(class_="location accessible-contrast-color-location").text.strip()
    summary=job_tag.find("div",class_="summary")
    link=job_tag.find("a",href=True)
    base_url="https://www.indeed.com"
    final_link=base_url+link["href"]

    Title.append(title.text.strip())
    Company.append(company.text.strip())
    Location.append(location)
    Summary.append(summary.text.strip())
    link_to_apply.append(final_link)


data=list(zip(Title,Company,Location,Summary))
p=pd.DataFrame(data,columns=["Title","Company","Location","Summary"])
p.to_csv("new.csv")
pd.set_option("display.max_colwidth",None)
pd.set_option("display.max_rows",None)
pd.set_option("display.max_rows",None)
pd.set_option("display.width",None)
pd.set_option('display.max_columns',None)
the output of the following code

                                                                 Title  \
0                                              all Subject Teacher\nnew   
1                                           Part time Teacher / Trainer   
2   Online Tutor / Teachers - Women Candidates only - Non Metro...\nnew   
3                                          Science & Maths Teacher\nnew   
4                                                Primary School Teacher   
5                                                Preschool Teacher\nnew   
6       Online Tutor / Teachers - Women Candidates only- Across Indi...   
7                                                         Maths Teacher   
8                               wanted faculty for teaching cbse school   
9                                               Primary English Teacher   
10                                                 Training Facilitator   
11                                                         Head Teacher   
12                                                    Math Teacher\nnew   
13                                                     Teacher for Kids   
14                                            Preschool Teacher - Vizag   

                                                  Company  \
0                                               Home Guru   
1                                               Home Guru   
2                 Whitehat Education Technology Pvt. Ltd.   
3                                               Home Guru   
4                       Newdimension International School   
5                                               Home Guru   
6                 Whitehat Education Technology Pvt. Ltd.   
7                            Dheeraj International School   
8   TILFORD SCHOOL,JANGAREDDIGUDEM,W.G.DIST,ANDHRA PRA...   
9                                       Stones2Milestones   
10                                      Stones2Milestones   
11                                           Wunderschool   
12                             GEMS Public School,Patiala   
13                                            ANAR EdTech   
14                                        Koala Preschool   

                                  Location  \
0                                   Remote   
1                                   Remote   
2                            Kochi, Kerala   
3                                   Remote   
4                  Bhongir, Andhra Pradesh   
5                      Mumbai, Maharashtra   
6                     Bengaluru, Karnataka   
7                        Pune, Maharashtra   
8   Jangareddi Gudem Bazar, Andhra Pradesh   
9                         Gurgaon, Haryana   
10                        Gurgaon, Haryana   
11                  Chandigarh, Chandigarh   
12                Univ P O Patiala, Punjab   
13                           Kochi, Kerala   
14           Visakhapatnam, Andhra Pradesh   

                                                                                                                                                                Summary  
0    We are recognized leaders in one to one coaching of all subjects & Courses, both online and offline.\nStudents and Gurus from all across India are invited to get…  
1    We are recognized leaders in one to one coaching of all subjects & Courses, both online and offline.\nStudents and Gurus from all across India are invited to get…  
2              Teacher, Work from Home, Online Tutor, Teaching, Home Tutor, coding teacher, computer teacher, home teacher, kids teacher.\nYou've found your dream job.  
3    We are recognized leaders in one to one coaching of all subjects & Courses, both online and offline.\nStudents and Gurus from all across India are invited to get…  
4      Proven experience as a teacher.\nCollaborate with other teachers, parents and stakeholders and participate in regular meetings.\nTotal work: 1 year (Preferred).  
5     We Provide Virtual recorded sessions of Preschool curriculum,.\nHave to record Virtual Classes of preschool curriculum.\nMust have preschool teaching experience.  
6              Teacher, Work from Home, Online Tutor, Teaching, Home Tutor, coding teacher, computer teacher, home teacher, kids teacher.\nYou've found your dream job.  
7                       We are looking for qualified and experienced candidates who can join immediately for the following posts at Dheeraj International School, Pune.  
8                                    WANTED FACULTY FOR TEACHING @CBSE SCHOOL.\nSalary not constraint for deserving candidates. *.\nSalary: Up to ₹50,000.00 per month.  
9        We are looking to bring into our fold an incredible teacher/facilitator to conduct online classes for children of ages 6-10.\nWorking with: Consumer App Team.  
10      We are looking to bring into our fold an incredible teacher/facilitator to conduct online classes for children of ages 6-10.\nWorking with: Consumer App Team*.  
11       The Head teacher shall influence the thinking and practice of students, teachers and parents.\nPlanning, Documentation , Evaluation, Mentorship, Guidance and…  
12  Welcome to Gems Public School, Patiala...\nGEMS is an international education company.\nIt is a global advisory and educational management firm, with a network of…  
13     We are* looking for dynamic and vibrant lady tutors with a passion for inspiring our kids to flourish and reach their potentials.\nWork at your convenient time.  
14    Our growing Preschool facility is looking for a Young energetic Female Preschool Teacher who can help to create a fun environment and incorporate educational…  

i want the ouput to be in a proper tabular column.Also the table needs to be saved to a CSV FILE thats the reason why i choose pandas.please reply fast and thanks guys anybody know why i am not geting proper dataframe? of pandas as output is there any other way of doing so? also in created csv file the output is coming properly but in python its not

3
  • also see that some of the data is truncated ie its coming as dots tell me a way to fix that also Commented Sep 4, 2020 at 20:36
  • Pandas is not intended for fancy output. Open the CSV file in Excel and make it look the way it suits you. Commented Sep 4, 2020 at 20:38
  • @DYZ see the data is coming properly in csv file as a proper tabular columns but when i display in python it its not coming, and pandas has problem for output only in normal python idle ..i am using jupyter notebook! Commented Sep 5, 2020 at 7:10

1 Answer 1

2

For the most part, the current code seems to work. If you remove the newlines, that may help. As for the '...', the data is truncated on the website. You will need a web automation tool like Selenium to click each link. You can use tabulate to format the output table in the console.

Here is the code with newlines removed and indents fixed.

import requests
from bs4 import BeautifulSoup
import pandas as pd

start_url="https://www.indeed.co.in/jobs?q=teacher&l=India"
page_data=requests.get(start_url)
soup=BeautifulSoup(page_data.content,"html.parser")
Title=[]
Company=[]
Salary=[]
Summary=[]
Location=[]
link_to_apply=[]

for job_tag in soup.find_all("div",class_="jobsearch-SerpJobCard unifiedRow row result"):
    title=job_tag.find("h2",class_="title")
    company=job_tag.find("span",class_="company")
    location=job_tag.find(class_="location accessible-contrast-color-location").text.strip()
    summary=job_tag.find("div",class_="summary")
    link=job_tag.find("a",href=True)
    base_url="https://www.indeed.com"
    final_link=base_url+link["href"]

    Title.append(title.text.replace('\n'," ").strip())
    Company.append(company.text.replace('\n'," ").strip())
    Location.append(location.replace('\n'," "))
    Summary.append(summary.text.replace('\n'," ").strip())
    link_to_apply.append(final_link.replace('\n'," "))
    
data=list(zip(Title,Company,Location,Summary))
p=pd.DataFrame(data,columns=["Title","Company","Location","Summary"])
p.to_csv("new.csv", index=False)
pd.set_option("display.max_colwidth",None)
pd.set_option("display.max_rows",None)
pd.set_option("display.max_rows",None)
pd.set_option("display.width",None)
pd.set_option('display.max_columns',None)
pd.options.display.max_colwidth = None
print(p.to_string(index=False))

# from tabulate import tabulate
# print(tabulate(p, headers='keys', tablefmt='psql'))  # terminal
#p   # Jupyter only, direct output
print(formatdf(p, 25))   # Idle

Output (Jupyter, truncated)

Jupyter

For Idle, I had to write a function to format the data. This will wrap the column data in the dataframe based on a maximum column width.

def formatdf(df, mxcolwidth):
    outstr = ''
    for c in df.columns:
        df[c] = df[c].str.wrap(mxcolwidth)  # insert newlines
        
    # get max width for each column
    wdic = {}
    for c in df.columns:
       s = df[c]
       mx = 0
       for r in s:
           for ln in r.split('\n'):
               if len(ln) > mx:
                   mx = len(ln)
       wdic[c] = mx  # dictionary, max line length of each column

    # create row divider string 
    rowstr = ''
    for c in df.columns:
        rowstr += '-' * (wdic[c])
    rowstr += '-' * (len(df.columns)*3) + '-\n'


    outstr += '| '  # start row line
    # column headers
    for c in df.columns:
        outstr += c.ljust(wdic[c]) + ' | '
    outstr += '\n'
    outstr += rowstr
    
    # each row in dataframe
    for ir, r in df.iterrows():
        mxln = 0
        for c in df.columns:  # get mxx lines for this data row
            lncnt = len(r[c].split('\n'))
            if lncnt > mxln: mxln = lncnt
            
        for i in range(mxln): # for each line in data cell
            for ic, c in enumerate(df.columns):  # each column
                if ic == 0: outstr += '| '   # left border of table
                lns = r[c].split('\n')  # split, each line of text in data cell
                if i < len(lns):
                    outstr += lns[i].ljust(wdic[c]) + ' | '  # single line of text
                else:
                    outstr += " ".ljust(wdic[c]) + ' | '   # empty line
            outstr+= "\n"
        outstr += rowstr  # row divider
              
    return outstr

Output (Idle, truncated)

| Title                     | Company                   | Location                 | Summary                   | 
----------------------------------------------------------------------------------------------------------------
| Online Tutor / Teachers - | Whitehat Education        | Srinagar, Jammu and      | Teacher, Work from Home,  | 
| Women Candidates only -   | Technology Pvt. Ltd.      | Kashmir                  | Online Tutor, Teaching,   | 
| Non Metro... new          |                           |                          | Home Tutor, coding        | 
|                           |                           |                          | teacher, computer         | 
|                           |                           |                          | teacher, home teacher,    | 
|                           |                           |                          | kids teacher. You've      | 
|                           |                           |                          | found your dream job.     | 
----------------------------------------------------------------------------------------------------------------
| Post Graduate Teacher and | North Eastern Railway     | Gorakhpur, Uttar Pradesh | North Eastern Railway     | 
| Trained Graduate Teacher  |                           |                          | Recruitment 2020 - Post   | 
| new                       |                           |                          | Graduate Teacher and      | 
|                           |                           |                          | Trained Graduate Teacher  | 
|                           |                           |                          | Vacancies - Apply         | 
|                           |                           |                          | NowNorth Eastern Railway  | 
|                           |                           |                          | Recruitment 2020-21:…     | 
----------------------------------------------------------------------------------------------------------------
Sign up to request clarification or add additional context in comments.

12 Comments

hey thanks the for your response ..1 quetsion though why cant i view the entire table that was printed using tabulate in both my python idle and jupyter notebook .but when i just copied the code to reply to your answer in stackoverflow it was shoing proper table with scrollbars right and below .why didnt it come in the jupyter notebook then ?
did you try the output in jupyter notebook or some other way..
sorry for the typos in the comment !!
If I just add p to the bottom of the script in jupyter, it outputs nicely.
yep it works. thank you !!...is there a way for the tabulate to also work proper way in jupyter notebook just asking you know
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.