Hi guys, I am trying to scrape this JSON-based site using Scrapy and BeautifulSoup:

https://pk.profdir.com/jobs-for-angular-developer-lahore-punjab-cddb

I have written the code below to fetch and read the JSON from the website:

website_text = response.body.decode("utf-8")
jobs_soup = BeautifulSoup(website_text.replace("<", " <"), "html.parser")
script_tag = jobs_soup.find('script', {"type": 'application/ld+json'}).text
data = json.loads(script_tag, strict=False)

But it raises this error again and again:

raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

If anyone knows how to fix this, please help. It would be very helpful for me.
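
The message "Expecting value: line 1 column 1 (char 0)" means the decoder found nothing that can start a JSON value at the very first character, which typically happens when the string is empty or begins with markup. Below is a minimal sketch for reproducing and diagnosing this with the standard library only. The helper name describe_json_failure and the sample strings are made up for illustration and are not taken from the page.

```python
import json


def describe_json_failure(raw):
    """Return None if raw parses as JSON, else a short diagnostic string."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        # repr() makes empty strings, leading markup and stray whitespace visible
        return f"{exc} | input starts with {raw[:40]!r}"
    return None


# Hypothetical inputs: an empty string, a string that starts with markup,
# and a small valid JSON object
empty_case = describe_json_failure("")
markup_case = describe_json_failure("<script>{}</script>")
valid_case = describe_json_failure('{"title": "angular-developer"}')

print(empty_case)
print(markup_case)
print(valid_case)
```

Running a check like this on script_tag before calling json.loads shows whether the string is empty, starts with markup, or is JSON that is broken further in.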

  • script_tag is not JSON. Check its value! Commented Apr 10, 2022 at 20:40
  • @KlausD. I have changed my code to data = json.loads(response.xpath('//script[@type="application/ld+json"]')).get(), but now it raises this error: TypeError: the JSON object must be str, bytes or bytearray, not SelectorList Commented Apr 10, 2022 at 21:22

1 Answer

The JSON inside the <script> tag isn't valid, so the json module cannot decode it as-is. A quick-and-dirty fix is to re-encode the "description" value with re.sub (also, use html5lib as the BeautifulSoup parser):

import re
import json
import requests
from bs4 import BeautifulSoup


url = "https://pk.profdir.com/jobs-for-angular-developer-lahore-punjab-cddb"
soup = BeautifulSoup(requests.get(url).content, "html5lib")

data = soup.select_one('script[type="application/ld+json"]').contents[0]

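# fix "broken" description: capture the raw text between the outer quotes and
# re-encode it with json.dumps so inner quotes and newlines get escaped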
data = re.sub(
    r'(?<="description" : )"(.*?)"(?=,\s+")',
    lambda g: json.dumps(g.group(1)),
    data,
    flags=re.S,
)

data = json.loads(data)

print(json.dumps(data, indent=4))

Prints:

{
    "@context": "http://schema.org/",
    "@type": "JobPosting",
    "title": "angular-developer",
    "description": "<p>Designing and developing user interfaces using Angular best practices\n</p><p>\n</p><p>Adapting interface for modern internet applications using the latest front-end technologies\n</p><p>\n</p><p>Developing product analysis tasks and optimizing the user experience\n</p><p>\n</p><p>Proficiency in Angular, HTML, CSS, and JavaScript for rapid prototyping.\n</p><p>\n</p><p>Integration of APIs and RESTful Services.\n</p><p>\n</p><p>Creating Maintaining Mobile and Website Responsive Design and Mobile website.\n</p><p>\n</p><p>Developing Across Browsers\n</p><p>\n</p><p>Creating tools that improve site interaction regardless of the browser.\n</p><p>\n</p><p>Managing software workflow.\n</p><p>\n</p><p>Following SEO best practices Fixing bugs and testing for usability\n</p><p>\n</p><p>Conducting performance tests\n</p><p>\n</p><p>Consulting with the design team\n</p><p>\n</p><p>Ensuring high performance of applications and providing support\n</p><p>\n</p><p>\n</p><p>Job Requirements:\n</p><p>\n</p><p>\n</p><p>Expert knowledge of HTML5, CSS3\n</p><p>\n</p><p>Strong knowledge of JavaScript\n</p><p>\n</p><p>Experience in JS frameworks Angular\n</p><p>\n</p><p>Familiarity with Software version control systems e.g., Git\n</p><p>\n</p><p>Experience in Node.js\n</p><p>\n</p><p>Having knowledge of AWS environment is a plus\n</p><p>\n</p><p>AlienVault experience is a plus\n</p><p>\n</p><p>Jira Cloud experience is a plus\n</p><p>\n</p><p>Knowledge of CSS Pre-processor technologies including SASS\n</p><p>\n</p><p>Able to quickly transform visual designs into accurate HTML/CSS\n</p><p>\n</p><p>Ability to write high-performance, reusable code for UI components\n</p><p>\n</p><p>Strong understanding of security and performance fundamentals required\n</p><p>\n</p><p>Familiarity with the whole web stack, including protocols and web server optimization techniques\n</p><p>\n</p><p>Great communication skills You&#39;ll be interacting with Product and Development 
teams\n</p><p>\n</p><p>Experience in Grunt, Rollup, or Webpack is a plus\n</p><p>\n</p><p>Good Technical skills, Communication skills, General problem-solving skills, and Coding skills\n</p><p>\n</p><p>Package: Negotiable</p>",
    "identifier": {
        "@type": "PropertyValue",
        "name": "TTS",
        "value": "cddb"
    },
    "datePosted": "2022-02-18T00:00",
    "validThrough": "2022-05-19T00:00",
    "employmentType": "permanent<br>full time",
    "hiringOrganization": {
        "@type": "Organization",
        "name": "TTS",
        "sameAs": "https://pk.profdir.com/companies/tts-ebfu",
        "logo": "https://pk.profdir.com/apple-icon.png"
    },
    "jobLocation": {
        "@type": "Place",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "R Block DHA Phase 2",
            "addressLocality": "Lahore",
            "addressRegion": "Punjab",
            "postalCode": "53720",
            "addressCountry": "PK"
        }
    },
    "baseSalary": {
        "@type": "MonetaryAmount",
        "currency": "PKR",
        "value": {
            "@type": "QuantitativeValue",
            "value": "70000",
            "unitText": "MONTH"
        }
    }
}
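
To see what the re.sub step does without fetching the page, here is a minimal offline sketch. The broken string is a hypothetical, shortened stand-in for the page's JSON-LD, assumed to have the same kind of damage: unescaped double quotes and a raw newline inside the description value. Note that strict=False only permits control characters inside strings; it does not help with unescaped quotes.

```python
import json
import re

# Hypothetical, shortened stand-in for the page's JSON-LD: the description
# holds unescaped double quotes and a raw newline
broken = (
    '{\n'
    '    "title" : "angular-developer",\n'
    '    "description" : "<p>Uses "Angular"\n</p>",\n'
    '    "datePosted" : "2022-02-18T00:00"\n'
    '}'
)

# Even with strict=False the unescaped quotes make decoding fail
try:
    json.loads(broken, strict=False)
    decode_failed = False
except json.JSONDecodeError:
    decode_failed = True

# Same pattern as in the answer: take everything between the quote that opens
# the description and the quote that is followed by a comma and the next key,
# then let json.dumps produce a correctly escaped JSON string
fixed = re.sub(
    r'(?<="description" : )"(.*?)"(?=,\s+")',
    lambda m: json.dumps(m.group(1)),
    broken,
    flags=re.S,
)

data = json.loads(fixed)
print(decode_failed)
print(repr(data["description"]))
```

The pattern relies on the exact spacing in "description" : and on the value ending right before a comma, whitespace, and the next quoted key. If the description itself contained that sequence, the match would stop too early, so this stays a quick fix rather than a general repair.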