-1

Sometimes I get the following message:

  in process_item
    item['external_link_rel'] = dict_["rel"]
KeyError: 'rel'

This must be because the rel attribute doesn't exist on some links. I tried to handle the error but failed.

from lxml import etree

class CleanItem():

    def process_item(self, item, spider):

        try:
            root = etree.fromstring(str(item['external_link_body']).split("'")[1])
            dict_ = {}
            dict_.update(root.attrib)
            dict_.update({'text': root.text})
            item['external_link_rel'] = dict_["rel"]
            return item

        except KeyError as EmptyVar:
            if str(EmptyVar) == 'rel':
                dict_["rel"] = "null"
                item['external_link_rel'] = dict_["rel"]
                return item

Most likely the problem is in this line: if str(EmptyVar) == 'rel':
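
That suspicion can be checked in isolation. The following is a minimal sketch in plain Python, outside Scrapy, with a hypothetical attrs dict standing in for the parsed attributes. str() of a KeyError returns the repr of the missing key, quotes included, so comparing it with 'rel' is False; in that case the except branch assigns nothing and process_item returns None. The key itself is available as the first element of the exception's args.

```python
# Minimal sketch: how a KeyError for 'rel' compares as a string.
attrs = {"href": "https://example.com"}  # hypothetical attributes without 'rel'

try:
    attrs["rel"]
except KeyError as exc:
    as_text = str(exc)         # the repr of the key, quotes included
    missing_key = exc.args[0]  # the key itself

matches_text = as_text == "rel"     # False: the quotes are part of the string
matches_key = missing_key == "rel"  # True
```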


I would appreciate guidance on how to perform an operation only when this specific error occurs.
I did a lot of research before asking but did not reach a conclusion.
For context, the code above is in the pipelines.py file of a Scrapy project.

2 Answers

1

A better way to do it is to use the dictionary method get, which returns a default value when the key is missing. It is described in the Python documentation for dict.get.

from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        root = etree.fromstring(str(item['external_link_body']).split("'")[1])
        dict_ = {}
        dict_.update(root.attrib)
        dict_.update({'text': root.text})
        item['external_link_rel'] = dict_.get("rel", "null")
        return item
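
To see the behaviour of get on its own, here is a minimal sketch that uses plain dictionaries in place of root.attrib. The helper name rel_or_default and the sample attribute values are assumptions made for illustration only.

```python
def rel_or_default(attrs, default="null"):
    """Return attrs['rel'] when present, otherwise the given default."""
    return attrs.get("rel", default)


# 'rel' is present: its value is returned unchanged.
with_rel = rel_or_default({"href": "/a", "rel": "nofollow"})

# 'rel' is absent: the default is returned and no KeyError is raised.
without_rel = rel_or_default({"href": "/a"})
```

Because get never raises for a missing key, no try/except is needed around the lookup.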

1 Comment

Can you help with this question? stackoverflow.com/q/73389010/8519380
1

Why not just use a conditional statement?

from lxml import etree

class CleanItem():
    def process_item(self, item, spider):
        root = etree.fromstring(str(item['external_link_body']).split("'")[1])
        dict_ = {}
        dict_.update(root.attrib)
        dict_.update({'text': root.text})
        if 'rel' not in dict_:            # If 'rel' is not a key in dict_
            dict_["rel"] = "null"         # store the placeholder value
        item['external_link_rel'] = dict_["rel"]
        return item
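
The check above can also be written with dict.setdefault, a different technique from the explicit if: it stores the default only when the key is absent and returns whichever value ends up in the dictionary. This is a minimal sketch with hypothetical sample dictionaries, not part of the original pipeline.

```python
attrs = {"href": "/a"}                 # hypothetical attributes without 'rel'
rel = attrs.setdefault("rel", "null")  # inserts "null" because 'rel' is absent

existing = {"rel": "nofollow"}
kept = existing.setdefault("rel", "null")  # leaves the existing value alone
```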

If you really wanted to use try/except clauses you could do this. I would never recommend using try/except where it isn't necessary though.

def process_item(self, item, spider):
    root = etree.fromstring(str(item['external_link_body']).split("'")[1])
    dict_ = {}
    dict_.update(root.attrib)
    dict_.update({'text': root.text})
    try:
        item['external_link_rel'] = dict_["rel"]
        return item
    except KeyError:
        dict_["rel"] = "null"
        item['external_link_rel'] = dict_["rel"]
        return item
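
If the try block ever covers more than one lookup, the handler can check which key was missing and re-raise for anything else. The following is a minimal sketch under stated assumptions: the function name copy_link_fields and the external_link_href field are hypothetical, and plain dictionaries stand in for the Scrapy item and the parsed attributes.

```python
def copy_link_fields(item, attrs):
    """Copy href and rel into item, defaulting rel to "null" when it is missing."""
    try:
        item["external_link_href"] = attrs["href"]  # hypothetical required field
        item["external_link_rel"] = attrs["rel"]
    except KeyError as exc:
        # exc.args[0] is the missing key itself; str(exc) would include quotes.
        if exc.args[0] != "rel":
            raise  # some other key is missing, so do not hide the error
        item["external_link_rel"] = "null"
    return item


result = copy_link_fields({}, {"href": "/a"})
```

A missing rel is replaced by "null", while a missing href still surfaces as a KeyError.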

6 Comments

Thanks. Can't it be done with a try/except statement?
@Sardar Sure it could, but then you are raising an exception and catching it, which creates unnecessary overhead and makes the code less readable as well.
@Sardar See the update to the answer for an example.
There is one problem: the except clause runs whenever any KeyError occurs, and it may not be a KeyError for rel.
That is why the conditional expression is a much better solution. Otherwise you would need to raise the exception yourself with a message, in which case you would still be using a conditional expression.