0

How can you optimize the following code and write less?

class CleanItem():

   def process_item(self, item, spider):

      PSCV = str(item['page_source_canonical']).split("'")[1]

      if PSCV != "":
          if PSCV != item['page_source']:
              item['page_source_canonical_is_itself'] = False
          else:
              item['page_source_canonical_is_itself'] = True
      else:
              item['page_source_canonical_is_itself'] = True

    
      return item

First it checks if it is empty. If it is empty, it is true. If it is not empty it should be checked and if it is the same then again it is true. Otherwise it is false.

1
  • 4
    PSCV in ("", item['page_source']) gives you your desired boolean value directly. Commented Aug 16, 2022 at 21:50

3 Answers 3

5

You wrote

if PSCV != "":
    if PSCV != item['page_source']:
        item['page_source_canonical_is_itself'] = False
    else:
        item['page_source_canonical_is_itself'] = True
else:
        item['page_source_canonical_is_itself'] = True

You want

item['page_source_canonical_is_itself'] =  PSCV in (
    '', item['page_source'])

That's DRY, and very clearly spells out Author's Intent that we shall assign True upon matching either value, else False.


Style nit: PEP 8 asks that you spell it pscv, lowercase.


Consider deleting the unused spider parameter.

Sign up to request clarification or add additional context in comments.

5 Comments

What is this operator called? and do you have an educational resource?
I think you're referring to the in operator: docs.python.org/3/reference/expressions.html#in Also, linter documentation is here: flake8.pycqa.org/en/latest
the "in" operator is used to check if an element is found in an iterable e.g a list, tuple. Here it is checking if PSCV is '' or item['page_source']
@J_H Can you help with this question? stackoverflow.com/q/73389010/8519380
@zaki98's use of return request.replace(...) seems pretty relevant.
2

Yes, you could simplify:

if PSCV != "":
    if PSCV != item['page_source']:
        item['page_source_canonical_is_itself'] = False
    else:
        item['page_source_canonical_is_itself'] = True
else:
    item['page_source_canonical_is_itself'] = True

to:

if PSCV != "" and PSCV != item['page_source']:
    item['page_source_canonical_is_itself'] = False
else:
    item['page_source_canonical_is_itself'] = True

or even further to:

condition = PSCV != "" and PSCV != item['page_source']
item['page_source_canonical_is_itself'] = not condition

4 Comments

Or shorter... item['page_source_canonical_is_itself'] = PSCV == "" or PSCV == item['page_source']
@OneCricketeer That is indeed even shorter, but also 4 characters above the maximum line length of 79, as recommended by PEP
How about? = PSVC in ("", item['page_source'])
Good one, was also thinking about that, but I'd argue it's a tad bit less intuitive
1
if PSCV != "":
       if PSCV != item['page_source']:
           item['page_source_canonical_is_itself'] = False
       else:
           item['page_source_canonical_is_itself'] = True
   else:
           item['page_source_canonical_is_itself'] = True

If I wanted to refactor the code above, two options

  1. The most explanatory (not necessarily the shortest):

    if (PSCV == ""):
          item['page_source_canonical_is_itself'] = True
    elif (PSCV == item['page_source']):
          item['page_source_canonical_is_itself'] = True
    else:
          item['page_source_canonical_is_itself'] = False
    
  2. The shorter - ternary:

    item['page_source_canonical_is_itself'] = ( (PSCV == "") or (PSCV == item['page_source']) )? true: false
    

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.