I have the following class methods to parse an individual URL:

product = Product(links[0], user_agents)
result = product.parse()

and class code:

class Product:
    soup = None
    url = None

    def __init__(self, url, user_agents):
        self.url = url
        print('Class Initiated with URL: {}'.format(url))
        # Randomize the user agent
        user_agent = get_random_user_agent(user_agents)
        user_agent = user_agent.rstrip('\n')

        if 'linux' in user_agent.lower():
            sec_ch_ua_platform = 'Linux'
        elif 'mac os x' in user_agent.lower():
            sec_ch_ua_platform = 'macOS'
        else:
            sec_ch_ua_platform = 'Windows'

        headers = {
            
        }
        r = create_request(url, None, headers=headers, is_proxy=False)
        if r is None:
            raise ValueError('Could not get data')
        html = r.text.strip()
        self.soup = BeautifulSoup(html, 'lxml')

    def parse(self):
        record = {}
        name = ''
        price = 0
        user_count_in_cart = 0
        review_count = 0
        rating = 0
        is_personalized = 'no'

        try:
            name = self.get_name()
            price = self.get_price()
            is_pick = self.get_is_pick()

Now I want to call parse() for many URLs using multiprocessing. How do I do it? For a single record, I currently do this:

product = Product(links[0], user_agents)
result = product.parse()
  • FWIW, that's a "normal" method, not a class method (which are noted by the classmethod decorator) Commented Sep 27, 2022 at 13:45
  • If you are ok with modifying Product so that __init__ 1) Uses a default user_agents value that is fixed between all executions, and 2) It calls self.parse, then you can just do multiprocessing.pool.Pool().map_async(Product, links) (or using any other function in the Pool arsenal) Commented Sep 27, 2022 at 13:58
  • @DeepSpace that makes sense. I will change and update you. Commented Sep 27, 2022 at 14:07
  • @DeepSpace should not map_async(Product, links) be map_async(Product.parse, links)? Commented Sep 27, 2022 at 14:22
  • @DeepSpace I did this result.extend(p.map(product.parse, links)) and it gives error: TypeError: parse() takes 1 positional argument but 2 were given. parse has the following signature: def parse(product_url): Commented Sep 27, 2022 at 14:23

1 Answer

With the current class, you may need to create a function that takes a url, creates product = Product(url, ...), and runs product.parse(). You can then pass this new function to .map(new_function, links).

Something like this:

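import multiprocessing.pool

def check(url):
    # user_agents is read as a module-level global here
    product = Product(url, user_agents)
    return product.parse()

if __name__ == '__main__':
    # "with", not "for": the pool is used as a context manager
    with multiprocessing.pool.Pool() as p:
        results = p.map(check, links)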
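
A slightly fuller sketch of the same idea, in case you would rather pass user_agents explicitly than rely on a global. Note the assumptions: the Product class below is only a stand-in with no network access (the real one is in the question), and parse_all, the example URLs, and the user agent string are made-up names for illustration. It also assumes a failed request should yield None, based on the ValueError raised in the question's __init__.

```python
from functools import partial
from multiprocessing import Pool


class Product:
    """Stand-in for the Product class from the question (no network access)."""

    def __init__(self, url, user_agents):
        self.url = url
        self.user_agent = user_agents[0]

    def parse(self):
        # The real parse() would read fields from self.soup instead.
        return {"url": self.url, "user_agent": self.user_agent}


def check(url, user_agents):
    """Build one Product and parse it; this runs inside a worker process."""
    try:
        return Product(url, user_agents).parse()
    except ValueError:
        # The question's __init__ raises ValueError when the request fails.
        return None


def parse_all(links, user_agents, processes=2):
    """Parse every link in a pool of worker processes."""
    # partial() binds user_agents so that map() only has to supply the url.
    worker = partial(check, user_agents=user_agents)
    with Pool(processes=processes) as pool:
        return pool.map(worker, links)


if __name__ == "__main__":
    links = ["https://example.com/a", "https://example.com/b"]
    user_agents = ["Mozilla/5.0 (X11; Linux x86_64)"]
    for record in parse_all(links, user_agents):
        print(record)
```

pool.map returns the results in the same order as links. The if __name__ == "__main__" guard matters on platforms that start workers by launching a fresh interpreter (Windows and macOS by default), because the module is imported again in each worker.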