I am struggling to understand how to return a single array from a function that calls another function several times. Currently, the console.log in the code below outputs a growing array each time the scrapingfunction runs.

The output from the final run of scrapingfunction is actually what I want, but I want to find a way to return a single array at the end of the hello function so that I can drop each object into my database. I'm guessing I just don't understand JavaScript well enough yet.

const hello = async () => {
      //[launch puppeteer and navigate to page I want to scrape]
      await scrapingfunction(page)
      //[navigate to the next page I want to scrape]
      await scrapingfunction(page)
      //[navigate to the next page I want to scrape]
      await scrapingfunction(page)
    }

const scrapingfunction = async (page) => {
    const html = await page.content()
    const $ = cheerio.load(html)
    const data = $('.element').map((index, element)=>{   
        const dateElement = $(element).find('.sub-selement')
        const date = dateElement.text()
        return {date}
  }).get()
  console.log(data)
}


hello();
  • I removed the puppeteer tag since your question is about using async/await, not anything specific to puppeteer. Commented Dec 11, 2019 at 21:43
  • How did you manage to insert your data into the DB? One array at a time, or did you manage to write a method that merges all the arrays' objects into a single array? I'm facing a similar issue. Thanks! Commented Feb 17, 2021 at 14:04

1 Answer

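The problem you've encountered is that a value produced inside a Promise (which is what async/await uses "under the covers") is only available inside the Promise chain. Code outside the chain cannot read it synchronously; it has to await the Promise or attach a .then() handler.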

Think of it this way.

You ask me to write a StackOverflow article for you and immediately demand the result of that task, without waiting for me to finish it.

When you set me the task, I haven't yet completed it, so I cannot provide a response.

You will need to restructure your code so that each awaited function returns a value, which the surrounding async function can then work with, for example:

// Assume doubleValue() takes some unknown time to return a result, like
// waiting for the result of an HTTP query.

const doubleValue = async (val) => val * 2

const run = async () => {
  const result = []
  result.push(await doubleValue(2))
  result.push(await doubleValue(4))
  result.push(await doubleValue(8))
  console.log(result)
}

which will print [4, 8, 16] to the console.

You might think you could return the result from run() and print it to the console as in:

const run = async () => {
  const result = []
  result.push(await doubleValue(2))
  result.push(await doubleValue(4))
  result.push(await doubleValue(8))
  return result
}
console.log(run())

But run() returns immediately, before any of the awaited work has finished, so console.log cannot print the array. Because an async function always returns a Promise, what actually gets printed is the pending Promise itself, Promise { <pending> }.
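To make the difference concrete, here is a minimal sketch built on the same doubleValue and run functions. The main wrapper is a hypothetical name introduced only for illustration.

```javascript
// doubleValue stands in for any slow asynchronous operation.
const doubleValue = async (val) => val * 2

const run = async () => {
  const result = []
  result.push(await doubleValue(2))
  result.push(await doubleValue(4))
  result.push(await doubleValue(8))
  return result
}

// Calling run() without await hands back a Promise, not the array.
const pending = run()
console.log(pending instanceof Promise) // true

// Hypothetical wrapper: inside another async function, await unwraps the array.
const main = async () => {
  const result = await run()
  console.log(result) // [ 4, 8, 16 ]
  return result
}

main()
```

The array only becomes visible where the Promise is awaited, which is why the calling code has to be async as well (or use .then()).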

The rule is that you can await the result of other functions from within a function marked as async, but you cannot hand the unwrapped result directly to synchronous surrounding code: the caller always receives a Promise.

Since an async function does return a Promise, you could:

run().then(result => console.log(result))

But note that the result never leaves the Promise chain.
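Applied to the code in the question, the same pattern might look like the sketch below. To keep it runnable without a browser, fakePage is a hypothetical stub for the Puppeteer page, and the cheerio extraction is replaced by a simple map. In the real code, the main change would be ending scrapingfunction with return data instead of console.log(data).

```javascript
// Stand-in for the scraping function in the question: it RETURNS one
// { date } object per entry instead of logging them.
const scrapingfunction = async (page) => {
  const dates = await page.content() // stubbed: resolves to an array of strings
  return dates.map((date) => ({ date }))
}

// Hypothetical stub for a Puppeteer page, so the sketch runs without a browser.
const fakePage = (dates) => ({ content: async () => dates })

const hello = async () => {
  const all = []
  // Each call returns that page's array; spread it into one flat list.
  all.push(...(await scrapingfunction(fakePage(['2019-12-01', '2019-12-02']))))
  all.push(...(await scrapingfunction(fakePage(['2019-12-03']))))
  all.push(...(await scrapingfunction(fakePage([]))))
  return all
}

hello().then((rows) => {
  // rows is a single flat array, ready for one batch insert into a database.
  console.log(rows.length) // 3
})
```

Collecting everything in hello and returning it once means the database write can happen in a single batch, inside the .then() callback or after an await in another async function.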

1 Comment

Thank you very much, this is very helpful. I solved it by sending the data to my DB each time scrapingfunction finishes and then clearing the array, but your recommended solution might be a bit leaner, since it would be nice to send it all in one batch.
