
I want to learn Haskell, and I have another small project (currently in Elixir) that I'd like to port as an exercise. It is a simple web scraper that scrapes a list of URLs.

Imagine having a list of zip codes, around 2500 items. For each entry, a web page of the form http://www.acme.org/zip-info?zip={ZIP} should be scraped. I managed to write the code to crawl a single web page using Scalpel.

But how would I go about scraping all 2500 items? In Elixir, I map over the list of postal codes and sleep for 1 second after each page request, just to ease the pressure on the target website. Scraping the website as fast as possible is not important to me.

How would I do this in Haskell? I read about threadSleep, but how do I use it in combination with traversing the list and with the main function, given that sleeping is a side effect?

Thanks for the insights!

1 Answer


Presumably you already have a function like:

scrapeZip :: Zip -> IO ZipResult
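For the examples to type-check on their own, Zip and ZipResult need concrete definitions. The sketch below assumes Zip is a plain String and wraps the result in a hypothetical ZipResult newtype; zipUrl is a hypothetical helper that fills in the URL template from the question. The stub scrapeZip performs no network request. A real version would presumably pass zipUrl zipCode to Scalpel's scrapeURL, which returns a Maybe because scraping can fail.

```haskell
-- Hypothetical types; the real ZipResult depends on what your scraper extracts.
type Zip = String

newtype ZipResult = ZipResult String
  deriving (Show, Eq)

-- Hypothetical helper: builds the page URL from the template in the question.
zipUrl :: Zip -> String
zipUrl zipCode = "http://www.acme.org/zip-info?zip=" ++ zipCode

-- Stand-in for the real Scalpel-based scraper: it does no network I/O
-- and simply returns the URL it would have fetched.
scrapeZip :: Zip -> IO ZipResult
scrapeZip zipCode = return (ZipResult (zipUrl zipCode))
```

In GHCi, `scrapeZip "12345"` should give `ZipResult "http://www.acme.org/zip-info?zip=12345"`.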

Then you can write a function with traverse to get an IO action that returns a list of zip results:

scrapeZips :: [Zip] -> IO [ZipResult]
scrapeZips zipCodes = traverse scrapeZip zipCodes
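To see that traverse in IO runs the actions one after another, left to right, and keeps the results in input order, here is a small sketch that logs each call to an IORef. scrapeZipLogged, scrapeZipsLogged and demoTraverse are hypothetical names used only for this illustration; the stub returns a string instead of scraping anything.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)

type Zip = String

-- Hypothetical stand-in for scrapeZip that records each call in a log,
-- so the order in which traverse runs the actions becomes visible.
scrapeZipLogged :: IORef [Zip] -> Zip -> IO String
scrapeZipLogged logRef zipCode = do
  modifyIORef logRef (++ [zipCode])
  return ("result for " ++ zipCode)

-- traverse runs one action per element, left to right, and collects
-- the results in the same order as the input list.
scrapeZipsLogged :: IORef [Zip] -> [Zip] -> IO [String]
scrapeZipsLogged logRef = traverse (scrapeZipLogged logRef)

-- Returns both the collected results and the order of the calls.
demoTraverse :: IO ([String], [Zip])
demoTraverse = do
  logRef <- newIORef []
  results <- scrapeZipsLogged logRef ["1000", "2000", "3000"]
  calls <- readIORef logRef
  return (results, calls)
```

If you only need the side effects and not the results, traverse_ from Data.Foldable (or mapM_) discards them.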

But you want to add a delay, which can be done using threadDelay (you can import it from Control.Concurrent):

import Control.Concurrent (threadDelay)

scrapeZipDelay :: Zip -> IO ZipResult
scrapeZipDelay zipCode = do
  x <- scrapeZip zipCode
  threadDelay 1000000 -- one second, in microseconds
  return x
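The delay logic does not depend on scraping, so it can be pulled out into a reusable wrapper. The sketch below uses a hypothetical withDelay helper that takes the pause in microseconds, plus a hypothetical timed helper (built on getMonotonicTime from GHC.Clock) that checks the pause really happens. threadDelay never resumes earlier than requested, though it may resume a little later.

```haskell
import Control.Concurrent (threadDelay)
import GHC.Clock (getMonotonicTime)

-- One second expressed in microseconds, the unit threadDelay expects.
oneSecond :: Int
oneSecond = 1000000

-- Hypothetical helper: run an action on one input, then pause for the
-- given number of microseconds before returning the action's result.
withDelay :: Int -> (a -> IO b) -> a -> IO b
withDelay micros action input = do
  result <- action input
  threadDelay micros
  return result

-- Hypothetical helper: returns an action's result and its duration in seconds.
timed :: IO a -> IO (a, Double)
timed action = do
  start <- getMonotonicTime
  result <- action
  end <- getMonotonicTime
  return (result, end - start)
```

With this, scrapeZipDelay could be written as `withDelay oneSecond scrapeZip`, and a shorter pause is easy to pass in while testing.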

And then you can use this scrapeZipDelay with traverse:

scrapeZipsDelay :: [Zip] -> IO [ZipResult]
scrapeZipsDelay zipCodes = traverse scrapeZipDelay zipCodes
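On the question of how this fits into main: the whole traversal is just one IO action, so main can run it with `<-` like any other action and then use the results. The sketch below assumes a hypothetical stub scrapeZip and a hypothetical runScraper function; the delay is a parameter only so the sketch can be exercised quickly. Naming the variable zipCode rather than zip also leaves Prelude's zip free to pair each zip code with its result.

```haskell
import Control.Concurrent (threadDelay)

type Zip = String

-- Hypothetical stub: replace with the real Scalpel-based scraper.
scrapeZip :: Zip -> IO String
scrapeZip zipCode = return ("scraped " ++ zipCode)

-- Scrape one zip code, then pause for `micros` microseconds.
scrapeZipDelay :: Int -> Zip -> IO String
scrapeZipDelay micros zipCode = do
  x <- scrapeZip zipCode
  threadDelay micros
  return x

scrapeZipsDelay :: Int -> [Zip] -> IO [String]
scrapeZipsDelay micros zipCodes = traverse (scrapeZipDelay micros) zipCodes

-- What the program's entry point could do: run the whole batch,
-- then print each zip code next to its result.
runScraper :: Int -> [Zip] -> IO [(Zip, String)]
runScraper micros zipCodes = do
  results <- scrapeZipsDelay micros zipCodes
  let paired = zip zipCodes results
  mapM_ print paired
  return paired
```

A real program might then define `main = void (runScraper 1000000 zipCodes)`, with void from Control.Monad and zipCodes being your list of about 2500 entries.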

Instead of defining a separate scrapeZipDelay function, you can also write a more compact version with the <* operator:

scrapeZipsDelay :: [Zip] -> IO [ZipResult]
scrapeZipsDelay zipCodes = 
  traverse (\zipCode -> scrapeZip zipCode <* threadDelay 1000000) zipCodes
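If the per-item work grows (logging progress, handling failures), a multi-line lambda inside traverse gets awkward. Data.Traversable's for is traverse with the arguments flipped, so the body reads like a loop. The sketch assumes a hypothetical stub scrapeZip and a hypothetical scrapeZipsFor; the progress message is just an example of extra per-item work.

```haskell
import Control.Concurrent (threadDelay)
import Data.Traversable (for)

type Zip = String

-- Hypothetical stub standing in for the real scraper.
scrapeZip :: Zip -> IO String
scrapeZip zipCode = return ("scraped " ++ zipCode)

-- `for xs f` is `traverse f xs`: same sequential behaviour and result
-- order, but the loop body can be a multi-line do block.
scrapeZipsFor :: Int -> [Zip] -> IO [String]
scrapeZipsFor micros zipCodes =
  for zipCodes $ \zipCode -> do
    result <- scrapeZip zipCode
    putStrLn ("done: " ++ zipCode) -- e.g. progress logging
    threadDelay micros
    return result
```

forM from Control.Monad is the equivalent spelling for monads.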

2 Comments

Thank you for this answer, it makes sense. I'll have to read up on the <* operator, but I'll keep that for later; there's enough to think about and learn already :). Thanks again, I'll give it a go and mark it as the answer accordingly.
@JeroenBourgois Not much reading needed, really. If you think of it as being defined as a <* b = do { x <- a; b; return x } -- notice the parallels to scrapeZipDelay -- you will be wrong only in unimportant ways.
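That mental model can be checked directly in IO. The sketch below writes the comment's definition as an ordinary function, apLeft (a hypothetical name), and compares it with the real <* on two actions that log their effects; compareWithApLeft is likewise hypothetical.

```haskell
import Data.IORef (newIORef, modifyIORef, readIORef)

-- The comment's definition of (<*), specialised to IO:
-- run `a`, then run `b`, and return a's result.
apLeft :: IO a -> IO b -> IO a
apLeft a b = do
  x <- a
  _ <- b
  return x

-- Runs the real (<*) and apLeft on the same logging actions and
-- reports whether the results agree, plus the order of the effects.
compareWithApLeft :: IO (Bool, [String])
compareWithApLeft = do
  logRef <- newIORef []
  let say msg = modifyIORef logRef (++ [msg])
      fetch = say "fetch" >> return (42 :: Int)
      pause = say "pause"
  r1 <- fetch <* pause
  r2 <- fetch `apLeft` pause
  effects <- readIORef logRef
  return (r1 == r2, effects)
```

Both versions run the fetch first and the pause second, and both keep the fetch's result, which is exactly what scrapeZipDelay does by hand.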
