I have a dataset that, in part, has a URL field indicating the location of a resource. Some URLs are persistent (e.g. handles and DOIs) and thus, need to be resolved to their original URL. I am primarily working with Python and the solution that seems to work, thus far, involves using the Requests HTTP library.
import requests
var_output_url = requests.get("http://hdl.handle.net/10179/619")
var_output_url.url
While this solution works, it is quite slow as I have to loop through ~4,000 files, each with around 2,000 URLs. Is there a more efficient way of resolving the URL redirects?
I tested my current solution on one batch and it took almost 5 minutes; at this rate, it will take me a couple of days (13 days) to process all the batches [...] I know, it will not necessarily be that long and I can run them in parallel