I have the following data frame:

     rowid                                                url                                               text  domain_id            domain_id_label  ...   punsafe  pwatermark  aesthetic                 hash  __index_level_0__
0        1  https://cdn.idahopotato.com/cache/4075b86c99bc...               Fattoush Salad with Roasted Potatoes          1        cdn.idahopotato.com  ...  0.000019    0.042545   6.098400 -7769833748550554891                113
1        2  https://lh3.googleusercontent.com/-Gw5LBM0zYU8...  an analysis of self portrayal in novels by vir...          2  lh3.googleusercontent.com  ...  0.000002    0.405680   6.109017  8675719636262469033                877
2        3  https://www.mediaplaynews.com/wp-content/uploa...  Christmas Comes Early to U.K. Weekly Home Ente...          3      www.mediaplaynews.com  ...  0.023502    0.408992   6.023093  -510709293545570516                952
3        4  https://statesofincarceration.org/sites/defaul...  Amy Garcia Wikipedia a legacy of reform: dorot...          4  statesofincarceration.org  ...  0.000006    0.155641   6.431951  7982521258241828259               1163
4        5  https://cdn.shopify.com/s/files/1/0094/8653/26...                  3D Metal Cornish Harbour Painting          5            cdn.shopify.com  ...  0.000008    0.109816   6.167709 -2541341491343729392               1431
..     ...                                                ...                                                ...        ...                        ...  ...       ...         ...        ...                  ...                ...
995    996  https://i.pinimg.com/736x/c6/35/8e/c6358ecfe2e...  Fashion Photography vs Amazing Interiors // Mo...         24               i.pinimg.com  ...  0.777287    0.157218   6.396332 -9073600318089725879             171799
996    997  https://www.twi-ny.com//wp-content/uploads/201...  Takashi Miike riffs on multiple genres in the ...        594             www.twi-ny.com  ...  0.015503    0.081062   6.120159  4126112080526841162             172272
997    998  https://us.123rf.com/450wm/nyul/nyul1405/nyul1...  Portrait of happy casual caucasian married cou...         16               us.123rf.com  ...  0.881655    0.343428   6.009459  9208056874965420704             172375
998    999  https://t3.ftcdn.net/jpg/00/65/41/20/240_F_654...  Idyllic summer landscape with mountain lake an...         64               t3.ftcdn.net  ...  0.000010    1.000000   6.374364  4701612357070778743             173088
999   1000  https://i.pinimg.com/736x/8b/5f/56/8b5f565710c...  Beards change everything. Colin Morgan is not ...         24               i.pinimg.com  ...  0.020406    0.222567   6.241051 -8544261063483623093             173506

[1000 rows x 13 columns]

The url column contains the URLs of the images I want to download. This is my code:

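import pandas as pd
import requests

# read_csv already returns a DataFrame, so no extra pd.DataFrame() call is needed
df = pd.read_csv('data.csv')

urls = df['url'].tolist()

# enumerate supplies the counter; a counter that is never incremented
# would make every download overwrite image_0.jpg
for counter, url in enumerate(urls):
    img_data = requests.get(url).content

    with open('image_' + str(counter) + '.jpg', 'wb') as handler:
        handler.write(img_data)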

Right now, this code converts df['url'] to a list and downloads the image at every URL in it.

What I want to do instead is:

  • Iterate through every entry of df['url']
  • Download the image from that url
  • Save the image as image_i.jpg
  • Replace the corresponding df['url'] value with the path to that image (they'll all be in the same folder, so just the image name)

How can I go about doing it this way?
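The four steps listed above map almost one-to-one onto a plain loop. Here is a minimal sketch, not taken from the question or the answer, assuming all images go into one folder and that the DataFrame's index labels can serve as the i in image_i.jpg (true for the default index that pd.read_csv produces). The names download_images and fetch_bytes are hypothetical; the fetch parameter exists only so the download step can be replaced by a stub when testing.

```python
import os
from typing import Callable

import pandas as pd


def fetch_bytes(url: str) -> bytes:
    """Download url with requests, raising on HTTP errors so error pages aren't saved as .jpg."""
    import requests  # imported lazily so the sketch can be exercised without requests/network

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content


def download_images(df: pd.DataFrame, folder: str = ".",
                    fetch: Callable[[str], bytes] = fetch_bytes) -> pd.DataFrame:
    """Save each df['url'] as folder/image_<index>.jpg; return a copy whose url column holds the file names."""
    os.makedirs(folder, exist_ok=True)
    names = []
    for i, url in df["url"].items():  # (index label, URL) pairs, in row order
        name = f"image_{i}.jpg"
        with open(os.path.join(folder, name), "wb") as handler:
            handler.write(fetch(url))
        names.append(name)
    # The images share one folder, so the bare file name is enough as the path
    return df.assign(url=names)
```

Usage would be something like df = download_images(pd.read_csv('data.csv')). Building a new frame with assign, instead of editing df while looping over it, means a download that fails part-way through raises and leaves the original df unchanged.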

1 Answer

You can write a custom function for this and call it with df.apply(..., axis=1), which passes each row to the function and collects the return values into a Series.

Here is a working example with dummy data:

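import pandas as pd
import requests

def download_url(row):
  # row.name is the row's index label; it makes each file name unique
  img_data = requests.get(row["url"]).content
  # The target folder must already exist
  with open(f"/content/sample_data/tmp/image_{row.name}.jpg", "wb") as handler:
    handler.write(img_data)
  # The returned file name replaces the URL in this row
  return f"image_{row.name}.jpg"

# axis=1 calls download_url once per row
df["url"] = df.apply(download_url, axis=1)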

[Out]:
   rowid          url
0      1  image_0.jpg
1      2  image_1.jpg
2      3  image_2.jpg

Dummy dataset used:

data=[
  [1,"https://www.python.org/static/community_logos/python-logo.png"],
  [2,"https://www.python.org/static/community_logos/python-powered-w-100x40.png"],
  [3,"https://www.python.org/static/community_logos/python-powered-h-50x65.png"]
]

columns = ["rowid","url"]

df = pd.DataFrame(data=data, columns=columns)
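One detail worth knowing about this approach: inside df.apply(..., axis=1), each row arrives as a Series whose .name is the row's index label, not its position. With the default RangeIndex the two coincide, but after filtering or sorting they do not. A small illustration with placeholder data (the url values here are made up):

```python
import pandas as pd

df = pd.DataFrame({"rowid": [1, 2, 3], "url": ["a", "b", "c"]})

# Keep only some rows: the surviving index labels are 0 and 2, not 0 and 1
subset = df[df["rowid"] != 2]

# row.name is the index label, so the generated names follow the labels
names = subset.apply(lambda row: f"image_{row.name}.jpg", axis=1)
print(names.tolist())  # ['image_0.jpg', 'image_2.jpg']

# For consecutive numbers instead, reset the index first
renumbered = subset.reset_index(drop=True)
names2 = renumbered.apply(lambda row: f"image_{row.name}.jpg", axis=1)
print(names2.tolist())  # ['image_0.jpg', 'image_1.jpg']
```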

2 Comments

Does this also update the dataframe?
Yes. df["url"] = df.apply(...) overwrites the "url" column with the image file names, as you asked; see the output shown. Please test it and let me know if it works.
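If you would rather keep the original URLs as well, the same apply result can be assigned to a new column instead of overwriting url. A sketch where the download is replaced by a stand-in that only builds the file name; the function name file_name_for and the column name "path" are hypothetical choices:

```python
import pandas as pd

df = pd.DataFrame({
    "rowid": [1, 2],
    "url": ["https://www.python.org/static/community_logos/python-logo.png",
            "https://www.python.org/static/community_logos/python-powered-w-100x40.png"],
})

def file_name_for(row):
    # Stand-in for download_url: the real function would download row["url"] first
    return f"image_{row.name}.jpg"

# Assigning to a new column leaves df["url"] untouched
df["path"] = df.apply(file_name_for, axis=1)
print(df[["url", "path"]])
```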
