Writing images and text to files using python

Question

I am trying to use beautiful soup to scrape images from multiple URLs, and then write the URLs and the images to a file. The file format would look like:

TEXT OF URL_1

img_1 (ACTUAL IMAGE SHOWN)

img_2 (ACTUAL IMAGE SHOWN)

TEXT OF URL_2

img_1 (ACTUAL IMAGE SHOWN)

The first few lines of my output file right now looks like:

Company : Firehydrant     URL : https://www.firehydrant.io/âPNG


IHDRLf9ÃŒ∫   pHYsöúYiTXtXML:com.adobe.xmp<?xpacket begin="Ôªø" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c148 79.164036, 2019/08/13-01:06:57  

...

How can I view my file with the images actually showing instead of the binary? Or is there a different way of doing this? Sorry if this is a really stupid question!!

Here is my code right now for 1 website:


with open(file_name, 'wb') as img_file:

    option = webdriver.ChromeOptions()
    option.add_argument(" — incognito")
    browser = webdriver.Chrome(executable_path='./chromedriver', chrome_options=option)

    url = 'https://www.firehydrant.io/'

    browser.get(url)
    timeout = 10
    WebDriverWait(browser, timeout)

    soup = BeautifulSoup(browser.page_source, 'html.parser')
    images = soup.find_all("img")

    found_first_image = False
    for image in images:
        src = image['src']
        if(found_first_image == False): # ADD THE TEXT FOR THE COMPANY/URL
            found_first_image = True
            string = ("URL : " + url).encode('utf-8') 
            img_file.write(string)

        # removing everything after the '?' if there is one in the src tag
        src = urljoin(url, src)
        if("?" in src):
            pos = src.index("?")
            src = src[:pos]
        parsed = urlparse(src)
        if(bool(parsed.netloc) and bool(parsed.scheme)): # download the image and write it to the file
            response = requests.get(src)
            URLFile.write(response.content)

How do you want to actually view the file? The file might be better structured as json with base64 encoding for the images embedded in a json value or store the files separately and zip all the files up for distribution — bherbruck
– bherbruck, Commented May 29, 2020 at 8:42
The way you need to think about this: files don't actually contain images or text or anything else. They contain bytes - raw data. It's up to the program you use to view the file, to decide what that data actually means, and display something appropriate. When you view, for example, a .html file in a text editor, it looks different from what you see in a web browser - because those differing interpretations are forced onto the same data by those programs. — Karl Knechtel
– Karl Knechtel, Commented May 29, 2020 at 8:46
I was hoping to just view it in text edit - there are thousands of websites being crawled, and I don't want to store all of the images in various folders/directory and click through one-by-one. If I don't write the text I am printing to my file (the url/company name) and just write the images and then open it in text edit, I can see the first image visibly but not any of the rest. And when I keep the text mixed with the images, I see all of the text and images as symbols when opening in text edit. — jbug123
– jbug123, Commented May 30, 2020 at 6:00

Abhishek Verma · Accepted Answer · 2020-05-29 08:43:34Z

1

Store your images as image files outside rather than putting them here as text. Just keep a unique id for the image when you write it and put that unique id in your text file instead of the image. For saving image, cv2.imread can be used and for generating unique number use:

import uuid

uuid.uuid1().hex

answered May 29, 2020 at 8:43

Abhishek Verma

1,7291 gold badge10 silver badges12 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

jbug123 Over a year ago

Thank you Abhishek, though I would prefer to not save each image as a file. Is there a workaround to my original question/code?

Collectives™ on Stack Overflow

Writing images and text to files using python

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related