
I have this code below that downloads files on each loop. What I would like is for each iteration of the loop to extract the files into its corresponding folder. What's currently happening is that it saves all the files into every folder, overwriting files with the same names.

Any help would be appreciated.

import os
import requests
import zipfile, StringIO
from bs4 import BeautifulSoup

# Here we add the login details to be submitted to the login form.
payloads = [
    {'USERNAME': '1111','PASSWORD': '1111','option': 'login'},
    {'USERNAME': '2222','PASSWORD': '2222','option': 'login'},
    {'USERNAME': '3333','PASSWORD': '3333','option': 'login'},
    {'USERNAME': '4444','PASSWORD': '4444','option': 'login'},
    ]

folders = [r"C:\temp\1111", r"C:\temp\2222", r"C:\temp\3333", r"C:\temp\4444"]

# Possibly need headers later.
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
base_url = "https://service.rl360.com/scripts/customer.cgi/SC/servicing/"
# Use 'with' to ensure the session context is closed after use.

for payload in payloads:
    with requests.Session() as s:
        p = s.post('https://service.rl360.com/scripts/customer.cgi?option=login', data=payload)

        # Get the download page to scrape.
        r = s.get('https://service.rl360.com/scripts/customer.cgi/SC/servicing/downloads.php?Folder=DataDownloads&SortField=ReportDate&SortOrder=Ascending', stream=True)
        content = r.text
        soup = BeautifulSoup(content, 'lxml')
        # Now I get the most recent download URL.
        download_url = soup.find_all("a", {'class': 'tabletd'})[-1]['href']
        # Now we join the base URL with the download URL.
        download_docs = s.get(base_url + download_url, stream=True)
        print "Checking Content"
        content_type = download_docs.headers['content-type']
        print content_type
        print "Checking Filename"
        content_name = download_docs.headers['content-disposition']
        print content_name
        print "Checking Download Size"
        content_size = download_docs.headers['content-length']
        print content_size
        # This is where we extract and download the specified XML files.
        z = zipfile.ZipFile(StringIO.StringIO(download_docs.content))
        print "---------------------------------"
        print "Downloading........."
        for folder in folders:
            z.extractall(folder)
        # Now we save the files to the specified location.
        print "Download Complete"
  • Use the z.read() or z.open() function to extract file by file and change their paths and/or names as you need. What is odd about extractall() overwriting existing files? If the files in each ZIP are named the same, they will be overwritten. It is as simple as that. Commented Aug 21, 2017 at 11:28
  • What you are doing here is extracting the same ZIP to all the specified folders, so the loop has to be reworked to follow the ZIP's contents, not the other way around. Assuming, of course, that is what you want to achieve. You can always specify only "C:\temp" to extract the whole ZIP to (no loops needed) and then deal with the resulting mess. Commented Aug 21, 2017 at 11:42
  • The problem is that I'm using four different logins; each one has files, and those files could have the same name as a file in another account. Commented Aug 21, 2017 at 11:46

2 Answers


Use the z.read() or z.open() function to extract file by file and change their paths and/or names as you need. What is odd about extractall() overwriting existing files? If the files in each ZIP are named the same, they will be overwritten. It is as simple as that.
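
As a rough sketch of that per-file approach (the extract_renamed helper, the prefix argument, and the directory-entry guard are my own illustration, not part of the original code):

import os
from zipfile import ZipFile
from cStringIO import StringIO

def extract_renamed(zip_bytes, dest_folder, prefix):
    # Hypothetical helper: walk the archive member by member with
    # z.read() and write each file out under a prefixed name, so
    # same-named files from different accounts cannot collide.
    z = ZipFile(StringIO(zip_bytes))
    for name in z.namelist():
        if name.endswith('/'):  # skip directory entries
            continue
        target = os.path.join(dest_folder, prefix + "_" + os.path.basename(name))
        with open(target, 'wb') as f:
            f.write(z.read(name))

For example, extract_renamed(download_docs.content, r"C:\temp", "1111") would keep the four accounts' files apart even inside a single folder.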

What you are doing here is extracting the same ZIP to all the specified folders, so the loop has to be reworked to follow the ZIP's contents, not the other way around. Assuming, of course, that is what you want to achieve. You can always specify only "C:\temp" to extract the whole ZIP to (no loops needed) and then deal with the resulting mess.

You said:

"The problem is that im using four different logins, each one has files and those files could have the same name as one in another account."

But then you should have deduced, from my comments copied above, that you have to swap the two loops: use a new folder (from folders) for each username (from payloads). Here you extract the same ZIP into every folder, so all the specified folders end up filled with the ZIP from the last login. So rework this.

A little example keeping your design, with the necessary changes:

from zipfile import ZipFile
from cStringIO import StringIO
import os

temp_folder = "C:\\temp"
users = {
    123: ("login for 123 etc", "Download URL for 123"),
    345: ("Login for 345 or URL or whatever", "Download URL for 345")}

for user in users:
    login_to(users[user][0])
    content = get_zip_from(users[user][1])
    # One subfolder per user keeps same-named files apart.
    path = os.path.join(temp_folder, str(user))
    if not os.path.exists(path):
        os.makedirs(path)
    z = ZipFile(StringIO(content))
    z.extractall(path)

Reworking the thing to follow folders would be similar, but you always need to connect a folder with its user somehow. Here I used the username as the folder to extract the content into. I hope that is what you really need.
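
Applied to the question's own payloads and folders lists, that pairing could look roughly like this (a sketch assuming the two lists line up by index, reusing the question's imports and variables):

for payload, folder in zip(payloads, folders):
    with requests.Session() as s:
        s.post('https://service.rl360.com/scripts/customer.cgi?option=login', data=payload)
        # Scrape the most recent download URL, as in the question.
        r = s.get('https://service.rl360.com/scripts/customer.cgi/SC/servicing/downloads.php?Folder=DataDownloads&SortField=ReportDate&SortOrder=Ascending', stream=True)
        soup = BeautifulSoup(r.text, 'lxml')
        download_url = soup.find_all("a", {'class': 'tabletd'})[-1]['href']
        download_docs = s.get(base_url + download_url, stream=True)
        if not os.path.exists(folder):
            os.makedirs(folder)
        z = zipfile.ZipFile(StringIO.StringIO(download_docs.content))
        # One folder per login; no inner loop over all the folders.
        z.extractall(folder)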



I think you've got a typo... replace "for folder in folder:" with "for folder in folders:". That should do the trick.


  • Apologies, in my code it was correct; I just typed it wrong when writing up my question.
  • Yeah, the typo is here, but this would raise an exception, as the variable folder isn't declared before the loop. So I think the original code doesn't have the same typo, since the OP didn't say so. And the error would be obvious anyway.
