
A bit of background: when I save a web page from e.g. IE8 as "Webpage, complete", the images and other resources the page contains are placed in a subfolder with the suffix "_files". This naming convention lets Windows keep the .htm file and the accompanying folder in sync.

Now, to keep that link intact, I want the "_files" folder to be renamed as well whenever I rename the HTML file from my Python script. Is there an easy way to do this, or will I need to
- rename the .htm file
- rename the _files folder
- parse the .htm file and replace all references to the old _files folder name with the new name?

  • In Perl, there is a module for that, I am pretty sure :-), although I guess you have to write your own in Python Commented Aug 26, 2009 at 7:18

3 Answers


There is only one easy way: have IE save the page again under the new name. If you want to rename it after the fact, you have to parse the HTML and update the references yourself. In that case, BeautifulSoup is your friend.
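As a rough illustration of the parsing approach, here is a minimal sketch. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages; the class and function names (FolderRefCollector, retarget_references) are made up for this example. It assumes the saved page refers to its resources with relative values such as test_files/a.png, and that those values contain no HTML entities or URL escapes.

```python
from html.parser import HTMLParser


class FolderRefCollector(HTMLParser):
    """Collect attribute values (src, href, ...) that point into one folder."""

    def __init__(self, folder):
        super().__init__()
        self.prefix = folder + "/"
        self.refs = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for _name, value in attrs:
            if value and value.startswith(self.prefix):
                self.refs.add(value)


def retarget_references(html, old_folder, new_folder):
    """Return html with quoted attribute values moved from old_folder to new_folder."""
    collector = FolderRefCollector(old_folder)
    collector.feed(html)
    collector.close()

    for ref in collector.refs:
        new_ref = new_folder + ref[len(old_folder):]
        # Replace only the quoted attribute value, so that the folder name
        # appearing in the visible text of the page is left alone.
        for quote in ('"', "'"):
            html = html.replace(quote + ref + quote, quote + new_ref + quote)
    return html


page = '<p>See test_files/ for details</p><img src="test_files/a.png">'
print(retarget_references(page, "test_files", "test2_files"))
```

Only the src value changes here; the mention of test_files/ inside the paragraph text stays as it was. Unquoted attribute values are not handled by this sketch.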


If you rename the folder, I don't see how you can avoid parsing the .htm file and replacing every reference to the old _files folder with the new name. You could perhaps use a folder alias (a shortcut?), but that's not a very clean solution.


You can use a simple string replace on the HTML file without parsing it. Of course, this can cause trouble if the text being replaced also appears in the page's own content.

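import os

# Rename the page and its resource folder together.
os.rename("test.html", "test2.html")
os.rename("test_files", "test2_files")

# Point the references in the page at the renamed folder.
with open("test2.html", "r") as f:
    s = f.read().replace("test_files", "test2_files")

with open("test2.html", "w") as f:
    f.write(s)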
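The same idea can be wrapped into a reusable helper. This is only a sketch: the function name rename_saved_page is hypothetical, and it assumes ASCII file names and a page saved in an ASCII-compatible encoding (not UTF-16). It works on bytes so the page's encoding does not need to be known, and it replaces the folder name only when it is followed by "/", which narrows the chance of touching ordinary text.

```python
import os
from pathlib import Path


def rename_saved_page(html_path, new_stem):
    """Rename page.htm and page_files together and patch the references."""
    old = Path(html_path)
    new = old.with_name(new_stem + old.suffix)
    old_dir = old.with_name(old.stem + "_files")
    new_dir = old.with_name(new_stem + "_files")

    # Build the search and replacement bytes first, so a non-ASCII name
    # fails here, before anything on disk has been renamed.
    old_ref = old_dir.name.encode("ascii") + b"/"
    new_ref = new_dir.name.encode("ascii") + b"/"

    if new.exists() or new_dir.exists():
        raise FileExistsError(f"{new} or {new_dir} already exists")

    os.rename(old, new)
    if old_dir.is_dir():
        os.rename(old_dir, new_dir)

    # Plain byte replacement, same caveat as the simple string replace:
    # a literal "test_files/" in the page text would be changed too.
    new.write_bytes(new.read_bytes().replace(old_ref, new_ref))
    return new
```

Calling rename_saved_page("test.htm", "test2") would then leave test2.htm next to test2_files, with the references inside the page updated.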
