Writing from one file to another in Python

Question

I am trying to take some information I got from a webpage and write one of the variables to a file, but I am having no luck. It is probably very easy, but I'm lost. Here is an example of one of the rows (there are 1253 rows):

<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">

I am after the field called data-name; it is not at the same position in each row. I tried this, but it did not work:

mfile=open('itemlist.txt','r')
mfile2=open('output.txt','a')
for row in mfile:
    if char =='data-name':
        mfile2.write(char)

Edit 1:

I made an example file containing 'hello hi peanut'. If I did:

for row in mfile:
    print row.index('hello')

it would print 0 as expected; however, when I changed hello to hi, it didn't return 1; it returned nothing.
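A note on the expectation of 1: str.index returns a character offset, not a word position. In the string 'hello hi peanut', 'hi' starts at character 6 because 'hello ' is six characters long. To get the position of a whole word, the string can be split into a list of words first. A minimal sketch on the example string from the edit:

```python
text = 'hello hi peanut'

# str.index counts characters from the start of the string
print(text.index('hello'))  # 0
print(text.index('hi'))     # 6, because 'hello ' is six characters long

# To get a word position instead, split into a list of words first
words = text.split()
print(words.index('hi'))    # 1
```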

char is not defined in your code. You could use row.index('data-name') to figure out where the attribute begins. Then you can index again, starting from that index, to find the two quotation marks and use string manipulation to extract the value.
– poke, Commented Jul 5, 2015 at 20:36
Could you post this as an answer with an example so I can accept it?
– Daniel Prinsloo, Commented Jul 5, 2015 at 20:40
I would actually want you to try it on your own first before showing you how to do it. So why don’t you give it a try, and if that fails, show what you have tried; then we can try to explain where you went wrong. That way, you learn best.
– poke, Commented Jul 5, 2015 at 20:42
I'm trying it, but I've found that it only looks at the first value and doesn't look at the rest of the values in the row.
– Daniel Prinsloo, Commented Jul 5, 2015 at 20:47

poke · Accepted Answer · 2015-07-05 21:11:36Z

3

Let’s try to find the value using common string manipulation methods:

>>> line = '''<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'''

We can use str.index to find the position of a string within a string:

>>> line.index('data-name')
87

So now we know we need to start looking at index 87 for the attribute we are interested in:

>>> line[87:]
'data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'

Now, we need to remove the data-name=" part too:

>>> start = line.index('data-name') + len('data-name="')
>>> start
98
>>> line[start:]
'Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'

Now, we just need to find the index of the closing quotation mark too, and then we can extract just the attribute value:

>>> end = line.index('"', start)
>>> end
118
>>> line[start:end]
'Kill-a-Watt Allbrero'

And then we have our solution:

start = line.index('data-name') + len('data-name="')
end = line.index('"', start)
print(line[start:end])
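The same three steps can be wrapped in a reusable function. One detail in the sample row is worth noticing: data-price uses single quotes while data-name uses double quotes, so searching for '"' only works for data-name. The sketch below (get_attribute is a hypothetical helper, not part of any library) reads whichever quote character follows the = sign. It assumes each attribute is preceded by a space and written as name='value' or name="value" with no spaces around =; real HTML allows more variation than that.

```python
def get_attribute(line, name):
    """Return the value of attribute `name` in `line`, or None if it is missing.

    Handles both quote styles, e.g. data-price='...' and data-name="...".
    Assumes the attribute is preceded by a space and written as name=<quote>.
    """
    marker = ' ' + name + '='           # leading space avoids matching a longer name
    pos = line.find(marker)
    if pos == -1:
        return None
    quote_pos = pos + len(marker)
    if quote_pos >= len(line) or line[quote_pos] not in '\'"':
        return None                     # unquoted or malformed value
    quote = line[quote_pos]             # either ' or "
    start = quote_pos + 1
    end = line.index(quote, start)      # matching closing quote (ValueError if absent)
    return line[start:end]


line = '''<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'''

print(get_attribute(line, 'data-name'))         # Kill-a-Watt Allbrero
print(get_attribute(line, 'data-price'))        # 3280000
print(repr(get_attribute(line, 'data-paint')))  # ''
print(get_attribute(line, 'data-missing'))      # None
```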

We can put that in the loop:

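with open('itemlist.txt', 'r') as mfile, open('output.txt', 'a') as mfile2:
    for line in mfile:
        # Skip lines without the attribute (e.g. blank lines) instead of
        # letting str.index raise ValueError
        if 'data-name' not in line:
            continue
        start = line.index('data-name') + len('data-name="')
        end = line.index('"', start)
        mfile2.write(line[start:end])
        mfile2.write('\n')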
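The loop above extracts only the first data-name on each line. If one line can hold several entries (one possible reason for "it only looks at the first value" in the comments under the question is that the saved HTML has several div elements on one line), str.find can be called again with a start position just past the previous match. Unlike str.index, str.find returns -1 instead of raising ValueError when nothing is found. A sketch, with find_all_values and copy_names as hypothetical names and a made-up sample string:

```python
def find_all_values(text, name='data-name'):
    """Return every double-quoted value of attribute `name` in `text`, in order."""
    marker = name + '="'
    values = []
    pos = text.find(marker)               # -1 if the marker is absent
    while pos != -1:
        start = pos + len(marker)
        end = text.find('"', start)
        if end == -1:                     # unterminated value: stop instead of crashing
            break
        values.append(text[start:end])
        pos = text.find(marker, end + 1)  # keep searching after this value
    return values


def copy_names(src_path, dst_path):
    """Append every data-name value found in src_path to dst_path, one per line."""
    with open(src_path) as src, open(dst_path, 'a') as dst:
        for line in src:
            for value in find_all_values(line):
                dst.write(value + '\n')


# Two entries on a single line (made-up sample input)
sample = '<div data-name="First Hat"></div><div data-name="Second Hat"></div>'
print(find_all_values(sample))  # ['First Hat', 'Second Hat']
```

With these helpers, copy_names('itemlist.txt', 'output.txt') would do the same job as the loop while also picking up repeated entries on one line.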

edited Jul 5, 2015 at 21:11

answered Jul 5, 2015 at 20:55

poke


8 Comments

Sait Over a year ago

Pretty instructive and helpful answer. +1

Daniel Prinsloo Over a year ago

I am trying this, but I noticed that you left my broken loop in, so I'm trying to fix that now. But when I print start and print end to check that it is finding the line.index values, nothing comes out?

poke Over a year ago

Oh yes, sorry, I copy/pasted too much without looking; I've fixed the code at the end now :)

Daniel Prinsloo Over a year ago

I tried this and it isn't working for me: nothing is written or printed, even when I add print start after the definition of start.

poke Over a year ago

Hmm, that’s weird. Try printing the line right after for line in mfile to see if any lines actually appear.
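Following that suggestion, a small diagnostic sketch (diagnose is a hypothetical helper) shows whether any lines are read at all, what the first few look like, and how many contain data-name. A total of 0 would suggest the file is empty; a total of 1 might mean the whole page was saved on a single line.

```python
def diagnose(lines, marker='data-name', show=3):
    """Report how many lines were read and how many contain `marker`.

    `lines` can be an open file object or any iterable of strings.
    """
    total = 0
    matching = 0
    for line in lines:
        total += 1
        if marker in line:
            matching += 1
        if total <= show:
            print(repr(line))  # repr() makes '\n', '\r' and empty lines visible
    print(total, 'lines read,', matching, 'contain', repr(marker))
    return total, matching


# Usage with the file from the question:
# with open('itemlist.txt') as mfile:
#     diagnose(mfile)
```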


Sait · Accepted Answer · 2015-07-05 21:05:11Z

1

You can also use BeautifulSoup:

a.html:

<html>
    <head>
        <title> Asdf </title>
    </head>
    <body>

        <div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">

    </body>
</html>

a.py:

from bs4 import BeautifulSoup
with open('a.html') as f:
    lines = f.readlines()
soup = BeautifulSoup(''.join(lines), 'html.parser')
result = soup.findAll('div')[0]['data-price']
print result
# prints 3280000

In my opinion, if your task is as simple as in your example, there is no need to use BeautifulSoup. However, if it is more complicated, or will become more complicated, consider giving BeautifulSoup a try.
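If installing a third-party package is not an option, the standard library's html.parser.HTMLParser (Python 3; in Python 2 the module is named HTMLParser) can also do the parsing. This is a swapped-in technique, not part of this answer: a minimal sketch that collects the data-name of every div, with DataNameCollector as a hypothetical class name and a made-up second row added to the sample input.

```python
from html.parser import HTMLParser


class DataNameCollector(HTMLParser):
    """Collect the data-name attribute of every <div> that has one."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed,
        # so single- and double-quoted attributes are handled the same way
        if tag == 'div':
            for name, value in attrs:
                if name == 'data-name':
                    self.names.append(value)


# The sample row from the question plus a made-up second row
html_text = '''<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">
<div class='entry' data-name="Another Hat">'''

parser = DataNameCollector()
parser.feed(html_text)
parser.close()
print(parser.names)  # ['Kill-a-Watt Allbrero', 'Another Hat']
```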

edited Jul 5, 2015 at 21:05

answered Jul 5, 2015 at 20:54

Sait


5 Comments

poke Over a year ago

The BeautifulSoup module name suggests that you are using version 3, which is pretty old and does not support Python 3. Please update to BeautifulSoup 4 and change the module name in your answer to bs4.

Sait Over a year ago

I proudly prefer using Python 2.7.6 unless the OP explicitly asks for a Python 3 solution. There is only the Python tag in the question, as far as I can see.

poke Over a year ago

Sure, but bs4 works in Python 2.6+ too, and it generally seems like a bad idea to promote outdated, no-longer-updated libraries when a newer version exists (especially when all you have to do is change it to from bs4 import BeautifulSoup).

Sait Over a year ago

@poke Okay, that makes sense. I updated my answer to bs4, keeping print result to show it is still Python 2 :-)

poke Over a year ago

Yes, that’s totally fine with me; my issue was only with the old module name. Thanks :)
