0

My goal is to search file.txt for an identifying string and then output the words that follow it between the quotation marks.

So the identifier would be data-default-alt= and the name of the item is "Ford Truck" in quotes. I would like to output the name of the item and the price so that I can open it in Excel.

data-default-alt="Ford Truck">       </h3>     </a>           </div>     <div class="tileInfo">                <div class="swatchesBox--empty"></div>                                                     <div class="promo-msg-text">           <span class="calloutMsg-promo-msg-text"></span>         </div>                              <div class="pricecontainer" data-pricetype="Stand Alone">               <p id="price_206019013" class="price price-label ">                  $1,000.00               </p> 

Desired Output would be

Ford Truck 1000.00

I am not sure how to go about this task.

1
  • Have you tried regular expressions? Commented Mar 24, 2016 at 17:51

2 Answers

1

You will probably want more robust regular expressions for matching the cost and the brand, but here is some code to get you started.

import re

# Sample input; named "text" so the built-in str is not shadowed.
text = '<data-default-alt="Ford Truck"></h3></a></div><div class="tileInfo"><div class="swatchesBox--empty"></div><div class="promo-msg-text"> <span class="calloutMsg-promo-msg-text"></span> </div><div class="pricecontainer" data-pricetype="Stand Alone"><p id="price_206019013" class="price price-label ">$1,000.00</p>'

# Raw strings keep the backslashes intact for the regex engine.
brand = re.search(r'<data-default-alt="(.*?)">', text)
cost = re.search(r'\$(\d+,?\d*\.\d+)</p>', text)
if brand:
    print(brand.group(1))  # the text captured between the quotes
if cost:
    print(cost.group(1))   # the amount without the dollar sign
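To collect every item rather than only the first, one option is re.finditer with a single pattern that spans from the name to the price that follows it. The sketch below is only an illustration under some assumptions: Python 3, every name is followed by its own price before the next name appears, and the "Toy Car" entry, the extract_items and write_csv helpers, and the items.csv file name are all made up for the example.

```python
import csv
import re

# One pattern spanning from the name attribute to the next price.
# re.DOTALL lets ".*?" cross line breaks between the two.
ITEM_RE = re.compile(
    r'data-default-alt="(?P<name>[^"]*)".*?\$\s*(?P<price>[\d,]+\.\d{2})',
    re.DOTALL,
)


def extract_items(html):
    """Return (name, price) tuples; the price has no '$' and no commas."""
    return [
        (m.group("name"), m.group("price").replace(",", ""))
        for m in ITEM_RE.finditer(html)
    ]


def write_csv(rows, path):
    """Write the rows to a CSV file, a format Excel can open."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["name", "price"])
        writer.writerows(rows)


# Hypothetical sample with two items; real input would come from file.txt.
sample = (
    'data-default-alt="Ford Truck"> </h3> '
    '<p class="price price-label "> $1,000.00 </p> '
    'data-default-alt="Toy Car"> </h3> '
    '<p class="price price-label "> $2.84 </p>'
)

items = extract_items(sample)
print(items)  # [('Ford Truck', '1000.00'), ('Toy Car', '2.84')]
```

For a real file, read it into a string first (with open("file.txt") as f: html = f.read()) and then call write_csv(extract_items(html), "items.csv"). If an item has no price, this pattern would pair its name with the next item's price, so check the output against the page.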

6 Comments

Thanks, this gave me the output Ford Truck 1,000.00. How do I read from a text file?
There will be multiple of these in each file. How do I get all of them?
I am EST Time zone
Getting this error now. brand=re.search('<data-default-alt=\"(.*?)">',str) File "C:\Users\turtle02\Anaconda2\lib\re.py", line 146, in search return _compile(pattern, flags).search(string) TypeError: expected string or buffer
All code:
with open("file.txt") as str:
    st = str.read()
import re
brand=re.search('<data-default-alt=\"(.*?)">',str)
cost=re.search('\$(\d+,?\d*\.\d+)</p>',str)
if brand:
    print brand.group(1)
if cost:
    print cost.group(1)
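The TypeError above comes from passing the file object (named str in that code) to re.search instead of the string returned by read() (named st). A minimal sketch of the difference, assuming Python 3 and using io.StringIO as a stand-in for the object that open("file.txt") returns:

```python
import io
import re

pattern = r'data-default-alt="(.*?)"'

# io.StringIO stands in for the file object returned by open("file.txt").
handle = io.StringIO('<img data-default-alt="Ford Truck">')

try:
    re.search(pattern, handle)  # wrong: a file object, not a string
    error = None
except TypeError as exc:
    error = exc                 # re only accepts str or bytes-like input

text = handle.read()            # right: read the contents into a string first
brand = re.search(pattern, text)

print(type(error).__name__, "->", brand.group(1))  # TypeError -> Ford Truck
```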
0

Use the built-in string methods to find the substring's index. For example, "abcdef".find("bc") returns 1, the index of the first character of the substring. To parse your string, you can look for the tags and then extract the text you need with string slicing.
Here is an example that solves your problem, with the string to parse stored in the st variable:

with open("file.txt") as f:
    st = f.read()  # read the whole file into one string

# Find the marker, then skip past it to the first character of the name.
name_marker = 'data-default-alt="'
name_start = st.find(name_marker) + len(name_marker)
name_end = st.find('"', name_start)  # closing quote, searched from name_start
name = st[name_start:name_end]       # slice out the text between the quotes

# Same approach for the price: skip the opening tag, stop at the closing </p>.
price_marker = 'class="price price-label ">'
price_start = st.find(price_marker) + len(price_marker)
price_end = st.find('</p>', price_start)
price = st[price_start:price_end].strip()  # trim whitespace on both ends

The results are in the name and price variables. If you want to work with the price as a number and don't want the dollar sign, add it to the strip arguments (.strip("$ "); see the Python docs for that method). Once you pass an argument, only the listed characters are removed, so include any other whitespace you expect, such as "\n". You can remove the comma by calling replace(",", "") on the price string and then convert the result to a number with float(price).
Notes: it may just be the way you pasted the string, but I've added strip() to get rid of whitespace on each end of the price string. Also, find() returns -1 when the marker is missing, so check for that if the input may vary.
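If the file holds several items, the same find-and-slice approach can be repeated by passing a start offset to find(). The sketch below is a rough illustration rather than a tested solution for the real page: it assumes Python 3, assumes each name is followed by its own price, and the extract_all helper and the "Toy Car" sample item are invented for the example.

```python
def extract_all(st):
    """Collect (name, price) pairs by calling str.find with a moving offset."""
    name_marker = 'data-default-alt="'
    price_marker = 'class="price price-label ">'
    items = []
    pos = 0
    while True:
        start = st.find(name_marker, pos)
        if start == -1:              # find() returns -1 when nothing is left
            break
        start += len(name_marker)
        end = st.find('"', start)    # closing quote of the name
        if end == -1:
            break
        name = st[start:end]

        p_start = st.find(price_marker, end)
        if p_start == -1:
            break
        p_start += len(price_marker)
        p_end = st.find('</p>', p_start)
        if p_end == -1:
            break
        raw = st[p_start:p_end].strip()                  # e.g. "$1,000.00"
        price = float(raw.lstrip("$").replace(",", ""))  # drop "$" and ","
        items.append((name, price))
        pos = p_end                  # continue searching after this price
    return items


# Hypothetical sample with two items; real input would come from file.txt.
sample = (
    'x data-default-alt="Ford Truck"> '
    '<p id="price_1" class="price price-label ">  $1,000.00  </p> '
    'data-default-alt="Toy Car"> '
    '<p id="price_2" class="price price-label "> $2.84 </p>'
)

result = extract_all(sample)
print(result)  # [('Ford Truck', 1000.0), ('Toy Car', 2.84)]
```

float() raises ValueError if the sliced text is not a plain number, which can happen when the markers match template placeholders instead of real product data.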

11 Comments

I seem to have messed something up. I get this output: {{= $item.parent.data.itemAttributes.title}} $2.84. There will be multiple of these in each file. How do I get all of them?
@turtle02 If you will have multiple of those, you might be better off using regular expressions. If you're having trouble reading from a file, take a look at the first two lines of my code, they do just that.
Getting this error now. brand=re.search('<data-default-alt=\"(.*?)">',str) File "C:\Users\turtle02\Anaconda2\lib\re.py", line 146, in search return _compile(pattern, flags).search(string) TypeError: expected string or buffer
@turtle02 Please provide the contents of the str variable by printing it right before the line that causes the error.
<li class="tile standard atc-enabled"><div class="tileImage"><a href="example.com/p/ford-truck/-/A-14773925#prodSlot=_1_1" title="ford-truck" id="prodTitle-medium-1-1" class="productClick" name="prodImageTitle_206019013" data-title="ford-truck" data-default-title="ford-truck" data-default-href="/p/ford-truck/-/A-14773925#prodSlot=_1_1"> <h3> <img style="visibility: visible;" text"></span> </div><div class="pricecontainer" data-pricetype="Stand Alone"> <p id="price_206019013" class="price price-label "> $1000.00</p><p class="regularprice-label">