I have a snippet that extracts links from my HTML file, and I want to add some extra data to the result. I have searched for a way to do this, without success.

Any ideas would be very welcome. Thank you.

import re
from bs4 import BeautifulSoup

srcfile = 'sourcefile.html'

# Parse the local HTML file; the with-block closes the file handle afterwards.
with open(srcfile, 'r', encoding='utf-8') as fh:
    soup = BeautifulSoup(fh, 'html.parser')

# Print every link that points at a BscScan token page.
for a_href in soup.find_all('a', href=re.compile(r'https://bscscan\.com/token/')):
    print('BscScan:', a_href['href'])

Current Output:

BscScan: https://bscscan.com/token/0xd7b0B9d1F011ec19312836F09Ef24a6494da0B8F
BscScan: https://bscscan.com/token/0x6679777D2D59B80302164284a9494a2080350225

Desired Output With Additional Data:

Name: BigDustin                         Total Supply: 10,000,000,000 BIGD
Liquidity: 5.0000 BNB ($2909.3682)      Holders: 1          Transfers: 1
  BscScan: https://bscscan.com/token/0xd7b0B9d1F011ec19312836F09Ef24a6494da0B8F

Name: SUPERHIT SHIBA                    Total Supply: 100,000,000 SUPERHIT
Liquidity: 1.0100 BNB ($587.6924)       Holders: 2          Transfers: 2
  BscScan: https://bscscan.com/token/0x6679777D2D59B80302164284a9494a2080350225

sourcefile.html (local HTML file):

🆕 <u>New token</u><br><br><strong>Version</strong>: V2<br><br><strong>Pair</strong>: WBNB-BIGD<br><strong>Liquidity</strong>: 5.0000 BNB ($2909.3682)<br>ℹ️ <a href="https://bscscan.com/tx/0xba7ed738e744e5899138529a3051ee3b9d2bdc9512ffb8e649d9c291dfe26b14">Transaction</a><br><br><strong>Name</strong>: BigDustin<br><strong>Total Supply</strong>: 10,000,000,000 <strong>BIGD</strong><br><strong>Token Price</strong>: 0.0000 BNB ($0.0000)<br><br><strong>Holders</strong>: 1<br><strong>Transfers</strong>: 1<br><br>⛓ <a href="https://bscscan.com/token/0xd7b0B9d1F011ec19312836F09Ef24a6494da0B8F">BscScan</a><br><br>🥞 <a href="https://exchange.pancakeswap.finance/#/swap?outputCurrency=0xd7b0B9d1F011ec19312836F09Ef24a6494da0B8F">Swap on PancakeSwap</a><br><br>➡️ <a href="https://poocoin.app/tokens/0xd7b0B9d1F011ec19312836F09Ef24a6494da0B8F">poocoin.app</a><br><br>0xd7b0B9d1F011ec19312836F09Ef24a6494da0B8F<br>-----------------------------------<br>Our Main Info Channel - <a href="https://t.me/YourCryptoHelper">YourCryptoHelper</a>
       </div>
      </div>
     </div>
     <div class="message default clearfix joined" id="message473415">
      <div class="body">
       <div class="pull_right date details" title="20.11.2021 03:44:47">03:44
       </div>
       <div class="text">
🆕 <u>New token</u><br><br><strong>Version</strong>: V2<br><br><strong>Pair</strong>: SUPERHIT -WBNB<br><strong>Liquidity</strong>: 1.0100 BNB ($587.6924)<br>ℹ️ <a href="https://bscscan.com/tx/0xcbddd72c16dafd622cb8ba815f68c5139b2d080943a544dfd2eb7f7f1aea86de">Transaction</a><br><br><strong>Name</strong>: SUPERHIT SHIBA<br><strong>Total Supply</strong>: 100,000,000 <strong>SUPERHIT </strong><br><strong>Token Price</strong>: 0.0000 BNB ($0.0000)<br><br><strong>Holders</strong>: 2<br><strong>Transfers</strong>: 2<br><br>⛓ <a href="https://bscscan.com/token/0x6679777D2D59B80302164284a9494a2080350225">BscScan</a><br><br>🥞 <a href="https://exchange.pancakeswap.finance/#/swap?outputCurrency=0x6679777D2D59B80302164284a9494a2080350225">Swap on PancakeSwap</a><br><br>➡️ <a href="https://poocoin.app/tokens/0x6679777D2D59B80302164284a9494a2080350225">poocoin.app</a><br><br>0x6679777D2D59B80302164284a9494a2080350225<br>-----------------------------------<br>Our Main Info Channel - <a href="https://t.me/YourCryptoHelper">YourCryptoHelper</a>
       </div>
      </div>
     </div>
  • You have provided only part of the code. Please post the complete code so we can help. Your HTML is also invalid (the closing </div> tags have no matching opening <div> tags). Commented Nov 21, 2021 at 8:22

1 Answer

How about starting from each "New token" <u> tag and walking along the chain of sibling nodes that follow it? For example:

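from bs4.element import NavigableString, Tag

# Walk the siblings that follow each "New token" <u> tag.
for u in soup.select('u'):
    s = u.next_sibling  # next_sibling is the current spelling of nextSibling
    while s is not None and not (isinstance(s, Tag) and s.name == 'u'):
        if isinstance(s, Tag) and s.name == 'strong':
            key = s.get_text(strip=True)
            s = s.next_sibling
            # The value is normally a bare text node; str() avoids relying on
            # NavigableString.text, which raised AttributeError for the asker.
            value = str(s).strip() if isinstance(s, NavigableString) else ""
            print(key, value)
            if s is None:
                break
        s = s.next_sibling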

Result:

Version : V2
Pair : WBNB-BIGD
Liquidity : 5.0000 BNB ($2909.3682)
Name : BigDustin
Total Supply : 10,000,000,000
BIGD
Token Price : 0.0000 BNB ($0.0000)
Holders : 1
Transfers : 1
Version : V2
Pair : SUPERHIT -WBNB
Liquidity : 1.0100 BNB ($587.6924)
Name : SUPERHIT SHIBA
Total Supply : 100,000,000
SUPERHIT
Token Price : 0.0000 BNB ($0.0000)
Holders : 2
Transfers : 2
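
To get from that flat listing to the layout the question asks for, one option is to collect each post's fields into a dict and format them afterwards. The following is only a sketch and has not been run against the full file: `parse_tokens`, `format_token`, `SAMPLE`, and the column widths (40 and 20) are names and values made up for this example. It assumes every value is a text node starting with ":" directly after its <strong> label, and that a bare <strong> (such as the ticker symbol) belongs to the previous field.

```python
import re
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Same pattern the question uses to pick out token links.
TOKEN_LINK = re.compile(r"https://bscscan\.com/token/")

# Shortened copy of one post from the question (assumed representative).
SAMPLE = (
    '<div class="text">'
    '<u>New token</u><br><br><strong>Version</strong>: V2<br>'
    '<strong>Liquidity</strong>: 5.0000 BNB ($2909.3682)<br>'
    '<strong>Name</strong>: BigDustin<br>'
    '<strong>Total Supply</strong>: 10,000,000,000 <strong>BIGD</strong><br>'
    '<strong>Holders</strong>: 1<br><strong>Transfers</strong>: 1<br>'
    '<a href="https://bscscan.com/token/0xd7b0B9d1F011ec19312836F09Ef24a6494da0B8F">BscScan</a>'
    '</div>'
)


def parse_tokens(html):
    """Return one dict of fields per "New token" <u> tag found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    tokens = []
    for u in soup.find_all("u"):
        fields, last_key = {}, None
        for node in u.next_siblings:
            if not isinstance(node, Tag):
                continue  # text nodes are read via their <strong> label below
            if node.name == "u":
                break  # the next post starts here
            if node.name == "strong":
                label = node.get_text(strip=True)
                nxt = node.next_sibling
                text = str(nxt).strip() if isinstance(nxt, NavigableString) else ""
                if text.startswith(":"):
                    fields[label] = text[1:].strip()
                    last_key = label
                elif last_key is not None:
                    # Bare <strong>, e.g. the ticker: append to the previous field.
                    fields[last_key] += " " + label
            elif node.name == "a" and TOKEN_LINK.match(node.get("href", "")):
                fields["BscScan"] = node["href"]
        if fields:
            tokens.append(fields)
    return tokens


def format_token(t):
    """Lay out one token dict roughly like the desired output in the question."""
    line1 = f"Name: {t.get('Name', '')}".ljust(40) + f"Total Supply: {t.get('Total Supply', '')}"
    line2 = (f"Liquidity: {t.get('Liquidity', '')}".ljust(40)
             + f"Holders: {t.get('Holders', '')}".ljust(20)
             + f"Transfers: {t.get('Transfers', '')}")
    line3 = f"  BscScan: {t.get('BscScan', '')}"
    return "\n".join((line1, line2, line3))


for token in parse_tokens(SAMPLE):
    print(format_token(token))
    print()
```

For the real file, the same call would take the contents of sourcefile.html instead of `SAMPLE`. Keeping parsing and formatting separate means the column layout can change without touching the sibling walk.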
4 Comments

I'm getting an error: v = s.text.strip(), AttributeError: 'NavigableString' object has no attribute 'text'
Updated the code to allow for empty values. I don't have your complete file to test with, I only have what you pasted above. For anything that can't be reproduced with the material you provided, you'll need to look after yourself.
Traceback (most recent call last):
  File "C:/Users/test/test.py", line 18, in <module>
    value = s.text.strip() if s.text else ""
  File "C:\Python38\lib\site-packages\bs4\element.py", line 921, in __getattr__
    raise AttributeError(
AttributeError: 'NavigableString' object has no attribute 'text'
The sourcefile.html I am using is exactly the file I pasted above. I will do further research and testing.
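
The traceback above suggests the installed bs4 release does not expose `.text` on `NavigableString`. That is a guess, since the thread never states the version. One way to sidestep the difference is a small helper that treats tags and text nodes separately. `node_text` is a hypothetical name introduced here, and the one-line HTML string is a cut-down stand-in for the real file.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def node_text(node):
    """Return stripped text for a Tag or a text node, and "" for anything else."""
    if isinstance(node, Tag):
        return node.get_text(strip=True)
    if isinstance(node, NavigableString):
        # NavigableString subclasses str, so str() never needs a .text attribute.
        return str(node).strip()
    return ""  # covers None, e.g. when a tag has no next sibling


soup = BeautifulSoup("<strong>Holders</strong>: 1<br>", "html.parser")
strong = soup.find("strong")
print(node_text(strong), node_text(strong.next_sibling))  # prints "Holders : 1"
```

Calling `node_text(s)` wherever the loop reads `s.text.strip()` should behave the same whether `s` is a tag, a text node, or `None`.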
