How to extract a specific number of characters from a substring in Python with the same suffix

Question

I have this Python code to extract the image src values from an HTML page:

listingid=[img['src'] for img in soup.select('[src]')]

Now I would like to extract the values from the following output and store them in a dictionary:

Any approach I can take to achieve this?

I'm wondering whether there is any syntax in Python to take the 14 characters before a specific suffix (like .jpg).

s[-18:-4]? Python slices can take negative indices to mean start from the end.
– joanis, Commented Sep 21, 2022 at 2:59

joanis · Accepted Answer · 2022-09-21 03:01:32Z

1

You can use negative indices in Python slices to count from the end. Since you say in the question that you want the 14 characters before a 4-character suffix, a simple s[-18:-4] would do: -4 stops just before the suffix, and -18 starts 14 characters before that.

With your code:

listingid = [img['src'] for img in soup.select('[src]')]
listingid = [s[-18:-4] for s in listingid]

or, in one statement:

listingid = [img['src'][-18:-4] for img in soup.select('[src]')]
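As a minimal sketch of where the two bounds come from, the slice can be built from the suffix and the ID length instead of being hard-coded. The sample src value and the names suffix, id_length and listing_id are assumptions for illustration, not part of the original code:

```python
# Sample value; assumed to end in a 14-character ID followed by ".jpg".
src = "img/katalog/honda-crv-4x2-2.0-at-2001_30082022103745.jpg"

suffix = ".jpg"   # assumed fixed 4-character suffix
id_length = 14    # assumed fixed ID length

# The stop index drops the suffix; the start index is id_length characters earlier.
# With these values the slice is exactly src[-18:-4].
listing_id = src[-(id_length + len(suffix)):-len(suffix)]
print(listing_id)  # 30082022103745
```

This only works while both lengths stay fixed; a src ending in .jpeg, for example, would shift the slice by one character.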

answered Sep 21, 2022 at 3:01

joanis


HedgeHog · Accepted Answer · 2022-09-21 03:29:21Z

1

If the number of characters is always exactly the same, slicing is a handy shorthand; if it differs, I would recommend using split() by pattern:

[i.get('src').split('_')[-1].split('.')[0] for i in soup.select('[src]')]

or using regex:

import re
[re.search(r'.*?([0-9]+)\.[a-zA-Z]+$', i.get('src')).group(1) for i in soup.select('[src]')]
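One thing to keep in mind with the regex variant: re.search returns None when a src has no digits directly before its extension, so calling .group(1) on it would raise an AttributeError. A hedged sketch of a guarded version follows; extract_id, ID_PATTERN and the sample list are hypothetical names and data chosen for illustration:

```python
import re

# Digits immediately before a file extension at the end of the string.
ID_PATTERN = re.compile(r'([0-9]+)\.[a-zA-Z]+$')

def extract_id(src):
    """Return the trailing digit run before the extension, or None if absent."""
    match = ID_PATTERN.search(src)
    return match.group(1) if match else None

# Sample values; the last one deliberately has no numeric ID.
srcs = [
    "img/katalog/honda-crv-4x2-2.0-at-2001_30082022103745.jpg",
    "img/katalog/mitsubishi-xpander-1.5-exceed-manual-2018_08072022134628.jpg",
    "img/logo.png",
]

ids = [extract_id(s) for s in srcs]
print(ids)  # ['30082022103745', '08072022134628', None]
```

Entries that come back as None can then be filtered out or handled separately, depending on what the page contains.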

Example

from bs4 import BeautifulSoup

html = '''
<img src="img/katalog/honda-crv-4x2-2.0-at-2001_30082022103745.jpg">
<img src="img/katalog/mitsubishi-xpander-1.5-exceed-manual-2018_08072022134628.jpg">
'''
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning

[i.get('src').split('_')[-1].split('.')[0] for i in soup.select('[src]')]

Output

['30082022103745', '08072022134628']
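The question also mentions storing the values in a dictionary but does not say what the keys should be. As one possible sketch, assuming the extracted ID is the key and the full src is the value, a dict comprehension would do. The list srcs stands in for the src values collected from soup, and id_to_src is a hypothetical name:

```python
# Stand-in for [img['src'] for img in soup.select('[src]')].
srcs = [
    "img/katalog/honda-crv-4x2-2.0-at-2001_30082022103745.jpg",
    "img/katalog/mitsubishi-xpander-1.5-exceed-manual-2018_08072022134628.jpg",
]

# Assumed layout: ID -> original src. rsplit with maxsplit=1 splits only on the
# last "_" and the last ".", so earlier dots or underscores in the path are ignored.
id_to_src = {
    src.rsplit('_', 1)[-1].rsplit('.', 1)[0]: src
    for src in srcs
}

print(list(id_to_src))  # ['30082022103745', '08072022134628']
```

If two images shared the same ID, the later one would overwrite the earlier one in this layout.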

edited Sep 21, 2022 at 3:29

answered Sep 21, 2022 at 3:06

HedgeHog
