I am trying to scrape a page that contains the following JavaScript with BeautifulSoup:

bb2_addLoadEvent(function() {
    for ( i=0; i < document.forms.length; i++ ) {
        if (document.forms[i].method == 'post') {
            var myElement = document.createElement('input');
            myElement.setAttribute('type', 'hidden');
            myElement.name = 'bb2_screener_';
            myElement.value = '1568090530 122.44.202.205 122.44.202.205';
            document.forms[i].appendChild(myElement);
        }

I would like to obtain the value of myElement.value, but I don't know how to do so (or whether it is even possible with BeautifulSoup).

I've tried:

soup = BeautifulSoup(a.text, 'html.parser')
h = soup.find('type')  # also tried 'div', 'input', and even 'var'
print(soup)

but had no luck :(

Is there a way of obtaining the value? If so, how?

2 Answers

It would help to know more about how myElement.value varies across different pages. You might get away with a simple character set and a leading string, as in the regex below. I would like to tighten it up, but would need more examples. If the number lengths are fixed and repeating, you could use something like p = re.compile(r"myElement\.value = '(\d{10}(?:(\s\d{3}\.\d{2}\.\d{3}\.\d{3}){2}))';") and then take group 1.

import re

s = '''bb2_addLoadEvent(function() {
    for ( i=0; i < document.forms.length; i++ ) {
        if (document.forms[i].method == 'post') {
            var myElement = document.createElement('input');
            myElement.setAttribute('type', 'hidden');
            myElement.name = 'bb2_screener_';
            myElement.value = '1568090530 122.44.202.205 122.44.202.205';
            document.forms[i].appendChild(myElement);
        }'''

p = re.compile(r"myElement\.value = '([\d\s\.]+)';")
print(p.findall(s)[0])
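As a rough sketch of the stricter pattern mentioned at the top of this answer: it only works if every page really uses a 10-digit number followed by two addresses with the same digit layout as the sample, which is an assumption. The names sample, strict and strict_value are made up for this illustration.

```python
import re

# Only the relevant line from the question's script is needed here.
sample = "myElement.value = '1568090530 122.44.202.205 122.44.202.205';"

# Stricter pattern: 10 digits, then two space-separated groups shaped like
# 3.2.3.3 digits. This layout is assumed from the single sample value.
strict = re.compile(
    r"myElement\.value = '(\d{10}(?:(\s\d{3}\.\d{2}\.\d{3}\.\d{3}){2}))';"
)

match = strict.search(sample)
# search() returns None when nothing matches, so guard before reading a group.
strict_value = match.group(1) if match else None
print(strict_value)
```

Because this pattern contains two capturing groups, findall() would return tuples rather than plain strings, so search() followed by group(1) is the simpler route here.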

@SIM also kindly proposed:

p = re.compile(r"value[^']+'([^']*)'")
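One thing to keep in mind with the shorter pattern is that it anchors on the bare word value, so it picks up whatever single-quoted string follows the first occurrence of that word. A small sketch, where the page string is a made-up example and not taken from the real site:

```python
import re

loose = re.compile(r"value[^']+'([^']*)'")

script = "myElement.value = '1568090530 122.44.202.205 122.44.202.205';"

# On the snippet alone, the loose pattern finds exactly the wanted string.
loose_script = loose.findall(script)

# Hypothetical page text where "value" also appears earlier, followed by
# some other single-quoted string.
page = "var value = 1; var name = 'abc';\n" + script
loose_page = loose.findall(page)

print(loose_script)
print(loose_page)  # the first hit is now 'abc', not the wanted value
```

If the rest of the page can contain such text, anchoring on myElement\.value as in the pattern above is the safer choice.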

2 Comments

How about re.compile(r"value[^']+'([^']*)'")? It seems I'm beginning to learn regex.
@QHarr thanks. Your example worked perfectly. For anyone stuck on the same issue as me: I made a GET request, took the .text of the response, then used QHarr's example to obtain the value.
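The workflow described in the last comment might look roughly like the sketch below. The commenter presumably used a library such as requests; the standard library's urllib is swapped in here so the example has no dependencies. The helper names and the URL are hypothetical, and UTF-8 is an assumed encoding.

```python
import re
from urllib.request import urlopen

# Pattern from the answer above.
PATTERN = re.compile(r"myElement\.value = '([\d\s\.]+)';")


def extract_screener_value(page_text):
    """Return the myElement.value string found in page_text, or None."""
    match = PATTERN.search(page_text)
    return match.group(1) if match else None


def fetch_text(url):
    """Download a page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage with a hypothetical URL (not from the original post):
# print(extract_screener_value(fetch_text("https://example.com/page")))
```

Keeping the extraction separate from the download makes it easy to try the regex on saved page text first.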

If myElement.value = is static, this can be achieved with a simple regular expression:

value = re.compile(r"myElement\.value = '([^']+)'").search(text).group(1)  # text: the page source as a string

This matches myElement.value = ', followed by one or more non-' characters, followed by another ', with all the non-' characters captured in a group. Then group(1) extracts the captured text from the match.
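To make the difference between the whole match and the captured group concrete, here is a small sketch using the sample line from the question. The variable names are chosen for this illustration only.

```python
import re

text = "myElement.value = '1568090530 122.44.202.205 122.44.202.205';"

m = re.search(r"myElement\.value = '([^']+)'", text)

# group(0) is everything the pattern matched, prefix and quotes included.
whole_match = m.group(0)
# group(1) is only the part inside the parentheses.
captured = m.group(1)

print(whole_match)
print(captured)
```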

If the string may also contain escaped single quotes, e.g.:

myElement.value = 'foo \' bar';

then alternate \\. (a backslash followed by any character) with [^']:

myElement\.value = '((?:\\.|[^'])+)'

https://regex101.com/r/Tdarel/1
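A quick way to see what the alternation buys you is to run both patterns on the escaped example. This is only a sketch; the variable names are invented for the comparison.

```python
import re

simple = re.compile(r"myElement\.value = '([^']+)'")
escaped = re.compile(r"myElement\.value = '((?:\\.|[^'])+)'")

# Raw string, so the backslash before the inner quote is kept literally.
js = r"myElement.value = 'foo \' bar';"

# The simple pattern stops at the escaped quote and returns a truncated value.
simple_value = simple.search(js).group(1)
# The alternation consumes backslash + character pairs first, so it continues
# to the real closing quote.
escaped_value = escaped.search(js).group(1)

print(simple_value)
print(escaped_value)
```

Note that the captured text still contains the backslash; turning \' back into ' is a separate step.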
