
I can't get conditions to work in regular expressions. I need to find all tags like these:

<script type="text/javascript">9089089089</script>
<script>9089089089</script>

but not tags like this one:

<script type="text/javascript" src="python_files/py_dict.js"></script>

My regex does not work properly. How can I do this?

re.compile(r'<script.*(?<!src$).*?>(.*)</script>')

I need to find all the <script> tags that do not contain the src attribute and display the code that is inside the tag.
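A likely reason the regex above matches too much: `(?<!src$)` is a zero-width check made at one position only, and without `re.MULTILINE` the `$` inside it holds only at the very end of the string (or just before a final newline). So the lookbehind almost never fails, and tags with `src` still match. A quick check, assuming Python 3:

```python
import re

# The pattern from the question, unchanged
pattern = re.compile(r'<script.*(?<!src$).*?>(.*)</script>')

with_src = '<script type="text/javascript" src="python_files/py_dict.js"></script>'
without_src = '<script>9089089089</script>'

# The lookbehind rejects nothing here, so the tag with src matches too
print(pattern.search(with_src) is not None)  # True
print(pattern.search(without_src).group(1))  # 9089089089
```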

  • What do you intend to extract? Commented May 23, 2015 at 16:13
  • <script>If the condition is true, get what is between the tags</script> Commented May 23, 2015 at 16:16
  • I have no idea what you are trying to extract. Can you actually add what you want as output? Commented May 23, 2015 at 16:19
  • There should be no src attribute. Commented May 23, 2015 at 16:19
  • Do you want to just get all the attribute names and values in the script tag if there is no src attribute inside it? Please just add expected output to the question. Commented May 23, 2015 at 16:20

3 Answers


If you insist on a regex-based solution:

(?s)<script\b((?:(?!src).)*?)>(.*?)</script>

Python code:

import re

p = re.compile(r'(?s)<script\b((?:(?!src).)*?)>(.*?)</script>')
test_str = ('<script type="text/javascript" src="python_files/py_dict.js"></script>\n'
            '<script type="text/javascript">9089089089</script>\n'
            '<script>9089089089</script>')
# Python 3: print() is a function; iterate with the compiled pattern's finditer
print([(m.group(1), m.group(2)) for m in p.finditer(test_str)])
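The same regex can be written with `re.VERBOSE` so each part carries a comment. This is only a restatement of the pattern above under Python 3, not a different technique. One caveat worth knowing: `(?!src)` rejects a tag if the letters `src` appear anywhere among its attributes (for example a hypothetical `data-src`), not only when there is a real `src` attribute.

```python
import re

# Same regex as above, split up and commented with re.VERBOSE
SCRIPT_WITHOUT_SRC = re.compile(r"""
    <script\b          # opening tag name, but not e.g. <scripts
    (                  # group 1: the attribute text
      (?:(?!src).)*?   # any characters, as long as "src" never starts here
    )
    >                  # end of the opening tag
    (.*?)              # group 2: the tag body, as short as possible
    </script>          # closing tag
""", re.DOTALL | re.VERBOSE)

test_str = (
    '<script type="text/javascript" src="python_files/py_dict.js"></script>\n'
    '<script type="text/javascript">9089089089</script>\n'
    '<script>9089089089</script>'
)

for m in SCRIPT_WITHOUT_SRC.finditer(test_str):
    print(repr(m.group(1)), repr(m.group(2)))
```

This prints `' type="text/javascript"' '9089089089'` and then `'' '9089089089'`; the tag with `src` is skipped.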

You can use BeautifulSoup to find the script tags, passing src=False to skip any tag that has a src attribute:

from bs4 import BeautifulSoup

# html is a string holding the page's markup
soup = BeautifulSoup(html, "html.parser")

print(soup.find_all("script", src=False))

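It returns only the script tags without a src attribute. To confirm, subtract that result from all script tags on a page; what is left are exactly the tags that do have src: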

import requests
from bs4 import BeautifulSoup

r = requests.get("http://stackoverflow.com/questions/30414867/make-regular-expression-python/30414987#30414987")
soup = BeautifulSoup(r.content, "html.parser")
# All script tags minus those without src leaves the tags that have src
print(set(soup.find_all("script")).difference(soup.find_all("script", src=False)))
{<script src="//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>, <script src="//cdn.sstatic.net/Js/stub.en.js?v=f07e1c0b90d5"></script>}

2 Comments

@Strelok2014Strelok, why can't you use BeautifulSoup? Is this homework?
I am developing a portable app and need as few dependencies as possible. BeautifulSoup is pretty big just to parse three tags...
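Given the dependency concern, the standard library's `html.parser` module can also do this without BeautifulSoup. A minimal sketch, assuming Python 3 and reasonably well-formed markup; the class name `InlineScriptCollector` is purely illustrative.

```python
from html.parser import HTMLParser


class InlineScriptCollector(HTMLParser):
    """Collects the body of every <script> tag that has no src attribute."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # bodies of inline scripts, in document order
        self._capturing = False  # True while inside a <script> without src
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "script" and not any(name == "src" for name, _ in attrs):
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.scripts.append("".join(self._buffer))
            self._capturing = False


html = (
    '<script type="text/javascript" src="python_files/py_dict.js"></script>\n'
    '<script type="text/javascript">9089089089</script>\n'
    '<script>9089089089</script>'
)

parser = InlineScriptCollector()
parser.feed(html)
parser.close()
print(parser.scripts)  # ['9089089089', '9089089089']
```

Unlike the regex, this checks for an actual `src` attribute, so an attribute such as `data-src` does not cause a tag to be skipped.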

I agree with the other answers that there is probably a Python package that will work more elegantly for your application. However, if you really want to use a regular expression, you can match only bare script tags:

re.compile(r'<script>(.*)</script>')

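re.search (or re.match) will then return None if the tag has a src attribute. Note that it also returns None for a tag with any other attribute, such as <script type="text/javascript">, because the pattern requires the literal text <script>.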
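Running this pattern against the three tags from the question (Python 3) shows which ones it actually matches:

```python
import re

pattern = re.compile(r'<script>(.*)</script>')

samples = [
    '<script type="text/javascript" src="python_files/py_dict.js"></script>',
    '<script type="text/javascript">9089089089</script>',
    '<script>9089089089</script>',
]

for s in samples:
    m = pattern.search(s)
    # None for both tags that carry attributes, not only the one with src
    print(m.group(1) if m else None)
```

This prints `None`, `None`, then `9089089089`.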
