Since other answers have given a lot of ways to solve your problem, let me try to explain the behavior you witnessed.
First of all, Rubular is specific to Ruby's regular expression semantics (I don't know exactly how Ruby's and Python's RegEx engines differ). Since you have tagged python, you might want to use regex101 or debuggex instead. I'll be using both of these to explain.
Now, let us look at your actual RegEx and the data. Your input string looks like this:
476dn
e586
9999
rrr
ABCF
A regular expression can see this input in two ways: as one long string with newlines in it, or as a list of lines separated by newlines. We can control this behavior with a RegEx flag known as the multiline flag (in Python, re.MULTILINE or re.M). Quoting from the Python docs,
re.M
re.MULTILINE
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
For example, in our case, if this flag is NOT enabled, the input string will be treated as a long string with newlines in it and ^ will match the position before 4 in the first line, $ will match the position after F in the last line.
When that flag is enabled, ^ and $ match the positions before the first character and after the last character of each line, respectively. So, they can match any of the following combinations:
- when ^ is the position before 4, $ will be the position after n
- when ^ is the position before 4, $ will be the position after 6
- when ^ is the position before 4, $ will be the position after 9
- when ^ is the position before 4, $ will be the position after r
- when ^ is the position before 4, $ will be the position after F
- when ^ is the position before e, $ will be the position after 6
- when ^ is the position before e, $ will be the position after 9
- when ^ is the position before e, $ will be the position after r
- when ^ is the position before e, $ will be the position after F
- when ^ is the position before 9, $ will be the position after 9
- when ^ is the position before 9, $ will be the position after r
- when ^ is the position before 9, $ will be the position after F
- when ^ is the position before r, $ will be the position after r
- when ^ is the position before r, $ will be the position after F
- when ^ is the position before A, $ will be the position after F
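The positions listed above can be checked directly. Here is a minimal sketch, assuming the input is stored in a variable called data (the helper name anchor_positions is mine, purely for illustration). It collects the offsets where ^ and $ match, without and with re.MULTILINE, and then pairs every line start with every later line end.

```python
import re

# Assumed sample input, copied from the data shown in this answer.
data = "476dn\ne586\n9999\nrrr\nABCF"

def anchor_positions(pattern, flags=0):
    """Return the offsets at which a zero-width pattern matches in data."""
    return [m.start() for m in re.finditer(pattern, data, flags)]

# Without the flag: a single start and a single end for the whole string.
default_starts = anchor_positions(r"^")
default_ends = anchor_positions(r"$")

# With re.MULTILINE: one start and one end per line.
line_starts = anchor_positions(r"^", re.MULTILINE)
line_ends = anchor_positions(r"$", re.MULTILINE)

# Every (start, end) combination in which the end comes after the start.
pairs = [(s, e) for s in line_starts for e in line_ends if e > s]

print(default_starts, default_ends)
print(line_starts, line_ends)
print(len(pairs))
```

The number of pairs should equal the number of bullet points in the list, one per combination of a line start and a later line end.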
Since ^ and $ can match at multiple positions, we have to explicitly tell the RegEx engine to match each line separately when we use multiline strings. In Python, we can use re.findall or re.finditer for that. In the RegEx world, this is normally represented by the flag g, which means search globally.
With this basic understanding, let us look at your data again. I believe Rubular has both of these (multiline matching and global search) enabled by default. We can see the matches clearly, with the capture group, like in this demo, with the RegEx
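To make the "search globally" idea concrete, here is a small sketch comparing a single search with the global variants. The pattern ^\w+$ is only an illustrative stand-in (it is not the pattern from the question), and data is again an assumed variable name.

```python
import re

# Assumed sample input, copied from the data shown in this answer.
data = "476dn\ne586\n9999\nrrr\nABCF"

# Illustrative pattern only: a whole line made of word characters.
line = re.compile(r"^\w+$", re.MULTILINE)

# search() stops at the first match, like a RegEx without the g flag.
first_only = line.search(data).group()

# findall() returns every non-overlapping match as a list of strings.
all_matches = line.findall(data)

# finditer() yields match objects, which also carry the positions.
spans = [m.span() for m in line.finditer(data)]

print(first_only)
print(all_matches)
print(spans)
```

Use findall when you only need the matched text, and finditer when you also need offsets or groups from each match.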
^([\D]*[0-9]+[\D]*)$
We can find the matches with Python, like this
import re
data = "476dn\ne586\n9999\nrrr\nABCF"
regex = re.compile(r"^[\D]*[0-9]+[\D]*$", re.MULTILINE)
print(regex.findall(data))
# ['476dn', 'e586', '9999\nrrr\nABCF']
That the pattern matches the first and the second lines should be unsurprising, but the third match might be difficult to understand at first. ^[\D]* means zero or more characters which are not digits, so [\D]* can also match an empty string. At the beginning of 9999, [\D]* matches the empty string before 9999, then [0-9]+ matches the digits 9999, and the second [\D]* matches the rest of the string up to the end. It matches the newlines as well, because \D matches anything but a digit, and a newline is not a digit.
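Each step of that explanation can be checked in isolation. This is just a sketch of the three observations (the variable names are mine), using the last three lines of the input as a separate string.

```python
import re

# [\D]* may match nothing at all, so it succeeds even right before a digit.
empty = re.match(r"[\D]*", "9999").group()

# A newline is not a digit, so \D accepts it.
newline_is_non_digit = re.fullmatch(r"\D", "\n") is not None

# Starting at the third line, the trailing [\D]* runs to the end of the input,
# and $ is then satisfied at the very end of the string.
tail = "9999\nrrr\nABCF"
m = re.match(r"^[\D]*[0-9]+[\D]*$", tail, re.MULTILINE)

print(repr(empty))
print(newline_is_non_digit)
print(repr(m.group()))
```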
Also note that \D allows other special characters as well. Quoting from the Docs,
When the UNICODE flag is not specified, matches any non-digit character; this is equivalent to the set [^0-9]. With UNICODE, it will match anything other than character marked as digits in the Unicode character properties database.
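A quick sketch of what that means in practice. The sample strings are made up for illustration. Note that the quoted text describes the UNICODE flag as opt-in; in Python 3, str patterns are Unicode-aware by default and re.ASCII restores the [^0-9] behavior.

```python
import re

# Punctuation, spaces and symbols all count as "non-digits".
specials = "#!? -_"
all_non_digits = re.fullmatch(r"\D+", specials) is not None

# Consequently the original pattern accepts a made-up input such as this one.
loose = re.compile(r"^[\D]*[0-9]+[\D]*$")
accepts_symbols = loose.match("#!?7 -- ") is not None

# In Python 3, a non-ASCII digit such as ARABIC-INDIC DIGIT THREE (U+0663)
# counts as a digit by default, so \D does not match it ...
unicode_digit = "\u0663"
default_match = re.fullmatch(r"\D", unicode_digit) is not None

# ... unless re.ASCII restricts \d and \D to [0-9] and [^0-9].
ascii_match = re.fullmatch(r"\D", unicode_digit, re.ASCII) is not None

print(all_non_digits, accepts_symbols, default_match, ascii_match)
```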
So, you might want to be more explicit like in tobias_k's answer
^[0-9a-zA-Z]*[0-9][0-9a-zA-Z]*$
This can be used in Python, like this
regex = re.compile(r"^[0-9a-zA-Z]*[0-9][0-9a-zA-Z]*$", re.MULTILINE)
print(regex.findall(data))
# ['476dn', 'e586', '9999']
Or, if you can break the string into multiple strings, then you can do
regex = re.compile(r"^[0-9a-zA-Z]*[0-9][0-9a-zA-Z]*$")
print([item for item in data.split() if regex.match(item)])
# ['476dn', 'e586', '9999']
"12a34"match or not? Also, do you really want to match strings like"(/3)!§"(which your regex currently matches)?but it didn't works. What do you mean by that? Help us to help you.