Pandas contains() returning an empty series while searching for text in a DataFrame column

Question

Dataset: fandango_score_comparison.csv

I'm trying to access a row that matches a given movie name using the following code:

import pandas as pd

df = pd.read_csv("http://github.com/mircealex/Movie_ratings_2016_17/raw/master/fandango_score_comparison.csv")

df.drop_duplicates(subset=["FILM"], inplace=True, ignore_index=True)

movie_name = df.FILM.iloc[0]
movie_df = df[df["FILM"].str.contains(movie_name)]

But the movie_df I get is always empty, no matter which movie_name I select. What am I missing or doing wrong?
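A minimal offline reproduction can make the problem easier to see. This sketch replaces the CSV download with a small hypothetical DataFrame; the titles are made up and only assume a "Title (Year)" naming format, they are not read from the dataset.

```python
import pandas as pd

# Hypothetical sample data standing in for fandango_score_comparison.csv.
df = pd.DataFrame(
    {"FILM": ["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"]}
)

movie_name = df.FILM.iloc[0]

# Same call as in the question: regex defaults to True.
# pandas may also emit a UserWarning here because the pattern contains a group.
default_match = df[df["FILM"].str.contains(movie_name)]

print(default_match.empty)  # True: even the row the name came from is not found
```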

Try df[df["FILM"].str.contains(movie_name), regex=False]. contains assumes the argument is a regular expression. Your first movie name may accidentally be a valid regex.
– The Lazy Graybeard, Commented Dec 14, 2022 at 23:21
You probably meant df[df["FILM"].str.contains(movie_name, regex=False)].
– accdias, Commented Dec 14, 2022 at 23:23

Marco Bonelli · Accepted Answer · 2022-12-14 23:38:05Z


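As noted in the comments, and as documented in the pandas API reference, pandas.Series.str.contains() takes a regex= parameter that defaults to True. This means that if your movie_name contains regular-expression metacharacters (such as *, ., (), [], and so on), it is interpreted as a pattern rather than as literal text, which is most likely what is happening here. For example, in a title like "Avengers: Age of Ultron (2015)", the parentheses form a capture group, so the pattern matches the text "Avengers: Age of Ultron 2015" (without parentheses) and never matches the original title itself.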
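With regex=True, str.contains behaves like re.search applied to each element, so the mechanism can be shown with the standard re module alone. The title below is a hypothetical example chosen to have the "(Year)" suffix; it is not taken from the dataset.

```python
import re

title = "Avengers: Age of Ultron (2015)"  # hypothetical example title

# Used as a pattern, "(2015)" is a capture group that matches the characters
# 2015 but does not match the literal parentheses, so the title fails to
# match itself.
self_match = re.search(title, title)
print(self_match)  # None

# The same pattern does match the text with the parentheses removed.
no_parens_match = re.search(title, "Avengers: Age of Ultron 2015")
print(no_parens_match is not None)  # True
```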

You should be fine if you explicitly disable regular-expression matching:

movie_df = df[df["FILM"].str.contains(movie_name, regex=False)]
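For comparison, here is a sketch of the fix from this answer alongside two other approaches that should give the same row on this kind of data: escaping the name with re.escape while keeping regex=True, and plain equality. The DataFrame is again a hypothetical stand-in for the CSV.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for fandango_score_comparison.csv.
df = pd.DataFrame(
    {"FILM": ["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"]}
)
movie_name = df.FILM.iloc[0]

# 1. Literal substring search (this answer's fix).
by_literal = df[df["FILM"].str.contains(movie_name, regex=False)]

# 2. Keep regex=True but escape every metacharacter in the name first.
by_escaped = df[df["FILM"].str.contains(re.escape(movie_name))]

# 3. Exact equality: no pattern matching at all.
by_equality = df[df["FILM"] == movie_name]

print(len(by_literal), len(by_escaped), len(by_equality))  # 1 1 1
```

Since movie_name is taken from the FILM column itself, exact equality is arguably the most direct choice; the substring approaches would also return any longer title that happens to contain the name.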
