
I am working on a classification problem. The data consists of reviews of a particular product category from an e-commerce platform. Each attribute is described below:

  • id: Unique identifier for each tuple.
  • category: The reviews have been categorized into two categories representing positive and negative reviews. 0 represents positive reviews and 1 represents negative reviews.
  • text: Tokenized text content of the review.

A sample of the training data is attached as a picture; it shows the data format, with the columns described above.

I am thinking of trying TF-IDF; however, given the text format, I don't know how to apply it.

I want to predict the category from the text column.
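A minimal sketch of the TF-IDF route, using scikit-learn's TfidfVectorizer and LogisticRegression (both are assumptions; the question names no library or classifier). Because text is already tokenized, the sketch assumes the tokens are separated by whitespace and passes str.split as the analyzer. Every review then becomes a row of the same width (one column per distinct token), whatever its length. The sample rows are hypothetical.

```python
# Sketch: TF-IDF features + a linear classifier on pre-tokenized review text.
# Assumption: the tokens in `text` are separated by whitespace.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the real training data (id, category, text)
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "category": [0, 1, 0, 1],  # 0 = positive, 1 = negative
    "text": ["5002 87", "7400 12", "5002 87 90", "7400 12 44"],
})

# The text is already tokenized, so split on whitespace instead of using the
# default regex tokenizer (which would drop 1-character tokens).
vectorizer = TfidfVectorizer(analyzer=str.split)
X = vectorizer.fit_transform(df["text"])  # sparse matrix: reviews x distinct tokens
print(X.shape)

# Fit a classifier on the TF-IDF matrix and score a new review
clf = LogisticRegression().fit(X, df["category"])
print(clf.predict(vectorizer.transform(["5002 87"])))
```

New reviews must go through the same fitted vectorizer (transform, not fit_transform) so that their columns line up with the training matrix.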

1 Answer


You can use the text column as several features. I would recommend splitting that column into one column per token (see "How do I split a string into several columns in a dataframe with pandas Python?"):

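# First load the dataframe (I assume it is in Excel format)
import pandas as pd
df = pd.read_excel('YourPath')  # the first row is used as the header by default
# Split the space-separated tokens into one column each, keeping id and category
df = df[['id', 'category']].join(df['text'].str.split(expand=True))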
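A small sketch of what splitting the tokens into columns produces, using a hypothetical in-memory frame instead of the Excel file and assuming the tokens are separated by spaces. Shorter reviews are padded with missing values up to the length of the longest review.

```python
# Sketch: split space-separated tokens into one column per token position.
# The frame is hypothetical; in practice it would come from pd.read_excel.
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "category": [0, 1],
    "text": ["5002 311 87", "7400 12"],  # assumed: tokens separated by spaces
})

# Columns 0, 1, 2 hold the tokens; review 2 has only two tokens,
# so its column 2 is missing (None/NaN).
tokens = df[["id", "category"]].join(df["text"].str.split(expand=True))
print(tokens)
```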

Then you can convert it to a 0/1 (one-hot) dataframe:

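df1 = (pd.get_dummies(df.set_index(['id', 'category']).stack())
         .groupby(level=['id', 'category']).max()  # .max(level=...) was removed in pandas 2.0
         .astype(int)            # get_dummies returns booleans in pandas >= 2.0
         .rename(columns=int)    # assumes the tokens are integer ids, as in the sample
         .reset_index())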

This will produce something like:

id  category  5002  7400  ...
 1         0     1     0  ...
 2         1     0     1  ...

where each column is one distinct token from the text column, and a cell is 1 only if that token occurs in that review.
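Once the 0/1 frame exists, its token columns can serve as features for predicting category. A minimal sketch with scikit-learn's LogisticRegression (a classifier chosen here for illustration, not one named in the answer), using a small hypothetical frame laid out like the table above:

```python
# Sketch: train a classifier on a one-hot token matrix.
# The frame is hypothetical and only mimics the id / category / token-column layout.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "category": [0, 1, 0, 1],  # 0 = positive, 1 = negative
    5002: [1, 0, 1, 0],        # token columns: 1 if the token occurs in the review
    7400: [0, 1, 0, 1],
    311: [1, 1, 0, 0],
})

X = df1.drop(columns=["id", "category"])  # features: token columns only
y = df1["category"]                        # target: review category

clf = LogisticRegression().fit(X, y)

# A new review containing only token 5002, encoded with the same columns
new_review = pd.DataFrame([[1, 0, 0]], columns=X.columns)
print(clf.predict(new_review))
```

A new review must be encoded with exactly the training columns; tokens never seen in training have no column and are simply ignored.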


3 Comments

Yes, but if I split them, the number of tokens is not the same for every review, so I will get as many columns as the longest review has tokens, with empty values in most rows.
You can check this out for the conversion: stackoverflow.com/questions/58027455/…
@DhrubSatyamJha did you get any solution?
