I am preprocessing hourly time-series data so that I can apply K-means clustering to it. When I then normalize the data, I get this error:

```
Traceback (most recent call last):
  File ".venv\lib\site-packages\pandas\core\series.py", line 191, in wrapper
    raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".venv\timesequence.py", line 210, in <module>
    matrix = pd.DataFrame(scaler.fit_transform(x_calls), columns=df_hours.columns, index=df_hours.index)
  File ".venv\lib\site-packages\sklearn\base.py", line 867, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File ".venv\lib\site-packages\sklearn\preprocessing\_data.py", line 420, in fit
    return self.partial_fit(X, y)
  File ".venv\lib\site-packages\sklearn\preprocessing\_data.py", line 457, in partial_fit
    X = self._validate_data(
  File ".venv\lib\site-packages\sklearn\base.py", line 577, in _validate_data
    X = check_array(X, input_name="X", **check_params)
  File ".venv\lib\site-packages\sklearn\utils\validation.py", line 856, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File ".venv\lib\site-packages\pandas\core\generic.py", line 2064, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: setting an array element with a sequence.
```

```python
#--------------------Preprocessing ds 
counter_ = 0
zero = 0
df_hours = pd.DataFrame({
    'Hour': [],
    'SumView':[],
    'CountStudent':[]
}, dtype=object)

while counter_ < 24:
    if (counter_ in sub_data_hour['Hour']):
        row = sub_data_hour.loc[(pd.to_numeric(sub_data_hour['Hour'], errors='coerce')) == counter_]
        df_hours.loc[len(df_hours.index)] = [counter_, row['SumView'], row['CountStudent']]
    else:
        df_hours.loc[len(df_hours.index)] = [counter_, zero, zero]
    counter_ += 1

#----------Normalize dataset------------
x_calls = df_hours.columns[2:]
scaler = MinMaxScaler()
matrix = pd.DataFrame(scaler.fit_transform(df_hours[x_calls]), columns=x_calls, index=df_hours.index)
```

I tried `.to_numpy()`, `.values`, and selecting with `[['column1', 'column2']]`, following the post "pandas dataframe columns scaling with sklearn", but none of them worked.

Could anyone please help me fix this? Thanks.

1 Answer

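The problem is the data type of the values I stored in `df_hours` during preprocessing. `row['SumView']` and `row['CountStudent']` are pandas Series, not scalars, so each cell ends up holding a whole Series and the scaler cannot convert the column to floats.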
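To see the difference, here is a minimal sketch on a made-up two-row frame (the data and the names `sub`, `cell_as_series`, and `cell_as_scalar` are assumptions for illustration, not part of the original script). Filtering with `.loc` and a boolean mask returns a DataFrame, so selecting one column from it still gives a Series even when only one row matches:

```python
import pandas as pd

# Hypothetical two-row frame standing in for sub_data_hour.
sub = pd.DataFrame({"Hour": [0, 1], "SumView": [10, 4]})

# A boolean mask keeps the result a DataFrame, even with a single match.
row = sub.loc[sub["Hour"] == 0]

cell_as_series = row["SumView"]            # one-element Series
cell_as_scalar = row["SumView"].values[0]  # plain NumPy integer

print(type(cell_as_series).__name__)
print(type(cell_as_scalar).__name__)
```

Storing `cell_as_series` in an object-dtype cell is what later produces "setting an array element with a sequence".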

Solution: change `row['SumView']` to `row['SumView'].values[0]`, and do the same for `row['CountStudent']`, so that each cell stores a single number.
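Putting it together, the corrected loop might look roughly like the sketch below. The sample `sub_data_hour` is invented for illustration, and the scaled columns are listed explicitly as an assumption (in the question, `df_hours.columns[2:]` selects only `CountStudent`). The sketch also tests whether the filtered `row` is empty instead of using `counter_ in sub_data_hour['Hour']`, because `in` on a Series checks index labels rather than values.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample standing in for sub_data_hour (not from the question).
sub_data_hour = pd.DataFrame({
    "Hour": [0, 1, 5, 23],
    "SumView": [10, 4, 7, 1],
    "CountStudent": [3, 2, 5, 1],
})

hours = pd.to_numeric(sub_data_hour["Hour"], errors="coerce")

rows = []
for hour in range(24):
    row = sub_data_hour.loc[hours == hour]
    if not row.empty:
        # .values[0] pulls out a scalar; row["SumView"] alone is a Series.
        rows.append([hour, row["SumView"].values[0], row["CountStudent"].values[0]])
    else:
        # Hours with no data are filled with zeros, as in the question.
        rows.append([hour, 0, 0])

df_hours = pd.DataFrame(rows, columns=["Hour", "SumView", "CountStudent"])

# Assumption: both value columns should be scaled.
x_calls = ["SumView", "CountStudent"]
scaler = MinMaxScaler()
matrix = pd.DataFrame(
    scaler.fit_transform(df_hours[x_calls]),
    columns=x_calls,
    index=df_hours.index,
)
print(matrix.head())
```

Because every cell now holds a number, `df_hours[x_calls]` converts cleanly to a numeric array and `fit_transform` no longer raises.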
