I am trying to understand how crime frequency affects house prices in a given area. To do so, I started with Chicago crime data and Zillow real estate data. I want to understand the relationship between house prices, crime frequency, and the top 5 crimes in each area. I built an initial model for this, but the results were not very meaningful to me. Can anyone suggest what I should do? Is there an efficient approach to training a regression model for the potential relationship between house price and crime frequency in certain areas? Any heuristic ideas for moving forward?
Example data snippet:
Here is the merged data, which includes the annual house price and the top crime types in certain areas:
Here is a reproducible example data snippet:
My attempt:
Here is my attempt to fit a regression model to the reproducible example data above:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Load the merged example data
regDF = pd.read_csv('exampleDF')

# Features: every column except the target (this assumes the remaining columns are numeric)
X_feats = regDF.drop(columns=['Avg_Price_2012'])
y_label = regDF['Avg_Price_2012'].values

# Standardize the features and the target
sc_x = StandardScaler()
sc_y = StandardScaler()
X = sc_x.fit_transform(X_feats)
y = sc_y.fit_transform(y_label.reshape(-1, 1)).ravel()

# Fit ordinary least squares and print the standardized coefficients
regModel = LinearRegression()
regModel.fit(X, y)
print(regModel.coef_)
But to me, the model above did not seem adequate, and I think more needs to be done. I suspect I need a nonlinear regression model, for example one with polynomial features, but I am not sure how to do this.
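To make the polynomial-features idea concrete, here is a rough sketch of what I have in mind. It runs on synthetic data, not my real data: the names `crime_count`, `avg_price`, and `make_poly_model` are made up for illustration, and the quadratic price curve is purely an assumption so that there is something nonlinear to fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (hypothetical): price changes nonlinearly with crime count.
rng = np.random.default_rng(0)
crime_count = rng.uniform(0, 100, size=200)
avg_price = (300_000 - 4_000 * crime_count + 25 * crime_count**2
             + rng.normal(0, 5_000, size=200))
X = crime_count.reshape(-1, 1)


def make_poly_model(degree):
    """Scale the input, expand it to polynomial terms, then fit least squares."""
    return make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )


# Compare polynomial degrees by 5-fold cross-validated R^2.
cv_r2 = {
    d: cross_val_score(make_poly_model(d), X, avg_price, cv=5, scoring="r2").mean()
    for d in (1, 2, 3)
}
for degree, score in cv_r2.items():
    print(f"degree={degree}: mean CV R^2 = {score:.3f}")
```

Comparing degrees by cross-validation rather than by training fit should at least show whether the extra polynomial terms help on held-out areas or just overfit.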
Can anyone point out how to build a suitable model for predicting house prices from the types and frequencies of crimes in certain areas? Any ideas? Thanks.
Goal:
I want to build a regression model to predict house prices based on crime frequencies and types in certain areas. How should I model the relationship between house prices and crime in those areas? Any thoughts?
