I have a SQL query that returns a data set in the following format:
user_id, type_id, avg
1, 3, 2.5
1, 2, 3.0
1, 5, 4.6
1, 11, 3.4
2, 2, 4.5
2, 3, 3.0
2, 11, 3.1
The data above comes from the following query, which runs against a very large table:
select u.user_id, t.type_id, sum(u.preference)/count(u.preference)
from user_preference u, item_type_pairs t
where t.item_id = u.item_id
group by u.user_id, t.type_id;
The query takes 10 minutes and returns more than 2 million records. My end goal is to put this into a data frame where the rows are user_id values, the columns are type_id values, and each cell holds the avg value for that user_id and type_id:
     type_id_1  type_id_2  type_id_3
u1|                   3.0        2.5
u2|                   4.5        3.0
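To make the target shape concrete, here is a minimal sketch on the sample rows above. It assumes a pandas data frame (the library is my assumption, any data frame tool with a pivot operation would do), and the names `long_df` and `wide_df` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the query output above: (user_id, type_id, avg).
rows = [
    (1, 3, 2.5),
    (1, 2, 3.0),
    (1, 5, 4.6),
    (1, 11, 3.4),
    (2, 2, 4.5),
    (2, 3, 3.0),
    (2, 11, 3.1),
]
long_df = pd.DataFrame(rows, columns=["user_id", "type_id", "avg"])

# One row per user_id, one column per type_id.
# (user_id, type_id) pairs that never occur end up as NaN.
wide_df = long_df.pivot(index="user_id", columns="type_id", values="avg")
print(wide_df)
```

This works on seven rows; whether it is the right approach for 2 million rows is what I am unsure about.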
What would be the best way to go about this? I am still figuring it out myself. Should I read the result row by row and somehow populate the data frame?
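One alternative to row-by-row reading that I am considering is fetching the result in chunks and pivoting once at the end. The sketch below is untested at scale and rests on several assumptions: pandas as the data frame library, an in-memory SQLite database standing in for the real one, made-up sample rows, and an `avg_pref` column alias that is not in my original query.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stand-in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_preference (user_id INTEGER, item_id INTEGER, preference REAL);
    CREATE TABLE item_type_pairs (item_id INTEGER, type_id INTEGER);
    INSERT INTO user_preference VALUES (1, 10, 2.0), (1, 11, 3.0), (2, 10, 4.0);
    INSERT INTO item_type_pairs VALUES (10, 3), (11, 3), (11, 2);
""")

# Same aggregation as the query above, with a hypothetical alias for the average.
query = """
    select u.user_id, t.type_id,
           sum(u.preference)/count(u.preference) as avg_pref
    from user_preference u, item_type_pairs t
    where t.item_id = u.item_id
    group by u.user_id, t.type_id
"""

# With chunksize set, read_sql_query yields data frames of at most that many rows
# instead of a single data frame. A real run would use a much larger chunk size.
chunks = pd.read_sql_query(query, conn, chunksize=2)
long_df = pd.concat(chunks, ignore_index=True)

# Reshape once, after all chunks have been collected.
wide_df = long_df.pivot(index="user_id", columns="type_id", values="avg_pref")
print(wide_df)
conn.close()
```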