I want to import data from MS SQL Server and fit a linear regression to it in R, but I am not sure how to reshape the data from SQL Server so that I can run the regression. My table in SQL Server looks like this:

Pack    Cubes   Name    Sales
1001    1.2      A      10
1001    1.2      B      12
1002    0.9      A      8
1002    0.9      B      5
1002    0.9      C      12
1003    1.5      A      5
1003    1.5      C      10
1004    0.8      B      8
1004    0.8      C      10
1005    1.3      A      5
1005    1.3      B      8
1005    1.3      C      12

If I prepared the data for a regression model in Excel, it would look like this:

Cubes   A   B   C
1.2    10   12  0
0.9    8    5   12
1.5    5    0   10
0.8    0    8   10
1.3    5    8   12

A, B and C are my dependent variables and Cubes is my independent variable. The Pack column in my SQL table is just a reference. My SQL connection to a DSN looks like this (it works perfectly):

library(RODBC)
myconn <- odbcConnect("sqlserver")
data <- sqlQuery(myconn,"select Cubes,Name,Sales from mytable")

For the regression I tried the following (which is wrong):

summary(data)
reg<-lm(Cubes~Sales,data)
summary(reg)

How can I reshape the data from SQL Server in R the same way I would in Excel?

3 Answers

Try reshape or the reshape package:

wide <- reshape(data, v.names = "Sales", idvar = "Cubes",
            timevar = "Name", direction = "wide")

2 Comments

+1, although to get exactly what the OP asked for, you need to replace the NAs with 0.
Thanks, reshape works. I just added wide[is.na(wide)]=0, which sets the NAs to 0.
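
The reshape call plus the NA fix from the comments can be tried end to end without a database. The sketch below is only an illustration: the data frame is typed in from the table in the question and stands in for the sqlQuery() result, and renaming the Sales.A, Sales.B, Sales.C columns to A, B, C is an optional extra step that is not part of the original answer.

```r
# Sample data copied from the question; stands in for the sqlQuery() result
data <- data.frame(
  Cubes = c(1.2, 1.2, 0.9, 0.9, 0.9, 1.5, 1.5, 0.8, 0.8, 1.3, 1.3, 1.3),
  Name  = c("A", "B", "A", "B", "C", "A", "C", "B", "C", "A", "B", "C"),
  Sales = c(10, 12, 8, 5, 12, 5, 10, 8, 10, 5, 8, 12)
)

# Long to wide: one row per Cubes value, one Sales.<Name> column per Name
wide <- reshape(data, v.names = "Sales", idvar = "Cubes",
                timevar = "Name", direction = "wide")

# Combinations that do not occur come back as NA; set them to 0
wide[is.na(wide)] <- 0

# Optional: drop the "Sales." prefix so the columns read A, B, C
names(wide) <- sub("^Sales\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

Note that this uses Cubes as the row identifier, which works here because every Pack has a distinct Cubes value. If two packs could share a Cubes value, selecting Pack in the query and using it as idvar would probably be safer.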

I would use dcast from the reshape2 package. Note that dcast produces NA for combinations of Cubes and Name that do not occur in the data, so you need to set those to 0 manually:

library(reshape2)
# df is the data frame returned by sqlQuery (called data in the question)
res = dcast(df, Cubes ~ Name, value.var = 'Sales')
res[is.na(res)] = 0
res
  Cubes  A  B  C
1   0.8  0  8 10
2   0.9  8  5 12
3   1.2 10 12  0
4   1.3  5  8 12
5   1.5  5  0 10
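
Once the data is in this wide shape, the regression from the question can be attempted. The following is a minimal sketch, assuming (as the question states) that A, B and C are the dependent variables and Cubes is the independent variable. The data frame is retyped from the output above so the example runs on its own, and the cbind() response is one possible way to fit all three at once, not something the answers prescribe.

```r
# Wide data as printed above, retyped so the example is self-contained
res <- data.frame(
  Cubes = c(0.8, 0.9, 1.2, 1.3, 1.5),
  A = c(0, 8, 10, 5, 5),
  B = c(8, 5, 12, 8, 0),
  C = c(10, 12, 0, 12, 10)
)

# A matrix response fits one regression per column, all on Cubes
fit <- lm(cbind(A, B, C) ~ Cubes, data = res)

coef(fit)     # 2 x 3 matrix: intercept and slope for each of A, B, C
summary(fit)  # one summary per response
```

Whether a missing Name should really count as a sale of 0 is a modelling decision. If those rows should be left out instead, keeping the NA values may be more appropriate, since lm drops incomplete rows by default.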

You can get the data from SQL Server directly in the format you need with a SQL query like this:

    SELECT Cubes,
           SUM(CASE WHEN Name = 'A' THEN Sales ELSE 0 END) AS A,
           SUM(CASE WHEN Name = 'B' THEN Sales ELSE 0 END) AS B,
           SUM(CASE WHEN Name = 'C' THEN Sales ELSE 0 END) AS C
    FROM mytable
    GROUP BY Cubes
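
From R, this query could be passed to sqlQuery in place of the original select. This is a sketch only: pivot_sql and fetch_wide are hypothetical names introduced for illustration, and the DSN "sqlserver" and table mytable are taken from the question.

```r
# The pivot query above, stored as a string (pivot_sql is a hypothetical name)
pivot_sql <- "
  SELECT Cubes,
         SUM(CASE WHEN Name = 'A' THEN Sales ELSE 0 END) AS A,
         SUM(CASE WHEN Name = 'B' THEN Sales ELSE 0 END) AS B,
         SUM(CASE WHEN Name = 'C' THEN Sales ELSE 0 END) AS C
  FROM mytable
  GROUP BY Cubes"

# Hypothetical helper: runs the pivot query on an open RODBC connection
# and returns the wide data frame (columns Cubes, A, B, C)
fetch_wide <- function(conn) {
  RODBC::sqlQuery(conn, pivot_sql)
}

# Usage, assuming the DSN from the question is available:
# library(RODBC)
# myconn <- odbcConnect("sqlserver")
# wide <- fetch_wide(myconn)
# odbcClose(myconn)
```

Doing the pivot in SQL means no NA replacement is needed in R, at the cost of hard-coding the Name values in the query.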
