
How can you add variables to a dataframe in a for-loop?

I would like to create a dataframe where each column is the revenue for a region between 2009 and 2011.

regions = c('A','APAC','CEE','LATAM','ME', 'NA', 'WE')

# Loop through all regions, and add them as a column in my dataframe.
for (region in regions) {

  # create the query string
  query_string  = sprintf("SELECT date, revenue
                  FROM country_revenue
                  WHERE region = '%s'
                  AND date>='2009-01-01'
                  AND date<='2011-12-31'
                  ORDER BY date ASC
                  LIMIT 2000", region)

  # Query the database, and assign the result to a variable.
  assign(sprintf('rev.%s',region), mysql_query(query_string))

  # I only want the 2nd column returned from my query above. 
  # THIS IS THE PART THAT FAILS. Error in sprintf("rev.%s", region)[, 2] : incorrect number of dimensions
  sprintf('rev.%s',region) = sprintf('rev.%s',region)[,2]

  # Add this variable to my data frame.
  revenue = cbind(revenue, sprintf('rev.%s',region))
}
  • I presume what's not working is the fact that you're passing a character string to cbind rather than the name of an object. (Although, even if it worked as is, the use of assign and growing a data frame by appending columns one at a time are generally unwise.) Commented Aug 4, 2012 at 21:17
  • Thanks Joran - I'm not sure of the best way to create a dataframe using a for-loop. If you have a suggestion please let me know :) Commented Aug 4, 2012 at 21:20
  • The best way would be to not use a loop! It seems to me you should be able to run the query for all regions at once and filter in R much more easily. Then use something like dcast to go from "long" to "wide" format. Is there a good reason for doing each query separately? Commented Aug 4, 2012 at 21:23
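
As a rough illustration of the first comment's point: sprintf() only returns a character string, so indexing it with [, 2] or passing it to cbind() works on the string, not on the object with that name. If the loop were kept, one way around assign() is to collect the results in a named list and build the data frame in one step. In this sketch fake_query() is a hypothetical stand-in for the asker's mysql_query(), and it assumes every region returns the same dates in the same order:

```r
# Hypothetical stand-in for the asker's mysql_query(); returns made-up rows.
fake_query <- function(region) {
  data.frame(date = as.Date("2009-01-01") + 0:2,
             revenue = c(1, 2, 3))
}

regions <- c("A", "APAC", "CEE")

# sprintf() builds a string; it does not look up the object of that name.
name <- sprintf("rev.%s", "A")
is.character(name)

# Keep only the 2nd column (revenue) of each result, one list element per region.
rev_list <- lapply(regions, function(r) fake_query(r)[, 2])
names(rev_list) <- regions

# Bind the equal-length vectors into a data frame in a single step.
revenue <- as.data.frame(rev_list)
```

This avoids both assign() and growing the data frame column by column, but it still issues one query per region.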

1 Answer


Running one query per region would be pretty inefficient. Why not return region as part of the SQL call, so that you end up with something like this:

foo <- data.frame(date = rep(Sys.Date() + 0:4, 7),
                  revenue = runif(7*5),
                  region = rep(c('A','APAC','CEE','LATAM','ME', 'NA', 'WE'), 
                               each = 5))

> head(foo)
        date   revenue region
1 2012-08-04 0.1170867      A
2 2012-08-05 0.6173779      A
3 2012-08-06 0.9860934      A
4 2012-08-07 0.1344043      A
5 2012-08-08 0.5570391      A
6 2012-08-04 0.5844136   APAC

From there, a single dcast() call reshapes the data into the desired wide format, with one column per region.

> require(reshape2)
> dcast(foo, date ~ region, value.var = "revenue")
        date         A      APAC       CEE      LATAM         ME
1 2012-08-04 0.1170867 0.5844136 0.8011066 0.82864796 0.85856770
2 2012-08-05 0.6173779 0.7893151 0.3991653 0.41268349 0.05925445
3 2012-08-06 0.9860934 0.2812308 0.2272009 0.04599903 0.82367709
4 2012-08-07 0.1344043 0.7513777 0.8022602 0.96933913 0.61501816
5 2012-08-08 0.5570391 0.2915478 0.4601065 0.82996462 0.83779233
         NA         WE
1 0.4833374 0.25713295
2 0.9574843 0.22122544
3 0.5575645 0.03492411
4 0.2962364 0.51973593
5 0.9020639 0.95506837
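
For completeness, here is a minimal sketch of how the single query might be built. The table and column names are taken from the question; the IN clause, the extra region column in the SELECT, and dropping the per-region LIMIT are assumptions on my part. Pasting values into SQL like this is reasonable only because the region codes are hard-coded; for user-supplied values a parameterised query would be safer.

```r
regions <- c('A','APAC','CEE','LATAM','ME', 'NA', 'WE')

# Quote each region code and join with commas: 'A','APAC',...
in_list <- paste(sprintf("'%s'", regions), collapse = ",")

# One query for all regions, returning region alongside date and revenue.
query_string <- sprintf(
  "SELECT date, revenue, region
   FROM country_revenue
   WHERE region IN (%s)
     AND date >= '2009-01-01'
     AND date <= '2011-12-31'
   ORDER BY date ASC", in_list)

# The next two lines need the asker's mysql_query() helper, so they are
# left commented out here:
# foo  <- mysql_query(query_string)
# wide <- reshape2::dcast(foo, date ~ region, value.var = "revenue")
```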

2 Comments

Exactly. Based on the OP's struggles with paste in a previous example, my draft answer differed from this only by including an example of how to paste all the regions together so they can be passed to the query in an IN clause.
+1 Have been struggling for a long time before I got to this solution. My question was how to create new columns from a factor variable/column
