
I'm quite new to R and I would like to learn how to write a loop to create and process several columns.

I imported a table into R that contains data with 23 variables. For each of these variables I want to calculate the per capita value, multiply it by 1000, and write the result either into a new table or into the same table as the original data.

To do this for only one column, my code looked like this:

agriculture<-cbind(agriculture,"Total_value_per_capita"=agriculture$Total/agriculture$Total.Population*1000)

Now I'm asking how to do this in a loop for all 23 variables, so that I don't have to write 23 nearly identical lines of code.

I think the solution might look quite similar to the code posted in this thread:

loop to create several matrix in R (maybe using paste)

but I didn't get it working with my code.

Any suggestions would be very helpful.

  • You can put cbind inside a loop. If your table's 23 variables are in 23 columns, just loop over the column number rather than the label, e.g. cbind(agriculture, agriculture[, j]/agriculture$Total.Population*1000), where j is the loop index. Commented Oct 30, 2012 at 13:17
  • Thanks for your reply. I tried to do it like this: agriculture<-cbind(agriculture, "Total_value_per_capita"=agriculture[,24]/agriculture$Total.Population*100) but it didn't work out. Only one new column was built, with some strange numbers (I think maybe the product of all values over the previous columns...) Commented Oct 30, 2012 at 14:11
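Following the suggestion in the first comment, the computation has to run once per column, and each result needs its own column name; the attempt above runs it only once, for column 24. A minimal sketch, assuming that every column except `Total.Population` is a value column. The small `agriculture` table and its columns `Cereals` and `Livestock` are hypothetical stand-ins for the real 23 variables:

```r
# Hypothetical stand-in for the imported table: value columns plus a population column
agriculture <- data.frame(
  Cereals          = c(500, 1200, 800),
  Livestock        = c(250, 300, 1000),
  Total.Population = c(1000, 4000, 2000)
)

# Assumption: every column except the population column is a value column.
# The list is fixed before the loop, so newly added columns are not processed again.
value_cols <- setdiff(names(agriculture), "Total.Population")

for (col in value_cols) {
  # Build a new column name such as "Cereals_per_capita"
  new_name <- paste0(col, "_per_capita")
  # Per capita value times 1000, written back into the same table
  agriculture[[new_name]] <- agriculture[[col]] / agriculture$Total.Population * 1000
}

print(agriculture)
```

Looping over names rather than positions avoids depending on the column order; if the positions are known, `for (j in 1:23)` with `agriculture[, j]` follows the comment's index-based version instead.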

2 Answers


I would always favor an appropriate *ply function over loops in R. In this case sapply could be your friend:

df <- data.frame( a=sample(10), b=sample(10), c=sample(10) )
# Divide every column except "c" (the population column) by c and scale by 1000
df.per.capita <- as.data.frame(
  sapply(
    df[ colnames(df) != "c" ], function(x){ x/df$c *1000 }
  )
)

For more complicated cases, you should definitely have a look at the plyr package.


3 Comments

All right, this one worked out for me. Thanks a LOT!!! The only thing I would maybe like to improve is getting the columns automatically, since with this method I still had to type every column reference into the code, i.e. data.frame(a=..., b=..., c=..., d=...). So the code still got a bit crowded, since I have 23 columns...
I'm sorry, but I am not sure I got your question right. Which column names do you mean? The names of the input data.frame (df) or of the resulting data.frame (df.per.capita)? Would something like names(df.per.capita) <- paste(names(df.per.capita), "_per_capita", sep="") help you?
I meant the headers of the columns to be calculated. But I figured it out today by simply passing the whole table at once as the data frame: df<-data.frame(agriculture). Thank you for your help!!
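Putting the pieces from this thread together (passing the whole table instead of listing the columns, and renaming with `paste`), here is a sketch that picks every column except the population column automatically, computes the per capita values with `sapply`, renames them, and appends them to the original table. The table and its column names are again hypothetical:

```r
# Hypothetical table standing in for the imported data
agriculture <- data.frame(
  Cereals          = c(500, 1200, 800),
  Livestock        = c(250, 300, 1000),
  Total.Population = c(1000, 4000, 2000)
)

# Select all columns except the denominator without typing their names
value_cols <- agriculture[names(agriculture) != "Total.Population"]

# sapply applies the function to each column and simplifies the results to a matrix
per_capita <- as.data.frame(
  sapply(value_cols, function(x) x / agriculture$Total.Population * 1000)
)

# Suffix the new names so they do not clash with the original columns
names(per_capita) <- paste(names(per_capita), "_per_capita", sep = "")

# Either keep per_capita as a separate table or append it to the original one
agriculture <- cbind(agriculture, per_capita)
print(agriculture)
```

Keeping `per_capita` as its own data frame covers the "new table" option from the question; the final `cbind` covers the "same table" option.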

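This can also be done with the sweep function. Using Beasterfield's data generation but setting the seed, you can reproduce the results below (on R versions before 3.6.0; since 3.6.0, sample() uses a different default algorithm, so the same seed gives different numbers):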

set.seed(001)
df <- data.frame( a=sample(10), b=sample(10), c=sample(10) )
per.capita <- sweep(df[,colnames(df) != "c"], 1, STATS=df$c, FUN='/')*1000
per.capita
           a          b
1   300.0000   300.0000
2  2000.0000  1000.0000
3   833.3333  1000.0000
4  7000.0000 10000.0000
5   222.2222   555.5556
6  1000.0000   875.0000
7  1285.7143  1142.8571
8  1200.0000   800.0000
9  3333.3333   333.3333
10  250.0000  2250.0000

Comparing with Beasterfield's results:

all.equal(df.per.capita, per.capita)
[1] TRUE
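For this particular operation, neither a loop nor `sweep` is strictly necessary: when a data frame is divided by a vector whose length equals the number of rows, R recycles that vector down each column, so the whole selection can be divided in one step. A small sketch reusing the same data generation; it does not assume any particular values, only that the two approaches agree:

```r
set.seed(1)
df <- data.frame(a = sample(10), b = sample(10), c = sample(10))

# sweep() version from the answer above
per.capita <- sweep(df[, colnames(df) != "c"], 1, STATS = df$c, FUN = "/") * 1000

# Plain arithmetic: df$c has one value per row and is recycled down each column
per.capita.arith <- df[colnames(df) != "c"] / df$c * 1000

print(all.equal(per.capita, per.capita.arith))
```

`sweep` remains the more general tool, for example when the statistic should be applied across columns (`MARGIN = 2`) instead of rows.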

