Edit: I've solved my initial problem using a suggestion from G. Grothendieck (thanks again, it was exactly the clean approach I was after). The initial post is below. In reality, though, my file is a little more subtle than that template.
It actually looks like this:
A1
100
200
txt
A2
STRING
300
400
txt txt
txt
txt txt txt
A3
STRING
STRING
150
250
A2
.
.
.
STRING is a well-known value that can appear right after an A-something line. Sometimes it does not occur, sometimes it occurs once, and sometimes several times. I didn't notice the multiple occurrences at first, so, assuming it occurred at most once, I wrote a loop to handle it:
# Remember which rows have a value in column 2; the others are dropped after the loop,
# so that row indices stay valid while iterating.
keep <- !is.na(raw_data[, 2])
for (i in seq_len(nrow(raw_data))) {
  if (keep[i] && raw_data[i, 2] == "STRING") {
    # Shift the values one column to the left, overwriting STRING in column 2.
    raw_data[i, 2:9] <- raw_data[i, 3:10]
    # Column 11 is reserved for the flag, so column 10 takes its value from column 12.
    raw_data[i, 10] <- raw_data[i, 12]
    raw_data[i, 11] <- "Yes"
    # Column 12 takes the value of column 13 (NA stays NA).
    raw_data[i, 12] <- raw_data[i, 13]
  }
}
raw_data <- raw_data[keep, ]
Basically, I'm assigning "Yes" in column 11 to flag that the string was found. I should clearly record the number of occurrences there instead of Yes/No (0 by default, then 1, 2, ...). All the other column values are shifted one position to the left so that they end up back in the columns where they are expected.
How can I adapt this, if possible, to the fact that STRING may in reality occur several times? Or do I have to change my approach entirely?
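To make the question concrete, here is a minimal sketch of a row-wise shift that copes with any number of occurrences. It assumes the marker only ever appears right after the key, so it simply removes every cell equal to the marker, pads the row with NA on the right, and records the count. The toy `raw_data` and the helper name `shift_out_marker` are made up for illustration and are not from my real file.

```r
# Hypothetical toy data standing in for raw_data: one row per record, padded with NA.
raw_data <- data.frame(
  V1 = c("A1", "A2", "A3"),
  V2 = c("100", "STRING", "STRING"),
  V3 = c("200", "300", "STRING"),
  V4 = c("txt", "400", "150"),
  V5 = c(NA, "txt txt", "250"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every marker cell, shift the rest left, count the markers.
shift_out_marker <- function(df, marker = "STRING") {
  m <- as.matrix(df)
  is_marker <- !is.na(m) & m == marker
  n_marker <- unname(rowSums(is_marker))
  # For each row, keep the non-marker values in order and pad on the right with NA.
  shifted <- t(apply(m, 1, function(r) {
    kept <- r[is.na(r) | r != marker]
    c(kept, rep(NA_character_, length(r) - length(kept)))
  }))
  out <- as.data.frame(shifted, stringsAsFactors = FALSE)
  names(out) <- names(df)
  out$n_marker <- n_marker  # number of occurrences instead of a Yes/No flag
  out
}

cleaned <- shift_out_marker(raw_data)
```

Here `cleaned$n_marker` holds the occurrence count per row, which replaces the Yes/No column. Note that this would also remove a marker appearing later in a row, which may or may not be what is wanted.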
Now, for those of you who like a challenge: I'm starting to wonder whether my processing is really efficient for this file. What about processing each line of the original file directly, since we know that anything like A1, A2, etc. should go in column 1, and so on?
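The line-by-line idea could look roughly like the sketch below. It assumes the lines are already in a character vector (for example from `readLines()`), that any line before the first key can be ignored, and that the key set and marker are known up front. The names `lines`, `keys`, `marker` and `records` are illustrative only.

```r
# Hypothetical sample of the raw file, one element per line.
lines <- c("A1", "100", "200", "txt",
           "A2", "STRING", "300", "400", "txt txt", "txt", "txt txt txt",
           "A3", "STRING", "STRING", "150", "250")
keys   <- c("A1", "A2", "A3")  # known set of record headers
marker <- "STRING"             # known marker that may repeat after a header

records <- list()
for (ln in lines) {
  if (ln %in% keys) {
    # A key always opens a new record.
    records[[length(records) + 1]] <- list(key = ln, n_marker = 0L,
                                           values = character(0))
  } else if (length(records) > 0) {
    k <- length(records)
    if (ln == marker) {
      # Count the marker instead of storing it as a value.
      records[[k]]$n_marker <- records[[k]]$n_marker + 1L
    } else {
      records[[k]]$values <- c(records[[k]]$values, ln)
    }
  }
}

# One summary row per record: key, marker count, number of remaining values.
summary_df <- data.frame(
  key      = vapply(records, function(r) r$key, character(1)),
  n_marker = vapply(records, function(r) r$n_marker, integer(1)),
  n_values = vapply(records, function(r) length(r$values), integer(1)),
  stringsAsFactors = FALSE
)
```

Growing a list inside a loop is easy to read but can get slow on large files, so whether this is actually more efficient than reshaping afterwards would need to be measured.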
Anyhow, thanks to those who will look into this and give it a try :)
Initial post: I have a dataset in R that consists of a single column containing values that I would ideally like spread across multiple columns. The structure is as follows:
A1
100
200
txt
A2
300
400
txt txt
txt
txt txt txt
A3
150
250
A2
.
.
.
Ideally this is the result I'm chasing :
A1 | 100 | 200 | txt
A2 | 300 | 400 | txt txt | txt | txt txt txt
A3 | 150 | 250
A2 | . | . | .
The set {A1;A2;A3} is known. The main difficulty I'm hitting right now is that the number of columns is unknown.
I started by transposing my data, and was thinking of looping over the single row: each time I see one of the values in my set {A1;A2;A3}, I start a new row with this value in column 1, so that column 1 only contains {A1;A2;A3} values.
I'm convinced that there is a cleaner way of doing this.
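For reference, a minimal sketch of one vectorised way to get this shape without an explicit loop. It is not necessarily the suggestion mentioned in the edit above. It assumes the first element of the vector is one of the known keys, and the names `x`, `keys`, `rows` and `wide` are illustrative only.

```r
# Hypothetical sample of the single column, as a character vector.
x <- c("A1", "100", "200", "txt",
       "A2", "300", "400", "txt txt", "txt", "txt txt txt",
       "A3", "150", "250")
keys <- c("A1", "A2", "A3")

# Every key starts a new group, so the running count of keys labels the rows.
grp  <- cumsum(x %in% keys)
rows <- unname(split(x, grp))

# The number of columns is unknown in advance: pad every row to the longest one.
width <- max(lengths(rows))
wide  <- as.data.frame(
  do.call(rbind, lapply(rows, `length<-`, width)),  # `length<-` pads with NA
  stringsAsFactors = FALSE
)
```

The result has one row per key occurrence, with the key in the first column and NA in the unused trailing columns.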
Thanks ahead of time for your assistance with this!