3

I want to analyze many years of of Quicken home finance records. I exported a file to qif and used bank2csv program to render a csv. Within Quicken one can use a category (eg automobile, tax), subcategories (eg automobile:service, automobile:fuel) and tags (eg self, spouse, son). bank2csv renders the categories:subcategories/tag as a concatenated string. I want to instead put the category in a category column, subcategory in a subcategory column and put whatever tags in the tag column. I saw a similar question but alas that worked bystrsplit then unlist and then indexing each element so that it could be written to the correct place by assignment. That will not work here since sometimes there is no tag and sometimes there is no subcategory. It is quite easy to split the string into a list and save that list in a column but how on earth does one assign the first element of the list to one column and the second element (if it exists) to a second column. Surely there is an elegant easy way.

simplified sample

library(data.table)
library(stringi)
dt <- data.table(category.tag=c("toys/David", "toys/David", "toys/James", "toys", "toys", "toys/James"), transaction=1:6)

How do I create a third and fourth column: category, tag. Some of tag would be NA

I can do the following but it does not get me very far. I need a way to specify the first or the second element of the resultant list (as opposed to the whole list)

dt[, category:= strsplit(x = category.tag, split = "/") ]

4 Answers 4

6

Just pushed two functions transpose() and tstrsplit() in data.table v1.9.5.

With this we can do:

require(data.table)
dt[, c("category", "tag") := tstrsplit(category.tag, "/", fixed=TRUE)]
#    category.tag transaction category   tag
# 1:   toys/David           1     toys David
# 2:   toys/David           2     toys David
# 3:   toys/James           3     toys James
# 4:         toys           4     toys    NA
# 5:         toys           5     toys    NA
# 6:   toys/James           6     toys James

tstrsplit is a wrapper for transpose(strsplit(as.character(x), ...)). And you can also pass fill=. to fill missing values with any other value than NA.

transpose() can also be used on lists, data frames and data tables.

Sign up to request clarification or add additional context in comments.

Comments

4

You could use cSplit

library(splitstackshape)
dt[, c("category", "tag") := cSplit(dt[,.(category.tag)], "category.tag", "/")]
dt
#    category.tag transaction category   tag
# 1:   toys/David           1     toys David
# 2:   toys/David           2     toys David
# 3:   toys/James           3     toys James
# 4:         toys           4     toys    NA
# 5:         toys           5     toys    NA
# 6:   toys/James           6     toys James

8 Comments

Is cSplit an abbreviation of concat.split?
I believe it's similar, with some added functionality
I love the one line easy elegance.
@Farrel, it's an alias, essentially. Several people were complaining that concat.split was too awkward to type, so when I was working on improving the function's efficiency, I changed the name to cSplit.
@Farrel, The package version for "splitstackshape" at Rdocumentation is 1.2, but the present version is 1.4.2. concat.split is essentially Gabor's read.table approach, while cSplit is a much faster implementation using strsplit. By version 1.6, I plan to have moved over to stri_split, which would be even faster. Here's a (functional) toy implementation of where cSplit should be soon.
|
2

1) Try read.table

read <- function(x) read.table(text = x, sep  = "/", fill = TRUE, na.strings = "")
dt[, c("category", "tag") := read(category.tag)]

No extra packages are needed.

2) An alternative is to use separate in the tidyr package:

library(tidyr)
separate(dt, category.tag, c("category", "tag"), extra = "drop")

The above is with tidyr version 0.1.0.9000 from github. To install it ensure that the devtools R package is installed and issue the command: devtools::install_github("hadley/tidyr") .

Update: Incorporated Richard Scrivens' comment and minor improvements. Added tidyr solution.

4 Comments

Absolutely wonderful. Now how should I decide who to award the "accepted" to. You or @RichardScriven. I like being introduced to new packages that may make my life easier with other problems but then again I assume there is an overhead when I use new packages instead of the functions that are already loaded.
getting Error in strsplit(value, sep, ...) : unused argument (extra = "drop")
I am using version 0.1
1

Since you already have "stringi" loaded, you can also look at stri_split_fixed and the simplify argument:

setnames(cbind(dt, stri_split_fixed(dt$category.tag, "/", simplify = TRUE)), 
         c("V1", "V2"), c("category", "tag"))[]
#    category.tag transaction category   tag
# 1:   toys/David           1     toys David
# 2:   toys/David           2     toys David
# 3:   toys/James           3     toys James
# 4:         toys           4     toys    NA
# 5:         toys           5     toys    NA
# 6:   toys/James           6     toys James

Though I must admit I'm partial to cSplit :-)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.