
R cannot cope with embedded nul bytes (\0) in character strings. Does anyone know how to handle this? More concretely, I want to store complex R objects in a database using an ODBC or JDBC connection. Since complex R objects cannot easily be mapped to data frames, I need a different way to store them. For example:

library(kernlab)
data(iris)
model <- ksvm(Species ~ ., data=iris, type="C-bsvc", kernel="rbfdot", kpar="automatic", C=10) 

Because model cannot be stored directly in a database, I use the serialize() function to get a binary representation of the object (in order to store it in a BLOB column):

 serialModel <- serialize(model, NULL)

Now I would like to store this via ODBC/JDBC. To do so, I need a string representation of the object so that I can send it in a query to the database, e.g. INSERT INTO. Since the result is a raw vector, I need to convert it:

 stringModel <- rawToChar(serialModel)

And there is the problem:

Error in rawToChar(serialModel) : 
  embedded nul in string: 'X\n\0\0\0\002\0\002\v\0......

R cannot deal with \0 in strings. Does anyone have an idea how to bypass this restriction? Or is there perhaps a completely different approach to achieve this goal?
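
The error does not depend on kernlab or on the size of the object. A minimal sketch that reproduces it with three hand-written bytes (the names bytes and msg are illustrative only):

```r
# Three bytes: "A", a nul byte, "B".
bytes <- as.raw(c(0x41, 0x00, 0x42))

# rawToChar() refuses to build a string that contains a nul byte,
# so the call is wrapped in tryCatch() to capture the error message.
msg <- tryCatch(rawToChar(bytes), error = function(e) conditionMessage(e))
print(msg)

# Without the nul byte the conversion works as expected.
print(rawToChar(bytes[bytes != as.raw(0)]))
```

Dropping the nul bytes, as in the last line, is only there to show where the error comes from. It destroys the serialized data, so it is not a way to store the model.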

Thanks in advance

  • The obvious way to support this is for database connectors to store raw vectors as BLOBs. Does RODBC not have support for this? I would be surprised if it didn't. I know RMySQL doesn't. It's on my todo list though ;) Commented May 10, 2011 at 15:03
  • This is what I am looking for. However, I'm not aware of any functionality in RODBC or RJDBC providing this feature :/ Commented May 10, 2011 at 16:41

2 Answers


You need

stringModel <- as.character(serialModel)

to get a character representation of the raw bytes: as.character() returns one two-digit hexadecimal code per byte. rawToChar() instead tries to interpret the bytes as characters, which is not what you want in this case.

The resulting stringModel can later be converted back to the original model with:

newSerialModel <- as.raw(as.hexmode(stringModel))
newModel <- unserialize(newSerialModel)
all.equal(model,newModel)
[1] TRUE

Regarding writing binary types to databases through RODBC: as of today, the RODBC vignette reads (p. 11):

Binary types can currently only be read as such, and they are returned as column of class "ODBC binary" which is a list of raw vectors.
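
Because as.character() on a raw vector returns one element per byte, a query built with paste() needs the bytes collapsed into a single string first. A minimal base-R sketch of that round trip, using a small list as a stand-in for the model (obj, hex_string and restored are illustrative names, not part of the original answer):

```r
# Small stand-in object; the ksvm model from the question would be
# passed to serialize() in the same way.
obj <- list(a = 1:3, b = "text")

raw_bytes <- serialize(obj, NULL)

# Encode: two hex digits per byte, joined into one string without separators.
hex_string <- paste(as.character(raw_bytes), collapse = "")

# Decode: cut the string into two-character pieces and parse each as base 16.
starts <- seq(1, nchar(hex_string), by = 2)
hex_pairs <- substring(hex_string, starts, starts + 1)
restored_bytes <- as.raw(strtoi(hex_pairs, base = 16L))

restored <- unserialize(restored_bytes)
print(identical(obj, restored))
```

The hex string is twice as long as the binary data and is still a text value, so this is a workaround for a CLOB or text column rather than true BLOB storage.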


4 Comments

I already have a similar solution that converts the object into a string and stores it in a CLOB. My question was whether it is possible to store the binary representation directly (in a BLOB). The conversions take a lot of time, so I'm looking for a way to save objects efficiently in the database. This is what you proposed in one line: unserialize(as.raw(as.hexmode(strsplit(paste(serialize(model, NULL), collapse=" "), " ")[[1]]))) Most of the time is spent in paste() and strsplit(). I would rather have a solution that makes direct use of the binary representation.
@Thomas: what I propose is to store the stringModel. I really don't see where the paste and strsplit are coming from. That's not needed at all. You asked for a character form of the binary representation, and that is stringModel.
With the ODBC command sqlQuery(conn, paste("INSERT INTO bla VALUES ('", stringModel, "')", sep="")) only the first value of the string vector is stored, so I do need some logic to create a single string out of it, and of course a way to unserialize that string...
@Thomas : true, didn't think of that. So now I'm secretly hoping you'll get so frustrated that you implement the BLOB in RODBC yourself... ;)

A completely different approach would be to simply store the output of capture.output(dput(model)) along with a descriptive name and then reconstitute it with <- or assign(). See comments below regarding the need for capture.output().

> dput(Mat1)
structure(list(Weight = c(7.6, 8.4, 8.6, 8.6, 1.4), Date = c("04/28/11", 
"04/29/11", "04/29/11", "04/29/11", "05/01/11"), Time = c("09:30 ", 
"03:11", "05:32", "09:53", "19:52")), .Names = c("Weight", "Date", 
"Time"), row.names = c(NA, -5L), class = "data.frame")
> y <- capture.output(dput(Mat1))
> y <- paste(y, collapse="", sep="")  # Needed because capture output breaks into multiple lines
> dget(textConnection(y))
  Weight     Date   Time
1    7.6 04/28/11 09:30 
2    8.4 04/29/11  03:11
3    8.6 04/29/11  05:32
4    8.6 04/29/11  09:53
5    1.4 05/01/11  19:52
> new.Mat <- dget(textConnection(y))
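
The same text round trip can be sketched without capture.output(), since deparse() returns the text that dput() would print as a character vector. This is a minimal sketch for a plain data frame (df, txt and rebuilt are illustrative names); whether a complex S4 object such as the ksvm model survives it is an assumption that has not been checked here:

```r
# Small data frame standing in for the object to be stored.
df <- data.frame(Weight = c(7.6, 8.4),
                 Date = c("04/28/11", "04/29/11"),
                 stringsAsFactors = FALSE)

# deparse() gives the text form of the object as a character vector;
# collapsing it yields one string that can be placed in a text column.
txt <- paste(deparse(df), collapse = " ")

# Rebuild the object by parsing and evaluating the stored text.
rebuilt <- eval(parse(text = txt))
print(all.equal(df, rebuilt))
```

eval(parse()) runs whatever code the string contains, so it should only be used on text that the application wrote to the database itself.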

11 Comments

Nice solution as well. But you can't flex your R-muscles by digging up obscure hexmode functions.
I admit I was impressed with your command of the internals, Joris.
dput() results in another complex object that cannot be sent via INSERT INTO query to the database. And I don't want to store objects on the hard disk.
I thought it would result in a text object, but apparently it has a side effect of printing and invisibly returns its argument. Will amend my answer.
Could you please give me a minimal example of how the reconstitution should work? I see that dput() returns a description (as a string) of how the object can be created, but how do I retrieve my object from this string? Using <- or assign just gives me the string representation and not my object...
