
With SparkR 1.4.1, I am working with a data frame of the following structure:

printSchema(dta)
root
 |-- date: timestamp (nullable = true)
 |-- valA: float (nullable = true)
 |-- valB: float (nullable = true)
 |-- ...

I would like to convert all of the existing columns to strings, without explicitly referring to each column by name.

Desired approach

The desired approach would loop over all columns:

# Quickly creating a new data frame
dtaTmp <- select(dta, "date")

# Looping through each column of the old data frame and adding a string
# equivalent to the newly created data frame
for (i in seq_along(columns(dtaTmp))) {
    print(i)
    x <- cast(dtaTmp[[columns(dtaTmp)[i]]], "string")
    dtaTmp <- withColumn(dtaTmp, columns(dtaTmp)[i], x)
}

This fails with the error: returnStatus == 0 is not TRUE. In effect, I'm looking for a solution that would let me run the equivalent of sapply(mtcars, as.character) on a SparkR data frame.
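
For reference, a minimal local-R sketch of the intended transformation on the built-in mtcars data (localStr is just an illustrative name); the goal is the distributed equivalent of this:

# Coerce every column of a local data frame to character
localStr <- as.data.frame(sapply(mtcars, as.character),
                          stringsAsFactors = FALSE)
str(localStr)  # every column now has class "character"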

Desired results

The new data frame should be of the following structure:

printSchema(desiredDta)
root
 |-- date: string (nullable = true)
 |-- valA: string (nullable = true)
 |-- valB: string (nullable = true)
 |-- ...

1 Answer

You're hitting a bug in the 1.4 branch where withColumn retains duplicated column names. The simplest solution is to use a single select with a list of columns:

select(df, lapply(columns(df), function(x) alias(cast(df[[x]], "string"), x)))
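
Applied to the data frame from the question, the result can be checked with printSchema (dtaStr is just an illustrative name for the converted frame); it should match the structure requested above:

dtaStr <- select(dta, lapply(columns(dta),
                             function(x) alias(cast(dta[[x]], "string"), x)))
printSchema(dtaStr)
# root
#  |-- date: string (nullable = true)
#  |-- valA: string (nullable = true)
#  |-- valB: string (nullable = true)
#  ...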
