I work with a database (of which I am not the DBA) that has character columns of length greater than the actual data.

Is it possible to automatically strip trailing whitespace when fetching data with DBI::dbGetQuery()? (i.e. something similar to utils::read.table(*, strip.white = TRUE))

# connect
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# generate fake data
mytable <- data.frame(x = 1, y = LETTERS[1:3], z = paste(LETTERS[1:3], "   "))
dbWriteTable(con, "mytable", mytable)

# fetch data
(a <- dbGetQuery(con, "select * from mytable"))
#   x y     z
# 1 1 A A    
# 2 1 B B    
# 3 1 C C    

# trailing spaces are kept
sapply(a, nchar)
#      x y z
# [1,] 1 1 5
# [2,] 1 1 5
# [3,] 1 1 5

I hope I can avoid something like:

idx <- sapply(a, is.character)
a[idx] <- lapply(a[idx], trimws, which = "right", whitespace = "[ ]")
sapply(a, nchar)
#      x y z
# [1,] 1 1 1
# [2,] 1 1 1
# [3,] 1 1 1

If that is not possible, is the approach above a reasonable one?

2
  • 1
    You could "trim" the data inside SQLite: sqlitetutorial.net/sqlite-functions/sqlite-trim Commented Jan 30, 2023 at 17:09
  • 1
    You can define your own function which calls dbGetQuery and then trims each character column so that from then on it is no harder than calling dbGetQuery. Commented Jan 30, 2023 at 17:20
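
The wrapper suggested in the last comment could be sketched roughly as follows. The name dbGetQueryTrimmed is hypothetical (it is not part of DBI), and the sketch assumes that only trailing spaces need to be removed and that trimws() supports the whitespace argument (R >= 3.6.0).

```r
library(DBI)

# Hypothetical helper, not part of DBI: run the query as usual, then
# strip trailing spaces from every character column of the result.
dbGetQueryTrimmed <- function(conn, statement, ...) {
  res <- dbGetQuery(conn, statement, ...)
  is_chr <- vapply(res, is.character, logical(1))
  res[is_chr] <- lapply(res[is_chr], trimws, which = "right", whitespace = "[ ]")
  res
}

# Same fake data as in the question
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mytable",
             data.frame(x = 1, y = LETTERS[1:3], z = paste(LETTERS[1:3], "   ")))

a <- dbGetQueryTrimmed(con, "select * from mytable")
nchar(a$z)
# [1] 1 1 1

dbDisconnect(con)
```

Once defined, the helper is called exactly like dbGetQuery(), so existing queries (including select *) do not need to change.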

1 Answer

As long as you use select *, SQL itself cannot do this for you. If you select the columns by name (which is a "best practice" and, in many areas, the industry standard), you can use TRIM:

sqldf::sqldf("select x, y, trim(z) as z from mytable") |>
  str()
# 'data.frame': 3 obs. of  3 variables:
#  $ x: num  1 1 1
#  $ y: chr  "A" "B" "C"
#  $ z: chr  "A" "B" "C"

There are also rtrim and ltrim, which trim only trailing or only leading blank space, respectively.
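
For trailing blanks specifically, the same idea can be run directly through DBI with rtrim. This is a minimal sketch that recreates the in-memory SQLite table from the question; on another database backend the function name and its exact behaviour may differ.

```r
library(DBI)

# Recreate the question's example table in an in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mytable",
             data.frame(x = 1, y = LETTERS[1:3], z = paste(LETTERS[1:3], "   ")))

# rtrim() removes trailing spaces inside the database, before the fetch
b <- dbGetQuery(con, "select x, y, rtrim(z) as z from mytable")
nchar(b$z)
# [1] 1 1 1

dbDisconnect(con)
```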


4 Comments

Thank you for your answer and your proactivity on the r DBI thread in general!
For the sake of clarity, can you confirm that there is nothing included in DBI::dbGetQuery (or any other DBI function) to do this automatically at fetch time? (i.e. similar to utils::read.table(*, strip.white = TRUE) for csv data)
I believe the post-processing of retrieved data is limited (at most) to type/class work, nothing that changes the content of values (such as this).
There is a mention (in github.com/r-dbi/DBI/blob/…, last updated/changed in 2015) of automatically stripping excessive blanks, but I know of no implementation of that, and searching the repo for trim or strip reveals nothing.
