I am reading a large CSV (over 1 GB, which is big for me!). It contains a timestamp field. I read it (100 rows to start with) with fread from the excellent data.table package.

ddfr <- fread(input="~/file1.csv",nrows=100, header=T)

Problem 1 (RESOLVED): the timestamp fields (called "ts" and "update"), e.g. "02/12/2014 04:40:00 AM", are read as strings. I convert them back to timestamps with mdy_hms from the lubridate package. Splendid.

ddfr$ts <- data.frame( mdy_hms(ddfr$ts))

Problem 2 (NOT RESOLVED): The timestamp is created with a time zone, as with POSIXlt.

How do I create a timestamp with NO TIME ZONE in R? Is it possible?

Now I use another great (new) package, PivotalR, to write the data frame to PostgreSQL 9.3 using as.db.data.frame. It works like a charm.

x <- as.db.data.frame(ddfr, table.name= "tbl1", conn.id = 1) 

Problem 3 (NOT RESOLVED): Because the timestamp fields in the original data frame had time zones, the table is created with those fields typed as "timestamp with time zone". Ultimately the data needs to be stored in a table whose fields are "timestamp without time zone".

But in my table in Postgres the data is stored as "2014-02-12 04:40:00.0", where the .0 at the end is the UTC offset. I think I need to have "2014-02-12 04:40:00".

I tried

ALTER TABLE tbl ALTER COLUMN ts type timestamp without time zone;

Then I copied across. While Postgres accepts the ALTER COLUMN command, when I try to copy (using INSERT INTO tbls SELECT ...) I get an error:

   "column "ts" is of type timestamp without time zone but expression is of type text
  Hint: You will need to rewrite or cast the expression."

Clearly the .0 at the end is not liked (but then why does Postgres accept the ALTER COLUMN? Who knows!).

I tried to do what the error suggested using CAST in the INSERT INTO query:

INSERT INTO tbl2 SELECT CAST(ts as timestamp without time zone) FROM tbl1

But I get the same error (including the suggestion to use CAST, aargh!).

The table directly created by PivotalR (based on the dataframe) has this CREATE script:

CREATE TABLE tbl2
(
  businessid integer,
  caseno text,
  ts timestamp with time zone
 )
WITH (
  OIDS=FALSE
);
ALTER TABLE tbl1
  OWNER TO mydb;

The table I'm inserting into has this CREATE script:

CREATE TABLE tbl1
(
  id integer NOT NULL DEFAULT nextval('bus_seq'::regclass),
  businessid character varying,
  caseno character varying,
  ts timestamp without time zone,
  updated timestamp without time zone,
  CONSTRAINT busid_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE tbl1
  OWNER TO postgres;

My apologies for the convoluted explanation, but potentially a solution could be found at any step in the chain, so I preferred to put all my steps in one question. I am sure there has to be a simpler method...

  • Please show the whole SQL statements you ran, including their parameters, not just chunks of them. Edit the question to add the info and comment here when done. Commented Feb 21, 2014 at 11:24
  • Done! It definitely reads better. Thanks. Commented Feb 21, 2014 at 12:15
  • The error you report makes extremely little sense with the SQL you show. Are you sure it's the same error when you use CAST, not a different one? If so, show your table definitions - \d tbl1 and \d tbl2 in psql. Commented Feb 21, 2014 at 14:12
  • I added the table definitions. Commented Feb 21, 2014 at 14:27
  • The table definitions do not match the error message. I get ERROR: column "businessid" is of type integer but expression is of type timestamp without time zone, which makes a lot more sense. What's your PostgreSQL version? Commented Feb 21, 2014 at 14:29

1 Answer

I think you're confused about copying data between tables.

INSERT INTO ... SELECT without a column list expects the source and destination columns to line up positionally. It doesn't match up columns by name; it just assigns columns from the SELECT to the target table's columns from left to right until it runs out, at which point any remaining columns get their default values (NULL if no default is defined). So your query:

INSERT INTO tbl2 SELECT ts FROM tbl1;

isn't doing this:

INSERT INTO tbl2(ts)  SELECT ts FROM tbl1;

it's actually picking the first column of tbl2, which is businessid, so it's really attempting to do:

INSERT INTO tbl2(businessid)  SELECT ts FROM tbl1;

which is clearly nonsense, and no casting will fix that.
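
As a minimal sketch of the positional behaviour described above, the following uses two hypothetical scratch tables (src and dst are made-up names, not your real tables). It assumes PostgreSQL and is meant only to illustrate the principle:

```sql
-- Hypothetical scratch tables for illustration only.
CREATE TEMP TABLE src (ts timestamp without time zone);
CREATE TEMP TABLE dst (
  businessid integer,
  ts         timestamp without time zone
);

INSERT INTO src (ts) VALUES ('2014-02-12 04:40:00');

-- No column list: the single SELECT column is matched to dst's FIRST
-- column (businessid), so this statement is expected to fail with a
-- type-mismatch error on businessid.
INSERT INTO dst SELECT ts FROM src;

-- Explicit column list: the value goes to dst.ts, and businessid,
-- which is not listed, is left at its default (NULL here).
INSERT INTO dst (ts) SELECT ts FROM src;

SELECT businessid, ts FROM dst;
```

Run it outside an explicit transaction (for example in psql with its default settings), so the deliberate failure doesn't abort the statements that follow it.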

(Your error in the original question doesn't match your tables and queries, so the details might be different as you've clearly made a mistake in mangling/obfuscating your tables or posted a newer version of the tables than the error. The principle remains.)

It's generally a really bad idea to assume your table definitions won't change and column order won't change anyway. So always be explicit about columns. In this case I think your intention might have actually been:

INSERT INTO tbl2(businessid, caseno, ts) 
SELECT CAST(businessid AS integer), caseno, ts
FROM tbl1;

Note the cast, because the type of businessid is different between the two tables.
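
If you want to check how the columns of the two tables line up before writing the column list, one option is to query information_schema.columns. This is only a sketch: it assumes both tables are visible to your user and that the names tbl1 and tbl2 are unique across schemas (add a table_schema filter if they aren't):

```sql
-- List both tables' columns in positional order.
-- tbl1 and tbl2 are the table names used in the question.
SELECT table_name, ordinal_position, column_name, data_type
FROM information_schema.columns
WHERE table_name IN ('tbl1', 'tbl2')
ORDER BY table_name, ordinal_position;
```

Comparing the two listings side by side shows which columns differ in position or type, and therefore which ones need to be named explicitly or cast.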

1 Comment

You are correct! My mistake was to lose sight of the alignment between the tables (identical bar the sequence id field). Thanks
