
I want to create a CSV file containing the results of a query. This CSV file will live in Google Cloud Storage. The query processes around 15 GB. I need the result to be a single file. Is this possible, and if so, how?

-- Materialize cleaned Chicago taxi trips with engineered features:
-- a tip_bin label (tip >= 20% of fare), snapped pickup/dropoff grid cells,
-- and the pickup-to-dropoff distance.
CREATE OR REPLACE TABLE `your-project.your-dataset.chicago_taxitrips_mod` AS (
WITH
  taxitrips AS (
  SELECT
    trip_start_timestamp,
    trip_end_timestamp,
    trip_seconds,
    trip_miles,
    pickup_census_tract,
    dropoff_census_tract,
    pickup_community_area,
    dropoff_community_area,
    fare,
    tolls,
    extras,
    trip_total,
    payment_type,
    company,
    pickup_longitude,
    pickup_latitude,
    dropoff_longitude,
    dropoff_latitude,
    IF((tips/fare >= 0.2),
      1,
      0) AS tip_bin
  FROM
    `bigquery-public-data.chicago_taxi_trips.taxi_trips`
  WHERE
    trip_miles > 0
    AND fare > 0)
SELECT
  trip_start_timestamp,
  trip_end_timestamp,
  trip_seconds,
  trip_miles,
  pickup_census_tract,
  dropoff_census_tract,
  pickup_community_area,
  dropoff_community_area,
  fare,
  tolls,
  extras,
  trip_total,
  payment_type,
  company,
  tip_bin,
  ST_AsText(ST_SnapToGrid(ST_GeogPoint(pickup_longitude,
        pickup_latitude), 0.1)) AS pickup_grid,
  ST_AsText(ST_SnapToGrid(ST_GeogPoint(dropoff_longitude,
        dropoff_latitude), 0.1)) AS dropoff_grid,
  ST_Distance(ST_GeogPoint(pickup_longitude,
      pickup_latitude),
    ST_GeogPoint(dropoff_longitude,
      dropoff_latitude)) AS euclidean,
  CONCAT(ST_AsText(ST_SnapToGrid(ST_GeogPoint(pickup_longitude,
          pickup_latitude), 0.1)), ST_AsText(ST_SnapToGrid(ST_GeogPoint(dropoff_longitude,
          dropoff_latitude), 0.1))) AS loc_cross
FROM
  taxitrips
LIMIT
  100000000
  )
Unless you are directly executing the query and reading the data yourself (bad design), you have no control over how many files BigQuery will generate when it writes to Cloud Storage. BigQuery is designed for (and excels at) parallelism, not single threads. 15 GB queries are fairly big; pay attention to costs. Commented May 15, 2019 at 4:31
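
As an aside on the cost point above: a dry run reports how many bytes a query would process without actually running it. A minimal sketch with the bq CLI, assuming the CREATE TABLE statement above is saved in query.sql (a placeholder file name):

# Sketch: estimate bytes processed before paying for the query.
# --dry_run validates the query and prints the bytes it would scan.
bq query --use_legacy_sql=false --dry_run "$(cat query.sql)"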

2 Answers


If BigQuery needs to output multiple files, you can then concatenate them into a single object in GCS with gsutil's compose operation:

gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite

Note that there is a limit (currently 32) to the number of components that can be composed in a single operation, so merging more shards than that takes repeated compose calls, as sketched below.
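
A minimal sketch of that repeated composition, assuming the export wrote shards matching gs://your-bucket-name/csvfilename*.csv (all bucket and object names here are placeholders):

#!/bin/bash
# Sketch: merge any number of export shards into one GCS object.
# gsutil compose takes at most 32 sources per call, so append shards
# to the running composite one at a time.
BUCKET="gs://your-bucket-name"
TARGET="$BUCKET/chicago_taxitrips_mod.csv"

first=1
for shard in $(gsutil ls "$BUCKET/csvfilename*.csv"); do
  if [ "$first" -eq 1 ]; then
    gsutil cp "$shard" "$TARGET"                 # seed with the first shard
    first=0
  else
    gsutil compose "$TARGET" "$shard" "$TARGET"  # append the next shard
  fi
done

One caveat: if the shards were exported with header rows (the default), the merged file will contain one header line per shard; exporting with bq extract --noprint_header avoids that.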



Exporting 15 GB to a single CSV file is not possible (exporting to multiple files is). I ran your same query (15.66 GB processed) and then tried to export the result to a single CSV file in GCS, but it failed with this error:

Table gs://[my_bucket]/bq_export/test.csv too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data.

The BigQuery documentation only allows exporting up to 1 GB of table data to a single file. Since the table exceeds 1 GB, you have to use a wildcard URI like:

gs://your-bucket-name/csvfilename*.csv
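
For example, with the bq CLI (table and bucket names taken from the question; --noprint_header is optional, but it keeps per-shard header rows out of the output if you plan to concatenate the shards later):

bq extract --destination_format CSV --noprint_header \
  'your-project:your-dataset.chicago_taxitrips_mod' \
  'gs://your-bucket-name/csvfilename*.csv'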


I'm not sure why you want the export to be a single CSV file, but IMHO it's too large for that. Writing it to multiple files will also be a lot faster, since BigQuery can use its parallelism to write the output with multiple workers.
