2

I'm just getting started with Google BigQuery, and have run into issues with my very first query. I'm trying to get a list of Stack Overflow posts since and including 2015-01-01 which have one of several tags. Below is my first pass at the query:

#standardSQL

SELECT
 title,
 body,
 answer_count,
 creation_date,
 tags,
 view_count 

FROM
 `bigquery-public-data.stackoverflow.posts_questions` limit 10

WHERE
 creation_date >= "2015-01-01" AND tags HAVING "terraform" OR "chef" OR "puppet" OR "ansible"

The BigQuery validator is showing the following error message:

Error: Syntax error: Unexpected keyword WHERE at [14:1]

3 Answers 3

2

You have a few syntax errors, namely the limit 10 in the wrong place, and using the HAVING keyword incorrectly. I'd also use native timestamp instead of comparing strings:

#standardSQL
SELECT
  title,
  body,
  answer_count,
  creation_date,
  tags,
  view_count
FROM
  `bigquery-public-data.stackoverflow.posts_questions`
WHERE
  creation_date >= TIMESTAMP('2015-01-01')
  AND tags IN ('terraform',
    'chef',
    'puppet',
    'ansible')
LIMIT
  10
Sign up to request clarification or add additional context in comments.

Comments

0

There are a few issues here, but hopefully this will help:

  • LIMIT needs to come last, after the WHERE clause.
  • While HAVING is valid in some contexts--namely as a post-aggregation (i.e. after GROUP BY) filter, it doesn't have the meaning that I think you want here.

With that said this query might be what you want:

#standardSQL
SELECT
  title,
  body,
  answer_count,
  creation_date,
  tags,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE creation_date >= "2015-01-01" AND
  EXISTS (
    SELECT 1 FROM UNNEST(SPLIT(tags, "|")) AS tag
    WHERE tag IN ("terraform", "chef", "puppet", "ansible")
  )
LIMIT 10;

Note that I needed to use SPLIT with the tags column because the tags are separated by the pipe character. Since you get a terabyte of querying for free, try to make the most of it by getting all the results at once rather than using the LIMIT, too.

Comments

-1

SELECT
usertype, CONCAT(start_station_name, " to ",end_station_name) AS route, COUNT(*) as num_trips, ROUND(AVG(cast(tripduration as int64)/60),2) AS duration FROM bigquery-public-data.new_york_citibike.citibike_trips GROUP BY start_station_name, end_station_name,usertype, ORDER BY num_trips DESC LIMIT 10

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.