I'm working with BigQuery's Hacker News dataset and looking at which URLs have the most news stories. I'd also like to extract the domain name from each URL and see which domains have the most stories. I'm working in R, and am having a bit of trouble getting the following query to work.

# Select the ten domains that have the most stories
sql_domain <- "SELECT url REPLACE(CASE WHEN REGEXP_CONTAINS(url, '//') 
                        THEN url ELSE CONCAT('http://', url) END, '&', '?') as domain_name,
                      COUNT(domain_name) as story_number
                FROM `bigquery-public-data.hacker_news.full`
                WHERE type = 'story'
                GROUP BY domain_name
                ORDER BY story_number DESC
                LIMIT 10"

I don't need to strip the top-level domain; for example, stackoverflow isn't required, stackoverflow.com is fine. Your help is greatly appreciated!

  • Maybe you want one of the net functions? cloud.google.com/bigquery/docs/reference/standard-sql/… Commented Oct 3, 2018 at 15:08
  • @ElliotBrossard Very elegant! I'm trying: sql_domain_ag <- "SELECT NET.REG_DOMAIN(url) as domain_name, COUNT(domain_name) as story_number And am now getting "Error: Unrecognized name: domain_name at [2:29] [invalidQuery]" so I must be calling the function improperly, or something. Commented Oct 3, 2018 at 15:22

1 Answer

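The problem is in your query: an alias defined in the SELECT list (here domain_name) cannot be referenced by another expression in that same SELECT list, which is why COUNT(domain_name) fails with "Unrecognized name: domain_name". Repeat the expression inside COUNT and group by the first column instead, as below (for BigQuery Standard SQL):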

SELECT 
  NET.REG_DOMAIN(url) AS domain_name,
  COUNT(NET.REG_DOMAIN(url)) AS story_number
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story'
GROUP BY 1
ORDER BY story_number DESC
LIMIT 10   

This returns something like the following:

Row domain_name     story_number     
1   github.com      81784    
2   medium.com      71953    
3   youtube.com     58119    
4   blogspot.com    52925    
5   nytimes.com     48986    
6   techcrunch.com  43924    
7   google.com      26326    
8   wordpress.com   23372    
9   arstechnica.com 23162    
10  wired.com       18480    
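
Since the question is asked from R, here is a minimal sketch of running the query above from an R session. It assumes the bigrquery package is installed and authenticated, and that it runs Standard SQL by default; billing_project and its value "your-project-id" are hypothetical placeholders for a Google Cloud project you can bill queries to, and top_domains is just an illustrative variable name.

```r
library(bigrquery)

# Hypothetical placeholder: replace with a project ID you are allowed to bill.
billing_project <- "your-project-id"

# Same query as in the answer: the alias is not reused inside COUNT().
sql_domain <- "
  SELECT
    NET.REG_DOMAIN(url) AS domain_name,
    COUNT(NET.REG_DOMAIN(url)) AS story_number
  FROM `bigquery-public-data.hacker_news.full`
  WHERE type = 'story'
  GROUP BY 1
  ORDER BY story_number DESC
  LIMIT 10"

# Submit the query, then download the result table into a data frame.
# Assumption: bigrquery sends Standard SQL unless told otherwise.
job <- bq_project_query(billing_project, sql_domain)
top_domains <- bq_table_download(job)
print(top_domains)
```

The first call may open a browser prompt for authentication, and the exact counts will differ from the table above as the public dataset is updated.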