0

I am an MPH student taking an INTRO to data science class writing my second SQL code. I have created a csv file. One of the columns is a string.

title      author
------------------------------
Boo        'Harry, Joe, Nancy'
Engine     'Harry, Mike, Sue'
Pig        'Amy, Kelly, Bob'

I have uploaded this csv file into postgreSQL using python.

I need to create a query where the user enters an author name, it searches the database table, and returns all the titles by that author. There are actually 3 author columns (last, first and MI) and all three are strings with multiple authors, but I haven't tackled that problem yet.

If someone enters Harry, it should give me back the rows with Boo and Engine.

I've tried:

SELECT 
    title
FROM 
    table_name
WHERE 
    author = "Harry"

but it says Harry doesn't exist. I've also tried using 'Harry' and I got the column, but nothing in it.

I'm assuming I need to split the column with names into a list of strings? where each name is it's own string? To get:

title      author
----------------------------------
Boo        'Harry', 'Joe', 'Nancy'
Engine     'Harry', 'Mike', 'Sue'
Pig        'Amy', 'Kelly', 'Bob'

I tried using

SELECT 
    title
    authors
FROM 
    table_name
    CROSS APPLY STRING_SPLIT(authors, ',');

This says that there is a syntax error:

ERROR:  syntax error at or near "APPLY"
LINE 5:  CROSS APPLY STRING_SPlIT(author_last, ',');

I got the code from http://www.sqlservertutorial.net/sql-server-string-functions/sql-server-string_split-function/

A point in the right direction is appreciated. Thank you.

UPDATE: I Used Tim's suggestion and tried

SELECT 
    article_title, author_last
FROM paper_table
WHERE author_last ~ '\yHarry\y';

and got this pic of query I think it means it ran, but didn't find anything?

UPDATE2- Using Metal's suggstion: picture of query And I know it's there bc I can see it. pic of table

UPDATE3- using Metal's second suggestion. pic of second query

5
  • there is no string_split in postgresql Commented Dec 12, 2019 at 5:25
  • well that would explain why it didn't work. head-slap Commented Dec 12, 2019 at 5:41
  • You shouldn't be storing comma separated values in a single column to begin with. That's not how relational databases work. It violates first normal form, if you properly create a one-to-many relationship between a book and the authors you will have a lot less trouble in the long run Commented Dec 12, 2019 at 6:51
  • @a_horse_with_no_name Thank you for the suggestion. I have heard of a on-to-many relationship, but do not know how to create one. I started with a panadas dataframe that I created from a webscraper for pubmed. I had a hard time pulling out the needed information so I used for loops to parse through the XML file that it downloaded. I'm realizing it's bad from, but my class didn't teach me otherwise and I couldn't get the Entrez or Medline parser to work. Commented Dec 12, 2019 at 15:32
  • @marc_s Hi. Thank you for the edit. I tried to put the code in like that the first time, but it wouldn't let me post the question. It said the indents were wrong. I'm curious how to you got it to work. Thanks. Commented Dec 12, 2019 at 19:09

3 Answers 3

1

You can use like operator.

select title from table_name where author like '%Harry,%' or author like '%,Harry%'

Above solution is to be assumed you want those comma separated list that contains 'Harry'. I would suggest below solution. Which will work same as string_split function in sql server.

select t.title 
from table_name t, unnest(string_to_array(t.author , ',')) s(val)
where trim(s.val) = 'Harry'

See dbFiddle.

Sign up to request clarification or add additional context in comments.

9 Comments

this didn't work either. I got the same outcome. It gives me the columns names, but no data.
@pandora, try again
@Metal Yep. The last 2 pictures are the update. One is trying your suggestion again, and the other is the table showing the author's last name in the column. If I made a mistake anywhere... ?
Your logic is off, and would match something like Harry Potter should that name appear. You need to check for word boundaries on both sides of the name to search.
@TimBiegeleisen,Harry Potter is still valid result
|
0

You don't need to unnest the array, you can use a regex to split the string and get rid of the whitespace by that.

SELECT title, author
FROM the_table
WHERE 'Harry' = any(regexp_split_to_array(author, '\s*,\s*'))

1 Comment

This didn't split the string, the whole thing came up. But it did find the name, so i'm happy with that for now. Thank you again.
0

You should seriously try to avoid storing CSV data in your SQL tables, as it would generally cause you trouble when you try to query (as in this question).

You could try searching for an input name using regex, with the name being surrounded by word boundaries:

SELECT title, author
FROM table_name
WHERE author ~ '\yHarry\y';

screen capture of demo below

Demo

1 Comment

Thank you. I got the above picture. I'm not sure what you mean about not using csv files. I built a web scraper for the HW that scraped data off of pubmed into a pandas data frame. That was saved onto my drive in a csv. In a new module, I uploaded the csv back into a pandas data frame and transferred that into the postgreSQL using pd.to_sql function.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.