1

Let's say I have two tables (I'm trying to remove everything irrelevant to the question from the tables and make some sample ones, so bear with me :)

    ___________________         ________________________
    |File             |         |Content               |
    |_________________|         |______________________|
    |ID Primary Key   | 1     * |ID Primary Key        |
    |URL Varcher(255) |---------|FileID Foreign Key    |
    |_________________|         |    ref File(ID)      |
                                |FileContent Text      |
                                |______________________|

A File has a url. There may be many Content items corresponding to each File.

I need to create a query using these tables that I'm having some trouble with. I essentially want the query, in simple terms, to say:

"Select the file URL and the sum of the times substring "X" appears in all content entries associated with that file."

I'm pretty good with SQL selects, but I'm not so good with aggregate functions and it's letting me down. Any help is greatly appreciated :)

2 Answers 2

1

The query won't be efficient but might give you a hint:

SELECT url, cnt
FROM (
  SELECT
    f.id,
    IFNULL(
      SUM(
        (LENGTH(c.text) - LENGTH(REPLACE(c.text, f.url, '')))/LENGTH(f.url)
      ),
      0
    ) as cnt
  FROM file c
  JOIN content c ON f.id = c.fileid
  GROUP BY f.id
) cnts JOIN file USING(id);

To append files that do not have a match in the content table you can UNION ALL the rest of use LEFT JOIN in the cnts subquery.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for the solution - sorry for not marking an answer yet but I won't have time to try this till the weekend, busy work week :)
1

This solution attempts to use REGEXP to match the substring. REGEXP returns 1 if it matches, 0 if not, so SUM() them up for the total. REGEXP might seem like overkill, but would allow for more complicated matching than a simple substring.

SELECT
  File.ID,
  File.URL,
  SUM(Content.FileContent REGEXP 'substring') AS numSubStrs
FROM File LEFT JOIN Content ON File.ID = Content.ID
GROUP BY File.ID, File.URL;

The easier method if a more complex match pattern won't ever be needed uses LIKE and COUNT(*) instead of SUM():

SELECT
  File.ID,
  File.URL,
  COUNT(*) AS numSubStrs
FROM File LEFT JOIN Content ON File.ID = Content.ID
WHERE Content.FileContent LIKE '%substring%'
GROUP BY File.ID, File.URL;

Note the use of LEFT JOIN, which should produce 0 when there are not actually any entries in Content.

1 Comment

Thanks for the solution - sorry for not marking an answer yet but I won't have time to try this till the weekend, busy work week :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.