I am doing a PoC to find out if a graph database will fit our needs.

We have a survey in which respondents have answered questions. We want to group these respondents using every possible combination of answers out of a set of (mostly two or three) questions.

I have the following nodes and relationships:

(:Question)-[:HasAnswer]->(:Answer)
(:Respondent)-[:Answered]->(:Answer)
(:Answer)-[:BelongsTo]->(:WeightingGroup)

In which:

  • Question: the question
  • Answer: A possible answer to a question
  • Respondent: A person that answered questions
  • Answered: The relationship between a respondent and an answer they gave
  • WeightingGroup: A group of answers that forms one unique combination of given answers
  • BelongsTo: The relationship between an answer and a WeightingGroup, used to form groups of answers

My goal is to receive a result like this:

/----------------------------------------------\
| Q1                 | Q2                 | n  |
|--------------------+--------------------+----|
| Answer1            | Answer1            | 23 |
| Answer1            | Answer2            | 12 |
| Answer1            | Answer3            | 54 |
| Answer2            | Answer1            | 65 |
| Answer2            | Answer2            |  5 |
| Answer2            | Answer3            | 15 |
\--------------------+--------------------+----/

or:

/-------------------------\
| Q1, Q2             | n  |
|--------------------+----|
| Answer1, Answer1   | 23 |
| Answer1, Answer2   | 12 |
| Answer1, Answer3   | 54 |
| Answer2, Answer1   | 65 |
| Answer2, Answer2   |  5 |
| Answer2, Answer3   | 15 |
\--------------------+----/

Where n is the number of respondents that gave both answers.
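To make the target concrete, here is a minimal Python sketch of the tabulation described above, working on plain dictionaries rather than on the graph. The sample data and the helper name `count_combinations` are made up for illustration; only the idea (one count per combination of answers to the selected questions) comes from the question.

```python
from collections import Counter

# Hypothetical sample data: respondent id -> {question: chosen answer}.
responses = {
    "r1": {"Q1": "Answer1", "Q2": "Answer1"},
    "r2": {"Q1": "Answer1", "Q2": "Answer2"},
    "r3": {"Q1": "Answer1", "Q2": "Answer1"},
    "r4": {"Q1": "Answer2", "Q2": "Answer3"},
}

def count_combinations(responses, questions):
    """Count respondents per combination of answers to the given questions."""
    counts = Counter()
    for answers in responses.values():
        # Skip respondents who did not answer every selected question.
        if all(q in answers for q in questions):
            counts[tuple(answers[q] for q in questions)] += 1
    return counts

table = count_combinations(responses, ["Q1", "Q2"])
for combo, n in sorted(table.items()):
    print(combo, n)
```

Each key of `table` corresponds to one row of the desired result, and its value is n.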

However, when I run this query:

// Number of answers per weighting group
match (w:WeightingGroup)<-[:BelongsTo]-(a:Answer)<-[:Answered]-(r:Respondent)
with w, collect(distinct a.Text) as answers, count(distinct r) as n
return answers, w.Weight, n

It seems to return n as the number of respondents that gave answer1 OR answer2.
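The pattern matches a respondent once for every answer in the group that they gave, so counting distinct respondents per group counts everyone linked to at least one of the group's answers. A small Python sketch of the difference, using made-up respondent ids and answer keys:

```python
# Hypothetical sample data: answer -> set of respondents who gave it.
answered_by = {
    "Q1:Answer1": {"r1", "r2", "r3"},
    "Q2:Answer2": {"r2", "r3", "r4"},
}

# The answers that make up one (hypothetical) weighting group.
group = ["Q1:Answer1", "Q2:Answer2"]
sets = [answered_by[a] for a in group]

# What the query above effectively counts: anyone who gave ANY answer in the group.
n_or = len(set.union(*sets))

# What is wanted: only respondents who gave ALL answers in the group.
n_and = len(set.intersection(*sets))

print(n_or, n_and)
```

The OR count is the size of the union of the respondent sets, while the AND count is the size of their intersection.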

How do I get the count of Respondents that gave answer1 AND answer2?

Thanks in advance!

  • If I understood your requirement correctly, just add a "WHERE length(answers) > 1" clause to your query. Commented Feb 28, 2014 at 16:28
  • Please put a small sample dataset on console.neo4j.org and share it. Commented Mar 1, 2014 at 9:07
  • I am working on a sample dataset. Commented Mar 6, 2014 at 13:51

1 Answer

How about something like this: collect the answers per group, collect the answers per respondent, then keep only the respondents whose answers are all present in the group's answers. The count on respondents should then be correct. (You'll have to extract the answer texts, since the answers are already collected.)

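// Collect the answers that belong to each weighting group
MATCH (g:WeightingGroup)<-[:BelongsTo]-(a)
WITH g, collect(a) as apg
// Collect the answers given by each respondent
MATCH (r:Respondent)-[:Answered]->(a)
WITH g, apg, r, collect(a) as apr
// Keep respondents whose answers all appear in the group's answers
WHERE ALL(a IN apr WHERE a IN apg)
RETURN g.Weight, EXTRACT(a IN apg | a.Text), count(r) as n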
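The same filter can be sketched in plain Python to see what it computes. The data and the function name `count_per_group` are hypothetical; `apg` and `apr` mirror the collections in the query.

```python
# Hypothetical sample data mirroring the graph model.
group_answers = {        # WeightingGroup -> answers that belong to it (apg)
    "g1": {"Q1:A1", "Q2:A1"},
    "g2": {"Q1:A1", "Q2:A2"},
}
respondent_answers = {   # Respondent -> answers they gave (apr)
    "r1": {"Q1:A1", "Q2:A1"},
    "r2": {"Q1:A1", "Q2:A2"},
    "r3": {"Q1:A1", "Q2:A2"},
}

def count_per_group(group_answers, respondent_answers):
    """Count, per group, the respondents whose answers all lie in the group."""
    counts = {}
    for g, apg in group_answers.items():
        # Same test as ALL(a IN apr WHERE a IN apg): apr is a subset of apg.
        counts[g] = sum(1 for apr in respondent_answers.values() if apr <= apg)
    return counts

counts = count_per_group(group_answers, respondent_answers)
print(counts)
```

Two things stand out in this form. Every group is compared against every respondent, which is also what the query does, because the second MATCH is not connected to g; that may become expensive on large data. And the subset test excludes a respondent who gave any answer outside the group, so if respondents also answered questions that are not part of the weighting, the reverse test (`apg <= apr`) may be closer to what is wanted.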

3 Comments

Thanks for the answer! And sorry for the late response (carnival came in between ;)). I tried this query but it crashed Neo4j (taking all CPU and 2 GB of memory). This probably has to do with the fact that the database contains over 600,000 given answers.
Is that 600k possible answers across your questions, and every combination of them is an answer group? I think the query works but you may need to restructure the data or do the analysis in steps.
Sorry, I meant 6,000,000 given answers. I tried your query on a sample dataset and it did give me the desired results, so I will mark it as the accepted answer :) My Neo4j Community installation, however, now refuses to start at all, so I guess I seriously broke something :(
