
I'm trying to create a MongoDB database that contains two collections: Students and Courses.

The first collection "students" contains:

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.Database

student = [{"_id": "0",
            "firstname": "Bert",
            "lastname": "Holden"},
           {"_id": "1",
            "firstname": "Sam",
            "lastname": "Olsen"},
           {"_id": "2",
            "firstname": "James",
            "lastname": "Swan"}]


students = db.students
students.insert_many(student)
pprint.pprint(students.find_one())

The second collection "courses" contains:

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.Database

course = [{"_id":"10",
           "coursename":"Databases",
           "grades":"[{student_id:0, grade:83.442}, {student_id:1, grade:45.323}, {student_id:2, grade:87.435}]"}]




courses = db.courses
courses.insert_many(course)
pprint.pprint(courses.find_one())

I then want to use aggregation to find a student and the corresponding courses with grade(s).

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client["Database"]

pipeline = [
    {
        "$lookup": {
            "from": "courses",
            "localField": "_id",
            "foreignField": "student_id",
            "as": "student_course"
            }
    },
    {
        "$match": {
            "_id": "0"
        }
    }
]

pprint.pprint(list(db.students.aggregate(pipeline)))

I'm not sure if the student_id/grade is implemented correctly in the "courses" collection, so that might be one reason why my aggregation returns [].

The aggregation works if I create separate courses for each student, but that seems like a waste of memory, so I would like to have one course with all the student_ids and grades in an array.

Expected output:

 [{'_id': '0',
  'firstname': 'Bert',
  'lastname': 'Holden',
  'student_course': [{'_id': '10',
                      'coursename': 'Databases',
                      'grade': '83.442',
                      'student_id': '0'}]}]
  • Why hold the grades as a string and not an object? Commented Oct 6, 2021 at 17:29
  • I'm not sure. Would you care to explain? :) Commented Oct 6, 2021 at 18:38
  • Sorry, I wasn't able to take a deeper look until now. I have posted an answer (even though you already found a working answer). Commented Oct 6, 2021 at 21:30

2 Answers


A few points worth mentioning:

  1. Your example code in "courses.py" inserts grades as a string that represents an array, not as an actual array. Matt pointed this out in the comments, and you asked for an explanation. Here is my attempt: if you insert a string that merely looks like an array, MongoDB stores it as a single string value. You cannot $unwind it or $lookup on its elements, because there are no elements, only characters inside a string.
  2. The array data in courses holds the students' grades, which are the data points you want, but you start the aggregation on the students collection. Instead, consider changing your perspective and starting from the courses collection rather than from the students. If you do, you may restate the requirement as: "show me all courses and student grades where the student id is 0".
  3. Your array data has a datatype mismatch. The student id is an integer inside your string "array", but the students collection stores _id as a string. The types need to be consistent for $lookup to work properly (unless you want to do a lot of casting).
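Points 1 and 3 can be checked without a database at all. The snippet below is a pure-Python illustration using values copied from the question; it is not MongoDB code, and the variable names are made up for the example. It shows that the question's grades value is a plain string (not even valid JSON, since its keys are unquoted), and that an integer id never equals a string id, which is the same reason $lookup finds no match between 0 and "0".

```python
import json

# Values mirroring the question's data (pure Python, no MongoDB needed).
grades_as_string = "[{student_id:0, grade:83.442}, {student_id:1, grade:45.323}]"
grades_as_array = [
    {"student_id": "0", "grade": 83.442},
    {"student_id": "1", "grade": 45.323},
]

# Point 1: the string is one scalar value, so there is nothing to $unwind or match on.
print(type(grades_as_string).__name__)  # str
print(type(grades_as_array).__name__)   # list

# The string is not even valid JSON, because its keys are unquoted.
try:
    json.loads(grades_as_string)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)  # False

# Point 3: an int never equals a str, just as $lookup will not match 0 with "0".
student_ids = ["0", "1", "2"]  # the _id values stored in the students collection
print(0 in student_ids)    # False
print("0" in student_ids)  # True
```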

Nonetheless, here is a possible solution to your problem. I have revised the Python code, including a redefined aggregation pipeline.

The name of my test database is pythontest, as seen in the code below. MongoDB creates a database the first time data is written to it, so it does not need to exist before you run the scripts.

File students.py

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

student = [{"_id": "0",
            "firstname": "Bert",
            "lastname": "Holden"},
           {"_id": "1",
            "firstname": "Sam",
            "lastname": "Olsen"},
           {"_id": "2",
            "firstname": "James",
            "lastname": "Swan"}]


students = db.students
students.insert_many(student)
pprint.pprint(students.find_one())

Then the courses file. Notice that the grades field is no longer a string but an actual array, and that student_id is now a string rather than an integer, so it matches the _id values in students. (In reality, a stronger datatype such as UUID or int would likely be preferable.)

File courses.py

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

course = [{"_id": "10",
           "coursename": "Databases",
           "grades": [{"student_id": "0", "grade": 83.442},
                      {"student_id": "1", "grade": 45.323},
                      {"student_id": "2", "grade": 87.435}]}]


courses = db.courses
courses.insert_many(course)
pprint.pprint(courses.find_one())

... and finally, the aggregation file with the changed aggregation pipeline...

File aggregation.py

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

pipeline = [
    { "$match": { "grades.student_id": "0" } },
    { "$unwind": "$grades" },
    { "$project": { "coursename": 1, "student_id": "$grades.student_id", "grade": "$grades.grade" } },
    {
        "$lookup":
        {
            "from": "students",
            "localField": "student_id",
            "foreignField": "_id",
            "as": "student"
        }
    },
    {
        "$unwind": "$student"
    },
    { "$project": { "student._id": 0 } },
    { "$match": { "student_id": "0" } }
]

pprint.pprint(list(db.courses.aggregate(pipeline)))

Output of running the program

> python3 aggregation.py
[{'_id': '10',
  'coursename': 'Databases',
  'grade': 83.442,
  'student': {'firstname': 'Bert', 'lastname': 'Holden'},
  'student_id': '0'}]
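To make the stage-by-stage logic concrete, here is a pure-Python simulation of the pipeline above on in-memory copies of the two collections. It only mimics the semantics of these particular stages for this data (it is not how MongoDB executes them), and run_course_pipeline is a hypothetical helper name.

```python
from copy import deepcopy

# In-memory copies of the documents inserted by students.py and courses.py.
students = [
    {"_id": "0", "firstname": "Bert", "lastname": "Holden"},
    {"_id": "1", "firstname": "Sam", "lastname": "Olsen"},
    {"_id": "2", "firstname": "James", "lastname": "Swan"},
]
courses = [
    {"_id": "10", "coursename": "Databases",
     "grades": [{"student_id": "0", "grade": 83.442},
                {"student_id": "1", "grade": 45.323},
                {"student_id": "2", "grade": 87.435}]},
]


def run_course_pipeline(courses, students, student_id):
    # $match {"grades.student_id": id}: keep courses where ANY array element matches.
    docs = [c for c in courses
            if any(g["student_id"] == student_id for g in c["grades"])]
    # $unwind "$grades": one document per array element.
    docs = [dict(c, grades=g) for c in docs for g in c["grades"]]
    # $project: keep _id and coursename, lift student_id and grade to the top level.
    docs = [{"_id": d["_id"], "coursename": d["coursename"],
             "student_id": d["grades"]["student_id"],
             "grade": d["grades"]["grade"]} for d in docs]
    # $lookup + $unwind "$student": attach the student whose _id equals student_id;
    # documents with no match are dropped, as $unwind drops empty arrays.
    joined = [dict(d, student=deepcopy(s))
              for d in docs for s in students if s["_id"] == d["student_id"]]
    # $project {"student._id": 0}: drop the duplicated id from the embedded student.
    for d in joined:
        del d["student"]["_id"]
    # Final $match: keep only the requested student's rows.
    return [d for d in joined if d["student_id"] == student_id]


result = run_course_pipeline(courses, students, "0")
print(result)
```

Note that the first $match only selects whole courses; the final $match is what filters out the other students' grades after the $unwind.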

The format of the data at the end of the program may not be as desired, but can be tweaked by manipulating the aggregation.
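As one example of such a tweak, the sketch below is a hypothetical variant of the aggregation.py pipeline (STUDENT_ID and pipeline_flat are illustrative names). It filters the unwound grades before the join, so $lookup only runs for the requested student, and its final $project flattens firstname and lastname out of the embedded student document. It uses only standard stages ($match, $unwind, $lookup, $project), but I have not verified its output here, so treat it as a starting point.

```python
STUDENT_ID = "0"  # hypothetical: the student whose grades we want

pipeline_flat = [
    # Select courses that contain this student at all (can use an index on grades.student_id).
    {"$match": {"grades.student_id": STUDENT_ID}},
    {"$unwind": "$grades"},
    # Drop the other students' grades before joining, instead of at the very end.
    {"$match": {"grades.student_id": STUDENT_ID}},
    {"$lookup": {
        "from": "students",
        "localField": "grades.student_id",
        "foreignField": "_id",
        "as": "student",
    }},
    {"$unwind": "$student"},
    # One flat document per course: course fields plus the student's names.
    {"$project": {
        "_id": 0,
        "course_id": "$_id",
        "coursename": 1,
        "student_id": "$grades.student_id",
        "grade": "$grades.grade",
        "firstname": "$student.firstname",
        "lastname": "$student.lastname",
    }},
]

# Usage, against the same server as aggregation.py:
#   pprint.pprint(list(db.courses.aggregate(pipeline_flat)))
```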

** EDIT **

If you want to approach this aggregation from the student rather than from the course, you can still do it, but because the array lives in courses the aggregation is a bit more complicated. The $lookup must run a pipeline of its own to unwind the grades and keep only the matching student's entry:

Aggregation from the student perspective (mongo shell syntax)

db.students.aggregate([
{ $match: { _id: "0" } },
{ $addFields: { "colStudents._id": "$_id" } },
{
    $lookup:
    {
        from: "courses",
        let: { varStudentId: "$colStudents._id"},
        pipeline:
        [
            { $unwind: "$grades" },
            { $match: { $expr: { $eq: ["$grades.student_id", "$$varStudentId" ] } } },
            { $project: { course_id: "$_id", coursename: 1, grade: "$grades.grade", _id: 0} }
        ],
        as: "student_course"
    }
},
{ $project: { _id: 0, student_id: "$_id", firstname: 1, lastname: 1, student_course: 1 } }
])

Output (from running the equivalent pipeline through aggregation.py)

> python3 aggregation.py
[{'firstname': 'Bert',
  'lastname': 'Holden',
  'student_course': [{'course_id': '10',
                      'coursename': 'Databases',
                      'grade': 83.442}],
  'student_id': '0'}]
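Since the rest of the thread uses pymongo, here is a sketch of the same student-perspective pipeline written as Python stages, wrapped in a hypothetical build_student_pipeline helper so the student id is a parameter. One assumption to flag: the shell version copies _id into colStudents._id with $addFields before the lookup; because a $lookup "let" expression is evaluated against the input document, this sketch references "$_id" directly and drops that step. The let/pipeline form of $lookup requires MongoDB 3.6 or later.

```python
def build_student_pipeline(student_id):
    """Return the student-perspective aggregation as a pymongo-style list of stages."""
    return [
        {"$match": {"_id": student_id}},
        {"$lookup": {
            "from": "courses",
            # Expose the outer student's _id to the inner pipeline as $$varStudentId.
            "let": {"varStudentId": "$_id"},
            "pipeline": [
                {"$unwind": "$grades"},
                # $expr is needed to compare against a let variable.
                {"$match": {"$expr": {"$eq": ["$grades.student_id", "$$varStudentId"]}}},
                {"$project": {"course_id": "$_id", "coursename": 1,
                              "grade": "$grades.grade", "_id": 0}},
            ],
            "as": "student_course",
        }},
        {"$project": {"_id": 0, "student_id": "$_id", "firstname": 1,
                      "lastname": 1, "student_course": 1}},
    ]


# Usage against a running server (not executed here):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://127.0.0.1:27017").pythontest
#   pprint.pprint(list(db.students.aggregate(build_student_pipeline("0"))))
```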

2 Comments

Hello! This works perfectly. Thank you! I actually tried that exact aggregation, but when my courses.py file wasn't done correctly, it gave me no output. And that's exactly why I came here, as I didn't know if my aggregation or my collection(s) was faulty. Thanks again for a thorough explanation! :D
@MadVags - thanks for the points! I have added an update to the post with the aggregation approached from the student perspective as well. It's a bit more complicated because the array is in the foreign collection, but hopefully it will make sense...

I was finally able to take a look at this.

TL;DR: see Mongo Playground

This solution requires you to store grades as an actual array of objects rather than a string.

Consider the following database structure:

db={
  // Collection
  "students": [
    {
      "_id": "0",
      "firstname": "Bert",
      "lastname": "Holden"
    },
    {
      "_id": "1",
      "firstname": "Sam",
      "lastname": "Olsen"
    },
    {
      "_id": "2",
      "firstname": "James",
      "lastname": "Swan"
    }
  ],
  // Collection
  "courses": [
    {
      "_id": "10",
      "coursename": "Databases",
      "grades": [
        {
          student_id: "0",
          grade: 83.442
        },
        {
          student_id: "1",
          grade: 45.323
        },
        {
          student_id: "2",
          grade: 87.435
        }
      ]
    }
  ],
}

You can achieve what you want using the following query:

db.students.aggregate([
  {
    $match: {
      _id: "0"
    }
  },
  {
    $lookup: {
      from: "courses",
      pipeline: [
        {
          $unwind: "$grades"
        },
        {
          $match: {
            "grades.student_id": "0"
          }
        },
        {
          $group: {
            "_id": "$_id",
            "coursename": {
              $first: "$coursename"
            },
            "grade": {
              $first: "$grades.grade"
            },
            "student_id": {
              $first: "$grades.student_id"
            }
          }
        }
      ],
      as: "student_course"
    }
  }
])
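The query above hardcodes the student id "0" in two places: the outer $match and the $match inside the $lookup pipeline. If you run it from Python, one way to keep the two in sync is to build the stages from a single parameter. The sketch below does that; build_lookup_pipeline is a hypothetical name, and the stages are otherwise the same as in the query above.

```python
def build_lookup_pipeline(student_id):
    """Same stages as the shell query, with the hardcoded "0" replaced by a parameter."""
    return [
        {"$match": {"_id": student_id}},
        {"$lookup": {
            "from": "courses",
            "pipeline": [
                {"$unwind": "$grades"},
                # Must use the same id as the outer $match, or the join is wrong.
                {"$match": {"grades.student_id": student_id}},
                {"$group": {
                    "_id": "$_id",
                    "coursename": {"$first": "$coursename"},
                    "grade": {"$first": "$grades.grade"},
                    "student_id": {"$first": "$grades.student_id"},
                }},
            ],
            "as": "student_course",
        }},
    ]


# Usage (pymongo, not executed here):
#   list(db.students.aggregate(build_lookup_pipeline("0")))
```

Correlating the inner $match with the outer document through "let" and $expr would avoid the duplicated value entirely, at the cost of a slightly more complex $lookup.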
