0

I'm migrating data from Mongo to Arango and I need to reproduce a $group aggregation. I have successfully reproduced the results but I'm concerned that my approach maybe sub-optimal. Can the AQL be improved?

I have a collection of data that looks like this:

{
    "_id" : ObjectId("5b17f9d85b2c1998598f054e"),
    "department" : [ 
        "Sales", 
        "Marketing"
    ],
    "region" : [ 
        "US", 
        "UK"
    ]
}

{
    "_id" : ObjectId("5b1808145b2c1998598f054f"),
    "department" : [ 
        "Sales", 
        "Marketing"
    ],
    "region" : [ 
        "US", 
        "UK"
    ]
}

{
    "_id" : ObjectId("5b18083c5b2c1998598f0550"),
    "department" : "Development",
    "region" : "Europe"
}

{
    "_id" : ObjectId("5b1809a75b2c1998598f0551"),
    "department" : "Sales"
}

Note the value can be a string, Array or not present

In Mongo I'm using the following code to aggregate the data:

db.test.aggregate([
{
    $unwind:{
        path:"$department",
        preserveNullAndEmptyArrays: true
    }
},
{
    $unwind:{
        path:"$region",
        preserveNullAndEmptyArrays: true
    }
},
{
    $group:{
        _id:{
            department:{ $ifNull: [ "$department", "null" ] },
            region:{ $ifNull: [ "$region", "null" ] },
        },
        count:{$sum:1}
    }
}
])

In Arango I'm using the following AQL:

FOR i IN test
    LET FIELD1=(FOR a IN APPEND([],NOT_NULL(i.department,"null")) RETURN a)
    LET FIELD2=(FOR a IN APPEND([],NOT_NULL(i.region,"null")) RETURN a)

    FOR f1 IN FIELD1
        FOR f2 IN FIELD2
            COLLECT id={department:f1,region:f2} WITH COUNT INTO counter

            RETURN {_id:id,count:counter}

Edit: The APPEND is used to convert string values into an Array

Both produce results that look like this;

{
    "_id" : {
        "department" : "Marketing",
        "region" : "US"
    },
    "count" : 2.0
}

{
    "_id" : {
        "department" : "Development",
        "region" : "Europe"
    },
    "count" : 1.0
}

{
    "_id" : {
        "department" : "Sales",
        "region" : "null"
    },
    "count" : 1.0
}

{
    "_id" : {
        "department" : "Marketing",
        "region" : "UK"
    },
    "count" : 2.0
}

{
    "_id" : {
        "department" : "Sales",
        "region" : "UK"
    },
    "count" : 2.0
}

{
    "_id" : {
        "department" : "Sales",
        "region" : "US"
    },
    "count" : 2.0
}

1 Answer 1

2

Your approach seems alright. I would suggest to use TO_ARRAY() instead of APPEND() to make it easier to understand though.

Both functions skip null values, thus it is unavoidable to provide some placeholder, or test for null explicitly and return an array with a null value (or whatever works best for you):

FOR doc IN test
    FOR field1 IN doc.department == null ? [ null ] : TO_ARRAY(doc.department)
    FOR field2 IN doc.region == null ? [ null ] : TO_ARRAY(doc.region)
    COLLECT department = field1, region = field2
    WITH COUNT INTO count
        RETURN { _id: { department, region }, count }

Collection test:

[
  {
    "_key": "5b17f9d85b2c1998598f054e",
    "department": [
      "Sales",
      "Marketing"
    ],
    "region": [
      "US",
      "UK"
    ]
  },
  {
    "_key": "5b18083c5b2c1998598f0550",
    "department": "Development",
    "region": "Europe"
  },
  {
    "_key": "5b1808145b2c1998598f054f",
    "department": [
      "Sales",
      "Marketing"
    ],
    "region": [
      "US",
      "UK"
    ]
  },
  {
    "_key": "5b1809a75b2c1998598f0551",
    "department": "Sales"
  }
]

Result:

[
  {
    "_id": {
      "department": "Development",
      "region": "Europe"
    },
    "count": 1
  },
  {
    "_id": {
      "department": "Marketing",
      "region": "UK"
    },
    "count": 2
  },
  {
    "_id": {
      "department": "Marketing",
      "region": "US"
    },
    "count": 2
  },
  {
    "_id": {
      "department": "Sales",
      "region": null
    },
    "count": 1
  },
  {
    "_id": {
      "department": "Sales",
      "region": "UK"
    },
    "count": 2
  },
  {
    "_id": {
      "department": "Sales",
      "region": "US"
    },
    "count": 2
  }
]
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks, it's nice to know I was on the right track. Your version was much cleaner though!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.