
In Elasticsearch I have several hundred thousand documents with roughly this structure:

{
  "script": "/index.html",
  "query": {
    "ab": "hello",
    "cd": "world",
    "ef": "123"
  }
}

The URL "http://localhost/index.html?ab=hello&cd=world&ef=123" is parsed into this document. "script" contains only the path and the target script, without the query string. The "query" object does not always contain the same set of keys (and the values of course differ), but the values don't matter at the moment.
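The question doesn't show how the URL is parsed. The following is a minimal sketch, assuming Python's standard `urllib.parse` does the splitting. `url_to_doc` is a hypothetical helper name, and `keep_blank_values=True` is an assumption chosen so that keys without a value (such as `?zy=`) are still recorded.

```python
from urllib.parse import urlsplit, parse_qsl


def url_to_doc(url):
    """Split a URL into the {"script": ..., "query": {...}} shape shown above.

    Hypothetical helper, not the asker's code. If a key repeats in the
    query string, the last value wins because the pairs go into a dict.
    """
    parts = urlsplit(url)
    return {
        # path plus target script, no query string
        "script": parts.path,
        # keep_blank_values=True so "?zy=" still yields the key "zy"
        "query": dict(parse_qsl(parts.query, keep_blank_values=True)),
    }


doc = url_to_doc("http://localhost/index.html?ab=hello&cd=world&ef=123")
print(doc)
# {'script': '/index.html', 'query': {'ab': 'hello', 'cd': 'world', 'ef': '123'}}
```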

I know I can get a distinct list of "script" values with:

{
  "aggregations": {
    "my_agg": {
      "terms": {
        "field": "script.raw"
      }
    }
  }
}

which results in buckets like:

"buckets": [
{
    "key": "/index.html",
    "doc_count": 123456
},
{
    "key": "/hello.html",
    "doc_count": 1456
},
...

My question: is there a way to additionally get a list and count of all query keys that occur in the URLs of each script?

Something like:

"buckets": [
{
    "key": "/index.html",
    "doc_count": 123456,
    "query_key_count": {
      "ab": 33456,
      "cd": 3456,
      "ef": 456,
      "gh": 56,
      "ij": 6
    }
},
{
    "key": "/hello.html",
    "doc_count": 1456,
    "query_key_count": {
      "zy": 156,
      "gh": 6
    }
},
...

Thanks a lot!

  • You mean query_key_count should contain the number of occurrences of each key across all items in your data? Say you have 10 objects in total and 2 of them have "ab" in their query object; then you want the result to be query_key_count: {"ab": 2, ...} and so on? Commented Mar 23, 2015 at 14:44
  • This should help you >>> stackoverflow.com/questions/26743204/… Commented Mar 23, 2015 at 14:57
  • Yes: if I have an index.html doc with the params "ab" and "cd" and another index.html doc with the params "cd" and "ef" (with random values), I'd like to get "query_key_count": {"cd": 2, "ab": 1, "ef": 1}. Thanks a lot for the link, I will have a look! Commented Mar 23, 2015 at 15:08
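To pin down the semantics agreed on in the comments (per script, the number of documents containing each query key), here is a plain-Python sketch that computes the desired result in memory. It is only an illustration of the expected numbers, not an Elasticsearch query; `query_key_counts` is a hypothetical name and the documents use the flat {"script", "query"} shape from the question.

```python
from collections import Counter, defaultdict


def query_key_counts(docs):
    """Count, per script, how many documents contain each query key."""
    counts = defaultdict(Counter)
    for doc in docs:
        # Pass the keys explicitly: Counter.update(dict) would try to add
        # the dict's values as counts. Dict keys are unique, so each key
        # is counted at most once per document.
        counts[doc["script"]].update(doc["query"].keys())
    return {script: dict(c) for script, c in counts.items()}


# The example from the comment above: two index.html docs, random values.
docs = [
    {"script": "/index.html", "query": {"ab": "x", "cd": "y"}},
    {"script": "/index.html", "query": {"cd": "z", "ef": "w"}},
]
print(query_key_counts(docs))
# {'/index.html': {'ab': 1, 'cd': 2, 'ef': 1}}
```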

1 Answer


To leverage Elasticsearch's strengths, you really need your documents to be structured something like this:

{
   "script": "/index.html",
   "query": [
      {
         "query_key": "ab",
         "query_val": "hello"
      },
      {
         "query_key": "cd",
         "query_val": "world"
      },
      {
         "query_key": "ef",
         "query_val": "123"
      }
   ]
}
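If the documents are already in the question's flat form, they can be converted before indexing. This is a minimal sketch; `to_nested` is a hypothetical helper, and the field names `query_key`/`query_val` simply follow the example above.

```python
def to_nested(doc):
    """Turn {"query": {"ab": "hello", ...}} into the nested list form above."""
    return {
        "script": doc["script"],
        # one object per query-string pair, so each key becomes a value
        # in the query_key field instead of a field name of its own
        "query": [
            {"query_key": key, "query_val": val}
            for key, val in doc["query"].items()
        ],
    }


flat = {"script": "/index.html", "query": {"ab": "hello", "cd": "world", "ef": "123"}}
print(to_nested(flat)["query"][0])
# {'query_key': 'ab', 'query_val': 'hello'}
```

The point of the conversion is that a terms aggregation can bucket on the values of one field (`query.query_key`), whereas keys stored as field names would each be a separate field.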

If I set up a mapping with a nested type:

PUT /test_index
{
   "mappings": {
      "doc": {
         "properties": {
            "query": {
               "type": "nested",
               "properties": {
                  "query_key": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "query_val": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            },
            "script": {
               "type": "string",
               "index": "not_analyzed"
            }
         }
      }
   }
}

and add a couple of docs:

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"script": "/index.html","query": [{"query_key":"ab", "query_val":"hello"},{"query_key":"cd", "query_val":"world"}, {"query_key":"ef", "query_val":"123"}]}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"script": "/index.html","query": [{"query_key":"ab", "query_val":"foo"},{"query_key":"cd", "query_val":"bar"}, {"query_key":"gh", "query_val":"456"}]}
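For several hundred thousand documents the bulk body would be generated rather than typed. A minimal sketch, assuming the index and type names from this example; `bulk_body` is hypothetical, and numbering `_id` from 1 is for illustration only.

```python
import json


def bulk_body(docs, index="test_index", doc_type="doc"):
    """Build a _bulk request body: one action line plus one source line per doc."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"script": "/index.html", "query": [{"query_key": "ab", "query_val": "hello"}]},
    {"script": "/index.html", "query": [{"query_key": "gh", "query_val": "456"}]},
]
print(bulk_body(docs), end="")
```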

I can get back query keys in a nested terms aggregation:

POST /test_index/_search?search_type=count
{
   "aggs": {
      "resellers": {
         "nested": {
            "path": "query"
         },
         "aggs": {
            "query_keys": {
               "terms": {
                  "field": "query.query_key"
               }
            }
         }
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "resellers": {
         "doc_count": 6,
         "query_keys": {
            "buckets": [
               {
                  "key": "ab",
                  "doc_count": 2
               },
               {
                  "key": "cd",
                  "doc_count": 2
               },
               {
                  "key": "ef",
                  "doc_count": 1
               },
               {
                  "key": "gh",
                  "doc_count": 1
               }
            ]
         }
      }
   }
}
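Note that the nested aggregation's "doc_count": 6 counts nested query objects (3 per document × 2 documents), not the 2 root documents, and each key bucket's doc_count also counts nested objects. It therefore equals the number of URLs containing the key only as long as a key appears at most once per URL. The plain-Python sketch below reproduces the numbers from the response using the two bulk-indexed documents; it only mimics the counting and does not call Elasticsearch.

```python
from collections import Counter

# The two documents indexed with _bulk above, in nested form.
docs = [
    {"script": "/index.html", "query": [
        {"query_key": "ab", "query_val": "hello"},
        {"query_key": "cd", "query_val": "world"},
        {"query_key": "ef", "query_val": "123"}]},
    {"script": "/index.html", "query": [
        {"query_key": "ab", "query_val": "foo"},
        {"query_key": "cd", "query_val": "bar"},
        {"query_key": "gh", "query_val": "456"}]},
]

# The nested aggregation runs over every nested "query" object ...
nested = [q for doc in docs for q in doc["query"]]
print(len(nested))  # 6, matching "doc_count": 6 under "resellers"

# ... and the terms aggregation buckets those objects by query_key.
key_counts = dict(Counter(q["query_key"] for q in nested))
print(key_counts)  # {'ab': 2, 'cd': 2, 'ef': 1, 'gh': 1}
```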

Here's the code I used:

http://sense.qbox.io/gist/aecd92e5903f644e28c802860a90a86bdd7f97ee


1 Comment

That did it, thanks a million! For my question, your request was still missing the outer grouping by the script itself, so I changed it to:

{
   "aggs": {
      "group_by_script": {
         "terms": {
            "field": "script"
         },
         "aggs": {
            "query_count": {
               "nested": {
                  "path": "query"
               },
               "aggs": {
                  "query_keys": {
                     "terms": {
                        "field": "query.query_key"
                     }
                  }
               }
            }
         }
      }
   }
}

Now it works perfectly :)
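To get exactly the "query_key_count" shape sketched in the question, the response of the group_by_script request can be reshaped client-side. A minimal sketch, assuming the aggregation names group_by_script, query_count and query_keys from that request; `to_query_key_count` is hypothetical, and `sample` is a hand-written response trimmed to the fields the function reads, matching the two test documents.

```python
def to_query_key_count(response):
    """Reshape a group_by_script response into the bucket format asked for."""
    result = []
    for bucket in response["aggregations"]["group_by_script"]["buckets"]:
        key_buckets = bucket["query_count"]["query_keys"]["buckets"]
        result.append({
            "key": bucket["key"],
            "doc_count": bucket["doc_count"],
            # nested-object counts per query key, as returned by query_keys
            "query_key_count": {b["key"]: b["doc_count"] for b in key_buckets},
        })
    return result


sample = {"aggregations": {"group_by_script": {"buckets": [
    {"key": "/index.html", "doc_count": 2,
     "query_count": {"doc_count": 6, "query_keys": {"buckets": [
         {"key": "ab", "doc_count": 2}, {"key": "cd", "doc_count": 2},
         {"key": "ef", "doc_count": 1}, {"key": "gh", "doc_count": 1}]}}}]}}}
print(to_query_key_count(sample))
```

Both terms aggregations return only the top 10 buckets by default, so their "size" needs raising if there are more scripts or keys than that.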
