1

I'm trying to insert a comma separated string (GROUP_CONCAT) into elasticsearch as an array datatype. As input I use JDBC and the output of the SQL query is like:

+---------+-----------+------------+--------------------------+-------------+---------------------+---------+------------+----------+---------------------+-------------+---------+----------------------------------------+
| network | post_dbid | host_dbid  | host_netid               | post_netid  | published           | n_likes | n_comments | language | indexed             | n_harvested | country | vrt                                    |
+---------+-----------+------------+--------------------------+-------------+---------------------+---------+------------+----------+---------------------+-------------+---------+----------------------------------------+
| xxx     | 2_xxx     | 60480_xxx  | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2017-12-28 08:11:58 | 5       | 0          | en       | 2018-05-30 00:00:00 | 0           | ID      | Fitness,Well-being                     |
| xxx     | 5_xxx     | 98458_xxx  | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2016-09-01 11:59:14 | 2275    | 242        | ar       | 2018-05-30 00:00:00 | 0           | SA      | SmartPhones_Gadgets                    |
| xxx     | 15_xxx    | 50884_xxx  | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2018-04-23 16:36:10 | 0       | 0          | en       | 2018-05-30 00:00:00 | 0           | EG      | Fashion_Beauty                         |
| xxx     | 21_xxx    | 64118_xxx  | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2015-07-01 22:50:54 | 295     | 8          | pt       | 2018-05-30 00:00:00 | 0           | BR      | Nutrition                              |
| xxx     | 24_xxx    | 9767_xxx   | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2017-05-30 02:35:29 | 10      | 1          | en       | 2018-06-18 15:32:57 | 0           | US      | Health                                 |
| xxx     | 87_xxx    | 44473_xxx  | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2017-01-08 23:02:52 | 7       | 0          | en       | 2018-05-30 00:00:00 | 0           | US      | Beverages                              |
| xxx     | 99_xxx    | 120198_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2018-02-17 02:57:58 | 8       | 0          | en       | 2018-05-30 00:00:00 | 0           | US      | Food                                   |
| xxx     | 126_xxx   | 50258_xxx  | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2018-03-22 09:16:25 | 1       | 0          | en       | 2018-05-30 00:00:00 | 0           | IN      | Health                                 |
+---------+-----------+------------+--------------------------+-------------+---------------------+---------+------------+----------+---------------------+-------------+---------+----------------------------------------+

I used split from mutate plugin:

filter {
  mutate {
     split => { "vrt" => "," }
  }
}

although, field was inserted as a comma separated string:

GET xxx/_search
{
  "query": {
    "terms": {
      "_id": ["2_xxx"] 
    }
  }
}

responce:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "xxx",
        "_type": "doc",
        "_id": "2_xxx",
        "_score": 1,
        "_source": {
          "post_dbid": "2_xxx",
          "host_dbid": "60480_xxx",
          "host_netid": "xxxxxxxxxxxxxxxxxxxxxxxx",
          "n_likes": 5,
          "n_comments": 0,
          "country": "ID",
          "network": "xxx",
          "indexed": "2018-05-30T00:00:00.000Z",
          "n_harvested": 0,
          "vrt": "Fitness,Well-being",
          "@version": "1",
          "post_netid": "xxxxxxxxxxx",
          "@timestamp": "2018-06-27T15:47:24.370Z",
          "language": "en",
          "published": "2017-12-28T08:11:58.000Z"
        }
      }
    ]
  }
}

My final goal is to insert vrt as array field and using kibana, to create visualizations. For example I want to create a counter on kibana and count how many document has the "Fitness" on vrt field.

ELK Version: 6.2.4

1 Answer 1

3

You could use ruby filter. Here's how I do it. I created a ruby method that splits comma separated string, trims, rejects empty elements and removes duplicates. You can then use the method for all comma separated string as below:

 filter {
 ruby{
     code =>"

         # method to split the supplied string by comma, trim whitespace and return an array
         def mapStringToArray(strFieldValue)

            #if string is not null, return array
            if (strFieldValue != nil)
                fieldArr =  strFieldValue.split(',').map(&:strip).reject(&:empty?).uniq 
                return fieldArr                             
            end     

            return [] #return empty array if string is nil
         end

         vrtArr = mapStringToArray(event.get('vrt'))
         if vrtArr.length > 0                           
            event.set('vrt', vrtArr) 
         end
"
}
}
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.