I have loaded a JSON file into the MongoDB collection "clicklogs" using a shell command. Below is the result of querying it from the shell:

db.clicklogs.find().pretty()

Output:

    {
            "_id" : ObjectId("58fe78dcfbe21fa7896552e8"),
            "preview" : false,
            "offset" : 0,
            "result" : {
                    "search_term" : "484797",
                    "request_time" : "Sat Apr 01 23:58:49 -0400 2017",
                    "request_ip" : "127.0.0.1",
                    "stats_type" : "clickstats",
                    "upi" : "66024330304",
                    "unit" : "CITCS",
                    "job_title" : "IT Engineer",
                    "vpu" : "ICR",
                    "organization" : "73",
                    "location" : "MH",
                    "city" : "San Diego",
                    "country" : "USA",
                    "title" : "TOM",
                    "tab_name" : "People-Tab",
                    "page_name" : "PEOPLE",
                    "result_number" : "1",
                    "page_num" : "0",
                    "session_id" : "14e88b44576ad4fdc035bc41529762ad1",
                    "total_results" : "1",
                    "_raw":"request_time=Sat Apr 01 23:58:49 -0400 2017,request_ip=127.0.0.1,application=Search,stats_type=clickstats,upi=660243301304,unit=CITCS,job_title=IT Assistant, Client Services,vpu=ICR,location=DHAKA, BANGLADESH (IFC),organization=73,city=Dhaka,country=BANGLADESH,city_code=,search_term=484797,title=   Tom,url=http://isearch.worldbank.org/skillfinder/ppl_profile_new/000484797,tab_name=People-Tab,page_name=PEOPLE,result_number=1,page_num=0,filter=qterm=484797,total_results=1,app_environment=production,log_version=1.0,session_id=4e88b44576ad4fdc035bc41529762ad1",
                    "_time":"2017-04-01T23:58:49.000-0400"

            }
    }
{"_id" : ObjectId("58fe78dcfbe21fa7896552e9"),
        "preview" : false,
        "offset" : 0,
         "result" : {
                "search_term" : "demo",
                "request_time" : "Sat Apr 01 23:58:49 -0400 2017",
                "request_ip" : "127.0.0.1",
                 ....
                 "time":"2017-04-01T23:58:49.000-0400"
}
}

For every JSON document, I would like to keep only a few fields (_id, search_term, upi, page_name, session_id, and url, which is embedded in _raw). Is it possible to do this with mongo shell commands and store the resulting documents in a new collection? Any help is appreciated.

  • What is your MongoDB version? Commented Apr 25, 2017 at 0:08
  • I am on the latest version, 3.4.4. Commented Apr 25, 2017 at 0:34

1 Answer

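You can try the aggregation below on version 3.4.

The query uses the $split operator twice to reach the url value: once on ',' to separate the key=value pairs inside _raw, and once on '=' to separate the key from its value. The rest is standard field projection.

The $out stage writes the results to a new collection.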

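db.getCollection('clicklogs').aggregate([{
        // _id is included in $project output by default.
        $project: {
            // The source field is named search_term in the documents.
            searchterm: "$result.search_term",
            upi: "$result.upi",
            page_name: "$result.page_name",
            session_id: "$result.session_id",
            url: {
                $let: {
                    vars: {
                        // Pick one key=value pair from _raw by its position.
                        // Index 1 is the second pair; set this to the position
                        // of the "url=..." pair in your data.
                        obj: {
                            $arrayElemAt: [{
                                $split: ["$result._raw", ',']
                            }, 1]
                        }
                    },
                    // Split the pair on '=' and keep the value (index 1).
                    in: {
                        $arrayElemAt: [{
                            $split: ["$$obj", '=']
                        }, 1]
                    }
                }
            }
        }
    },
    {
        // Write the projected documents to a new collection.
        $out: "clicklogs_temp"
    }
])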
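
To see what the two $split / $arrayElemAt steps do, here is a minimal plain-JavaScript sketch that mirrors them outside MongoDB. It is only an illustration: `valueAt` is a hypothetical helper name, and `raw` is the beginning of the "_raw" value shown in the question.

```javascript
// Plain-JavaScript mirror of the pipeline's $split / $arrayElemAt steps.
// `raw` is the start of the "_raw" value from the question (cut after tab_name).
const raw = "request_time=Sat Apr 01 23:58:49 -0400 2017,request_ip=127.0.0.1,application=Search,stats_type=clickstats,upi=660243301304,unit=CITCS,job_title=IT Assistant, Client Services,vpu=ICR,location=DHAKA, BANGLADESH (IFC),organization=73,city=Dhaka,country=BANGLADESH,city_code=,search_term=484797,title=   Tom,url=http://isearch.worldbank.org/skillfinder/ppl_profile_new/000484797,tab_name=People-Tab";

// Hypothetical helper: value of the key=value pair at a given comma position.
function valueAt(rawString, index) {
    const pair = rawString.split(",")[index]; // like $arrayElemAt over $split on ','
    return pair.split("=")[1];                // like $arrayElemAt over $split on '=', index 1
}

console.log(valueAt(raw, 1));  // 127.0.0.1 (the request_ip value)
console.log(valueAt(raw, 17)); // http://isearch.worldbank.org/skillfinder/ppl_profile_new/000484797
```

For this sample the url pair sits at index 17 only because the job_title and location values each contain a comma and are therefore split into two pieces; a document without those commas would have it at a different index.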
4 Comments

Hey Veeram, I hadn't included the whole _raw value in the question earlier; I have updated it now. The URL is now at position 18. I tried your code, changing the <idx> position from 1 to 17, but the result did not contain the url field. If I keep the position at 1, I get a url field containing the request_ip value. Can you help me with this?
It should have worked. Just to confirm: you changed the index value of the first $arrayElemAt, right? If yes, please verify the data. It works for me with the data you've provided in the post.
Yes. I changed the $arrayElemAt index to 17 twice, at obj and at in. I will check again. The collection has around 6400 JSON documents. Thanks for your reply.
You are welcome. You only have to change the $arrayElemAt index for obj to 17, which returns something like url=value; the $arrayElemAt inside in then reads the value at index 1.
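
A fixed index is fragile here: values such as job_title and location in the sample _raw contain commas themselves, so the position of the url pair may differ between documents. As a hedged alternative, the sketch below selects the pair by its "url=" prefix instead of by position. It uses only operators available in 3.4 ($filter, $split, $substrCP, $arrayElemAt, $let); the variable names `urlByKey` and `pipeline` are illustrative, and it has not been run against the asker's data.

```javascript
// Sketch: find the "url=..." pair by key rather than by position.
var urlByKey = {
    $let: {
        vars: {
            // Keep only the pieces that start with "url=" and take the first one.
            pair: {
                $arrayElemAt: [{
                    $filter: {
                        input: { $split: ["$result._raw", ","] },
                        as: "p",
                        cond: { $eq: [{ $substrCP: ["$$p", 0, 4] }, "url="] }
                    }
                }, 0]
            }
        },
        // Split the chosen pair on '=' and keep the value (index 1).
        in: { $arrayElemAt: [{ $split: ["$$pair", "="] }, 1] }
    }
};

var pipeline = [
    {
        $project: {
            searchterm: "$result.search_term",
            upi: "$result.upi",
            page_name: "$result.page_name",
            session_id: "$result.session_id",
            url: urlByKey
        }
    },
    { $out: "clicklogs_temp" }
];

// In the mongo shell: db.getCollection('clicklogs').aggregate(pipeline)
```

Note that a URL containing its own ',' or '=' (for example a query string) would still be cut short by the two splits, so this assumes URLs shaped like the one in the question.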
