How can I get the query results into a DataFrame whose columns preserve the hierarchical structure? I would like columns like this:

type|postDate|discussionTitle|courses|subjectKeywords|SentiStrength|SentiWordNet|universities|universityKeywords|

I have an Elasticsearch index with around 1,000,000 JSON documents. I want to use this dataset for natural language processing (NLP) with Python. Could someone please show me how to get data from Elasticsearch into Python and how to write data back to Elasticsearch from Python? I would much appreciate it, as I am stuck and unable to do any NLP on the dataset because I cannot get it to connect with Python.

I want to add a new field to the hierarchy, just like "UniversityInfo", called "ProcessInfo". This new field should tag the dataset based on a set of keywords I give. Just like "universityKeywords", every JSON document should store the set of keywords that its tags were based on. I want to put four tags (categories) on the JSON documents, named Applications, Offers, Enrolments and Requirements, based on keywords in the post title and post text.

This is what the mapping of the Elasticsearch index looks like:

 "educationforumsenriched2": {
          "mappings": {
             "whirlpool": {
                "properties": {
                   "CourseInfo": {
                      "properties": {
                         "courses": {
                            "type": "string",
                            "index": "not_analyzed"
                         },
                         "subjectKeywords": {
                            "type": "string",
                            "index": "not_analyzed"
                         }
                      }
                   },
                   "SentimentInfo": {
                      "properties": {
                         "SentiStrength": {
                            "type": "float"
                         },
                         "SentiWordNet": {
                            "type": "float"
                         }
                      }
                   },
                   "UniversityInfo": {
                      "properties": {
                         "universities": {
                            "type": "string",
                            "index": "not_analyzed"
                         },
                         "universityKeywords": {
                            "type": "string",
                            "index": "not_analyzed"
                         }
                      }
                   },
                   "postDate": {
                      "type": "date",
                      "format": "strict_date_optional_time||epoch_millis"
                   },
                   "postID": {
                      "type": "integer"
                   },
                   "postText": {
                      "type": "string"
                   },
                   "references": {
                      "type": "string"
                   },
                   "threadID": {
                      "type": "integer"
                   },
                   "threadTitle": {
                      "type": "string"
                   }
                }
             },
             "atarnotes": {
                "properties": {
                   "CourseInfo": {
                      "properties": {
                         "courses": {
                            "type": "string",
                            "index": "not_analyzed"
                         },
                         "subjectKeywords": {
                            "type": "string",
                            "index": "not_analyzed"
                         }
                      }
                   },
                   "SentimentInfo": {
                      "properties": {
                         "SentiStrength": {
                            "type": "float"
                         },
                         "SentiWordNet": {
                            "type": "float"
                         }
                      }
                   },
                   "UniversityInfo": {
                      "properties": {
                         "universities": {
                            "type": "string",
                            "index": "not_analyzed"
                         },
                         "universityKeywords": {
                            "type": "string",
                            "index": "not_analyzed"
                         }
                      }
                   },
                   "discussionTitle": {
                      "type": "string"
                   },
                   "postDate": {
                      "type": "date",
                      "format": "strict_date_optional_time||epoch_millis"
                   },
                   "postID": {
                      "type": "integer"
                   },
                   "postText": {
                      "type": "string"
                   },
                   "query": {
                      "properties": {
                         "match_all": {
                            "type": "object"
                         }
                      }
                   },
                   "threadID": {
                      "type": "integer"
                   },
                   "threadTitle": {
                      "type": "string"
                   }
                }
             }
          }
       }
    }

This is the code I used to create the process info tags in Java. I want to do the same in Python:

 processMap.put("Applications", new ArrayList<>(Arrays.asList("apply", "applied", "applicant", "applying", "application", "applications")));
 processMap.put("Offers", new ArrayList<>(Arrays.asList("offers", "offer", "offered", "offering")));
 processMap.put("Enrollment", new ArrayList<>(Arrays.asList("enrolling", "enroled", "enroll", "enrolment", "enrollment", "enrol", "enrolled")));
 processMap.put("Requirements", new ArrayList<>(Arrays.asList("requirement", "requirements", "require")));
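
A possible Python counterpart of the Java map above, as a minimal sketch rather than a definitive implementation. The keyword lists are copied from the Java snippet; the names `PROCESS_KEYWORDS` and `tag_process` are made up for illustration, and whole-word, case-insensitive matching is an assumption about how the tags should be assigned.

```python
import re

# Keyword lists copied from the Java processMap in the question.
PROCESS_KEYWORDS = {
    "Applications": ["apply", "applied", "applicant", "applying", "application", "applications"],
    "Offers": ["offers", "offer", "offered", "offering"],
    "Enrollment": ["enrolling", "enroled", "enroll", "enrolment", "enrollment", "enrol", "enrolled"],
    "Requirements": ["requirement", "requirements", "require"],
}


def tag_process(text):
    """Return (tags, matched_keywords) for one piece of text.

    Assumption: a keyword counts only when it appears as a whole word,
    compared case-insensitively.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags, matched = [], []
    for tag, keywords in PROCESS_KEYWORDS.items():
        found = [kw for kw in keywords if kw in words]
        if found:
            tags.append(tag)
            matched.extend(found)
    return tags, matched


tags, keywords = tag_process("I applied last week and got an offer today")
print(tags)      # ['Applications', 'Offers']
print(keywords)  # ['applied', 'offer']
```

The text passed in could be the post title and post text joined together; the returned keyword list is what would be stored next to the tags, in the same way "universityKeywords" sits next to "universities".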
  • Python Elasticsearch Client? Commented Jun 3, 2017 at 4:48
  • pyelasticsearch? I have installed the packages but can't figure out how to get this dataset into Python. A small example would be very useful. This is the mapping structure of my Elasticsearch index: Commented Jun 3, 2017 at 6:06
  • "educationforumsenriched2": { "mappings": { "whirlpool": { "properties": { "CourseInfo": {.. Commented Jun 3, 2017 at 6:10
  • What about an ingest pipeline ? Commented Jun 6, 2017 at 21:46

1 Answer

With the Elasticsearch Python client, once you have established a connection, you only need to provide the query DSL body and the indices you want to search in order to retrieve the required information. For instance, if you have a query:

GET educationforumsenriched2/_search
{
    "query": {
        "match" : {
            "CourseInfo.subjectKeywords" : "foo"
        }
    }
}

The equivalent in Python would be:

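from elasticsearch import Elasticsearch

# Connect to a local node; many other settings are available (HTTPS, authentication and so on).
es = Elasticsearch(["http://localhost:9200"])

query = {
    "query": {
        "match": {
            "CourseInfo.subjectKeywords": "foo"
        }
    }
}
res = es.search(index="educationforumsenriched2", body=query)

# The matching documents are in res["hits"]["hits"]; each hit keeps its fields under "_source".
for hit in res["hits"]["hits"]:
    doc = hit["_source"]
    # do some processing and build new_doc_after_processing (a dict)

# Index a new document in ES; an id is generated automatically when none is given.
# doc_type names one of the mapping types from the question; later Elasticsearch
# versions removed mapping types, and the argument is omitted there.
es.index(index="educationforumsenriched2", doc_type="whirlpool", body=new_doc_after_processing)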

Edit: If your processing is not too complex, you could also consider building an ingest pipeline.
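
For writing tags back onto existing documents, one option is to send partial updates in bulk. The sketch below only builds the action dictionaries, so it runs without a cluster. The field names `ProcessInfo` and `processTags`, the function `build_update_actions` and the stand-in `offer_tagger` are hypothetical, and the `_type` key assumes a cluster that still uses mapping types, as the question's mapping does.

```python
def build_update_actions(hits, tagger):
    """Yield one partial-update action per search hit.

    `hits` is the list found under res["hits"]["hits"]; `tagger` is any
    function that maps a string to a list of tags.
    """
    for hit in hits:
        source = hit.get("_source", {})
        # Tag on title plus text; missing fields are treated as empty strings.
        text = " ".join([source.get("threadTitle") or "", source.get("postText") or ""])
        yield {
            "_op_type": "update",
            "_index": hit["_index"],
            "_type": hit["_type"],  # assumption: cluster with mapping types
            "_id": hit["_id"],
            "doc": {"ProcessInfo": {"processTags": tagger(text)}},  # hypothetical field names
        }


def offer_tagger(text):
    # Hypothetical stand-in for real keyword-based tagging.
    return ["Offers"] if "offer" in text.lower() else []


# Hypothetical hit, shaped like a search result from the index in the question.
sample_hits = [{
    "_index": "educationforumsenriched2",
    "_type": "whirlpool",
    "_id": "1",
    "_source": {"threadTitle": "Got an offer", "postText": "It arrived today"},
}]
actions = list(build_update_actions(sample_hits, offer_tagger))
print(actions[0]["doc"])  # {'ProcessInfo': {'processTags': ['Offers']}}
```

With the client installed, these actions can be passed to `elasticsearch.helpers.bulk(es, actions)`. For a dataset of around 1,000,000 documents, `elasticsearch.helpers.scan(es, index=..., query=...)` iterates over every hit, whereas a plain `es.search` call returns only the first page of results.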

3 Comments

Thank you. But how can I get the results from ES into a structure like a data frame, with columns for the fields as in the question edit?
@BAstu What kind of dataframe are we talking about, pandas dataframe? Spark dataframe? Maybe this question can help: stackoverflow.com/questions/25186148/…
Yes, a pandas DataFrame. Thanks.
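
For the pandas DataFrame asked about in the comments above, one possible approach is to flatten each hit into a plain dictionary first. This is a sketch under assumptions: the helper names `flatten_source` and `hits_to_rows` are made up, the sample hit is invented to match the mapping in the question, and columns are named after the innermost key (for example `courses` rather than `CourseInfo.courses`) to match the column list in the question.

```python
def flatten_source(source):
    """Flatten nested dicts, using only the innermost key as the column name."""
    flat = {}
    for key, value in source.items():
        if isinstance(value, dict):
            flat.update(flatten_source(value))
        else:
            flat[key] = value
    return flat


def hits_to_rows(hits):
    """Turn the list under res["hits"]["hits"] into one flat dict per document."""
    rows = []
    for hit in hits:
        row = {"type": hit.get("_type")}
        row.update(flatten_source(hit.get("_source", {})))
        rows.append(row)
    return rows


# Hypothetical hit, shaped like the "atarnotes" mapping in the question.
sample_hits = [{
    "_type": "atarnotes",
    "_source": {
        "postDate": "2017-06-03",
        "discussionTitle": "Uni offers",
        "CourseInfo": {"courses": ["Science"], "subjectKeywords": ["physics"]},
        "SentimentInfo": {"SentiStrength": 1.0, "SentiWordNet": 0.25},
    },
}]
rows = hits_to_rows(sample_hits)
print(rows[0]["courses"], rows[0]["SentiWordNet"])
```

A list of flat dictionaries like `rows` can be handed to `pandas.DataFrame(rows)`, which creates one column per key. If the parent names should stay visible in the column labels, `pandas.json_normalize` applied to the `_source` dictionaries produces dotted names such as `CourseInfo.courses` instead.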
