How can I create a MongoDB collection and documents that act as configuration for Python code? I want to read each attribute's name, data type, and the function to call from MongoDB.

Sample MongoDB collection:

db.attributes.insertMany([
   { attributes_names: "email", attributes_datype: "string", attributes_isNull: "false", attributes_std_function: "email_valid" },
   { attributes_names: "address", attributes_datype: "string", attributes_isNull: "false", attributes_std_function: "address_valid" }
]);

Python script and function

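from pyspark.sql.functions import expr, lower, regexp_replace

def email_valid(df):
    # Clean and extract from the first column, whatever its name is.
    col_name = df.columns[0]
    # Lowercase, then drop every character outside the allowed set (spaces are kept).
    df1 = df.withColumn(col_name, regexp_replace(lower(col_name), r"[^a-z0-9@._\- ]", ""))
    # regexp_extract_all needs Spark 3.1+; backslashes are doubled for the SQL parser.
    extract_expr = expr(
        rf"regexp_extract_all(`{col_name}`, '(\\w+([.-]?\\w+)*@[A-Za-z.-]+([.-]?\\w+)*(\\.\\w{{2,3}})+)', 0)")
    df2 = df1.withColumn(col_name, extract_expr) \
        .select(col_name)

    return df2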

How do I read all of these values from MongoDB in the Python script and call the right function for each attribute?

1 Answer

To create a MongoDB collection from a Python script:

import pymongo
# connect to your mongodb client
client = pymongo.MongoClient(connection_url)

# connect to the database
db = client[database_name]

# get the collection
mycol = db[collection_name]

from bson import ObjectId

# create a sample dictionary for the collection data
# ObjectId() with no arguments generates a new unique id; "_id" is optional,
# because MongoDB adds one automatically when it is omitted
mydict = { "_id": ObjectId(),
           "attributes_names": "email",
           "attributes_datype": "string",
           "attributes_isNull": "false",
           "attributes_std_function": "email_valid" }

# insert the dictionary into the collection
mycol.insert_one(mydict)

To insert multiple documents, use insert_many() instead of insert_one() and pass it a list of dictionaries. The list will look like this:

mydict = [{ "_id": ObjectId(),
           "attributes_names": "email",
           "attributes_datype": "string",
           "attributes_isNull": "false",
           "attributes_std_function": "email_valid" },
          { "_id": ObjectId(),
           "attributes_names": "address",
           "attributes_datype": "string",
           "attributes_isNull": "false",
           "attributes_std_function": "address_valid" }]

# insert every dictionary in the list as its own document
mycol.insert_many(mydict)

To read all the documents from the MongoDB collection into the Python script:

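# find() returns a cursor; list() materializes it as a list of dictionaries
data = list(mycol.find())

# optional: flatten the documents into a pandas DataFrame for tabular work
# (kept under a separate name so that `data` stays a list of dictionaries)
import pandas as pd
df = pd.json_normalize(data)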

And then access the data as you access an element of a list of dictionaries:

value = data[0]["attributes_names"]
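
To call the function named in each document, one option is a small registry that maps the string stored in "attributes_std_function" to a Python callable. The sketch below is only an illustration under some assumptions: the two validators are simplified stand-ins that work on plain lists rather than Spark DataFrames, and `FUNCTIONS`, `apply_config`, `config_docs`, and `columns` are hypothetical names that do not come from the question. In the real script, `config_docs` would be the list read from `mycol.find()`.

```python
from typing import Callable, Dict, List

# Simplified stand-ins for the real validators (assumption: the real ones
# take a Spark DataFrame; these take a plain list so the sketch runs anywhere).
def email_valid(values: List[str]) -> List[str]:
    return [v.strip().lower() for v in values]

def address_valid(values: List[str]) -> List[str]:
    return [" ".join(v.split()) for v in values]

# Registry: function name as stored in MongoDB -> callable.
# An explicit dict avoids eval()/globals() lookups on strings from the database.
FUNCTIONS: Dict[str, Callable[[List[str]], List[str]]] = {
    "email_valid": email_valid,
    "address_valid": address_valid,
}

def apply_config(config_docs: List[dict], columns: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Run the configured function on every attribute present in `columns`."""
    results = {}
    for doc in config_docs:
        name = doc["attributes_names"]
        func_name = doc["attributes_std_function"]
        func = FUNCTIONS.get(func_name)
        if func is None:
            raise KeyError(f"No function registered for {func_name!r}")
        if name in columns:
            results[name] = func(columns[name])
    return results

# Stand-in for list(mycol.find()); the field names match the collection above.
config_docs = [
    {"attributes_names": "email", "attributes_datype": "string",
     "attributes_isNull": "false", "attributes_std_function": "email_valid"},
    {"attributes_names": "address", "attributes_datype": "string",
     "attributes_isNull": "false", "attributes_std_function": "address_valid"},
]

# Hypothetical input data keyed by attribute name.
columns = {"email": ["  Foo@Example.COM "], "address": ["12   Main  St"]}
cleaned = apply_config(config_docs, columns)
```

Adding a new attribute then means inserting one more document and registering one more function in `FUNCTIONS`; the loop itself does not change.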
5 Comments

How do I do this for multiple attributes?
Create a list of all your documents and use insert_many() instead of insert_one(). I have updated the answer as well.
I am not able to write the _id part directly in MongoDB.
Yup, MongoDB generates its own _id when a document is inserted without one, and pymongo does the same, so you can remove it from the Python code as well.
Yes, you are right. I have removed it and it is working fine.
