I have two servers: one runs Neo4j and stores the graph data, and the other runs an ETL job that loads data into Neo4j every minute.

My current solution uses a for loop that runs one transaction per incoming item (with py2neo), but the performance is very slow. I have also tried saving a temporary CSV file on the Neo4j server and then using LOAD CSV in Cypher. That improves performance a lot, but I don't know how to load a CSV file from a remote server.

So what I want to know is: is there a way to load a dict, list, or pandas DataFrame into Neo4j from a Python script as a batch import, the way LOAD CSV does? I am new to Neo4j; thanks very much for your help.

Answer

If you want to load a CSV file from a remote server, you need to run an HTTP server that hosts the file, for example Python's SimpleHTTPServer (http.server in Python 3) or something similar. Then you can simply use:

LOAD CSV FROM "http://192.x.x.x/myfile.csv" as row
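A minimal sketch of the HTTP side, assuming Python 3.7+ on the ETL server: the standard-library http.server module serves a directory, and Neo4j then fetches the file by URL. The helper name start_csv_server, the served directory, and port 8000 are illustrative assumptions, and the Neo4j server must be able to reach the ETL host on that port.

```python
import functools
import http.server
import threading


def start_csv_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve every file in `directory` over HTTP from a background thread.

    Hypothetical helper: start it once on the ETL server, write each batch as a
    CSV file into `directory`, and point LOAD CSV at that file's URL.
    """
    # SimpleHTTPRequestHandler accepts a `directory` argument since Python 3.7
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    # Bind on all interfaces so the Neo4j server can connect; port 0 picks a free port
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), handler)
    # daemon=True so the serving thread does not keep the ETL process alive on exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For a quick manual test, running `python3 -m http.server 8000` inside the directory does the same from the shell. With a file such as batch.csv (with a header row) in the served directory, the Cypher side would look like `LOAD CSV WITH HEADERS FROM "http://<etl-host>:8000/batch.csv" AS row`, where `<etl-host>` is a placeholder for the ETL server's address.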

Alternatively, you can import your data from a pandas DataFrame. I have created a simple script that calculates a linear regression gradient of the weekly counts per company and saves it back to Neo4j:

from neo4j import GraphDatabase  # older drivers used "from neo4j.v1 import GraphDatabase"
import pandas as pd
import numpy as np

driver = GraphDatabase.driver("bolt://192.168.x.x:7687", auth=("neo4j", "neo4j"))
session = driver.session()

def weekly_count_gradient(data):
    # data: a query result whose records contain at least "start" (a date) and "company"
    df = pd.DataFrame([r.values() for r in data], columns=data.keys())
    df["week"] = df.start.apply(lambda x: pd.to_datetime(x).week if pd.notnull(x) else None)
    df["year"] = df.start.apply(lambda x: pd.to_datetime(x).year if pd.notnull(x) else None)
    # number of "start" values per company per week
    group = df.groupby(["week", "year", "company"]).start.count().reset_index()
    for name, rows in group.groupby("company"):
        if rows.shape[0] >= 5:  # only fit companies with at least 5 weekly data points
            # assumes the data spans 2016-2017: 2017 weeks are shifted by 52 so x is continuous
            x = np.array([week if year == 2016 else week + 52 for year, week in rows[["year", "week"]].values])
            y = rows["start"].values
            fit = np.polyfit(x, y, deg=1)  # fit[0] is the slope (gradient)
            # parameterised query: $code and $gradient are filled from the dict
            # (the old {param} syntax was removed in Neo4j 4.0)
            session.run("MATCH (a:Company {code: $code}) "
                        "SET a.weekly_count_gradient = toFloat($gradient) RETURN a.code",
                        {"code": name, "gradient": fit[0]})

The key here is that you run a query with parameters, and the parameters can come from anywhere (a list, a dict, or a pandas DataFrame).
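To get a batch import similar to LOAD CSV, one common approach (swapped in here, not taken from the script above) is to pass a whole list of dicts as a single parameter and expand it server-side with Cypher's UNWIND, so each transaction writes many rows instead of one. This is a minimal sketch assuming the official neo4j Python driver and a DataFrame with a code column matching the Company nodes above; the helper names, the query, and the batch size of 1000 are hypothetical.

```python
from typing import Any, Dict, Iterator, List

import numpy as np
import pandas as pd

# Hypothetical query: one MERGE per row, but many rows per transaction.
# Note that SET c += row removes any property whose value in row is null.
UPSERT_COMPANIES = (
    "UNWIND $rows AS row "
    "MERGE (c:Company {code: row.code}) "
    "SET c += row"
)


def dataframe_to_rows(df: pd.DataFrame) -> List[Dict[str, Any]]:
    """Convert a DataFrame of scalar columns into plain dicts for a query parameter."""
    rows = []
    for record in df.to_dict("records"):
        clean = {}
        for key, value in record.items():
            if pd.isna(value):
                clean[key] = None  # NaN/NaT/None become Cypher null
            elif isinstance(value, np.generic):
                clean[key] = value.item()  # numpy scalar -> native Python value
            else:
                clean[key] = value
        rows.append(clean)
    return rows


def chunks(rows: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def load_dataframe(driver, df: pd.DataFrame, batch_size: int = 1000) -> None:
    """Write `df` to Neo4j with one auto-commit transaction per batch.

    `driver` is expected to be a GraphDatabase.driver(...) instance.
    """
    rows = dataframe_to_rows(df)
    with driver.session() as session:
        for batch in chunks(rows, batch_size):
            # the whole batch is sent as the single parameter $rows
            session.run(UPSERT_COMPANIES, rows=batch).consume()
```

Usage would be along the lines of `load_dataframe(driver, df)` with the driver created as in the script above. Each call sends `batch_size` rows per transaction, which avoids opening one transaction per item as the loop in the question does.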
