
I need to write indexing jobs that run once per day, query our Oracle database tables, and index the results into Elasticsearch. Some tables must be indexed before others because of table dependencies. Around that indexing process, I also need to enhance the fields going into the ES index, log job statuses to an Oracle table, and perhaps even record which rows succeeded or failed the indexing process.

Can I use the Elasticsearch JDBC River plugin for this?

2 Answers


My concern was logging back to the RDBMS via an insert statement after the extraction query. I got in touch with the creator of jdbc-river, who was really helpful. He suggested configuring things like this:

curl -XDELETE '0:9200/_river/my_jdbc_river/'


curl -XPUT '0:9200/_river/my_jdbc_river/_meta' -d '
    {
        "type": "jdbc",
        "jdbc": {
            "url": "jdbc:mysql://localhost:3306/test",
            "user": "",
            "password": "",
            "schedule": "0 0-59 0-23 ? * *",
            "sql": [
                {
                    "statement": "select *, created as _id, \"myjdbc\" as _index, \"mytype\" as _type from orders"
                },
                {
                    "statement": "insert into ack(n,t,c) values(?,?,?)",
                    "parameter": [
                        "$job",
                        "$now",
                        "$count"
                    ]
                }
            ]
        }
    }'

1 Comment

"schedule": "0 0-59 0-23 ? * *" means run every minute. Change it to "schedule": "0 0 0 ? * *" to run once a day at midnight.

Yes, you can do this by using the poll parameter in the JDBC river. In detail:

Polling

JDBC river runs are repeated at a given interval. This method is also known as polling. You can specify the polling interval with the poll parameter, which takes an Elasticsearch time value. The default value is 1h.

Example:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders",
        "poll" : "1h" 
    },
    "index" : {
        "index" : "jdbc",
        "type" : "jdbc",
        "bulk_size" : 100,
        "max_bulk_requests" : 30,
        "bulk_timeout" : "60s"
    }
}'

For your reference: https://github.com/jprante/elasticsearch-river-jdbc/issues/92

2 Comments

But what I really care about here is the insertion: logging back into the RDBMS which records got indexed into Elasticsearch and which failed. How can I build that logic? Thanks
@TazMan I don't think the JDBC river supports that. You're better off doing all the indexing/logging yourself if what you need is anything other than simple bulk indexing from the DB to Elasticsearch.
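If you do go the custom-job route suggested in the comment above, the core logic is small: fetch rows, try to index each one, then write per-record outcomes back to a status table. A minimal sketch in Python, with the Oracle and Elasticsearch calls abstracted behind callables (all names here are hypothetical; in practice fetch might wrap a python-oracledb query, index_one an Elasticsearch client index() call, and log_status an INSERT into your job-status table):

```python
def index_records(records, index_one):
    """Try to index each record; return (succeeded, failed) lists.

    index_one is any callable that raises an exception on failure,
    e.g. a thin wrapper around an Elasticsearch client's index() call
    (hypothetical here).
    """
    succeeded, failed = [], []
    for rec in records:
        try:
            index_one(rec)
            succeeded.append(rec)
        except Exception as exc:
            failed.append((rec, str(exc)))
    return succeeded, failed


def run_job(fetch, index_one, log_status):
    """One daily job: fetch rows, index them, log outcomes.

    fetch      -- returns the rows to index (e.g. a SELECT against the
                  source Oracle table; hypothetical)
    index_one  -- indexes a single row into Elasticsearch
    log_status -- records (record, status, error) back in the RDBMS,
                  e.g. cursor.execute("INSERT INTO job_log ...")
    Returns (succeeded_count, failed_count).
    """
    records = fetch()
    succeeded, failed = index_records(records, index_one)
    for rec in succeeded:
        log_status(rec, "OK", None)
    for rec, err in failed:
        log_status(rec, "FAILED", err)
    return len(succeeded), len(failed)
```

Table dependencies then reduce to calling run_job once per table, in dependency order, and only proceeding to a dependent table if its parents indexed cleanly.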
