
I need to write indexing jobs that run once per day, query our Oracle database tables, and index the results into Elasticsearch. Some tables must be indexed before others because of table dependencies. Around that indexing process, I also need to enhance the fields going into the ES index, log job statuses to an Oracle table, and perhaps even record which rows succeeded or failed the indexing process.

Can I use the Elasticsearch JDBC River plugin for this?

2 Answers


My concern was logging back to the RDBMS via an insert statement after the extraction query. I got in touch with the creator of jdbc-river, who was really helpful. He suggested configuring things like this:

curl -XDELETE '0:9200/_river/my_jdbc_river/'


curl -XPUT '0:9200/_river/my_jdbc_river/_meta' -d '
    {
        "type": "jdbc",
        "jdbc": {
            "url": "jdbc:mysql://localhost:3306/test",
            "user": "",
            "password": "",
            "schedule": "0 0-59 0-23 ? * *",
            "sql": [
                {
                    "statement": "select *, created as _id, \"myjdbc\" as _index, \"mytype\" as _type from orders"
                },
                {
                    "statement": "insert into ack(n,t,c) values(?,?,?)",
                    "parameter": [
                        "$job",
                        "$now",
                        "$count"
                    ]
                }
            ]
        }
    }'

1 Comment

"schedule": "0 0-59 0-23 ? * *" means run every minute. Change it to "schedule": "0 0 0 ? * *" to run once a day at midnight.

Yes, you can do this by using the poll parameter in the JDBC river. In detail:

Polling

JDBC river runs are repeated at a given interval. This method is also known as polling. You can specify the polling interval with the poll parameter, which takes an Elasticsearch time value. The default value is 1h.

Example:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders",
        "poll" : "1h" 
    },
    "index" : {
        "index" : "jdbc",
        "type" : "jdbc",
        "bulk_size" : 100,
        "max_bulk_requests" : 30,
        "bulk_timeout" : "60s"
    }
}'

For your reference: https://github.com/jprante/elasticsearch-river-jdbc/issues/92

2 Comments

But what I really care about here is the insertion: logging back into the RDBMS which records got indexed into Elasticsearch and which failed. How can I build that logic? Thanks
@TazMan I don't think the JDBC river supports that. You're better off doing all the indexing/logging yourself if what you need is anything other than simple bulk indexing from the DB to Elasticsearch.
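If you do go the custom-job route suggested in the comment above, the core logic is small: fetch rows, try to index each one, then write per-record outcomes back to a status table. A minimal sketch in Python, with the Oracle and Elasticsearch calls abstracted behind callables (all names here are hypothetical; in practice fetch might wrap a python-oracledb query, index_one an Elasticsearch client index() call, and log_status an INSERT into your job-status table):

```python
def index_records(records, index_one):
    """Try to index each record; return (succeeded, failed) lists.

    index_one is any callable that raises an exception on failure,
    e.g. a thin wrapper around an Elasticsearch client's index() call
    (hypothetical here).
    """
    succeeded, failed = [], []
    for rec in records:
        try:
            index_one(rec)
            succeeded.append(rec)
        except Exception as exc:
            failed.append((rec, str(exc)))
    return succeeded, failed


def run_job(fetch, index_one, log_status):
    """One daily job: fetch rows, index them, log outcomes.

    fetch      -- returns the rows to index (e.g. a SELECT against the
                  source Oracle table; hypothetical)
    index_one  -- indexes a single row into Elasticsearch
    log_status -- records (record, status, error) back in the RDBMS,
                  e.g. cursor.execute("INSERT INTO job_log ...")
    Returns (succeeded_count, failed_count).
    """
    records = fetch()
    succeeded, failed = index_records(records, index_one)
    for rec in succeeded:
        log_status(rec, "OK", None)
    for rec, err in failed:
        log_status(rec, "FAILED", err)
    return len(succeeded), len(failed)
```

Table dependencies then reduce to calling run_job once per table, in dependency order, and only proceeding to a dependent table if its parents indexed cleanly.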
