I am trying to figure out a way to keep my MySQL database and my Elasticsearch index in sync. I have set up a JDBC river using the jprante / elasticsearch-river-jdbc plugin for Elasticsearch. When I execute the request below:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://localhost:3306/MY-DATABASE",
    "user" : "root",
    "password" : "password",
    "sql" : "select * from users",
    "poll" : "1m"
},
"index" : {
    "index" : "test_index",
    "type" : "user"
}
}'

the river starts indexing data, but for some records I get org.elasticsearch.index.mapper.MapperParsingException. There is a discussion related to this issue here, but I want to know how to work around it.

Is it possible to permanently fix this by creating an explicit mapping for all 'fields' of the 'type' that I am trying to index or is there a better way to solve this issue?

My other question: when the JDBC river polls the database again, it seems to re-index the entire data set (as selected by the SQL query) into Elasticsearch. I am not sure, but is this done so that Elasticsearch picks up changes to existing rows as well as fresh rows? If the existing rows in the table never change, is it possible to index only the fresh data?

2 Answers

5

Have you looked at the default mapping? http://www.elasticsearch.org/guide/reference/mapping/dynamic-mapping.html

I think it can help you here.
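As an illustration of the explicit-mapping route raised in the question, here is a minimal sketch that only builds the JSON body for creating the index with a fixed mapping. The column names (`id`, `name`, `created_at`) and their types are assumptions about the `users` table, not something taken from the question.

```python
import json

# Hypothetical columns of the `users` table; adjust names and types to the real schema.
user_mapping = {
    "mappings": {
        "user": {
            "properties": {
                "id": {"type": "long"},
                "name": {"type": "string"},
                "created_at": {"type": "date"},
            }
        }
    }
}


def mapping_body(mapping):
    """Serialize the mapping as the JSON body of an index-creation request."""
    return json.dumps(mapping, indent=2)


print(mapping_body(user_mapping))
```

The printed JSON would be sent as the body of a `PUT` to `localhost:9200/test_index` before the river is registered, so that every field already has a fixed type when the first rows arrive.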

If your table has an insertion date column, you can use it to filter the rows that need to be indexed. See https://github.com/jprante/elasticsearch-river-jdbc#time-based-selecting
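The idea behind time-based selecting can be sketched independently of the river plugin: remember the newest insertion timestamp already indexed and select only rows after it. The sketch below uses an in-memory SQLite table as a stand-in for MySQL; the `created_at` column and the `fetch_new_rows` helper are hypothetical, and this is not the plugin's own syntax.

```python
import sqlite3


def fetch_new_rows(conn, last_seen):
    """Return rows inserted after `last_seen` and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, created_at FROM users "
        "WHERE created_at > ? ORDER BY created_at",
        (last_seen,),
    ).fetchall()
    # Keep the old mark when nothing new arrived.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# In-memory stand-in for the MySQL `users` table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "alice", "2013-01-01 10:00:00"), (2, "bob", "2013-01-01 10:05:00")],
)

# First poll picks up everything; the second poll only sees the new row.
first_batch, mark = fetch_new_rows(conn, "1970-01-01 00:00:00")
conn.execute("INSERT INTO users VALUES (3, 'carol', '2013-01-01 10:10:00')")
second_batch, mark = fetch_new_rows(conn, mark)
print(len(first_batch), len(second_batch))  # prints: 2 1
```

This only works for append-only data. Rows that are updated in place would need a last-modified column instead of an insertion date.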

HTH

David

0

Elasticsearch has dropped the river concept altogether. It is not a recommended path, because it usually doesn't make sense to keep the same normalized SQL table structure in a document store like Elasticsearch.

Say you have Product as an entity with some attributes, and Reviews of a Product stored through a parent-child table, since one product can have multiple reviews.

Products(Id, name, status,... etc)
Product_reviews(product_id, review_id)
Reviews(id, note, rating,... etc)

In a document store you may want to create a single index, named say product, that includes Product{attribute1, attribute2,... Product reviews[review1, review2,...]}
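A minimal sketch of that denormalization, assuming the three tables above have been loaded as lists of dictionaries. The function name `build_product_document` and the sample rows are made up for illustration.

```python
def build_product_document(product, product_reviews, reviews):
    """Fold a product row and its linked review rows into one nested document."""
    reviews_by_id = {r["id"]: r for r in reviews}
    linked = [
        reviews_by_id[link["review_id"]]
        for link in product_reviews
        if link["product_id"] == product["id"] and link["review_id"] in reviews_by_id
    ]
    doc = dict(product)  # copy so the source row is not mutated
    doc["reviews"] = [{"note": r["note"], "rating": r["rating"]} for r in linked]
    return doc


# Hypothetical sample rows for the three tables.
product = {"id": 1, "name": "Kettle", "status": "active"}
links = [
    {"product_id": 1, "review_id": 10},
    {"product_id": 1, "review_id": 11},
    {"product_id": 2, "review_id": 12},
]
reviews = [
    {"id": 10, "note": "Boils fast", "rating": 5},
    {"id": 11, "note": "Lid is loose", "rating": 3},
    {"id": 12, "note": "Other product", "rating": 4},
]

document = build_product_document(product, links, reviews)
print(document)
```

The resulting dictionary is what would be indexed as a single document, so a search on a product never needs a join.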

Here is an approach to syncing in such a setup.

Assumption:

  1. SQL database (the true source of record)
  2. Elasticsearch or any other NoSQL document store

Solution:

  1. As soon as an update happens in the SQL database, publish an event to a queue (JMS, AMQP, a database queue, a file system queue, Amazon SQS, etc.) carrying either the full Product or just its primary object ID (I would recommend just the ID).
  2. The queue consumer should then call the web service to get the full object if only the primary ID was pushed to the queue, or otherwise take the object itself, and send the respective changes to Elasticsearch or the NoSQL database.
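The two steps can be sketched as follows, with loud simplifications: Python's in-process `queue.Queue` stands in for JMS/AMQP/SQS, a dictionary stands in for the SQL database, and another dictionary stands in for the search index. All function names are hypothetical.

```python
import queue

# Stand-ins: a dict for the SQL source of record, a dict for the search index.
sql_products = {42: {"id": 42, "name": "Kettle", "status": "active"}}
search_index = {}


def publish_change(events, product_id):
    """Step 1: after a write to SQL, publish only the primary ID."""
    events.put(product_id)


def load_product(product_id):
    """Stand-in for the web service call that returns the full object."""
    return sql_products.get(product_id)


def consume(events):
    """Step 2: drain the queue, fetch each full object and push it to the index."""
    while True:
        try:
            product_id = events.get_nowait()
        except queue.Empty:
            break
        product = load_product(product_id)
        if product is None:
            # Row was deleted in SQL, so drop it from the index as well.
            search_index.pop(product_id, None)
        else:
            search_index[product_id] = dict(product)


events = queue.Queue()
sql_products[42]["status"] = "discontinued"  # an update in the source of record
publish_change(events, 42)
consume(events)
print(search_index[42]["status"])  # prints: discontinued
```

Publishing only the ID means the consumer always reads the latest state from the source of record, so stale or out-of-order events cannot overwrite newer data in the index.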
