How to read from multiple Elasticsearch indices in Spark?

Question

I need to read data from multiple Elasticsearch indices, all of which share the same data structure.

For example:

val df1 = spark.read.format("org.elasticsearch.spark.sql")
              .option("query", myquery)
              .option("pushdown", "true")
              .load("news_01/myitem")

val df2 = spark.read.format("org.elasticsearch.spark.sql")
              .option("query", myquery)
              .option("pushdown", "true")
              .load("news_02/myitem")

What if I am given an array of index names, such as ["news_01", "news_02"]?

How can I avoid creating a separate DataFrame (df1, df2, ...) for each index, as I do now?

You mean, you want to merge the data from two indices?

– sramalingam24, Commented Apr 24, 2018 at 13:24

falomir · Accepted Answer · 2018-04-25 05:01:44Z

Elasticsearch lets a single search request target multiple indices at once, so you can pass the index names as a comma-separated list:

val df = spark.read.format("org.elasticsearch.spark.sql")
              .option("query", myquery)
              .option("pushdown", "true")
              .load("news_01,news_02")
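
If the index names arrive as a collection rather than a literal string, the comma-separated resource can be built with mkString. The following is a minimal sketch, not part of the elasticsearch-hadoop API: the EsResource object and its fromIndices method are hypothetical names introduced here for illustration, and the usage comment assumes spark and myquery are defined as in the question.

```scala
// Hypothetical helper (not part of elasticsearch-hadoop): joins index names
// into the comma-separated resource string used in the answer above.
object EsResource {
  def fromIndices(indices: Seq[String]): String = {
    // An empty list would yield an empty resource string, so fail early.
    require(indices.nonEmpty, "at least one index name is required")
    indices.map(_.trim).mkString(",")
  }
}

// Usage with the read from the answer (assumes `spark` and `myquery` exist):
//   val indexNames = Seq("news_01", "news_02")
//   val df = spark.read.format("org.elasticsearch.spark.sql")
//                 .option("query", myquery)
//                 .option("pushdown", "true")
//                 .load(EsResource.fromIndices(indexNames))
```

This keeps a single DataFrame regardless of how many indices are listed, which works here because all the indices share the same structure.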
