Remove null from SOLR Indexes

Question

I am currently indexing a few documents from an external source into SOLR. This external source has a few empty elements that are getting indexed in SOLR as well. How can I avoid indexing empty/null values in SOLR?

For example:

My CSV columns are name,city,zip. Some values are:

Jack,Houston, 89812
,Austin,98123

In the second value set I do not have a name. However, when SOLR indexes this document it adds {"Name":"","City":"Austin","Zip":"98123"}. How can I avoid having "Name" as an empty element in SOLR?

Thanks in advance

According to your comment below, this CSV file isn't submitted directly to Solr as a CSV file? Solr will by default ignore any field with an empty value in a CSV file, unless you explicitly tell it to keep it with keepEmpty.
– MatsLindh, Commented Aug 16, 2018 at 19:02
My apologies. I should have mentioned the usage of Spark to convert the CSV to JSON and then index it to SOLR. Can you please shed some light on keepEmpty? Where do I set this? Is it a SOLR property?
– Nick, Commented Aug 17, 2018 at 12:15
keepEmpty is an argument you can give the CSV update module. It defaults to false, which excludes fields that don't have a value in the CSV file; setting it to true keeps them. For the same behavior with general updates, see the answer by Alexandre.
– MatsLindh, Commented Aug 18, 2018 at 15:09

Alexandre Rafalovitch · Accepted Answer · 2018-08-17 13:12:53Z

3

If you need to do any pre-processing on submitted documents before they hit the schema, Solr has a whole UpdateRequestProcessor subsystem. The specific one you are looking for is RemoveBlankFieldUpdateProcessorFactory, possibly coupled with TrimFieldUpdateProcessorFactory.

Remember that you need to tell Solr that you want to use them, either via a chain (default or explicit) or via individual configuration (explicit), all described in the first link above.
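As a minimal sketch of the individual-configuration route, assuming your Solr version supports the Config API's add-updateprocessor command and the processor request parameter: the snippet below only builds the JSON commands and the update URL, it does not contact Solr. The base URL, the collection name mycollection, and the processor names trim_fields and remove_blanks are hypothetical placeholders, not anything from the question.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust for your installation.
SOLR_BASE = "http://localhost:8983/solr/mycollection"


def processor_commands():
    """Config API commands that register the two processors by name.

    Assumption: each command is POSTed as JSON to <SOLR_BASE>/config.
    """
    return [
        {"add-updateprocessor": {
            "name": "trim_fields",  # hypothetical name
            "class": "solr.TrimFieldUpdateProcessorFactory",
        }},
        {"add-updateprocessor": {
            "name": "remove_blanks",  # hypothetical name
            "class": "solr.RemoveBlankFieldUpdateProcessorFactory",
        }},
    ]


def update_url(base=SOLR_BASE):
    """Update URL that asks Solr to run the named processors, in order.

    Trimming comes first so that whitespace-only values become empty
    strings before the blank-field processor sees them.
    """
    params = {"processor": "trim_fields,remove_blanks", "commit": "true"}
    return base + "/update?" + urlencode(params, safe=",")


if __name__ == "__main__":
    for command in processor_commands():
        print(json.dumps(command))
    print(update_url())
```

Declaring the same two factories in an updateRequestProcessorChain in solrconfig.xml should achieve the same effect without extra request parameters; which route fits better depends on whether every update should be cleaned or only some.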

Harald · Accepted Answer · 2018-08-16 17:53:12Z

0

You could convert your CSV to JSON, leave out the empty name, and then index the JSON file(s).

Solr by itself only indexes what it gets. If it indexes an empty field, it got an empty field. I guess this is what happens with the CSV indexer: it just is not made to leave empty fields out.

With JSON you are in control.
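A minimal sketch of that idea in plain Python, assuming each CSV row has already been read into a dict: drop keys whose value is None or blank after trimming, then serialize what is left. The helper name drop_blank_fields is made up for illustration, and the sample rows mirror the ones in the question.

```python
import json


def drop_blank_fields(doc):
    """Return a copy of doc without None, empty, or whitespace-only values."""
    cleaned = {}
    for key, value in doc.items():
        if value is None:
            continue
        if isinstance(value, str):
            value = value.strip()  # also trims stray spaces such as " 89812"
            if not value:
                continue
        cleaned[key] = value
    return cleaned


rows = [
    {"Name": "Jack", "City": "Houston", "Zip": " 89812"},
    {"Name": "", "City": "Austin", "Zip": "98123"},
]

# JSON array of documents, ready to send to Solr's JSON update handler.
payload = json.dumps([drop_blank_fields(row) for row in rows])

if __name__ == "__main__":
    print(payload)
```

The same filter could be applied per record in Spark before the documents are sent to Solr.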

Nick Over a year ago

That was my last resort. I am loading a huge CSV file into a Spark DataFrame and then indexing it directly to SOLR, so I would have to code something in Spark while creating the JSON. I thought there would be a better approach within the managed-schema to drop elements that are empty/null. For example: check the length of the field; if it is greater than zero, index it, otherwise don't add it to the document.
