
I am currently indexing a few documents from an external source into SOLR. This external source has a few empty elements that are getting indexed in SOLR as well. How can I avoid indexing empty/null values in SOLR?

For example:

My CSV has the columns name,city,zip. Some sample rows:

Jack,Houston, 89812
,Austin,98123

In the second row I do not have a name. However, when SOLR indexes this document it adds {"Name":"","City":"Austin","Zip":"98123"}. How can I avoid having "Name" as an empty element in SOLR?

Thanks in advance

  • According to your comment below, this CSV file isn't submitted directly to Solr as a CSV file? By default Solr ignores any field with an empty value in a CSV file, unless you explicitly tell it to keep it with keepEmpty. Commented Aug 16, 2018 at 19:02
  • My apologies. I should have mentioned that I use Spark to convert the CSV to JSON and then index it into SOLR. Can you please shed some light on keepEmpty? Where do I set this? Is it a SOLR property? Commented Aug 17, 2018 at 12:15
  • keepEmpty is a parameter you can pass to the CSV update handler; when it is false (the default), fields that don't have a value in the CSV file are excluded. For the same behavior with general updates, see the answer by Alexandre. Commented Aug 18, 2018 at 15:09
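To make the comment above concrete: keepEmpty is a request parameter on the update URL, not a schema property. The sketch below only builds that URL; the host, port, and collection name `mycollection` are placeholders (assumptions), and nothing is actually sent to Solr.

```python
from urllib.parse import urlencode

# Placeholder Solr base URL and collection name (assumptions; adjust to your setup).
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def csv_update_url(keep_empty: bool = False, commit: bool = True) -> str:
    """Build an update URL for posting a CSV body (Content-Type: text/csv)."""
    params = {
        # keepEmpty=false (Solr's default) drops fields whose CSV value is empty.
        "keepEmpty": str(keep_empty).lower(),
        "commit": str(commit).lower(),
    }
    return f"{SOLR_BASE}/{COLLECTION}/update?{urlencode(params)}"


print(csv_update_url())
# http://localhost:8983/solr/mycollection/update?keepEmpty=false&commit=true
```

The CSV file would then be POSTed to that URL with `Content-Type: text/csv`. Note that this only matters when Solr parses the CSV itself; in a Spark-to-JSON pipeline like the one described in the comments, the CSV handler (and therefore keepEmpty) is never involved.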

2 Answers


If you need to do any pre-processing on submitted documents before they hit the schema, Solr has a whole UpdateRequestProcessor subsystem. The specific one you are looking for is RemoveBlankFieldUpdateProcessorFactory, possibly coupled with TrimFieldUpdateProcessorFactory.

Remember that you need to tell Solr that you want to use them, either via a chain (default or explicit) or via individual configuration (explicit), all described in the Update Request Processors section of the Solr Reference Guide.
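To illustrate why the two processors are suggested together, here is a small pure-Python model of their effect on one document. This is not Solr code: in Solr the processors are declared in an updateRequestProcessorChain in solrconfig.xml (or selected per request with `update.chain`), and their field-selection options are ignored here. The model assumes each field holds a list of values and that a field with no values left is dropped.

```python
from typing import Any, Dict, List

Doc = Dict[str, List[Any]]


def trim_fields(doc: Doc) -> Doc:
    """Model of TrimFieldUpdateProcessorFactory: strip surrounding whitespace from string values."""
    return {
        name: [v.strip() if isinstance(v, str) else v for v in values]
        for name, values in doc.items()
    }


def remove_blank_fields(doc: Doc) -> Doc:
    """Model of RemoveBlankFieldUpdateProcessorFactory: drop zero-length string values,
    then drop the field entirely if it has no values left."""
    cleaned: Doc = {}
    for name, values in doc.items():
        kept = [v for v in values if not (isinstance(v, str) and len(v) == 0)]
        if kept:
            cleaned[name] = kept
    return cleaned


def run_chain(doc: Doc) -> Doc:
    # Order matters: trimming first turns whitespace-only values into empty strings,
    # which remove_blank_fields can then drop.
    return remove_blank_fields(trim_fields(doc))


incoming = {"Name": ["  "], "City": ["Austin"], "Zip": ["98123"]}
print(run_chain(incoming))
# {'City': ['Austin'], 'Zip': ['98123']}
```

Running only the blank-field step on `{"Name": ["  "]}` would keep the field, since a whitespace-only string is not zero-length; that is the case the trim step covers.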


You could convert your CSV to JSON, omitting the empty name, and then index the JSON file(s).

Solr by itself only indexes what it gets: if it indexes an empty field, it received an empty field. I guess that is what happens with the CSV indexer; it just isn't made to leave empty fields out.

With JSON you are in control.
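A minimal sketch of that conversion using only the Python standard library, assuming the CSV has a header row like the one in the question. The function name `rows_to_solr_docs` is hypothetical. Each row becomes one dict, and a field is kept only if its value is non-empty after trimming whitespace.

```python
import csv
import io
import json
from typing import Dict, Iterable, List


def rows_to_solr_docs(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read CSV rows and build one JSON-ready dict per row, omitting empty fields."""
    # skipinitialspace=True turns " 89812" into "89812".
    reader = csv.DictReader(lines, skipinitialspace=True)
    docs = []
    for row in reader:
        docs.append({
            key: value.strip()
            for key, value in row.items()
            # Skip missing columns (None), extra unnamed columns, and blank values.
            if key is not None and isinstance(value, str) and value.strip()
        })
    return docs


sample = io.StringIO("name,city,zip\nJack,Houston, 89812\n,Austin,98123\n")
print(json.dumps(rows_to_solr_docs(sample)))
# [{"name": "Jack", "city": "Houston", "zip": "89812"}, {"city": "Austin", "zip": "98123"}]
```

The resulting JSON array can be posted to Solr's /update handler with `Content-Type: application/json`. The same per-row check (keep a value only if its length is greater than zero) is what a Spark job would need to apply before writing its JSON.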

1 Comment

That was my last resort. I am reading a huge CSV file into a Spark DataFrame and then indexing it directly into SOLR. I will have to code something in Spark while creating the JSON. I thought there would be a better approach within the managed-schema to drop elements that are empty/null, for example checking the length of the field: if it is greater than zero, index it; otherwise, don't add it to the document.
