On an insert into MongoDB, can I set the WriteConcern to something that will ignore only Duplicate Key errors? I want to completely ignore these errors, but still catch it if something else went wrong. I'd especially like to know if I've lost my connection to the database or if mongod itself has crashed. (I'm on a research network, so these two things aren't always the most reliable.)

The WriteConcern that seems to come closest is UNACKNOWLEDGED, but I don't quite understand what exactly it will catch and what it will ignore.

My application is written in Java, using MongoDB Java driver version 2.10.1 and MongoDB version 2.4.6.

I'm avoiding the default WriteConcern combined with simply catching the exception because I don't want the exception overhead. I expect the number of Duplicate Key errors to be high. Is this a silly concern?

  • What you want to do is catch the error but ignore it in your app. I do not believe this is possible on the MongoDB side, and I believe such a thing is premature optimisation. Commented Nov 8, 2013 at 22:39
  • Your code is causing so many duplicate key inserts that you're worried about the performance of raising exceptions in Java? Yikes. Why are so many duplicates generated? If you could, it would be best to avoid even sending those to the server by caching at least a large block of known duplicate IDs on the client. Commented Nov 8, 2013 at 23:00
  • @WiredPrairie I'm doing some long running (as in, I never turn it off) stream processing and the stream I get (out of my control) very often contains duplicate objects. The problem, however, is that the exact duplicates don't repeat, so there isn't any concept of "most common" duplicates. I might get a stream of several thousand unique items, all of which I've already seen, but then I won't see these items again for weeks or months. I'm still fairly new to both stream processing and big data, so any pointers or suggestions you can provide are more than welcome. Commented Nov 9, 2013 at 21:41
  • @WiredPrairie It also seems I may have overestimated the cost of handling exceptions in Java. Commented Nov 9, 2013 at 21:42

1 Answer

The official documentation about WriteConcern shows what options the WriteConcern setting has.

WriteConcern.UNACKNOWLEDGED ({w:0,j:0,fsync:0}) will report network errors, so you already have that covered, but because it does not wait for a reply from the server it will not report server-side errors such as duplicate keys. When the database crashes, the connection to it will be interrupted immediately (when the operating system is still running) or time out in a few seconds (when the whole OS or even the physical server crashes), so you should also notice that rather quickly.

The next level up, WriteConcern.SAFE ({w:1,j:0,fsync:0}), waits for acknowledgment by the primary of the replica set, so it will also report duplicate key errors on the primary key or any other unique index.
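With an acknowledged write, the usual pattern is to catch only the duplicate key failure and let everything else propagate. The sketch below illustrates that pattern without depending on the driver: `StoreException`, `DuplicateKeyError`, `NetworkError`, `Store`, and `InMemoryStore` are all hypothetical stand-ins invented for this example. In the 2.x Java driver the corresponding types are, as far as I know, `MongoException.DuplicateKey` and `MongoException.Network`, but check the Javadoc for your driver version before relying on that.

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateTolerantInsert {

    // Hypothetical stand-ins for the driver's exception hierarchy (assumption, not the real API).
    static class StoreException extends RuntimeException {
        StoreException(String message) { super(message); }
    }

    static class DuplicateKeyError extends StoreException {
        DuplicateKeyError(String message) { super(message); }
    }

    static class NetworkError extends StoreException {
        NetworkError(String message) { super(message); }
    }

    // Hypothetical stand-in for a collection with a unique index on the id.
    interface Store {
        void insert(String id);
    }

    // In-memory fake that behaves like an acknowledged write: it reports
    // duplicates and (simulated) connection loss by throwing.
    static class InMemoryStore implements Store {
        private final Set<String> ids = new HashSet<String>();
        boolean online = true;

        @Override
        public void insert(String id) {
            if (!online) {
                throw new NetworkError("connection lost");
            }
            if (!ids.add(id)) {
                throw new DuplicateKeyError("duplicate key: " + id);
            }
        }
    }

    // Returns true if the item was inserted, false if it was a duplicate.
    // Any other failure (for example NetworkError) is deliberately not caught.
    static boolean insertIgnoringDuplicates(Store store, String id) {
        try {
            store.insert(id);
            return true;
        } catch (DuplicateKeyError e) {
            return false; // already seen, nothing to do
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        System.out.println(insertIgnoringDuplicates(store, "item-1"));
        System.out.println(insertIgnoringDuplicates(store, "item-1"));

        store.online = false;
        try {
            insertIgnoringDuplicates(store, "item-2");
        } catch (NetworkError e) {
            System.out.println("still visible to the caller: " + e.getMessage());
        }
    }
}
```

Because the `catch` clause names only the duplicate key type, a lost connection or a crashed server still reaches the caller as an exception.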

Even if there were a setting that made the server respond after it received the data but before it interpreted it, it wouldn't be much faster. Validating the BSON syntax is usually cheap, and checking for collisions in any unique indexes isn't much more expensive, at least compared to the network round-trip time.

If you are worried about the performance of catching the exception in Java: exception handling in Java isn't expensive either.
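If you want a rough feel for that cost on your own machine, a crude measurement like the following can help. It is only a sketch: `DuplicateKeyStandIn` is a made-up exception class, not a driver type, and a simple `System.nanoTime()` loop is not a rigorous benchmark (JIT warm-up and stack depth both affect the result). The number to compare against is the network round trip of each insert.

```java
public class ExceptionCostSketch {

    // Made-up exception type used only for this measurement (assumption).
    static final class DuplicateKeyStandIn extends RuntimeException {
        DuplicateKeyStandIn(String message) { super(message); }
    }

    // Throws and catches n exceptions, returning how many were caught.
    static int countWithExceptions(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new DuplicateKeyStandIn("duplicate " + i);
            } catch (DuplicateKeyStandIn e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        int n = 100000;
        countWithExceptions(n); // warm-up pass so the JIT has a chance to compile the loop

        long start = System.nanoTime();
        int caught = countWithExceptions(n);
        long elapsedNanos = System.nanoTime() - start;

        // Average cost of one throw/catch pair in this run.
        System.out.printf("%d exceptions, about %d ns each%n", caught, elapsedNanos / n);
    }
}
```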

1 Comment

Thanks, Philipp. The WriteConcern documentation on Mongo's site was confusing me because it mentioned that network errors are not always caught, depending on your network configuration. It also looks like I'll need to dig deeper into Java exception handling. I come from a Python, not Java, background, and some research I had done into Java preferring to ask for permission rather than forgiveness left me with the impression that exception handling could get expensive.
