2

Each microservice is deployed with its own database.

What strategies do you employ when business requirements necessitate joins across data in multiple services?

Example problem: You are implementing a movie review site. You have a movie microservice that holds the movie DB. You also have a review microservice that manages reviews in its own separate DB. Reviews are linked to movies via a GUID, but because the two are implemented as separate data stores, that link is not enforced by a key constraint.

You would like to have available, accurate to the last minute, a report that counts the reviews longer than 25 words at each review level, grouped by the first letter of the movie title. You currently host 5 million reviews for 40,000 movies.

E.g. reviews with more than 25 words:

  • A [8457 "1 star"] [16615 "2 star"] [...
  • B [98445 "1 star"] [80210 "2 star"] [...
  • ...

Having chosen a microservice architecture for your project, what strategies would you now employ to implement this feature?

3 Answers

2

At this point I would ask what exactly the domain is that you are trying to model. If the domain is strictly rendering movies and the reviews for those movies, my question would be why there are two separate services, the movie service and the movie review service.

In essence, I would merge the two services into a single service and call it a movie-reviews-service, since reviews for the movies are all that's cared about. In that case, there would no longer be a problem with joins.

Personally, I think the real question is whether the movie service should exist and what role it plays. In your example, it seems extraneous to break it out into a separate service. While this may not be a satisfactory answer, the example provided is a little too simple to make a microservices architecture worthwhile, since there are too few components requiring separation of concerns to justify breaking them down into multiple services.

If the example were complex enough to warrant a microservices architecture with these two separate services, it would just be a matter of keeping redundant data in both the movie-reviews service and the movies-service in order to fully denormalize. The idea is that a service should rely on itself as much as possible rather than making multiple requests to very granular services, which leads to an antipattern: the nanoservices architecture. Hope this helps!

2 Comments

Nano / pico / yatto services, the next craze? I feel compelled to approach the problem in this way as well. Thanks.
No problem! I think "micro" services is a bit of an overloaded term. Nanoservice is actually a term for the antipattern, and it generally means the design has gone beyond the microservices architecture. At the "micro" services stage, a service should already be granular enough, as it's supposed to adhere to the single responsibility principle. There may be a misconception because "micro" seems to suggest that size is measurable, but I wouldn't say it really is. It depends on your domain and what the responsibility of that single service is.
1

If you are going to need to retrieve reviews by the first letter of the movie title, put an attribute called "movie review key", or even the movie title itself, on the review service.

I've had to learn the hard way that denormalization is a way of life in microservices. If you try to strictly normalize your services, you will end up with FAR too much chattiness. Things that change rarely (like a movie title) can definitely be copied to a separate store.
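As a rough sketch of what that denormalization could look like, the review store below keeps a copy of the movie's first letter next to each review, so the report becomes a single-table aggregate. Everything here is an assumption made for illustration: the table and column names, the use of SQLite, and the idea that the title is available when the review is created.

```python
import sqlite3


def build_review_store():
    """Create a hypothetical review table with a denormalized movie attribute."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reviews (
            review_id          INTEGER PRIMARY KEY,
            movie_guid         TEXT NOT NULL,
            movie_first_letter TEXT NOT NULL,  -- copied from the movie service
            stars              INTEGER NOT NULL,
            word_count         INTEGER NOT NULL
        )
        """
    )
    return conn


def add_review(conn, movie_guid, movie_title, stars, text):
    """Store a review together with the first letter of the movie title."""
    conn.execute(
        "INSERT INTO reviews (movie_guid, movie_first_letter, stars, word_count) "
        "VALUES (?, ?, ?, ?)",
        (movie_guid, movie_title[:1].upper(), stars, len(text.split())),
    )


def report(conn, min_words=25):
    """Count reviews longer than min_words per (first letter, star level)."""
    rows = conn.execute(
        "SELECT movie_first_letter, stars, COUNT(*) FROM reviews "
        "WHERE word_count > ? "
        "GROUP BY movie_first_letter, stars "
        "ORDER BY movie_first_letter, stars",
        (min_words,),
    )
    return rows.fetchall()
```

The trade-off is that a title change in the movie service has to be propagated to the copied column, which is tolerable only because titles change rarely.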

2 Comments

With this approach, the next release of my Review service would include a data update of 5 million records to add an attribute for the first letter of the movie. It would also require updating the "create review" implementation to fetch the title of the movie, but it allows the "create review" API to remain the same. Doable for this specific case. Do we just continue this approach for the next "group by" attribute the customer asks for: director, release year, actors appearing, etc.? How are enterprises addressing BI, pivot tables, and so on, from data in microservice repositories?
BI is a completely separate concern from microservices. By its very nature, BI has to consume content from disparate sources, and join it together (usually implemented in a data warehouse). I've never seen BI done with run-time services. It has always been implemented as an after-the-fact ETL process.
1

Having chosen a microservice architecture for your project, what strategies would you now employ to implement this feature?

Assuming your split of the application into microservices is correct, I would say that no joins are required.

In your end-user web application, you query the Review microservice for the list of review entities that satisfy the report conditions. These contain no Movie objects, only their GUIDs. You then iterate over the collection you received, collect the Movie GUIDs, and ask the Movie microservice for the objects with those GUIDs.

Then you just render the report to the user from the two object collections (keyed by ID, for example).
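A minimal sketch of that application-side join follows. The review dictionaries and the `fetch_movies_by_guid` callable are hypothetical stand-ins for whatever the two services actually return; neither comes from the question.

```python
from collections import Counter


def build_report(reviews, fetch_movies_by_guid, min_words=25):
    """Join reviews to movies in the application and aggregate the counts.

    reviews: iterable of dicts with "movie_guid", "stars" and "text" keys
             (assumed shape of the Review service response).
    fetch_movies_by_guid: callable taking a set of GUIDs and returning a
             dict of GUID -> title (assumed shape of the Movie service call).
    """
    # Step 1: keep only the reviews that satisfy the report condition.
    long_reviews = [r for r in reviews if len(r["text"].split()) > min_words]

    # Step 2: collect the distinct movie GUIDs and fetch them in one batch.
    guids = {r["movie_guid"] for r in long_reviews}
    titles = fetch_movies_by_guid(guids)

    # Step 3: combine the two collections, keyed by GUID.
    counts = Counter()
    for review in long_reviews:
        title = titles.get(review["movie_guid"])
        if not title:
            # No key constraint links the stores, so a GUID may be dangling.
            continue
        counts[(title[:1].upper(), review["stars"])] += 1
    return dict(counts)
```

With 5 million reviews, pulling every qualifying review into the application may be expensive, so in practice the filtering and counting would probably need to be pushed as far into the Review service as possible.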

Will that work for you?

1 Comment

I can see how this works if the first letter of the movie is included in the denormalized data for reviews. The next release of my review service would either create a table mapping Movie GUID to first letter (40,000 records) or update the individual reviews (5 million records). The problem arises if the report is a success and the customer goes on to request aggregations on more and more dimensions. Perhaps at that point it really is a Business Intelligence problem, and it is time to pipe the data into another system for such queries. Thanks.
