
A Transition from Monolith to Microservices Part 1: Deciding to Transition


By John Holliman, 03.01.2019

This post is part of a 3-part series on Expanse's transition to a Microservices [1] Architecture built on Java and Spring Boot. In this series, we seek to share the issues we faced with the monolithic system, why we think the Spring Boot-based services will address them, and how we are effecting this change, in the hope that it is useful to other technologists who love to design or think about systems.

This series is best for readers with some working knowledge about building and maintaining web applications, relational databases, SQL, and common interfaces between web applications and databases (e.g. ORMs).

Introduction

At Expanse, most of our serving layer (our publicly exposed API and its supporting components) used to be based on Django and Python. The tiny Django codebase that allowed us to rapidly prototype and prove out the value of our global Internet sensing product grew and evolved over the years as our business matured. In order for us to continue to scale our serving tier and extend it to accommodate future product offerings, we've started migrating the monolithic application toward a Microservices [1] Architecture built on Java and Spring Boot.

I wrote this series for engineers, architects, and technical managers who love to design or think about systems. You may find this series to be especially relevant if you need to make decisions about the architecture of systems that must serve data to consumers in a performant and highly-available manner. But even if you currently have no input over the architecture of your systems or the tools that your organization uses to build them, this post can still help you get a feel for how to evaluate the strengths and weaknesses of alternative approaches. I’m not attempting to present a perfect, one-size-fits-all solution for how enterprises should serve data to consumers. Rather, I’m just trying to show how we at Expanse solved the problem given our resources and specific requirements.

How did we get here?: Deciding to make the switch

When designing the first iteration of a system at a new startup, you're probably not thinking about (and probably not facing) any scalability concerns. The biggest concern for you and your organization should be shipping product faster than your capital constraints can force you out of business. You may already know and accept this as a common reason for the architectural problems experienced by later-stage startups. If that is the case, feel free to skim over this section and head straight to post #2 on our target architecture. However, before diving into our target architecture, I'd like to describe our current system and some of the challenges it has presented to us.

Current State


Imagine you're flying down the road in a souped-up Honda Civic. It's on fire. Your boss, sitting in the back where it's probably safest, tells you to step on it, to speed up. The car has won a ton of street races, but now it's being forced to enter the Indy 500. That was basically our system at Expanse. Our initial serving layer, at one point well designed, evolved about as gracefully as it could have in the face of business demands and the available human resources. But it needed some work to reach the next level of scale and reliability.

Roughly two and a half years ago, we built Expander, a web portal that provides access to our Global Internet Sensing data. Previously, we had served this data through human-generated PDF reports. When we started building Expander, we had developers comfortable in Python and no real concerns about customer data sizes. At the time, our only customer owned about 0.002% of the IPv4 address space; the data related to services on these IPs was quite manageable. Django and a relational, highly normalized data model were a defensible choice.

Today, we have many more customers, with some owning approximately 2% of the IPv4 space each. That's three orders of magnitude over the load imposed by our first customer. Factor in the improvements we've made in the breadth of data we collect, and the substantially increased frequency at which we collect it, and the load gets even higher. As a result, we've had to evolve our data model to keep pace with the metaphorical explosion of customer data. Our data model is now almost entirely denormalized, and the SQL that results from our API calls is generally specialized to the point that the Django ORM more often than not gets in the way.
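To make that mismatch concrete, below is a hypothetical sketch of the kind of hand-tuned query against a denormalized table that our API calls tend to require; the class, table, and column names are invented for illustration and are not taken from our actual schema. Queries like this leave little for an ORM to contribute, which is why they are easier to express directly, for example through Spring's NamedParameterJdbcTemplate.

    import java.sql.Timestamp;
    import java.time.Instant;
    import java.util.List;
    import java.util.Map;

    import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

    // Hypothetical example of a specialized query against a denormalized table.
    // Class, table, and column names are invented for illustration only.
    public class ExposureQueries {

        private final NamedParameterJdbcTemplate jdbc;

        public ExposureQueries(NamedParameterJdbcTemplate jdbc) {
            this.jdbc = jdbc;
        }

        // Counts exposed services per port and protocol for one customer
        // since a given point in time.
        public List<Map<String, Object>> exposureCountsByPort(long customerId, Instant since) {
            String sql =
                "SELECT port, protocol, COUNT(*) AS exposure_count "
                + "FROM exposed_services "          // one wide, denormalized table
                + "WHERE customer_id = :customerId "
                + "AND last_seen >= :since "
                + "GROUP BY port, protocol "
                + "ORDER BY exposure_count DESC "
                + "LIMIT 100";
            return jdbc.queryForList(sql, Map.of(
                    "customerId", customerId,
                    "since", Timestamp.from(since)));
        }
    }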

To further complicate things, the data-processing pipelines used to populate data into the relational database were written in Java and maintained their own persistence mappings to the models in the database. Subtle deviations between these mappings and the Python-Django ones caused us major headaches. If we wanted to make a change, we had to update both and review very carefully before we released.
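By "persistence mapping" here we just mean the code that binds a database table to an in-language model. A minimal, hypothetical JPA sketch of the Java-side mapping might look like the following (again, all names are invented); the Django application had to define an equivalent model for the same table, and any drift between the two, such as a renamed column or a changed nullability constraint, surfaced as subtle bugs.

    import java.time.Instant;

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Table;

    // Hypothetical JPA mapping maintained on the Java pipeline side.
    // The Django application defined an equivalent model for the same table,
    // and the two definitions had to be kept in sync by hand.
    @Entity
    @Table(name = "exposed_services")
    public class ExposedService {

        @Id
        @Column(name = "id")
        private Long id;

        @Column(name = "customer_id", nullable = false)
        private Long customerId;

        @Column(name = "ip_address", nullable = false)
        private String ipAddress;

        @Column(name = "port", nullable = false)
        private Integer port;

        @Column(name = "last_seen")
        private Instant lastSeen;

        // Getters and setters omitted for brevity.
    }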

There were other issues with the system in addition to the ones mentioned above, but the primary motivator for re-architecting was to allow us to scale development and operational tasks across our growing engineering team. Logically distinct business capabilities embedded in the monolithic application were too tightly coupled to allow us to effectively scale up the number of contributors.

When assumptions about the future prove false or requirements change, once-sound decisions look wrong: hindsight is always 20/20. Who would have guessed that assumptions made about future customer data sizes would be off by multiple orders of magnitude, as, to our surprise, they were in our case? If the system was built during the early stages of a startup's lifetime, it's extremely unlikely that it was built for much beyond demonstrating the value, or the potential value, of the product. If all goes well, it's time to start building for stability, maintainability, extensibility, and efficiently distributing development across the team you've hired to keep up with the increased demand for additional features.

Now that we have an understanding of the issues related to the original system, let’s talk about how we moved towards a solution.

Goals

As a now fast-growing startup onboarding many new customers, we needed our system to stay performant as load grew. When developing a plan to re-architect our system, we kept the following goals in mind:

Low Latency

Any architecture we proposed must, first and foremost, be able to satisfy our customers' requests within an acceptable amount of time.

Scalability

The proposed system should allow us to serve 10 to 100 times the number of customers the current system supports.

Requirements

In addition to the above goals, our system had to satisfy the following requirements:

  • Allows for work to be effectively distributed across multiple teams
    • The system should be architected in a manner that allows multiple, potentially geographically distributed teams to contribute simultaneously without a single team bottlenecking the others
  • Extensible
    • Adding new routes or modifying existing routes should be trivial (i.e. cognitive overhead for making a change should be low)
    • Additionally, launching a new product with a distinct API should be easy
  • Easy to debug
    • When a request fails, it should be straightforward to figure out why
      • For the monolithic system, this often meant figuring out how a request and its associated query parameters were translated into the SQL that fetches the results from the database; Python mixins combined with dynamically generated Django views kept this interesting
  • Failure isolation
    • A failure in a single module should not impact distinct modules.
      • In the monolithic architecture, we did not have this property; it was possible for a single bad query in one service to bring the database to its knees, impacting all other services
      • This suggests that distinct services should have their own backing stores and should communicate via strict interfaces instead of linking through the database layer
  • Minimal infrastructure spend
    • The different services should scale horizontally (preferably in an automated fashion)
  • Minimal deployment risks
    • Services representing logically distinct business capabilities should be able to be deployed separately with little to no coordination with other services
  • Presents as a single application to customers
    • Forcing external clients to deal with the complexities of multiple services (that is, managing and aggregating data from multiple endpoints and separately authenticating to different services) is not acceptable; see the gateway sketch after this list
    • Additionally, allowing clients to take a direct dependency on our services makes it difficult for us to rewrite them in the future
    • A consequence of this requirement is maintaining a single user permission model (this is desirable in and of itself)
  • Shares code with existing data processing pipelines
    • We would like to maintain a single persistence mapping to the data stores powering our applications
  • Gradual rollout
    • We want to avoid an all-or-nothing strategy for this re-architecting project at all costs
    • The path to move from the monolithic architecture to the new architecture should be incremental and unlock value in small nuggets throughout the process
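To make the "presents as a single application" requirement more concrete, one common way to satisfy it is an edge gateway that authenticates requests once and routes each path to the service that owns it. The sketch below uses Spring Cloud Gateway with invented service names and paths; it illustrates the shape of the pattern rather than the exact mechanism we settled on.

    import org.springframework.cloud.gateway.route.RouteLocator;
    import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // Hypothetical edge-gateway routing: clients see a single host and a single
    // authentication flow, while each business capability is served by its own
    // service behind the gateway. Service names and paths are invented.
    @Configuration
    public class GatewayRoutes {

        @Bean
        public RouteLocator routes(RouteLocatorBuilder builder) {
            return builder.routes()
                    // Asset-related endpoints are owned by one service...
                    .route("assets", r -> r.path("/api/v1/assets/**")
                            .uri("http://assets-service"))
                    // ...while exposure-related endpoints are owned by another.
                    .route("exposures", r -> r.path("/api/v1/exposures/**")
                            .uri("http://exposures-service"))
                    .build();
        }
    }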

These aims suggested that we break our existing Django application apart into units segmented by business capability, rewriting the units into Java services in order to share code with the Java-based data processing pipelines.

Spring Boot, an opinionated framework for building Java applications on top of the existing Spring Framework, followed as a natural choice due to its maturity as a project and our developers' comfort and expertise developing in it [2].
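For a sense of what this looks like in practice, here is a minimal, hypothetical Spring Boot service exposing a single route. The class and endpoint are invented, but the annotations are the standard Spring Boot ones, and no XML configuration is involved (see footnote [2]).

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    // A minimal, hypothetical Spring Boot service exposing one route.
    // Annotations plus auto-configuration carry the entire setup.
    @SpringBootApplication
    public class ExposuresServiceApplication {

        public static void main(String[] args) {
            SpringApplication.run(ExposuresServiceApplication.class, args);
        }
    }

    @RestController
    class ExposureController {

        // GET /api/v1/exposures/{customerId} returns a placeholder response.
        @GetMapping("/api/v1/exposures/{customerId}")
        public String exposuresForCustomer(@PathVariable long customerId) {
            return "exposures for customer " + customerId;
        }
    }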

In my next installment, I will discuss the solution we came up with to satisfy these goals and requirements.

Footnotes

[1] I personally prefer the term services to microservices because microservices implies that they must be small. That said, I’ll use the microservices vernacular because it’s what most people are familiar with hearing and has better SEO ¯\_(ツ)_/¯.
[2] And also because in Spring Boot we don’t have to write any XML configuration :).