Improving Performance and Reliability of Internal Communication Among Microservices: The Story Behind the Falcon Sandbox Team’s gRPC Journey
The Hybrid Analysis community submits hundreds of thousands of samples for analysis to our systems every day. Those sample submissions mean our CrowdStrike Falcon Sandbox™ software must do millions of file system operations, moving data around to various services, as we detonate the samples and generate numerous artifact files. These operations need to be done as fast as possible without any data loss. To be able to do this, it is imperative to include the right technologies within our software.
Given these volumes of file transactions, and processing a large variety of file sizes, every millisecond counts. Understanding this challenge, and setting goals to build a performant system, we decided to look at our underlying infrastructure to identify ways to speed up file processing within our system.
This is not a trivial undertaking, given this is a set of decentralized microservices running in a virtual cloud. While building a microservices-based system provides the independent processing and scaling we need, it can also result in more file transfers. When designing a system like this, the transportation layer is often overlooked, as it’s not a typical place where developers make improvements. Having our mature, universal and widely used REST interfaces, why would we think about alternatives? As seen repeatedly, the devil is in the details.
REST is not always the best solution. It does not provide built-in API contracts, and documenting endpoints can require third-party libraries. It is typically served over HTTP/1.1, which has a known head-of-line blocking issue. It transfers data as text or bytes, depending on content (optionally gzip-compressed), and supports streams only in a limited way.
On the other hand, gRPC is based on a much different communication approach. By design, it takes advantage of HTTP/2 features (request and response multiplexing, binary framing, data streaming), which ultimately provide low latency and high throughput. Each service must also have a message schema: a Protocol Buffer file. It not only describes the structure of requests and responses in a strongly typed manner, but also acts as a self-documenting service contract. Moreover, because Protocol Buffers use a binary representation, messages are parsed in a more efficient and less CPU-intensive manner.
Without a doubt, that sounds promising and exciting. That’s enough theory; now, let’s get our hands a bit dirty.
Before deciding to use gRPC technology in our applications, we first built a proof of concept (POC) and monitored it closely for performance and compatibility issues.
As we’re using a variety of technologies, for the purpose of testing we decided to build a client in PHP 7.3 and a server in Java 11. During testing, we focused on transferring files (bytes) and regular data structures (with various sizes). Depending on the case, unary calls and streams were used.
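To make the difference between a unary call and a stream concrete, the sketch below splits a payload into fixed-size chunks, the way a streaming upload would hand data to the transport one message at a time. It uses only the Python standard library and no gRPC code. The names `iter_chunks` and `CHUNK_SIZE`, and the 64 KiB chunk size, are assumptions made for illustration, not values taken from our POC.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for real workloads


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks so the whole file never sits in memory at once."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# A unary call would send `payload` as one message; a stream sends it piecewise.
payload = bytes(200 * 1024)  # 200 KiB of zero bytes standing in for a sample file
chunks = list(iter_chunks(io.BytesIO(payload)))
print(len(chunks), [len(c) for c in chunks])
```

In a real streaming call, each yielded chunk would be wrapped in a request message and the receiver would reassemble the chunks in order.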
In all cases, gRPC was more performant and consumed less memory and CPU: up to four times faster, with up to three times lower resource consumption. The improvement was especially noticeable when transferring data structures, and smaller when transferring files. The likely reason is that when transmitting a file, both gRPC and REST transfer the same opaque bytes. In contrast, when transferring structured data, gRPC uses a compact binary format while REST uses a text one. That affects the message size, which in turn affects timing.
Our results are not as striking as those in benchmarks available on the web (which present gRPC as five to 10 times more performant). We believe this is partly because different gRPC client and server implementations perform differently, as do different programming languages. Most likely, the results would differ if we wrote both the client and the server in Java.
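The size difference between structured data and opaque bytes can be illustrated with a small experiment. The sketch below hand-encodes a record using a subset of the Protocol Buffers wire format (varints and length-delimited fields) and compares it with compact JSON. It is not the protobuf library, and the record, its field numbers and the helper names are assumptions chosen for the example.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protocol Buffers base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    # Wire type 0 (varint): the tag is (field_number << 3) | 0.
    return encode_varint(field_number << 3) + encode_varint(value)


def encode_bytes_field(field_number: int, data: bytes) -> bytes:
    # Wire type 2 (length-delimited): tag, then length, then the raw bytes.
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


record = {"id": 150, "name": "sample.exe", "size": 1048576}  # made-up metadata

binary = (
    encode_int_field(1, record["id"])
    + encode_bytes_field(2, record["name"].encode("utf-8"))
    + encode_int_field(3, record["size"])
)
text = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Opaque file content gains almost nothing: only a tag and a length are added.
blob = bytes(1024 * 1024)
framed_blob = encode_bytes_field(1, blob)

print(len(text), len(binary), len(framed_blob) - len(blob))
```

For this record the binary form is less than half the size of the JSON text, while the 1 MiB blob only picks up a few bytes of framing. That is consistent with files benefiting less than data structures.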
The results of our testing and the satisfying experience of gRPC application development convinced us to move a step forward to create a production-ready program. Now let’s discuss the most intriguing part — the development.
Developing Our Solution
Google’s gRPC technology was officially launched in 2016. It has since reached a large audience and built an impressive and prolific community. Thanks to the community, it’s possible to find many guides, best practices, projects and frameworks that make the development process straightforward and pleasant.
When starting the development phase, we encountered a few challenges:
- Choosing a framework
- Data validation
- Management of Protocol Buffer files and gRPC code generation
Our server-side base technology is Java with Spring Boot. It’s feasible to create a raw implementation of a gRPC server, but from a strict development perspective it is easier and much faster not to reinvent the wheel, and instead to use an existing framework that integrates Spring Boot with gRPC. Spring Boot is a framework that is well-known by our teams, so being able to use many of its features in gRPC applications is a huge advantage. Thanks to the community, this type of solution already exists.
Even though our microservices are only used internally, data validation is something that benefits all teams, especially in a complex distributed system such as ours. From the user perspective, it clearly shows what data each procedure expects. From the developer perspective, it allows us to write code that can assume a specific form of incoming user data. Some effort is still required, but these advantages are definitely attractive. Two issues are easy to spot when starting on an implementation concept:
- How and where the validation rules should be documented
- How to implement our own validation rules (using a third-party validation library) so they are convenient to use at the project level
These two issues are addressed in the community-driven protoc-gen-validate project.
The main idea is to define validation rules directly in Protocol Buffer files. That’s convenient, since the rules are visible exactly where developers look for information about the capabilities of a particular service. Developers need not look anywhere else for documentation that only describes validation. That resolved our documentation concern.
The most important feature of the protoc-gen-validate library is that it parses the validation rules from Protocol Buffer files and generates a validation class for each input data class that has rules. That significantly simplifies the input data verification code and delegates that responsibility from the application to a third-party library. Moreover, this solution is extraordinarily convenient, since the validation rules defined in the Protocol Buffer file are reflected in the code out of the box. As it turns out, this solution resolved our implementation concerns.
Further, protoc-gen-validate has an interesting additional feature: data can be validated in the client service without making any request to the server, because the generated validation classes can be reused in the client-side app. This can be useful when writing automated tests for the client application, which can verify that a built message will pass validation.
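The idea of a validator that runs on the client before any request is made can be illustrated without gRPC tooling. The sketch below is not protoc-gen-validate output and does not use its API. `SubmitRequest`, its fields and its rules are hypothetical stand-ins for a generated message class and for the checks that would be derived from rules in a Protocol Buffer file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubmitRequest:
    """Hypothetical request; a real one would be generated from a .proto file."""
    sample_id: int
    file_name: str


def validate(request: SubmitRequest) -> List[str]:
    """Return a list of violations; an empty list means the message is valid."""
    violations = []
    if request.sample_id <= 0:
        violations.append("sample_id must be greater than 0")
    if not 1 <= len(request.file_name) <= 255:
        violations.append("file_name length must be between 1 and 255")
    return violations


# Client side: check the message before making any request to the server.
good = SubmitRequest(sample_id=42, file_name="sample.exe")
bad = SubmitRequest(sample_id=0, file_name="")
print(validate(good))
print(validate(bad))
```

A client-side automated test could assert that the messages it builds produce no violations, and the server could apply the same checks to incoming data.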
At CrowdStrike, something we take seriously is the ability to easily test our components. In a previous blog post, we talked about our other testing considerations and effort required to test untestable software. Take a look if you’d like to dive into another area of this environment.
Management of Protocol Buffer Files and gRPC Code Generation
As of this writing, there’s no official recommendation for managing Protocol Buffer files. The starting point is that every client service that interacts with our server needs access to classes generated from the Protocol Buffer file. That may lead to an unacceptable duplication of Protocol Buffer files across applications, and it puts the responsibility for generating gRPC classes onto each application’s build process.
Once we have many applications, duplicated Protocol Buffer files become inconvenient, error-prone and harder to maintain: when we change one file, we need to make the same change in multiple places.
To address the problem of Protocol Buffer file duplication and the need for generating gRPC code from them by each application, we applied the following flow and rules. For this example, let’s assume that our clients are using Python, Go and PHP and the server is written in Java.
- Protocol Buffer files for each application are stored in separate version control repositories. For instance, application A’s files will be stored in repository “A_ProtoFiles” and application B’s in “B_ProtoFiles.” Alternatively (depending on technology), those Protocol Buffer files can be kept in a single “ProtoFiles” repository and separated per application by directories.
- After changes from a pull request are merged to the “master” branch, a Gradle build generates gRPC code packages from the updated Protocol Buffer files. A library is generated for each language in use; for example, with client applications in PHP and Python and a server in Java, libraries are generated for all three languages.
- The built code is versioned and pushed to software management repositories.
- Each application fetches the built gRPC code from the respective software management repository (e.g., Nexus).
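The per-language generation step in the flow above can be sketched as a function that assembles one `protoc` command line per target language. This is a stand-in for illustration, not our Gradle build. The file name `service_a.proto` and the output directory `build/generated` are assumptions, and the commands are only constructed, not executed. The flags shown generate message classes only. gRPC service stubs additionally require each language’s gRPC plugin, and Go output requires the `protoc-gen-go` plugin.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# protoc output flags for message classes, keyed by target language.
OUT_FLAGS: Dict[str, str] = {
    "java": "--java_out",
    "python": "--python_out",
    "php": "--php_out",
    "go": "--go_out",  # needs the protoc-gen-go plugin on PATH
}


def protoc_commands(
    proto_dir: str, proto_file: str, languages: List[str], build_dir: str
) -> Dict[str, List[str]]:
    """Build one protoc argument list per language (nothing is executed)."""
    commands = {}
    for lang in languages:
        out_dir = PurePosixPath(build_dir) / lang
        commands[lang] = [
            "protoc",
            f"--proto_path={proto_dir}",
            f"{OUT_FLAGS[lang]}={out_dir}",
            str(PurePosixPath(proto_dir) / proto_file),
        ]
    return commands


# Hypothetical layout: one repository per application, one output dir per language.
commands = protoc_commands(
    "A_ProtoFiles", "service_a.proto", ["java", "python", "go", "php"], "build/generated"
)
for lang, argv in commands.items():
    print(lang, " ".join(argv))
```

Each generated directory would then be packaged, versioned and pushed to the matching software management repository, so that applications never need `protoc` in their own build.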
Our proposed solution addresses a few issues:
- Prevents Protocol Buffer file duplication among projects and simplifies their management
- Adds Protocol Buffer file versioning
- Reduces the application’s environment complexity as it doesn’t need to have attached binaries that would allow applications to generate gRPC code by themselves.
Downsides of this solution? None found!
gRPC is not a technology that fits every case. Its support for web applications is limited, but it transfers data efficiently regardless of size and offers several data transfer approaches (unary calls and various kinds of streaming). That makes it a great candidate for handling communication in internal networks such as our in-house microservices architecture.
Although gRPC has been around for only a few years, substantial company and community engagement means that many crucial components required for building development and production-ready environments are readily available (such as documentation and guides, load balancers, CLI and GUI clients, third-party libraries and frameworks). Moreover, the ability of gRPC to work well in multi-language environments and to enforce a service contract makes it well suited to environments designed to take advantage of microservices.
Considering all of this, we summarize gRPC advantages and disadvantages.
Advantages:
- Use of HTTP/2 features (like multiplexing, binary framing and data streaming) that result in better performance and higher throughput
- Decreased memory and CPU consumption (especially while using streams)
- Strongly typed messages
- Built-in message code generation
- Provides client-server contract
- Supports multi-language environments
- Supports multiple operating systems
- Simple to use
- Community provides good enough support
- Applicable to production environments
Disadvantages:
- Limited browser support
- Supported only in leading programming languages
- Steeper learning curve
After debates with the Hybrid Analysis engineering team and POC creation, we decided to closely monitor one gRPC service in our production environment and steadily adopt that technology on more of our internal microservices.