Cloud Monitoring Definition
Cloud monitoring is the practice of measuring workloads inside cloud tenancies against specific metrics and thresholds.
Cloud monitoring allows you to find out if your cloud-hosted applications are performing within their Service-Level Agreement (SLA), discover any potential security risks, identify any capacity issues, and analyze costs.
In this post, we’ll provide a full 360° overview of monitoring in the context of the public cloud, covering the following:
- Why monitoring the cloud is important
- Cloud services you should monitor
- The different types of monitoring available
- How to monitor cloud services
- Features you should look for in cloud-monitoring platforms
Why Is Cloud Monitoring Important?
Cloud monitoring is part of observability, which is the practice of examining the outputs of a system to understand its internal state. In modern IT, enterprises use observability to obtain a holistic picture of the health of their complex, distributed applications.
Because businesses may run some (or all) of their workloads in the cloud, cloud monitoring is critical to their overall observability strategy. Whereas observability as a whole also draws on traces and events, cloud monitoring primarily deals with metrics and logs. Let’s consider some practical examples of cloud monitoring in action.
By monitoring your cloud footprint, you can track resource utilization and, from there, optimize cost. For example, if monitoring shows your cloud-based VMs only run at full capacity during business hours, shutting them down during off-hours could save money.
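As a rough illustration of the idea, the sketch below groups CPU samples by hour of day and flags the hours in which a VM sits idle. The sample data, the `idle_hours` helper, and the 10% threshold are all assumptions made for this example; real samples would come from your provider's metrics service.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical CPU samples for one VM: (hour_of_day, cpu_percent).
samples = [
    (2, 3.0), (2, 4.0), (9, 71.0), (9, 80.0),
    (14, 88.0), (14, 92.0), (22, 5.0), (22, 2.0),
]

def idle_hours(samples, idle_threshold=10.0):
    """Return the hours of the day whose average CPU is below the threshold."""
    by_hour = defaultdict(list)
    for hour, cpu in samples:
        by_hour[hour].append(cpu)
    return sorted(
        hour for hour, values in by_hour.items()
        if mean(values) < idle_threshold
    )

print(idle_hours(samples))  # [2, 22]
```

Hours that are consistently flagged over several weeks are candidates for a scheduled shutdown.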
As another example, imagine discovering that your cloud-based applications were running slowly. You could add extra CPU or memory capacity, and these additions could be justified by monitoring the ratio between scaling and performance. When this ratio hits a plateau—indicating that additional capacity or elasticity would no longer improve performance—drilling down further into the metrics and logs could help you surface the root cause of the slowdown.
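One simple way to spot such a plateau is to compare the throughput gain of each scaling step against a minimum worthwhile improvement. The following is only a sketch: the measurements, the `plateau_point` name, and the 10% cutoff are hypothetical choices, not a recommended standard.

```python
# Hypothetical measurements: (vCPU count, requests served per second).
measurements = [(2, 100.0), (4, 190.0), (8, 340.0), (16, 350.0), (32, 352.0)]

def plateau_point(measurements, min_gain=0.10):
    """Return the capacity level after which scaling up stopped paying off.

    A step counts as a plateau when throughput grows by less than `min_gain`
    relative to the previous level. Returns None if every step still
    improves performance by at least `min_gain`.
    """
    for (prev_cap, prev_perf), (_, perf) in zip(measurements, measurements[1:]):
        if (perf - prev_perf) / prev_perf < min_gain:
            return prev_cap
    return None

print(plateau_point(measurements))  # 8
```

Here, going from 8 to 16 vCPUs adds under 3% throughput, which suggests the bottleneck lies somewhere other than CPU capacity.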
Monitoring well-performing cloud-based applications helps you create baseline benchmarks. These benchmarks are useful for providing before-after comparison data when you upgrade the infrastructure or add a new feature to the application.
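A before-and-after comparison against a baseline might look something like the sketch below. The latency figures are invented, and treating a shift of more than two baseline standard deviations as meaningful is an assumption made for illustration only.

```python
from statistics import mean, stdev

# Hypothetical response times in milliseconds.
baseline_ms = [120.0, 118.0, 125.0, 122.0, 119.0, 121.0]
after_upgrade_ms = [98.0, 101.0, 97.0, 102.0, 99.0, 100.0]

def compare_to_baseline(baseline, current):
    """Return (percent change in the mean, whether the shift looks meaningful)."""
    base_mean = mean(baseline)
    shift = mean(current) - base_mean
    change_pct = shift / base_mean * 100
    # Assumed rule of thumb: a shift beyond two baseline standard
    # deviations is unlikely to be ordinary fluctuation.
    meaningful = abs(shift) > 2 * stdev(baseline)
    return change_pct, meaningful

change_pct, meaningful = compare_to_baseline(baseline_ms, after_upgrade_ms)
print(f"{change_pct:.1f}% change, meaningful={meaningful}")
```

A negative percentage means response times dropped after the upgrade.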
Cloud monitoring can also help you with security. By examining application, server, API gateway, or firewall logs, a monitoring solution can alert you to anomalies, malicious access attempts, or DDoS attacks. The insights from this monitoring can feed into your overall security hardening efforts.
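To make the idea concrete, here is a minimal sketch that counts failed log-ins per source IP and flags the noisy ones. The three-field log format, the `LOGIN_FAILED` marker, and the `suspicious_ips` helper are all made up for this example; real authentication logs will look different.

```python
from collections import Counter

# Hypothetical log lines in a simplified "<timestamp> <event> <ip>" format.
log_lines = [
    "2024-05-01T10:00:01Z LOGIN_FAILED 203.0.113.7",
    "2024-05-01T10:00:02Z LOGIN_FAILED 203.0.113.7",
    "2024-05-01T10:00:03Z LOGIN_FAILED 203.0.113.7",
    "2024-05-01T10:00:04Z LOGIN_FAILED 203.0.113.7",
    "2024-05-01T10:00:09Z LOGIN_OK 198.51.100.4",
    "2024-05-01T10:00:12Z LOGIN_FAILED 198.51.100.4",
]

def suspicious_ips(lines, max_failures=3):
    """Return {ip: failure_count} for IPs with more than `max_failures` failures."""
    failures = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[1] == "LOGIN_FAILED":
            failures[parts[2]] += 1
    return {ip: count for ip, count in failures.items() if count > max_failures}

print(suspicious_ips(log_lines))  # {'203.0.113.7': 4}
```

A production rule would also consider the time window in which the failures occur, which this sketch ignores.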
Which Cloud Services Should You Monitor?
The short answer to our question is this: You should monitor every service you use. Organizations use different types of cloud services, which include:
- Software as a service (SaaS), such as Google Workspace, Microsoft 365, or Salesforce.
- Infrastructure as a service (IaaS), such as AWS, Google Cloud Platform, or Microsoft Azure.
- Platform as a service (PaaS), such as managed web application firewalls, container services, API gateways, or DNS.
- Functions as a service (FaaS), such as AWS Lambda or Google Cloud Functions.
- Database as a service (DBaaS), such as Oracle Cloud, Azure Synapse, or Snowflake.
The amount of monitoring information provided by each platform or service differs.
You should collect whatever metrics and logs are exposed by your cloud platforms that are valuable for your use case. For example, you may not want to capture metrics from your development servers. Similarly, collecting metrics from a small serverless function that only performs a lookup operation may not be useful (even in a production environment), whereas web server access logs or database slow query logs would certainly be important.
Which Cloud Metrics Should You Monitor?
Today’s cloud applications use dozens, or even hundreds, of cloud services across multiple cloud providers. Faced with the sheer volume of metrics from such complex setups, operations teams often feel like they are drowning in information. The signal-to-noise ratio is often too low, and actual warning signs can slip through unnoticed.
Because of this, it’s important to decide two things: first, what categories of information you need, and second, which relevant pieces of information are worth capturing under each category. A shortlist of categories includes:
- Flow logs
- Network bandwidth usage by servers
- Logs from firewalls, anti-virus software, API gateways, web servers, and database server access
- Failed log-in events
- Object access logs, such as those exposed by AWS S3 buckets
- Microservice call stacks and application logs
- Logs from runtime libraries (for example, log4j)
- Server metrics, such as CPU usage, available memory, disk performance, and I/O latency
- Number of pods scheduled per minute or number of pods crashing per node
- Logs and metrics from serverless functions, but only those that perform complex, multi-step actions
- Cron job or event scheduler logs for important scheduled tasks
- Slow query logs
- Performance metrics
While not a complete list, this should give you an idea about where to concentrate your efforts.
How Do You Perform Cloud Monitoring?
Now that you know which cloud platforms and which metrics you should monitor, the question that naturally follows is: How do you monitor?
Most cloud vendors offer their own monitoring service. For example, Amazon CloudWatch and CloudWatch Logs give you insight into most AWS services. Similarly, GCP has its Google Cloud Operations Suite, and Azure has Azure Monitor. Other cloud services—like DigitalOcean—offer some basic metrics, and Snowflake shows its query logs with the associated query plans.
If your business has a multi-cloud footprint, then you may find yourself with hundreds of workloads running in multiple accounts from different cloud providers, with service metrics and logs exposed for each cloud account.
Collecting, aggregating, indexing, and searching through millions of lines of logs, metrics, traces, and events—all while looking for the root cause of a problem—can seem like an impossible task.
That’s why there’s a whole new breed of monitoring and reporting platforms on the market today. These platforms can collect logs and metrics from all your cloud touchpoints, then extract only relevant information, standardize formats, and index for efficient searchability. These platforms can show the overall picture of your multi-cloud applications through intelligent trend analysis, anomaly detection, and dashboards.
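The standardize-then-index step can be sketched in a few lines. Everything here is an assumption for the sake of illustration: `provider_a` and `provider_b` are placeholder names, and the field names do not correspond to any real cloud vendor's log schema.

```python
from collections import defaultdict
from datetime import datetime, timezone

def normalize(record, source):
    """Map a provider-specific record onto one shared schema."""
    if source == "provider_a":
        # Assumed shape: epoch milliseconds plus "severity" and "message".
        ts = datetime.fromtimestamp(record["ts_ms"] / 1000, tz=timezone.utc)
        level, message = record["severity"], record["message"]
    elif source == "provider_b":
        # Assumed shape: ISO 8601 string plus "level" and "msg".
        ts = datetime.fromisoformat(record["time"])
        level, message = record["level"], record["msg"]
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "timestamp": ts.isoformat(),
        "source": source,
        "level": level.upper(),
        "message": message,
    }

def build_index(records):
    """Build a tiny inverted index: word -> positions of matching records."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for word in record["message"].lower().split():
            index[word].add(position)
    return index

raw = [
    ({"ts_ms": 1714557600000, "severity": "error",
      "message": "Disk latency high"}, "provider_a"),
    ({"time": "2024-05-01T10:05:00+00:00", "level": "warn",
      "msg": "Latency above threshold"}, "provider_b"),
]
records = [normalize(record, source) for record, source in raw]
index = build_index(records)
print(sorted(index["latency"]))  # [0, 1]
```

Once records share one schema, a single search for "latency" finds events from both providers. Commercial platforms do this at far greater scale and with more sophisticated indexing.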
Such solutions can run either on-premises or as a subscription-based SaaS. Each has its own benefits and drawbacks. Either way, these applications will use administrative privileges to access your cloud accounts and capture the necessary information. You may need to install special software (like collector agents) on the target systems, but sometimes such integrations are native to the platform and use common protocols.
Must-Have Features for Cloud Monitoring Platforms
When looking for a cloud-monitoring solution, you need to ensure the presence of certain core features. Primarily, the platform you choose should be easy to set up, configure, and maintain.
It should have most, if not all, of the integrations required for your existing and planned systems. If the platform does not come with all of the built-in integrations you’ll need, it should at least support an ecosystem of third-party integrations. When collecting metrics and logs from your cloud-hosted workloads, these integrations should not adversely affect any system’s performance.
Analysis and presentation
The platform should also be able to synthesize ingested data to provide a single pane of glass for your application’s health. It should be customizable so you can tailor it to your organizational needs. For example, if you are looking for cloud cost reports, the monitoring platform should be able to extract year-to-date cost data from cloud accounts, presenting insights in a useful fashion. If you are looking for cybersecurity threats, it should be able to run trend analyses on brute force attacks.
Data storage, indexing, and searching
The volume of logs collected from all your cloud sources can easily reach terabytes or even petabytes. The cloud-monitoring platform should not only be able to store such large volumes of information but also index and search it quickly. It should support search syntaxes such as regular expressions (regex) or a SQL-like query language.
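As a small taste of what a regex search looks like, the sketch below filters log lines for HTTP 5xx responses. The log format and the `search` helper are hypothetical; a monitoring platform would run a comparable query against indexed data rather than scanning lines in memory.

```python
import re

# Hypothetical access-log lines: "<method> <path> <status> <duration>".
access_log = [
    "GET /api/orders 200 35ms",
    "POST /api/orders 500 1820ms",
    "GET /healthz 200 2ms",
    "GET /api/users 503 30011ms",
]

# A three-digit status code starting with 5, surrounded by whitespace.
SERVER_ERROR = r"\s5\d{2}\s"

def search(lines, pattern):
    """Return the lines that match the given regular expression."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]

for line in search(access_log, SERVER_ERROR):
    print(line)
```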
Sanitization or redaction
For some organizations, logs can contain sensitive data—like financial or personal information. Organizations often want to redact or mask such information. They may also have to comply with industry regulations that prohibit them from storing logs in SaaS platforms outside their geographical location. If your company fits these profiles, make sure the monitoring platform can address these requirements.
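Redaction is often implemented as pattern-based masking applied before logs leave your environment. The sketch below is deliberately simplistic: both patterns and the `redact` helper are illustrative assumptions and would miss or over-match real-world data, so treat it as a starting point rather than a compliance control.

```python
import re

# Illustrative patterns only; real redaction rules need far more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def redact(line):
    """Mask e-mail addresses and card-like digit runs in a log line."""
    line = EMAIL.sub("[REDACTED_EMAIL]", line)
    return CARD.sub("[REDACTED_CARD]", line)

print(redact("user=jane@example.com card=4111 1111 1111 1111 status=ok"))
# user=[REDACTED_EMAIL] card=[REDACTED_CARD] status=ok
```

Many platforms let you apply rules like these at ingestion time, so the sensitive values are never stored at all.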
Log Everything, Answer Anything – For Free
Falcon LogScale Community Edition (previously Humio) offers a free modern log management platform for the cloud. Leverage streaming data ingestion to achieve instant visibility across distributed systems and prevent and resolve incidents.
Falcon LogScale Community Edition, available instantly at no cost, includes the following:
- Ingest up to 16GB per day
- 7-day retention
- No credit card required
- Ongoing access with no trial period
- Index-free logging, real-time alerts, and live dashboards
- Access our marketplace and packages, including guides to build new packages
- Learn and collaborate with an active community