Using immutable infrastructure to achieve cloud security
Maintaining cloud infrastructure, especially compute components, requires a lot of effort, including patch management, secure configuration, and more.
Beyond the maintenance effort itself, this approach simply does not scale.
Will we be able to support our workloads when we need to scale to thousands of machines at peak?
Immutable infrastructure is a deployment method where compute components (virtual machines, containers, etc.) are never updated – we simply replace a running component with a new one and decommission the old one.
Immutable infrastructure has its advantages, such as:
- No dependency on previous VM/container state
- No configuration drift
- Fast configuration management process
- Easy horizontal scaling
- Simple rollback/recovery process
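The replace-instead-of-update idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Instance` class, the `launch` and `roll_out` functions, and the image names are hypothetical, and no cloud API is called.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical in-memory model of a fleet; no cloud API is involved.
_ids = count(1)


@dataclass(frozen=True)  # frozen: an instance is never modified after creation
class Instance:
    instance_id: int
    image: str


def launch(image):
    """Create a brand-new instance from the given image."""
    return Instance(next(_ids), image)


def roll_out(fleet, new_image):
    """Return a new fleet in which every instance not running new_image
    is replaced by a freshly launched one (the old one is dropped)."""
    return [i if i.image == new_image else launch(new_image) for i in fleet]


fleet = [launch("app-image:1.0") for _ in range(3)]
fleet = roll_out(fleet, "app-image:1.1")
print([i.image for i in fleet])
```

In this model, a rollback is the same call with the previous image name, which is why recovery stays simple.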
The Twelve-Factor App
Designing modern or cloud-native applications requires us to follow 12 principles, documented at https://12factor.net
Looking at this guide, we see that factor number 3 (config) guides us to store configuration in environment variables, outside our code (or VMs/containers).
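As a minimal sketch of the config factor, the following Python snippet reads settings from environment variables and falls back to defaults. The variable names (`APP_DB_URL`, `APP_LOG_LEVEL`, `APP_MAX_WORKERS`) and the default values are assumptions made for this example, not names defined by the Twelve-Factor guide.

```python
import os


def load_config(env=os.environ):
    """Read settings from environment variables, with defaults.

    The variable names below are hypothetical examples.
    """
    return {
        "db_url": env.get("APP_DB_URL", "sqlite:///local.db"),
        "log_level": env.get("APP_LOG_LEVEL", "INFO"),
        "max_workers": int(env.get("APP_MAX_WORKERS", "4")),
    }


# The same image can run in any environment; only the variables differ.
print(load_config({"APP_DB_URL": "postgres://db.internal/app",
                   "APP_MAX_WORKERS": "16"}))
```

Because nothing environment-specific is baked into the image, the same golden image can be promoted from test to production unchanged.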
For further reading, see:
- The Twelve-Factor App – Config
- AWS – Applying the Twelve-Factor App Methodology to Serverless Applications
- Azure – The Twelve-Factor Application
- GCP – Twelve-factor app development on Google Cloud
https://cloud.google.com/architecture/twelve-factor-app-development-on-gcp#3_configuration
If we continue to follow the guidelines, factor number 6 (processes) guides us to create stateless processes. This means separating the execution environment from the data, and keeping all stateful or persistent data in an external service such as a database or object storage.
For further reading, see:
- The Twelve-Factor App – Processes
https://12factor.net/processes
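To illustrate the stateless-process factor described above, here is a small Python sketch. The `ExternalStore` class is a hypothetical in-memory stand-in for a database or object storage service, and `handle_visit` is an invented handler name.

```python
class ExternalStore:
    """Stand-in for an external service such as a database or object storage.
    Hypothetical: a real implementation would call the service's client."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def handle_visit(store, user_id):
    """Stateless handler: all persistent state is read from and written to
    the external store, so any replica can serve any request."""
    visits = store.get(user_id, 0) + 1
    store.put(user_id, visits)
    return visits


store = ExternalStore()
handle_visit(store, "alice")          # imagine this served by one replica
print(handle_visit(store, "alice"))   # and this by another; state is shared
```

Since the handler keeps nothing in local memory between calls, replacing the VM or container that runs it loses no data.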
How do we migrate to immutable infrastructure?
Build a golden image
Follow the cloud vendor’s documentation about how to download the latest VM image or container image (from a container registry), update security patches, binaries, and libraries to the latest version, customize the image to suit the application’s needs, and store the image in a central image repository.
It is essential to copy/install only necessary components inside the image and remove any unnecessary components – it will allow you to keep a minimal image size and decrease the attack surface.
It is recommended to sign your image when you store it in your private registry, so that anyone pulling it can verify that it was not changed and that it was created by a known source.
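The principle behind image signing can be sketched with the Python standard library. Real image signing typically relies on asymmetric key pairs and dedicated registry tooling; this example uses a keyed hash (HMAC-SHA256) purely as a simplified stand-in, and the key and image contents are hypothetical.

```python
import hashlib
import hmac


def image_digest(image_bytes):
    """SHA-256 digest identifying the exact image content."""
    return hashlib.sha256(image_bytes).hexdigest()


def sign(digest, key):
    """Produce a keyed signature over the digest (HMAC-SHA256)."""
    return hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()


def verify(image_bytes, signature, key):
    """True only if the image is unchanged and was signed with this key."""
    expected = sign(image_digest(image_bytes), key)
    return hmac.compare_digest(expected, signature)


key = b"hypothetical-signing-key"   # assumption: kept in a secrets manager
image = b"golden image contents"    # assumption: stands in for the image file
signature = sign(image_digest(image), key)

print(verify(image, signature, key))                 # unchanged image
print(verify(image + b" tampered", signature, key))  # modified image
```

A deployment step would refuse to launch any image for which the verification fails.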
For further reading, see:
- Automate OS Image Build Pipelines with EC2 Image Builder
https://aws.amazon.com/blogs/aws/automate-os-image-build-pipelines-with-ec2-image-builder/
- Creating a container image for use on Amazon ECS
https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-container-image.html
- Azure VM Image Builder overview
https://learn.microsoft.com/en-us/azure/virtual-machines/image-builder-overview
- Build and deploy container images in the cloud with Azure Container Registry Tasks
https://learn.microsoft.com/en-us/azure/container-registry/container-registry-tutorial-quick-task
- Create custom images
https://cloud.google.com/compute/docs/images/create-custom
- Building container images
https://cloud.google.com/build/docs/building/build-containers
Create a deployment pipeline
Create a CI/CD pipeline to automate the following process:
- Check for new software/binaries/library versions against well-known and signed repositories
- Pull the latest image from your private image repository
- Update the image with the latest software and configuration changes, and store the result in your image registry
- Run automated tests (unit tests, functional tests, acceptance tests, integration tests) to make sure the new build does not break the application
- Gradually deploy a new version of your VMs / containers and decommission old versions
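The steps above can be sketched as a sequence of stages that stops at the first failure. This is a toy model, not a real CI/CD configuration: the stage functions, the `tests_pass` flag, and the image name are hypothetical placeholders for calls to your own tooling.

```python
def run_pipeline(stages, context):
    """Run stages in order; stop at the first failure so that a broken
    build never reaches the deployment stage.

    Each stage is a (name, function) pair; the function receives the shared
    context dict and returns True on success.
    """
    completed = []
    for name, stage in stages:
        if not stage(context):
            return {"ok": False, "failed": name, "completed": completed}
        completed.append(name)
    return {"ok": True, "failed": None, "completed": completed}


# Hypothetical stage implementations; real ones would call your tooling.
def check_updates(ctx):
    ctx["updates"] = ["openssl"]
    return True


def build_image(ctx):
    ctx["image"] = "app-image:1.1"
    return True


def run_tests(ctx):
    return ctx.get("tests_pass", True)


def deploy(ctx):
    ctx["deployed"] = ctx["image"]
    return True


STAGES = [("check", check_updates), ("build", build_image),
          ("test", run_tests), ("deploy", deploy)]

print(run_pipeline(STAGES, {}))
print(run_pipeline(STAGES, {"tests_pass": False}))
```

The second call shows the important property: when the tests fail, the deploy stage is never executed.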
For further reading, see:
- Create an image pipeline using the EC2 Image Builder console wizard
https://docs.aws.amazon.com/imagebuilder/latest/userguide/start-build-image-pipeline.html
- Create a container image pipeline using the EC2 Image Builder console wizard
https://docs.aws.amazon.com/imagebuilder/latest/userguide/start-build-container-pipeline.html
- Streamline your custom image-building process with the Azure VM Image Builder service
- Build a container image to deploy apps using Azure Pipelines
https://learn.microsoft.com/en-us/azure/devops/pipelines/ecosystems/containers/build-image
- Creating the secure image pipeline
https://cloud.google.com/software-supply-chain-security/docs/create-secure-image-pipeline
- Using the secure image pipeline
https://cloud.google.com/software-supply-chain-security/docs/use-image-pipeline
Continuous monitoring
Continuously monitor for compliance against your desired configuration settings, security best practices (such as CIS benchmark hardening settings), and well-known software vulnerabilities.
If a check finds a deviation from any of the above, trigger an automated process that uses your previously created pipeline to replace the currently running images with the latest image version from your registry.
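A compliance check of this kind might look like the following sketch. The instance records, the `ssh_root_login` setting, and the set of vulnerable images are invented sample data; in practice they would come from your scanner and inventory.

```python
def find_noncompliant(instances, desired_config, vulnerable_images):
    """Return {instance id: [reasons]} for instances that should be replaced."""
    findings = {}
    for inst in instances:
        reasons = []
        if inst["image"] in vulnerable_images:
            reasons.append("image has known vulnerabilities")
        # Settings whose running value differs from the desired value.
        drift = {k for k, v in desired_config.items()
                 if inst["config"].get(k) != v}
        if drift:
            reasons.append("configuration drift: " + ", ".join(sorted(drift)))
        if reasons:
            findings[inst["id"]] = reasons
    return findings


# Hypothetical inventory data.
instances = [
    {"id": "vm-1", "image": "app-image:1.0", "config": {"ssh_root_login": False}},
    {"id": "vm-2", "image": "app-image:1.1", "config": {"ssh_root_login": True}},
    {"id": "vm-3", "image": "app-image:1.1", "config": {"ssh_root_login": False}},
]
print(find_noncompliant(instances, {"ssh_root_login": False}, {"app-image:1.0"}))
```

Every instance that appears in the result is a candidate for replacement by the pipeline, rather than for an in-place fix.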
For further reading, see:
- How to Set Up Continuous Golden AMI Vulnerability Assessments with Amazon Inspector
- Scanning Amazon ECR container images with Amazon Inspector
https://docs.aws.amazon.com/inspector/latest/user/enable-disable-scanning-ecr.html
- Manage virtual machine compliance
- Use Defender for Containers to scan your Azure Container Registry images for vulnerabilities
- Automatically scan container images for known vulnerabilities
https://cloud.google.com/kubernetes-engine/docs/how-to/security-posture-vulnerability-scanning
Summary
In this article, we have reviewed the concept of immutable infrastructure, its benefits, and the process for creating a secure, automated, and scalable solution for building immutable infrastructure in the cloud.
References
- The History of Pets vs Cattle and How to Use the Analogy Properly
https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
- Deploy using immutable infrastructure
- Immutable infrastructure CI/CD using Jenkins and Terraform on Azure
- Automate your deployments
https://cloud.google.com/architecture/framework/operational-excellence/automate-your-deployments
Where is the OSI model in the public cloud?
When talking about the public cloud, I always like the analogy to the OSI model.
“The Open Systems Interconnection model (OSI model) is a conceptual model. Communications between a computing system are split into seven different abstraction layers: Physical, Data Link, Network, Transport, Session, Presentation, and Application” (Wikipedia)
A similar but shorter model is the TCP/IP model.
Here is a comparison of the two models:
In the public cloud, we find a similar concept when talking about the shared responsibility model, where we draw the line of responsibility between the public cloud provider and the customers, in the different cloud service models, usually in terms of security, as we can see in the diagram below:
Where do public cloud services fit in the OSI model?
There are many network-related services in each of the major public cloud providers.
To make things easy to understand, I have prepared the following diagram, comparing common network-related services to the various OSI model layers:
Encryption / Cryptography and the OSI Model
Layer 6 of the OSI model is the presentation layer.
Among the things we can find in this layer is data encryption.
Encryption in this context is about encryption at rest, in object storage, block storage, file storage, and various data services.
Encryption includes symmetric and asymmetric encryption keys, secrets, passwords, API keys, certificates, etc.
The process includes the generation, storage, retrieval, and rotation of encryption keys.
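The key lifecycle (generation, storage, retrieval, rotation) can be illustrated with a small in-memory model. The `KeyStore` class and its method names are hypothetical; a managed key service would keep key material in protected storage rather than in a Python dictionary.

```python
import secrets


class KeyStore:
    """In-memory illustration of a key lifecycle (not for production use)."""

    def __init__(self):
        self._versions = {}  # key name -> list of key bytes, newest last

    def generate(self, name):
        """Create the first 256-bit key version under the given name."""
        self._versions[name] = [secrets.token_bytes(32)]
        return 1

    def rotate(self, name):
        """Add a new version; older versions stay available so that data
        encrypted with them can still be decrypted."""
        self._versions[name].append(secrets.token_bytes(32))
        return len(self._versions[name])

    def retrieve(self, name, version=None):
        """Return the requested version, or the newest one by default."""
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]


store = KeyStore()
store.generate("storage-key")
print(store.rotate("storage-key"))  # number of the newly created version
```

Keeping old versions retrievable is the design choice that makes rotation safe: new data uses the newest key, while existing data remains readable.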
Here are the most common encryption/cryptography-related services:
Identity Management and the OSI Model
Layer 7 of the OSI model is the application layer.
Among the things we can find in this layer are authentication and authorization, or identity management as a whole.
Identity management is about managing the entire lifecycle of an identity, whether it belongs to an end user, a service account, a computer account, etc.
The process includes account provisioning, password management (and MFA), permission management (role assignments), and finally account de-provisioning.
Here are the most common identity-related services:
How does everything come together?
When reviewing a cloud architecture, I like to compare the various services in the architecture to the different layers of the OSI model, from the bottom up:
- Network connectivity and traffic flow
- Encryption (according to data sensitivity)
- Authentication and Authorization (according to the least privilege principle)
The OSI model analogy helps me make sure I do not forget any important aspect when reviewing an architecture for a cloud workload.
Sustainability in the cloud era
When thinking about cloud computing, we immediately think about technology.
Have we ever stopped to think about how much energy an average cloud data center requires to operate, and what the environmental effect of running such huge data centers around the world is?
Data centers account for around 1% of the energy consumed around the world.
Data centers consume a lot of energy – electricity (for running the servers) and water (for cooling the servers).
The more energy a data center consumes, the bigger its carbon footprint (the total amount of greenhouse gases generated by running the data center).
In the past couple of years, a new concept has emerged among environmentally aware professionals working with cloud services: cloud sustainability.
The idea behind it (from a cloud provider’s point of view) is to reach 100% renewable energy within a few years, replacing fossil-fuel-based electricity with wind and solar power.
All major cloud providers (AWS, Azure, and GCP) put a lot of effort into building new data centers powered by green energy, and into changing existing data centers to lower their emissions as much as possible and use green energy as well.
To remain transparent to their customers, the major cloud providers have created carbon footprint tools:
- AWS customer carbon footprint tool
https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/what-is-ccft.html
- Microsoft Sustainability Calculator
https://aka.ms/SustainabilityCalculator
- GCP Carbon Footprint
https://cloud.google.com/carbon-footprint
- Cloud Carbon Footprint (Open source) tool
https://www.cloudcarbonfootprint.org/docs/getting-started
Indeed, most of the responsibility for keeping cloud data centers green lies with the cloud providers, since they build and maintain their data centers, but what is our responsibility as consumers?
As an example, here is AWS’s point of view regarding the shared responsibility model, in the context of sustainability:
How can we act as responsible cloud consumers?
Region selection
Review business requirements (compliance, latency, cost, service, and features), and pay attention to regions with a low carbon footprint.
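As a rough sketch of this trade-off, the function below first filters regions by business requirements and only then picks the one with the lowest carbon intensity. The region names, latencies, and carbon figures are invented for the example; real values would come from the providers' published data.

```python
def pick_region(regions, max_latency_ms, allowed_locations):
    """Among regions that satisfy compliance and latency requirements,
    return the name of the one with the lowest carbon intensity,
    or None if no region qualifies."""
    candidates = [
        r for r in regions
        if r["location"] in allowed_locations
        and r["latency_ms"] <= max_latency_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["gco2_per_kwh"])["name"]


# Hypothetical figures, for illustration only.
REGIONS = [
    {"name": "region-a", "location": "EU", "latency_ms": 40, "gco2_per_kwh": 300},
    {"name": "region-b", "location": "EU", "latency_ms": 55, "gco2_per_kwh": 80},
    {"name": "region-c", "location": "US", "latency_ms": 120, "gco2_per_kwh": 30},
]

print(pick_region(REGIONS, max_latency_ms=60, allowed_locations={"EU"}))
```

Carbon intensity is deliberately the last criterion here: a greener region that violates compliance or latency requirements is not a valid choice.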
Additional information:
- AWS – What to Consider when Selecting a Region for your Workloads
- Carbon-free energy for Google Cloud regions
https://cloud.google.com/sustainability/region-carbon
- Measuring greenhouse gas emissions in data centers: the environmental impact of cloud computing
https://www.climatiq.io/blog/measure-greenhouse-gas-emissions-carbon-data-centres-cloud-computing
Architecture design considerations
Use cloud-native design patterns:
- Microservices – use containers (and Kubernetes) to deploy your applications and leverage the scaling capabilities of the cloud
- Serverless – use serverless (or function as a service) whenever you can decouple your applications into small functions
- Use message queues as much as possible, to decouple your applications and lower the number of requests between the various services/components
- Use caching mechanisms to lower the number of queries to backend systems
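The effect of caching on backend load can be shown with Python's built-in `functools.lru_cache`. The `query_backend` function and the `backend_log` list are hypothetical stand-ins for a real backend call and its metrics.

```python
from functools import lru_cache

backend_log = []  # records every call that actually reaches the backend


def query_backend(product_id):
    """Hypothetical expensive backend query."""
    backend_log.append(product_id)
    return f"product-{product_id}"


@lru_cache(maxsize=1024)
def get_product(product_id):
    """Serve repeated lookups from memory instead of the backend."""
    return query_backend(product_id)


for _ in range(100):
    get_product(42)

print(len(backend_log), get_product.cache_info())
```

One hundred requests reach the backend only once; fewer backend queries means less compute, and therefore less energy, for the same work.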
Infrastructure considerations
Embed the following as part of your infrastructure considerations:
- Right-sizing – when using VMs, always remember to right-size the VM size to your application demands
- Use up-to-date hardware – when using VMs, always use the latest VM family types and the latest block storage type, to suit your application demands
- ARM-based processors – consider using ARM processors (such as AWS Graviton Processor, Azure Ampere Altra Arm-based processors, GCP Ampere Altra Arm processors, and more), whenever your application supports the ARM technology (for better performance and lower cost)
- Idle hardware – monitor and shut down (or even delete) unused or idle hardware (VMs, databases, etc.)
- GPU – use GPUs only for tasks where they are considered more efficient than CPUs (such as machine learning, rendering, transcoding, etc.)
- Spot instances – use spot instances, whenever your application supports sudden interruptions
- Schedule automatic start and stop of VMs – use scheduling capabilities (such as AWS Instance scheduler, Azure Start/Stop VMs, GCP start and stop virtual machine (VM) instances, etc.) to control the behavior of your workload VMs
- Managed services – prefer PaaS or managed services (such as databases, storage, load-balancers, and more)
- Data lifecycle management – use object storage (or file storage) lifecycle policies to archive or remove unused or unnecessary data
- Auto-scaling – use the cloud built-in capabilities to scale horizontally according to your application load
- Content Delivery Network – use CDN (such as Amazon CloudFront, Azure Content Delivery Network, Google Cloud CDN, etc.) to lower the amount of customer traffic to your publicly exposed workloads
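Two of the items above, right-sizing and idle hardware, lend themselves to a simple automated check. The sketch below classifies VMs by average CPU utilization; the thresholds and the sample metrics are illustrative assumptions, not vendor guidance.

```python
from statistics import mean


def recommend(vms, idle_below=5.0, oversized_below=30.0):
    """Classify each VM by its average CPU utilization (percent)."""
    result = {}
    for name, cpu_samples in vms.items():
        avg = mean(cpu_samples)
        if avg < idle_below:
            result[name] = "idle: stop or delete"
        elif avg < oversized_below:
            result[name] = "oversized: move to a smaller size"
        else:
            result[name] = "ok"
    return result


# Hypothetical CPU samples collected from a monitoring service.
metrics = {"web-1": [55, 70, 62], "batch-1": [1, 2, 1], "db-1": [12, 18, 15]}
print(recommend(metrics))
```

A real check would also consider memory, disk, and network metrics over a longer period before recommending a change.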
Summary
Sustainability and green computing are here to stay.
Although the large demand for cloud services has a huge environmental impact, I strongly believe that the use of cloud services is much more environmentally friendly than any use of legacy data centers, for the following reasons:
- Efficient hardware utilization (nearly 100% of hardware utilization)
- Fast hardware replacement (due to high utilization)
- Better energy use (high use of renewable energy sources to support the electricity requirements)
I advise all cloud customers to put sustainability higher in their design considerations.
Additional reading materials
- AWS Well-Architected Framework – Sustainability Pillar
https://docs.aws.amazon.com/wellarchitected/latest/sustainability-pillar/sustainability-pillar.html
- Microsoft Azure Well-Architected Framework – Sustainability
- Google Cloud – Design for environmental sustainability
https://cloud.google.com/architecture/framework/system-design/sustainability
Data protection in cloud services
Storing data in the cloud raises questions regarding data protection.
Data can be customers’ data (PII, healthcare data, credit cards, etc.), company data (financial information, trade secrets, security vulnerabilities, etc.), or any information with value to our organization.
As in the traditional data center, we still have concerns regarding who has access to our data and what they can do with the access provided.
In this blog post, I will review the required controls for protecting data stored in cloud services.
Data discovery and classification
The first action we need to take regarding sensitive data is discovery and classification.
Data classification is the action of assigning labels or categories to our data, such as public information, internal, confidential, highly confidential, etc.
Discovery tools allow us to detect where we store sensitive information in storage locations such as object storage, file storage, databases, and more.
Examples of services for the discovery process:
- Amazon Macie – discovers sensitive information stored in Amazon S3 buckets.
- Microsoft Purview – maps and discovers data on-premises and in the cloud.
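To make discovery and classification concrete, here is a deliberately simplified sketch. The two regular expressions and the mapping from findings to labels are assumptions made for this example; managed discovery services use far more robust detection.

```python
import re

# Simplified patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def discover(text):
    """Return the set of sensitive data types found in the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def classify(text):
    """Map findings to a label; the label scheme here is an assumption."""
    found = discover(text)
    if "card_number" in found:
        return "highly confidential"
    if found:
        return "confidential"
    return "internal"


print(classify("Contact: jane@example.com"))
print(classify("Card 4111 1111 1111 1111"))
```

The label assigned at this stage then drives the controls discussed in the rest of this post, such as who may access the data and how it is encrypted.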
Entitlement
Entitlement deals with the questions: who has access, to what resources, and what can they do with their access rights?
In any access request, we should always make sure the identity (human, service account, computer account, etc.) is authenticated against our system, preferably using a central identity provider.
Once the identity is authenticated against our system, we need to make sure it is granted only the privileges required to accomplish its desired task (such as view configuration, read customer data, update records, etc.), according to the principle of least privilege.
Entitlement combines authentication with authorization.
Examples of services for entitlements:
- AWS IAM Access Analyzer – detects AWS resources with permissions belonging to external identities and generates least privilege policies.
- Azure AD Identity Governance – assists in making sure an identity has the right access to the right resource.
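The combination of authentication and least-privilege authorization can be sketched as a deny-by-default check. The role names and the role-to-permission mapping are hypothetical; the action names reuse the examples mentioned earlier in this section.

```python
# Hypothetical roles; each one is granted only the actions it needs.
ROLE_PERMISSIONS = {
    "auditor": {"view_configuration"},
    "support": {"view_configuration", "read_customer_data"},
    "operator": {"view_configuration", "read_customer_data", "update_records"},
}


def is_allowed(identity, action):
    """Authorize only authenticated identities, and only for actions
    explicitly granted to their role (deny by default)."""
    if not identity.get("authenticated"):
        return False
    return action in ROLE_PERMISSIONS.get(identity.get("role"), set())


auditor = {"name": "alice", "role": "auditor", "authenticated": True}
print(is_allowed(auditor, "view_configuration"))
print(is_allowed(auditor, "update_records"))
```

An unknown role or an unauthenticated identity gets no permissions at all, which is the safe default.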
Encryption
To protect data, we need to protect it in every state in which it resides:
- Data in transit – all cloud services (such as object storage, file storage, and databases) support encryption in transit using the TLS protocol. Unlike the traditional data center, where encryption in transit was either not supported or required additional effort from our side, cloud services support encryption in transit by default, and in many cases we have no option to disable this feature.
- Data at rest – all cloud storage services (such as object storage, file storage, and databases) support encryption at rest using the AES-256 algorithm.
In the traditional data center, encryption key management and key rotation were challenging.
Today, most cloud providers allow us to choose between encryption at rest using encryption keys generated and managed by the cloud provider, or using encryption keys that we generate and control (to minimize the risk of a rogue cloud provider administrator having access to our data).
Examples of services for storing encryption keys and sensitive data:
- AWS KMS – controls the entire lifecycle of cryptographic keys.
- AWS Secrets Manager – controls the entire lifecycle of secrets, credentials, API keys, etc.
- Azure Key Vault – controls the entire lifecycle of cryptographic keys, secrets, credentials, API keys, etc.
- Data in use – even if we encrypt the data in transit and at rest, at some point the data must be accessible for reading or updating while it is in the memory of a server in the cloud. The technology that protects data in this state is commonly called “confidential computing”, which in most cases relies on hardware capabilities to encrypt data and make sure data in memory is kept confidential.
Examples of solutions that provide confidential computing capabilities:
- AWS Nitro Enclaves – isolates data stored in the memory of EC2 instances.
- Azure Confidential Computing – isolates data stored in the memory of virtual machines and Azure Kubernetes Service nodes.
Auditing and threat detection
The final action we need to take to protect data is to audit who accessed our data and to detect anomalous behavior in actions performed on our data.
Although it is considered a detective control, it is still an important phase in data control.
Examples of services that perform audit trails:
- AWS CloudTrail – records all API actions done on AWS services.
- Azure Monitor – records all operations done on Azure resources.
Now that we record all actions, we need a solution to review the logs and notify us about anomalous behavior that requires our attention.
Examples of threat detection services:
- Amazon GuardDuty – detects anomalies from (among other sources) CloudTrail logs.
- Microsoft Defender for Cloud – detects anomalies in actions conducted against services such as Azure SQL and Azure storage.
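The idea of reviewing audit records for anomalous behavior can be sketched with a very naive rule: flag any identity whose activity is far above its usual level. The event format, the baseline values, and the factor of 3 are assumptions for this example; managed threat detection services use much richer models.

```python
from collections import Counter


def flag_anomalies(events, baseline, factor=3):
    """Flag identities whose number of actions exceeds `factor` times
    their usual count; identities with no baseline are always flagged."""
    counts = Counter(e["identity"] for e in events)
    flagged = {}
    for identity, n in counts.items():
        usual = baseline.get(identity)
        if usual is None or n > factor * usual:
            flagged[identity] = n
    return flagged


# Hypothetical audit records and per-identity baselines.
events = ([{"identity": "alice"}] * 2
          + [{"identity": "bob"}] * 40
          + [{"identity": "mallory"}])
print(flag_anomalies(events, baseline={"alice": 5, "bob": 10}))
```

Flagged identities are only candidates for investigation; a human or a downstream system still has to decide whether the activity is malicious.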
Summary
In this blog post, I have reviewed the necessary controls for protecting data stored in the cloud.
It is essential to understand that to get effective protection for data stored in the cloud, we must combine strong encryption at rest (preferably with customer-managed encryption keys) with an entitlement process that enforces least privilege. We cannot rely on a single security control and pray that no unauthorized person will ever access our data.
Automation as key to cloud adoption success
After deploying several workloads in the public cloud, making mistakes, failing, fixing, and beginning to use the cloud for production workloads, it is now time to think about the next step in cloud adoption.
To be able to fully embrace the benefits of the public cloud, the scale, the elasticity, and the short time it takes to deploy new resources, it is time to put automation in place.
Automation allows us to do the same tasks over and over again, deploying the same configuration to multiple environments (Dev, Test, Prod) and get the same results – no human errors (assuming you have tested your code…)
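A small sketch shows why code gives the same result in every environment: one shared definition, plus an explicit list of the values that are allowed to differ. The setting names and values below are hypothetical.

```python
# Shared definition applied to every environment.
BASE = {"image": "app-image:1.1", "port": 8080, "encrypted_disk": True}

# Only the values that legitimately differ per environment are listed here.
OVERRIDES = {
    "dev": {"instances": 1, "size": "small"},
    "test": {"instances": 2, "size": "small"},
    "prod": {"instances": 6, "size": "large"},
}


def render(environment):
    """Build the full deployment definition for one environment."""
    return {"environment": environment, **BASE, **OVERRIDES[environment]}


for env in ("dev", "test", "prod"):
    print(render(env))
```

Because `render` is a pure function of its input, running it twice for the same environment always produces the same definition, which is the property manual changes cannot guarantee.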
Automation can be achieved in various ways: using the CLI, using the cloud vendor’s SDK (in languages such as Python, Go, Java, and more), or using Infrastructure as Code (such as Terraform, AWS CloudFormation, Azure Resource Manager, and more).
In this article, we shall review some of the common alternatives for automation using code.
Why use code?
The clear benefit of using code for automation is change management. Choose your favorite source control (such as GitHub, AWS CodeCommit, Azure Repos, and more), upload your scripts, and you get a version history of your code and a record of who changed what at each stage.
Another benefit of using code for automation is the fact that the Internet is full of samples you can find to automate (almost) anything in your cloud environment.
The downside of doing everything using code is the learning curve: your organization’s IT or DevOps teams need to learn new languages. Once they pass this stage, you get all the benefits of the scripting languages.
Automation – the AWS way
If AWS is your sole cloud provider, you should learn and start using the following built-in services or capabilities offered by AWS:
Infrastructure as Code
- AWS CloudFormation – The built-in IaC for deploying and managing AWS resources.
Reference: https://github.com/aws-cloudformation/aws-cloudformation-samples
- AWS Cloud Development Kit (AWS CDK) – Ability to generate CloudFormation templates from code written in common programming languages such as Python, Java, .NET, and more.
Reference: https://github.com/aws-samples/aws-cdk-examples
Policy as Code
- Service control policies (SCPs) – Managing permissions in AWS Organizations.
Reference: https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps_examples.html
CI/CD pipeline
- AWS CodePipeline – A fully managed continuous delivery service.
Reference: https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials.html
Containers and Kubernetes
- Amazon ECS – Container management service based on the AWS platform.
Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/example_task_definitions.html
- Amazon Elastic Kubernetes Service (EKS) – Managed Kubernetes service.
Reference: https://github.com/aws-quickstart/quickstart-amazon-eks
Automation – the Azure way
If Azure is your sole cloud provider, you should learn and start using the following built-in services or capabilities offered by Azure:
Infrastructure as Code
- Azure Resource Manager templates (ARM templates) – The built-in IaC for deploying and managing Azure resources.
Reference: https://github.com/Azure/azure-quickstart-templates
- Bicep – Declarative language for deploying Azure resources.
Reference: https://github.com/Azure/azure-docs-bicep-samples
Policy as Code
- Azure Policy – Enforce organizational standards across the Azure organization.
Reference: https://github.com/Azure/azure-policy
CI/CD pipeline
- Azure Pipelines – A fully managed continuous delivery service.
Reference: https://github.com/microsoft/azure-pipelines-yaml
Containers and Kubernetes
- Azure Container Instances – Container management service based on the Azure platform.
Reference: https://docs.microsoft.com/en-us/samples/browse/?products=azure&terms=container%2Binstance
- Azure Kubernetes Service (AKS) – Managed Kubernetes service.
Reference: https://github.com/Azure/AKS
Automation – the Google Cloud way
If GCP is your sole cloud provider, you should learn and start using the following built-in services or capabilities offered by GCP:
Infrastructure as Code
- Google Cloud Deployment Manager – The built-in IaC for deploying and managing GCP resources.
Reference: https://github.com/GoogleCloudPlatform/deploymentmanager-samples
Policy as Code
- Google Organization Policy Service – Programmatic control over the organization’s cloud resources.
CI/CD pipeline
- Google Cloud Build – A fully managed continuous delivery service.
Reference: https://github.com/GoogleCloudPlatform/cloud-build-samples
Containers and Kubernetes
- Google Kubernetes Engine (GKE) – Managed Kubernetes service.
Reference: https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
Automation – the cloud agnostic way
If you plan for the future, plan for multi-cloud. Look for solutions that can connect to multiple cloud environments, so that your DevOps team does not have to learn a different scripting language for each provider and can deploy workloads on several cloud environments.
Infrastructure as Code
- HashiCorp Terraform – The most widely used IaC for deploying and managing resources both in the cloud and on-premises.
Reference: https://registry.terraform.io/browse/providers
Policy as Code
- HashiCorp Sentinel – Policy as code framework that complements Terraform code.
Reference: https://www.terraform.io/cloud-docs/sentinel/examples
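The general idea of policy as code, independent of any product, is that rules are written as code and evaluated against a planned change before it is deployed. The sketch below is plain Python, not Sentinel syntax; the resource fields and the two rules are hypothetical.

```python
def no_public_storage(resource):
    """Rule: storage buckets must not be publicly accessible."""
    if resource["type"] == "storage_bucket" and resource.get("public", False):
        return "storage bucket must not be public"
    return None


def encryption_required(resource):
    """Rule: storage buckets and disks must be encrypted at rest."""
    if (resource["type"] in {"storage_bucket", "disk"}
            and not resource.get("encrypted", False)):
        return "encryption at rest must be enabled"
    return None


POLICIES = [no_public_storage, encryption_required]


def evaluate(plan):
    """Return (resource name, violation) pairs; an empty list means pass."""
    violations = []
    for resource in plan:
        for policy in POLICIES:
            message = policy(resource)
            if message:
                violations.append((resource["name"], message))
    return violations


# Hypothetical planned resources.
plan = [
    {"name": "logs", "type": "storage_bucket", "public": True, "encrypted": True},
    {"name": "data-disk", "type": "disk", "encrypted": False},
    {"name": "web", "type": "vm"},
]
print(evaluate(plan))
```

In a pipeline, a non-empty result would block the deployment, so the organizational standard is enforced before any resource is created.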
CI/CD pipeline
- Jenkins – The most widely used open-source CI/CD tool.
Reference: https://www.jenkins.io/doc/pipeline/examples/
Containers and Kubernetes
- Docker – The most widely used container run-time for deploying applications.
Reference: https://github.com/dockersamples
- Kubernetes – The most widely used container orchestration open-source platform.
Reference: https://github.com/kubernetes/examples
Summary
In this post, I have reviewed the most common solutions that allow you to automate your workloads’ deployment, management, and maintenance using various scripting languages.
Some of the solutions are bound to a specific cloud provider, while others are considered cloud agnostic.
Use automation to fully embrace the power and benefits of the public cloud.
If you don’t have experience writing code, take the time to learn. The more you practice, the more experience you will gain.
As Werner Vogels, the Amazon CTO, always says – “Go Build”.
Cloud and the shared responsibility model misconceptions
One of the most common concepts when working with cloud services is the “Shared responsibility model”.
The model aims to set the responsibility boundaries between the cloud service provider (CSP) and the cloud service consumer, depending on the cloud service model (IaaS, PaaS, SaaS).
In this post, I will review common misconceptions regarding the shared responsibility model.
Misconception #1 — My cloud provider’s certifications allow me to comply with regulations
This is a common misconception for companies (and new SaaS providers) who fail to understand the shared responsibility model while deploying their first workload.
Reviewing cloud providers’ compliance pages, we can see that the providers have already certified themselves for most regulations and local laws, and in some cases even offer customers special environments that are already in compliance with regulations such as PCI-DSS or HIPAA.
If you are planning to store sensitive customers’ data (such as PII, healthcare, financial, or any other types of sensitive data) in a public cloud, keep in mind that according to the shared responsibility model, the cloud provider is responsible only for the lower layers of the architecture:
· IaaS — the CSP is responsible for all layers, from the physical layer to the virtualization layer
· PaaS — the CSP is responsible for all layers, from the physical layer to the guest operating system, middleware, and even runtime
· SaaS — the CSP is responsible for all layers, from the physical layer to the application layer
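The layer boundaries listed above can be expressed as a small lookup. The exact ordering of the layers and the placement of a separate "data" layer at the top are assumptions made for this sketch; the topmost CSP layer per service model follows the list above.

```python
# Layer order (bottom to top) is an assumption made for this sketch.
LAYERS = ["physical", "virtualization", "guest_os", "middleware",
          "runtime", "application", "data"]

# Topmost layer the CSP is responsible for, per the list above.
CSP_TOP_LAYER = {"IaaS": "virtualization", "PaaS": "runtime",
                 "SaaS": "application"}


def responsible_party(service_model, layer):
    """Return "CSP" or "customer" for a layer under a given service model."""
    boundary = LAYERS.index(CSP_TOP_LAYER[service_model])
    return "CSP" if LAYERS.index(layer) <= boundary else "customer"


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, {layer: responsible_party(model, layer) for layer in LAYERS})
```

Whatever the service model, the data layer comes out as the customer's responsibility, which is exactly the point a provider's certifications do not cover.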
Bottom line: the fact that a CSP has all the relevant certifications means almost nothing when talking about compliance with regulations or protecting customers’ data.
Each organization storing sensitive data in the cloud must conduct a risk assessment, review which data is stored in the cloud (before storing data in the cloud), and set the proper controls to protect customers’ data.
Misconception #2 — Who is responsible for protecting my data?
When customers (either organizations or personal customers) store their data in public cloud services, they sometimes mistakenly think that if they store their data in one of the major CSPs, their data is protected.
This is a misconception.
All major CSPs offer their customers a large variety of services and tools to protect their data (such as network access control lists, encryption in transit and at rest, authentication, authorization, auditing, and more). However, according to the shared responsibility model, it is up to the customer (mostly organizations storing their data in the cloud) to decide which security controls to implement.
In most cases, the CSPs don’t have access to customers’ data stored in the cloud, whether organizations decide to use managed storage services (from object storage to managed CIFS/NFS services), managed database services (from relational databases to NoSQL databases) and more.
The most obvious exception to the above is SaaS services, where we allow CSP service accounts access to our data, to allow us to perform queries, get insights about our data, or even perform regular backups. This access is mostly restricted to specific actions and to a specific role or service account, and usually shouldn’t be used by the CSP employees.
At the end of the day, the customer is always the data owner, and as a data owner, the customer must decide whether or not to store sensitive data in the cloud, who should have access to the data stored in the cloud, which access rights to grant for reading, updating, or deleting the data, and more.
Misconception #3 — Availability is not my concern since the cloud is highly available by design
The above headline is true, mainly for major SaaS services.
When looking at availability and building highly available architectures, specifically in IaaS and PaaS, it is up to us, as organizations, to use the services and the service capabilities that CSPs offer us, to build highly available solutions.
If we deploy our application on a VM or store our data in a managed database service, but fail to deploy everything behind a load-balancer or in a cluster, we will not get the availability that our customers expect.
Even if we are using managed object storage services and we choose a low redundancy tier, using a single availability zone, the CSP does not guarantee high availability.
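A short calculation shows why redundancy matters. The availability figures below are hypothetical, the load-balancer itself is ignored, and the formulas assume that components fail independently, which is a simplification.

```python
import math


def in_series(*availabilities):
    """Every component is required, so availabilities multiply."""
    return math.prod(availabilities)


def redundant(availability, copies):
    """Any one of `copies` independent replicas is sufficient."""
    return 1 - (1 - availability) ** copies


VM = 0.99         # hypothetical availability of a single VM
DATABASE = 0.999  # hypothetical availability of the database tier

single_vm = in_series(VM, DATABASE)
two_vms = in_series(redundant(VM, 2), DATABASE)
print(f"single VM: {single_vm:.4%}, two VMs: {two_vms:.4%}")
```

With these numbers, a second VM raises the compute tier from 99% to 99.99%, after which the database becomes the limiting component.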
To achieve high availability for our workloads, we need to review cloud providers’ documentation, such as the “Well-Architected Frameworks”, and design our workloads to fit business needs.
Misconception #4 — Incident response in the cloud is an impossible mission
This part is a little bit tricky.
As AWS always mentions, the provider is responsible for the security of the cloud. This means the provider handles the incident response process of the cloud infrastructure: the physical data center, the host OS, the network equipment, the virtualization, and all the managed services.
We, as customers of cloud services, are responsible for security within our cloud environments.
In IaaS, everything within the guest OS is our responsibility as customers of the cloud.
It is our responsibility to enable auditing as much as possible, and send all logs to a central log repository and from there to our SIEM system (whether it is located on-premise or in a managed cloud service).
There are also documented procedures for building a forensics environment, made out of snapshots of our VMs or databases, for further analysis.
It is not perfect; we still don’t control the entire flow of the packet from the lower network layers to the application layer, and on managed PaaS services we only have audit logs and we can’t perform memory analysis of managed services (such as databases).
In SaaS services, it gets even worse: in the best case, the SaaS provider is mature enough to let us pull audit logs through an API and send them to our SIEM system for further analysis. Unfortunately, not all SaaS providers are mature enough to give us access to the audit logs.
Bottom line: challenging, but not completely impossible, depending on the cloud service model and the maturity of the cloud provider.
Summary
It is important to understand the shared responsibility model, but what is more important is to understand the cloud service model and services or tools available for us, to enable us to build secure and highly available cloud environments.
References
· AWS Compliance Programs
https://aws.amazon.com/compliance/programs
· Azure compliance documentation
https://docs.microsoft.com/en-us/azure/compliance
· GCP Compliance offerings
https://cloud.google.com/security/compliance/offerings
· AWS Well-Architected Framework
https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html
· Forensic investigation environment strategies in the AWS Cloud
https://aws.amazon.com/blogs/security/forensic-investigation-environment-strategies-in-the-aws-cloud
· Computer forensics chain of custody in Azure
https://docs.microsoft.com/en-us/azure/architecture/example-scenario/forensics
Introduction to Policy as Code
Building our first environment in the cloud, or perhaps migrating our first couple of workloads to the cloud is fairly easy until we begin the ongoing maintenance of the environment.
Pretty soon we start to realize we are losing control over our environment – from configuration changes, forgetting to implement security best practices, and more.
At this stage, we wish we could go back, rebuild everything from scratch, and set much stricter rules for creating new resources and their configuration.
Manual configuration simply doesn’t scale.
Developers would like to focus on what they do best – developing new products or features, while security teams would like to enforce guard rails, allowing developers to do their work, while still enforcing security best practices.
In the past couple of years, one of the hottest topics has been Infrastructure as Code, a declarative way to deploy new environments using code (mostly in JSON or YAML format).
Infrastructure as Code is a good solution for deploying a new environment, or even for reusing some of the code to deploy several environments. However, it is meant for a specific task.
What happens when we would like to set guard rails on an entire cloud account or even on our entire cloud organization environment, containing multiple accounts, which may expand or change daily?
This is where Policy as Code comes into the picture.
Policy as Code allows you to write high-level rules and assign them to an entire cloud environment, to be effective on any existing or new product or service we deploy or consume.
Policy as Code allows security teams to define security, governance, and compliance policies according to business needs and assign them at the organizational level.
The easiest way to explain it is – can user X perform action Y on resource Z?
A more practical example from the AWS realm – block the ability to create a public S3 bucket. Once the policy is set and assigned, security teams won’t need to worry whether or not someone made a mistake and left a publicly accessible S3 bucket – the policy will simply block this action.
Looking for a code example to achieve the above goal? See:
https://aws-samples.github.io/aws-iam-permissions-guardrails/guardrails/scp-guardrails.html#scp-s3-1
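The question "can user X perform action Y on resource Z?" can also be illustrated with a toy evaluator. This is only a minimal sketch in Python, not an AWS API: the `Rule` type, the `is_allowed` function, and the `storage:*` action names are hypothetical, and real policy engines support far richer matching and conditions. It assumes the common convention that an explicit deny overrides any allow, and that anything not explicitly allowed is denied.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    effect: str      # "Allow" or "Deny"
    principal: str   # glob pattern matched against the user
    action: str      # glob pattern matched against the action
    resource: str    # glob pattern matched against the resource


def is_allowed(rules, user, action, resource):
    """Return True only if some Allow rule matches and no Deny rule matches."""
    allowed = False
    for rule in rules:
        matches = (
            fnmatchcase(user, rule.principal)
            and fnmatchcase(action, rule.action)
            and fnmatchcase(resource, rule.resource)
        )
        if not matches:
            continue
        if rule.effect == "Deny":
            return False  # an explicit deny always wins
        allowed = True
    return allowed


# Hypothetical guard rail: anyone may manage buckets, nobody may make one public.
RULES = [
    Rule("Allow", "*", "storage:*", "bucket/*"),
    Rule("Deny", "*", "storage:MakePublic", "bucket/*"),
]

print(is_allowed(RULES, "alice", "storage:CreateBucket", "bucket/reports"))
print(is_allowed(RULES, "alice", "storage:MakePublic", "bucket/reports"))
```

The point of the sketch is that the rule set is assigned once, centrally, and every request is checked against it, regardless of who deployed the resource.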
Policy as Code on AWS
When designing a multi-account environment based on the AWS platform, you should use AWS Control Tower.
AWS Control Tower aims to assist organizations in deploying multiple AWS accounts under the same AWS organization, with the ability to deploy policies (or Service Control Policies) from a central location, allowing you to have the same policies for every newly created AWS account.
Example of governance policy:
- Enabling resource creation in a specific region – this capability will allow European customers to restrict resource creation in regions outside Europe, to comply with the GDPR.
- Allow only specific EC2 instance types (to control cost).
Example of security policies:
- Prevent the upload of unencrypted objects to an S3 bucket, to protect access to sensitive objects.
https://aws-samples.github.io/aws-iam-permissions-guardrails/guardrails/scp-guardrails.html#scp-s3-2
- Deny the use of the Root user account (least privilege best practice).
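As an illustration of the region restriction listed among the governance examples above, here is a minimal sketch that builds such a policy document in Python. It assumes the usual Service Control Policy JSON layout and the `aws:RequestedRegion` condition key; the function name, the chosen regions, and the list of exempted global services are illustrative assumptions, so check the AWS Organizations documentation before using anything like it.

```python
import json


def region_restriction_scp(allowed_regions, exempt_actions=()):
    """Build a deny-by-region policy document as a plain dictionary."""
    statement = {
        "Sid": "DenyOutsideAllowedRegions",
        "Effect": "Deny",
        "Resource": "*",
        "Condition": {
            "StringNotEquals": {"aws:RequestedRegion": sorted(allowed_regions)}
        },
    }
    if exempt_actions:
        # Global services are usually exempted so that they keep working.
        statement["NotAction"] = sorted(exempt_actions)
    else:
        statement["Action"] = "*"
    return {"Version": "2012-10-17", "Statement": [statement]}


# Illustrative values only: two European regions and a few global services.
policy = region_restriction_scp(
    allowed_regions=["eu-west-1", "eu-central-1"],
    exempt_actions=["iam:*", "organizations:*", "support:*"],
)
print(json.dumps(policy, indent=2))
```

Generating the document from code rather than editing JSON by hand makes it easier to keep the same rule consistent across many accounts and to review changes to it.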
AWS Control Tower allows you to configure baseline policies using CloudFormation templates, over an entire AWS organization, or on a specific AWS account.
To further assist in writing CloudFormation templates and service control policies at scale, AWS offers some additional tools:
Customizations for AWS Control Tower (CfCT) – the ability to customize AWS accounts and OUs, and to make sure governance and security policies remain in sync with security best practices.
AWS CloudFormation Guard – the ability to check CloudFormation templates for compliance against pre-defined policies.
Summary
Policy as Code allows an organization to automate the deployment of governance and security policies at scale, keeping AWS organizations and accounts secure, while allowing developers to invest their time in developing new products, with minimal changes to their code required to comply with organizational policies.
References
- Best Practices for AWS Organizations Service Control Policies in a Multi-Account Environment
- AWS IAM Permissions Guardrails
https://aws-samples.github.io/aws-iam-permissions-guardrails/guardrails/scp-guardrails.html
- AWS Organizations – general examples
- Customizations for AWS Control Tower (CfCT) overview
https://docs.aws.amazon.com/controltower/latest/userguide/cfct-overview.html
- Policy-as-Code for Securing AWS and Third-Party Resource Types
https://aws.amazon.com/blogs/mt/policy-as-code-for-securing-aws-and-third-party-resource-types/
Journey for writing my first book about cloud security
My name is Eyal, and I am a cloud architect.
I have been in the IT industry since 1998 and began working with public clouds in 2015.
Over the years I have gained hands-on experience working on the infrastructure side of AWS, Azure, and GCP.
The more I worked with the various services from the three major cloud providers, the more I had the urge to compare the cloud providers’ capabilities, and I have shared several blog posts comparing the services.
In 2021 I was approached by PACKT publishing after they came across one of my blog posts on social media, and they offered me the opportunity to write a book about cloud security, comparing AWS, Azure, and GCP services and capabilities.
Over the years I have published many blog posts through social media and public websites, but this was my first experience writing an entire book with the support and assistance of a well-known publisher.
As with any previous article, I began by writing down each chapter title and main headlines for each chapter.
Once the chapters were approved, I moved on to write the actual chapters.
For each chapter, I first wrote down the headlines and then began filling them with content.
Before writing each chapter, I researched the subject, collected references from the vendors’ documentation, and looked for security best practices.
Once I had completed a chapter, I submitted it for review by the PACKT team.
The PACKT team, together with external reviewers, sent me their input: things to change, additional material to add, requests for relevant diagrams, and more.
Since copyright and plagiarism are important topics to take care of while writing a book, I prepared my own diagrams and submitted them to PACKT.
Finally, after a lot of review and corrections, which took almost a year, the book draft was submitted to another external reviewer and once comments were fixed, the work on the book (at least from my side as an author) was completed.
From my perspective, the book is unique in that it does not focus on a single public cloud provider, but constantly compares the three major cloud providers.
From a reader’s point of view or someone who only works with a single cloud provider, I recommend focusing on the relevant topics according to the target cloud provider.
For each topic, I made a list of best practices, which can also be used as a checklist for securing the cloud provider’s environment, and for each recommendation I have added a reference for further reading from the vendors’ documentation.
If you are interested in learning how to secure cloud environments based on AWS, Azure, or GCP, my book is available for purchase in one of the following book stores:
- Amazon:
https://www.amazon.com/Cloud-Security-Handbook-effectively-environments/dp/180056919X
- Barnes & Noble:
https://www.barnesandnoble.com/w/cloud-security-handbook-eyal-estrin/1141215482?ean=9781800569195
- PACKT
https://www.packtpub.com/product/cloud-security-handbook/9781800569195
Introduction to cloud financial management on AWS
Cloud financial management (sometimes also referred to as FinOps) is about managing the ongoing cost of cloud services.
Who should care about cloud financial management? Basically, anyone consuming IaaS or PaaS services – from IT, DevOps, developers, and architects to, naturally, the finance department.
When we start consuming IaaS or PaaS services, we realize that almost every service has its own pricing model – we just need to read the service documentation.
Some services’ pricing models are easy to understand, such as EC2 (you pay for the amount of time an EC2 instance was up and running), and some services’ pricing models can be harder to calculate, such as a serverless function (you pay for the number of times the function was called in a month and for its run time, priced according to the amount of memory allocated to the function).
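To make the difference between the two pricing models concrete, here is a small worked calculation in Python. It is a sketch under stated assumptions: the function is assumed to be billed per request and per GB-second of run time, and every price below is a made-up placeholder, not a real AWS rate. Take the actual numbers from the pricing page of the service you use.

```python
def monthly_instance_cost(hours_running, price_per_hour):
    """Time-based model: pay for every hour the instance is up."""
    return hours_running * price_per_hour


def monthly_function_cost(invocations, avg_duration_seconds, memory_gb,
                          price_per_million_requests, price_per_gb_second):
    """Usage-based model: pay per request plus run time weighted by memory."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    gb_seconds = invocations * avg_duration_seconds * memory_gb
    return request_cost + gb_seconds * price_per_gb_second


# Placeholder prices, chosen only to keep the arithmetic easy to follow.
instance = monthly_instance_cost(hours_running=730, price_per_hour=0.10)
function = monthly_function_cost(
    invocations=2_000_000,
    avg_duration_seconds=0.5,
    memory_gb=0.5,
    price_per_million_requests=0.20,
    price_per_gb_second=0.00002,
)
print(f"instance: {instance:.2f}  function: {function:.2f}")
```

The instance cost depends only on uptime, while the function cost changes with traffic, run time, and memory size, which is why it is harder to predict in advance.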
In this post, we will review the tools that AWS offers us to manage cost.
Step 1 – Cost management for beginners
The first thing that AWS recommends for new customers is to use Amazon CloudWatch to create billing alarms.
Even if you cannot estimate your monthly cost, create a billing alarm (for example – send me an email whenever the charges are above $200). As time goes by, you will be able to adjust the value to match your account usage pattern.
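The billing alarm described above could be sketched as follows. The helper below only builds the parameters for the CloudWatch `put_metric_alarm` call; the function name, alarm name, and SNS topic ARN are placeholders, and the sketch assumes that billing alerts are already enabled for the account and that the SNS topic delivers email to you.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn, name="monthly-billing-alarm"):
    """Parameters for an alarm on the account's estimated charges."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder account ID and topic name.
params = billing_alarm_params(200, "arn:aws:sns:us-east-1:123456789012:billing-alerts")

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

Keeping the threshold as a parameter makes it easy to adjust the value later, once the usage pattern of the account becomes clearer.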
To read more information about billing alarms, see:
If you already know that a certain department is using a specific AWS account and has a known budget, use AWS Budgets to create a monthly, quarterly, or even yearly budget, and configure it to send you notifications whenever the amount of money consumed is above a certain threshold of your pre-defined budget.
To read more about AWS budget creation, see:
https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-create.html
If you wish to visualize your resource consumption over a period of time, see trends, generate reports, and customize the resource consumption information, use AWS Cost Explorer.
To read more about AWS Cost Explorer, see:
https://docs.aws.amazon.com/cost-management/latest/userguide/ce-what-is.html
Finally, if you wish to receive recommendations about saving costs, there is an easy tool called AWS Trusted Advisor.
The tool gives you recommendations about cost optimization, performance, security, and more.
It is the easiest way to get insights about how to save cost on the AWS platform.
To read more about AWS Trusted Advisor, see:
https://aws.amazon.com/premiumsupport/knowledge-center/trusted-advisor-cost-optimization
Step 2 – Resource tagging and rightsizing
One of the best ways to detect and monitor cost over time and per business case (project, division, environment, etc.) is to use tagging.
You add a descriptive tag to each and every resource you create, which will later allow you to know which resources have been consumed by whom – for example, which EC2 instances, public IPs, S3 buckets, and RDS instances all relate to the same project.
For more information about AWS cost allocation tags, see:
https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html
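The idea behind tag-based cost allocation can be sketched in a few lines of Python. The resource records, the `project` tag key, and the cost figures below are invented for illustration; in practice the data would come from your billing reports.

```python
from collections import defaultdict


def cost_by_tag(resources, tag_key, untagged_label="(untagged)"):
    """Sum the monthly cost of resources, grouped by the value of one tag."""
    totals = defaultdict(float)
    for resource in resources:
        label = resource.get("tags", {}).get(tag_key, untagged_label)
        totals[label] += resource["monthly_cost"]
    return dict(totals)


# Invented sample data: three tagged resources and one forgotten public IP.
RESOURCES = [
    {"id": "i-0001", "type": "ec2", "monthly_cost": 70.0, "tags": {"project": "web-shop"}},
    {"id": "bucket-a", "type": "s3", "monthly_cost": 12.5, "tags": {"project": "web-shop"}},
    {"id": "db-01", "type": "rds", "monthly_cost": 140.0, "tags": {"project": "analytics"}},
    {"id": "eip-01", "type": "eip", "monthly_cost": 3.5, "tags": {}},
]

for project, total in cost_by_tag(RESOURCES, "project").items():
    print(f"{project}: {total:.2f}")
```

The "(untagged)" bucket is worth watching: it shows how much spend cannot be attributed to any project because a tag is missing.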
If you manage multiple AWS accounts that all belong to the same AWS organization, it is considered best practice to collect all account costs in a single place, also known as consolidated billing.
You define which AWS account will store the billing information, and redirect all AWS accounts in your organization to this central account.
Using consolidated billing will allow you to achieve volume discounts, for example – a volume discount for the total data transferred from multiple AWS accounts to the Internet, instead of a separate charge per AWS account.
For more information about consolidated billing, see:
https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html
When using compute services such as Amazon EC2 or Amazon RDS, you might be wasting money by not choosing the right size (amount of memory/CPU) for your actual resource demand (for example – paying for a large instance that is underutilized).
Tools such as AWS Trusted Advisor, mentioned earlier, will give you insights and recommend changing the instance size to save money.
Another tool that can help you choose an optimal size for your instances is AWS Compute Optimizer, which scans your AWS environment and generates recommendations for optimizing your compute resources.
For more information about AWS Compute Optimizer, see:
https://docs.aws.amazon.com/compute-optimizer/latest/ug/getting-started.html
Even when using storage services such as Amazon S3, you can save money by using the right storage class for the actual use (for example, Amazon S3 Standard for big data analytics, Amazon S3 Glacier for archive, etc.).
There are two options for optimizing S3 cost:
- Using lifecycle policies, you configure how many days after its creation an object moves from its current storage class to a cheaper tier (until the object finally moves into a deep archive tier or is even deleted completely).
For more information about setting lifecycle policies, see:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html
- Using S3 Intelligent-Tiering, objects automatically move to the most cost-effective storage tier according to their access frequency. Unlike lifecycle policies, objects can move in both directions: from hot storage (such as S3) to archive storage (such as S3 Glacier or deep archive), and back to a hot tier (such as S3) if an object in an archive tier is suddenly accessed.
For more information about S3 Intelligent-tiering, see:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-intelligent-tiering.html
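The first option above, a lifecycle policy, could look roughly like the following sketch. It builds the configuration in the dictionary shape that the S3 `put_bucket_lifecycle_configuration` call expects; the helper name, rule ID, prefix, bucket name, and day counts are illustrative assumptions, and the storage classes should be checked against the current S3 documentation.

```python
def lifecycle_configuration(rule_id, prefix, transitions, expire_after_days=None):
    """Build an S3 lifecycle configuration with one tiering rule."""
    rule = {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in sorted(transitions)
        ],
    }
    if expire_after_days is not None:
        # Delete the object completely once it reaches this age.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Illustrative schedule: cheaper tiers after 30, 90 and 180 days, deletion after two years.
config = lifecycle_configuration(
    rule_id="archive-old-logs",
    prefix="logs/",
    transitions=[(30, "STANDARD_IA"), (90, "GLACIER"), (180, "DEEP_ARCHIVE")],
    expire_after_days=730,
)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

A lifecycle rule is a good fit when the access pattern is known in advance; when it is not, the second option (Intelligent-Tiering) avoids having to guess the day counts.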
Another simple tip for saving cost is to remove unused resources – underutilized EC2 instances, unassigned public IP addresses, unattached EBS volumes, etc.
AWS Trusted Advisor can help you discover underutilized or unused resources.
For more information, see:
Step 3 – Get to know your workloads (cloud optimization)
When you deploy your workload for the first time, you don’t have enough information about its potential usage and cost.
You might choose an instance type that is too small or too large, you might be using a storage tier that is too expensive, etc.
One of the ways to save cost on development or test environments, which might not need to run over weekends or after working hours, is to use AWS Instance Scheduler – a combination of tagging and a Lambda function, which allows you to schedule instances (both EC2 and RDS) to shut down at pre-defined hours.
For more information about AWS instance scheduler, see:
https://aws.amazon.com/premiumsupport/knowledge-center/stop-start-instance-scheduler
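The scheduling idea can be sketched as a simple decision function. This is not how AWS Instance Scheduler is implemented; it is a hypothetical illustration in which each instance carries a schedule name in a tag, and a periodic job asks whether the instance should be running right now. The schedule name and the working hours are assumptions.

```python
from datetime import datetime, time

# Hypothetical schedule definitions, keyed by the value of a schedule tag.
SCHEDULES = {
    "office-hours": {
        "days": {0, 1, 2, 3, 4},  # Monday to Friday
        "start": time(8, 0),
        "stop": time(18, 0),
    },
}


def should_be_running(schedule_name, moment):
    """Decide whether an instance tagged with schedule_name should be up."""
    schedule = SCHEDULES.get(schedule_name)
    if schedule is None:
        return True  # instances without a known schedule are left alone
    return (
        moment.weekday() in schedule["days"]
        and schedule["start"] <= moment.time() < schedule["stop"]
    )


print(should_be_running("office-hours", datetime(2024, 1, 8, 10, 0)))   # a Monday morning
print(should_be_running("office-hours", datetime(2024, 1, 13, 10, 0)))  # a Saturday morning
```

A periodic job would compare this answer with the actual instance state and start or stop the instance accordingly.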
If your workload can survive a sudden shutdown and resume from the moment it stopped (such as video rendering, HPC workloads for genomic sequencing, etc.) and you wish to save money, use EC2 Spot Instances, which allow you to save up to 90% of the cost compared to the on-demand price.
For more information about Spot instances, see:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html#spot-get-started
If your workload has the same usage pattern for a long period of time (without shutdown or restart), consider one of the following options:
- Amazon EC2 Reserved Instances – allows you to reserve capacity for 1 or 3 years in advance, and save up to 72% of the on-demand price.
For more information, see:
https://aws.amazon.com/ec2/pricing/reserved-instances/buyer
- Compute savings plans – commitment to use EC2 instances, regardless of instance family, size, AZ or region. Allows saving up to 66% of the on-demand price.
- EC2 instance saving plans – commitment to use specific instance family in specific region. Allows saving up to 72% of on-demand price.
For more information, see:
https://aws.amazon.com/savingsplans/faq/#Compute_.26_EC2_Instances_Savings_Plans
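To get a feeling for what these percentages mean, the following sketch compares the monthly cost of one always-on instance under each option. The hourly price is a made-up placeholder, and the percentages are the upper bounds quoted above, so real savings will usually be lower and depend on instance type, region, and commitment terms.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(on_demand_hourly, discount_percent=0.0, hours=HOURS_PER_MONTH):
    """Monthly cost after applying a percentage discount to the on-demand price."""
    return on_demand_hourly * hours * (1 - discount_percent / 100)


# Upper-bound discounts mentioned in the text; the hourly price is a placeholder.
OPTIONS = {
    "On-demand": 0,
    "Compute savings plan (up to)": 66,
    "Reserved / EC2 instance savings plan (up to)": 72,
    "Spot (up to)": 90,
}

for name, discount in OPTIONS.items():
    print(f"{name}: {monthly_cost(0.10, discount):.2f}")
```

The comparison only covers price; Spot capacity can be interrupted and the commitment-based options lock you in for 1 or 3 years, so the cheapest line is not automatically the right choice.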
Summary
In this introductory post, we have reviewed the most common tools from AWS for detecting, managing, and optimizing cost.
Using automated tools allows organizations to optimize their resource consumption cost over time, at large scale, and in constantly changing environments.
Not all cloud providers are built the same
When organizations debate workload migration to the cloud, they begin to realize the number of public cloud alternatives that exist, both U.S hyper-scale cloud providers and several small to medium European and Asian providers.
The more we study the differences between the cloud providers (both IaaS/PaaS and SaaS providers), the more we realize that not all cloud providers are built the same.
How can we select a mature cloud provider from all the alternatives?
Transparency
Mature cloud providers make sure you don’t have to search their website to locate their security compliance documents, and they allow you to download documentation of their security controls, such as SOC 2 Type II, CSA STAR, CSA Cloud Controls Matrix (CCM), etc.
What happens if we wish to evaluate the cloud provider by ourselves?
Will the cloud provider (no matter what cloud service model) allow me to conduct a security assessment (or even a penetration test), to check the effectiveness of its security controls?
Global presence
When evaluating cloud providers, ask yourself the following questions:
- Does the cloud provider have a local presence near my customers?
- Will I be able to deploy my application in multiple countries around the world?
- In case of an outage, will I be able to continue serving my customers from a different location with minimal effort?
Scale
Deploying an application for the first time, we might not think about it, but what happens in the peak scenario?
Will the cloud provider allow me to deploy hundreds or even thousands of VMs (or even better, containers), in a short amount of time, for a short period, from the same location?
Will the cloud provider allow me infinite scale to store my data in cloud storage, without having to guess or estimate the storage size?
Multi-tenancy
As customers, we expect our cloud providers to offer us a fully private environment.
We never want to hear about a “noisy neighbor” (where one customer is using a lot of resources, which eventually affects other customers), and we never want to hear a provider admit that some or all of the resources (VMs, databases, storage, etc.) are being shared among customers.
Will the cloud provider be able to commit to isolating my resources from other customers in its multi-tenant environment?
Stability
One of the major reasons for migrating to the cloud is the ability to re-architect our services, whether we are still using VMs based on IaaS, databases based on PaaS, or fully managed CRM services based on SaaS.
In all scenarios, we would like to have a stable service with zero downtime.
Will the cloud provider allow me to deploy a service in a redundant architecture, that will survive data center outage or infrastructure availability issues (from authentication services, to compute, storage, or even network infrastructure) and return to business with minimal customer effect?
APIs
In the modern cloud era, everything is based on APIs (Application Programming Interfaces).
Will the cloud provider offer me various APIs?
From deploying an entire production environment in minutes using Infrastructure as Code, to monitoring the performance of our services, cost, and security auditing – everything should be possible through an API. Otherwise, the service simply does not scale, is not mature, automated, or standard, and is prone to human mistakes.
Data protection
Encrypting data in transit using TLS 1.2 is a common standard, but what about encryption at rest?
Will the cloud provider allow me to encrypt a database, object storage, or a simple NFS storage using my encryption keys, inside a secure key management service?
Will the cloud provider allow me to automatically rotate my encryption keys?
What happens if I need to store secrets (credentials, access keys, API keys, etc.)? Will the cloud provider allow me to store my secrets in a secured, managed, and audited location?
If I am about to store extremely sensitive data (PII, credit card details, healthcare data, or even military secrets), will the cloud provider offer me a solution for confidential computing, where sensitive data stays protected even while it is in memory (in use)?
Well architected
A mature cloud provider has a vast amount of expertise and shares its knowledge with you about how to build an architecture that is secure, reliable, performance efficient, and cost-optimized, and how to continually improve the processes you have built.
Will the cloud provider offer me rich documentation on how to achieve all the above-mentioned goals, to provide my customers the best experience?
Will the cloud provider offer me an automated solution for deploying an entire application stack within minutes from a large marketplace?
Cost management
The more we broaden our use of the IaaS / PaaS service, the more we realize that almost every service has its price tag.
We might not prepare for this in advance, but once we begin to receive the monthly bill, we begin to see that we pay a lot of money, sometimes for services we don’t need, or for an expensive tier of a specific service.
Unlike on-premise, most cloud providers offer us a way to lower the monthly bill or pay for what we consume.
Regarding cost management, ask yourself the following questions:
Will the cloud provider charge me for services when I am not consuming them?
Will the cloud provider offer me detailed reports that will allow me to find out what I am paying for?
Will the cloud provider offer me documents and best practices for saving costs?
Summary
Answering the above questions about your preferred cloud provider will allow you to differentiate a mature cloud provider from the rest of the alternatives, and will assure you that you have made the right choice in selecting a cloud provider.
The answers will provide you with confidence, both when working with a single cloud provider, and when taking a step forward and working in a multi-cloud environment.
References
Security, Trust, Assurance, and Risk (STAR)
https://cloudsecurityalliance.org/star/
SOC 2 – SOC for Service Organizations: Trust Services Criteria
https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/aicpasoc2report.html
Confidential Computing and the Public Cloud
https://eyal-estrin.medium.com/confidential-computing-and-the-public-cloud-fa4de863df3
Confidential computing: an AWS perspective
https://aws.amazon.com/blogs/security/confidential-computing-an-aws-perspective/
AWS Well-Architected
https://aws.amazon.com/architecture/well-architected
Azure Well-Architected Framework
https://docs.microsoft.com/en-us/azure/architecture/framework/
Google Cloud’s Architecture Framework
https://cloud.google.com/architecture/framework
Oracle Architecture Center
https://docs.oracle.com/solutions/
Alibaba Cloud’s Well-Architectured Framework