web analytics

Cloud Native Applications – Part 2: Security

In chapter 1 of this series about cloud-native applications, we have introduced the key characteristics of cloud-native applications.

In this chapter, we will review how to secure cloud-native applications.

Securing the CI/CD pipeline

Due to the dynamic nature of the cloud-native application, we need to begin securing our application stack from the initial steps of the CI/CD pipeline.

Since I have already written posts on how to secure DevOps processes, automation, and supply chain, I will highlight the following:

  • Run code analysis using automated tools (SAST – Static application security tools, DAST – Dynamic application security tools)
  • Run SCA (Software composition analysis) tool to detect known vulnerabilities in open-source binaries and libraries
  • Sign your software package before storing them in a repository
  • Store all your sources (code, container images, libraries) in a private repository, protected by strong authorization mechanisms
  • Invest in security training for developers, DevOps, and IT personnel
  • Make sure no human access is allowed to production environments (use Break Glass accounts for emergency purposes)

Additional references:

Securing infrastructure build process

As I have mentioned in the previous chapter of this series, one of the characteristics of cloud-native applications is the fact that it is built using Infrastructure as Code.

Each cloud provider has its own IaC scripting language, and naturally, there is cloud agnostic (or multi-cloud…) – HashiCorp Terraform.

Since this is code, we need to store the code in a private repository and scan the code for security vulnerabilities, but we need an additional layer of protection for Infrastructure as Code.

This is referred to as Policy as Code, where we can define a set of controls, from enforcing encryption at transit and rest, enabling resource provisioning on specific regions, or prohibiting the creation of instances with public IP.

The next thing in terms of the policy as code is called OPA – Open Policy Agent. It supports all major cloud providers and has built-in integration with Terraform, Kubernetes, and more.

OPA has its declarative language called Rego and it can integrate inside an existing CI/CD pipeline.

Additional references:

Securing Containers / Kubernetes

Containers are one of the most common ways to package and deploy modern applications, and as a result, we need to secure the containerized environment.

It begins with a minimum number of binaries and libraries inside a container image.

We must make sure we scan our container images for vulnerable binaries or open-source libraries, and eventually, we need to store our container images inside a private container registry.

In most cases, when using Kubernetes as an orchestrator, we should choose a managed Kubernetes service (offered by each of the major cloud providers).

Using a Kubernetes control plane based on a managed service shifts the responsibility for securing and maintaining the Kubernetes control plane on the cloud provider.

One thing to keep in mind – we should always create private clusters, and make sure the control plane is never accessible outside our private subnets, to reduce the attack surface on our Kubernetes cluster.

In terms of authorization, we should follow the principle of least privilege and use RBAC (Role-based access control), to allow our application to function and our developers or support team the minimum number of required permissions to do their job.

In terms of network connectivity to and between pods, we should use one of the service mesh solutions (such as Istio), and set network policies that clearly define which pod can communicate with which pod, and who can access the API server.

In terms of secrets management that the containers need access to, we need to make sure all sensitive data (secrets, credentials, API keys, etc.) are stored in a secured location (such as AWS Secrets Manager, Azure Key Vault, Google Secret Manager, Oracle Cloud Infrastructure Vault or HashiCorp Vault), where all requests to pull a secret are authorized and audited, and secrets can automatically rotate.

Additional references:

Securing APIs

As we have mentioned in the previous chapter, communication between containers is done using APIs. Also, when communicating with applications deployed inside pods as part of the Kubernetes cluster, all communication is done through the Kubernetes API server.

Not to mention that modern applications, websites and naturally mobile applications are exposing APIs to customers from the public internet (unless your application is meant for private use only…).

Below are the main best practices for securing APIs:

  • Authentication – make sure all your APIs require authentication. Regardless if your API is supposed to share public stock exchange data, a retail book catalog, or weather statistics, all requests to pull data from an exposed API must be authenticated.
  • Authorization – make sure you set strict access control on each API request, whether it is read data from a database, update records, or privileged actions such as deleting data. Keep in mind the principle of least privilege.
  • Encryption – all traffic to an exposed API must be encrypted at transit using the most up-to-date encryption protocol (for example TLS 1.2 or above). Encryption keeps the data confidential and proves the identity of your API (or server) to your customers.
  • Auditing – make sure all actions done on your APIs are auditing and all logs are sent to a central logging system (or SIEM) for further archive and analysis (to find out if someone is trying to take actions they are not supposed to).
  • Input validation – make sure all input coming to your APIs is been validated, before storing it in a backend database. It will allow you to limit the chance of injection attacks.
  • DDoS and web-related attacks – make sure all your exposed APIs are protected behind anti-DDoS and behind a web application firewall. If it will not block 100% of the attacks, at least you will be able to block the well-known and signature-based attacks and decrease the amount of unwanted traffic against your APIs.
  • Code review – API is a piece of code. Before pushing new changes to any API, make sure you run static and dynamic code analysis, to locate security vulnerabilities embed in your code.
  • Throttling – make sure you enforce a throttling mechanism, in case someone tries to access your API multiple times from the same source, to avoid a situation where your API is unavailable for all your customers.

Additional reference:

Authorization

Authorization in a cloud-native application can be challenging.

On legacy applications all components were built as part of a single monolith, users had to log in from a single-entry point, and once we have authenticated and authorized them, they were to access data and with proper permissions to make changes to data as well.

Since modern applications are built upon micro-service architecture, we need to think not just about end users communicating with our application, but also about how each component in our architecture is going to communicate with other components (such as pod-to-pod communication required authorization).

If every component in our entire application is developed by a separate team, we need to think about a central authorization mechanism.

But central authorization mechanism is not enough.

We need to integrate our authorization mechanism with a central IAM (Identity and Access Management) system.

I would not recommend to re-invent the wheel – try to use the IAM service from your cloud provider of choice. Cloud-native IAM systems have built-in integration with the cloud eco-system, including auditing capabilities – this way you will be able to consume the service, without maintaining the underlining infrastructure.

Checking the end-users’ privileges at login time might not be sufficient. We need to think about fine-grain permissions – is a generic “Reader user” enough? Do the user needs read access to all data stored in our data store? Perhaps he only needs read access to a specific line of business customers database and nothing more. Always keep in mind the principle of least privilege.

Our authorization mechanism needs to be dynamic according to each request and data the user is trying to access, be verified constantly and allow us to easily revoke permissions in case of suspicious activity, when permissions are no longer needed or if data confidentially has changed over time.

We need to make sure our authorization mechanism can be easily integrated and consumed by each of the various development groups, as a standard authorization mechanism.

Additional references:

Summary

In this post, we have reviewed various topics we need to take into consideration when talking about how to secure cloud-native applications.

We have reviewed the highlights of securing the build process, the infrastructure provisioning, Kubernetes (as an orchestrator engine to run our applications), and not forgetting topics that are part of the secure development lifecycle (securing APIs and authorization mechanism).

Naturally, we have just covered some of the highlights of security in cloud-native applications.

I strongly recommend you to deep dive into each topic, read the references and search for additional information that will allow any developer, DevOps, DevSecOps, architect, or security professional, to better secure cloud-native applications.

Additional References:

Cloud Native Applications – Part 1: Introduction

In the past couple of years, there is a buzz about cloud-native applications.

In this series of posts, I will review what exactly is considered a cloud-native application and how can we secure cloud-native applications.

Before speaking about cloud-native applications, we should ask ourselves – what is cloud native anyway?

The CNCF (Cloud Native Computing Foundation) provides the following definition:

“Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”

Source: https://github.com/cncf/toc/blob/main/DEFINITION.md

It means – taking full advantage of cloud capabilities, from elasticity (scale in and scale out according to workload demands), use of managed services (such as compute, database, and storage services), and the use of modern design architectures based on micro-services, APIs, and event-driven applications.

What are the key characteristics of cloud-native applications?

Use of modern design architecture

Modern applications are built from loosely coupled architecture, which allows us to replace a single component of the application, with minimal or even no downtime to the entire application.

Examples can be code update/change or scale in/scale out a component according to application load.

  • RESTful APIs are suitable for communication between components when fast synchronous communication is required. We use API gateways as managed service to expose APIs and control inbound traffic to various components of our application.

Example of services:

  • Amazon API Gateway
  • Azure API Management
  • Google API Gateway
  • Oracle API Gateway
  • Event-driven architecture is suitable for asynchronous communication. It uses events to trigger and communicate between various components of our application. In this architecture, one component produces/publishes an event (such as a file uploaded to object storage) and another component subscribes/consumes the events (in a Pub/Sub model) and reacts to the event (for example reads the file content and steam it to a database). This type of architecture handles load very well.

Example of services:

Additional References:

Use of microservices

Microservices represent the concept of distributed applications, and they enable us to decouple our applications into small independent components.

Components in a microservice architecture usually communicate using APIs (as previously mentioned in this post).

Each component can be deployed independently, which provides a huge benefit for code change and scalability.

Additional references:

Use of containers

Modern applications are heavily built upon containerization technology.

Containers took virtual machines to the next level in the evolution of computing services.

They contain a small subset of the operating system – only the bare minimum binaries and libraries required to run an application.

Containers bring many benefits – from the ability to run anywhere, small footprint (for container images), isolation (in case of a container crash), fast deployment time, and more.

The most common orchestration and deployment platform for containers is Kubernetes, used by many software development teams and SaaS vendors, capable of handling thousands of containers in many production environments.

Example of services:

 Additional References:

Use of Serverless / Function as a Service

More and more organizations are beginning to embrace serverless or function-as-a-service technology.

This is considered the latest evolution in computing services.

This technology allows us to write code and import it into a managed environment, where the cloud provider is responsible for the maintenance, availability, scalability, and security of the underlining infrastructure used to run our code.

Serverless / Function as a Service, demonstrates a very good use case for event-driven applications (for example – an event written to a log file triggers a function to update a database record).

Functions can also be part of a microservice architecture, where some of the application components are based on serverless technology, to run specific tasks.

Example of services:

Additional References:

Use of DevOps processes

To support rapid application development and deployment, modern applications use CI/CD processes, which follow DevOps principles.

We use pipelines to automate the process of continuous integration and continuous delivery or deployment.

The process allows us to integrate multiple steps or gateways, where in each step we can embed additional automated tests, from static code analysis, functional test, integration test, and more.

Example of services:

Additional References:

Use of automated deployment processes

Modern application deployment takes an advantage of automation using Infrastructure as Code.

Infrastructure as Code is using declarative scripting languages, in in-order to deploy an entire infrastructure or application infrastructure stack in an automated way.

The fact that our code is stored in a central repository allows us to enforce authorization mechanisms, auditing of actions, and the ability to roll back to the previous version of our Infrastructure as Code.

Infrastructure as Code integrates perfectly with CI/CD processes, which enables us to re-use the knowledge we already gained from DevOps principles.

Example of solutions:

Additional References:

Summary

In this post, we have reviewed the key characteristics of cloud-native applications, and how can we take full advantage of the cloud, when designing, building, and deploying modern applications.

I recommend you continue expanding your knowledge about cloud-native applications, whether you are a developer, IT team member, architect, or security professional.

Stay tuned for the next chapter of this series, where we will focus on securing cloud-native applications.

Additional references

Mitigating the risk of a cloud outage or lack of cloud resources

Organizations migrating to the public cloud, or already provisioning workloads in the cloud come across limitations, either on production workloads or issues published in the media, as you can read below:

Source: https://news.ycombinator.com/item?id=33743567
Source: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/allocation-failure
Source: https://www.mirror.co.uk/news/uk-news/breaking-gmail-down-hundreds-email-28701750
Source: https://www.datacenterknowledge.com/amazon/breaking-aws-experienced-outage-its-us-east-2-availability-zone

As a cloud consumer, you might be asking yourself, how do I mitigate such risks from affecting my production workloads? (Assuming your organization has already invested a lot of money and resources in the public cloud)

There is no one answer to this question, but in the following post, I will try to review some of the alternatives for protecting yourself or at least try to mitigate the risks.

Alternative 1 – Switching to a cloud-native application

This alternative takes full advantage of cloud benefits.

Instead of using VMs to run or process your workload, you will have to re-architect your application and use cloud-native services such as Serverless / Function as Service, managed services (from serverless database services, object storage, etc.), and event-driven services (such as Pub/Sub, Kafka, etc.)

Pros

  • You decrease your application dependencies on virtual machines (on-demand, reserved instances, and even Spot instances), so resource allocation limits of VMs should be less concerning.

Cons

  • Full re-architecture of your applications can be an expensive and time-consuming process. It requires a deep review of the entire application stack, understanding the application requirements and limitations, and having an experienced team of developers, DevOps, architects, and security personnel, knowledgeable enough about your target cloud providers ecosystem.
  • The more you use a specific cloud’s ecosystem (such as proprietary Serverless / Function as a Service), the higher your dependency on specific cloud technology, which will cause challenges in case you are planning to switch to another cloud provider sometime in the future (or consider the use of multi-cloud).

Additional references:

Alternative 2 – The multi-region architecture

This alternative suggests designing a multi-region architecture, where you use several (separate) regions from your cloud provider of choice.

Pros

  • The use of multi-region architecture will decrease the chance of having an outage of your services or the chance of having resource allocation issues.

Cons

  • In case the cloud provider fails to create a complete separation between his regions (see: https://www.theguardian.com/technology/2020/dec/14/google-suffers-worldwide-outage-with-gmail-youtube-and-other-services-down), multi-region architecture will not resolve potential global outage issues (or limit the blast radius).
  • In case you have local laws or regulations which force you to store personal data on data centers in a specific jurisdiction, a multi-region solution is not an option.
  • Most IaaS / PaaS services offered today by cloud providers are regional, meaning, they are limited to a specific region and do not span across regions, and as a result, you will have to design a plan for data migration or synchronization across regions, which increases the complexity of maintaining this architecture.
  • In a multi-region architecture, you need to take into consideration the cost of egress data between separate regions.

Additional references:

Alternative 3 – Cloud agnostic architecture (or preparation for multi-cloud)

This alternative suggests using services that are available for all major cloud providers.

An example can be – to package your application inside containers and manage the containers orchestration using a Kubernetes-managed service (such as Amazon EKS, Azure AKS, Google GKE, or Oracle OKE).

To enable cloud agnostic architecture from day 1, consider provisioning all resources using HashiCorp Terraform – both for Kubernetes resources and any other required cloud service, with the relevant adjustments for each of the target cloud providers.

Pros

  • Since container images can be stored in a container registry of your choice, you might be able to migrate between cloud providers.

Cons

  • Using Kubernetes might resolve the problem of using the same underlining orchestrator engine, but you will still need to think about the entire cloud provider ecosystem (from data store services, queuing services, caching, identity authentication and authorization services, and more.
  • In case you have already invested a lot of resources in a specific cloud provider and already stored a large amount of data in a specific cloud provider’s storage service, migrating to another cloud provider will be an expensive move, not to mention the cost of egress data between different cloud providers.
  • You will have to invest time in training your teams on working with several cloud providers’ infrastructures and maintain several copies of your Terraform code, to suit each cloud provider infrastructure.

Summary

Although there is no full-proof answer to the question “How do I protect myself from the risk of a cloud outage or lack of cloud resources”, we need to be aware of both types of risks.

We need to be able to explain the risks and the different alternatives provided in this post and explain them to our organization’s top management.

Once we understand the risks and the pros and cons of each alternative, our organization will be able to decide how to manage the risk.

I truly believe that the future of IT is in the public cloud, but migrating to the cloud blindfolded, is the wrong way to fully embrace the potential of the public cloud.

Securing the software supply chain in the cloud

The software supply chain is considered one of the common threats in today’s modern cloud-native development, which poses a high risk to any organization.

It is about consuming software packages, source code, or even APIs from a third-party or untrusted source.

The last thing we wish to do is to block developers from building new applications, but we need to understand the threats to the software supply chain.

What are the common threats?

There are a couple of common threats that can arise from a software supply chain attack:

As we can see, most supply chain attacks begin with a download of an untrusted piece of code, which leads to malware infection, or pulling data from an external API, which inserts unverified data into a backend system.

Steps to mitigate the risk of supply chain attacks

The modern development lifecycle is based on CI/CD (Continuous Integration / Continuous Deployment or Delivery), we can embed security gates at various stages of the CI/CD pipeline, as explained below.

Source Code

  • Scan for software vulnerabilities (such as binaries and open-source libraries), before storing components/code/libraries inside VM or container images inside an image repository.

Example of services:

  • Amazon Inspector – Vulnerability scanner for Amazon EC2, container images (inside Amazon ECR), and Lambda functions
  • Microsoft Defender for Containers – Vulnerability scanner for containers
  • Google Container Analysis – Vulnerability scanner for containers
  • Scan your code stored in your repositories, to make sure it does not contain sensitive data (such as secrets, API keys, credentials, etc.)

Example of tools:

Example of tools:

  • Snyk – Scan for open-source, code, container, and Infrastructure-as-Code vulnerabilities
  • Trivy – Scan for open-source, code, container, and Infrastructure-as-Code vulnerabilities
  • Chekov – Scan for open-source and Infrastructure-as-Code vulnerabilities
  • KICS – Scan for Infrastructure-as-Code vulnerabilities
  • Terrascan – Scan for Infrastructure-as-Code vulnerabilities
  • Kubescape – Scan for Kubernetes vulnerabilities
  • Scan your binaries to verify their trustworthiness – especially important when you import binaries from an external source.

Example of services:

Repositories

  • Create a private repository for storing source code, VM images, or container images
  • Enforce authentication and authorization for who can access and make changes to the repository
  • Sign all source code/images stored in the repository
  • Audit access to the repositories

Example of services for storing source code:

Example of services for storing VM images:

Example of services for storing container images:

Example of service for storing serverless code:

Authentication & Authorization

  • Configure authentication and authorization process (who has written permissions to the repository), and enforce the use of MFA.

Example of services:

Example of services:

Handling data from external APIs

There are many cases where we rely on data from external third parties, exposed using APIs.

Since we cannot verify the trustworthiness of external data, we must follow the following guidelines:

  • Never rely on unauthenticated APIs – always make sure the connectivity to the external APIs requires proper authentication (such as certificates, rotated API key, etc.) and proper
  • Always make sure the remote API enforces proper authorization mechanism – if the remote API allows admin or even write access to anyone on the Internet, the data it provides is not considered trusted anymore
  • Always make sure data is encrypted at transit – it allows to keep data confidentiality and provides a high degree of trust in the remote endpoint
  • Always perform input validation and proper escaping, before storing data from an external source into any backend database

For further reading, see:

Summary

In the post, we have reviewed threats as a result of software supply chain vulnerabilities, and various tools and services that can assist us in securing the modern development process of cloud-native applications.

It is possible to mitigate the risks coming from the software supply chain, whether it is code that we develop in-house or code/binaries/libraries that we import from a third-party source, but we must always follow the concept of “Trust but verify”.

References

Using immutable infrastructure to achieve cloud security

Maintaining cloud infrastructure, especially compute components, requires a lot of effort – from patch management, secure configuration, and more.

Other than the efforts it takes for the maintenance part, it simply will not scale.

Will we be able to support our workloads when we need to scale to thousands of machines at peak?

Immutable infrastructure is a deployment method where compute components (virtual machines, containers, etc.) are never updated – we simply replace a running component with a new one and decommission the old one.

Immutable infrastructure has its advantages, such as:

  • No dependent on previous VM/container state
  • No configuration drifts
  • The fast configuration management process
  • Easy horizontal scaling
  • Simple rollback/recovery process

The Twelve-Factor App

Designing modern or cloud-native applications requires us to follow 12 principles, documents in https://12factor.net

Looking at this guide, we see that factor number 3 (config) guides us to store configuration in environment variables, outside our code (or VMs/containers).

For further reading, see:

  • The Twelve-Factor App – Config

https://12factor.net/config

  • AWS – Applying the Twelve-Factor App Methodology to Serverless Applications

https://aws.amazon.com/blogs/compute/applying-the-twelve-factor-app-methodology-to-serverless-applications/#config

  • Azure – The Twelve-Factor Application

https://learn.microsoft.com/en-us/dotnet/architecture/cloud-native/definition#the-twelve-factor-application

  • GCP – Twelve-factor app development on Google Cloud

https://cloud.google.com/architecture/twelve-factor-app-development-on-gcp#3_configuration

If we continue to follow the guidelines, factor number 6 (processes) guides us to create stateless processes, meaning, separating the execution environment and the data, and keeping all stateful or permanent data in an external service such as a database or object storage.

For further reading, see:

  • The Twelve-Factor App – Processes

https://12factor.net/processes

How do we migrate to immutable infrastructure?

Build a golden image

Follow the cloud vendor’s documentation about how to download the latest VM image or container image (from a container registry), update security patches, binaries, and libraries to the latest version, customize the image to suit the application’s needs, and store the image in a central image repository.

It is essential to copy/install only necessary components inside the image and remove any unnecessary components – it will allow you to keep a minimal image size and decrease the attack surface.

It is recommended to sign your image during the storage process in your private registry, to make sure it was not changed and that it was created by a known source.

For further reading, see:

  • Automate OS Image Build Pipelines with EC2 Image Builder

https://aws.amazon.com/blogs/aws/automate-os-image-build-pipelines-with-ec2-image-builder/

  • Creating a container image for use on Amazon ECS

https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-container-image.html

  • Azure VM Image Builder overview

https://learn.microsoft.com/en-us/azure/virtual-machines/image-builder-overview

  • Build and deploy container images in the cloud with Azure Container Registry Tasks

https://learn.microsoft.com/en-us/azure/container-registry/container-registry-tutorial-quick-task

  • Create custom images

https://cloud.google.com/compute/docs/images/create-custom

  • Building container images

https://cloud.google.com/build/docs/building/build-containers

Create deployment pipeline

Create a CI/CD pipeline to automate the following process:

  • Check for new software/binaries/library versions against well-known and signed repositories
  • Pull the latest image from your private image repository
  • Update the image with the latest software and configuration changes in your image registry
  • Run automated tests (unit tests, functional tests, acceptance tests, integration tests) to make sure the new build does not break the application
  • Gradually deploy a new version of your VMs / containers and decommission old versions

For further reading, see:

  • Create an image pipeline using the EC2 Image Builder console wizard

https://docs.aws.amazon.com/imagebuilder/latest/userguide/start-build-image-pipeline.html

  • Create a container image pipeline using the EC2 Image Builder console wizard

https://docs.aws.amazon.com/imagebuilder/latest/userguide/start-build-container-pipeline.html

  • Streamline your custom image-building process with the Azure VM Image Builder service

https://azure.microsoft.com/de-de/blog/streamline-your-custom-image-building-process-with-azure-vm-image-builder-service/

  • Build a container image to deploy apps using Azure Pipelines

https://learn.microsoft.com/en-us/azure/devops/pipelines/ecosystems/containers/build-image

  • Creating the secure image pipeline

https://cloud.google.com/software-supply-chain-security/docs/create-secure-image-pipeline

  • Using the secure image pipeline

https://cloud.google.com/software-supply-chain-security/docs/use-image-pipeline

Continues monitoring

Continuously monitor for compliance against your desired configuration settings, security best practices (such as CIS benchmark hardening settings), and well-known software vulnerabilities.

In case any of the above is met, create an automated process, and use your previously created pipeline to replace the currently running images with the latest image version from your registry.

For further reading, see:

  • How to Set Up Continuous Golden AMI Vulnerability Assessments with Amazon Inspector

https://aws.amazon.com/blogs/security/how-to-set-up-continuous-golden-ami-vulnerability-assessments-with-amazon-inspector/

  • Scanning Amazon ECR container images with Amazon Inspector

https://docs.aws.amazon.com/inspector/latest/user/enable-disable-scanning-ecr.html

  • Manage virtual machine compliance

https://learn.microsoft.com/en-us/azure/architecture/example-scenario/security/virtual-machine-compliance

  • Use Defender for Containers to scan your Azure Container Registry images for vulnerabilities

https://learn.microsoft.com/en-us/azure/defender-for-cloud/defender-for-containers-vulnerability-assessment-azure

  • Automatically scan container images for known vulnerabilities

https://cloud.google.com/kubernetes-engine/docs/how-to/security-posture-vulnerability-scanning

Summary

In this article, we have reviewed the concept of immutable infrastructure, its benefits, and the process for creating a secure, automated, and scalable solution for building immutable infrastructure in the cloud.

References

  • The History of Pets vs Cattle and How to Use the Analogy Properly

https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/

  • Deploy using immutable infrastructure

https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/rel_tracking_change_management_immutable_infrastructure.html

  • Immutable infrastructure CI/CD using Jenkins and Terraform on Azure

https://learn.microsoft.com/en-us/azure/architecture/solution-ideas/articles/immutable-infrastructure-cicd-using-jenkins-and-terraform-on-azure-virtual-architecture-overview

  • Automate your deployments

https://cloud.google.com/architecture/framework/operational-excellence/automate-your-deployments

Where is the OSI model in the public cloud?

When talking about the public cloud, I always like the analogy to the OSI model.

“The Open Systems Interconnection model (OSI model) is a conceptual model. Communications between a computing system are split into seven different abstraction layers: Physical, Data Link, Network, Transport, Session, Presentation, and Application” (Wikipedia)

A similar and shorter model of the OSI model is the TCP/IP model.

Here is a comparison of the two models:

In the public cloud, we find a similar concept when talking about the shared responsibility model, where we draw the line of responsibility between the public cloud provider and the customers, in the different cloud service models, usually in terms of security, as we can see in the diagram below:

Where do public cloud services fit in the OSI model?

There are many networks related services in each of the major public cloud providers.

To make things easy to understand, I have prepared the following diagram, comparing common network-related services to the various OSI model layers:

Encryption / Cryptography and the OSI Model

Layer 6 of the OSI model is the presentation layer.

Among the things, we can find in this layer is data encryption.

Encryption in this context is about encryption at rest – from object storage, block storage, file storage, and various data services.

Encryption includes symmetric and asymmetric encryption keys, secrets, passwords, API keys, certificates, etc.

The process includes the generation, storage, retrieval, and rotation of encryption keys.

Here are the most common encryption /cryptography-related services:

Identity Management and the OSI Model

Layer 7 of the OSI model is the application layer.

Among the things we can find in this layer are related to authentication and authorization, or the entire identity management.

Identity management is about managing the entire lifecycle of identity – from an end user, service account, computer accounts, etc.

The process includes account provisioning, password management (and MFA), permission management (role assignments), and finally account de-provisioning.

Here are the most common identity-related services:

How does everything come together?

When reviewing a cloud architecture, I like to compare the various services in the architecture to the different layers of the OSI model, from the bottom up:

  • Network connectivity and traffic flow
  • Encryption (according to data sensitivity)
  • Authentication and Authorization (according to the least privilege principle)

The OSI model analogy, assist me to make sure I do not forget any important aspect when reviewing an architecture for a cloud workload.

Sustainability in the cloud era

When thinking about cloud computing, we immediately think about technology.

Have we ever stopped to think about how much energy this sort of technology requires to operate an average cloud data center, and what is the environmental effect of running such huge data centers around the world?

Data centers generate around 1% of the energy consumed around the world, daily.

Data centers consume a lot of energy – electricity (for running the servers) and water (for cooling the servers).

The more energy a common data center consumes, the bigger its carbon footprint (the total amount of greenhouse gases that is generated by running a data center).

In the past couple of years, there is a new concept for professionals working with cloud services, with high environmental awareness called cloud sustainability.

The idea behind it (from a cloud provider’s point of view) is to achieve 100% renewable energy – replace fuel-based electricity with wind and solar power, within a few years.

All major cloud providers (AWS, Azure, and GCP) put a lot of effort into building a new data center to be powered by green energy and making changes to the existing data center to lower their emissions as much as possible and use green energy as well.

To remain transparent to their customers, the major cloud providers have created carbon footprint tools:

  • AWS customer carbon footprint tool

https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/what-is-ccft.html

  • Microsoft Sustainability Calculator

https://aka.ms/SustainabilityCalculator

  • GCP Carbon Footprint

https://cloud.google.com/carbon-footprint

  • Cloud Carbon Footprint (Open source) tool

https://www.cloudcarbonfootprint.org/docs/getting-started

Indeed, most of the responsibility for keeping the cloud data centers green is under the responsibility of the cloud providers, since they build and maintain their data centers, but what is our responsibility as consumers?

As an example, here is AWS’s point of view regarding the shared responsibility model, in the context of sustainability:

Source: https://docs.aws.amazon.com/wellarchitected/latest/sustainability-pillar/the-shared-responsibility-model.html

How to act as responsible cloud consumers?

Region selection

Review business requirements (compliance, latency, cost, service, and features), and pay attention to regions with a low carbon footprint.

Additional information:

  • AWS – What to Consider when Selecting a Region for your Workloads

https://aws.amazon.com/blogs/architecture/what-to-consider-when-selecting-a-region-for-your-workloads

  • Carbon-free energy for Google Cloud regions

https://cloud.google.com/sustainability/region-carbon

  • Measuring greenhouse gas emissions in data centers: the environmental impact of cloud computing

https://www.climatiq.io/blog/measure-greenhouse-gas-emissions-carbon-data-centres-cloud-computing

Architecture design considerations

Use cloud-native design patterns:

  • Microservices – use containers (and Kubernetes) to deploy your applications and leverage the scaling capabilities of the cloud
  • Serverless – use serverless (or function as a service) whenever you can decouple your applications into small functions
  • Use message queues as much as possible, to decouple your applications and lower the number of requests between the various services/components
  • Use caching mechanisms to lower the number of queries to backend systems

Infrastructure considerations

Embed the following as part of your infrastructure considerations:

  • Right-sizing – when using VMs, always remember to right-size the VM size to your application demands
  • Use up-to-date hardware – when using VMs, always use the latest VM family types and the latest block storage type, to suit your application demands
  • ARM-based processors – consider using ARM processors (such as AWS Graviton Processor, Azure Ampere Altra Arm-based processors, GCP Ampere Altra Arm processors, and more), whenever your application supports the ARM technology (for better performance and lower cost)
  • Idle hardware – monitor and shut down (or even delete) unused or idle hardware (VMs, databases, etc.)
  • GPU – use GPUs only for tasks that are considered more efficient than CPUs (such as machine learning, rendering, transcoding, etc.)
  • Spot instances – use spot instances, whenever your application supports sudden interruptions
  • Schedule automatic start and stop of VMs – use scheduling capabilities (such as AWS Instance scheduler, Azure Start/Stop VMs, GCP start and stop virtual machine (VM) instances, etc.) to control the behavior of your workload VMs
  • Managed services – prefer to use PaaS or managed services (from databases, storage, load-balancers, and more)
  • Data lifecycle management – use object storage (or file storage) lifecycle policies to archive or remove unused or unnecessary data
  • Auto-scaling – use the cloud built-in capabilities to scale horizontally according to your application load
  • Content Delivery Network – use CDN (such as Amazon CloudFront, Azure Content Delivery Network, Google Cloud CDN, etc.) to lower the amount of customer traffic to your publicly exposed workloads

Summary

Sustainability and green computing are here to stay.

Although the large demand for cloud services has a huge environmental impact, I strongly believe that the use of cloud services is much more environmentally friendly than any use of legacy data center, for the following reasons:

  • Efficient hardware utilization (nearly 100% of hardware utilization)
  • Fast hardware replacement (due to high utilization)
  • Better energy use (high use of renewable energy sources to support the electricity requirements)

I advise all cloud customers, to put sustainability higher in their design considerations.

Additional reading materials

  • AWS Well-Architected Framework – Sustainability Pillar

https://docs.aws.amazon.com/wellarchitected/latest/sustainability-pillar/sustainability-pillar.html

  • Microsoft Azure Well-Architected Framework – Sustainability

https://learn.microsoft.com/en-us/azure/architecture/framework/sustainability/sustainability-get-started

  • Google Cloud – Design for environmental sustainability

https://cloud.google.com/architecture/framework/system-design/sustainability

Data protection in cloud services

Storing data in the cloud, raise questions regarding data protection.

Data can be customers’ data (PII, healthcare data, credit cards, etc.), company data (financial information, trade secrets, security vulnerabilities, etc.), or any information with value to our organization.

As in the traditional data center, we still have concerns regarding who has access to our data and what can he do with the access provided.

In this blog post, I will review the required controls for protecting data stored in cloud services.

Data discovery and classification

The first action we need to take regarding sensitive data is discovery and classification.

Data classification is the action of assigning labels or categories to our data, such as public information, internal, confidential, highly confidential, etc.

Discovery tools allow us to detect where we store sensitive information in storage locations such as object storage, file storage, databases, and more.

Examples of services for the discovery process:

  • Amazon Macie – discover sensitive information stored in Amazon S3 buckets.
  • Microsoft Purview – map and discover data on-premise and in the cloud.

Entitlement

Entitlement deals with the questions – who has access, to what resources, and what can he do with his access rights?

In any access request, we should always make sure the identity (human, service account, computer account, etc.) is authenticated against our system, preferably using a central identity provider.

Once the identity is authenticated against our system, we need to make sure it has proper access rights to take the exact number of privileges required to accomplish its desired task, according to the principle of least privilege (such as view configuration, read customer data, update records, etc.)

Entitlement combines authentication with authorization.

Examples of services for entitlements:

Encryption

To protect data, we need to protect it in any state the data resides:

  • Data in transit – all cloud services (from object storage, file storage, and databases) support encryption in transit using TLS protocol. Unlike the traditional data center where encryption in transit was either not supported or required an additional effort from our side, in the cloud, services support encryption in transit by default, and in many cases, we have no option to disable this feature.
  • Data at rest – all cloud storage services (from object storage, file storage, and databases) support encryption at rest using the AES256 algorithm.

In the traditional data center, encryption key management and key rotation were challenging.

Today, most cloud providers allow us to choose between encryption at rest using encryption keys generated and managed by the cloud provider, or using encryption keys that we generate and control (to minimize the risk of rough cloud provider admin having access to our data).

Examples of services for storing encryption keys and sensitive data:
AWS KMS – controls the entire lifecycle of cryptographic keys.
AWS Secrets Manager – controls the entire lifecycle of secrets, credentials, API keys, etc.
Azure Key Vault – controls the entire lifecycle of cryptographic keys, secrets, credentials, API keys, etc.

  • Data in use – even if we encrypt the data while in transit and while at rest, at some point, we need to have the data accessible for reading or update, while in the memory of a server in the cloud. The common name for this technology is “confidential computing“, which in most cases relies upon hardware capabilities to encrypt data and make sure data in memory is kept confidential.

Examples of solutions that provide confidential computing capabilities:

Auditing and threat detection

The final action we need to take protecting data is to audit who accessed our data and detect anomalous behavior with actions performed on our data.

Although it is considered a detective control, it is still an important phase in data control.

Examples of services that perform audit trails:

Now that we record all actions, we need a solution to review the logs and notify us about anomalous behavior that requires our attention.

Examples of threat detection services:

Summary

In this blog post, I have reviewed the necessary controls for protecting data stored in the cloud.

It is essential to understand that to get effective protection for data stored in the cloud, we must configure strong controls of both encryption at rest (preferred with customer-managed encryption keys), combined with entitlement process (which enforces the least privilege) – we cannot rely on single security control and pray that no unauthorized person will ever access our data.

Automation as key to cloud adoption success

After deploying several workloads in the public cloud, making mistakes, failing, fixing, and beginning using the cloud for production workloads, it is now the time to think about the next step in cloud adoption.

To be able to fully embrace the benefits of the public cloud, the scale, the elasticity, and the short time it takes to deploy new resources, it is time to put automation in place.

Automation allows us to do the same tasks over and over again, deploying the same configuration to multiple environments (Dev, Test, Prod) and get the same results – no human errors (assuming you have tested your code…)

Automation can be achieved in various ways – from using the CLI, using the cloud vendor’s SDK (languages such as Python, Go, Java, and more), or using Infrastructure as Code (such as Terraform, AWS CloudFormation, Azure Resource Manager, and more).

In this article, we shall review some of the common alternatives for using automation using code.

Why use code?

The clear benefit of using code for automation is the ability to have change management. Simply choose your favorite source control (such as GitHub, AWS CodeCommit, Azure Repos, and more), upload your scripts and have the version history of your code, and be able to know at each stage who made changes to the code.

Another benefit of using code for automation is the fact that the Internet is full of samples you can find to automate (almost) anything in your cloud environment.

The downside of doing everything using code, is the learning curve required by your organization’s IT or DevOps teams, learning new languages, but once they pass this stage, you can have all the benefits of the scripting languages.

Automation – the AWS way

If AWS is your sole cloud provider, you should learn and start using the following built-in services or capabilities offered by AWS:

Infrastructure as Code

  • AWS CloudFormation – The built-in IaC for deploying and managing AWS resources.

Reference: https://github.com/aws-cloudformation/aws-cloudformation-samples

  • AWS Cloud Development Kit (AWS CDK) – Ability to write CloudFormation templates, based on common programming languages such as Python, Java, DotNet, and more.

Reference: https://github.com/aws-samples/aws-cdk-examples

Policy as Code

  • Service control policies (SCPs) – Managing permissions in AWS Organizations.

Reference: https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps_examples.html

CI/CD pipeline

  • AWS CodePipeline – A fully managed continuous delivery service.

Reference: https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials.html

Containers and Kubernetes

  • Amazon ECS – Container management service based on the AWS platform.

Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/example_task_definitions.html

  • Amazon Elastic Kubernetes Service (EKS) – Managed Kubernetes service.

Reference: https://github.com/aws-quickstart/quickstart-amazon-eks

Automation – the Azure way

If Azure is your sole cloud provider, you should learn and start using the following built-in services or capabilities offered by Azure:

Infrastructure as Code

  • Azure Resource Manager templates (ARM templates) – The built-in IaC for deploying and managing Azure resources.

Reference: https://github.com/Azure/azure-quickstart-templates

  • Bicep – Declarative language for deploying Azure resources.

Reference: https://github.com/Azure/azure-docs-bicep-samples

Policy as Code

  • Azure Policy – Enforce organizational standards across the Azure organization.

Reference: https://github.com/Azure/azure-policy

CI/CD pipeline

  • Azure Pipelines – A fully managed continuous delivery service.

Reference: https://github.com/microsoft/azure-pipelines-yaml

Containers and Kubernetes

  • Azure Container Instances – Container management service based on the Azure platform.

Reference: https://docs.microsoft.com/en-us/samples/browse/?products=azure&terms=container%2Binstance

  • Azure Kubernetes Service (AKS) – Managed Kubernetes service.

Reference: https://github.com/Azure/AKS

Automation – the Google Cloud way

If GCP is your sole cloud provider, you should learn and start using the following built-in services or capabilities offered by GCP:

Infrastructure as Code

  • Google Cloud Deployment Manager – The built-in IaC for deploying and managing GCP resources.

Reference: https://github.com/GoogleCloudPlatform/deploymentmanager-samples

Policy as Code

  • Google Organization Policy Service – Programmatic control over the organization’s cloud resources.

Reference: https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints#how-to_guides

CI/CD pipeline

  • Google Cloud Build – A fully managed continuous delivery service.

Reference: https://github.com/GoogleCloudPlatform/cloud-build-samples

Containers and Kubernetes

  • Google Kubernetes Engine (GKE) – Managed Kubernetes service.

Reference: https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

Automation – the cloud agnostic way

If you plan for the future, plan for multi-cloud. Look for solutions that are capable of connecting to multiple cloud environments, to decrease the learning curve of your DevOps team learning the various scripting languages and being able to deploy workloads on several cloud environments.

Infrastructure as Code

  • Hashicorp Terraform – The most widely used IaC for deploying and managing resources on both cloud and on-premise.

Reference: https://registry.terraform.io/browse/providers

Policy as Code

  • Hashicorp Sentinel – Policy as code framework that compliments Terraform code.

Reference: https://www.terraform.io/cloud-docs/sentinel/examples

CI/CD pipeline

  • Jenkins – The most widely used open-source CI/CD tool.

Reference: https://www.jenkins.io/doc/pipeline/examples/

Containers and Kubernetes

  • Docker – The most widely used container run-time for deploying applications.

Reference: https://github.com/dockersamples

  • Kubernetes – The most widely used container orchestration open-source platform.

Reference: https://github.com/kubernetes/examples

Summary

In this post, I have reviewed the most common solutions that allow you to automate your workloads’ deployment, management, and maintenance using various scripting languages.

Some of the solutions are bound to a specific cloud provider, while others are considered cloud agnostic.

Use automation to fully embrace the power and benefits of the public cloud.

If you don’t have experience writing code, take the time to learn. The more you practice, the more experience you will gain.

As Werner Vogels, the Amazon CTO always says – “Go Build”.

Cloud and the shared responsibility model misconceptions

One of the most common concepts working with cloud services is the “Shared responsibility mode”.

The model is aim to set the responsibility boundaries between the cloud service provider and the cloud service consumer, depending on the cloud service model (IaaS, PaaS, SaaS).

In this post, I will review common misconceptions regarding the shared responsibility model.

Misconception #1 — My cloud provider’s certifications allow me to comply with regulations

This is a common misconception for companies (and new SaaS providers) who fail to understand the shared responsibility model while deploying their first workload.

Reviewing cloud providers’ compliance pages, we can see that the providers have already certified themselves for most regulations and local laws, and in some cases even offer customers special environments that are already in compliance with regulations such as PCI-DSS or HIPAA.

If you are planning to store sensitive customers’ data (from PII, healthcare, financial, or any other types of sensitive data) in a public cloud, keep in mind that according to the shared responsibility model, the cloud provider is responsible only for the lower layers of the architecture:

· IaaS — the CSP is responsible for all layers, from the physical layer to the virtualization layer

· PaaS — the CSP is responsible for all layers, from the physical layer to the guest operating system, middleware, and even runtime

· SaaS — the CSP is responsible for all layers, from the physical layer to the application layer

Bottom line — the fact that a CSP has all the relevant certifications, means almost nothing when talking about compliance with regulations or protecting customers’ data.

Each organization storing sensitive data in the cloud must conduct a risk assessment, review which data is stored in the cloud (before storing data in the cloud), and set the proper controls to protect customers’ data.

Misconception #2 — Who is responsible for protecting my data?

When customers (either organizations or personal customers) store their data in public cloud services, they sometimes mistakenly think that if they store their data in one of the major CSPs, their data is protected.

This is a misconception.

All major CSPs offer their customers a large variety of services and tools to protect their customers’ data (from network access control lists, encryption in transit and at rest, authentication, authorization, auditing, and more), however, according to the shared responsibility model, it is up to the customer (mostly organizations storing their data in the cloud), to decide which security controls to implement.

In most cases, the CSPs don’t have access to customers’ data stored in the cloud, whether organizations decide to use managed storage services (from object storage to managed CIFS/NFS services), managed database services (from relational databases to NoSQL databases) and more.

The most obvious exception to the mentioned above is SaaS services, where we allow CSP service accounts access to our data, to allow us to perform queries, get insights about our data or even perform regular backups — the access is mostly strict to specific actions, to a specific role or service account, and usually shouldn’t be used by the CSP employees.

At the end of the day, the customer is always the data owner, and as a data owner, the customer must decide whether or not to store sensitive data in the cloud, who should have access to the data stored in the cloud, what access rights do we allow people to access and update/delete our data, and more.

Misconception #3 — Availability is not my concern since the cloud is highly available by design

The above headline is true, mainly for major SaaS services.

When looking at availability and building highly available architectures, specifically in IaaS and PaaS, it is up to us, as organizations, to use the services and the service capabilities that CSPs offer us, to build highly available solutions.

Just because we decided to deploy our application on a VM or store our data in a managed database service, but we failed to deploy everything behind a load-balancer or in a cluster, will not guarantee us the availability that our customers expect.

Even if we are using managed object storage services and we choose a low redundancy tier, using a single availability zone, the CSP does not guarantee high availability.

To achieve high availability to our workloads, we need to review cloud providers’ documentation, such as “Well architected frameworks” and design our workloads to fit business needs.

Misconception #4 — Incident response in the cloud is an impossible mission

This part is a little bit tricky.

Since as AWS always mention, they are responsible for the security of the cloud — they are responsible for the incident response process of the cloud infrastructure, from the physical data center, the host OS, the network equipment, the virtualization, and all the managed services.

We, as customers of cloud services, are responsible for security within our cloud environments.

In IaaS, everything within the guest OS is our responsibility as customers of the cloud.

It is our responsibility to enable auditing as much as possible, and send all logs to a central log repository and from there to our SIEM system (whether it is located on-premise or in a managed cloud service).

There are also documented procedures for building a forensics environment, made out of snapshots of our VMs or databases, for further analysis.

It is not perfect; we still don’t control the entire flow of the packet from the lower network layers to the application layer, and on managed PaaS services we only have audit logs and we can’t perform memory analysis of managed services (such as databases).

In SaaS services, it gets even worse since, in at best case, the SaaS provider is mature enough to allow us to pull audit logs using API and send them to our SIEM system for further analysis — unfortunately, not all SaaS providers are mature enough to provide us access to the audit logs.

Bottom line — challenging, but not completely impossible. Depending on the cloud service model and the maturity of the cloud provider.

Summary

It is important to understand the shared responsibility model, but what is more important is to understand the cloud service model and services or tools available for us, to enable us to build secure and highly available cloud environments.

References

· AWS Compliance Programs

https://aws.amazon.com/compliance/programs

· Azure compliance documentation

https://docs.microsoft.com/en-us/azure/compliance

· GCP Compliance offerings

https://cloud.google.com/security/compliance/offerings

· AWS Well-Architected Framework

https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html

· Forensic investigation environment strategies in the AWS Cloud

https://aws.amazon.com/blogs/security/forensic-investigation-environment-strategies-in-the-aws-cloud

· Computer forensics chain of custody in Azure

https://docs.microsoft.com/en-us/azure/architecture/example-scenario/forensics