Archive for the ‘Cloud Adoption’ Category
Navigating Brownfield Environments in AWS: Steps for Successful Cloud Use
In an ideal world, we would have the luxury of building greenfield cloud environments; however, this is not always the situation we, as cloud architects, have to deal with.
Greenfield environments allow us to design our cloud environment according to industry (or cloud vendor) best practices: setting up guardrails, selecting an architecture that meets the business requirements (think event-driven architecture, scale, managed services, etc.), baking cost into architecture decisions, and more.
In many cases, we inherit an existing cloud environment, either through mergers and acquisitions or because we stepped into the role of cloud architect at a company that already serves customers, and there is almost zero chance the business will grant us the opportunity to fix mistakes that have already been made.
In this blog post, I will try to provide some steps for handling brownfield cloud environments, based on the AWS platform.
Step 1 – Create an AWS Organization
If you have already inherited multiple AWS accounts, the first thing you need to do is create a new AWS account (without any resources) to serve as the management account and create an AWS organization, as explained in the AWS documentation.
Once the new AWS organization is created, make sure you select an email address from your organization's email domain, set a strong password for the root user, revoke and remove any AWS access keys (if they exist), and configure MFA for the root user.
Update the primary and alternate contact details for the AWS organization (recommend using an SMTP mailing list instead of a single-user email address).
The next step is to design an OU structure for the AWS organization. There are various ways to structure the organization; perhaps the most common is by line of business, with a similar structure underneath by SDLC stage (i.e., Dev, Test, and Prod), as discussed in the AWS documentation.
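As a hedged illustration, here is a minimal boto3 sketch that creates one line-of-business OU with Dev/Test/Prod OUs underneath it (the OU names are hypothetical placeholders; run it with management-account credentials):

```python
import boto3

# Assumes credentials for the management account of the new AWS organization
org = boto3.client("organizations")

# The root ID is the top-level container of the organization
root_id = org.list_roots()["Roots"][0]["Id"]

# Hypothetical structure: one line-of-business OU with SDLC-stage OUs beneath it
lob_ou = org.create_organizational_unit(ParentId=root_id, Name="RetailBanking")
lob_ou_id = lob_ou["OrganizationalUnit"]["Id"]

for stage in ["Dev", "Test", "Prod"]:
    org.create_organizational_unit(ParentId=lob_ou_id, Name=stage)
```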
Step 2 – Handle Identity and Access management
Now that we have an AWS organization, we need to take care of identity and access management across the entire organization.
To make sure all identities authenticate against the same identity provider (such as the on-prem Microsoft Active Directory), enable AWS IAM Identity Center, as explained in the AWS documentation.
Once you have set up AWS IAM Identity Center, stop using the root user for day-to-day work and create a dedicated administrative user for all administrative tasks, as explained in the AWS documentation.
Step 3 – Moving AWS member accounts to the AWS Organization
Assuming we have inherited multiple AWS accounts, it is now time to move the member accounts into the previously created OU structure, as explained in the AWS documentation.
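A minimal boto3 sketch of inviting an existing standalone account and then moving it into a target OU (the account and OU IDs are hypothetical, and the invited account still has to accept the invitation on its side):

```python
import boto3

org = boto3.client("organizations")  # management-account credentials

# Invite an existing standalone account into the organization (hypothetical account ID)
org.invite_account_to_organization(
    Target={"Id": "111122223333", "Type": "ACCOUNT"},
    Notes="Joining the new AWS organization",
)

# After the invitation is accepted, move the account from the root into the target OU
root_id = org.list_roots()["Roots"][0]["Id"]
org.move_account(
    AccountId="111122223333",
    SourceParentId=root_id,
    DestinationParentId="ou-abcd-11111111",  # hypothetical OU ID
)
```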
Once all the member accounts have been migrated, it is time to remove (or at least lock down) the root user credentials in the member accounts, as explained in the AWS documentation.
Step 4 – Manage cost
The next thing we need to consider is cost. If a workload was migrated from on-prem with a legacy data center mindset, or if a temporary or development environment designed by an inexperienced architect or engineer became a production environment over time, there is a good chance that cost was not a top priority from day one.
Even before digging into cost reduction or right-sizing, we need visibility into cost, at the very least so we can stop wasting money on an ongoing basis.
Define the AWS management account as the payer account for the entire AWS organization, as explained in the AWS documentation.
Create a central S3 bucket to store the cost and usage report for the entire AWS organization, as explained in the AWS documentation.
Create a budget for each AWS account, and configure alerts that fire once a certain threshold of the monthly budget has been reached (for example, at 75%, 85%, and 95%).
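As a rough sketch (the account ID, budget amount, and email address are placeholders), a monthly cost budget with an alert at 75% of the budgeted amount could be created with boto3 like this; additional thresholds can be added as extra notifications:

```python
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="111122223333",  # hypothetical member account ID
    Budget={
        "BudgetName": "monthly-cost-budget",
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            # Alert when actual spend crosses 75% of the monthly budget
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 75.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "finops-team@example.com"}
            ],
        }
    ],
)
```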
Create a monthly report for each of the AWS accounts in the organization and review the reports regularly.
Enforce tagging policy across the AWS organization (such as tags by line of business, by application, by SDLC stage, etc.), to be able to track resources and review their cost regularly, as explained in the AWS documentation.
Step 5 – Create central audit and logging
To have central observability across our AWS organization, it is recommended to create a dedicated AWS member account for logging.
Create a central S3 bucket to store CloudTrail logs from all AWS accounts in the organization, as explained in the AWS documentation.
Make sure access to the CloudTrail bucket is restricted to members of the SOC team only.
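To illustrate the organization trail mentioned above, here is a minimal boto3 sketch (trail and bucket names are hypothetical, and the bucket policy must already allow CloudTrail to write to it):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")  # management-account credentials

# Create a multi-region trail that covers every account in the organization
cloudtrail.create_trail(
    Name="org-audit-trail",                  # hypothetical trail name
    S3BucketName="central-cloudtrail-logs",  # hypothetical central bucket in the logging account
    IsMultiRegionTrail=True,
    IsOrganizationTrail=True,
)

# Start delivering events to the central bucket
cloudtrail.start_logging(Name="org-audit-trail")
```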
Create a central S3 bucket to store CloudWatch logs from all AWS accounts in the organization, and export CloudWatch logs to the central S3 bucket, as explained in the AWS documentation.
Step 6 – Manage security posture
Now that we are aware of cost, we need to look at the security posture of the entire AWS organization. A reasonable assumption is that we have public resources, or resources that are accessible to external identities (such as third-party vendors, partners, customers, etc.).
To detect access to our resources by external identities, we should run IAM Access Analyzer, generate access reports, and review them regularly (or send their output to a central SIEM system), as explained in the AWS documentation.
We should also use the IAM Access Analyzer to detect excessive privileges, as explained in the AWS documentation.
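A minimal sketch of creating organization-level analyzers with boto3, one for external access and one for unused access (analyzer names are hypothetical; this assumes the call is made from the management account or a delegated administrator):

```python
import boto3

analyzer = boto3.client("accessanalyzer")

# Detect resources shared with external identities across the organization
analyzer.create_analyzer(
    analyzerName="org-external-access",
    type="ORGANIZATION",
)

# Detect unused roles, access keys, and permissions (excessive privileges)
analyzer.create_analyzer(
    analyzerName="org-unused-access",
    type="ORGANIZATION_UNUSED_ACCESS",
)
```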
Begin assigning Service Control Policies (SCPs) to OUs in the AWS organization, with guardrails such as denying the ability to create resources in certain regions (due to regulations) or preventing Internet access.
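For example (a simplified sketch; a production region-deny SCP usually also exempts global services), a guardrail that denies actions outside approved regions could be created and attached with boto3:

```python
import json
import boto3

org = boto3.client("organizations")  # management-account credentials

# Simplified region-deny guardrail; the approved regions are hypothetical
scp_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideApprovedRegions",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ["eu-west-1", "eu-central-1"]}
            },
        }
    ],
}

policy = org.create_policy(
    Content=json.dumps(scp_document),
    Description="Deny resource creation outside approved regions",
    Name="region-guardrail",
    Type="SERVICE_CONTROL_POLICY",
)

# Attach the SCP to a target OU (hypothetical OU ID)
org.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-abcd-11111111",
)
```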
Use tools such as Prowler, to generate security posture reports for every AWS account in the organization, as mentioned in the AWS documentation – focus on misconfigurations such as resources with public access.
Step 7 – Observability into cloud resources
The next step is visibility into our resources.
To have a central view of logs, metrics, and traces across the AWS organization, we can leverage CloudWatch's cross-account capability, as explained in the AWS documentation. This capability allows us to create dashboards and run queries to better understand how our applications are performing, but remember that the more logs we store, the higher the cost, so as a first stage I recommend starting with production applications (or at least the applications that produce the most value for the organization).
To have central visibility over vulnerabilities across the AWS organization (such as vulnerabilities in EC2 instances, container images in ECR, or Lambda functions), we can use Amazon Inspector to regularly scan and generate findings for all member accounts, as explained in the AWS documentation. With the information from Amazon Inspector, we can later use AWS Systems Manager (SSM) to deploy missing security patches, as explained in the AWS documentation.
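As a hedged sketch (the account IDs are placeholders), enabling Amazon Inspector scanning for member accounts could look like this with boto3; the delegated administrator is registered from the management account first, and scanning is then enabled from that administrator account:

```python
import boto3

# From the management account: delegate Inspector administration to the security account
inspector_mgmt = boto3.client("inspector2")
inspector_mgmt.enable_delegated_admin_account(delegatedAdminAccountId="222233334444")

# From the delegated administrator account: enable scanning for member accounts
inspector_admin = boto3.client("inspector2")
inspector_admin.enable(
    accountIds=["111122223333", "555566667777"],  # hypothetical member account IDs
    resourceTypes=["EC2", "ECR", "LAMBDA"],
)
```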
Summary
In this blog post, I have reviewed some of the most common recommendations that I believe will give you better control and visibility over an existing brownfield AWS environment.
I am sure there are many more recommendations and best practices, and perhaps next steps such as resource rightsizing, re-architecting existing workloads, adding third-party solutions for observability and security posture, and more.
I encourage readers of this blog post to gain control over their existing AWS environments, question past decisions (on topics such as cost, efficiency, sustainability, etc.), and always look for the next step in taking full advantage of the AWS environment.
About the author
Eyal Estrin is a cloud and information security architect, an AWS Community Builder, and the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 20 years in the IT industry.
You can connect with him on social media (https://linktr.ee/eyalestrin).
Opinions are his own and not the views of his employer.
Stop bringing old practices to the cloud
When organizations migrate to the public cloud, they often mistakenly look at the cloud as “somebody else’s data center”, or “a suitable place to run a disaster recovery site”, hence, bringing old practices to the public cloud.
In this blog post, I will review some of the common old (and perhaps bad) practices organizations still use in the cloud today.
Mistake #1 — The cloud is cheaper
I often hear IT veterans comparing the public cloud to the on-prem data center as a cheaper alternative, due to versatile pricing plans and cost of storage.
In some cases, this may be true, but focusing on specific use cases from a cost perspective alone is too narrow and misses the real benefits of the public cloud: agility, scale, automation, and managed services.
Don’t get me wrong — cost is an important factor, but it is time to look at things from an efficiency point of view and embed it as part of any architecture decision.
Begin by looking at the business requirements and ask yourself (among others):
- What are you trying to achieve, and what capabilities do you need? Only then figure out which services will allow you to accomplish those needs.
- Do you need persistent storage? Great. What are your data access patterns? Do you need the data to be available in real time, or can you store data in an archive tier?
- Your system needs to respond to customers’ requests — does your application need to provide a fast response to API calls, or is it ok to provide answers from a caching service, while calls are going through an asynchronous queuing service to fetch data?
Mistake #2 — Using legacy architecture components
Many organizations are still using legacy practices in the public cloud — from moving VMs in a “lift & shift” pattern to cloud environments, using SMB/CIFS file services (such as Amazon FSx for Windows File Server, or Azure Files), deploying databases on VMs and manually maintaining them, etc.
For static and stable legacy applications, the old practices will work, but for how long?
Begin by asking yourself:
- How will your application handle unpredictable loads? Autoscaling is great, but can your application scale down when resources are not needed?
- What value are you getting by maintaining backend services such as storage and databases?
- What value are you getting by continuing to use commercial license database engines? Perhaps it is time to consider using open-source or community-based database engines (such as Amazon RDS for PostgreSQL, Azure Database for MySQL, or OpenSearch) to have wider community support and perhaps be able to minimize migration efforts to another cloud provider in the future.
Mistake #3 — Using traditional development processes
In the old data center, we used to develop monolithic applications: a stack of components (VMs, databases, and storage) glued together, making it challenging to release new versions or features, upgrade, scale, etc.
As more organizations embraced the public cloud, the shift to a DevOps culture allowed them to develop and deploy new capabilities much faster, using smaller teams that each own a specific component and can independently release new versions of it, taking advantage of autoscaling to respond to real-time load, regardless of other components in the architecture.
Instead of hard-coded, manual configuration files, prone to human mistakes, it is time to move to modern CI/CD processes. It is time to automate everything that does not require a human decision, handle everything as code (from Infrastructure as Code and Policy as Code to the actual application code), store everything in a central code repository (and later in a central artifact repository), and be able to control authorization, auditing, rollback (in case of bugs in the code), and fast deployments.
Using CI/CD processes allows us to minimize differences between SDLC stages, by using the same code (in the code repository) to deploy Dev/Test/Prod environments, using environment variables to switch between target environments, backend service connection settings, and credentials (keys, secrets, etc.), while applying the same testing capabilities (such as static code analysis, vulnerable package detection, etc.).
Mistake #4 — Static vs. Dynamic Mindset
Traditional deployment had a static mindset. Applications were packed inside VMs, containing code/configuration, data, and unique characteristics (such as session IDs). In many cases architectures kept a 1:1 correlation between the front-end component, and the backend, meaning, a customer used to log in through a front-end component (such as a load-balancer), a unique session ID was forwarded to the presentation tier, moving to a specific business logic tier, and from there, sometimes to a specific backend database node (in the DB cluster).
Now consider what will happen if the front tier crashes or becomes unresponsive due to high load. What will happen if a mid-tier or back-end tier cannot respond to a customer's call in time? How will such issues impact the customer experience, forcing customers to refresh or completely log in again?
The cloud offers us a dynamic mindset. Workloads can scale up or down according to load. Workloads may be up and running, offering services for a short amount of time, and then be decommissioned when no longer required.
It is time to consider immutability. Store session IDs outside compute nodes (from VMs, containers, and Function-as-a-Service).
Still struggling with patch management? It’s time to create immutable images, and simply replace an entire component with a newer version, instead of having to pet each running compute component.
Use CI/CD processes to package compute components (such as VM or container images). Keep artifacts as small as possible (to decrease deployment and load time).
Regularly scan for outdated components (such as binaries and libraries), and on any development cycle update the base images.
Keep all data outside the images, on a persistent storage — it is time to embrace object storage (suitable for a variety of use cases from logging, data lakes, machine learning, etc.)
Store unique configuration inside environment variables, loaded at deployment/load time (from services such as AWS Systems Manager Parameter Store or Azure App Configuration), and for variables containing sensitive information use secrets management services (such as AWS Secrets Manager, Azure Key Vault, or Google Secret Manager).
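A minimal boto3 sketch of that pattern (the parameter and secret names are hypothetical): non-sensitive configuration is read from AWS Systems Manager Parameter Store at startup, while credentials come from AWS Secrets Manager rather than from code or images:

```python
import json
import boto3

ssm = boto3.client("ssm")
secrets = boto3.client("secretsmanager")

# Non-sensitive configuration, resolved per environment at deployment/load time
db_endpoint = ssm.get_parameter(
    Name="/myapp/prod/db-endpoint",  # hypothetical parameter name
    WithDecryption=True,
)["Parameter"]["Value"]

# Sensitive values belong in a secrets management service, not in code or images
db_credentials = json.loads(
    secrets.get_secret_value(SecretId="myapp/prod/db-credentials")["SecretString"]
)

print(f"Connecting to {db_endpoint} as {db_credentials['username']}")
```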
Mistake #5 — Old observability mindset
Many organizations that migrated workloads to the public cloud kept their investment in legacy monitoring solutions (mostly built on top of deployed agents), shipping logs (application, performance, security, etc.) from the cloud environment back to on-prem, without considering the cost of egress data from the cloud or the cost of storing the vast amounts of logs generated by the various cloud services; in many cases these solutions are still based on static log files, and sometimes even on legacy protocols (such as Syslog).
It is time to embrace a modern mindset. It is fairly easy to collect logs from various services in the cloud (as a matter of fact, some logs such as audit logs are enabled for 90 days by default).
It is time to consider cloud-native services, from SIEM services (such as Microsoft Sentinel or Google Security Operations) to observability services (such as Amazon CloudWatch, Azure Monitor, or Google Cloud Observability), which are capable of ingesting an (almost) infinite amount of events, streaming logs and metrics in near real-time (instead of using static log files), and providing an overview of an entire customer-facing service (made up of various compute, network, storage, and database services).
Speaking about security: the dynamic nature of cloud environments does not allow us to keep using legacy systems that scan configuration and attack surface at long intervals (such as 24 hours or several days), only to find out that our workload is exposed to unauthorized parties, that we made a mistake and left a configuration in a vulnerable state (still deploying resources exposed to the public Internet?), or that we kept our components outdated.
It is time to embrace automation, continuously scan configuration and authorization, and gain actionable insights on what to fix as soon as possible (and what is vulnerable but not directly exposed to the Internet, and can therefore be handled at a lower priority).
Mistake #6 — Failing to embrace cloud-native services
This is often due to a lack of training and knowledge about cloud-native services or capabilities.
Many legacy workloads were built on top of a 3-tier architecture, since this was the common way most IT staff and developers worked for many years. Architectures were centralized and monolithic, and organizations had to plan for scale and deploy enough compute resources, often in advance, while failing to predict spikes in traffic or customer requests.
It is time to embrace distributed systems based on event-driven architectures, using managed services (such as Amazon EventBridge, Azure Event Grid, or Google Eventarc), where the cloud provider takes care of the load (i.e., deploys enough back-end compute), and we can stream and read events without having to worry about whether the service will be able to handle the load.
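To illustrate the producer side of such an event-driven flow, here is a minimal boto3 sketch that publishes a business event to a hypothetical custom EventBridge bus; downstream rules and targets would consume it without the producer worrying about load:

```python
import json
import boto3

events = boto3.client("events")

events.put_events(
    Entries=[
        {
            "EventBusName": "orders-bus",    # hypothetical custom event bus
            "Source": "com.example.orders",  # hypothetical event source name
            "DetailType": "OrderPlaced",
            "Detail": json.dumps({"orderId": "12345", "total": 42.5}),
        }
    ]
)
```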
We can’t talk about cloud-native services without mentioning functions (such as AWS Lambda, Azure Functions, or Cloud Run functions). Although functions have their challenges (vendor-opinionated runtimes, maximum execution time, cold starts, learning curve, etc.), they have great potential when designing modern applications. A few examples where FaaS is suitable: real-time data processing (such as IoT sensor data), GenAI text generation (such as text responses for chatbots, providing answers to customers in call centers), and video transcoding (such as converting videos to different formats or resolutions), to name just a few.
Functions can be suitable in a microservice architecture where, for example, one microservice streams logs to a managed Kafka cluster, some microservices trigger functions that run queries against the backend database, and others store data in a fully managed, serverless database (such as Amazon DynamoDB, Azure Cosmos DB, or Google Cloud Spanner).
Mistake #7 — Using old identity and access management practices
No doubt we need to authenticate and authorize every request and follow the principle of least privilege, but how many times have we seen malpractices such as storing credentials in code or configuration files? (“It’s just in the Dev environment; we will fix it before moving to Prod…”)
How many times have we seen developers making changes directly in production environments?
In the cloud, IAM is tightly integrated into all services, and some cloud providers (such as AWS with its IAM service) allow you to configure fine-grained permissions down to specific resources (for example, allowing only users from specific groups, who authenticated with MFA, access to a specific S3 bucket).
It is time to switch from static credentials to temporary credentials, or even better, roles: when an identity requires access to a resource, it has to authenticate, its required permissions are evaluated, and temporary (short-lived, time-based) access is granted.
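A minimal sketch of the temporary-credentials pattern with boto3 and STS (the role ARN is hypothetical): the caller authenticates, assumes a role, and receives short-lived credentials instead of long-lived access keys:

```python
import boto3

sts = boto3.client("sts")

# Exchange the caller's identity for short-lived credentials (15 minutes here)
response = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/app-deployer",  # hypothetical role
    RoleSessionName="deploy-session",
    DurationSeconds=900,
)
creds = response["Credentials"]

# Use the temporary credentials; they expire automatically, nothing to rotate or revoke
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(session.client("s3").list_buckets()["Buckets"])
```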
It is time to embrace a zero-trust mindset as part of architecture decisions. Assume identities can come from any place, at any time, and we cannot automatically trust them. Every request needs to be evaluated, authorized, and eventually audited for incident response/troubleshooting purposes.
When a request to access a production environment is raised, we need to embrace break-glass processes, making sure we grant only the required permissions (usually to members of the SRE or DevOps team) and that those permissions are automatically revoked.
Mistake #8 — Rushing into the cloud with traditional data center knowledge
We should never ignore our team’s knowledge and experience.
Rushing to adopt cloud services while relying on old data center knowledge is prone to failure: it will cost the organization a lot of money, and it will most likely be inefficient (in terms of resource usage).
Instead, we should embrace the change, learn how cloud services work, gain hands-on practice (by deploying test labs and playing with the different services in different architectures), and not be afraid to fail and quickly recover.
To succeed in working with cloud services, you should be a generalist. The old mindset of specializing in certain areas (such as networking, operating systems, storage, databases, etc.) is not sufficient. You need to practice and gain broad knowledge about how the different services work, how they communicate with each other, and what their limitations are, and don't forget their pricing options when you select a service for a large-scale production system.
Do not assume traditional data center architectures will be sufficient to handle the load of millions of concurrent customers. The cloud allows you to create modern architectures, and in many cases, there are multiple alternatives for achieving business goals.
Keep learning and searching for better or more efficient ways to design your workload architectures (who knows, maybe in a year or two there will be new services or new capabilities to achieve better results).
Summary
There is no doubt that the public cloud allows us to build and deploy applications for the benefit of our customers while breaking loose from the limitations of the on-prem data center (in terms of automation, scale, infinite resources, and more).
Embrace the change by learning how to use the various services in the cloud, adopt new architecture patterns (such as event-driven architectures and APIs), prefer managed services (to allow you to focus on developing new capabilities for your customers), and do not be afraid to fail — this is the only way you will gain knowledge and experience using the public cloud.
About the author
Eyal Estrin is a cloud and information security architect, an AWS Community Builder, and the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 20 years in the IT industry.
You can connect with him on social media (https://linktr.ee/eyalestrin).
Opinions are his own and not the views of his employer.
Checklist for designing cloud-native applications – Part 1: Introduction
This post was originally published by the Cloud Security Alliance.
When organizations built legacy applications in the past, they aligned the infrastructure and application layers to business requirements, reviewing hardware requirements and limitations, team knowledge, security, legal considerations, and more.
In this series of blog posts, we will review considerations when building today’s cloud-native applications.
Readers of this series of blog posts can use the information shared, as a checklist to be embedded as part of a design document.
Introduction
Building a new application requires a thorough design process.
It is ok to try, fail, and fix mistakes during the process, but you still need to design.
Since technology keeps evolving, new services are released every day, and many organizations now begin using multiple cloud providers, it is crucial to avoid biased decisions.
During the design phase, avoid locking yourself into a specific cloud provider; instead, fully understand the requirements and constraints, and only then begin selecting the technology and services you will use to architect your application's workload.
Business Requirements
The first thing we need to understand is what is the business goal. What is the business trying to achieve?
Business requirements will impact architectural decisions.
Below are some of the common business requirements:
- Service availability – If an application needs to be available for customers around the globe, design a multi-region architecture.
- Data sovereignty – If there is a regulatory requirement to store customers data in a specific country, make sure it is possible to deploy all infrastructure components in a cloud region located in a specific country. Examples of data sovereignty services: AWS Digital Sovereignty, Microsoft Cloud for Sovereignty, and Google Digital Sovereignty
- Response time – If the business requirement is to allow fast response to customer requests, you may consider the use of API or caching mechanisms.
- Scalability – If the business requirement is to provide customers with highly scalable applications, to be able to handle unpredictable loads, you may consider the use of event-driven architecture (such as the use of message queues, streaming services, and more)
Compute Considerations
Compute may be the most important part of any modern application, and today there are many alternatives for running the front-end and business logic of our applications:
- Virtual Machines – Offering the same model we used for running legacy applications on-premises, and still suitable for running applications in the cloud. In most cases, use VMs if you are migrating an application from on-premises to the cloud as-is. Examples of services: Amazon EC2, Azure Virtual Machines, and Google Compute Engine.
- Containers and Kubernetes – Most modern applications are wrapped inside containers, and very often scheduled using the Kubernetes orchestrator. Migrating container-based workloads between cloud providers is considered a medium-level challenge (you still need to take into consideration the integration with other managed services in each CSP's ecosystem). Examples of Kubernetes services: Amazon EKS, Azure AKS, and Google GKE.
- Serverless / Function-as-a-Service – A modern way to run various parts of an application. The underlying infrastructure is fully managed by the cloud provider (no need to deal with scaling or maintenance of the infrastructure). Considered a form of vendor lock-in, since there is no straightforward way to migrate between CSPs due to the unique characteristics of each CSP's offering. Examples of FaaS: AWS Lambda, Azure Functions, and Google Cloud Functions.
Data Store Considerations
Most applications require a persistent data store, for storing and retrieval of data.
Cloud-native applications (and specifically microservice-based architecture), allow selecting the most suitable back-end data store for your applications.
In a microservice-based architecture, you can select different data stores for each microservice.
Alternatives for persistent data can be:
- Object storage – The most common managed storage service that most cloud applications are using to store data (from logs, archives, data lake, and more). Examples of object storage services: Amazon S3, Azure Blob Storage, and Google Cloud Storage.
- File storage – Most CSPs support managed NFS services (for Unix workloads) or SMB/CIFS (for Windows workloads). Examples of file storage services: Amazon EFS, Azure Files, and Google Filestore.
When designing an architecture, consider your application requirements such as:
- Fast data retrieval requirements – Requirements for fast read/write (measured in IOPS)
- File sharing requirements – Ability to connect to the storage from multiple sources
- Data access pattern – Some workloads require constant access to the storage, while others connect to the storage only occasionally (such as a file archive)
- Data replication – Ability to replicate data over multiple AZs or even multiple regions
Database Considerations
It is very common for most applications to have at least one backend database for storing and retrieval of data.
When designing an application, understand the application requirements to select the most suitable database:
- Relational database – Database for storing structured data stored in tables, rows, and columns. Suitable for complex queries. When selecting a relational database, consider using a managed database that supports open-source engines such as MySQL or PostgreSQL over commercially licensed database engine (to decrease the chance of vendor lock-in). Examples of relational database services: Amazon RDS, Azure SQL, and Google Cloud SQL.
- Key-value database – Database for storing structured or unstructured data, with requirements for storing large amounts of data, with fast access time. Examples of key-value databases: Amazon DynamoDB, Azure Cosmos DB, and Google Bigtable.
- In-memory database – Database optimized for sub-millisecond data access, such as caching layer. Examples of in-memory databases: Amazon ElastiCache, Azure Cache for Redis, and Google Memorystore for Redis.
- Document database – Database suitable for storing JSON documents. Examples of document databases: Amazon DocumentDB, Azure Cosmos DB, and Google Cloud Firestore.
- Graph database – Database optimized for storing and navigating relationships between entities (such as a recommendation engine). Example of Graph database: Amazon Neptune.
- Time-series database – Database optimized for storing and querying data that changes over time (such as application metrics, data from IoT devices, etc.). Examples of time-series databases: Amazon Timestream, Azure Time Series Insights, and Google Bigtable.
One of the considerations when designing highly scalable applications is data replication – the ability to replicate data across multiple AZs; the more challenging part is replicating data across multiple regions.
A few managed database services support global tables, i.e., replication across multiple regions, while most databases require a separate mechanism for replicating database updates between regions.
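As an example of the global-tables category, here is a hedged boto3 sketch that adds a replica region to an existing Amazon DynamoDB table (the table name and regions are hypothetical, and the table must already meet the global-tables prerequisites, such as having streams enabled):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-west-1")

# Add a replica in another region; DynamoDB then keeps both replicas in sync
dynamodb.update_table(
    TableName="customer-profiles",  # hypothetical table
    ReplicaUpdates=[
        {"Create": {"RegionName": "eu-central-1"}}
    ],
)
```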
Automation and Development
Automation allows us to perform repetitive tasks in a fast and predictable way.
Automation in cloud-native applications allows us to create a CI/CD pipeline that takes developed code, integrates the various application components and the underlying infrastructure, performs various tests (from QA to security tests), and eventually deploys new versions of our production application.
Whether you are using a single cloud provider, managing environments on a large scale, or even across multiple cloud providers, you should align the tools that you are using across the different development environments:
- Code repositories – Select a central place to store all your development team’s code, hopefully, it will allow you to use the same code repository for both on-prem and multiple cloud environments. Examples of code repositories: AWS CodeCommit, Azure Repos, and Google Cloud Source Repositories.
- Container image repositories – Select a central image repository, and sync it between regions, and if needed, also between cloud providers, to keep the same source of truth. Examples of container image repositories: Amazon ECR, Azure ACR, and Google Artifact Registry.
- CI/CD and build process – Select a tool to allow you to manage the CI/CD pipeline for all deployments, whether you are using a single cloud provider, or when using a multi-cloud environment. Examples of CI/CD build services: AWS CodePipeline, Azure Pipelines, and Google Cloud Build.
- Infrastructure as Code – Mature organizations choose an IaC tool to provision infrastructure for both single-cloud and multi-cloud scenarios, lowering the burden on the DevOps, IT, and development teams. Examples of IaC tools: AWS CloudFormation, Azure Resource Manager, Google Cloud Deployment Manager, and HashiCorp Terraform.
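As one possible Infrastructure-as-Code illustration (a minimal AWS CDK sketch in Python; the stack and bucket names are hypothetical), the same definition can be synthesized and deployed repeatedly across environments:

```python
from aws_cdk import App, Stack, aws_s3 as s3
from constructs import Construct


class StorageStack(Stack):
    """Hypothetical stack that provisions a versioned S3 bucket for build artifacts."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(self, "ArtifactsBucket", versioned=True)


app = App()
# The same code can be instantiated per environment (Dev/Test/Prod)
StorageStack(app, "storage-dev")
StorageStack(app, "storage-prod")
app.synth()
```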
Resiliency Considerations
Although many managed services in the cloud are offered as resilient by design by the cloud providers, you still need to consider resiliency when designing production applications.
Design all layers of the infrastructure to be resilient.
Regardless of the computing service you choose, always deploy VMs or containers in a cluster, behind a load-balancer.
Prefer to use a managed storage service, deployed over multiple availability zones.
For a persistent database, prefer a managed service deployed in a cluster over multiple AZs, or even better, look for a serverless database offering, so you won't need to maintain the database's availability.
Do not leave things in the hands of fate; embed chaos engineering experiments as part of your workload resiliency tests, to better understand how your workload will survive a failure. Examples of managed chaos engineering services: AWS Fault Injection Service and Azure Chaos Studio.
Business Continuity Considerations
One of the most important requirements from production applications is the ability to survive failure and continue functioning as expected.
It is crucial to design for business continuity in advance.
For any service that supports backups or snapshots (from VMs, databases, and storage services), enable scheduled backup mechanisms, and randomly test backups to make sure they are functioning.
For objects stored inside an object storage service that requires resiliency, configure cross-region replication.
For container registry that requires resiliency, configure image replication across regions.
For applications deployed in a multi-region architecture, use DNS records to allow traffic redirection between regions.
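To illustrate the object-storage replication recommendation above, here is a hedged boto3 sketch (the bucket names and IAM role are hypothetical, and versioning must already be enabled on both buckets):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="prod-data-eu-west-1",  # hypothetical source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",  # hypothetical role
        "Rules": [
            {
                "ID": "replicate-all-objects",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::prod-data-eu-central-1"},
            }
        ],
    },
)
```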
Observability Considerations
Monitoring and logging give you insights into your application and infrastructure behavior.
Telemetry allows you to collect real-time information about your running application, such as customer experience.
While designing an application, consider all the options available for enabling logging, both from infrastructure services and from the application layer.
It is crucial to stream all logs to a central system, aggregated and time-synced.
Logging by itself is not enough – you need to be able to gain actionable insights, to be able to anticipate issues before they impact your customers.
It is crucial to define KPIs for monitoring an application’s performance, such as CPU/Memory usage, latency and uptime, average response time, etc.
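As a hedged sketch of turning such a KPI into an actionable signal (the load balancer dimension and SNS topic are hypothetical), a CloudWatch alarm on average response time could look like this:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alert when the average target response time exceeds 1 second for 3 consecutive periods
cloudwatch.put_metric_alarm(
    AlarmName="api-latency-high",
    Namespace="AWS/ApplicationELB",
    MetricName="TargetResponseTime",
    Dimensions=[
        {"Name": "LoadBalancer", "Value": "app/my-alb/1234567890abcdef"}  # hypothetical ALB
    ],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=1.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:111122223333:ops-alerts"],  # hypothetical SNS topic
)
```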
Many modern tools use machine learning capabilities to review large numbers of logs, correlate events from multiple sources, and provide recommendations for improvements.
Cost Considerations
Cost is an important factor when designing architectures in the cloud.
As such, it must be embedded in every aspect of the design, implementation, and ongoing maintenance of the application and its underlying infrastructure.
Cost aspects should be the responsibility of any team member (IT, developers, DevOps, architect, security staff, etc.), from both initial service cost and operational aspects.
A FinOps mindset will help make sure we choose the right service for the right purpose – the right compute service, the right data store, and the right database.
It is not enough to select a service – make sure any service selected is tagged, its cost is monitored regularly, and it is perhaps even replaced with a better, more cost-effective alternative during the lifecycle of the workload.
Sustainability Considerations
The architectural decisions we make have an environmental impact.
When developing modern applications, consider the environmental impact.
Choosing the right computing service will allow running a workload, with a minimal carbon footprint – the use of containers or serverless/FaaS wastes less energy in the data centers of the cloud provider.
The same applies when selecting a data store, according to an application's data access patterns (from a hot or real-time tier up to an archive tier).
Designing event-driven applications, adding caching layers, shutting down idle resources, and continuously monitoring workload resources will allow you to design an efficient and sustainable workload.
Sustainability related references: AWS Sustainability, Azure Sustainability, and Google Cloud Sustainability.
Employee Knowledge Considerations
The easiest thing is to decide to build a new application in the cloud.
The challenging part is to make sure all teams are aligned in terms of the path to achieving business goals and the knowledge to build modern applications in the cloud.
Organizations should invest the necessary resources in employee training, making sure all team members have the required knowledge to build and maintain modern applications in the cloud.
It is crucial to make sure that all team members have the necessary knowledge to maintain applications and infrastructure in the cloud before beginning the actual project, to avoid unpredictable costs, a long learning curve while running in production, or an inefficient workload caused by knowledge gaps.
Training related references: AWS Skill Builder, Microsoft Learn for Azure, and Google Cloud Training.
Summary
In this first blog post of the series, we reviewed many of the aspects organizations should consider when designing new applications in the cloud, from understanding business requirements to selecting the right infrastructure, automation, resiliency, cost, and more.
When creating the documentation for a new development project, organizations can use the information in this series to form a checklist, making sure all important aspects and decisions are documented.
In the next chapter of this series, we will discuss security aspects when designing and building a new application in the cloud.
About the Author
Eyal Estrin is a cloud and information security architect, and the author of the book Cloud Security Handbook, with more than 20 years in the IT industry. You can connect with him on Twitter.
Opinions are his own and not the views of his employer.
Why choosing “Lift & Shift” is a bad migration strategy
One of the first decisions organizations make before migrating applications to the public cloud is deciding on a migration strategy.
For many years, the most common and easy way to migrate applications to the cloud was choosing a rehosting strategy, also known as “Lift and shift”.
In this blog post, I will review some of the reasons, showing that strategically this is a bad decision.
Introduction
When reviewing the landscape of possibilities for migrating legacy or traditional applications to the public cloud, rehosting may look like the best option as a short-term solution.
Taking an existing monolith application, and migrating it as-is to the cloud, is supposed to be an easy task:
- Map all the workload components (hardware requirements, operating system, software and licenses, backend database, etc.)
- Choose similar hardware (memory/CPU/disk space) to deploy a new VM instance(s)
- Configure network settings (including firewall rules, load-balance configuration, DNS, etc.)
- Install all the required software components (assuming no license dependencies exist)
- Restore the backend database from the latest full backup
- Test the newly deployed application in the cloud
- Expose the application to customers
From a time and required-knowledge perspective, this is considered a quick-win solution, but how efficient is it?
Cost-benefit
Using physical or even virtual machines does not guarantee anything close to 100% hardware utilization.
In the past organizations used to purchase hardware, and had to commit to 3–5 years (for vendor support purposes).
Although organizations could use the hardware 24×7, there were many cases where purchased hardware was consuming electricity and floor-space, without running at full capacity (i.e., underutilized).
Virtualization did allow organizations to run multiple VMs on the same physical hardware, but even then, it did not guarantee 100% hardware utilization — think about Dev/Test environments or applications that were not getting traffic from customers during off-peak hours.
The cloud offers organizations new purchase/usage methods (such as on-demand or Spot), allowing customers to pay just for the time they used compute resources.
Keeping a traditional data-center mindset, using virtual machines, is not efficient enough.
Switching to modern ways of running applications, such as the use of containers, Function-as-a-Service (FaaS), or event-driven architectures, allows organizations to make better use of their resources, at much better prices.
Right-sizing
On day 1, it is hard to predict the right VM instance size for the application.
When migrating applications as-is, organizations tend to select hardware (mostly CPU/memory) similar to what they used to have in the traditional data center, regardless of the application's actual usage.
After a legacy application is running for several weeks in the cloud, we can measure its actual performance, and switch to a more suitable VM instance size, gaining better utilization and price.
Tools such as AWS Compute Optimizer, Azure Advisor, or Google Recommender will help you select the most suitable VM instance size, but a VM still does not utilize 100% of the available compute resources, compared to containers or Function-as-a-Service.
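As a hedged sketch, right-sizing recommendations can also be pulled programmatically with boto3 and reviewed as part of a regular cost review (field handling kept minimal):

```python
import boto3

optimizer = boto3.client("compute-optimizer")

# List right-sizing findings for EC2 instances in the current account
response = optimizer.get_ec2_instance_recommendations()

for rec in response["instanceRecommendations"]:
    best_option = rec["recommendationOptions"][0]
    print(
        f"{rec['instanceArn']}: {rec['finding']} "
        f"(current: {rec['currentInstanceType']}, suggested: {best_option['instanceType']})"
    )
```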
Scaling
Horizontal scaling is one of the main benefits of the public cloud.
Although it is possible to configure multiple VMs behind a load balancer with autoscaling capability, adding or removing VMs according to the load on the application, legacy applications may not always support horizontal scaling. Even if they do support scale-out (adding more compute nodes), there is a very good chance they do not support scale-in (removing unneeded compute nodes).
VMs also do not support the ability to scale to zero, i.e., completely removing all compute nodes when there is no customer demand.
Cloud-native applications deployed on top of containers, using a scheduler such as Kubernetes (such as Amazon EKS, Azure AKS, or Google GKE), can horizontally scale according to need (scale out as much as needed, or as many compute resources the cloud provider’s quota allows).
Functions as part of FaaS (such as AWS Lambda, Azure Functions, or Google Cloud Functions) are invoked as a result of triggers, and erased when the function’s job completes — maximum compute utilization.
Load time
Spinning up a new VM as part of auto-scaling activity (such as AWS EC2 Auto Scaling, Azure Virtual Machine Scale Sets, or Google Managed instance groups), upgrade, or reboot takes a long time — specifically for large workloads such as Windows VMs, databases (deployed on top of VM’s) or application servers.
Provisioning a new container (based on Linux OS), including all the applications and layers, takes a couple of seconds (depending on the number of software layers).
Invoking a new function takes a few seconds, even if you take into consideration cold start issues when downloading the function’s code.
Software maintenance
Every workload requires ongoing maintenance — from code upgrades, third-party software upgrades, and let us not forget security upgrades.
All software upgrades require a lot of overhead from the IT, development, and security teams.
Performing upgrades of a monolith, where various components and services are tightly coupled together, increases the complexity and the chances that something will break.
Switching to a microservice architecture allows organizations to upgrade specific components (for example, scale out, deploy a new version of the code, or upgrade a third-party software component), with little to no impact on the other components of the application.
Infrastructure maintenance
In the traditional data center, organizations used to deploy and maintain every component of the underlying infrastructure supporting the application.
Maintaining services such as databases or even storage arrays requires dedicated, trained staff and a lot of ongoing effort (patching, backup, resiliency, high availability, and more).
In cloud-native environments, organizations can take advantage of managed services, from managed databases, storage services, caching, monitoring, and AI/ML services, without having to maintain the underlying infrastructure.
Unless an application relies on a legacy database engine, chances are you will be able to replace a self-maintained database server with a managed database service.
For storage services, most cloud providers already offer all the commodity storage services (from a managed NFS, SMB/CIFS, NetApp, and up to parallel file system for HPC workloads).
Most modern cloud-native services use object storage (such as Amazon S3, Azure Blob Storage, or Google Cloud Storage), providing scalable storage for large amounts of data (from backups and log files to data lakes).
Most cloud providers offer managed networking services for load-balancing, firewalls, web application firewalls, and DDoS protection mechanisms, supporting workloads with unpredictable traffic.
SaaS services
Up until now, we have discussed lift & shift from on-premises to VMs (mostly IaaS) and managed services (PaaS), but let us not forget there is another migration strategy: repurchasing, meaning replacing an existing application with a managed platform such as Software-as-a-Service, which allows organizations to consume a fully managed service without having to take care of ongoing maintenance and resiliency.
Summary
Keeping a static data center mindset and migrating to the public cloud using “lift & shift” is the least cost-effective strategy and in most cases will end up with medium to low performance for your applications.
It may have been the common strategy a couple of years ago when organizations just began taking their first step in the public cloud, but as more knowledge is gained from both public cloud providers and all sizes of organizations, it is time to think about more mature cloud migration strategies.
It is time for organizations to embrace a dynamic mindset of cloud-native services and cloud-native applications, which provide many benefits: (almost) infinite scale, automated provisioning (using Infrastructure-as-Code), a rich cloud ecosystem (with many managed services), and (if managed correctly) the ability to match workload costs to actual consumption.
I encourage all organizations to expand their knowledge about the public cloud, assess their existing applications and infrastructure, and begin modernizing their existing applications.
Re-architecture may demand a lot of resources (both cost and manpower) in the short term but will provide an organization with a lot of benefits in the long run.
References:
- 6 Strategies for Migrating Applications to the Cloud
- Overview of application migration examples for Azure
- Migrate to Google Cloud
About the Author
Eyal Estrin is a cloud and information security architect, and the author of the book Cloud Security Handbook, with more than 20 years in the IT industry. You can connect with him on Twitter.
Opinions are his own and not the views of his employer.
Embracing Cloud-Native Mindset
This post was originally published by the Cloud Security Alliance.
The use of the public cloud has become the new norm for any size organization.
Organizations are adopting cloud services, migrating systems to the cloud, consuming SaaS applications, and beginning to see the true benefits of the public cloud.
In this blog post, I will explain what it means to embrace a cloud-native mindset.
What is Cloud-Native?
When talking about cloud-native, there are two complementary terms:
- Cloud-Native Infrastructure — Services that were specifically built to run on public cloud environments, such as containers, API gateways, managed databases, and more.
- Cloud-Native applications — Applications that take the full benefits of the public cloud, such as auto-scaling (up or down), microservice architectures, function as a service, and more.
Cloud First vs. Cloud-Native
For many years, there was a misconception among organizations and decision-makers: should we embrace a “cloud first” mindset, meaning that any new application we develop or consume must reside in the public cloud?
Cloud-first mindset is no longer relevant.
Cloud, like any other IT system, is meant to support the business, not to dictate business decisions.
One of the main reasons for any organization to create a cloud strategy is to allow decision-makers to align IT capabilities or services to business requirements.
There might be legacy systems generating value for the organization, and the cost to re-architect and migrate to the cloud is higher than the benefit of migration — in this case, the business should decide how to manage this risk.
When considering developing a new application or migrating an existing application to the cloud, consider the benefits of cloud-native (see below), and in any case where choosing the cloud makes sense (in terms of alignment with business goals, cost, performance, etc.), make it your first choice.
What are the benefits of Cloud-Native?
Since we previously mentioned cloud-native, let us review some of the main benefits of cloud-native:
Automation
One of the pre-requirements of cloud-native applications is the ability to deploy an entire workload in an automated manner using Infrastructure as Code.
In cloud environments, IaC comes naturally, but do not wait until your workloads are migrated or developed in the cloud — begin automating on-premise infrastructure deployments using scripts today.
Scale
Cloud-native applications benefit from the infinite scale of the public cloud.
Modern applications will scale up or down according to customers’ demand.
Legacy environments may have the ability to add more virtual machines in case of high load, but in most cases, they fail to release unneeded compute resources when the load on the application goes down, increasing resource costs.
Microservice architecture
One of the main benefits of cloud-native applications is the ability to break down complex architecture into small components (i.e., microservices)
Microservices allow development teams to own, develop, and maintain small portions of an application, making upgrades to newer versions an easy task.
If you are building new applications today, start architecting your applications using a microservices architecture, regardless if you are developing on-premise or in the public cloud.
It is important to note that microservices architecture increases the overall complexity of an application, by having many small components, so plan carefully.
Managed services
One of the main benefits when designing applications (or migrating an existing application) in the cloud, is to gain the benefit of managed services.
By consuming managed services (such as managed databases, storage, API gateways, etc.), you shift the overall maintenance, security, and stability to the cloud provider, which allows you to consume a service, without having to deal with the underlying infrastructure maintenance.
Whenever possible, prefer to choose a serverless managed service, which completely removes your requirement to deal with infrastructure scale (you simply do not specify how much computing power is required to run a service at any given time).
CI/CD pipeline
Modern applications are developed using a CI/CD pipeline, which creates a fast development lifecycle.
Each development team shares its code using a code repository, able to execute its build process, which ends up with an artifact ready to be deployed in any environment (Dev, Test, or Prod).
Modern compute services
Cloud-native applications allow us to have optimum use of the hardware.
Compute services such as containers and function as a service, make better use of hardware resources, when compared to physical or even virtual machines.
Containers can run on any platform (from on-premise to cloud environments), and although it may take some time for developers and DevOps to learn how to use them, they can suit most workloads (including AI/ML), and be your first step in embracing cloud-native applications.
Function as a Service is a different story: functions suit specific tasks and in most cases are bound to a specific cloud environment, but if used wisely, they offer great efficiency compared to other types of compute services.
Summary
What does it mean to embrace a cloud-native mindset?
Measuring the benefits of cloud-native applications, consuming cloud-native services, looking into the future of IT services, and wisely adopting the public cloud.
Will the public cloud suit 100% of scenarios? No, but it has more benefits than keeping legacy systems inside traditional data centers.
Whether you are a developer, DevOps, architect, or cybersecurity expert, I invite you to read, take online courses, practice, and gain experience using cloud-native infrastructure and applications, and consider them the better alternatives for running modern applications.
About the Author
Eyal Estrin is a cloud and information security architect, and the author of the book Cloud Security Handbook, with more than 20 years in the IT industry. You can connect with him on Twitter.
Opinions are his own and not the views of his employer.
Security challenges with SaaS applications
This post was originally published by the Cloud Security Alliance.
According to the Shared Responsibility Model, “The consumer does not manage or control the underlying cloud infrastructure”.
As customers, this leaves us with very little control over services managed by remote service providers, as compared to the amount of control we have over IaaS (Infrastructure as a Service), where we control the operating system and anything inside it (applications, configuration, etc.)
The fact that many modern applications are offered as SaaS has many benefits, such as:
- (Almost) zero maintenance (we are still in charge of authorization)
- (Almost) zero requirements to deal with availability or performance issues (depending on business requirements and the maturity of the SaaS vendor)
- (Almost) zero requirement to deal with security and compliance (at the end of the day, we are still responsible for complying with laws and regulations and we still have obligations to our customers and employees, depending on the data classification we are about to store in the cloud)
- The minimum requirement to handle licensing (depending on the SaaS pricing offers)
- As customers, we can consume a service and focus on our business (instead of infrastructure and application maintenance)
While there are many benefits of switching from maintaining servers to consuming (SaaS) applications, there are many security challenges we need to be aware of and risks to control.
In this blog post, I will review some of the security challenges facing SaaS applications.
Identity and Access Management
We may not control the underlying infrastructure, but as customers, we are still in charge of configuring proper authentication and authorization for our customers (internal or external).
As customers, we would like to take advantage of our existing identities and leverage a federation mechanism that allows our end-users to log in once and, through SSO, access the SaaS application, all using standard protocols such as SAML, OAuth, or OpenID Connect.
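As a minimal illustration of the relying-party side of OpenID Connect (a sketch using the PyJWT library; the issuer URL, JWKS endpoint, and client ID are hypothetical), the application validates the ID token it receives after SSO before trusting any of its claims:

```python
import jwt  # PyJWT

ISSUER = "https://idp.example.com"             # hypothetical identity provider
JWKS_URL = f"{ISSUER}/.well-known/jwks.json"   # hypothetical JWKS endpoint
AUDIENCE = "my-saas-client-id"                 # hypothetical OIDC client ID


def validate_id_token(id_token: str) -> dict:
    # Fetch the signing key that matches the token's key ID
    signing_key = jwt.PyJWKClient(JWKS_URL).get_signing_key_from_jwt(id_token)

    # Verify signature, expiry, issuer, and audience before trusting any claim
    return jwt.decode(
        id_token,
        signing_key.key,
        algorithms=["RS256"],
        audience=AUDIENCE,
        issuer=ISSUER,
    )
```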
Once the authentication phase is done, we need to take care of access permissions, following the role description/requirement.
We must always follow the principle of least privilege.
We should never accept a SaaS application that does not support granular role-based access control.
While working with SaaS applications, we need to make sure we can audit who had access to our data and what actions have been done.
The final phase is to make sure access is granted by business needs – once an employee no longer needs access to a SaaS application, we must revoke the access immediately.
Data Protection
Once we are using SaaS applications, we need to understand we no longer have “physical” control over our data – whether it is employee’s data, customers’ data, intellectual property, or any other type of data.
Once data is stored and processed by an external party, there is always a chance for a data breach, that may lead to data leakage, data tampering, encryption by ransomware, and more.
If we are planning to store sensitive data (PII’s, financial, healthcare, etc.) in the cloud, we must understand how data is being protected.
We must make sure data is encrypted both in transit and at rest (including backups, logs, etc.) and at any given time, access to data by anyone (from our employees, SaaS vendor employees, or even third-party companies), must be authenticated, authorized, and audited.
Misconfiguration
The most common vulnerability is misconfiguration.
It is easy for an employee with administrative privileges to make a configuration mistake and grant someone unnecessary access permissions, make data publicly available, forget to turn on encryption at rest (depending on the specific SaaS application), and more.
Some SaaS applications allow you to set configuration control using CASB (Cloud Access Security Brokers) or SSPM (SaaS Security Posture Management).
The problem is the lack of standardization in the SaaS industry: there is no common standard for central configuration management through APIs.
If you are using widely adopted SaaS applications such as Office 365, Dropbox, or Salesforce, you will likely find third-party security solutions that can help you detect and mitigate misconfigurations.
Otherwise, if you are working with a small start-up or an immature SaaS vendor, your only options are a good legal contract (defining the obligations of the SaaS vendor), demanding certifications (such as SOC 2 Type II reports), and accepting the residual risk (depending on the business's risk tolerance).
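If your vendor does expose an administrative API, even a small home-grown SSPM-style check is better than nothing. The sketch below assumes a hypothetical admin endpoint that reports per-folder sharing settings; the URL, token, and field names are illustrative only and will differ per vendor.

```python
# Minimal sketch of an SSPM-style misconfiguration check against a hypothetical
# SaaS admin API that reports sharing settings per folder.
import requests

ADMIN_API = "https://admin.example-saas.com/api/v1"   # hypothetical admin endpoint
TOKEN = "REPLACE_WITH_A_SECRET_FROM_YOUR_VAULT"

def find_publicly_shared_folders() -> list[str]:
    """Flag folders whose sharing setting makes them accessible to anyone on the internet."""
    resp = requests.get(
        f"{ADMIN_API}/folders",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return [f["name"] for f in resp.json() if f.get("sharing") == "public"]

for folder in find_publicly_shared_folders():
    print(f"Misconfiguration: folder '{folder}' is publicly accessible")
```

Running a check like this on a schedule (and alerting on findings) catches the "someone clicked the wrong sharing option" class of mistakes far earlier than a periodic manual review.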
Insecure APIs
Many SaaS applications allow you to connect using APIs (from audit logs to configuration management).
Regardless of the data classification, you must always make sure your SaaS vendor’s APIs support the following:
- All APIs require authentication and perform a back-end authorization process.
- All traffic to the API is encrypted in transit
- All access to the API is audited (for further analysis)
- If the SaaS application can initiate traffic through the API back into your organization, make sure you enforce input validation to avoid malicious payloads reaching your internal systems
I recommend never relying solely on the SaaS vendor's assurances: always coordinate penetration testing of exposed APIs to mitigate the risk of insecure APIs.
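From the consumer side, the checklist above translates into a few simple habits: refuse unencrypted endpoints, authenticate every call, and validate everything the API returns before it reaches internal systems. Here is a minimal sketch against a hypothetical SaaS events endpoint; the URL, token, and field names are assumptions.

```python
# Minimal sketch of a defensive SaaS API client: HTTPS only, authenticated calls,
# and strict validation of the response before it touches internal systems.
import requests

API_URL = "https://api.example-saas.com/v1/events"    # hypothetical endpoint (HTTPS only)
TOKEN = "REPLACE_WITH_A_SECRET_FROM_YOUR_VAULT"

ALLOWED_EVENT_TYPES = {"login", "file_download", "config_change"}

def fetch_events() -> list[dict]:
    if not API_URL.startswith("https://"):
        raise ValueError("Refusing to call a SaaS API over an unencrypted channel")
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,                                    # certificate verification is on by default
    )
    resp.raise_for_status()

    events = []
    for item in resp.json():
        # Input validation: accept only the fields and values we expect.
        if item.get("type") not in ALLOWED_EVENT_TYPES:
            continue
        events.append({"type": item["type"], "user": str(item.get("user", ""))[:128]})
    return events
```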
Third-Party Access
Some SaaS vendors allow (or rely on) third-party vendor access.
When conducting due diligence on a SaaS vendor, make sure to check whether it allows any third-party vendor access to customers' data and how that data is protected.
Also, make sure the contract specifies whether data is transferred to third-party vendors, who they are, and for what purpose.
Make sure everything is written into the contract with the SaaS vendor, including an obligation for the vendor to notify you of any change regarding data access by, or transfer to, third-party vendors.
Patch Management and System Vulnerabilities
Since we are only consumers of a managed service, we have no control over, or visibility into, the infrastructure or application layers.
Everything is built from software, and software inevitably contains vulnerabilities.
We may be able to coordinate vulnerability scanning or even a short penetration test with the SaaS vendor (depending on its maturity), but we remain dependent on the vendor's transparency, and this is a risk we need to accept (depending on the business's risk tolerance).
Lack of SaaS Vendor Transparency
This is very important.
Mature SaaS vendors will keep us up to date with information such as breach notifications, outages, and scheduled maintenance (at the very least when a widely publicized critical vulnerability requires immediate patching and downtime is expected).
As part of vendor transparency, I would expect the legal contract to obligate the SaaS vendor to keep us informed of data breach incidents or potential unauthorized access to customers' data.
Since in most cases we have no real way to audit a SaaS vendor's security controls, I recommend working only with mature vendors who can provide proof of their maturity (such as annual SOC 2 Type II reports) and coordinating your own assessments with the vendor.
Mature SaaS vendors will give us access to audit logs, so we can query who has access to our data and what actions were performed on it.
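If the vendor exposes an audit-log API, we can pull the logs periodically and summarize who accessed our data, rather than waiting for the vendor to tell us. The sketch below assumes a hypothetical audit endpoint and field names (actor, action, a since parameter), so adjust it to whatever your vendor actually provides.

```python
# Minimal sketch: pulling audit logs from a hypothetical SaaS audit endpoint and
# summarizing data-access actions per user.
from collections import Counter

import requests

AUDIT_API = "https://api.example-saas.com/v1/audit-logs"   # hypothetical endpoint
TOKEN = "REPLACE_WITH_A_SECRET_FROM_YOUR_VAULT"

def summarize_data_access(since: str) -> Counter:
    """Count data-access actions per user since the given ISO-8601 timestamp."""
    resp = requests.get(
        AUDIT_API,
        params={"since": since},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return Counter(
        entry["actor"]
        for entry in resp.json()
        if entry.get("action") in {"read", "download", "export"}
    )

# Example: print(summarize_data_access("2024-01-01T00:00:00Z").most_common(10))
```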
Regulatory Compliance
Regardless of the cloud service model, we are always responsible for our data and we must always comply with laws and regulations, wherever our customers reside or wherever our SaaS vendor stores our data.
Mature SaaS vendors allow us to comply with data residency requirements and make sure data does not leave a specific country or region.
Compliance applies to the entire lifecycle of our data: upload and storage, processing, backup and retention, and finally destruction.
Make sure the legal contract specifies data residency and the vendor’s obligations regarding compliance.
From a customer’s point of view, make sure you get legal advice on how to comply with all relevant laws and regulations.
Summary
In this blog post, I have reviewed some of the most common security challenges of working with SaaS applications.
SaaS applications have many benefits (from a customer point of view), but they also contain security risks that we need to be aware of and manage regularly.
Digital Transformation in the Post-Covid Era

In 2020, the world suddenly stopped due to the pandemic.
A couple of years later, we began to see changes in the way both home consumers and organizations use technology.
Common areas that have changed in the post-Covid era
Here are a couple of areas that have adapted in the post-Covid era:
Customer support
Traditional engagement methods (such as phone calls, fax, or even email) have declined in the past couple of years.
Today, customers are looking for fast, mobile-friendly ways to reach contact centers from anywhere, from mobile apps to chat (in some cases, even a chatbot is a viable solution).
The use of mobile apps
Customers have been using mobile apps for more than a decade: social networks (for personal interaction), e-commerce (for purchasing products), banking (checking account status, money transfers, etc.), travel (booking flights or hotels), and more.
The use of mobile apps is not new, but in the past couple of years we have seen customers using mobile for almost every part of their daily lives.
The use of the public cloud
The public cloud has been in use for almost two decades, but during the pandemic, more and more organizations began to see its benefits and started migrating systems to the cloud.
It is true that some organizations still choose to invest in maintaining their own data centers to run their applications, but as time goes by, more and more organizations are embracing the public cloud.
Paying only for the resources we consume, combined with (almost) infinite compute capacity, has made the cloud very attractive to organizations around the world, from large corporations to newly founded start-ups.
Hybrid work and work-life balance
In the post-Covid era, more and more organizations offer their employees the choice of working from the office or from anywhere else, as long as the work gets done.
Employers understand the importance of work-life balance and have begun to respect employees' personal lives, which decreases stress and creates satisfied, productive employees.
The use of AI
For many years, researchers have tried to teach computers how to support people in decision-making.
In the past couple of years, we have seen AI/ML solutions in almost every area of our lives.
From recommending which music to listen to (based on past listening history), to helping doctors provide better medical care (based on a patient's health status and technological improvements), to quickly calculating customers' credit scores and offering them relevant investment plans, and more.
How can organizations prepare for digital transformation?
There are various areas where organizations should adapt and better prepare themselves for digital transformation:
Customer-centric
Organizations should change their mindset and put their customers first.
Conduct customer surveys and research what provides your customers the most value, from better customer service to an easy-to-navigate mobile app, or anything else that keeps your customers satisfied.
Be transparent with your customers. For example, if your organization collects personal data, tell customers what data you are collecting, for what purpose, and what you plan to do with it, and allow them to choose whether to provide it.
Keep your employees engaged
Explain the coming changes to your employees, allow them to provide feedback, and make them part of the process.
As technology evolves, employees would like to re-invent themselves or even choose a different career path.
An organization should support its employees and find ways to help them expand their knowledge or even switch to a different role within the organization.
Conduct training that allows employees to expand their knowledge (new ways to interact with customers, new technologies, modern development languages, and more).
Allow employees to combine work from the office with remote work from home, to support their work-life balance.
Embrace the public cloud
No matter how skilled your employees are, chances are your organization will never match the expertise of the public cloud providers, nor the scale and elasticity that the public cloud offers your organization and your customers.
Develop a cloud strategy that clearly defines what workloads or data can be migrated to the cloud and begin to modernize your applications.
Modernize your applications
Your organization may have many applications, already serving you and your customers.
Now is the time to ask yourself which applications still provide value and which can be modernized or re-architected to deliver better usability, higher availability, and elasticity at lower cost.
Consider embracing cloud-native applications to gain the full benefit of the public cloud.
Summary
Digital transformation is disrupting the way home consumers and organizations are using technology to make everyday life better.
Every day we find new ways to consume information, purchase products, get better healthcare or financial services, or even better ways to conduct business and interact with our customers.
To embrace digital transformation, we need to adapt to the change.
If you have not done so yet, now is the time to jump on the digital transformation train.