Key considerations & best practices for AWS cloud native applications
Millions of customers leverage AWS cloud products and solutions to build sophisticated applications with increased flexibility, scalability and reliability.
AWS has established itself as a leading platform for enterprise development teams, offering a variety of command-line tools and software development kits (SDKs) to deploy and manage applications and services. AWS SDKs are available for a wide range of platforms and programming languages, including Java, PHP, Python, Node.js, Ruby, C++, Android, and iOS.
Developed and deployed properly, cloud native applications provide substantial benefits including high-level services, greater elasticity, on-demand delivery, and streamlined global deployment.
The AWS development platform is aligned entirely with the cloud native approach and all its benefits.
CNCF Cloud Native Definition v1.0
AWS provides scalable platform, integration, development and deployment support for web-centric languages and technologies such as Ruby, Node.js, JavaScript, CSS, HTML, Python and more. It also provides platforms such as Elastic Beanstalk to build and run this code.
Cloud applications require a mechanism to scale up and down as web traffic increases or decreases, making them flexible and dynamic in nature. Amazon ECS, Auto Scaling, and Amazon EKS provide full-featured functionality to address dynamic scalability in a cloud native stack.
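To make the scaling mechanism concrete, here is a minimal sketch using the AWS SDK for Python (boto3) to attach a target-tracking scaling policy to an ECS service; the cluster name, service name, capacity limits and CPU target are illustrative assumptions, not prescriptions.

```python
import boto3

# Application Auto Scaling manages scaling for ECS services (assumes credentials/region are configured)
autoscaling = boto3.client("application-autoscaling")

# Hypothetical cluster and service names, used for illustration only
resource_id = "service/demo-cluster/demo-web-service"

# Register the service's desired task count as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# Scale out and in automatically to keep average CPU utilization near 50%
autoscaling.put_scaling_policy(
    PolicyName="demo-cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": 60,
        "ScaleOutCooldown": 60,
    },
)
```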
Amazon SNS helps coordinate the delivery of push notifications for apps to subscribing endpoints or clients. Amazon SQS is a message queuing service that helps developers decouple and scale distributed systems, serverless apps, and microservices deployed on cloud native stacks. Amazon Cognito facilitates control of user authentication on the cloud native stack.
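As an illustration of this decoupling, the sketch below publishes an event to an SNS topic and has a separate worker consume it from an SQS queue with boto3; the topic ARN, queue URL and message format are hypothetical, and the queue is assumed to be subscribed to the topic.

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Hypothetical topic and queue created ahead of time for this example
topic_arn = "arn:aws:sns:us-east-1:123456789012:order-events"
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/order-worker-queue"

# Producer: publish an event to the topic (fanned out to all subscribed endpoints)
sns.publish(TopicArn=topic_arn, Message='{"orderId": "1001", "status": "CREATED"}')

# Consumer: a decoupled worker polls its own queue for messages
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,  # long polling reduces empty responses
)
for message in response.get("Messages", []):
    print("Processing:", message["Body"])
    # Delete the message only after it has been handled successfully
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```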
Cloud native applications require storage and network capabilities that are mutable and flexible in nature. Amazon S3 storage helps provide online backup as well as archival of data and application programs. Amazon VPC, subnets, and Elastic IP addresses help build flexible network segmentation in the cloud to keep it secure and safe from unauthorized external access.
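For example, a simple backup-and-restore flow against S3 might look like the hedged sketch below; the bucket name, object key and file names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and file names, for illustration only
bucket = "example-app-backups"
backup_key = "backups/app-data-backup.tar.gz"

# Online backup: upload a local archive to S3
s3.upload_file("app-data-backup.tar.gz", bucket, backup_key)

# Restore: download the same object when it is needed again
s3.download_file(bucket, backup_key, "restored-app-data-backup.tar.gz")
```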
Amazon ECS (Elastic Container Service) is a highly scalable, high-performance container orchestration service that supports Docker containers, enabling you to run and scale containerized applications on the cloud native stack. Because it can run containers without requiring you to manage servers, it is secure and cost effective. In addition, Amazon EKS runs the Kubernetes management infrastructure across multiple Availability Zones to eliminate a single point of failure.
AWS also offers CodeDeploy, CodeCommit, and CodePipeline to manage code in the cloud native stack. CodeDeploy helps deploy code at scale, with a focus on rapid development and rapid deployment in mission-critical situations where the cost of failure is high. CodeCommit is a managed revision control service that hosts Git repositories and works with all Git-based tools, and CodePipeline is used to model and automate the software release process on the cloud native stack.
Codebase – The main objective of this factor is to keep all cloud application code in revision control. If events are related (for example, they share a common Amazon API Gateway API), the Lambda function code for those events can be kept in the same repository. Otherwise, Lambda functions can be broken out, along with their event sources, into their own repositories.
Dependencies – To build special processing or business logic, the best solution may be to create a purposeful library and manage its dependencies with each language's package manager – Node.js: npm, Python: pip, Java: Maven, C#: NuGet, and Go: go get.
Config – AWS Lambda and Amazon API Gateway can be used to set configuration information using the environment in which each service runs (a minimal handler sketch follows this list).
Backing Services – AWS Lambda has a default model for this factor. Typically, any database or data store can be referenced as an external resource via an HTTP endpoint or DNS name.
Build, Release, Run – AWS CodeDeploy, CodeCommit, and CodePipeline can be utilized for this factor.
Port Binding – AWS Lambda can be used with one of three invocation modes: synchronous, asynchronous, and stream-based.
Concurrency – AWS Lambda can be used, as it provides massive concurrency and scaling capacity.
Stateless Processes – AWS Lambda functions can be used and treated as stateless, despite the ability to store some information locally between execution environment re-uses.
Disposability – AWS best practices help identify where to place certain logic, how to re-use execution environments, and how configuring a function with more memory provides a proportional increase in available CPU. With AWS X-Ray, it is possible to gather insight into what a function is doing during an execution and make adjustments accordingly.
Environment Parity – The AWS Serverless Application Model (SAM) allows users to model serverless applications in a greatly simplified AWS CloudFormation syntax. With this model, CloudFormation capabilities such as Parameters and Mappings can be used to build dynamic templates.
Logs – Amazon CloudWatch Logs can be used, and API Gateway provides two different types of logs: 1) execution logs and 2) access logs.
Admin Processes – Typically, cloud functions are scoped down to single or limited use cases, with individual functions for different components of the web application. Even if they share a common invoking resource, such as an API Gateway endpoint and stage, the individual API resources and actions can be separated into their own Lambda functions, making it possible to build a design that meets the requirement.
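As a minimal sketch of the Config and Stateless Processes factors above, the Lambda handler below reads its settings from environment variables and keeps no request-specific state between invocations; the table name, environment variable and event shape are assumptions made for the example.

```python
import json
import os

import boto3

# Config factor: configuration comes from the execution environment;
# TABLE_NAME is a hypothetical environment variable set on the function.
TABLE_NAME = os.environ["TABLE_NAME"]

# Clients created outside the handler may be reused across invocations,
# but no request-specific state is kept (Stateless Processes factor).
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def handler(event, context):
    # Each invocation works only with the data carried in the event
    order = json.loads(event["body"])
    table.put_item(Item={"orderId": order["orderId"], "status": "RECEIVED"})
    return {"statusCode": 201, "body": json.dumps({"orderId": order["orderId"]})}
```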
Functionally, there should be no change in the test approach between web-based and cloud-based environments. However, more focus may be required in certain aspects such as performance, security, environment availability, integration with interfaces, customer experience, and adherence to SLAs when testing in a cloud-based environment.
The test strategy and test plan document may also change depending on the service models and implementation models selected. Choice of tools and techniques should be similar to that of web-based applications, however, during tool selection it is important to make sure that they are available on cloud and that they support the pay-as-you-use model.
CI Job: Unit/Component Testing
Testing Framework – JUnit, NUnit, RSpec
Automation Tools: Pact/QMetry Automation Framework
1. Boundary conditions
2. Status code and status message verification
3. Positive and negative business flows

CI Job: Integration Testing
Testing Framework – JUnit, NUnit, RSpec
Automation Tools: Pact/QMetry Automation Framework
1. Response chaining
2. Status code and status message verification at the end level
3. Session management
4. Positive and negative scenarios with respect to third parties
5. Consumer scenarios to verify nothing breaks after service deployment

CI Job: Contract Testing
Automation Tools: Pact/QMetry Automation Framework
1. Each consumer needs to verify each contract:
• Key and value pairs
• Value types
2. Positive and negative contract verification
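For illustration, the sketch below shows the kind of status-code, boundary-condition and positive/negative checks a unit/component CI job might run against a service endpoint; the URL, payloads and expected limits are placeholders, and Python's unittest with the requests library stands in for the frameworks listed above.

```python
import unittest

import requests

BASE_URL = "https://api.example.com/v1/members"  # placeholder endpoint


class MemberServiceComponentTests(unittest.TestCase):
    def test_valid_member_returns_200(self):
        # Positive business flow: a known member id returns success
        response = requests.get(f"{BASE_URL}/1001", timeout=5)
        self.assertEqual(response.status_code, 200)
        self.assertEqual(response.json()["memberId"], "1001")

    def test_unknown_member_returns_404(self):
        # Negative flow: status code and status message verification
        response = requests.get(f"{BASE_URL}/does-not-exist", timeout=5)
        self.assertEqual(response.status_code, 404)

    def test_name_length_boundary(self):
        # Boundary condition: maximum allowed name length (assumed to be 64 characters)
        response = requests.post(BASE_URL, json={"name": "x" * 64}, timeout=5)
        self.assertIn(response.status_code, (200, 201))


if __name__ == "__main__":
    unittest.main()
```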
End-user business scenarios
Validation against data breach (data swap)
Multi-user/Concurrent testing
Validation of session continuity
Validation of auto scaling
Proper FPT Testing
The comparison below shows the differences and impact of a proper FPT approach in most enterprises.
Without a formalized FPT approach:
1. Very limited functional testing under load (e.g. data swap)
2. Very limited multi-user testing (e.g. in the Appt Center booking flow)
3. Lack of awareness among team members of edge/corner conditions, resulting in:
a) Potential high risk of a bad member experience
b) Compromise of member PHI data
4. Lack of enforcement as a practice across the squad members

With a formalized FPT approach:
1. Formalized and documented approach for Functional Performance Testing (FPT)
2. FPT suite contains a collection of tests, including data breach validations
3. Functional verification of business use cases while the application is under load
4. Enforcement of FPT via DoD checklist (squad-level and release QA checklists)
FPT Testing Types
1. Data Swap Validation (multi-user, negative concurrency)
Helps validate data integrity for end-user data, with a single user or multiple users (see the sketch after these testing types).
Multi-User Testing → up to 5 TPS for the service layer via live services
Concurrent Users (High Traffic) → up to 50 TPS for the service layer via virtualized services
2. Auto-Scaling Policy Validation
Helps identify potential issues related to data loss, session management, session continuity, and performance degradation during auto scaling of the infrastructure.
Verify that the application behaves as expected when infrastructure resources are updated via an auto scaling (scale-in or scale-out) policy.
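As a rough sketch of the data swap validation described above, the snippet below fires concurrent requests for several test users and asserts that each response contains only that user's own data; the endpoint, bearer-token scheme and test identities are assumptions made for illustration.

```python
import concurrent.futures

import requests

BASE_URL = "https://api.example.com/v1/profile"  # placeholder endpoint

# Hypothetical test users mapped to the member id each should see
TEST_USERS = {
    "user-a-token": "member-001",
    "user-b-token": "member-002",
    "user-c-token": "member-003",
}


def fetch_profile(token):
    response = requests.get(
        BASE_URL, headers={"Authorization": f"Bearer {token}"}, timeout=10
    )
    response.raise_for_status()
    return token, response.json()["memberId"]


# Issue the requests concurrently to simulate simultaneous users
with concurrent.futures.ThreadPoolExecutor(max_workers=len(TEST_USERS)) as pool:
    for token, member_id in pool.map(fetch_profile, TEST_USERS):
        # Data swap check: the response must belong to the requesting user
        assert member_id == TEST_USERS[token], f"Data swap detected for {token}"
```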
Security is absolutely key when working in the cloud, and there are a number of important factors to consider when deciding how to approach it.
There are two types of Security to focus on – User-Based and Resource-Based.
User-Based Security
Is managed by IAM, which helps determine what kinds of API calls should be allowed from a specific user according to their authorization level.
Resource-Based Security
Depends on every resource, according to its communication method and data usage method.
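As an illustration of the user-based model, the hedged sketch below creates a narrowly scoped IAM policy and attaches it to a specific user with boto3; the policy contents, user name and bucket are assumptions for the example.

```python
import json

import boto3

iam = boto3.client("iam")

# Hypothetical read-only policy limiting which API calls the user may make
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-app-backups",
                "arn:aws:s3:::example-app-backups/*",
            ],
        }
    ],
}

policy = iam.create_policy(
    PolicyName="example-backup-read-only",
    PolicyDocument=json.dumps(policy_document),
)

# User-based security: attach the policy to a specific IAM user
iam.attach_user_policy(
    UserName="test-engineer",
    PolicyArn=policy["Policy"]["Arn"],
)
```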
Why Load and Performance Test?
Does the application respond quickly enough for the number of users?
Will the application handle load beyond expected user demand?
Is the application stable under expected and unexpected user load?
Do we believe the application will handle the required amount of load?
Load Testing
Tests the application for its performance under normal and peak usage by checking that responses to user requests are consistently within the accepted tolerance.

Stress Testing
Involves testing beyond normal operational capacity, often to a breaking point, in order to observe the results. This kind of test starts from a good load test that the application has already passed; the load is then increased slowly up to the point where the system fails to respond.

Spike Testing
Involves increasing the system load rapidly using bursts of concurrent users to check the behavior of the system. The goal is to determine whether performance will suffer, the system will fail, or it is able to handle dramatic changes in load.

Volume Testing
Involves exchanging huge volumes of data with the database and testing its relation to the performance of the application. This can be performed over short and long durations.

Endurance Testing
Also known as soak testing, this is typically done to determine whether the system can sustain the continuous expected load. During tests, memory utilization is monitored to detect potential leaks. For example, a system may behave exactly as expected when tested for 1 hour, but when the same system is tested for 3 hours, problems such as memory leaks cause it to fail or behave unpredictably.
AWS Load and Performance Testing: Parameters
Response Time – the total amount of time it takes to respond to a request for service. That service can be anything from a memory fetch to a disk I/O, a complex database query, or loading a full web page.
Throughput – calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).
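For example, 1,000 requests completed over a 50-second window (including the intervals between samples) give a throughput of 1,000 / 50 = 20 requests per second.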
Latency – a measure of responsiveness that represents the time taken to complete the execution of a request. It can also represent the sum of the latencies of several subtasks.
90% Line (Percentile) – The 90th percentile is the value for which 90% of the data points are smaller. This is a standard statistical measure.
Baseline – a set of tests run to capture performance metric data for the purpose of evaluating the effectiveness of subsequent tuning activities performed on the application.
Benchmarking – a process of comparing system performance against the baseline that is created internally or against an industry standard recognized by another organization.
Transaction Response Time – represents the time taken for the application to complete a defined transaction or business process.
Hits per Second – hits are requests of any kind made from the virtual client to the application being tested (client to server). The metric counts the number of hits; the higher the hits per second, the more requests the application is handling per second.
Throughput – the amount of data transferred across the network. It considers only the amount of data transferred from the server to the client and is measured in bytes/sec.
Collecting key project information including performance testing goals (expected users, response time, etc.), required credentials, application URL, tool preference, etc.
Creating complete test plans that incorporate objectives, scope, approach, and focus of the software testing effort. Test planning includes planning load test with the reference of Capacity Planning conducted at the time of Application Development. Capacity Planning helps determine what type of hardware and software configuration is required to meet application needs, and how many resources an application uses. The main goal is to identify the right amount of resources required to meet service demands now and in the future.
In this step, the actual scripts are created according to the business flows using performance testing tools such as JMeter, LoadRunner, etc. A sample execution with 1-5 users is typically done to verify the created script.
Users then execute the created script with the provided parameters (such as number of users, ramp-up and ramp-down time, etc.). This requires monitoring the servers for memory utilization, CPU utilization, disk I/O, etc.
In this phase, users take the test results, perform analysis, and create a report that identifies performance bottlenecks such as memory leaks, server errors and response times.
The above reports enable users to fix these bottlenecks and continue to tune the application until specified goals are met.
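As a small sketch of the analysis step, the snippet below derives the average response time, throughput and 90% line from a list of per-request timings; the sample values, total duration and the way results are exported from the load testing tool are assumptions.

```python
# Hypothetical per-request response times in seconds, exported from the load testing tool
samples = [0.21, 0.34, 0.29, 0.45, 1.20, 0.38, 0.27, 0.95, 0.33, 0.41]
total_test_duration = 5.0  # seconds from the start of the first sample to the end of the last (assumed)

average_response_time = sum(samples) / len(samples)

# Throughput = (number of requests) / (total time), as defined above
throughput = len(samples) / total_test_duration

# 90% line: nearest-rank 90th percentile of the sampled response times
ordered = sorted(samples)
rank = max(int(round(0.9 * len(ordered))) - 1, 0)
percentile_90 = ordered[rank]

print(f"Average response time: {average_response_time:.2f} s")
print(f"Throughput: {throughput:.1f} requests/s")
print(f"90% line: {percentile_90:.2f} s")
```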
Preventive controls – aimed at preventing an event from occurring
Corrective controls – aimed at fixing a system in case of a negative event or disaster
Detective controls – aimed at detecting and discovering negative events
The terms are similar, but fundamentally different. RTO is the maximum length of time after an outage that your company is willing to wait for the recovery process to finish. RPO is the maximum amount of data loss your company is willing to accept as measured in time.
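For example, an RTO of four hours means the business can tolerate the recovery process taking up to four hours after an outage, while an RPO of 15 minutes means it can afford to lose at most the last 15 minutes of data.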
Key steps for backup and recovery:
Key steps in the Preparation Phase include:
Key steps include:
Key steps for recovery in a Multi-Site DR approach include:
Distance between the sites – longer distances are typically subject to more latency or jitter
Available bandwidth – the breadth and variability of the interconnections
Data rate required by the application – should be lower than the available bandwidth
Replication technology – should be parallel (so that it can use the network effectively)
Working together, Infostretch and AWS are helping enterprises leverage the cloud to develop, test and deploy new digital products and services more quickly, flexibly, and efficiently while assuring the highest quality and service levels. We help delivery organizations unearth the full potential of cloud computing by enabling them to capitalize on the agility and security that DevOpsSecTest brings to accelerate their digital transformation initiatives.
Our close collaboration with our AWS partner team brings value to clients’ cloud migration journey at every phase of the project covering design, build, test and delivery. We utilize time-tested frameworks, standards and procedures that are instrumental to realizing cloud migration objectives. Our experience and expertise cover a broad cross-section of industry verticals, such as wearables, digital health and IoT for the gaming industry to name a few, as well as a wide variety of technical domains such as DevOpsSecTest, IoT and microservices.
Our AWS cloud hosting and software services are primarily focused on architecting and implementing in four key areas: Re-hosting, Re-Platforming, Replacing, and Re-factoring. We help customers plan their migration path to the AWS Cloud based on actual system conditions and criticality to operations by analyzing various aspects of their environment, including security, networking, reliability and failover, services and support requirements, billing, licensing, transfer cost and third-party tool support.
Our cloud migration plan is focused on identifying the mission and business-critical functions of the customer and relaying that to a delivery roadmap that can produce time-bound expected ROI.
The key pillars of our delivery roadmap include:
Our project plan for AWS cloud hosting and related software services are classified into three areas: Gap Analysis; Migrate & Adopt; and Sustain & Scale. The planning process helps customers build collaboration between product-as-a-service, data, and cloud services to take full advantage of the scalability, cost and agility advantages of the AWS platform.
Infostretch helps customers classify their applications on a spectrum of complexity and criticality based on direct impact to business-critical functions. We also develop migration plans that cater to each customer’s specific business needs. In addition, Infostretch’s cloud factory model and readily available accelerators help customers gain a holistic picture of “what it really takes” in terms of resources, skills, effort, timelines and risks as they progress on their AWS cloud journey.
We would love to hear more about your project.
Even a short phone call can help us explain how our solutions can accelerate your mobility, jump start your continuous delivery and help reduce costs. And that’s just for starters, understanding more about your project will enable us to build a solution that fits your objectives, infrastructure and aspirations!
Contact us