Implementing a QA Environment on IBM Cloud

Introduction

We recently completed a project for a North American organization that provides VoIP services to enterprise clients processing telephone payments. Facing a rapidly evolving technological landscape, they were eager to transition from their traditional application infrastructure to a more agile and scalable model. They aspired to:

  • Construct a QA environment using Kubernetes: They already had a foundation with Docker, but their structure leaned heavily towards monolithic applications.
  • Migrate from their conventional model of local testing: They wanted to deploy directly to production using a contemporary CI pipeline.

Their traditional development cycle, optimized for past challenges, now faced the pressures of modern demands. It was clear that adapting to the new era would be pivotal for their continued success.

Background

A popular path when adopting Kubernetes (K8s) involves breaking down a monolithic application into microservices. This shift provides:

  • Cost Savings: More efficient resource utilization as individual components can be scaled based on demand.
  • High Availability: The failure of one microservice doesn't necessarily bring down the entire application.
  • Optimized Scalability: Only scale the necessary parts of the application, rather than the entire monolith.

In this scenario, the client was indeed operating a legacy monolithic application. Their deployment process was largely manual, which was both time-consuming and prone to human errors. Introducing K8s would allow them to automate many of these steps via an API, integrating seamlessly into a Continuous Integration (CI) process. This modern approach would not only reduce manual interventions but also significantly speed up deployment times.

Given the option of moving to microservices, the client opted to retain their legacy monolithic setup while adopting K8s. This choice was anchored in several critical business considerations:

  • Resource Implications: Refactoring the app would demand significant engineering efforts and costs.
  • Business Focus: The time taken to refactor could detract from other urgent business priorities and revenue-generating activities.
  • Change Overhead: A shift to microservices would necessitate training staff, which incurs additional costs and time.
  • Operational Risks: With any major change comes the potential for disruptions and unforeseen issues. The client was wary of these risks, especially in their established operations.

After a thorough evaluation, the client found that the immediate and potential challenges of refactoring outweighed its benefits.

Drawing upon our expertise and the client's insights, we successfully implemented solutions such as:

  • Integrating IBM Cloud File Storage (FS) for VPC (comparable to AWS's EFS offering)
  • Routing Session Initiation Protocol (SIP) and Real-time Transport Protocol (RTP) traffic within the Kubernetes cluster
  • Pioneering a self-service CI that respects the legacy system's necessities

This case study sheds light on the challenges, our approach, and the resulting benefits.

Challenges

  1. Adapting to IBM Cloud's Intricacies: Leveraging IBM Cloud was a strategic choice by the client, grounded in specific business considerations. However, the application was originally designed around specific on-premise requirements, meaning some work was required to get it running on IBM Cloud.
  2. Implementing SIP: SIP is stateful, while Kubernetes was designed primarily for stateless, distributed workloads. SIP is also latency-sensitive; every hop added in the path (a pod, a proxy, a NAT device, whether from Kubernetes or the cloud environment itself) increases latency and potentially creates a problem that we would then need to work around.
  3. Layer of Complexity with Storage Communication: The client's applications had a unique mechanism of inter-communication. Instead of employing a queue or a messaging system, they utilized shared storage volumes. This design meant that multiple applications could access and share data through a common storage medium. While this was a cost-effective method for them initially, it presented challenges when attempting to integrate with a Kubernetes environment.
  4. Ensuring Manageability and Cost-effectiveness: The client's focus on cost-effectiveness manifested in unique monitoring strategies, like leveraging two different providers for similar purposes. This dual approach was indeed cost-effective but required a tailored strategy to ensure seamless monitoring.

Approach

Our collaboration with the client was marked by a series of intricate challenges, some rooted in IBM's system and others intrinsic to the client's legacy structures. Yet, each hurdle presented an opportunity to innovate and tailor solutions to the client's unique needs.

  1. IBM FS for VPC Integration
    • Challenge: The client's dependency on shared volumes brought us to IBM's managed NFS offering, IBM FS for VPC. We were met with the revelation that, though managed, the service wasn't inherently multi-zone. This presented a potential bottleneck for ensuring high availability (HA) for both the applications and the volumes.
    • What We Did: The service was not generally available and no off-the-shelf solution existed, so we opted for a more hands-on approach. We launched an NFS Provisioner (an open-source Kubernetes project that helps manage NFS volumes) within the IBM Kubernetes Service (IKS) cluster. The NFS Provisioner was backed by IBM Block Storage volumes managed by the IKS cluster (IBM Block Storage volumes are equivalent to AWS EBS volumes). This ensured the much-needed availability of volumes within the Kubernetes cluster, circumventing the single-zone restrictions.
    • Why: Recognizing the pivotal role of shared volumes in the client's application operations, we were committed to ensuring that the NFS integration was both efficient and congruent with their HA requirements.
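
      As a rough illustration of the shape this took, a StorageClass served by the in-cluster provisioner lets application pods request shared, multi-reader volumes. This is a sketch only; the names and sizes are placeholders, not the client's actual configuration:

      ```yaml
      # Hypothetical StorageClass for an in-cluster NFS provisioner that is
      # itself backed by IKS-managed block storage (names are assumed).
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: nfs-shared              # placeholder name
      provisioner: example.com/nfs    # must match the provisioner's configured name
      ---
      # A claim against that class; ReadWriteMany is the key property that
      # lets multiple application pods share the same volume.
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: shared-recordings       # placeholder name
      spec:
        storageClassName: nfs-shared
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 50Gi             # illustrative size
      ```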

  2. Routing SIP and RTP traffic
    • Challenge: SIP and RTP protocols were pivotal for the client's communications. IBM's architecture doesn't provide public IPs directly to Kubernetes instances, and their LoadBalancer solution didn’t quite match the client's requirements. This posed a challenge in routing public SIP and RTP traffic into the Kubernetes cluster.
    • What We Did: As per the client's requirements and the IBM IKS design, we hosted an IBM Cloud VPC virtual machine outside of the Kubernetes cluster to handle the SIP and RTP requests. The traffic was routed from this SIP/RTP proxy VM into the Kubernetes cluster. This was broken down as follows:

      • External IBM Instance Configuration: To ensure optimal handling of the SIP and RTP traffic
        • Instance Selection: Outside of K8s (due to the constraints), we chose a high-performance IBM instance type, taking into consideration CPU, memory, and network performance, optimizing for low latency and high throughput required by real-time protocols.
        • Deployment: The selected instance was deployed next to the Kubernetes cluster using Terraform.
        • OS Tuning: After installing a Linux distribution, network parameters like net.core.rmem_max, net.core.wmem_max, and fs.file-max were fine-tuned to enhance the performance for high-volume SIP and RTP traffic.
        • Security Measures: We implemented strict firewall rules using iptables to allow only SIP and RTP related traffic and deny any potential threats.
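
        The tuning and firewall steps above looked roughly like the following. The buffer sizes and the RTP port range are illustrative placeholders, not the client's production values:

        ```shell
        # Illustrative kernel tuning for high-volume UDP (SIP/RTP) traffic
        sysctl -w net.core.rmem_max=26214400   # max socket receive buffer (bytes)
        sysctl -w net.core.wmem_max=26214400   # max socket send buffer (bytes)
        sysctl -w fs.file-max=2097152          # system-wide open file limit

        # Restrictive iptables policy: default-deny inbound, allow only
        # SIP signaling and an assumed RTP media port range
        iptables -P INPUT DROP
        iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
        iptables -A INPUT -p udp --dport 5060 -j ACCEPT          # SIP
        iptables -A INPUT -p udp --dport 10000:20000 -j ACCEPT   # RTP (assumed range)
        ```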

      • Traffic Capture and Redirection
        • Traffic Sniffing: Tools like tcpdump and ngrep were set up to monitor and troubleshoot traffic, ensuring that all packets were correctly captured and relayed.
        • SIP/RTP Proxy Tools: We employed openSIPS for SIP routing, which was optimized for handling massive simultaneous connections. Coupled with RTPEngine for RTP relay, they acted as the core components to route traffic between the external instance and Kubernetes pods.
        • Load Balancing: For redundancy and scalability, we implemented an IBM Cloud load balancer (filling a role similar to HAProxy), ensuring traffic distribution was even and failovers were handled efficiently.

      • Integration with Kubernetes
        • Endpoint Discovery: We leveraged Kubernetes services and endpoints API to dynamically discover the IP addresses of the SIP/RTP pods, ensuring traffic redirection was always up-to-date.
        • Network Policies: To ensure that only our external instance could communicate with the SIP/RTP pods, specific Kubernetes network policies were designed and applied, providing an additional layer of security.
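
      A policy of this shape captures the idea; the labels, port, and proxy address below are all assumed placeholders rather than the client's actual values:

      ```yaml
      # Hypothetical NetworkPolicy: only the external proxy VM may reach
      # the SIP/RTP pods; all other ingress to them is denied.
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-proxy-to-sip      # placeholder name
      spec:
        podSelector:
          matchLabels:
            app: sip-rtp              # assumed pod label
        policyTypes:
          - Ingress
        ingress:
          - from:
              - ipBlock:
                  cidr: 10.0.1.10/32  # placeholder: proxy VM's private IP
            ports:
              - protocol: UDP
                port: 5060            # SIP signaling; media ports handled likewise
      ```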


  3. Introducing Self-service CI
    • Challenge: The client's aspiration to modernize their CI process met with the hard reality of their legacy-based application design. The monolithic nature of the applications and the necessity for persistent volumes made it challenging to find a readily available solution for managing multiple QA environments. All the applications in the environment were tied to each other, as the applications relied on persistent volumes to process audio recordings and load batches of phone numbers. Deploying all the applications would require setting up new databases, with static configurations inside of them, new volumes, and the SIP/RTP proxy solution, which was considered to be too expensive to test small changes in a single application.
    • What We Did: There are certainly off-the-shelf tools that would do the trick here. However, due to business constraints, employing a tool such as Backstage did not fit their needs: the learning curve, as well as the resources required to adopt a new development model, would simply be too onerous. Instead, we crafted bespoke CI scripts; this approach allowed the client to instantiate new environments based on distinct branch names, offering flexibility while acknowledging the legacy structures.

      Once tasks like testing the application, building the artifacts, and scanning the code to ensure code quality were complete, the deployment step would kick in to check what was changed and how to deploy it to the Kubernetes cluster using the current volumes and SIP resources (as most of the application's state lives in static volumes).
    • Why: Modernization was on the horizon, and we aimed to facilitate that journey. Recognizing the importance of persistent volumes in their operations and the uniqueness of their application setup, we provided a CI process tailored to their specific needs.
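
      Since each environment was keyed off a branch name, one small but necessary piece is mapping an arbitrary git branch to a valid Kubernetes namespace name (an RFC 1123 label: lowercase alphanumerics and hyphens, at most 63 characters). A minimal sketch of that kind of helper, with assumed names rather than the client's actual scripts:

      ```python
      import re

      def branch_to_namespace(branch: str, prefix: str = "qa") -> str:
          """Derive a valid Kubernetes namespace name from a git branch name.

          Namespace names must be RFC 1123 labels: lowercase alphanumerics
          and '-', at most 63 characters, starting/ending alphanumeric.
          """
          name = f"{prefix}-{branch.lower()}"
          name = re.sub(r"[^a-z0-9-]+", "-", name)  # replace invalid runs with '-'
          name = re.sub(r"-{2,}", "-", name)        # collapse repeated hyphens
          return name.strip("-")[:63].rstrip("-")   # trim edges, cap length

      print(branch_to_namespace("feature/ABC-123_new-ivr"))
      # → qa-feature-abc-123-new-ivr
      ```

      The CI pipeline could then create (or reuse) a namespace per branch and deploy only the changed application into it, alongside the shared volumes and SIP resources.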

This deep dive into our approach illustrates how we combined innovation with the client's distinctive requirements, bridging the gap between legacy systems and modern solutions.

Results

The efforts bore fruit:

  • The client's primary goal, a functional QA environment, was successfully achieved. While it entailed multiple adaptations and custom solutions, the final result was an environment that aligned with their requirements.
  • Key functionalities, like the SIP/RTP communication features, were seamlessly integrated and operational.

"The technical expertise of the team is impressive. Their DevOps specialists are not only highly skilled, but they also demonstrated a clear understanding of our specific needs. This facilitated constructive technical discussions, ultimately leading to significant improvements in our operations."
— Deputy CTO

Recommendations For The Future

For the client's continued journey, we believe that they would benefit from:

  • A gradual evolution of their infrastructure and application designs to match the demands of modern, sophisticated environments.

This story showcases a classic journey of transformation, where collaboration, expertise, and a deep understanding of the client's needs paved the way for success.


Alternative Approaches

The journey with our client brought about pivotal learnings, and while we crafted solutions tailored to their current circumstances, we envision a few alternative approaches in a world without constraints:

  • Cloud-Agnostic Infrastructure

    Without vendor-specific limitations, we'd recommend leveraging a cloud-agnostic solution like Terraform for infrastructure automation. This would allow the client to deploy infrastructure across multiple cloud providers, providing flexibility and avoiding vendor lock-in.
  • Stateless Application Design

    Ideally, applications would be designed as stateless, allowing them to scale horizontally without dependency on persistent volumes. This not only improves resilience and scalability but also simplifies disaster recovery and backup processes.
  • Native Managed Services

    For the handling of SIP and RTP traffic, we'd have leveraged native cloud solutions. For instance, on AWS, we could employ AWS Global Accelerator to handle the traffic and ensure low-latency and high-availability without the need for an external instance.
  • Advanced CI/CD Workflows

    With no application constraints, advanced CI/CD tools like ArgoCD or Jenkins X could be adopted. They would enable GitOps practices, ensuring consistent and auditable deployments across environments.
  • Microservices and Service Mesh

    Refactoring the application into microservices could improve scalability, resilience, and maintainability. Paired with a service mesh like Istio, we would achieve enhanced traffic management, security, and observability.
  • Database Decoupling

    We'd recommend moving away from sharing state through file systems such as NFS and toward managed relational or NoSQL databases. This would enhance performance and availability, and enable advanced features like automatic backups, replication, and data sharding.
  • Continuous Learning and Upgrading

    Adopting a culture of continuous learning ensures the client's team stays updated with industry best practices and emerging technologies. Periodic training sessions, workshops, and attending industry conferences could be part of this strategy.

In a future without constraints, these recommendations would form the cornerstone of a robust, scalable, and efficient application environment. While the current situation demands adaptive solutions, our vision for the client is one of continuous evolution toward best practices and technological excellence.