Application Modernization and Site Reliability Engineering (SRE)
We often get into conversations with our clients and partners about application modernization as a whole. In that context, application resiliency, design simplification, and operational excellence are very common asks from our clients. Site Reliability Engineering (SRE) is also gaining traction among CIOs and is likely to become a de facto operational practice for organizations. In this article I share a simple perspective on application modernization, site reliability engineering, and the correlation between them.
Application modernization is primarily around:
1. modernizing application technology, i.e., the operating platform (OS, container, VM, etc.), application middleware (MW) technologies (web/app runtimes, such as .NET Framework to .NET Core, or JBoss, WebLogic, and WebSphere traditional to WAS Liberty, Open Liberty, containers, and others), and back-end data modernization (database technology transformation, distributed database to data lake, DBMS pattern modernization such as SQL to NoSQL or NewSQL, etc.)
2. process modernization, i.e., simplifying application processes, decoupling tightly coupled processes, systems, and data by adopting modular design patterns, and building intelligent workflow patterns to automate across processes.
3. platform modernization for target application deployment, including but not limited to changing the deployment pattern from traditional OS/MW to containers, from Cloud Foundry to Kubernetes to OpenShift, and moving to enterprise-level Linux flavors such as Red Hat Enterprise Linux (RHEL), taking advantage of runtimes such as Quarkus, Vert.x, Thorntail, Node.js, and Spring Boot.
4. application rewrite, either in parts or for selected modules, to cloud native by simplifying the design with a microservice architecture and by moving application modules from traditional patterns to lightweight ones, e.g., from Spring patterns (MVC, AOP, Security, JDBC, etc.) to Spring Boot. This includes rewriting some application functions as cloud native, leveraging open source cloud native projects, and adopting a cloud native Java platform such as Quarkus, as described in a recent article by Ram Ravishankar.
Now let’s spend a minute on Site Reliability Engineering (SRE) before understanding the correlation.
SRE is a software engineering practice whose primary goal is to improve the reliability of software services. It has become more important than ever because of COVID-19, as organizations have felt the need to increase application resiliency following a massive rise in the digital footprint of their B2B, B2C, and B2E applications. SRE was developed by Google and is now an industry standard, practiced in companies of all sizes. It is not something new; you can refer to the IBM Cloud Approach to Operation blog listed at the end of this article. Some of the key benefits of SRE are shown in the diagram below:
Basic principles of SRE are:
- Higher uptime and availability: by leveraging technology and establishing a high degree of collaboration between engineers, clients, and product owners, site reliability engineers define and protect uptime and availability targets.
- A framework for reliability measurement: engineers define service-level indicators (SLIs) and service-level objectives (SLOs) to maintain consistent operations. Defining proper SLIs that directly reflect the goals of the service is an important engineering task.
- High degree of automation: a key practice is eliminating heavy-duty manual tasks related to production operations, particularly maintenance tasks. This frees up the time engineers would otherwise spend troubleshooting systems, so they can concentrate on refining platform architecture and operations.
- End-to-end understanding: reliability engineers are trained to develop a holistic understanding of operations, the operational components and systems, and how these are all stitched together.
- Early issue detection: the fail-first approach applies to operations, where site reliability engineers continuously search for limitations in operations and ways to improve configurations. The aim is to identify potential problems early and address them before they impact your systems. Proactive measurement is key, as opposed to reactive action.
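To make the reliability measurement principle concrete, here is a minimal sketch in Python of an availability SLI checked against an SLO. It assumes the SLI is defined as the fraction of requests served successfully; the names `Window`, `availability_sli`, and `meets_slo`, as well as the sample numbers and the 99.9% target, are hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed over one measurement window (hypothetical shape)."""
    good_requests: int   # requests that were served successfully
    total_requests: int  # all valid requests received


def availability_sli(window: Window) -> float:
    """Return the SLI: the fraction of requests served successfully."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return window.good_requests / window.total_requests


def meets_slo(window: Window, slo_target: float) -> bool:
    """Return True when the measured SLI is at or above the SLO target."""
    return availability_sli(window) >= slo_target


# Illustrative numbers only, not real measurements.
week = Window(good_requests=999_412, total_requests=1_000_000)
print(f"SLI = {availability_sli(week):.4%}")
print("SLO met" if meets_slo(week, slo_target=0.999) else "SLO missed")
```

In practice the request counts would come from your monitoring system, and a service usually tracks several SLIs (latency, error rate, and so on) rather than availability alone.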
Recently, I read an article by Keet Malin Sugathadasa, who explains this very nicely in the diagram below.
The system that site reliability engineers maintain might be required to comply with industry standards such as the ISO 27001 security standard and the ISO 9001 quality standard. In that case, there should be a way to monitor whether the system remains in line with these standards or is drifting away from them over time. More detail can be found in his article, listed at the end of this article.
DevSecOps sits at the center of application modernization and SRE, as shown in diagram 3 below. A simple way to put it is that application modernization is the need, DevSecOps is the principle, and SRE is the implementation.
More details on DevOps can be explored with Andrea Crawford, IBM Distinguished Engineer, Cloud Garage Solution Engineering, DevOps. In Andrea’s words, “DevOps is all about striking a balance between velocity and quality, which is key. Velocity without quality destroys digital reputations. Quality without velocity kills digital agility and responsiveness.” You can watch her here: https://youtu.be/UbtB4sMaaNM
I have illustrated this further below, with some of the key guiding principles in the context of application modernization.
The commonalities between DevSecOps and SRE are automation, scale, capacity optimization, and a systematic perspective. A large-scale modernization program also leverages similar principles, among many others, as depicted in the diagram above.
The relationship between SRE and DevOps (now DevSecOps) is detailed below:
DevOps principle: Reduce organizational silos, leverage tooling and automation ->> SRE practice: Use the same tooling to automate and improve operations as developers use to develop and improve software
DevOps principle: Accept failure as normal, implement gradual changes ->> SRE practice: Use error budgets to continually deploy new features and functionality within acceptable levels
DevOps principle: Measure everything ->> SRE practice: Base decisions to release new software on SLA metrics
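The error budget practice in the mapping above can be sketched in a few lines of Python. This is a minimal illustration, assuming the budget is the number of failed requests the SLO tolerates in a window; the function names `error_budget_remaining` and `can_release`, the 99.9% target, and the request counts are all hypothetical.

```python
def error_budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("need total > 0 and 0 <= good <= total")
    allowed_failures = (1.0 - slo_target) * total  # failures the SLO tolerates
    actual_failures = total - good                 # failures actually observed
    return 1.0 - actual_failures / allowed_failures


def can_release(slo_target: float, good: int, total: int,
                min_budget: float = 0.0) -> bool:
    """Allow a release only while more than min_budget of the budget is left."""
    return error_budget_remaining(slo_target, good, total) > min_budget


# Illustrative numbers only: a 99.9% SLO over one million requests
# tolerates roughly 1,000 failed requests in the window.
print(can_release(0.999, good=999_400, total=1_000_000))  # 600 failures observed
print(can_release(0.999, good=998_500, total=1_000_000))  # 1,500 failures observed
```

The design choice here is that releases continue while budget remains and pause once it is spent, which is one way to keep deploying new features "within acceptable levels"; a real policy might also slow down releases as the budget runs low.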
More detail can be found on the IBM Cloud Learn Hub.
Summarizing all of this in my buffet platter below: application modernization is needed to meet our clients’ business objectives, while application development is a means to support application modernization. The operational excellence goal of platform modernization should be pursued through DevSecOps principles, with SRE practice as one of the key pathways.
Application modernization powered by various cloud capabilities is an essential strategy for business growth as organizations look to increase agility and experiential learning, accelerate innovation, and reduce TCO along with unwanted technical debt. It is also important to develop new applications on a strategic platform rather than a traditional one, by cross-skilling and upskilling the organization’s talent. DevSecOps and SRE together are an integral part of an application modernization strategy. I hope this article gave you an understanding of what SRE is, what its basic principles are, and how you can relate it to application modernization in your organization.
IBM approach to application modernization: https://www.ibm.com/cloud/application-modernization
Google research on SRE principles: https://research.google/pubs/pub46904
IBM Cloud Approach to Operation: https://www.ibm.com/cloud/blog/site-reliability-engineering-cloud-approach-operations
Good e-Learning on SRE principles: https://blog.goodelearning.com/subject-areas/devops/what-are-the-most-significant-benefits-of-search-reliability-engineering-sre
IBM SRE best practice: https://www.ibm.com/cloud/learn/site-reliability-engineering
Site reliability article by Keet: https://keetmalin.wixsite.com/keetmalin/post/what-is-site-reliability-engineering-sre