Welcome!

Linux Containers Authors: Liz McMillan, James Carlini, Elizabeth White, Vaibhaw Pandey, Pat Romanski

Related Topics: @DevOpsSummit, Linux Containers, Containers Expo Blog

@DevOpsSummit: Blog Feed Post

Metrics and KPIs for Test Environment Stability | @DevOpsSummit #DevOps #APM #Monitoring

How often is an environment unavailable due to factors within your project’s control?

How often is an environment unavailable due to factors within your project’s control? How often is an environment unavailable due to external factors? Are the software and hardware in an environment up to date with the target Production systems? How often do you have to resort to manual workarounds due to an environment?

Metric: Availability and Uptime Percentage
QA and Staging environments seldom require the same level of uptime as Production, but tell that to a team of developers working 24/7 on a project that has an aggressive deadline. As a Test Environment Manager, you know that when a QA system is unavailable, you will get immediate calls from developers and managers.

As a Test Environment Manager, you will also want to understand the root cause of every outage. If you follow a problem management process for Production outages, you should follow a similar process with test environment management. Understanding why an outage happened is critical for communicating with a development team. Very often a QA environment will become unavailable due to a factor far outside the control of a Test Environment Manager. If one team pushes bad code that interrupts the QA process for all teams you need to be able to identify this clearly.

How to Measure Availability and Uptime?
Keep track of system availability with a standard monitoring tool such as Zabbix or Nagios. If your systems are visible to the public internet, you can also use hosted platforms like Pingdom to measure system availability.

Example Metric: Goal for Availability
An uptime of 95% is usually sufficient for a QA or Staging environment.
If your development is limited to a few time zones, you can also further qualify this by only measuring availability during development hours. While your Production availability commitment is often higher that 99% or 99.5%, you don’t have to treat every QA outage as an emergency. But, your developers may have other opinions—95% uptime still allows for eight hours of downtime a week. You may want to aim higher.

How Does This Metric Motivate Concrete Action?
When you measure system availability and make these numbers public, you encourage Test Environment Managers to make a commitment to uptime. This results in fewer obstacles for QA and development, allowing them to deliver software faster. There’s nothing more debilitating to an organization than disruptions in QA and testing. Measuring this metric allows you to encourage movement toward always-available QA systems.

Metric: Mean Time Between Outages
If your system has a 95% availability, then almost seventy-five minutes of downtime is acceptable every day. If your system fails for ten minutes every hour during an eight-hour work day due to a build or deployment, you’ll be creating a QA or Staging environment that has a 5% chance of losing developer and QA confidence. To get an accurate picture of system availability you need to couple an availability percentage metric with your mean time between outages (MTBO).

How to Measure MTBO
If you follow a process that keeps track of outages and strives to understand the root causes of these outages, you’ll develop a database of issues that you can use to derive your MTBO. If you have a monitoring system configured to calculate availability percentages automatically, you can use this same system to record your MTBO.

Example Metric: Goal for MTBO
This depends on your availability goal. The lower your availability goal, the higher your MTBO should be. For example, if you have a 95% uptime commitment then your outages need to be spaced over a day or a week. You might have eight hours of downtime each weekend to perform system upgrades or a nightly build and deploy process that takes about an hour, but what you can’t have is an MTBO of 45–60 minutes. This will mean that QA and Staging systems will be unavailable for a few minutes every hour, which will result in dissatisfied customers.

How does this Metric Motivate Concrete Action?
If your MBTO is very short, this suggests that build and deploy activity from a continuous integration environment is frequently interrupting both Development and QA. If your MBTO is very high, but your availability is very low (95% or lower) this means that you are experiencing multi-hour downtime at least once a day. When you measure MBTO, you encourage your Release Engineers and Test Environment Managers to work together to create build and deployment scripts that don’t affect availability, and you encourage your staff to approach QA and Staging uptime with care. Without this metric, you run the risk of having teams grow complacent with frequent, low-level unavailability as long as they satisfy overall availability metrics.

Metric: Downtime Requirement for a Test Environment Build and Deploy
When software is deployed to any system, there is a natural tendency for disruption
. If new code is being deployed to an application server that server often requires a restart so that new code can be loaded. If a web server such as Apache or Nginx is being reconfigured this often requires a fast restart measured in seconds.

Some of these build and deploy related disruptions can be avoided through the use of load balancers and clusters of machines. On the largest projects, this is essential in both Production as well as Staging and QA systems. An example is a QA system for a large bank’s transaction processing system. There are so many teams that depend on this system to be up and running 24/7 that causing any disruption would run the risk of freezing the QA process across the entire company.

Other build and deploy downtimes are unavoidable. A frequent example is changes to a database schema. Certain changes to tables and indexes require systems to be stopped and rebooted to reach a state where database activity isn’t competing with DDL statements.

The downtime requirement for a given build and deploy to a test environment is a central measure that is directly related to the availability metrics mentioned before in this section.

How to Measure Build/Deploy Downtime
It’s simple: run a build and deployment and keep track of the downtime that falls into the timespan of each build and deploy function. If you have a continuous integration system such as Jenkins or Bamboo, grab the timestamps of the last few builds and look at your monitoring metrics on QA and Staging to see if there is a system impact.

Example Metric: Goal for Build/Deploy Downtime
Your goal for this metric depends on your level of availability
. If you are working on a shared service, your build and deploy downtime requirement should be as close to zero as possible. If you are working on a less critical application, then your build and deploy downtime should be measured in minutes or seconds.

How does this Metric Motivate Concrete Action?
This metric encourages your Release Engineers and Test Environment Managers to drive build and deploy downtime to zero. With the tools available to developers and DevOps professionals it is possible to achieve zero-downtime deployments to QA and Staging systems. Doing this will give your internal customers more confidence in the systems you are delivering.

The post Metrics and KPIs for Test Environment Stability appeared first on Plutora.

Read the original blog entry...

More Stories By Plutora Blog

Plutora provides Enterprise Release and Test Environment Management SaaS solutions aligning process, technology, and information to solve release orchestration challenges for the enterprise.

Plutora’s SaaS solution enables organizations to model release management and test environment management activities as a bridge between agile project teams and an enterprise’s ITSM initiatives. Using Plutora, you can orchestrate parallel releases from several independent DevOps groups all while giving your executives as well as change management specialists insight into overall risk.

Supporting the largest releases for the largest organizations throughout North America, EMEA, and Asia Pacific, Plutora provides proof that large companies can adopt DevOps while managing the risks that come with wider adoption of self-service and agile software development in the enterprise. Aligning process, technology, and information to solve increasingly complex release orchestration challenges, this Gartner “Cool Vendor in IT DevOps” upgrades the enterprise release management from spreadsheets, meetings, and email to an integrated dashboard giving release managers insight and control over large software releases.

@ThingsExpo Stories
"Cloud Academy is an enterprise training platform for the cloud, specifically public clouds. We offer guided learning experiences on AWS, Azure, Google Cloud and all the surrounding methodologies and technologies that you need to know and your teams need to know in order to leverage the full benefits of the cloud," explained Alex Brower, VP of Marketing at Cloud Academy, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clar...
In his session at 21st Cloud Expo, Carl J. Levine, Senior Technical Evangelist for NS1, will objectively discuss how DNS is used to solve Digital Transformation challenges in large SaaS applications, CDNs, AdTech platforms, and other demanding use cases. Carl J. Levine is the Senior Technical Evangelist for NS1. A veteran of the Internet Infrastructure space, he has over a decade of experience with startups, networking protocols and Internet infrastructure, combined with the unique ability to it...
"IBM is really all in on blockchain. We take a look at sort of the history of blockchain ledger technologies. It started out with bitcoin, Ethereum, and IBM evaluated these particular blockchain technologies and found they were anonymous and permissionless and that many companies were looking for permissioned blockchain," stated René Bostic, Technical VP of the IBM Cloud Unit in North America, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventi...
Gemini is Yahoo’s native and search advertising platform. To ensure the quality of a complex distributed system that spans multiple products and components and across various desktop websites and mobile app and web experiences – both Yahoo owned and operated and third-party syndication (supply), with complex interaction with more than a billion users and numerous advertisers globally (demand) – it becomes imperative to automate a set of end-to-end tests 24x7 to detect bugs and regression. In th...
Widespread fragmentation is stalling the growth of the IIoT and making it difficult for partners to work together. The number of software platforms, apps, hardware and connectivity standards is creating paralysis among businesses that are afraid of being locked into a solution. EdgeX Foundry is unifying the community around a common IoT edge framework and an ecosystem of interoperable components.
"MobiDev is a software development company and we do complex, custom software development for everybody from entrepreneurs to large enterprises," explained Alan Winters, U.S. Head of Business Development at MobiDev, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Large industrial manufacturing organizations are adopting the agile principles of cloud software companies. The industrial manufacturing development process has not scaled over time. Now that design CAD teams are geographically distributed, centralizing their work is key. With large multi-gigabyte projects, outdated tools have stifled industrial team agility, time-to-market milestones, and impacted P&L stakeholders.
"Space Monkey by Vivent Smart Home is a product that is a distributed cloud-based edge storage network. Vivent Smart Home, our parent company, is a smart home provider that places a lot of hard drives across homes in North America," explained JT Olds, Director of Engineering, and Brandon Crowfeather, Product Manager, at Vivint Smart Home, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"Akvelon is a software development company and we also provide consultancy services to folks who are looking to scale or accelerate their engineering roadmaps," explained Jeremiah Mothersell, Marketing Manager at Akvelon, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Coca-Cola’s Google powered digital signage system lays the groundwork for a more valuable connection between Coke and its customers. Digital signs pair software with high-resolution displays so that a message can be changed instantly based on what the operator wants to communicate or sell. In their Day 3 Keynote at 21st Cloud Expo, Greg Chambers, Global Group Director, Digital Innovation, Coca-Cola, and Vidya Nagarajan, a Senior Product Manager at Google, discussed how from store operations and ...
"There's plenty of bandwidth out there but it's never in the right place. So what Cedexis does is uses data to work out the best pathways to get data from the origin to the person who wants to get it," explained Simon Jones, Evangelist and Head of Marketing at Cedexis, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
It is of utmost importance for the future success of WebRTC to ensure that interoperability is operational between web browsers and any WebRTC-compliant client. To be guaranteed as operational and effective, interoperability must be tested extensively by establishing WebRTC data and media connections between different web browsers running on different devices and operating systems. In his session at WebRTC Summit at @ThingsExpo, Dr. Alex Gouaillard, CEO and Founder of CoSMo Software, presented ...
WebRTC is great technology to build your own communication tools. It will be even more exciting experience it with advanced devices, such as a 360 Camera, 360 microphone, and a depth sensor camera. In his session at @ThingsExpo, Masashi Ganeko, a manager at INFOCOM Corporation, introduced two experimental projects from his team and what they learned from them. "Shotoku Tamago" uses the robot audition software HARK to track speakers in 360 video of a remote party. "Virtual Teleport" uses a multip...
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, whic...
SYS-CON Events announced today that Evatronix will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Evatronix SA offers comprehensive solutions in the design and implementation of electronic systems, in CAD / CAM deployment, and also is a designer and manufacturer of advanced 3D scanners for professional applications.
Leading companies, from the Global Fortune 500 to the smallest companies, are adopting hybrid cloud as the path to business advantage. Hybrid cloud depends on cloud services and on-premises infrastructure working in unison. Successful implementations require new levels of data mobility, enabled by an automated and seamless flow across on-premises and cloud resources. In his general session at 21st Cloud Expo, Greg Tevis, an IBM Storage Software Technical Strategist and Customer Solution Architec...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. In his session at @BigDataExpo, Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, reviewed best practices to ...
An increasing number of companies are creating products that combine data with analytical capabilities. Running interactive queries on Big Data requires complex architectures to store and query data effectively, typically involving data streams, an choosing efficient file format/database and multiple independent systems that are tied together through custom-engineered pipelines. In his session at @BigDataExpo at @ThingsExpo, Tomer Levi, a senior software engineer at Intel’s Advanced Analytics gr...