
Metrics and KPIs for Test Environment Stability | @DevOpsSummit #DevOps #APM #Monitoring

How often is an environment unavailable due to factors within your project’s control? How often is an environment unavailable due to external factors? Are the software and hardware in an environment up to date with the target Production systems? How often do you have to resort to manual workarounds due to an environment?

Metric: Availability and Uptime Percentage
QA and Staging environments seldom require the same level of uptime as Production, but tell that to a team of developers working 24/7 on a project that has an aggressive deadline. As a Test Environment Manager, you know that when a QA system is unavailable, you will get immediate calls from developers and managers.

As a Test Environment Manager, you will also want to understand the root cause of every outage. If you follow a problem management process for Production outages, you should follow a similar process for test environment management. Understanding why an outage happened is critical for communicating with a development team. Very often a QA environment will become unavailable due to a factor far outside the control of a Test Environment Manager. If one team pushes bad code that interrupts the QA process for all teams, you need to be able to identify this clearly.

How to Measure Availability and Uptime
Keep track of system availability with a standard monitoring tool such as Zabbix or Nagios. If your systems are visible to the public internet, you can also use hosted platforms like Pingdom to measure system availability.
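To illustrate what these tools are doing under the hood, here is a minimal availability probe in Python. The endpoint URL and check interval are hypothetical placeholders rather than anything specific to your environment; in practice you would rely on Zabbix, Nagios, or Pingdom rather than a hand-rolled script.

```python
# Minimal availability probe sketch. The health-check URL below is a
# hypothetical placeholder; a real setup would use a monitoring tool.
import time
import urllib.request

CHECK_URL = "https://qa.example.internal/health"  # hypothetical QA endpoint
INTERVAL_SECONDS = 60

checks = 0
failures = 0

while True:
    checks += 1
    try:
        with urllib.request.urlopen(CHECK_URL, timeout=10) as response:
            up = 200 <= response.status < 300
    except Exception:
        up = False
    if not up:
        failures += 1
    availability = 100.0 * (checks - failures) / checks
    print(f"checks={checks} failures={failures} availability={availability:.2f}%")
    time.sleep(INTERVAL_SECONDS)
```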

Example Metric: Goal for Availability
An uptime of 95% is usually sufficient for a QA or Staging environment.
If your development is limited to a few time zones, you can further qualify this by measuring availability only during development hours. While your Production availability commitment is often 99% or 99.5% or higher, you don't have to treat every QA outage as an emergency. But your developers may have other opinions: 95% uptime still allows for more than eight hours of downtime a week, so you may want to aim higher.
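To make the arithmetic concrete, here is a small back-of-the-envelope calculation of the downtime budget implied by a few common availability targets; the targets shown are illustrative.

```python
# Downtime budgets implied by a few availability targets.
HOURS_PER_WEEK = 24 * 7
MINUTES_PER_DAY = 24 * 60

for target in (0.95, 0.99, 0.995):
    weekly_hours = (1 - target) * HOURS_PER_WEEK
    daily_minutes = (1 - target) * MINUTES_PER_DAY
    print(f"{target:.1%} uptime -> {weekly_hours:.1f} h/week, "
          f"{daily_minutes:.0f} min/day of allowed downtime")

# 95.0% uptime -> 8.4 h/week, 72 min/day of allowed downtime
# 99.0% uptime -> 1.7 h/week, 14 min/day of allowed downtime
# 99.5% uptime -> 0.8 h/week, 7 min/day of allowed downtime
```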

How Does This Metric Motivate Concrete Action?
When you measure system availability and make these numbers public, you encourage Test Environment Managers to make a commitment to uptime. This results in fewer obstacles for QA and development, allowing them to deliver software faster. There’s nothing more debilitating to an organization than disruptions in QA and testing. Measuring this metric allows you to encourage movement toward always-available QA systems.

Metric: Mean Time Between Outages
If your system has 95% availability, then about seventy-two minutes of downtime is acceptable every day. If your system fails for ten minutes every hour during an eight-hour work day due to a build or deployment, you may still be close to that 95% target, but you'll be creating a QA or Staging environment that quickly loses developer and QA confidence. To get an accurate picture of system availability, you need to couple an availability percentage metric with your mean time between outages (MTBO).

How to Measure MTBO
If you follow a process that keeps track of outages and strives to understand the root causes of these outages, you’ll develop a database of issues that you can use to derive your MTBO. If you have a monitoring system configured to calculate availability percentages automatically, you can use this same system to record your MTBO.
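As a rough illustration, here is a minimal Python sketch that derives MTBO from a list of outage windows. The outage records are hypothetical and would normally be exported from your monitoring or problem-management tool.

```python
# Sketch of deriving MTBO from an outage log. The (start, end) records
# below are illustrative only; real data would come from monitoring or
# problem-management records.
from datetime import datetime

outages = [
    (datetime(2017, 5, 1, 9, 0),  datetime(2017, 5, 1, 9, 20)),
    (datetime(2017, 5, 2, 14, 0), datetime(2017, 5, 2, 14, 45)),
    (datetime(2017, 5, 4, 8, 30), datetime(2017, 5, 4, 9, 0)),
]

# MTBO: average time from the end of one outage to the start of the next.
gaps = [
    (outages[i + 1][0] - outages[i][1]).total_seconds()
    for i in range(len(outages) - 1)
]
mtbo_hours = (sum(gaps) / len(gaps)) / 3600
print(f"MTBO over {len(outages)} outages: {mtbo_hours:.1f} hours")
```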

Example Metric: Goal for MTBO
This depends on your availability goal. The lower your availability goal, the higher your MTBO should be. For example, if you have a 95% uptime commitment, then your outages need to be spaced out across the day or the week. You might have eight hours of downtime each weekend to perform system upgrades, or a nightly build and deploy process that takes about an hour, but what you can't have is an MTBO of 45–60 minutes. That would mean QA and Staging systems are unavailable for a few minutes every hour, which will result in dissatisfied internal customers.

How Does This Metric Motivate Concrete Action?
If your MTBO is very short, this suggests that build and deploy activity from a continuous integration environment is frequently interrupting both Development and QA. If your MTBO is very high but your availability is still low (95% or lower), this means that you are experiencing multi-hour downtime at least once a day. When you measure MTBO, you encourage your Release Engineers and Test Environment Managers to work together to create build and deployment scripts that don't affect availability, and you encourage your staff to approach QA and Staging uptime with care. Without this metric, you run the risk of teams growing complacent with frequent, low-level unavailability as long as they satisfy overall availability metrics.

Metric: Downtime Requirement for a Test Environment Build and Deploy
When software is deployed to any system, there is a natural tendency for disruption. If new code is being deployed to an application server, that server often requires a restart so that the new code can be loaded. If a web server such as Apache or Nginx is being reconfigured, this often requires a fast restart measured in seconds.

Some of these build and deploy related disruptions can be avoided through the use of load balancers and clusters of machines. On the largest projects, this is essential in Production as well as in Staging and QA systems. An example is a QA system for a large bank's transaction processing system: so many teams depend on this system being up and running 24/7 that any disruption would risk freezing the QA process across the entire company.
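As a rough sketch of the idea, the following Python outline shows a rolling deploy across a small QA cluster, taking one node out of the load balancer at a time so that the environment as a whole stays available. Every hostname and helper function here is a hypothetical placeholder for whatever your load balancer and deployment tooling actually provide.

```python
# Rolling-deploy sketch: one node at a time, so the rest of the cluster
# keeps serving QA traffic. All names and actions are placeholders.
import time

NODES = ["qa-app-01", "qa-app-02", "qa-app-03"]  # hypothetical QA cluster

def drain(node):
    # Placeholder: tell the load balancer to stop routing traffic to this node.
    print(f"draining {node} from the load balancer")

def deploy(node):
    # Placeholder: push the new build and restart the application server.
    print(f"deploying new build to {node} and restarting it")

def healthy(node):
    # Placeholder: poll the node's health-check endpoint.
    print(f"health-checking {node}")
    return True

def enable(node):
    # Placeholder: return the node to the load balancer pool.
    print(f"re-enabling {node} in the load balancer")

for node in NODES:
    drain(node)
    deploy(node)
    while not healthy(node):
        time.sleep(5)
    enable(node)
```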

Other build and deploy downtimes are unavoidable. A frequent example is changes to a database schema. Certain changes to tables and indexes require systems to be stopped and rebooted to reach a state where database activity isn’t competing with DDL statements.

The downtime requirement for a given build and deploy to a test environment is a central measure that is directly related to the availability metrics discussed earlier in this section.

How to Measure Build/Deploy Downtime
It's simple: run a build and deployment, and keep track of any downtime that falls within the window of each build and deploy step. If you have a continuous integration system such as Jenkins or Bamboo, grab the timestamps of the last few builds and look at your monitoring metrics on QA and Staging to see if there is a system impact.
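One way to automate that comparison is sketched below. It assumes you have already exported build windows from your CI server and outage windows from your monitoring tool; all of the sample timestamps are illustrative.

```python
# Sketch: measure how much monitored downtime overlaps each build window.
# The build and outage data below are hypothetical exports from CI and
# monitoring tools.
from datetime import datetime, timedelta

builds = [  # hypothetical (start, end) windows of recent QA deployments
    (datetime(2017, 5, 3, 12, 0), datetime(2017, 5, 3, 12, 12)),
    (datetime(2017, 5, 3, 18, 0), datetime(2017, 5, 3, 18, 9)),
]
outages = [  # hypothetical downtime windows from monitoring
    (datetime(2017, 5, 3, 12, 2), datetime(2017, 5, 3, 12, 8)),
    (datetime(2017, 5, 3, 22, 0), datetime(2017, 5, 3, 22, 30)),
]

def overlap(a, b):
    """Return the overlapping duration of two (start, end) windows."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(end - start, timedelta(0))

for build in builds:
    downtime = sum((overlap(build, o) for o in outages), timedelta(0))
    print(f"build {build[0]:%Y-%m-%d %H:%M} caused {downtime} of measured downtime")
```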

Example Metric: Goal for Build/Deploy Downtime
Your goal for this metric depends on your level of availability. If you are working on a shared service, your build and deploy downtime requirement should be as close to zero as possible. If you are working on a less critical application, then your build and deploy downtime can be measured in minutes or seconds.

How Does This Metric Motivate Concrete Action?
This metric encourages your Release Engineers and Test Environment Managers to drive build and deploy downtime to zero. With the tools available to developers and DevOps professionals, it is possible to achieve zero-downtime deployments to QA and Staging systems. Doing this will give your internal customers more confidence in the systems you are delivering.

The post Metrics and KPIs for Test Environment Stability appeared first on Plutora.


More Stories By Plutora Blog

Plutora provides Enterprise Release and Test Environment Management SaaS solutions aligning process, technology, and information to solve release orchestration challenges for the enterprise.

Plutora’s SaaS solution enables organizations to model release management and test environment management activities as a bridge between agile project teams and an enterprise’s ITSM initiatives. Using Plutora, you can orchestrate parallel releases from several independent DevOps groups all while giving your executives as well as change management specialists insight into overall risk.

Supporting the largest releases for the largest organizations throughout North America, EMEA, and Asia Pacific, Plutora provides proof that large companies can adopt DevOps while managing the risks that come with wider adoption of self-service and agile software development in the enterprise. Aligning process, technology, and information to solve increasingly complex release orchestration challenges, this Gartner "Cool Vendor in IT DevOps" upgrades enterprise release management from spreadsheets, meetings, and email to an integrated dashboard, giving release managers insight and control over large software releases.
