Cloud Sherpas: Your expert guide to Google's cloud


Lessons Learned from Healthcare.gov IT Service Problems

Disclaimer: This post does not endorse or criticize healthcare reform in any way. It is not meant to be political in nature. Rather, it is solely a high-level analysis of the IT issues that affected the release of the healthcare exchange website.

On October 1, healthcare.gov, the healthcare exchange website authorized under the United States' new healthcare reforms, launched to the US population. The site has since received a reported 20 million hits, though it has not performed as expected due to a series of IT issues.

Although this poor performance was first attributed to high levels of traffic, the discussion has now moved to software issues that may be affecting the website's ability to operate properly. To IT service experts, the issues - and possible solutions - are fairly evident. Furthermore, the site's IT service problems reveal important lessons for anyone launching new IT services or functionality.

We'll use ITIL (the Information Technology Infrastructure Library) as a common reference point. Specifically, we'll focus on four ITIL areas and how, when applied completely and systematically, they could have helped avoid the issues, accelerate remediation and provide better production controls going forward.

1) Service Design
The service design portion of ITIL offers best practices on designing IT services and processes that focus not only on the technology, but also on the services delivered by this technology. Specifically, service design focuses on how service solutions interact with the technical environment, site architecture and supply chain. The areas of service design where healthcare.gov experienced issues include:

  • Service-level management, which works to make sure that the service-level agreements (SLAs) are met

  • Availability management, which focuses on the ability of the site to perform at the agreed service levels

  • Capacity management, which deals with strategic capacity issues such as system and component capacity

  • IT service continuity management, which reviews the planning and management of IT services to ensure that they can recover and continue operating after serious incidents

Where did healthcare.gov go wrong? Based on the widespread issues, it is clear that healthcare.gov has been unable to meet its SLAs. The Department of Health and Human Services initially attributed the issues to high levels of traffic. If traffic did indeed play a role, then incomplete capacity management is also a major contributor to the problem. Finally, the fact that these issues are ongoing reveals that the site's IT service continuity management also needs to be improved.
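To make the service-level and capacity ideas above concrete, here is a minimal sketch of how a team might check one reporting period against an availability target and a tested capacity figure. Everything in it is an assumption for illustration: the `ServiceWindow` type, the function names, the 99.9 percent target and the sample numbers are hypothetical and are not drawn from healthcare.gov or from ITIL itself.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """One reporting period for a service (all values are illustrative)."""
    total_minutes: int          # length of the reporting period
    downtime_minutes: int       # minutes the service was unavailable
    peak_concurrent_users: int  # highest load observed in the period
    tested_capacity_users: int  # load the system was verified to handle


def availability_pct(w: ServiceWindow) -> float:
    """Availability as a percentage of the reporting period."""
    return 100.0 * (w.total_minutes - w.downtime_minutes) / w.total_minutes


def meets_sla(w: ServiceWindow, target_pct: float) -> bool:
    """True when measured availability reaches the agreed target."""
    return availability_pct(w) >= target_pct


def capacity_headroom(w: ServiceWindow) -> float:
    """Fraction of tested capacity unused at peak (negative means overloaded)."""
    return 1.0 - w.peak_concurrent_users / w.tested_capacity_users


# Hypothetical figures chosen only to exercise the functions.
window = ServiceWindow(
    total_minutes=10_000,
    downtime_minutes=500,
    peak_concurrent_users=1_500,
    tested_capacity_users=1_000,
)

print(f"availability: {availability_pct(window):.1f}%")
print(f"meets 99.9% target: {meets_sla(window, 99.9)}")
print(f"capacity headroom: {capacity_headroom(window):+.0%}")
```

A negative headroom together with a missed availability target is the kind of signal that would point to capacity management, rather than traffic alone, as part of the problem.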

2) Service Transition
Service transition deals with how new or changed services are moved into live use on the website. Healthcare.gov experienced service transition issues around change management and release and deployment management. Change management works to ensure that standard methods are in place to handle system changes so that any changes that do occur cause minimal disruption to services. Release and deployment management focuses on quality control during the development and implementation of the website to ensure that it can meet the demands expected of it.

Where did healthcare.gov go wrong? The fact that IT issues occurred at launch at all, and that they have continued for weeks afterward, suggests that complete change management and release and deployment management practices are not in place. Further, if these processes had been operating completely and in an end-to-end manner, it is very likely that the underlying issues would have surfaced and been corrected in a more controlled way before the site's nationwide launch.

3) Service Operation
Service operation works to deliver the agreed level of service to the website hosts and its end users. This portion of ITIL covers the actual delivery of services and monitors any problems with service reliability. Specifically, the healthcare exchange experienced issues with application management, which provides best practices for enhancing the quality of IT software development and support throughout the project lifecycle. Additional problems centered on problem management and root cause analysis. Problem management works to correct the root causes of incidents to minimize future occurrences, while root cause analysis is the formal problem identification and solving process that takes place during problem management.

Where did healthcare.gov go wrong? As far as service operation is concerned, improved problem management and root cause analysis would enable the IT team to pinpoint the root causes of the site's issues and implement solutions to prevent their recurrence. Using formal service operation processes and relying on service management systems would make it easier to correlate incidents with their causes quickly and would speed up problem identification, isolation and correction.

4) Continual Service Improvement
As the name implies, continual service improvement (CSI) is responsible for sustaining and improving the level of service delivered by the website. CSI ensures that IT services stay in line with current and changing needs and identifies possible areas for improvement.

Where will healthcare.gov need to focus? Rigorous identification and management of issues as they occur in real time is needed as a reactive measure. It will be just as important that the ongoing service management system include a framework for continually aligning service performance expectations and agreements with actual service performance. Both reactive and leading indicators of the system's performance should be established and monitored wherever possible to ensure that the system is performing at the level expected in production.
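The difference between a reactive and a leading indicator can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a monitoring design: the function names, the thresholds and the sample measurements are all hypothetical, and the leading indicator uses a naive straight-line projection where a real system would likely use something more robust.

```python
from statistics import mean


def reactive_alert(error_rates, max_error_rate):
    """Reactive: the most recent error rate already exceeds the agreed limit."""
    return error_rates[-1] > max_error_rate


def leading_alert(response_times_ms, limit_ms, lookahead):
    """Leading: project the average change per sample forward by `lookahead`
    samples and warn if the projection would cross the agreed limit."""
    deltas = [b - a for a, b in zip(response_times_ms, response_times_ms[1:])]
    if not deltas:
        return False  # not enough history to estimate a trend
    projected = response_times_ms[-1] + mean(deltas) * lookahead
    return projected > limit_ms


# Hypothetical measurements: errors are still within limits,
# but response times are climbing steadily.
error_rates = [0.01, 0.02, 0.015]
response_times_ms = [200, 250, 300, 350]

print("reactive alert:", reactive_alert(error_rates, max_error_rate=0.05))
print("leading alert:", leading_alert(response_times_ms, limit_ms=500, lookahead=4))
```

In this made-up case the reactive check stays quiet while the leading check fires, which is the point of tracking both: the trend gives the team time to act before an agreed service level is actually breached.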

The post Lessons Learned From Healthcare.gov IT Service Problems appeared first on Cloud Sherpas.

More Stories By Cloud Sherpas

Cloud Sherpas [www.cloudsherpas.com] is a leading Google Apps Reseller, systems integrator and application developer. Our Google Apps Certified Deployment Specialists have migrated tens of thousands of users from legacy, on-premise messaging systems to Google Apps and Google App Engine. We help organizations adopt cloud computing to innovate and dramatically reduce their IT expenses. SherpaTools for Google Apps [www.sherpatools.com] is a free app from Cloud Sherpas that enhances the functionality and ease-of-use of Google Apps for both administrators and end-users.