Center for Government Interoperability

Blueprint For Better Government

http://gov-ideas.com/

Risk Management

Discussion Ideas for Mitigating Risk During Software and Hardware Updates

Introduction

There are many opportunities to minimize risk in software implementation and hardware installation. The role of risk mitigation becomes more important the larger the project is, or the more impact it has. This paper explores methods to minimize risk and recommends establishing a Risk Management Policy.

Scope

Risk herein pertains to risk during software migration to production or hardware installation. Related methodologies are PMBOK (which defines traditional risk management) and software development life cycle (SDLC) methodologies such as Agile, Waterfall, and Spiral. This paper addresses specialized migration planning concerns beyond PMBOK and SDLC and does not supersede them.

Stakeholder list and policy impact

Stakeholders: IT, PMO, business clients, change control board, ITIL

Impact to current policy:

Change Control Board
Project Management planning guide
SDLC

Methodology

The general methodology involves configuring project time schedules to separate out project components in a way that optimizes vetting in a stepwise manner.

Example: A component is being added to the mainframe that consists of a new table and new programming code. Instead of moving them to production at the same time for client use after testing in the test environment, move the table to production first and test it alone for a day or less without the programming code (test with ad hoc queries and joins that only programmers can see). This may unearth a host of potential problems such as permissions authorization, size, data-type constraints, compatibility with the OS, etc., but will not interfere with clients using the rest of the system. All of the above problems could have appeared on production-implementation day, throwing the implementation schedule off and causing clients to wait in standby mode during their scheduled testing time. With this methodology, these problems are already out of the way before implementation day.
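
The table-first check in the example above could be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for production, and the table names (license, license_status), the columns, and the helper smoke_test_table are hypothetical, not part of the methodology itself.

```python
import sqlite3


def smoke_test_table(conn, table, expected_columns, join_sql):
    """Run programmer-only checks against a newly migrated table.

    Returns a list of problem descriptions; an empty list means the
    checks passed. All names here are hypothetical.
    """
    problems = []

    # 1. Does the table exist, and do its columns have the expected types?
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    actual = {row[1]: row[2].upper() for row in rows}  # name -> declared type
    if not actual:
        problems.append(f"table {table} is missing or not visible")
        return problems
    for name, col_type in expected_columns.items():
        if actual.get(name) != col_type.upper():
            problems.append(
                f"column {name}: expected {col_type}, found {actual.get(name)}"
            )

    # 2. Can an ad hoc join against existing production data run at all?
    try:
        conn.execute(join_sql).fetchall()
    except sqlite3.Error as exc:
        problems.append(f"ad hoc join failed: {exc}")

    return problems


# Stand-in for production: an existing table plus the newly migrated one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE license (id INTEGER PRIMARY KEY, status_code TEXT)")
conn.execute("CREATE TABLE license_status (code TEXT PRIMARY KEY, label VARCHAR(40))")

problems = smoke_test_table(
    conn,
    "license_status",
    {"code": "TEXT", "label": "VARCHAR(40)"},
    "SELECT l.id, s.label FROM license l "
    "JOIN license_status s ON s.code = l.status_code",
)
print(problems or "new table passed its standalone checks")
```

Because nothing but the table has been migrated, any problem reported here is a data or environment problem by construction, and clients using the rest of the system are unaffected.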

The idea is to break a project down into as many components as have vetting value and to test them early during non-critical times, including their migrations to production, to reduce risk as much as possible. Analysis is required to:

Determine how a component can be broken down into smaller components to reveal risk reduction opportunities. Opportunities include designing a component so that it can be tested independently, then collectively, during non-critical times.
Identify stakeholders such as downstream programmers to see if implementation design can be optimized for them.
Plan in what order component testing and implementation should be scheduled. Determine if early testing will reveal opportunities for the rest of the implementation plans. Look for ways to bring clients into the testing process as early as possible.
Determine how far into the production environment the component can be integrated and tested without incurring risk. The further the integration into production, the better.
Design implementation so that each change can be backed out to the original system for all project teams at any stage.
Design testing so that downstream programmers receive daily, real and full production data during early project stages.
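
One way to approach the ordering question in the list above is to record which components depend on which and derive a staged migration order from that. The sketch below uses Python's standard graphlib module; the component names and their dependencies are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical components: each key lists the components that must already
# be in production (and vetted) before it can be migrated.
depends_on = {
    "status_table": set(),
    "license_table": {"status_table"},
    "batch_job": {"license_table"},
    "client_screens": {"license_table", "status_table"},
}

sorter = TopologicalSorter(depends_on)
sorter.prepare()

# Each pass yields the components whose prerequisites are done; these can be
# migrated and tested on their own during a non-critical window.
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

Components that land in the same stage do not depend on each other, so each of them can still be migrated and vetted separately rather than together.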

What is new in this methodology:

Treatment of the production environment as an extension of the testing environment.
Identification of concurrent migration of groups of components to production as a risk.
Programmer teams have a policy mechanism for collaborating with other programmer teams.
A formal risk management document is added to programmer teams' SDLC.
Project Management Office has a formal risk management document added to its operating procedures.
Migration and installation planning begins during project conception so that stakeholders can become involved as early as possible.

Treating the production environment as an extension of the testing environment is important because production may have patches, data, security settings, and other characteristics that the test environment lacks. These differences can cause migrated components to fail in production even though they worked correctly in the test environment.

Early stakeholder identification is important. For example, suppose Team A plans a major change that affects Team B. If Team B is brought in early, at the conception stage, they may be able to recommend a project design that avoids funneling all change implementation into one instant switchover from the old system to the new one. Otherwise, Team B loses control of its schedule and is compelled to migrate everything into production at once, with a very small tolerance window for errors. The goal is to avoid introducing unnecessary errors and staff stress by intelligently configuring implementation timelines, which produces a much wider tolerance window. As another example, if a field length is increased for Team A, Team B may be able to increase its own field length early and test fully in the test environment, then in the production environment, because testing the additional field length does not require actual Team A data to be in production yet. This can take substantial pressure off Team B, because good design planning early on has stretched out the time they have to implement.

Below is an example of identifying the Legal Department as a stakeholder and bringing them into implementation planning early.

Example 3: Assume that two regulatory agencies, Board A and Board B, will be merged into one agency. Here is a broad outline of possible risk management steps (besides standard PMBOK):

Harmonize data
Centralize and consolidate data
Switch-enable programming code

Harmonization could include:

Legal Department reviewing laws, terms, and definitions that the agencies share that might have conflicting meanings needing resolution. E.g., (a) The same phrase might have a different meaning for each agency or (b) each agency could have different legal terms for the same legal meaning. These must be worked out before any coding begins.
Status code harmonization to make sure that status codes of each agency represent the same thing.
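
The two kinds of mismatch described above (the same code with different meanings, and the same meaning under different codes) can be detected mechanically before any coding begins. The following is a minimal sketch; the status codes and meanings shown for Board A and Board B are made up for illustration.

```python
# Hypothetical status codes for the two boards; none come from a real system.
board_a = {"A": "active", "S": "suspended", "X": "expired"}
board_b = {"A": "applied", "S": "suspended", "E": "expired"}


def find_conflicts(codes_a, codes_b):
    """Return codes both agencies use but with different meanings."""
    return {
        code: (codes_a[code], codes_b[code])
        for code in codes_a.keys() & codes_b.keys()
        if codes_a[code] != codes_b[code]
    }


def find_synonyms(codes_a, codes_b):
    """Return meanings that the two agencies store under different codes."""
    by_meaning_b = {meaning: code for code, meaning in codes_b.items()}
    return {
        meaning: (code, by_meaning_b[meaning])
        for code, meaning in codes_a.items()
        if meaning in by_meaning_b and by_meaning_b[meaning] != code
    }


conflicts = find_conflicts(board_a, board_b)
synonyms = find_synonyms(board_a, board_b)
print("same code, different meaning:", conflicts)
print("same meaning, different code:", synonyms)
```

A report like this only surfaces candidates; deciding which meaning prevails remains a business and legal question.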

Centralizing data could include: Replacing both agencies' tables with a single centralized table before programming code is changed so that problems are isolated to data, identified early, and not combined with other programming problems on implementation day.

Switch-enabling programming code could include: Writing programming code in a way that allows concurrent implementation of both new and old systems in production. The choice of which one runs, or if both run simultaneously, can be determined by a conditional, programmatic switch based on a "yes", "no" or "both" value in a data field. If, for example, there are many users, this provides a good back-out method to revert to the old system with minimal down time.
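
A programmatic switch of this kind might look like the sketch below. The function names are placeholders, and the interpretation of the field values is an assumption: "no" is taken to mean the old system only, "yes" the new system only, and "both" the two side by side.

```python
def old_system(record):
    # Placeholder for the existing production logic.
    return f"old:{record}"


def new_system(record):
    # Placeholder for the newly migrated logic.
    return f"new:{record}"


def process(record, use_new_system):
    """Dispatch on the switch value read from a data field.

    Assumed meaning: "no" runs only the old system, "yes" runs only the
    new one, and "both" runs them side by side.
    """
    if use_new_system == "no":
        return [old_system(record)]
    if use_new_system == "yes":
        return [new_system(record)]
    if use_new_system == "both":
        return [old_system(record), new_system(record)]
    raise ValueError(f"unknown switch value: {use_new_system!r}")


# In practice the value would be read from a data field; here it is
# hard-coded for illustration.
print(process("renewal-1", "both"))
```

Backing out then amounts to setting the field back to "no", with no code redeployment. Rejecting unknown values outright avoids silently falling through to either system.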

In the above example, component vetting in production has been spread out to non-critical times. Had everything been migrated into production at one time, many problems occurring at the same time would make implementation much more complicated.

The above is only a general description. The data centralization step might be omitted so that the entire implementation would involve concurrent systems. Case by case analysis is required to determine variables such as: "Is redundant data entry required to implement two concurrent systems?"

Risk Management Policy

To bring about implementation-design collaboration of different teams, it is recommended that a Risk Management Policy be established.
The policy would be a simple memorandum requiring that project teams meet with appropriate stakeholders, including other teams, to plan risk management using the checklist in Appendix A as a guide. Policy compliance would be established by:

Adding the methodology to programmers' SDLC
Adding the methodology to Project Management procedures
Adding the methodology to Change Control procedures

Implementation plan

The risk management policy and methodology themselves should be implemented in a stepwise manner with prototype testing before mandatory use.

Performance Measurement and Process Improvement

Each risk management effort would be measured by a follow-up survey of stakeholders affected by the methodology (Appendix B). Usage history and recommendations for improving the methodology or policy would be recorded, and the recommendations would be logged on the risk management forum.

Responsible parties

PMO: Improve risk management methodology and policy.
PMO: Prototype testing and moderating risk management forum discussions.
CIO: Resolves policy disagreements.
Project managers: Risk management policy compliance.
Quality Management Team: Conducts follow-up surveys (Appendix B).

Future Expansion And Enterprise Architecture Considerations

Analysis after implementation would indicate whether there is value in further expanding the risk management policy. The policy is currently written with the narrowest scope, but the idea can be applied across many business processes. The options are:

1. The project scope can remain narrow, limited to software and hardware installation.
2. Risk management policy can be expanded to a broader policy overlapping PMBOK and SDLC areas.
3. The policy can be expanded to all areas because risk management methods can benefit many other areas and disciplines.
4. A Risk Management Team can be established to enhance all business units. The team can be added to the enterprise architecture process. In other words, the Risk Management Team would have an enterprise role in the same way that a data architect has.

Conclusion

The goal of the methodology is to implement change so that it is no longer an "all or nothing", panic-filled project requiring hundreds of components and variables to work perfectly at implementation time. The premise is that there are many hidden opportunities to reduce risk that are revealed when this methodology is used.

Except for early stakeholder notification, these recommendations should be tailored to the individual project without rigid compliance when common sense dictates that low return on investment would result.

Designing for implementation will reduce downtime, increase client satisfaction, and simplify troubleshooting on complex hardware installations and software updates. Much of this can be achieved by simply rearranging the order in which components are introduced into production and tested.

As an additional benefit, over time, each team could establish a body of best practices, some of which would result from use of these guidelines. Much of this methodology can apply to any business undertaking and should be freely borrowed.

Appendix A

Risk Management Checklist

Project name
Project manager
Stakeholder list and date they were notified of project conception
Stakeholders were surveyed regarding their risk priorities
Project has been analyzed for risk mitigation opportunities
Project has been designed to give stakeholders daily, real and complete production data for testing purposes in the test environment as early as possible
Project has been designed to give all programming teams opportunity to flexibly back out all the way to the old system at any stage of the project
Analysis has been made to see if early testing will reveal opportunities for the rest of the implementation plans. If "yes", project implementation has been designed to optimize early testing of components by clients
Project implementation has been designed to individually integrate as many components into production as possible without affecting current production operations
Risk management survey has been completed or scheduled for relevant stakeholders

Appendix B

Risk Management Survey

Name and role in the project: e.g., client, stakeholder, team member, etc.
Project name
Date
Which risk management checklist items (Appendix A) were applied, and which were not? For the ones that were applied, were they useful? Why or why not?
Was the risk management goal clearly communicated?
Was the original notification of the risk management process communicated to you in a timely manner?
Were you kept informed of risk-related developments affecting you in a timely manner, and were the communications clear, including the reasons for them?
Did you receive timely and clear replies to your questions?
Recommendations