The book Turn the Ship Around by L. David Marquet recommends a Leader-Leader paradigm to replace the Leader-Follower paradigm. The Leader-Leader paradigm focuses on all personnel taking ownership over their position rather than just taking orders in a leader-follower structure.
Marquet took command of the USS Santa Fe when it was one of the worst performing submarines in the fleet. Through a series of leadership changes, he transformed the Santa Fe into a top-performing boat. One of the changes he implemented was having subordinates state their actions as “I intend to [action]” rather than asking the commander what to do. Imagine the following fictitious conversations and think about which relationship you would rather be a part of, either as the team lead or as the developer:
SCENARIO #1:
Team Lead: Is the shopping cart enhancement ready to deploy?
Developer: I finished it yesterday.
Team Lead: Is it ready to deploy?
Developer: Umm. Sure.
Team Lead: Is it tested?
Developer: I don’t know. Ask the test team.
SCENARIO #2:
Developer: I intend to deploy the shopping cart enhancement to production.
Team Lead: Is it tested and support team aware?
Developer: Yes, functional, regression, and performance testing complete. Support team is aware.
Team Lead: Any risks or concerns we should discuss?
Developer: Simple isolated change, easy to validate and easy to rollback.
Team Lead: How will we know if we achieved the desired result with this change?
Developer: We’ll do smoke testing after it’s deployed to make sure we can still add and delete items from the shopping cart.
SCENARIO #3:
Developer: I intend to deploy the shopping cart enhancement to production. All testing is complete and support team aware.
Team Lead: How will we know if we achieved the desired result with this change?
Developer: We made usability and performance improvements that we expect to decrease the shopping cart abandonment rate by 5%. We’ll track that metric daily over the next four weeks.
In scenario #1, the developer does not demonstrate they care about the deployment. In scenario #2, the developer demonstrates they care about seeing their work deployed to and validated as working in production. In scenario #3, the developer demonstrates they truly understand the mission and value of the feature to the business.
All Work is the Same
Friday, June 25, 2021
I Intend to...
Saturday, December 15, 2018
What I Learned About Programming from Forensic Science
EVIDENCE & FAILURE
One of the early stories tells of James Marsh, who in 1832 was called as an expert witness in a murder trial to demonstrate that a sample of coffee had been contaminated with arsenic. Marsh identified that arsenic was present; however, by the time he presented to the jury, his sample had degraded, giving rise to reasonable doubt, and the accused walked free. McDermid writes:
“James Marsh was a proper scientist. He regarded this failure as a spur towards success. His response to the embarrassment of his court appearance was to devise a better test.”
This attitude towards failure as an opportunity to learn is also ingrained in many software development and operations frameworks. The story is also a reminder of the ephemeral nature of evidence. When an incident occurs, evidence that may help understand the root cause may not last long in memory and eventually may be lost. If root cause analysis is not treated with urgency after an incident is resolved, the evidence necessary to determine the root cause may vanish.
ASSUMPTIONS vs TRUTH
Another story tells the tale of Bernard Spilsbury, who came to fame in 1910 for his expert testimony in the trial of Hawley Harvey Crippen, who was accused of murder. McDermid describes Spilsbury as having “a liberal sprinkling of charisma” and as a “handsome, convincing orator”. During the trial “the judge referred to Spilsbury as ‘the greatest living pathologist.’” Spilsbury claimed a skin sample found in Crippen’s home had a scar just like a scar the female murder victim was known to have. However, the defense pointed out that hair follicles in the sample indicated it could not have been scar tissue and thus did not point to the victim. Later DNA analysis cast doubt on whether the sample belonged to a female or was closely related to the victim.
This adolescent period of forensic science, when charisma trumped hard science, is a reminder of the responsibility of those of us who are looked to as experts. If you have unique expertise or knowledge in a group making a decision, you cannot rely on an adversarial system to help you arrive at the best decision. Instead, you must dedicate yourself to the truth and state and challenge your own assumptions.
EVIDENCE COLLECTION KIT
Spilsbury is also known for establishing the “murder bag”, a collection of gloves, tweezers, evidence bags and other equipment to use in homicide investigations. When you approach an investigation of an incident, there is likely a standard set of tools and techniques you should use for investigation. You check event logs, application logs, CPU utilization, memory utilization, and database logs. Checking all these logs and performance counters manually can take a long time. Consider in your work what prepared kits or tools you could use to collect, process, and analyze logs and performance counters for abnormalities to save time in isolating an incident.
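As one idea of what a small “kit” might look like, here is a minimal Python sketch. It assumes a hypothetical log format where each line contains a level such as ERROR, WARN, or INFO, and it assumes you have already collected CPU utilization samples as a list of numbers. The function names and the two-standard-deviation threshold are my own illustrative choices, not a standard.

```python
import re
from collections import Counter
from statistics import mean, stdev

# Assumed (hypothetical) log format: "<timestamp> <LEVEL> <message>"
LEVEL_PATTERN = re.compile(r"\b(ERROR|WARN|INFO)\b")


def count_levels(log_lines):
    """Count how many log lines mention each log level."""
    counts = Counter()
    for line in log_lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def flag_outliers(samples, threshold=2.0):
    """Return (index, value) pairs that sit more than `threshold`
    sample standard deviations above the mean of the samples."""
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []
    return [(i, v) for i, v in enumerate(samples) if (v - mu) / sigma > threshold]


if __name__ == "__main__":
    logs = [
        "2018-12-15 10:00:01 ERROR payment timeout",
        "2018-12-15 10:00:02 INFO request served",
        "2018-12-15 10:00:03 ERROR payment timeout",
    ]
    cpu_percent = [20, 22, 21, 19, 23, 20, 21, 95]  # made-up samples
    print(count_levels(logs))
    print(flag_outliers(cpu_percent))
```

A script like this will not find the root cause for you, but running the same checks the same way every time saves the minutes you would otherwise spend opening each log by hand.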
RECOMMENDATION
Forensics: What Bugs, Burns, Prints, DNA, and More Tell Us About Crime by Val McDermid is an engaging, captivating read. Learning how criminal investigators refined their tools and techniques over the history of forensic science can help spawn ideas about how to improve investigation and response to IT incidents.
Sunday, July 29, 2018
How the Military Decision Brief Made Me a Better Software Architect
The decision brief template provides an outline that promotes thinking of different ways to solve a problem and considering the trade-offs of different solutions.
As a junior software engineer, I tried to figure out how more senior engineers came up with design ideas I never considered. After learning about and using the course of action decision briefing template, I came to recognize that they were thinking of different evaluation criteria for a design and then proposing solutions based on those criteria. They were considering the impact to storage, memory, CPU, scalability, supportability, maintainability, security, effort to implement, and other factors I wasn't considering. Considering these different evaluation criteria would often prompt them to come up with alternative design options.
If you are able to define the evaluation criteria for your design, you will better understand the rubric you’ll be graded against before you present it for peer review.
Once you know how your design will be evaluated, you can start asking yourself questions like, “What design would be best for scalability?” or “What design would be the quickest to implement?” Thinking about how to optimize each evaluation criterion one at a time can help you generate different ideas that may not have been obvious. It is easier to come up with ideas if you have a specific problem or constraint you are trying to solve for rather than trying to solve for many considerations all at once. If you over-constrain your thinking, you tend to lose out on those creative big ideas.
Sample Course of Action Brief
- Purpose
- Problem
- Recommendation
- Prior Coordination
- Background Information
- Facts Bearing on the Problem
- Assumptions
- Courses of Action (COA)
- Screening Criteria
- Screened COAs
- Surviving COAs
- Evaluation Criteria
- Analysis of Each COA
- Comparison of COAs
- Recommendation
- Decision
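The screening and comparison steps in the outline above can be sketched as a simple weighted scoring matrix. This is only an illustration under my own assumptions: the three courses of action, the 1 to 5 scores, the weights, and the 12-week effort limit are all made up, and a real brief would back each score with analysis.

```python
def screen(coas, screening_criteria):
    """Keep only the COAs that pass every screening criterion.
    Each criterion is a function that takes a COA and returns True or False."""
    return [coa for coa in coas if all(check(coa) for check in screening_criteria)]


def compare(coas, weights):
    """Rank COAs by weighted score across the evaluation criteria, best first."""
    def total(coa):
        return sum(weights[name] * coa["scores"][name] for name in weights)

    ranked = [(coa["name"], total(coa)) for coa in coas]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical courses of action, scored 1 (poor) to 5 (excellent).
COAS = [
    {"name": "Add a cache", "effort_weeks": 3,
     "scores": {"scalability": 3, "maintainability": 4, "speed_to_implement": 5}},
    {"name": "Shard the database", "effort_weeks": 10,
     "scores": {"scalability": 5, "maintainability": 2, "speed_to_implement": 2}},
    {"name": "Rewrite the service", "effort_weeks": 30,
     "scores": {"scalability": 5, "maintainability": 4, "speed_to_implement": 1}},
]

# Assumed screening criterion: anything over 12 weeks is out of scope.
SCREENING = [lambda coa: coa["effort_weeks"] <= 12]

# Assumed weights: scalability matters most for this hypothetical problem.
WEIGHTS = {"scalability": 3, "maintainability": 2, "speed_to_implement": 1}

if __name__ == "__main__":
    surviving = screen(COAS, SCREENING)
    for name, score in compare(surviving, WEIGHTS):
        print(f"{name}: {score}")
```

The numbers matter less than the exercise. Writing down the criteria and weights forces you to say what you are optimizing for before anyone argues about a favorite solution.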
Saturday, July 14, 2018
How Foursquare Can Save You from Business Disasters
Risk = Impact x Likelihood. The higher the
impact and the higher the likelihood, the more risk you have. Quantifying
impact and likelihood can be difficult and expensive if you try to do so with
excessive precision. A matrix like the one below can help you roughly rank your
risks relative to each other in terms of likelihood and impact. Once you have
ranked your risks relative to each other, you can focus on the High Impact /
High Likelihood risks. Items in the Low Impact / Low Likelihood quadrant do not
deserve much attention. You may need to focus some attention on the high
impact / low likelihood events and monitor the low impact / high likelihood events.
Road Trip Risks
The matrix above shows potential adverse events plotted
against the axes of risk for a road trip. While we could spend many hours applying actuarial science to calculate the probability and impact of
each of the consequences in the matrix, we can simply use the relative
estimated positions in the chart to quickly focus on which potential consequences merit planning out
mitigations. In the High/High quadrant we have “Traffic jam on I-95”, which we
may mitigate by leaving early. You could also generate ideas for mitigating the
risk of being stopped for speeding, such as avoiding the urge to engage “crazy
mode” in your vehicle. Having your car stolen during a road trip would be a very
high impact but low likelihood event. Since it’s low likelihood, incurring the
expense of buying a second backup car that someone else follows you around in would
likely cost much more than the mathematical expectation of losing the car. But
since it’s high impact, you might choose to transfer the risk to an insurance
company and have a plan to use a rental car if needed.
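If you prefer a spreadsheet or script to sticky notes, the same rough ranking can be sketched in a few lines of Python. The 1 to 5 scores below are my own guesses for the road trip events, not measurements, and the midpoint that separates “Low” from “High” is an arbitrary assumption.

```python
def quadrant(impact, likelihood, midpoint=3):
    """Place a risk in one of the four quadrants.
    Scores at or above the (assumed) midpoint count as High."""
    impact_label = "High" if impact >= midpoint else "Low"
    likelihood_label = "High" if likelihood >= midpoint else "Low"
    return f"{impact_label} Impact / {likelihood_label} Likelihood"


def rank_risks(risks):
    """Sort (name, impact, likelihood) tuples by impact x likelihood, highest first."""
    return sorted(risks, key=lambda risk: risk[1] * risk[2], reverse=True)


# Hypothetical relative scores on a 1-5 scale: (event, impact, likelihood)
ROAD_TRIP_RISKS = [
    ("Car stolen", 5, 1),
    ("Traffic jam on I-95", 4, 5),
    ("Stopped for speeding", 2, 4),
]

if __name__ == "__main__":
    for name, impact, likelihood in rank_risks(ROAD_TRIP_RISKS):
        print(f"{name}: score {impact * likelihood}, {quadrant(impact, likelihood)}")
```

Note that the product alone ranks the stolen car last, which is why the quadrant label is printed alongside it: a high impact / low likelihood event may still deserve a plan even when its score is small.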
You could also invert this concept to map out various opportunities and rank each on their probability of success and positive impact.
This exercise can be done individually or in a group. In a group setting, I recommend having everyone write down events on sticky notes. Mark off the grid with painter's tape on the wall. Have each person place their notes in the matrix. After all the notes are placed, go around to each person to ask if they think any changes should be made and continue round robin rounds until the matrix is stable.
Sunday, June 17, 2018
How Cops Taught Me to Manage Software Problems
Photo by Matt Popovich on Unsplash
If you have performance problems and production incidents and you are unsure how to manage the underlying problems that cause them, consider borrowing a management technique used by many police and other civil government organizations called CompStat. CompStat is short for “Compare Statistics” and is based on four key components:
- Timely and accurate information or intelligence
- Rapid deployment of resources
- Effective tactics
- Relentless follow-up
I translate these key components for production application support as the following:
1. Timely and accurate information
To be proactive and respond before a small problem becomes a big problem, you must have a way to detect problems. You need to see trends in performance and monitor failure rates. To properly prioritize response to problems, you need to know what functions are causing the most failures or lag in your system. A good application monitoring program can provide you the quantitative data you need to quickly identify problem areas that cause the most incidents and performance issues. Combining that quantitative data with the impact on your business or mission can give you a good idea of what to address first.
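As a rough sketch of combining quantitative data with business impact, the Python below counts failures per function and multiplies each count by an impact weight. The function names, the failure events, and the weights are hypothetical; in practice the events would come from your monitoring tool and the weights from a conversation with the business.

```python
from collections import Counter


def prioritize(failure_events, business_impact):
    """Rank functions by failure count x business impact weight, highest first.
    Functions without an assigned weight default to a weight of 1."""
    counts = Counter(failure_events)
    scored = [(name, count * business_impact.get(name, 1))
              for name, count in counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical failure events, one entry per failed request.
FAILURES = ["checkout", "search", "search", "search", "checkout", "login"]

# Hypothetical weights: a failed checkout costs far more than a failed search.
IMPACT = {"checkout": 5, "search": 1, "login": 4}

if __name__ == "__main__":
    for name, score in prioritize(FAILURES, IMPACT):
        print(f"{name}: {score}")
```

Here search fails most often, yet checkout rises to the top once impact is considered, which is the point of pairing the two kinds of information.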
2. Rapid response
When you notice small performance problems, do they typically just magically disappear? Probably not. Small performance problems tend to become big performance problems as data and application usage grows. To prevent little problems from becoming big problems, respond quickly. To achieve system stability for your customers, you cannot stop at resolving the incident. You have to determine the underlying problems that lead to incidents and attack those problems with a sense of urgency.
3. Effective tactics, techniques and procedures
Are you able to employ resources to quickly isolate and resolve incidents? After resolving an incident, consider documenting how it was isolated and investing in tools and methods to more quickly detect, isolate, and resolve the same class of problems in the future.
4. Relentless follow-up
Follow-up is the most easily forgotten component but is critical to successfully reducing problems in a system. If you deploy a change to resolve a problem and don’t validate that it actually resolved the problem, you will not know if your efforts are effective. Not following up on the effectiveness of a change can lead to moving on to solving lower priority problems when you have not actually resolved higher priority problems, which might recur at inopportune times. Follow-up can reveal that there were multiple causes that led to an incident, not just the one you fixed. After investing time and money into fixing a problem, reporting the decline in incidents or increase in performance after the changes can help quantify the value provided by fixing problems and increase support for allocating resources to resolve production problems. After I deploy a change, I’ve gotten in the habit of not marking a user story done until after I’ve validated my fix has had the intended impact in production. I add a task to the story for post-deployment verification and set a calendar invite for myself to perform the task a week or two later -- however much time is needed to collect enough evidence to reasonably conclude the issue was fixed. Rushing to mark user stories done does not help quality or system stability. If you want a stable system, you have to follow up to ensure your changes are having the intended impact and not creating new problems.
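The post-deployment verification task can be as simple as comparing the failure rate before and after the change. The sketch below is a rough check, not a statistical test: the counts are invented, and the rule that the failure rate must drop by at least half is an assumption you would tune to your own system.

```python
def failure_rate(failures, requests):
    """Fraction of requests that failed during an observation window."""
    if requests == 0:
        raise ValueError("no requests observed in this window")
    return failures / requests


def fix_verified(before, after, min_relative_drop=0.5):
    """Return True if the failure rate fell by at least `min_relative_drop`
    (0.5 means at least 50%) between the two windows.
    `before` and `after` are (failures, requests) tuples."""
    rate_before = failure_rate(*before)
    rate_after = failure_rate(*after)
    if rate_before == 0:
        # Nothing was failing before, so the fix can only be judged on
        # whether failures stayed at zero.
        return rate_after == 0
    return (rate_before - rate_after) / rate_before >= min_relative_drop


if __name__ == "__main__":
    week_before = (120, 10_000)  # hypothetical: 120 failures in 10,000 requests
    week_after = (12, 11_000)    # hypothetical: 12 failures in 11,000 requests
    print(fix_verified(week_before, week_after))
```

Comparing rates rather than raw counts matters because traffic rarely stays the same from one week to the next.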
Saturday, June 16, 2018
Managing Change the DOTMLPF Way
Most of us understand that if we run out one day
and buy a cello, we won’t be able to play music like Yo-Yo Ma that same day. To
be able to play something that resembles music, you would need training,
practice, and a safe place to store the cello. However, sometimes we don’t
think about what’s necessary to achieve a new capability beyond just buying
something like a new piece of software. To help understand the different types
of changes that need to be made for successful technology injection, the US
military uses the acronym DOTMLPF-P (pronounced Dot-M-L-P-F-P), which stands for
Doctrine, Organization, Training, Materiel, Leadership and Education, Personnel,
Facilities, and Policy.
Let’s say I want to change a team from doing
deployments once a quarter to deploying once a week. Could I just install Jenkins
and declare victory? That approach is unlikely to be successful because you
will need to plan for personnel to configure Jenkins. You may need to train
people on configuring and using Jenkins. You may need an organizational change
to form a DevOps team to design and build out the initial deployment pipelines. To get the resources to do this
and keep people focused on it, you will need to provide leadership. You may
also need additional equipment like a new server to host Jenkins. You
might rethink your change management policy if it can’t accommodate
frequent releases. You may need to change the doctrine you follow, the way you
develop and test software, by emphasizing smaller, quicker releases, changing
your branching structure, using feature flags, and implementing automated
tests. For this one change, you had to think about not just obtaining and
installing the software but the doctrine, organization, training, materiel
(software & hardware), leadership, personnel, and policy -- seven of the eight
components of DOTMLPF-P.
Below is the military definition of each element
of DOTMLPF-P along with my own translation for a software developer or other IT
professional.
Doctrine
Military Definition: “the way we fight (e.g.,
emphasizing maneuver warfare, combined air-ground campaigns).”
Developer Definition: The high-level approach,
based on a set of general principles, that we use to deliver software. Think of
agile as guided by the agile manifesto, RUP, or waterfall as different
doctrines. Software delivery has predominantly shifted to small
multi-disciplinary teams producing small and quick releases to production.
Organization
Military Definition: “how we organize to fight
(e.g., divisions, air wings, Marine-Air Ground Task Forces)”
Developer Definition: How we divide into teams
to deliver working software (feature teams, project teams, separate functional
dev and test teams).
Training
Military Definition: “how we prepare to fight
tactically (basic training to advanced individual training, unit training,
joint exercises, etc).”
Developer Definition: How we learn to deliver software. Training in programming languages, new technologies, agile practices, security and other software and system engineering competencies. This may include formal training classes like certification bootcamps, peer training, college courses, etc.
Materiel
Military Definition: “all the ‘stuff’ necessary
to equip our forces.” When considering a new purchase, the definition is
restricted to equipment “that DOES NOT require a new development effort
(weapons, spares, test sets, etc that are ‘off the shelf’ both commercially and
within the government)” to focus on considering existing solutions to
potentially fill a capability gap.
Developer Definition: Your tools. This may include servers, workstations, Visual Studio, Eclipse, Emacs, etc.
Leadership and Education
Military Definition: “how we prepare our leaders
to lead the fight (squad leader to 4-star general/admiral - professional
development)”
Developer Definition: Leadership is necessary to
bring people together to focus on solving a problem. If you are using 10-year-old
development tools and techniques, or everyone thinks your processes don’t
make sense but nobody does anything about it, it’s probably because your team
lacks leadership. Leadership doesn’t have to come from formal managers. Indeed,
if you want to move up in the ranks, showing initiative and drive and leading even small
changes, like upgrading Visual Studio across the whole development team, goes a
long way to distinguish you as someone who is engaged, cares about the team,
and can drive change.
Personnel
Military Definition: “availability of qualified
people for peacetime, wartime, and various contingency operations”
Developer Definition: The people you have on
your team(s).
Facilities
Military Definition: “real property,
installations, and industrial facilities (e.g., government owned ammunition
production facilities)”
Developer Definition: The physical space where
your team works and the physical space where your servers are hosted. If you
are hosting servers in your own facility, you have to think about heating and
cooling, fire suppression systems, access controls and other physical security
issues. If you are forming a new team or rearranging teams you may also think
about how your office space is laid out to allow for collaboration and for
quiet time.
Policy
Military Definition: “DoD, interagency, or
international policy that impacts the other seven non-materiel elements”
Developer Definition: The rules you must follow
as part of your software delivery process. For example, all changes must be
approved by the Configuration Change Board prior to deploying to production.