Estimation Points

Planning Poker® Cards, Mountain Goat Software. Used with permission.

...you have Product Backlog Items on your Product Backlog, and you need an approach to estimate the effort required to implement those items.

✥       ✥       ✥ 

It is important to have reliable estimates. Customers are happiest when products meet or exceed their expectations. A team wants to size the work of a Sprint properly so it will deliver what the customer expects: too much work for the Sprint and stakeholder expectations suffer; too little work for the Sprint and the team has extra time on its hands for which work has not been planned or prepared. The same problems extend to longer time horizons, of delivering something several Sprints too late or several Sprints too early. However, if you spend a lot of effort estimating in days or hours, and your velocity changes (see Notes on Velocity), then the estimates are all wrong, anyhow.

Part of being reliable is being up to date: ensuring that the most current knowledge stands behind the team’s estimates. It’s hard to be up to date if it takes a long, boring exercise to torture estimates from the team.

The Product Owner wants to know the earliest potential release date for a set of features. The natural inclination is to ask people to estimate the number of hours required to finish each item, and then to add up the hours required to finish all desired features and to project that number onto a calendar, presuming a fixed team size and a work week of 37–40 hours. However, the estimates for work completion don’t account for “in-between work” activities such as meetings, doing email, administration, the costs of context switching, and even of many business tasks, including work ordering and estimation itself. A typical work week is much shorter than 40 hours, with teams more typically spending about half that much on focused work. A Harris Poll survey found that people spend only about 45 percent of their time on “primary job duties.” [1] But the truth is, nobody really knows how many of those 40 hours are overhead and how many are real work. That makes it hard to know how many weeks it will take to accomplish a given number of hours of work.
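
To make the gap concrete, here is a small worked sketch of the calendar projection described above. The backlog size (1,200 estimated hours) and the team size (five people) are hypothetical numbers chosen for illustration; the 18 focused hours per week is simply 45 percent of a nominal 40-hour week, taken from the survey figure cited above.

```python
import math


def weeks_to_finish(total_estimated_hours: float,
                    team_size: int,
                    focused_hours_per_week: float) -> int:
    """Whole weeks needed if each person contributes the given focused hours per week."""
    weekly_capacity = team_size * focused_hours_per_week
    return math.ceil(total_estimated_hours / weekly_capacity)


# Hypothetical inputs: 1,200 estimated hours of work, a team of five.
naive = weeks_to_finish(1200, team_size=5, focused_hours_per_week=40)

# Assumption: 45 percent of a 40-hour week (18 hours) goes to focused work.
adjusted = weeks_to_finish(1200, team_size=5, focused_hours_per_week=18)

print(f"naive projection: {naive} weeks")        # 6 weeks
print(f"adjusted projection: {adjusted} weeks")  # 14 weeks
```

The same hour estimates yield a date more than twice as far out once the assumed overhead is counted, and because nobody really knows the true overhead, neither projection can be trusted.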

An estimate is not a commitment, but rather a best guess of the work needed to complete a given Product Backlog Item (PBI) or Sprint Backlog Item (SBI). If the team uses absolute estimation units such as days or hours, stakeholders may be tempted to calculate a possible availability date for a given item themselves, assuming, say, a 40-hour work week for each team member, and so to “prove” that the team can complete the item earlier. The problem, of course, is that the 40-hour week is a fiction. The difference between the stakeholders’ naive calculation and the team’s more informed calculation breaks down trust between them, and stakeholders may view with suspicion any attempt to explain away the disparity as a difference between real hours and ideal hours.

The Development Team needs to measure how process improvement affects its capacity for work. Removing impediments may cause velocity to increase. Without a velocity measure based on Estimation Points, it is difficult and often impossible to assess the results of an intended process improvement.

The Development Team and the Product Owner want to have a certain level of accuracy in the estimation. People have difficulty estimating in absolute units (e.g., meters, days) (see “Wedding Planning” in [2], ff. 118, and also [3], ff. 125). Research shows that estimating in relative units is a more accurate alternative. [4] When using absolute estimations, the units we use for estimation imply a corresponding duration. People can agree on the size of a thing like a 100-meter track, but they will have a hard time agreeing on how long it takes to run the 100 meters. We can agree that running 200 meters will probably take more than double the time of running 100 meters, though we may not be able to agree on the time for either. Experience will tell us how long it takes to run 100 meters, which means that we should be able to project how long it would take to run twice as far.

 

Therefore:

Use unitless numbers for effort estimation. Use relative estimation, starting with the Estimation Points for some simple work item that everyone understands well as a baseline for the rest.

To estimate Product Backlog Items, the initial baseline can be some PBI about which the team has consensus understanding. For the Sprint Backlog, it can be some task, deliverable, or other unit of measure of the Development Team’s choice for which the team has consensus understanding of the likely effort involved.

In all cases, the team should consider all work necessary to bring the corresponding item (PBI or Sprint Backlog Item) to the equivalent of Done (see Done).

The team can communicate its velocity (see Notes on Velocity) in units of Estimation Points per Sprint.

 

The estimation includes all work necessary for the Development Team to develop the Product Backlog Item to Done.

✥       ✥       ✥ 

Having a velocity “standard” in hand, the team can derive its velocity as a basis for prediction and improvement. See Running Average Velocity and Aggregate Velocity for common ways to apply velocity in the Scrum framework.
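
As a minimal sketch of how velocity supports prediction, the following computes a simple average velocity in Estimation Points per Sprint and projects how many Sprints a backlog of a given size would take. The function names, the three-Sprint history, and the 120-point backlog are all hypothetical; the Running Average Velocity and Aggregate Velocity patterns may define the calculation differently.

```python
import math


def average_velocity(points_done_per_sprint: list[int]) -> float:
    """Mean number of Estimation Points brought to Done per Sprint."""
    return sum(points_done_per_sprint) / len(points_done_per_sprint)


def sprints_to_deliver(backlog_points: float, velocity: float) -> int:
    """Whole Sprints needed to deliver the backlog at the given velocity."""
    return math.ceil(backlog_points / velocity)


# Hypothetical history: points brought to Done in the last three Sprints.
recent_sprints = [21, 18, 24]

velocity = average_velocity(recent_sprints)   # 21.0 points per Sprint
forecast = sprints_to_deliver(120, velocity)  # 120 / 21 rounds up to 6 Sprints

print(f"velocity: {velocity} points/Sprint, forecast: {forecast} Sprints")
```

Note that no hours or days appear anywhere in the calculation: the only time unit is the Sprint itself, so the forecast adjusts automatically when the measured velocity changes.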

Poker-planning, as conceived by Grenning [4] and further elaborated and popularized by Cohn [5], is a modern implementation of the Delphi technique that has proven to be an excellent approach to generate Estimation Points. Poker-planning is based on a nonlinear scale (approximately the Fibonacci numbers) that helps break down linear thinking. The main idea behind the use of the Fibonacci series is that the distance between allowable estimates increases as the estimates increase, reflecting the increasing uncertainty with increasingly high estimates. To help weaken any faith that might remain in the precision of large estimates, the scale rounds down the larger Fibonacci numbers (e.g., 21 becomes 20, because 21 has too many significant digits).
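
The widening scale can be illustrated with a short sketch. The card values below are the ones commonly printed on planning poker decks; that is an assumption here, since the text only says the scale is approximately Fibonacci with 21 rounded to 20. The helper `nearest_card` is hypothetical and exists only to show how coarse the scale becomes for large numbers; in a real session each person simply picks a card.

```python
# Assumed deck: values commonly found on planning poker cards.
DECK = [0, 0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100]


def gaps(deck):
    """Distance between each pair of adjacent allowable estimates."""
    return [larger - smaller for smaller, larger in zip(deck, deck[1:])]


def nearest_card(raw_estimate, deck=DECK):
    """Closest allowable card to a raw number; ties go to the larger card."""
    return min(deck, key=lambda card: (abs(card - raw_estimate), -card))


print(gaps(DECK))        # [0.5, 0.5, 1, 1, 2, 3, 5, 7, 20, 60]
print(nearest_card(21))  # 20: there is no card for 21
print(nearest_card(30))  # 40: a tie between 20 and 40 goes to the larger card
```

The gaps never shrink as the values grow, so a large estimate can only be expressed coarsely, which matches the uncertainty that actually surrounds it.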

A team often anchors its poker-planning exercise with a baseline. The baseline is usually a small number (1, 2, or 3) and the team associates that number with some low-effort Small Item to Estimate, with which the Development Team has high familiarity and confidence. After the first round of estimation, every item is a baseline, by transitive closure of relative estimation. This obviates the need for techniques like reference stories.

Some teams give a pessimistic and optimistic estimate for each item to caveat their forecasts. It is common Scrum practice to instead give a single consensus estimate for each item, and then separately derive confidence ranges empirically from historic data. This makes estimation go faster and provides a solid foundation for believing that the confidence range is something other than an arbitrary attempt to avoid blame. See Release Range.
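
One simple way to derive such a range from historic data is sketched below, assuming that the best and worst observed velocities bound the forecast. This is an illustrative assumption, not the definition given in Release Range, and the velocity history and backlog size are hypothetical.

```python
import math


def release_range(backlog_points: float, historic_velocities: list[int]) -> tuple[int, int]:
    """Return (optimistic, pessimistic) Sprint counts.

    The optimistic count uses the best observed velocity and the
    pessimistic count uses the worst observed velocity.
    """
    best = max(historic_velocities)
    worst = min(historic_velocities)
    return (math.ceil(backlog_points / best),
            math.ceil(backlog_points / worst))


# Hypothetical velocities (Estimation Points per Sprint) from five past Sprints.
history = [18, 21, 24, 20, 17]

optimistic, pessimistic = release_range(120, history)
print(f"between {optimistic} and {pessimistic} Sprints")  # between 5 and 8 Sprints
```

The team still gives only one estimate per item; the spread comes entirely from what the team has actually delivered in the past.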

A common practice in Scrum is to call Estimation Points “Gummi Bears.” Ron Jeffries first mentioned this name in 1999, and other references attribute it to an XP project led by Joseph Pelrine.[6]

A survey of 50,000 agile teams, jointly sponsored by CA Software and the Software Engineering Institute (SEI) at Carnegie Mellon University [7], found that for 90 percent of the respondents, using Estimation Points is better than a hybrid approach that still uses hours for Sprint Backlog Items, which in turn is better than the “no estimates” technique, which in turn is better than estimating in hours.

The team can also use Estimation Points to estimate Sprint Backlog Items on the Sprint Backlog. If the Sprint Backlog and Product Backlog use the same units for estimation, then the team can audit whether estimates for a given PBI are too optimistic or pessimistic by summing the estimates of the SBIs that the team must complete to bring the corresponding PBI to Done. Teams often find that original PBI estimates are optimistic because the process of design (of turning PBIs into SBIs) uncovers emergent work.
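
The audit described above amounts to a subtraction, sketched here under the assumption that PBIs and SBIs are estimated in the same units. The item names and point values are hypothetical.

```python
def audit_pbi(pbi_estimate: float, sbi_estimates: list[float]) -> float:
    """Summed SBI estimates minus the original PBI estimate.

    A positive result means the PBI estimate was optimistic;
    a negative result means it was pessimistic.
    """
    return sum(sbi_estimates) - pbi_estimate


# Hypothetical PBIs: (original PBI estimate, estimates of its SBIs).
pbis = {
    "Export report as PDF": (5, [2, 2, 3]),
    "Reset password": (3, [1, 1]),
}

for name, (pbi_points, sbi_points) in pbis.items():
    delta = audit_pbi(pbi_points, sbi_points)
    if delta > 0:
        verdict = "optimistic"
    elif delta < 0:
        verdict = "pessimistic"
    else:
        verdict = "on target"
    print(f"{name}: PBI estimate was {verdict} by {abs(delta)} points")
```

With these numbers the first item turns out to have been optimistic by 2 points and the second pessimistic by 1 point.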

Estimation Points are just a technique. A team can regularly improve its use of this technique to avoid common pitfalls. The most problematic pitfall is to not involve all the developers; the second most problematic is to allow undue influence from anyone else. There are more refined improvements to the technique that nonetheless can make a lot of difference; see [8]. The key points to remember here are that the pace of the schedule should be set by those carrying out the implementation work, and that people are very bad at reckoning estimates in absolute time units.

To the degree that Development Team members understand an item well enough to estimate it, they are likely to understand it well enough to implement it. Indeed, one of the main reasons to do estimation is to get the team thinking about the problem and solution early; when the time comes to implement a solution, it has already been rolling about in Developers’ minds for a while. See Refined Product Backlog.


[1] Bourree Lam. “The Wasted Workday.” In The Atlantic, 4 December 2014, https://www.theatlantic.com/business/archive/2014/12/the-wasted-workday/383380/ (accessed 2 November 2017).

[2] Jeff Sutherland and J. J. Sutherland. “Wedding Planning.” In Scrum: The Art of Doing Twice the Work in Half the Time. New York: Random House, 2014, ff. 118.

[3] Kenneth S. Rubin. Essential Scrum: A Practical Guide to the Most Popular Agile Process. Reading, MA: Addison-Wesley, 2012, ff. 125.

[4] James Grenning. “Planning Poker or How to avoid analysis paralysis while release planning.” https://wingman-sw.com/papers/PlanningPoker-v1.1.pdf, 2002 (accessed 2 November 2017).

[5] Mike Cohn. Agile Estimation and Planning. New York: Prentice-Hall, 2005.

[6] —. “Agile Practices Timeline.” AgileAlliance.org, https://www.agilealliance.org/agile101/practices-timeline/ (accessed 2 November 2017).

[7] —. “The impact of agile quantified.” ProjectManagement.com, http://www.projectmanagement.com/pdf/469163_the-impact-of-agile-quantified.pdf, n.d. (accessed 13 November 2016).

[8] Magne Jørgensen. “Relative Estimation of Software Development Effort: It Matters with What and How You Compare.” In IEEE Software 30(2), March-April 2013, pp. 74–79.