How to Communicate Uncertainty to Execs Without Losing the Room
Frameworks for presenting statistical uncertainty to non-technical stakeholders. How to say 'we're not sure' without losing credibility or decision-making momentum.
Quick Hits
- Lead with the decision, not the uncertainty
- Translate confidence intervals to business language: 'between X and Y'
- Quantify uncertainty—'maybe' is less useful than 'likely 2-6% lift'
- Present options, not hedges: 'we could A) ship now or B) collect more data'
- Uncertainty isn't weakness—it's honesty about what the data can tell us
TL;DR
Communicating uncertainty to executives requires framing, not hedging. Lead with your recommendation, translate confidence intervals to business language ("between 2% and 6%"), quantify uncertainty instead of using vague words ("about 70% confident" not "probably"), and present options when decisions depend on risk tolerance. Uncertainty isn't weakness—it's honesty. The goal isn't statistical education; it's enabling good decisions with incomplete information.
The Communication Problem
What Happens When You Lead with Uncertainty
Analyst: "So, the results are not statistically significant at the 0.05 level, though the point estimate is positive. The confidence interval includes zero but also includes our MDE, and the p-value is 0.12, which some would consider marginally significant..."
Executive: Checks phone
Executive: "So... should we ship it or not?"
What Executives Actually Need
| They Don't Need | They Need |
|---|---|
| P-values | Recommendations |
| Confidence intervals | Risk assessment |
| Statistical jargon | Business language |
| Your uncertainty | Options with trade-offs |
| Every caveat | The key caveat |
The Communication Framework
The RUBS Structure
R - Recommendation (what you'd do)
U - Uncertainty (quantified, not hedged)
B - Business implications (translated)
S - Scenarios/options (if risk-dependent)
Example: Applying RUBS
Before (Uncertainty-First):
"The experiment showed a 3.2% lift in conversion, but the confidence interval is 0.5% to 5.9%, which means we can't rule out that the true effect is less than our 2% MDE. The p-value is 0.02, so it's significant, but I'm not sure if we should..."
After (RUBS):
"Recommendation: Ship the new checkout flow.
Why: We're confident the effect is positive (95% sure it's at least 0.5% lift). Our best estimate is 3.2%.
The uncertainty: The lift could be anywhere from small (0.5%) to substantial (6%). There's about a 20% chance it's below our 2% target.
If I'm wrong: Worst realistic case is a small positive effect, not a negative one. The feature also improves user experience qualitatively."
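Once the structure is familiar, you can template it. Below is a minimal sketch, with an illustrative rubs_summary helper (not a standard API), that assembles a RUBS readout from the experiment numbers:

def rubs_summary(recommendation, point_estimate, ci_lower, ci_upper,
                 prob_below_target, worst_case_note):
    """
    Assemble a RUBS-structured readout from experiment results.
    Assumes ci_lower > 0 (a positive result); adapt the 'Why' line
    for inconclusive or negative outcomes.
    """
    return "\n".join([
        f"Recommendation: {recommendation}",
        f"Why: We're confident the effect is positive (at least "
        f"{ci_lower:.1%} lift). Our best estimate is {point_estimate:.1%}.",
        f"The uncertainty: The lift could range from {ci_lower:.1%} "
        f"to {ci_upper:.1%}. There's about a {prob_below_target:.0%} "
        f"chance it's below target.",
        f"If I'm wrong: {worst_case_note}",
    ])

print(rubs_summary(
    recommendation="Ship the new checkout flow.",
    point_estimate=0.032, ci_lower=0.005, ci_upper=0.059,
    prob_below_target=0.20,
    worst_case_note="Worst realistic case is a small positive effect, not a negative one."
))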
Translating Statistical Concepts
Confidence Intervals → Ranges
Technical: "95% CI: 1.2% to 5.4%"
Translated: "The lift is probably around 3%, but could realistically be anywhere from 1% to 5%."
Even Better: "Even in the pessimistic scenario, we're looking at a 1% lift. Optimistically, 5%. Most likely around 3%."
def translate_ci_to_business(point_estimate, ci_lower, ci_upper, metric_name):
    """
    Translate confidence interval to business language.
    """
    translations = []
    # Frame around the range
    range_statement = (
        f"The {metric_name} improvement is most likely around {point_estimate:.1%}, "
        f"but could realistically range from {ci_lower:.1%} to {ci_upper:.1%}."
    )
    translations.append(("Range framing", range_statement))
    # Frame around best/worst case
    scenario_statement = (
        f"Best realistic case: {ci_upper:.1%} lift. "
        f"Worst realistic case: {ci_lower:.1%} lift. "
        f"Most likely: {point_estimate:.1%}."
    )
    translations.append(("Scenario framing", scenario_statement))
    # Frame around confidence direction
    if ci_lower > 0:
        direction = f"We're highly confident the effect is positive—at least {ci_lower:.1%}."
    elif ci_upper < 0:
        direction = f"We're highly confident the effect is negative—at most {ci_upper:.1%}."
    else:
        direction = "We can't rule out no effect—the data is consistent with both positive and negative outcomes."
    translations.append(("Direction framing", direction))
    return translations

# Example
translations = translate_ci_to_business(0.032, 0.012, 0.054, "conversion")
print("CI Translation Options")
print("=" * 60)
for frame_type, statement in translations:
    print(f"\n{frame_type}:")
    print(f"  {statement}")
P-Values → Confidence Language
Technical: "p = 0.03"
Translated: "The observed improvement is unlikely to be chance—about a 3% probability this is a fluke."
Better: Skip the p-value entirely and use the CI: "We're confident the effect is real. Even the conservative estimate shows meaningful lift."
def translate_p_value(p_value, effect_direction):
    """
    Translate a p-value into plain language.
    """
    if p_value < 0.001:
        confidence = "virtually certain"
        pct = "<0.1%"
    elif p_value < 0.01:
        confidence = "highly confident"
        pct = f"~{p_value*100:.1f}%"  # one decimal so small p-values don't round to 0%
    elif p_value < 0.05:
        confidence = "reasonably confident"
        pct = f"~{p_value*100:.0f}%"
    elif p_value < 0.10:
        confidence = "somewhat confident"
        pct = f"~{p_value*100:.0f}%"
    else:
        confidence = "not confident"
        pct = f"~{p_value*100:.0f}%"
    if effect_direction == "positive":
        return (f"We're {confidence} the improvement is real "
                f"(a result this large would occur {pct} of the time by chance alone).")
    else:
        return (f"The effect might be {effect_direction}, but we're {confidence} "
                f"(a result like this occurs {pct} of the time by chance alone).")
# Examples
print(translate_p_value(0.001, "positive"))
print(translate_p_value(0.03, "positive"))
print(translate_p_value(0.15, "positive"))
Statistical Significance → Decision Language
| Technical | Business Translation |
|---|---|
| "Statistically significant" | "We're confident this is a real effect" |
| "Not significant" | "The data doesn't give us a clear answer" |
| "Marginally significant" | "Suggestive but not conclusive" |
| "Highly significant" | "We're very confident" (but still note the effect size) |
Quantifying vs. Hedging
The Hedge Problem
Hedging (vague):
- "The effect might be positive"
- "There's some chance it works"
- "We can't be certain"
- "It's possible but not guaranteed"
Quantifying (useful):
- "70% chance the effect is positive"
- "Likely between 2% and 6% lift"
- "Even the pessimistic scenario shows 1% improvement"
- "If we're wrong, downside is limited to X"
Building Your Calibration
def confidence_calibration_guide():
    """
    Guide for translating statistical confidence to verbal expressions.
    """
    calibration = [
        ("Virtually certain", "99%+", "CI entirely excludes alternatives"),
        ("Highly confident", "90-99%", "CI barely touches threshold"),
        ("Confident", "75-90%", "Point estimate well above threshold"),
        ("Fairly confident", "60-75%", "Point estimate above threshold"),
        ("Uncertain", "40-60%", "Could go either way"),
        ("Skeptical", "25-40%", "Point estimate below but possible"),
        ("Doubtful", "10-25%", "CI barely includes positive"),
        ("Very doubtful", "<10%", "CI strongly suggests negative or null"),
    ]
    print("Confidence Calibration Guide")
    print("=" * 60)
    print(f"{'Verbal Expression':<20} {'Probability':<12} {'When to Use'}")
    print("-" * 60)
    for verbal, prob, when in calibration:
        print(f"{verbal:<20} {prob:<12} {when}")

confidence_calibration_guide()
Presenting Options for Risk-Dependent Decisions
When Uncertainty Affects the Decision
Sometimes the right answer depends on risk tolerance. In these cases, present options clearly:
## Recommendation: Choose Based on Risk Tolerance
### The Situation
- Observed lift: 2.8%
- 95% CI: -0.5% to 6.1%
- Our threshold was 2% lift
### Options
**Option A: Ship Now**
- Upside: Capture likely 2-3% improvement immediately
- Risk: ~30% chance the true effect is below our threshold
- Best if: Speed matters, feature has other benefits
**Option B: Extend Experiment 2 Weeks**
- Upside: Conclusive answer (narrower CI)
- Risk: 2 weeks delay; might still be inconclusive
- Best if: Decision is reversible, no urgency
**Option C: Partial Rollout (50%)**
- Upside: Capture some benefit while gathering more data
- Risk: Complexity; smaller sample than full rollout
- Best if: Want to hedge bets
### My Recommendation
Option A—ship now. The likely outcome is positive, the downside
risk is limited (worst case: small positive or neutral), and we
can monitor post-launch.
Decision Matrix for Stakeholders
def create_decision_matrix(options):
    """
    Create a simple decision matrix for stakeholders.
    """
    print("Decision Matrix")
    print("=" * 70)
    print(f"{'Option':<15} {'Best Outcome':<18} {'Worst Outcome':<18} {'Recommended If'}")
    print("-" * 70)
    for opt in options:
        print(f"{opt['name']:<15} {opt['best']:<18} {opt['worst']:<18} {opt['recommended_if']}")

options = [
    {
        'name': "Ship now",
        'best': "+6% lift",
        'worst': "~0% lift",
        'recommended_if': "Speed matters"
    },
    {
        'name': "Wait 2 weeks",
        'best': "Clear answer",
        'worst': "Still uncertain",
        'recommended_if': "Need certainty"
    },
    {
        'name': "50% rollout",
        'best': "Some benefit + data",
        'worst': "Complexity",
        'recommended_if': "Hedging bets"
    }
]

create_decision_matrix(options)
Common Scenarios and Scripts
Scenario 1: Positive but Wide CI
Situation: 4% lift, CI: 0.5% to 7.5%
Script:
"The new feature is working—we're confident it's positive. Our best estimate is a 4% improvement, and even in the conservative case, we're looking at at least half a percent. I recommend shipping.
The range is wide because [sample size / high variance / short duration], so if precision matters more than speed, we could run another week. But the directional evidence is clear: ship."
Scenario 2: Inconclusive Results
Situation: 1.5% lift, CI: -2% to 5%
Script:
"Honest answer: the data doesn't give us a clear signal. The effect could be positive, negative, or zero—all are consistent with what we observed.
We have three options:
- Run two more weeks for a conclusive answer
- Ship based on qualitative factors (user feedback, strategic fit)
- Kill the feature and move on
My recommendation is [X] because [rationale]. But this is a judgment call that depends on how much we value certainty vs. speed."
Scenario 3: Surprising Negative Result
Situation: -3% conversion, CI: -5.5% to -0.5%
Script:
"This isn't what we expected, but the data is clear: the feature hurt conversion. We're confident the effect is negative—likely around 3% worse.
Before we kill it, let's check:
- Is the implementation correct? (Rule out bugs)
- Are there segments where it worked? (Exploratory)
- What's the qualitative feedback?
If the implementation is correct and there's no segment story, I recommend rolling back. We learned something valuable."
Scenario 4: Below MDE but Positive
Situation: 1.2% lift, CI: 0.3% to 2.1%, MDE was 2%
Script:
"Good news and bad news. Good: the feature works—we're confident it's positive. Bad: the effect is smaller than we hoped. Our threshold was 2% lift; we're seeing about 1%.
Question for the group: Is a 1% lift worth the engineering cost to maintain this? That's a business judgment, not a statistical one.
If 1% is valuable, ship. If we need 2% to justify the complexity, either iterate on the feature or move on."
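If you deliver these readouts regularly, a rough classifier can route a result to the right script. This is a heuristic sketch (the classify_scenario function and its logic are illustrative, not a rule), and it doesn't replace judgment:

def classify_scenario(point_estimate, ci_lower, ci_upper, mde):
    """
    Map a result to one of the four scenario scripts above.
    CI bounds and MDE are expressed as fractions (0.02 = 2%).
    """
    if ci_upper < 0:
        return "Scenario 3: surprising negative result"
    if ci_lower <= 0:
        return "Scenario 2: inconclusive results"
    if point_estimate < mde:
        return "Scenario 4: below MDE but positive"
    return "Scenario 1: positive but wide CI"

# Examples (matching the four situations above)
print(classify_scenario(0.040, 0.005, 0.075, mde=0.02))     # Scenario 1
print(classify_scenario(0.015, -0.020, 0.050, mde=0.02))    # Scenario 2
print(classify_scenario(-0.030, -0.055, -0.005, mde=0.02))  # Scenario 3
print(classify_scenario(0.012, 0.003, 0.021, mde=0.02))     # Scenario 4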
Visual Communication
The Uncertainty Visualization
import matplotlib.pyplot as plt

def visualize_for_executives(point_estimate, ci_lower, ci_upper,
                             threshold=None, metric_name="Conversion"):
    """
    Create an executive-friendly visualization of uncertainty.
    """
    fig, ax = plt.subplots(figsize=(10, 3))
    # Plot the CI as a horizontal bar
    ax.barh(0, ci_upper - ci_lower, left=ci_lower, height=0.4,
            color='steelblue', alpha=0.3, label='Plausible range')
    # Plot the point estimate
    ax.scatter(point_estimate, 0, color='steelblue', s=200, zorder=5,
               label=f'Best estimate: {point_estimate:.1%}')
    # Add threshold line if provided
    if threshold is not None:
        ax.axvline(x=threshold, color='green', linestyle='--',
                   linewidth=2, label=f'Target: {threshold:.1%}')
    # Add zero line
    ax.axvline(x=0, color='gray', linestyle='-', linewidth=1)
    # Annotations
    ax.annotate(f'Worst case\n{ci_lower:.1%}', xy=(ci_lower, 0),
                xytext=(ci_lower, -0.3), ha='center', fontsize=10)
    ax.annotate(f'Best case\n{ci_upper:.1%}', xy=(ci_upper, 0),
                xytext=(ci_upper, -0.3), ha='center', fontsize=10)
    ax.set_xlim(min(ci_lower - 0.02, -0.01), ci_upper + 0.02)
    ax.set_ylim(-0.5, 0.5)
    ax.set_xlabel(f'{metric_name} Lift')
    ax.set_yticks([])
    ax.legend(loc='upper right')
    ax.set_title('Experiment Result: Plausible Range of Effects', fontsize=12)
    plt.tight_layout()
    return fig

# Example
# visualize_for_executives(0.032, 0.012, 0.054, threshold=0.02)
# plt.show()
The One-Slide Summary
┌────────────────────────────────────────────────────────────┐
│ EXPERIMENT: New Checkout Flow │
│ ────────────────────────────────────────────────────── │
│ │
│ RECOMMENDATION: 🟢 SHIP │
│ │
│ WHAT WE FOUND │
│ ┌────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Conversion improved by approximately 3% │ │
│ │ Could be as low as 1%, as high as 5% │ │
│ │ │ │
│ │ [=====|===========|=====] │ │
│ │ 1% 3% 5% │ │
│ │ worst likely best │ │
│ │ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ CONFIDENCE: High that effect is positive │
│ Moderate that it exceeds 2% target │
│ │
│ IF WE'RE WRONG: Worst realistic case is small positive, │
│ not negative. Limited downside. │
│ │
│ KEY CAVEAT: Effect concentrated on desktop; mobile flat │
│ │
└────────────────────────────────────────────────────────────┘
Phrases to Use and Avoid
Use These
| Situation | Phrase |
|---|---|
| High confidence | "We're confident that..." |
| Quantified range | "Most likely between X and Y" |
| Worst case | "Even in the pessimistic scenario..." |
| Uncertainty with recommendation | "Despite the uncertainty, I recommend..." |
| Bounded downside | "If we're wrong, the worst outcome is..." |
| Clear options | "There are two paths: A or B" |
Avoid These
| Avoid | Why | Use Instead |
|---|---|---|
| "Statistically significant" | Jargon | "We're confident" |
| "Not significant" | Sounds like no effect | "The data is inconclusive" |
| "P-value of 0.03" | Nobody cares | "3% chance this is a fluke" |
| "I'm not sure" | Undermines credibility | "The uncertainty is..." |
| "Maybe" | Unquantified | "About 60% likely" |
| "Possibly" | Hedge | "Could range from X to Y" |
R Implementation
# Function to generate executive summary
executive_summary <- function(
  point_estimate,
  ci_lower,
  ci_upper,
  threshold = NULL,
  metric_name = "conversion"
) {
  # Determine confidence level
  if (ci_lower > 0) {
    direction_confidence <- "confident the effect is positive"
  } else if (ci_upper < 0) {
    direction_confidence <- "confident the effect is negative"
  } else {
    direction_confidence <- "uncertain about the direction"
  }
  # Generate summary
  cat("EXECUTIVE SUMMARY\n")
  cat(paste(rep("=", 50), collapse = ""), "\n\n")
  cat("FINDING:\n")
  cat(sprintf("  %s improved by approximately %.1f%%\n",
              metric_name, point_estimate * 100))
  cat(sprintf("  Plausible range: %.1f%% to %.1f%%\n\n",
              ci_lower * 100, ci_upper * 100))
  cat("CONFIDENCE:\n")
  cat(sprintf("  We're %s\n\n", direction_confidence))
  if (!is.null(threshold)) {
    if (ci_lower > threshold) {
      threshold_msg <- "confidently exceeds target"
    } else if (point_estimate > threshold) {
      threshold_msg <- "likely exceeds target, but not certain"
    } else {
      threshold_msg <- "unlikely to exceed target"
    }
    cat(sprintf("  Effect %s (%.1f%%)\n\n", threshold_msg, threshold * 100))
  }
  cat("WORST CASE:\n")
  cat(sprintf("  Even pessimistically: %.1f%%\n", ci_lower * 100))
}

# Example usage
executive_summary(
  point_estimate = 0.032,
  ci_lower = 0.012,
  ci_upper = 0.054,
  threshold = 0.02,
  metric_name = "Conversion"
)
Related Articles
- Analytics Reporting (Pillar) - Complete reporting guide
- One-Slide Experiment Readout - Presentation template
- When to Say Inconclusive - Handling unclear results
- Effect Sizes for Proportions - Understanding magnitude
Key Takeaway
Communicating uncertainty to executives isn't about hedging—it's about framing. Lead with your recommendation (they're paying you for judgment, not just analysis). Translate statistical concepts to business language ("could be between 2% and 6%"). Quantify uncertainty instead of hand-waving ("about 70% confident" beats "probably"). Present clear options when decisions depend on risk tolerance. And always include what you'd recommend given the uncertainty. Executives don't need to understand p-values; they need to make good decisions with incomplete information. Help them do that by being clear about what you know, what you don't, and what you'd do.
Frequently Asked Questions
What if execs just want a yes or no answer?
Give them one. Lead with your recommendation, then attach the quantified risk: "Ship it; there's roughly a 20% chance the lift falls short of target, and the worst realistic case is a small positive effect."
How do I explain confidence intervals without jargon?
Describe a range of realistic outcomes: "most likely around 3%, but could realistically be anywhere from 1% to 5%," or give best case, worst case, and most likely scenarios.
What if stakeholders lose confidence when I express uncertainty?
That usually means the uncertainty was hedged rather than quantified. Replace "maybe" and "we can't be sure" with numbers, ranges, and options, and always pair the uncertainty with a recommendation.