When to Say 'Inconclusive': Decision Rules That Build Trust
Knowing when to call an experiment inconclusive is a skill. Learn decision frameworks for ambiguous results that maintain credibility and enable good business decisions.
Quick Hits
- Inconclusive isn't failure—it's an honest answer when the data doesn't speak clearly
- The CI includes both meaningful positive and meaningful negative? That's inconclusive.
- Don't spin inconclusive as positive; don't force a narrative the data doesn't support
- Options for inconclusive: extend, ship with risk acknowledgment, or abandon
- Saying 'I don't know' when appropriate builds more trust than false certainty
TL;DR
Inconclusive is a legitimate conclusion, not a failure. Call results inconclusive when the confidence interval includes both meaningful positive and negative effects—the data genuinely doesn't tell you which way it goes. Don't spin inconclusive as positive ("directionally promising!") or dismiss it as negative ("probably doesn't work"). Present options: extend the experiment, ship accepting uncertainty, or abandon. Honest inconclusive calls build trust; forced narratives destroy it.
What "Inconclusive" Actually Means
The Definition
Inconclusive = The data is consistent with multiple importantly different outcomes.
                        ← Negative →     ← No Effect →     ← Positive →

Conclusive (Negative):   |---CI---|
Conclusive (Positive):                                      |---CI---|
Inconclusive:                     |----------CI----------|
                                             ↑
                                            Zero
The Key Distinction
| Result Type | Confidence Interval | What You Know |
|---|---|---|
| Conclusive positive | Entirely above zero | Effect is positive (at your confidence level) |
| Conclusive negative | Entirely below zero | Effect is negative (at your confidence level) |
| Conclusive null | Narrow, centered on zero | Effect is small or zero |
| Inconclusive | Wide, includes zero | Could be positive, negative, or zero |
When to Call It Inconclusive
Decision Framework
def classify_result(ci_lower, ci_upper, mde, zero=0):
    """
    Classify experiment result as conclusive or inconclusive.

    Parameters:
    -----------
    ci_lower : float
        Lower bound of confidence interval
    ci_upper : float
        Upper bound of confidence interval
    mde : float
        Minimum detectable/important effect
    zero : float
        The null hypothesis value (usually 0)
    """
    # Conclusive positive: CI entirely above zero
    if ci_lower > zero:
        if ci_lower > mde:
            return "CONCLUSIVE: Strong positive (CI above MDE)"
        else:
            return "CONCLUSIVE: Positive (CI above zero but includes below MDE)"

    # Conclusive negative: CI entirely below zero
    if ci_upper < zero:
        return "CONCLUSIVE: Negative (CI entirely below zero)"

    # Now CI includes zero - is it inconclusive or conclusive null?
    ci_width = ci_upper - ci_lower

    # If CI is narrow and centered near zero, it's a conclusive null
    if ci_width < mde and abs(ci_lower + ci_upper) / 2 < mde / 2:
        return "CONCLUSIVE: No meaningful effect (narrow CI around zero)"

    # Otherwise, it's inconclusive
    if ci_lower < -mde / 2 and ci_upper > mde / 2:
        return "INCONCLUSIVE: CI includes both meaningful positive and negative"
    elif ci_lower < zero < ci_upper:
        return "INCONCLUSIVE: CI includes zero; direction uncertain"
    else:
        return "BORDERLINE: Close to conclusive but not quite"


# Examples
print("Result Classification Examples")
print("=" * 60)

examples = [
    ("Clear positive", 0.02, 0.06, 0.02),
    ("Clear negative", -0.05, -0.01, 0.02),
    ("Inconclusive (wide)", -0.02, 0.04, 0.02),
    ("Inconclusive (includes zero)", -0.01, 0.03, 0.02),
    ("Conclusive null (narrow)", -0.005, 0.005, 0.02),
    ("Positive but below MDE", 0.005, 0.015, 0.02),
]

for name, ci_l, ci_u, mde in examples:
    result = classify_result(ci_l, ci_u, mde)
    print(f"\n{name}:")
    print(f"  CI: [{ci_l:+.1%}, {ci_u:+.1%}], MDE: {mde:.1%}")
    print(f"  → {result}")
The Critical Questions
Ask yourself:
1. Does the CI include both positive and negative effects I'd care about?
   - If yes → likely inconclusive
2. Is the CI narrow enough to rule out meaningful effects?
   - If narrow and near zero → conclusive null
   - If wide → inconclusive
3. Could I confidently recommend action based on this result?
   - If no → probably inconclusive
What NOT to Do with Inconclusive Results
Don't Spin Positive
The Temptation: "While not statistically significant, the results are directionally positive, suggesting..."
Why It's Wrong: If the CI includes zero and negative values, the data is also consistent with a negative effect. "Directionally positive" implies more certainty than exists.
Better: "The point estimate is positive, but the confidence interval includes both positive and negative effects. We cannot conclude the treatment helped."
Don't Spin Negative
The Temptation: "The experiment showed no significant effect, indicating the treatment doesn't work."
Why It's Wrong: "Not significant" with a wide CI doesn't mean "no effect." It means "we don't know."
Better: "We did not detect a significant effect. However, the confidence interval is wide enough that we also cannot rule out a meaningful positive effect."
Don't Cherry-Pick
The Temptation: "Overall results were inconclusive, but mobile users showed a significant improvement!"
Why It's Wrong: Post-hoc segment mining after inconclusive overall results is classic p-hacking.
Better: Report the overall inconclusive result as primary. Mention the segment finding as exploratory, requiring replication.
How to Present Inconclusive Results
The Honest Framework
## Result: Inconclusive
### What We Observed
- Point estimate: +2.3% lift
- 95% CI: -1.5% to +6.1%
- p-value: 0.23
### What This Means
The confidence interval includes:
- Zero (no effect)
- Values up to our MDE of 2% (small positive effect)
- Values above our MDE (meaningful positive effect)
- Negative values (potential harm)
**In plain language**: The data is consistent with the treatment helping, hurting, or doing nothing. We cannot tell which.
### Why This Happened
We achieved 62% of planned sample size due to the feature freeze.
At current sample size, we have ~45% power to detect our MDE.
### Options
**Option A: Extend experiment 3 weeks**
- Pro: Likely conclusive result
- Con: 3-week delay
- Probability of each outcome (estimated):
- Conclusive positive: ~35%
- Conclusive negative: ~15%
- Still inconclusive: ~50%
**Option B: Ship now**
- Pro: No delay, capture possible upside
- Con: ~25% probability effect is actually negative
- Risk: Limited; worst case appears to be small harm
**Option C: Abandon and reallocate**
- Pro: Free up resources immediately
- Con: Miss potential positive effect (35% chance it's real)
### My Recommendation
[Option A/B/C] because [reasoning based on business context]
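The power figure and the outcome probabilities quoted in the template above don't have to be hand-waved. Here is a minimal sketch of one way to back them out, assuming the effect estimate is approximately normal, recovering the standard error from the reported CI, and treating the current point estimate as the true effect for the extension simulation. The weeks_so_far value and the linear-traffic assumption are illustrative, and the output will not reproduce the template's rounded figures exactly.

from scipy import stats
import numpy as np

def achieved_power(ci_lower, ci_upper, mde, alpha=0.05):
    """Approximate power to detect an effect of size `mde`, given the
    standard error implied by the reported confidence interval."""
    z = stats.norm.ppf(1 - alpha / 2)
    se = (ci_upper - ci_lower) / (2 * z)          # SE recovered from CI width
    # Two-sided power under a true effect equal to the MDE
    return stats.norm.cdf(mde / se - z) + stats.norm.cdf(-mde / se - z)

def extension_outcomes(point_est, ci_lower, ci_upper, extra_weeks=3,
                       weeks_so_far=5, alpha=0.05, n_sims=100_000, seed=1):
    """Rough Monte Carlo: treat the current point estimate as the true effect,
    shrink the SE as if traffic accrues linearly with time, and count how
    often the extended experiment ends up conclusive."""
    rng = np.random.default_rng(seed)
    z = stats.norm.ppf(1 - alpha / 2)
    se_now = (ci_upper - ci_lower) / (2 * z)
    se_ext = se_now * np.sqrt(weeks_so_far / (weeks_so_far + extra_weeks))
    estimates = rng.normal(point_est, se_ext, n_sims)   # simulated final estimates
    lower, upper = estimates - z * se_ext, estimates + z * se_ext
    return {
        "conclusive_positive": np.mean(lower > 0),
        "conclusive_negative": np.mean(upper < 0),
        "still_inconclusive": np.mean((lower <= 0) & (upper >= 0)),
    }

# Using the numbers from the template above (weeks_so_far is an assumption)
print(f"Power vs. 2% MDE at current sample: {achieved_power(-0.015, 0.061, 0.02):.0%}")
print(extension_outcomes(0.023, -0.015, 0.061, extra_weeks=3))

Either number is a planning estimate, not a guarantee; the point is to give stakeholders explicit odds for Option A instead of a gut feel.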
Visualizing Inconclusive Results
import matplotlib.pyplot as plt


def visualize_inconclusive(point_est, ci_lower, ci_upper, mde):
    """
    Create a clear visualization of an inconclusive result.
    """
    fig, ax = plt.subplots(figsize=(10, 4))

    # Reference lines
    ax.axvline(x=0, color='gray', linestyle='-', linewidth=2, label='No effect')
    ax.axvline(x=mde, color='green', linestyle='--', linewidth=2, label=f'MDE (+{mde:.1%})')
    ax.axvline(x=-mde, color='red', linestyle='--', linewidth=2, label=f'MDE (-{mde:.1%})')

    # Confidence interval
    ax.barh(0, ci_upper - ci_lower, left=ci_lower, height=0.3,
            color='steelblue', alpha=0.4, label='95% CI')

    # Point estimate
    ax.scatter(point_est, 0, color='steelblue', s=150, zorder=5,
               label=f'Point estimate ({point_est:+.1%})')

    # Zone annotations (harm zone sits beyond -MDE, help zone beyond +MDE)
    ax.annotate('Harm zone', xy=(-mde * 1.5, 0.25), ha='center', fontsize=10, color='red')
    ax.annotate('Help zone', xy=(mde * 1.5, 0.25), ha='center', fontsize=10, color='green')
    ax.annotate('Uncertain zone', xy=(0, 0.25), ha='center', fontsize=10, color='gray')

    # Formatting
    ax.set_xlim(-0.08, 0.10)
    ax.set_ylim(-0.5, 0.5)
    ax.set_xlabel('Effect Size')
    ax.set_yticks([])
    ax.legend(loc='upper right', fontsize=9)
    ax.set_title('Inconclusive Result: CI Spans Multiple Outcome Zones', fontsize=12)

    plt.tight_layout()
    return fig


# Example
# visualize_inconclusive(0.023, -0.015, 0.061, 0.02)
# plt.show()
Decision Rules for Stakeholders
Pre-Specified Decision Framework
Define this before the experiment runs:
## Pre-Specified Decision Rules
### Conclusive Positive
**Condition**: CI lower bound > 0
**Action**: Ship to 100%
### Strong Positive
**Condition**: CI lower bound > MDE
**Action**: Ship with high confidence
### Conclusive Negative
**Condition**: CI upper bound < 0
**Action**: Roll back immediately
### Conclusive Null
**Condition**: CI is [-1%, +1%] (narrow, centered on zero)
**Action**: No meaningful effect; decide based on other factors
### Inconclusive
**Condition**: CI includes zero AND meaningful positive/negative
**Action**: Choose from options:
- Extend if: High stakes decision, time available
- Ship if: Low risk, directionally positive
- Abandon if: Opportunity cost high, signal weak
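If you want these rules applied mechanically at readout time, a minimal sketch is below. The conditions and actions are lifted from the rules above; the ±1% null band, the function name, and the return strings are illustrative.

def prespecified_action(ci_lower, ci_upper, mde, null_band=0.01):
    """Map a confidence interval to the pre-specified action above."""
    if ci_lower > mde:
        return "Strong positive: ship with high confidence"
    if ci_lower > 0:
        return "Conclusive positive: ship to 100%"
    if ci_upper < 0:
        return "Conclusive negative: roll back immediately"
    if ci_lower > -null_band and ci_upper < null_band:
        return "Conclusive null: decide based on other factors"
    return "Inconclusive: choose extend / ship / abandon based on context"

print(prespecified_action(-0.015, 0.061, mde=0.02))   # -> Inconclusive

Writing this down before launch is the point: the readout becomes a lookup, not a negotiation.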
The Risk-Based Decision Matrix
def inconclusive_decision_matrix(ci_lower, ci_upper, mde,
                                 business_stakes, time_pressure):
    """
    Recommend action for inconclusive result based on context.
    """
    # Calculate probability estimates (rough)
    ci_width = ci_upper - ci_lower
    point_est = (ci_lower + ci_upper) / 2

    # Probability effect is positive (rough approximation)
    prob_positive = max(0, min(1, (ci_upper / ci_width)))

    # Probability effect is meaningfully positive (> MDE)
    prob_meaningful = max(0, min(1, (ci_upper - mde) / ci_width)) if ci_upper > mde else 0

    # Decision logic
    if business_stakes == "high" and time_pressure == "low":
        recommendation = "EXTEND"
        rationale = "High stakes justify waiting for clarity"
    elif business_stakes == "low" and prob_positive > 0.6:
        recommendation = "SHIP"
        rationale = f"Low stakes + {prob_positive:.0%} chance of positive effect"
    elif prob_meaningful < 0.2 and point_est < mde / 2:
        recommendation = "ABANDON"
        rationale = f"Only {prob_meaningful:.0%} chance of meaningful effect"
    elif time_pressure == "high" and point_est > 0:
        recommendation = "SHIP (with monitoring)"
        rationale = "Time pressure + directionally positive"
    else:
        recommendation = "EXTEND or ABANDON"
        rationale = "Judgment call based on opportunity cost"

    return {
        'recommendation': recommendation,
        'rationale': rationale,
        'prob_positive': prob_positive,
        'prob_meaningful': prob_meaningful
    }


# Example
result = inconclusive_decision_matrix(
    ci_lower=-0.015,
    ci_upper=0.061,
    mde=0.02,
    business_stakes="medium",
    time_pressure="low"
)

print("Inconclusive Result Decision")
print("=" * 40)
print(f"Recommendation: {result['recommendation']}")
print(f"Rationale: {result['rationale']}")
print(f"P(positive): {result['prob_positive']:.0%}")
print(f"P(meaningful): {result['prob_meaningful']:.0%}")
Common Scenarios
Scenario 1: Wide CI, Directionally Positive
Result: +3.1% lift, CI: -2.0% to +8.2%, p = 0.23
Wrong approach: "While not significant, results are promising..."
Right approach:
## Inconclusive: Wide CI Prevents Conclusion
The point estimate (+3.1%) is positive, but the confidence interval
(-2.0% to +8.2%) is too wide to draw conclusions.
**The data is consistent with**:
- A meaningful positive effect (+8%)
- No effect (0%)
- A small negative effect (-2%)
**Options**:
1. Extend 2 weeks to narrow the CI
2. Ship accepting the ~20% chance of small negative effect
3. Abandon if opportunity cost is high
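Where does the "~20% chance of a small negative effect" come from? Two rough ways to back it out from the interval are sketched below. The fraction-of-width heuristic mirrors the decision-matrix code earlier; the normal version assumes the estimate is approximately normal with the SE implied by the CI. Both are approximations.

from scipy import stats

point_est, ci_lower, ci_upper = 0.031, -0.020, 0.082

# Heuristic: fraction of the CI's width that lies below zero
prob_neg_width = (0 - ci_lower) / (ci_upper - ci_lower)

# Normal approximation: recover the SE from the 95% CI, then take the
# tail probability below zero, centered on the point estimate
se = (ci_upper - ci_lower) / (2 * 1.96)
prob_neg_normal = stats.norm.cdf((0 - point_est) / se)

print(f"Fraction of CI below zero: {prob_neg_width:.0%}")    # ≈ 20%
print(f"Normal-approximation tail: {prob_neg_normal:.0%}")   # ≈ 12%

Either way, quote it as an order-of-magnitude risk, not a precise probability.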
Scenario 2: Near-Significant, Below MDE
Result: +1.8% lift, CI: +0.1% to +3.5%, p = 0.04
Wrong approach: "Significant positive effect!"
Right approach:
## Conclusive Positive, But Below MDE
The effect is statistically significant (CI excludes zero), but small.
Our pre-specified MDE was 2%; the point estimate (1.8%) is below this.
**What we know**:
- Effect is positive (CI entirely above zero)
- Effect is likely 0.1% to 3.5%
- Point estimate (1.8%) is below our target
**Decision**: Do we ship a feature with ~1.8% lift?
This is a business question, not a statistics question.
The data says the effect is real but probably small.
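It can help that business conversation to attach a rough probability that the true effect actually clears the 2% MDE. A minimal sketch under the same normal approximation used above (SE recovered from the CI, estimate treated as approximately normal):

from scipy import stats

point_est, ci_lower, ci_upper, mde = 0.018, 0.001, 0.035, 0.02

se = (ci_upper - ci_lower) / (2 * 1.96)              # SE implied by the 95% CI
prob_above_zero = 1 - stats.norm.cdf((0 - point_est) / se)
prob_above_mde = 1 - stats.norm.cdf((mde - point_est) / se)

print(f"P(effect > 0):   {prob_above_zero:.0%}")     # ≈ 98%
print(f"P(effect > MDE): {prob_above_mde:.0%}")      # ≈ 41%

In other words, the effect is almost certainly positive, but the odds that it reaches the target are less than even.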
Scenario 3: Underpowered, High Variance Metric
Result: +$0.45/user, CI: -$1.20 to +$2.10, p = 0.58
Wrong approach: "Revenue was not significantly affected."
Right approach:
## Inconclusive: Insufficient Precision for Revenue
We could not detect a significant revenue effect. However, the wide
CI (-$1.20 to +$2.10) means we also cannot rule out meaningful
positive or negative effects.
**Why so uncertain?**
Revenue is high-variance. Our sample size provided adequate power
for conversion (detected +2.1% lift) but not for revenue.
**Options**:
1. Rely on conversion result (significant positive) for decision
2. Run longer specifically to measure revenue impact
3. Accept revenue uncertainty; monitor post-launch
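Option 2 above ("run longer specifically to measure revenue impact") is worth quantifying, because the required extension is usually larger than people expect. A minimal sketch, assuming a normal approximation and that the effect you want to power for is the observed +$0.45 at 80% power:

from scipy import stats

# Reported revenue result: +$0.45/user, 95% CI [-$1.20, +$2.10]
point_est, ci_lower, ci_upper = 0.45, -1.20, 2.10

se_now = (ci_upper - ci_lower) / (2 * 1.96)      # SE implied by the current CI
z_alpha = stats.norm.ppf(0.975)                  # two-sided test, alpha = 0.05
z_power = stats.norm.ppf(0.80)                   # target 80% power
se_needed = point_est / (z_alpha + z_power)      # SE required to detect $0.45

# SE shrinks with sqrt(n), so the sample multiplier is the squared SE ratio
multiplier = (se_now / se_needed) ** 2
print(f"Current SE: ${se_now:.2f}, SE needed: ${se_needed:.2f}")
print(f"≈ {multiplier:.0f}× the current sample to power the revenue metric")

A multiplier that size is usually the argument for option 1 or option 3 rather than simply running longer.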
Building Trust Through Honest Inconclusive Calls
Why Honesty Pays Off
| Short-Term | Long-Term |
|---|---|
| "Results inconclusive" feels disappointing | "This analyst tells the truth" |
| Stakeholders wanted a clear answer | Stakeholders trust your analysis |
| Pressure to spin positive | No cleanup after over-promising |
| One ambiguous experiment | Reputation for integrity |
The Credibility Flywheel
Tell the truth about inconclusive results
↓
Stakeholders learn you're honest
↓
Future positive results are believed
↓
Your recommendations carry weight
↓
You're asked for input on important decisions
↓
You tell the truth about inconclusive results
↓
(credibility compounds)
R Implementation
# Function to classify and present inconclusive results
present_result <- function(point_est, ci_lower, ci_upper, mde) {
  # Classify
  if (ci_lower > 0) {
    classification <- "CONCLUSIVE POSITIVE"
  } else if (ci_upper < 0) {
    classification <- "CONCLUSIVE NEGATIVE"
  } else if (ci_upper - ci_lower < mde && abs(point_est) < mde / 2) {
    classification <- "CONCLUSIVE NULL"
  } else {
    classification <- "INCONCLUSIVE"
  }

  # Present
  cat("Result Classification:", classification, "\n")
  cat(paste(rep("=", 50), collapse = ""), "\n\n")
  cat("Point estimate:", sprintf("%.1f%%\n", point_est * 100))
  cat("95% CI: [", sprintf("%.1f%%", ci_lower * 100), ", ",
      sprintf("%.1f%%", ci_upper * 100), "]\n\n", sep = "")

  if (classification == "INCONCLUSIVE") {
    cat("The confidence interval includes:\n")
    if (ci_lower < 0) cat("  - Negative effects (potential harm)\n")
    cat("  - Zero (no effect)\n")
    if (ci_upper > mde) cat("  - Effects above MDE (meaningful help)\n")
    cat("\nThis result cannot distinguish between help, harm, or no effect.\n")
    cat("\nOptions:\n")
    cat("  1. Extend experiment for clarity\n")
    cat("  2. Ship accepting uncertainty\n")
    cat("  3. Abandon and reallocate\n")
  }
}

# Example
present_result(
  point_est = 0.023,
  ci_lower = -0.015,
  ci_upper = 0.061,
  mde = 0.02
)
Related Articles
- Analytics Reporting (Pillar) - Complete reporting guide
- Communicate Uncertainty to Execs - Stakeholder communication
- P-Value vs Confidence Interval - Interpretation guide
- Pre-Registration Lite - Decision rules
Frequently Asked Questions
Won't calling results 'inconclusive' make me look bad?
How do I present inconclusive results without losing momentum?
At what point is 'not significant' actually 'no effect'?
Key Takeaway
Inconclusive is a valid, valuable conclusion—not a failure. Call results inconclusive when the confidence interval includes both meaningful positive and meaningful negative effects. Don't spin it as positive ('directionally good!') or negative ('probably doesn't work'). Present it honestly: 'We can't rule out no effect, but we also can't rule out a positive effect. Here are our options.' This builds trust because stakeholders learn you'll tell them the truth, not what they want to hear. Over time, that credibility is worth more than any single inflated finding.