How to Write a Methods Section for Internal Docs That's Actually Copy-Ready
Templates and examples for writing clear, reproducible methods sections. Document your analysis so future you (and your colleagues) can understand and replicate it.
Quick Hits
- Write methods so someone can reproduce your analysis without asking you questions
- Include: data source, filters, metric definitions, statistical test, sample size
- Specify software versions and link to code when possible
- Document deviations from the pre-analysis plan and why
- Future you will thank present you: write it when it's fresh
TL;DR
A methods section should enable reproduction without clarifying questions. Include: data source and extraction date, inclusion/exclusion criteria, metric definitions, statistical test and parameters, sample sizes, and deviations from plan. Write it while details are fresh. Future you will forget which users you excluded and why—document it now.
Why Methods Documentation Matters
The Scenario You Want to Avoid
6 months later...
Stakeholder: "Can you re-run last year's checkout experiment analysis?"
You: *Opens old doc*
Old doc: "We analyzed conversion rate and found a 3.2% lift (p < 0.05)"
You: "Which users did I include? What date range? Did I filter bots?
What did I count as a conversion? Why is the sample size
different from what I remember?"
*Spends 2 days reverse-engineering own analysis*
What Good Documentation Provides
| Without Documentation | With Documentation |
|---|---|
| "We found a 3.2% lift" | Full reproducibility |
| Reverse-engineering required | Open doc, run query |
| Inconsistent re-analyses | Identical reproductions |
| Trust erosion over time | Auditable decisions |
The Methods Section Template
Complete Template
## Methods
### Data Source
- **Database/Table**: [e.g., analytics.experiment_assignments]
- **Extraction Date**: [e.g., 2026-01-28]
- **Date Range Analyzed**: [e.g., Jan 15-28, 2026]
- **Query/Code Reference**: [link to SQL/notebook]
### Sample Definition
- **Unit of Analysis**: [e.g., unique users, sessions, transactions]
- **Population**: [e.g., all users visiting the homepage]
- **Inclusion Criteria**:
- [Criterion 1]
- [Criterion 2]
- **Exclusion Criteria**:
- [Criterion 1 with rationale]
- [Criterion 2 with rationale]
- **Final Sample Size**: [n per group]
### Metric Definitions
- **Primary Metric**: [name]
- **Numerator**: [exact definition]
- **Denominator**: [exact definition]
- **Formula**: [explicit formula]
### Statistical Analysis
- **Test**: [e.g., two-sample proportion z-test]
- **Significance Level**: [e.g., α = 0.05, two-tailed]
- **Software**: [e.g., Python 3.9, scipy 1.7.3]
- **Assumptions Checked**: [list]
### Deviations from Pre-Analysis Plan
- [Any changes, with rationale]
### Limitations
- [Known issues or caveats]
Section-by-Section Guidance
Data Source
Bad Example:
Data was pulled from our analytics database.
Good Example:
### Data Source
- **Database**: analytics_prod.experiment_exposures
- **Extraction Date**: 2026-01-28 09:15 UTC
- **Date Range**: Users exposed Jan 15-28, 2026
- **Query**: experiments/checkout_v2/analysis.sql (commit abc123)
- **Data Freshness**: Production tables, 24-hour lag
Sample Definition
```python
def document_sample_definition():
    """
    Example of thorough sample documentation.
    """
    sample_docs = """
### Sample Definition
**Unit of Analysis**: User (deduplicated by user_id)
**Population**: All logged-in users who visited the checkout page
during the experiment period
**Inclusion Criteria**:
- Exposed to experiment (has assignment record in experiment_exposures)
- Visited checkout page at least once during exposure window
- Has valid user_id (not null, not test account)
**Exclusion Criteria**:
- Bot traffic: user_agent matches known bot patterns (see bot_filter.py)
  Rationale: Bots don't convert, inflate denominator artificially
  Excluded: 2,341 users (1.8%)
- Internal employees: email domain = @company.com
  Rationale: Different behavior than real users
  Excluded: 156 users (0.1%)
- Users with >100 sessions/day: Anomalous behavior
  Rationale: Likely automated or testing accounts
  Excluded: 23 users (<0.1%)
**Final Sample**:
- Control: 64,521 users
- Treatment: 64,892 users
- Total exclusions: 2,520 (1.9%)
"""
    print(sample_docs)

document_sample_definition()
```
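The exclusion counts above are easiest to keep accurate when the filtering code reports them itself. Below is a minimal pandas sketch of that step; the column names (`user_agent`, `email`, `sessions_per_day`) and the inline bot pattern standing in for bot_filter.py are assumptions, not the production schema.

```python
import pandas as pd

def apply_exclusions(users: pd.DataFrame) -> pd.DataFrame:
    """Apply exclusion criteria in order and print the counts for the methods doc.

    Assumes hypothetical columns: user_id, user_agent, email, sessions_per_day.
    """
    total = len(users)
    kept = users
    rules = [
        # Stand-in pattern for bot_filter.py's known bot signatures
        ("bot traffic", lambda df: ~df["user_agent"].str.contains(r"bot|crawler|spider", case=False, na=False)),
        ("internal employees", lambda df: ~df["email"].str.endswith("@company.com", na=False)),
        ("heavy usage (>100 sessions/day)", lambda df: df["sessions_per_day"] <= 100),
    ]
    for name, rule in rules:
        mask = rule(kept)
        excluded = int((~mask).sum())
        print(f"{name}: excluded {excluded:,} users ({excluded / total:.1%})")
        kept = kept[mask]
    print(f"Final sample: {len(kept):,} of {total:,} users")
    return kept

# Tiny illustrative frame; the real analysis runs on the extracted experiment table
demo = pd.DataFrame({
    "user_id": [1, 2, 3],
    "user_agent": ["Mozilla/5.0", "Googlebot/2.1", "Mozilla/5.0"],
    "email": ["a@gmail.com", "b@gmail.com", "c@company.com"],
    "sessions_per_day": [3, 1, 250],
})
analysis_sample = apply_exclusions(demo)
```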
Metric Definitions
Critical: Be obsessively specific.
### Primary Metric: Purchase Conversion Rate
**Definition**: The proportion of exposed users who completed at least
one purchase during their exposure window.
**Numerator**: Count of unique users with at least one transaction
where transaction_type = 'purchase' AND transaction_timestamp >
first_exposure_timestamp AND transaction_timestamp <
first_exposure_timestamp + 7 days
**Denominator**: Count of unique users in the experiment sample
**Formula**:
conversion_rate = (users_with_purchase / total_users) × 100
**Notes**:
- Attribution window: 7 days from first exposure
- A user with multiple purchases counts once in the numerator
- Refunded transactions are INCLUDED (purchase intent captured)
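Most ambiguity hides in the attribution window and the deduplication, so it can help to pair the definition with its computation. Here is a minimal pandas sketch of the metric as defined above; the `exposures` and `transactions` frames and their column names are assumptions rather than the real schema.

```python
import pandas as pd

def purchase_conversion_rate(exposures: pd.DataFrame, transactions: pd.DataFrame) -> float:
    """Conversion rate as defined above: unique users with >= 1 purchase within
    7 days of first exposure, divided by all exposed users.

    Assumed columns: exposures(user_id, first_exposure_timestamp),
    transactions(user_id, transaction_type, transaction_timestamp).
    """
    purchases = transactions[transactions["transaction_type"] == "purchase"]
    joined = purchases.merge(exposures, on="user_id", how="inner")
    in_window = joined[
        (joined["transaction_timestamp"] > joined["first_exposure_timestamp"])
        & (joined["transaction_timestamp"]
           < joined["first_exposure_timestamp"] + pd.Timedelta(days=7))
    ]
    users_with_purchase = in_window["user_id"].nunique()   # multiple purchases count once
    total_users = exposures["user_id"].nunique()
    return users_with_purchase / total_users * 100
```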
Statistical Analysis
### Statistical Analysis
**Primary Analysis**:
- Test: Two-sample proportion z-test (two-tailed)
- Null hypothesis: π_treatment = π_control
- Significance level: α = 0.05
- Minimum detectable effect: 2% relative lift (pre-specified)
**Confidence Interval**:
- Method: Wald interval for difference in proportions
- Formula: (p̂_t - p̂_c) ± z_{α/2} × SE
- SE = sqrt(p̂_t(1-p̂_t)/n_t + p̂_c(1-p̂_c)/n_c)
**Software**:
- Python 3.9.7
- scipy 1.7.3
- statsmodels 0.13.2 (statsmodels.stats.proportion.proportions_ztest)
- Analysis notebook: analysis/checkout_v2_final.ipynb
**Assumptions Checked**:
- Sample size > 30 per group: ✓ (n > 64,000)
- Expected successes/failures > 5: ✓
- Independence: Users randomized individually ✓
- SRM check: p = 0.34 (no ratio mismatch)
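The documented parameters map directly onto a few library calls. Here is a hedged sketch with illustrative conversion counts (note that `proportions_ztest` lives in statsmodels, not scipy):

```python
import numpy as np
from scipy.stats import chisquare
from statsmodels.stats.proportion import proportions_ztest

# Illustrative conversion counts; sample sizes match the documented groups
conversions = np.array([4_512, 4_870])    # control, treatment (hypothetical)
exposed = np.array([64_521, 64_892])

# Two-sample proportion z-test, two-tailed, alpha = 0.05
z_stat, p_value = proportions_ztest(count=conversions, nobs=exposed, alternative="two-sided")

# Wald interval for the difference in proportions (z_{0.025} ~= 1.96)
p_c, p_t = conversions / exposed
se = np.sqrt(p_t * (1 - p_t) / exposed[1] + p_c * (1 - p_c) / exposed[0])
ci = ((p_t - p_c) - 1.96 * se, (p_t - p_c) + 1.96 * se)

# SRM check: chi-square of assignment counts against a 50/50 split
srm_stat, srm_p = chisquare(exposed, f_exp=[exposed.sum() / 2] * 2)

print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
print(f"95% CI for difference in proportions: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"SRM check p-value: {srm_p:.2f}")
```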
Common Scenarios
Scenario 1: A/B Test with Continuous Metric
### Methods: Revenue Per User Analysis
**Data Source**:
- Table: analytics.transactions joined with analytics.experiment_exposures
- Date extracted: 2026-01-29
- Experiment period: 2026-01-15 to 2026-01-28
**Sample**:
- All users assigned to checkout_flow_v2 experiment
- Exclusions: Bots (2.1%), employees (0.1%), fraud flags (0.3%)
- Final: Control n=62,445, Treatment n=62,891
**Metric**: Revenue Per User (RPU)
- Sum of transaction_amount for each user during attribution window
- Attribution window: 14 days from assignment
- Users with no purchases contribute $0 to numerator
- Outlier handling: Winsorized at 99th percentile ($847)
- Rationale: the 0.8% of users with orders above $847 were distorting the effect estimate
- Pre-specified in analysis plan
**Analysis**:
- Test: Welch's t-test (unequal variances assumed)
- Confidence interval: Bootstrap BCa, 10,000 iterations
- α = 0.05, two-tailed
- Software: Python 3.9, scipy 1.7.3, statsmodels 0.13.2
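A minimal sketch of this analysis under the stated choices, with illustrative revenue vectors standing in for the real per-user data. Note that two-sample BCa intervals in `scipy.stats.bootstrap` require a recent scipy (1.11 or later); on older versions, `method="percentile"` is the fallback.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Illustrative per-user revenue (zeros for non-purchasers); real arms had ~62k users each
rpu_control = rng.exponential(scale=40.0, size=5_000) * (rng.random(5_000) < 0.07)
rpu_treatment = rng.exponential(scale=42.0, size=5_000) * (rng.random(5_000) < 0.07)

# Winsorize at the pooled 99th percentile, as pre-specified in the analysis plan
cap = np.percentile(np.concatenate([rpu_control, rpu_treatment]), 99)
w_control = np.minimum(rpu_control, cap)
w_treatment = np.minimum(rpu_treatment, cap)

# Welch's t-test (unequal variances), two-tailed
t_stat, p_value = stats.ttest_ind(w_treatment, w_control, equal_var=False)

# Bootstrap CI for the difference in means; BCa for two-sample statistics
# needs scipy >= 1.11 (older versions: method="percentile")
res = stats.bootstrap(
    (w_treatment, w_control),
    statistic=lambda t, c: np.mean(t) - np.mean(c),
    vectorized=False,
    n_resamples=10_000,
    method="BCa",
)
print(f"Welch t = {t_stat:.2f}, p = {p_value:.4f}")
print(f"95% CI for RPU difference: {res.confidence_interval}")
```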
Scenario 2: Model Comparison
### Methods: Classification Model Comparison
**Evaluation Dataset**:
- Source: eval_set_v3, labeled by 3 raters
- Date created: 2026-01-20
- n = 2,500 examples (500 positive, 2,000 negative)
- Class balance: 20% positive (matches production distribution)
**Models Compared**:
- Baseline: logistic_v2 (production model, deployed 2025-11-01)
- Candidate: transformer_v1 (trained 2026-01-15)
**Primary Metric**: Area Under ROC Curve (AUC)
- Computed using sklearn.metrics.roc_auc_score
- Averaged across 3 labeler judgments (majority vote for ties)
**Statistical Test**:
- DeLong's test for comparing correlated AUCs
- Implementation: pROC package via rpy2
- α = 0.05, two-tailed
**Secondary Metrics**:
- Precision at 90% recall
- F1 score at optimal threshold
- Note: Secondary metrics exploratory, not corrected for multiple comparisons
**Confidence Intervals**:
- Bootstrap, 2,000 iterations, BCa method
- Paired bootstrap (same examples in each resample)
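DeLong's test is delegated to pROC via rpy2 above, but the paired bootstrap can be sketched directly in Python. The labels and model scores below are synthetic placeholders, and this sketch uses a simple percentile interval rather than BCa.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2_500
y_true = (rng.random(n) < 0.20).astype(int)              # ~20% positive, as in eval_set_v3
scores_baseline = 0.8 * y_true + rng.normal(size=n)      # hypothetical model scores
scores_candidate = 1.2 * y_true + rng.normal(size=n)

def paired_bootstrap_auc_diff(y, s_base, s_cand, n_iter=2_000, seed=1):
    """Paired bootstrap: resample the same examples for both models each iteration."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_iter):
        idx = rng.integers(0, len(y), size=len(y))
        if y[idx].sum() in (0, len(idx)):
            continue   # degenerate resample with a single class; AUC undefined
        diffs.append(roc_auc_score(y[idx], s_cand[idx]) - roc_auc_score(y[idx], s_base[idx]))
    return np.percentile(diffs, [2.5, 97.5])

lo, hi = paired_bootstrap_auc_diff(y_true, scores_baseline, scores_candidate)
print(f"AUC(candidate) - AUC(baseline), 95% percentile CI: [{lo:.3f}, {hi:.3f}]")
```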
Scenario 3: Survey Analysis
### Methods: User Satisfaction Survey
**Data Collection**:
- Survey deployed: 2026-01-20 to 2026-01-25
- Platform: SurveyMonkey, randomized question order
- Population: Users who completed a purchase in past 30 days
- Sampling: Random sample of 10,000 eligible users invited
**Response**:
- Invited: 10,000
- Responded: 1,847 (18.5% response rate)
- Complete responses: 1,702 (17.0% completion rate)
- Excluded: 145 partial responses (missing primary question)
**Measure**: Net Promoter Score (NPS)
- Question: "How likely are you to recommend [Product] to a friend?"
- Scale: 0-10
- NPS = % Promoters (9-10) - % Detractors (0-6)
**Analysis**:
- NPS comparison: Bootstrap confidence interval (10,000 iterations)
- Demographic comparisons: Chi-square tests with Holm correction
- α = 0.05 for primary analysis
**Limitations**:
- Self-selection bias: Respondents may differ from non-respondents
- Recency bias: Survey sent within 7 days of purchase
- No control group: Pre-post comparison only
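The NPS cut-points are easy to get wrong (passives at 7-8 count in the base but in neither bucket), so a small sketch of the calculation and a percentile-bootstrap CI follows; the ratings vector is hypothetical.

```python
import numpy as np

def nps(ratings: np.ndarray) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6); passives (7-8) count only in the base."""
    promoters = np.mean(ratings >= 9)
    detractors = np.mean(ratings <= 6)
    return 100 * (promoters - detractors)

rng = np.random.default_rng(7)
ratings = rng.integers(0, 11, size=1_702)   # hypothetical 0-10 responses

# Bootstrap CI, 10,000 iterations (percentile method for this sketch)
boot = [nps(rng.choice(ratings, size=ratings.size, replace=True)) for _ in range(10_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"NPS = {nps(ratings):.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```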
Deviations Documentation
When You Change the Plan
### Deviations from Pre-Analysis Plan
**Deviation 1: Extended experiment duration**
- Original plan: 14 days
- Actual: 16 days
- Reason: Holiday weekend (Jan 18-19) caused traffic spike; extended to
ensure representative sample
- Impact: Sample size increased by ~15%; should improve precision
**Deviation 2: Added post-hoc segment analysis**
- Original plan: Overall analysis only
- Added: Mobile vs Desktop comparison
- Reason: Stakeholder request after seeing flat overall result
- Status: Labeled as EXPLORATORY in findings
- Note: No significant interaction detected (p = 0.23)
**Deviation 3: Changed outlier threshold**
- Original plan: Winsorize at 99th percentile
- Actual: Winsorize at 97th percentile
- Reason: 99th percentile was $2,847 due to one enterprise order
- Impact: Sensitivity analysis shows results consistent at both thresholds
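Deviation 3 leans on a sensitivity analysis. A minimal sketch of re-running the test at both winsorization thresholds, with illustrative revenue vectors in place of the real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
revenue_control = rng.exponential(scale=20.0, size=5_000)     # hypothetical
revenue_treatment = rng.exponential(scale=21.0, size=5_000)   # hypothetical

# Re-run the primary test at both the planned and the actual winsorization threshold
for pct in (97, 99):
    cap = np.percentile(np.concatenate([revenue_control, revenue_treatment]), pct)
    t, p = stats.ttest_ind(
        np.minimum(revenue_treatment, cap),
        np.minimum(revenue_control, cap),
        equal_var=False,
    )
    print(f"Winsorized at p{pct} (cap ${cap:,.0f}): t = {t:.2f}, p = {p:.4f}")
```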
Code Documentation Standards
Linking Code to Methods
### Code References
**Main Analysis**:
- Repository: github.com/company/experiments
- Path: checkout_v2/analysis/main_analysis.py
- Commit: abc123def (tagged: checkout-v2-final)
**Key Functions**:
- `compute_conversion_rate()`: Primary metric calculation
- `run_proportion_test()`: Statistical test (lines 145-180)
- `bootstrap_ci()`: Confidence interval computation
**Data Pipeline**:
- Query: checkout_v2/sql/extract_experiment_data.sql
- Pre-processing: checkout_v2/scripts/clean_data.py
**To Reproduce**:
```bash
git checkout checkout-v2-final
python -m checkout_v2.analysis.main_analysis \
--config config/checkout_v2.yaml \
    --output results/
```
Version Pinning
```markdown
### Software Environment
**Python Version**: 3.9.7
**Key Package Versions**:
- pandas==1.4.2
- scipy==1.7.3
- statsmodels==0.13.2
- numpy==1.22.3
**Full Environment**:
- requirements.txt: checkout_v2/requirements.txt
- Frozen: checkout_v2/requirements-frozen.txt (pip freeze output)
```
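Version lines like these can be generated rather than typed by hand. A small sketch using importlib.metadata; the package list is an assumption to adjust per analysis.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

packages = ["pandas", "scipy", "statsmodels", "numpy"]   # adjust to your analysis

print(f"**Python Version**: {sys.version.split()[0]}")
print("**Key Package Versions**:")
for pkg in packages:
    try:
        print(f"- {pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"- {pkg}: not installed")
```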
Template Generator
```python
def generate_methods_template(analysis_type='ab_test'):
    """
    Generate a methods section template for common analysis types.
    """
    templates = {
        'ab_test': """
## Methods
### Data Source
- Database/Table:
- Extraction Date:
- Date Range:
- Query Reference:
### Sample Definition
- Unit of Analysis:
- Population:
- Inclusion Criteria:
- Exclusion Criteria (with rationale and counts):
- Final Sample Size (per group):
### Metric Definitions
Primary Metric: [Name]
- Numerator:
- Denominator:
- Attribution window:
- Special handling (outliers, zeros):
### Statistical Analysis
- Test:
- Significance Level:
- Confidence Interval Method:
- Assumptions Checked:
- Software and Versions:
### Deviations from Pre-Analysis Plan
- (List any deviations with rationale)
### Limitations
- (Known issues or caveats)
""",
        'model_eval': """
## Methods
### Evaluation Dataset
- Source:
- Date Created:
- n Examples:
- Class Distribution:
- Label Source (human/automated):
### Models Compared
- Baseline:
- Candidate:
### Primary Metric
- Metric Name:
- Computation Method:
- Handling of Edge Cases:
### Statistical Test
- Test Name:
- Null Hypothesis:
- Significance Level:
- Implementation:
### Confidence Intervals
- Method:
- Number of Iterations:
- Type (paired/unpaired):
### Secondary Metrics (Exploratory)
- (List with note on multiple comparison status)
### Limitations
- (Known issues)
"""
    }
    print(f"Methods Template for {analysis_type.upper()}")
    print("=" * 50)
    print(templates.get(analysis_type, templates['ab_test']))

generate_methods_template('ab_test')
```
R Implementation
```r
# Function to generate methods documentation
generate_methods_doc <- function(
  data_source,
  date_extracted,
  sample_size_control,
  sample_size_treatment,
  metric_name,
  test_type,
  alpha = 0.05,
  deviations = NULL
) {
  cat("## Methods\n\n")
  cat("### Data Source\n")
  cat("- Source:", data_source, "\n")
  cat("- Date extracted:", date_extracted, "\n\n")
  cat("### Sample\n")
  cat("- Control:", sample_size_control, "\n")
  cat("- Treatment:", sample_size_treatment, "\n\n")
  cat("### Primary Metric\n")
  cat("- Metric:", metric_name, "\n\n")
  cat("### Statistical Analysis\n")
  cat("- Test:", test_type, "\n")
  cat("- Significance level: α =", alpha, "\n")
  cat("- Software:", R.version.string, "\n\n")
  if (!is.null(deviations)) {
    cat("### Deviations from Plan\n")
    for (d in deviations) {
      cat("-", d, "\n")
    }
  }
}

# Example usage
generate_methods_doc(
  data_source = "analytics.experiment_data",
  date_extracted = "2026-01-28",
  sample_size_control = 64521,
  sample_size_treatment = 64892,
  metric_name = "Purchase conversion rate",
  test_type = "Two-sample proportion z-test",
  deviations = c("Extended duration from 14 to 16 days")
)
```
Checklist Before Submitting
Methods Section Completeness Checklist:
□ Data source and extraction date specified?
□ Date range of analysis clear?
□ Query/code linked and versioned?
□ All inclusion criteria listed?
□ All exclusion criteria listed with rationale and counts?
□ Final sample sizes stated?
□ Primary metric formula explicit (not just name)?
□ Attribution windows specified?
□ Outlier handling documented?
□ Statistical test named?
□ Assumptions listed and checked?
□ Significance level stated?
□ Software versions specified?
□ Deviations from pre-analysis plan documented?
□ Limitations acknowledged?
□ Could someone reproduce this without asking me questions?
Related Articles
- Analytics Reporting (Pillar) - Complete reporting guide
- Pre-Registration Lite - Pre-analysis planning
- Audit Trails - Documentation practices
- Common Analyst Mistakes - What to avoid
Key Takeaway
A good methods section lets someone reproduce your analysis without asking you a single question. Include your data source with extraction date, inclusion/exclusion criteria with counts and rationale, exact metric definitions (not just names), statistical test with parameters, sample sizes, software versions, and any deviations from your pre-analysis plan. Write it while the details are fresh—six months from now, you won't remember why you excluded those 2,341 users or what threshold you used for outliers. Document for your future self, and you'll document well for everyone else too.
Frequently Asked Questions
How much detail is too much?
Enough that someone could reproduce the analysis without asking you questions; beyond that, link rather than paste. Keep exact definitions, criteria, counts, and test parameters in the methods section, and push long queries and notebooks into linked, versioned files.
Should I include code in the methods?
Link to it rather than paste it: repository, path, and commit or tag, plus the key functions and pinned software versions needed to re-run the analysis.
What if I changed my approach mid-analysis?
Document each deviation from the pre-analysis plan with its rationale and impact, and label any post-hoc additions as exploratory.