Quantified Fake‑Door: Exact Sample Sizes & Decision Rules for No‑Code Prebuild Tests
Written by AppWispr editorial
Fake‑door tests (no‑code prebuilds that measure demand before you build) are cheap and seductive, but most teams run them without clear rules. This post gives a compact playbook: sample‑size formulas and calculators, practical acceptance thresholds, segmentation rules, and UTM / KPI templates you can copy, so that a signup or click count turns into a defensible build decision (or a clear kill).
1) What to measure and the decision you’re making
A fake‑door test reduces a product decision to a binary signal: someone saw the CTA and either converted (clicked, signed up, or signaled intent to pay) or didn’t. That makes the primary metric a proportion (the conversion rate). Design your test so the observed event maps to the future value you care about; for example, a paid‑intent click is a stronger signal than an email capture.
Before you run any numbers, write a one‑line decision rule: “If conversion ≥ X% with N users and sustained across our two priority segments, we build MVP A; otherwise we kill it.” That rule forces you to quantify business tradeoffs (cost to build, expected LTV, acceptable risk) and select a meaningful Minimum Detectable Effect (MDE).
- Pick a single binary primary outcome (click, email, payment intent).
- Translate conversion into business value (LTV × expected conversion funnel) before picking thresholds.
- Decide whether you need one‑sided (only care about uplift) or two‑sided testing (care about any difference).
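To make the "translate conversion into business value" step concrete, here is a minimal sketch of a break‑even calculation. The function name, its parameters, and every number in the usage note are hypothetical illustrations, and the model assumes a simple funnel (fake‑door conversion, then a fixed intent‑to‑paid rate); it is not a prescribed formula.

```python
def break_even_conversion(build_cost: float, ltv: float,
                          reachable_users: int, intent_to_paid: float) -> float:
    """Fake-door conversion rate at which expected revenue equals build cost.

    Assumed funnel (illustrative only):
        revenue = reachable_users * conversion * intent_to_paid * ltv
    Setting revenue == build_cost and solving for conversion gives the result.
    """
    if min(build_cost, ltv, reachable_users, intent_to_paid) <= 0:
        raise ValueError("all inputs must be positive")
    return build_cost / (reachable_users * intent_to_paid * ltv)


if __name__ == "__main__":
    # Hypothetical inputs: $30k build, $300 LTV, 50k reachable users,
    # 10% of fake-door converters eventually pay.
    rate = break_even_conversion(30_000, 300, 50_000, 0.10)
    print(f"break-even conversion: {rate:.1%}")
```

With these made‑up inputs the break‑even rate is 2%, so a target X in your one‑line decision rule would sit somewhere above that, depending on how much margin you want.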
2) Exact sample‑size formula and a compact calculator you can use
For a binary outcome, compare the treatment conversion rate with your baseline (or compare two buckets) using the standard two‑proportion z‑test sample‑size formula. The per‑variant sample size n is approximately: n = (Z_{1−α/2}·√(2p̄(1−p̄)) + Z_{1−β}·√(p1(1−p1) + p2(1−p2)))^2 / (p1−p2)^2, where p1 is the baseline conversion rate, p2 = p1 + MDE is the rate you want to be able to detect, and p̄ = (p1 + p2)/2 is their average. For a one‑sided test, replace Z_{1−α/2} with Z_{1−α}. In practice most teams specify the baseline p1 (current conversion), the desired MDE (absolute or relative), α (commonly 0.05) and power 1−β (commonly 0.8).
If that formula looks heavy, use any reputable online calculator that implements the two‑proportion test (Evan Miller style calculators, Statsig, or other sample‑size tools). Plug in your baseline and the smallest uplift you’d act on (your MDE) — this gives the minimal per‑variant sample. If your traffic can’t reach that sample in a reasonable time, either increase the MDE you’ll accept or reframe the test (e.g., run a more targeted paid campaign to accelerate traffic).
- Inputs you must set: baseline conversion p, MDE (absolute or relative), α (type I error), power (1−β).
- If you compare treatment vs control directly, use two‑sample proportion formulas; for single‑arm judgment against a target, use one‑sample proportion tests.
- If volume is low, increase MDE or run targeted campaigns to reach required N.
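If you would rather compute the number yourself than trust a web calculator, the two‑proportion formula above can be sketched in a few lines of standard‑library Python. The function name and defaults are illustrative assumptions; results may differ slightly from online calculators, which sometimes use other variance approximations.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float, alpha: float = 0.05,
                                power: float = 0.8, two_sided: bool = True) -> int:
    """Approximate per-variant n for a two-proportion z-test.

    p1: baseline conversion rate; p2: rate to detect (baseline + absolute MDE).
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must be distinct rates strictly between 0 and 1")
    z = NormalDist().inv_cdf                      # standard normal quantile
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2                         # average of the two rates
    root_null = sqrt(2 * p_bar * (1 - p_bar))
    root_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = (z_alpha * root_null + z_beta * root_alt) ** 2 / (p1 - p2) ** 2
    return ceil(n)                                # round up to whole users


if __name__ == "__main__":
    # Example: 5% baseline, detect a lift to 7% (a 40% relative MDE).
    print(sample_size_two_proportions(0.05, 0.07))
```

For a 5% baseline and a 7% target at α = 0.05 and 80% power, this comes out at roughly 2,200 users per variant, which illustrates why small MDEs are out of reach for low‑traffic fake‑doors.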
3) Practical acceptance thresholds & decision rules (statistical + operational)
Statistical acceptance alone (p < 0.05) is not enough. Use a two‑part rule: (A) a statistical test that meets your α/power requirements, and (B) a business threshold that maps to value. Example rule: “We require at least 80% power to detect our MDE, and observed conversion ≥ target conversion (baseline × multiplier) across both priority segments — otherwise fail.” That prevents small but statistically significant lifts with no business impact from triggering builds.
Add operational constraints: require the effect to persist over a pre‑specified observation window (e.g., at least one full acquisition cycle or 7–14 days) and check primary segments separately (top acquisition channel and target persona). If the signal is statistically significant overall but driven by one fringe segment, don’t build for the entire market.
- Combine statistical and business thresholds — both must pass.
- Require persistence: effect holds across a pre‑specified time window (e.g., 7–14 days) to avoid early stopping bias.
- Segment checks: require the effect in at least two priority segments (channel, persona, geography).
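One way to encode the two‑part rule so it cannot be bent after the fact is a small function that returns a verdict. This is a sketch under stated assumptions: it uses a one‑sided pooled two‑proportion z‑test, applies only the business threshold (not a separate significance test) to each priority segment, and the names `two_proportion_p_value` and `verdict` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_c: int, n_c: int, conv_t: int, n_t: int) -> float:
    """One-sided p-value (treatment > control), pooled two-proportion z-test."""
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    if se == 0:
        return 1.0                      # no conversions at all: no evidence
    z = (conv_t / n_t - conv_c / n_c) / se
    return 1 - NormalDist().cdf(z)


def verdict(overall, segments, target_rate, required_n, alpha=0.05) -> str:
    """overall = (conv_control, n_control, conv_treatment, n_treatment);
    segments = {name: (conv_treatment, n_treatment)} for priority segments."""
    conv_c, n_c, conv_t, n_t = overall
    if min(n_c, n_t) < required_n:
        return "incomplete"             # pre-computed sample not reached yet
    stat_ok = two_proportion_p_value(conv_c, n_c, conv_t, n_t) < alpha
    business_ok = conv_t / n_t >= target_rate
    if not (stat_ok and business_ok):
        return "red"                    # either gate failing kills the build
    segments_ok = all(c / n >= target_rate for c, n in segments.values())
    return "green" if segments_ok else "amber"


if __name__ == "__main__":
    segs = {"paid": (80, 1000), "organic": (72, 1000)}
    print(verdict((100, 2000, 150, 2000), segs, target_rate=0.07, required_n=2000))
```

Writing the rule as code before launch doubles as pre‑registration: the thresholds are fixed in version control before any data arrives.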
4) Segmentation rules and how to avoid false positives
Segment early and pre‑register. Define 2–4 priority segments before the test (e.g., organic vs paid, power users vs new users, country A vs B). Treat segment checks like mini decision gates: if the overall result is significant but the effect fails in both priority segments, downgrade the signal and require a follow‑up test targeted to the winning segment.
Adjust for multiple comparisons: every extra segment or metric you check inflates false‑positive risk. If you test many hypotheses, apply a Bonferroni correction (which controls the family‑wise error rate) or the Benjamini‑Hochberg procedure (which controls the false discovery rate); otherwise, focus on a small, pre‑specified set of segments to keep statistical interpretation simple.
- Predefine 2–4 priority segments and keep exploratory segmentation separate.
- Use corrections (Bonferroni or FDR) when you test multiple independent hypotheses.
- If only one non‑priority segment shows uplift, run a targeted follow‑up before committing to a full build.
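The two corrections mentioned above are short enough to sketch directly. The p‑values in the example are made up, and the function names are illustrative; both functions return one True/False flag per input p‑value, in the original order.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i only if p_i <= alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (false discovery rate control).

    Sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for four pre-registered segments.
    segment_p = [0.01, 0.02, 0.03, 0.20]
    print(bonferroni(segment_p))          # stricter
    print(benjamini_hochberg(segment_p))  # more permissive
```

On these illustrative inputs Bonferroni keeps only the first segment while Benjamini‑Hochberg keeps the first three, which shows the trade‑off: Bonferroni is the more conservative choice when a false build is expensive.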
5) UTM & KPI templates you can copy for every fake‑door test
Instrument each fake‑door with a strict UTM scheme and a single canonical KPI so results are unambiguous. Example UTM pattern: utm_source=fakedoor&utm_medium=email&utm_campaign=feature-name_v1&utm_term=segment (set utm_medium to banner, email, or paid as appropriate). Capture the canonical KPI (e.g., click‑to‑intent rate or paid‑intent actions) and a secondary KPI for quality (e.g., email open, trial activation).
Report a one‑page results snapshot for each test showing: sample sizes by variant and segment, raw conversions, conversion rates with 95% CIs, p‑value (or posterior probability if using Bayesian), business value mapping (expected customers × LTV), and the final green/amber/red verdict per the pre‑registered decision rule. Store templates and past results in a lightweight tracker (AppWispr users can keep these templates in their analysis folder).
- Canonical UTM: utm_source=fakedoor&utm_medium={channel}&utm_campaign={feature}_v{n}&utm_term={segment}
- KPI snapshot: N per variant, conversions, conversion rate, 95% CI, p‑value, business value estimate, decision verdict.
- Keep an internal changelog (versioned campaign names) so you never conflate tests.
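Hand‑typed UTM strings drift quickly, so it can help to generate links from the canonical pattern. The sketch below uses only Python's standard `urllib.parse`; the function name, the example domain, and the feature and segment values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fake_door_url(base_url: str, channel: str, feature: str,
                  version: int, segment: str) -> str:
    """Append the canonical fake-door UTM parameters to base_url.

    Existing query parameters on base_url are preserved; values are URL-encoded.
    """
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", "fakedoor"),
        ("utm_medium", channel),
        ("utm_campaign", f"{feature}_v{version}"),   # versioned campaign name
        ("utm_term", segment),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(fake_door_url("https://example.com/pricing", "email",
                        "smart-export", 1, "power-users"))
```

Bumping `version` for every copy or design change gives you the versioned campaign names the changelog bullet asks for.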
FAQ
Common follow-up questions
How do I pick a reasonable Minimum Detectable Effect (MDE)?
Pick the smallest uplift that meaningfully changes your build decision. Translate uplift into expected additional paying customers (or LTV) and compare that to your build cost. If the incremental revenue from the MDE over reasonable timeframes exceeds the cost to build and operate, it’s a reasonable MDE. Practically, founders often start with 20–50% relative uplift targets for low‑traffic fake‑doors and smaller MDEs only when traffic supports large samples.
What if my traffic is too low to reach the required sample size?
Options: (1) increase the MDE you’ll act on (accept only larger effect sizes), (2) run a focused paid acquisition campaign to accelerate traffic to the fake‑door, (3) convert the test to a qualitative prelaunch (interviews or moderated usability sessions), or (4) run a single‑arm test judged against a business target rather than a two‑arm statistical test.
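For option (4), a single‑arm test against a target can be judged with an exact binomial tail probability, which stays valid at the small sample sizes where the normal approximation gets shaky. This is a sketch: the function name and the numbers in the example are hypothetical, and it answers only "how surprising is this many conversions if the true rate were exactly the target?"

```python
from math import exp, lgamma, log


def binomial_upper_tail(conversions: int, n: int, target_rate: float) -> float:
    """P(X >= conversions) when X ~ Binomial(n, target_rate).

    A small value suggests the true rate exceeds target_rate.
    Computed in log space so large n does not overflow.
    """
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be strictly between 0 and 1")
    if not 0 <= conversions <= n:
        raise ValueError("conversions must be between 0 and n")
    log_p, log_q = log(target_rate), log(1 - target_rate)
    total = 0.0
    for k in range(conversions, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += exp(log_pmf)
    return min(total, 1.0)   # guard against tiny floating-point overshoot


if __name__ == "__main__":
    # Hypothetical: 200 visitors, 20 paid-intent clicks, business target 5%.
    print(f"{binomial_upper_tail(20, 200, 0.05):.4f}")
```

As with the two‑arm case, fix the target rate and the cutoff for this tail probability before you look at the data.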
Should I use frequentist p‑values or a Bayesian decision rule?
Both can work. Frequentist tests are widely understood and simple to pre‑register (α, power, two‑sample z test). Bayesian rules let you express decisions as posterior probabilities (e.g., P(uplift > MDE) > 0.9). Pick one framework and pre‑register thresholds so the decision isn’t shifted after peeking at results.
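A Bayesian rule of the form P(uplift > MDE) > 0.9 can be estimated with a short Monte Carlo simulation. The sketch below assumes independent uniform Beta(1, 1) priors on both conversion rates and an absolute MDE; the function name, the draw count, and the example counts are illustrative choices, not a recommendation.

```python
import random


def prob_uplift_exceeds(conv_c: int, n_c: int, conv_t: int, n_t: int,
                        mde_abs: float, draws: int = 20_000, seed: int = 0) -> float:
    """Estimate P(p_treatment - p_control > mde_abs) from Beta posteriors.

    With a Beta(1, 1) prior, the posterior after observing `conv` conversions
    in `n` trials is Beta(1 + conv, 1 + n - conv).
    """
    rng = random.Random(seed)            # fixed seed keeps the estimate reproducible
    hits = 0
    for _ in range(draws):
        p_c = rng.betavariate(1 + conv_c, 1 + n_c - conv_c)
        p_t = rng.betavariate(1 + conv_t, 1 + n_t - conv_t)
        if p_t - p_c > mde_abs:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    # Hypothetical counts: control 100/2000, treatment 150/2000, MDE of 1 point.
    prob = prob_uplift_exceeds(100, 2000, 150, 2000, mde_abs=0.01)
    print(f"P(uplift > MDE) ~ {prob:.2f}")
```

Whatever threshold you choose (0.9 in the example above), write it down before launch, exactly as you would with α and power.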
How long should I run a fake‑door test?
Run until you reach the required sample size and at least one acquisition cycle for your product (commonly 7–14 days). Don’t stop early when the effect temporarily looks good. Pre‑computing required N and estimating duration from expected traffic avoids early‑stopping bias.
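Estimating the run time up front is simple arithmetic. The sketch below assumes traffic is split evenly across variants and arrives at a steady daily rate, and it enforces a minimum window; the function name, the 7‑day default, and the traffic numbers are illustrative assumptions.

```python
from math import ceil


def planned_duration_days(n_per_variant: int, variants: int,
                          daily_visitors: int, min_days: int = 7) -> int:
    """Days needed to reach the required sample, never shorter than min_days."""
    if n_per_variant <= 0 or variants <= 0 or daily_visitors <= 0:
        raise ValueError("inputs must be positive")
    days_for_sample = ceil(n_per_variant * variants / daily_visitors)
    return max(min_days, days_for_sample)


if __name__ == "__main__":
    # Hypothetical: 2,213 users per variant, 2 variants, 300 visitors per day.
    print(planned_duration_days(2213, 2, 300))
```

With these example numbers the test needs 15 days, so the sample size, not the 7‑day floor, is the binding constraint; at higher traffic the floor takes over.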
Sources
Research used in this article
Statsig
A/B Test Sample Size Calculator - Statsig
https://statsig.com/calculator
Wikipedia
Two-proportion Z-test
https://en.wikipedia.org/wiki/Two-proportion_Z-test
SampleSizeCalc
Sample Size Calculator - A/B Testing Tools | SampleSizeCalc
https://www.samplesizecalc.com/calculator
ConversionXL
Introduction to Conversion - ConversionXL guide
https://conversionxl.com/wp-content/uploads/2016/03/crobeginnerguide_final.pdf
CXL
Statistical Power & Sample Size Calculations - CXL Institute lesson
https://cxl.com/institute/wp-content/uploads/2019/08/Lesson3_Statistical-Power-Sample-Size.pdf
Wikipedia
Sample size determination
https://en.wikipedia.org/wiki/Sample_size_determination
arXiv
All about sample-size calculations for A/B testing: Novel extensions and practical guide (arXiv)
https://arxiv.org/abs/2305.16459