Question 1

What is Welch's t-test and when should I use it?

Accepted Answer

Welch's t-test compares the means (averages) of two groups without assuming that the groups have equal variances or equal sample sizes. Use it in A/B testing when your metric is a continuous number, such as revenue per user, average order value, time on page, or session duration. It is more robust than Student's t-test because it does not assume equal variances.

Question 2

What is the difference between Welch's t-test and Student's t-test?

Accepted Answer

Student's t-test assumes that both groups have the same variance (spread) and pools the two samples into a single variance estimate. Welch's t-test relaxes this assumption: it estimates each group's variance separately and uses the Welch-Satterthwaite equation to adjust the degrees of freedom, which corrects for unequal variances. This makes it safer to use in practice, because A/B test groups often have different variances. This calculator uses Welch's t-test by default.
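To make the degrees-of-freedom adjustment concrete, here is a minimal sketch using only the Python standard library. The function names and the sample data are made up for illustration, and this is not necessarily how this calculator is implemented.

```python
from statistics import variance


def welch_df(a, b):
    """Welch-Satterthwaite degrees of freedom for two samples."""
    n1, n2 = len(a), len(b)
    # Squared standard error contributed by each group (sample variance / n).
    v1 = variance(a) / n1
    v2 = variance(b) / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def student_df(a, b):
    """Degrees of freedom used by Student's (pooled-variance) t-test."""
    return len(a) + len(b) - 2


# Hypothetical data: the variant is much more spread out than the control.
control = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0]
variant = [8.0, 20.0, 14.0, 25.0, 5.0, 18.0, 30.0, 2.0]

print("Welch df:  ", welch_df(control, variant))
print("Student df:", student_df(control, variant))
```

The Welch value never exceeds the Student value of n1 + n2 - 2. When the variances differ a lot, it drops toward the size of the noisier group minus one, which makes the test more conservative.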

Question 3

What does the t-statistic mean?

Accepted Answer

The t-statistic measures how many standard errors the difference between the two group means is away from zero. A larger absolute t-value means the groups are more different relative to the variability in the data. The t-statistic is used together with the degrees of freedom to calculate the p-value.
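As a rough illustration of "difference divided by standard error", the sketch below computes the Welch t-statistic with the Python standard library. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, variant):
    """Welch t-statistic: (variant mean - control mean) / standard error."""
    # The standard error of the difference uses each group's own variance.
    se = sqrt(variance(control) / len(control) + variance(variant) / len(variant))
    return (mean(variant) - mean(control)) / se


# Hypothetical data: the variant mean is 2 units above the control mean.
control = [1.0, 2.0, 3.0, 4.0]
variant = [3.0, 4.0, 5.0, 6.0]

t = welch_t(control, variant)
print(f"t = {t:.3f}")
```

Swapping the two groups flips the sign of t but not its magnitude, which is why the absolute value is what matters for a two-tailed test.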

Question 4

Should I use a one-tailed or two-tailed t-test?

Accepted Answer

Use a two-tailed test in most A/B testing scenarios. A two-tailed test checks whether the variant is different from the control in either direction (higher or lower). Only use a one-tailed test if you are testing a single direction chosen in advance (for example, that the variant is greater than the control) and you do not care about detecting effects in the opposite direction.
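The sketch below shows how the choice of tails changes the p-value for the same t-statistic. It uses a normal approximation to the t-distribution, which is only reasonable when the degrees of freedom are large (as in most A/B tests). An exact p-value would use the t-distribution with the Welch-Satterthwaite degrees of freedom. The function name is hypothetical.

```python
from statistics import NormalDist


def approx_p_values(t):
    """Approximate p-values for a t-statistic (normal approximation, large df)."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return {
        # H1: variant differs from control in either direction.
        "two_tailed": 2.0 * (1.0 - z.cdf(abs(t))),
        # H1: variant is greater than control (t = variant minus control).
        "greater": 1.0 - z.cdf(t),
        # H1: variant is less than control.
        "less": z.cdf(t),
    }


for name, p in approx_p_values(1.96).items():
    print(f"{name}: {p:.4f}")
```

For a positive t, the "greater" p-value is half the two-tailed one, while the "less" p-value is close to 1. A one-tailed test therefore cannot detect an effect in the direction you did not test.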

Question 5

What is Cohen's d and how do I interpret it?

Accepted Answer

Cohen's d is a standardized measure of the difference between two group means, expressed in units of the pooled standard deviation. It tells you the practical size of the effect, regardless of sample size. Benchmarks: d = 0.2 is a small effect, d = 0.5 is medium, and d = 0.8 is large. A statistically significant result with a very small Cohen's d may not be worth acting on.
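Here is a minimal sketch of the calculation with the Python standard library. The function names and the data are hypothetical, and treating the benchmarks as hard cut-offs is an assumption made for illustration; they are rough guides rather than strict thresholds.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(control, variant):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(control), len(variant)
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(variant)) / (n1 + n2 - 2)
    return (mean(variant) - mean(control)) / sqrt(pooled_var)


def effect_label(d):
    """Map |d| onto the conventional benchmarks (cut-offs are an assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical data.
control = [1.0, 2.0, 3.0, 4.0]
variant = [3.0, 4.0, 5.0, 6.0]

d = cohens_d(control, variant)
print(f"d = {d:.3f} ({effect_label(d)})")
```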

Question 6

How many observations do I need for a reliable t-test?

Accepted Answer

As a rule of thumb, each group should have at least 30 observations for the t-test to work well. With fewer than 30 per group, the test becomes sensitive to non-normality in the data. With fewer than 5 per group, results are unreliable. For A/B tests, use a sample size calculator to determine the number needed based on the effect size you want to detect.
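To show roughly what a sample size calculator does, the sketch below uses the common normal-approximation formula for a two-tailed test with equal group sizes. The function name and the default significance level and power are assumptions, and a dedicated calculator may apply small corrections that this version omits.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Approximate observations per group needed to detect effect size d.

    d is the standardized effect size (Cohen's d). Assumes a two-tailed test,
    equal group sizes, and a normal approximation to the t-distribution.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for the test
    z_power = z.inv_cdf(power)              # quantile for the desired power
    return ceil(2.0 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {sample_size_per_group(d)} per group")
```

Because the required sample size grows with 1 / d squared, halving the effect you want to detect roughly quadruples the number of observations needed.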

How Welch's T-Test Works for A/B Testing

The T-Test Formula

Welch-Satterthwaite Degrees of Freedom

Worked Example

Understanding the Results

P-Value

Confidence Intervals

Effect Size (Cohen's d)

Assumptions and Limitations

Frequently Asked Questions