A/B Testing Framework in Python
A/B testing is a crucial technique for optimizing websites, applications, and marketing campaigns. This challenge asks you to build a simplified A/B testing framework in Python that can track conversions for two different versions (A and B) and determine which version performs better based on statistical significance. This framework will help you simulate the core logic of A/B testing and understand the principles behind it.
Problem Description
You are tasked with creating a Python class called ABTest. This class should allow you to record conversions for two versions (A and B) and then calculate a simple statistical significance score to determine which version is performing better.
Key Requirements:
- Initialization: The `ABTest` class should be initialized with a significance level (`alpha`). This represents the probability of incorrectly rejecting the null hypothesis (i.e., concluding there's a difference when there isn't). A common value for `alpha` is 0.05.
- Record Conversion: The class should have a method called `record_conversion(version, converted)`, where `version` is either "A" or "B" and `converted` is a boolean indicating whether the user converted (True) or not (False).
- Calculate Significance: The class should have a method called `calculate_significance()` that calculates the z-statistic and p-value to determine the statistical significance between the two versions. The z-statistic is calculated as `(p1 - p2) / sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2))`, where:
  - `p1` is the conversion rate for version A.
  - `p2` is the conversion rate for version B.
  - `n1` is the number of users exposed to version A.
  - `n2` is the number of users exposed to version B.
  - `p_hat` is the pooled conversion rate, `(p1 * n1 + p2 * n2) / (n1 + n2)`.
- Determine Winner: The `calculate_significance()` method should return a tuple `(winner, p_value)`. `winner` should be "A", "B", or "No Winner" depending on whether version A, version B, or neither has a statistically significant advantage (p-value < `alpha`); `p_value` is the calculated p-value.
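One possible shape for the class (a minimal sketch, not a reference solution — it uses `math.erf` to build the standard normal CDF so no third-party packages are needed):

```python
import math


class ABTest:
    """Tracks conversions for two variants and runs a pooled two-proportion z-test."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.counts = {"A": {"users": 0, "conversions": 0},
                       "B": {"users": 0, "conversions": 0}}

    def record_conversion(self, version, converted):
        self.counts[version]["users"] += 1
        if converted:
            self.counts[version]["conversions"] += 1

    def calculate_significance(self):
        n1 = self.counts["A"]["users"]
        n2 = self.counts["B"]["users"]
        c1 = self.counts["A"]["conversions"]
        c2 = self.counts["B"]["conversions"]
        p1 = c1 / n1
        p2 = c2 / n2
        p_hat = (c1 + c2) / (n1 + n2)  # pooled conversion rate
        se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
        if se == 0:
            # p_hat is 0 or 1 (e.g. zero conversions in both versions):
            # no evidence of a difference, and dividing would crash.
            return ("No Winner", 1.0)
        z = (p1 - p2) / se
        # Two-sided p-value from the standard normal CDF via math.erf.
        p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
        if p_value < self.alpha:
            winner = "A" if p1 > p2 else "B"
        else:
            winner = "No Winner"
        return (winner, p_value)
```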
Expected Behavior:
The ABTest class should accurately track conversions and calculate the statistical significance between the two versions. The calculate_significance() method should return the correct winner and p-value based on the data collected and the specified significance level.
Edge Cases to Consider:
- Zero Conversions: Handle cases where one or both versions have zero conversions. This can lead to division by zero errors.
- Small Sample Sizes: Statistical significance is harder to achieve with small sample sizes.
- Equal Conversion Rates: If the conversion rates are exactly equal, the z-statistic is 0 and the p-value is 1, so the winner should be "No Winner".
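As a sketch of how the zero-conversion case might be guarded (function name and shape here are illustrative, not prescribed by the problem): the pooled standard error is zero exactly when `p_hat` is 0 or 1, so checking it before dividing avoids the crash.

```python
import math


def z_statistic(c1, n1, c2, n2):
    """Pooled two-proportion z-statistic for (conversions, users) of A and B.

    Returns 0.0 when the pooled standard error is zero, e.g. when
    neither version has any conversions, instead of dividing by zero.
    """
    p1, p2 = c1 / n1, c2 / n2
    p_hat = (c1 + c2) / (n1 + n2)
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se


print(z_statistic(0, 2, 0, 2))  # zero conversions on both sides -> 0.0, not a crash
print(z_statistic(1, 2, 2, 3))  # -> about -0.373
```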
Examples
Example 1:
Input:
ab_test = ABTest(alpha=0.05)
ab_test.record_conversion("A", True)
ab_test.record_conversion("A", False)
ab_test.record_conversion("B", True)
ab_test.record_conversion("B", False)
ab_test.record_conversion("B", True)
winner, p_value = ab_test.calculate_significance()
Output:
winner = "No Winner"
p_value ≈ 0.71
Explanation: Version B has a higher conversion rate (2/3 vs. 1/2), but with only five users in total the difference is nowhere near significant: the z-statistic is about -0.37, giving a two-sided p-value of roughly 0.71, well above alpha = 0.05.
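Working the z-statistic formula from the problem statement through this data by hand (using `math.erf` for the normal CDF):

```python
import math

# Example 1's data: A converted 1 of 2 users, B converted 2 of 3.
p1, n1 = 1 / 2, 2
p2, n2 = 2 / 3, 3
p_hat = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled rate = 3/5
z = (p1 - p2) / math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(round(z, 3), round(p_value, 3))  # -> -0.373 0.709
```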
Example 2:
Input:
ab_test = ABTest(alpha=0.05)
ab_test.record_conversion("A", True)
ab_test.record_conversion("A", False)
ab_test.record_conversion("B", False)
ab_test.record_conversion("B", False)
winner, p_value = ab_test.calculate_significance()
Output:
winner = "No Winner"
p_value ≈ 0.25
Explanation: Version A has a higher conversion rate (1/2 vs. 0/2), but the difference is not statistically significant given the tiny sample: the z-statistic is about 1.15, giving a two-sided p-value of roughly 0.25.
Example 3 (Edge Case: Zero Conversions):
Input:
ab_test = ABTest(alpha=0.05)
ab_test.record_conversion("A", False)
ab_test.record_conversion("A", False)
ab_test.record_conversion("B", True)
ab_test.record_conversion("B", False)
winner, p_value = ab_test.calculate_significance()
Output:
winner = "No Winner"
p_value ≈ 0.25
Explanation: Version B has one conversion while version A has none, but with only two users per version the difference is not significant (z ≈ -1.15, p ≈ 0.25). The point of this edge case is that version A's zero conversions must not cause a division-by-zero error in the calculation.
Constraints
- `alpha` must be a float between 0.0 and 1.0.
- `version` must be either "A" or "B".
- `converted` must be a boolean (True or False).
- The number of users exposed to each version will be at least 1.
- The code should be reasonably efficient for a moderate number of conversions (e.g., up to 1000 conversions total).
Notes
- You can use the `scipy.stats` library for calculating the z-statistic and p-value if desired, but it's not strictly required; implementing the formula directly is also acceptable.
- Focus on the core logic of A/B testing and statistical significance. Error handling and input validation can be kept relatively simple.
- Consider how to handle edge cases like zero conversions gracefully to avoid errors.
- The goal is to demonstrate your understanding of A/B testing principles and your ability to implement them in Python.
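If you do reach for `scipy.stats` (an optional dependency, as noted above), the survival function of the standard normal converts a z-statistic to a two-sided p-value in one line:

```python
from scipy.stats import norm

z = -0.3727  # e.g. the z-statistic for Example 1's data
p_value = 2 * norm.sf(abs(z))  # two-sided p-value from the standard normal
print(round(p_value, 3))  # -> 0.709
```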