A/B Testing Framework in Python
A/B testing is a crucial technique for optimizing websites, applications, and marketing campaigns. This challenge asks you to build a simplified A/B testing framework in Python that can track conversions for two different versions (A and B) and determine which version performs better based on statistical significance. This framework will help you simulate the core logic of A/B testing and understand the principles behind it.
Problem Description
You are tasked with creating a Python class called ABTest. This class should allow you to record conversions for two versions (A and B) and then calculate a simple statistical significance score to determine which version is performing better.
Key Requirements:
- Initialization: The `ABTest` class should be initialized with a significance level (`alpha`). This represents the probability of incorrectly rejecting the null hypothesis (i.e., concluding there's a difference when there isn't). A common value for `alpha` is 0.05.
- Record Conversion: The class should have a method called `record_conversion(version, converted)`, where `version` is either "A" or "B" and `converted` is a boolean indicating whether the user converted (True) or not (False).
- Calculate Significance: The class should have a method called `calculate_significance()` that calculates the z-statistic and p-value to determine the statistical significance between the two versions. The z-statistic is calculated as `(p1 - p2) / sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2))`, where:
  - `p1` is the conversion rate for version A.
  - `p2` is the conversion rate for version B.
  - `n1` is the number of users exposed to version A.
  - `n2` is the number of users exposed to version B.
  - `p_hat` is the pooled conversion rate, `(p1 * n1 + p2 * n2) / (n1 + n2)`.
- Determine Winner: The `calculate_significance()` method should return a tuple `(winner, p_value)`. `winner` should be "A", "B", or "No Winner" depending on whether version A, version B, or neither has a statistically significant advantage (p-value < `alpha`); `p_value` is the calculated p-value.
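One possible shape for the class (a minimal sketch, not a reference solution — it uses `math.erf` to build the standard normal CDF so no third-party packages are needed):

```python
import math


class ABTest:
    """Tracks conversions for two variants and runs a pooled two-proportion z-test."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.counts = {"A": {"users": 0, "conversions": 0},
                       "B": {"users": 0, "conversions": 0}}

    def record_conversion(self, version, converted):
        self.counts[version]["users"] += 1
        if converted:
            self.counts[version]["conversions"] += 1

    def calculate_significance(self):
        n1 = self.counts["A"]["users"]
        n2 = self.counts["B"]["users"]
        c1 = self.counts["A"]["conversions"]
        c2 = self.counts["B"]["conversions"]
        p1 = c1 / n1
        p2 = c2 / n2
        p_hat = (c1 + c2) / (n1 + n2)  # pooled conversion rate
        se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
        if se == 0:
            # p_hat is 0 or 1 (e.g. zero conversions in both versions):
            # no evidence of a difference, and dividing would crash.
            return ("No Winner", 1.0)
        z = (p1 - p2) / se
        # Two-sided p-value from the standard normal CDF via math.erf.
        p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
        if p_value < self.alpha:
            winner = "A" if p1 > p2 else "B"
        else:
            winner = "No Winner"
        return (winner, p_value)
```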
Expected Behavior:
The ABTest class should accurately track conversions and calculate the statistical significance between the two versions. The calculate_significance() method should return the correct winner and p-value based on the data collected and the specified significance level.
Edge Cases to Consider:
- Zero Conversions: Handle cases where one or both versions have zero conversions. This can lead to division by zero errors.
- Small Sample Sizes: Statistical significance is harder to achieve with small sample sizes.
- Equal Conversion Rates: If the conversion rates are exactly equal, the z-statistic is 0 and the p-value is 1, so the winner should be "No Winner".
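As a sketch of how the zero-conversion case might be guarded (function name and shape here are illustrative, not prescribed by the problem): the pooled standard error is zero exactly when `p_hat` is 0 or 1, so checking it before dividing avoids the crash.

```python
import math


def z_statistic(c1, n1, c2, n2):
    """Pooled two-proportion z-statistic for (conversions, users) of A and B.

    Returns 0.0 when the pooled standard error is zero, e.g. when
    neither version has any conversions, instead of dividing by zero.
    """
    p1, p2 = c1 / n1, c2 / n2
    p_hat = (c1 + c2) / (n1 + n2)
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se


print(z_statistic(0, 2, 0, 2))  # zero conversions on both sides -> 0.0, not a crash
print(z_statistic(1, 2, 2, 3))  # -> about -0.373
```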
Examples
Example 1:
Input:
ab_test = ABTest(alpha=0.05)
ab_test.record_conversion("A", True)
ab_test.record_conversion("A", False)
ab_test.record_conversion("B", True)
ab_test.record_conversion("B", False)
ab_test.record_conversion("B", True)
winner, p_value = ab_test.calculate_significance()
Output:
winner = "No Winner"
p_value ≈ 0.71
Explanation: Version B has a higher conversion rate (2/3 vs. 1/2), but with only five users in total the difference is nowhere near significant: the z-statistic is about -0.37, giving a two-sided p-value of roughly 0.71, well above alpha = 0.05.
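Working the z-statistic formula from the problem statement through this data by hand (using `math.erf` for the normal CDF):

```python
import math

# Example 1's data: A converted 1 of 2 users, B converted 2 of 3.
p1, n1 = 1 / 2, 2
p2, n2 = 2 / 3, 3
p_hat = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled rate = 3/5
z = (p1 - p2) / math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(round(z, 3), round(p_value, 3))  # -> -0.373 0.709
```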
Example 2:
Input:
ab_test = ABTest(alpha=0.05)
ab_test.record_conversion("A", True)
ab_test.record_conversion("A", False)
ab_test.record_conversion("B", False)
ab_test.record_conversion("B", False)
winner, p_value = ab_test.calculate_significance()
Output:
winner = "No Winner"
p_value ≈ 0.25
Explanation: Version A has a higher conversion rate (1/2 vs. 0/2), but the difference is not statistically significant given the tiny sample: the z-statistic is about 1.15, giving a two-sided p-value of roughly 0.25.
Example 3 (Edge Case: Zero Conversions):
Input:
ab_test = ABTest(alpha=0.05)
ab_test.record_conversion("A", False)
ab_test.record_conversion("A", False)
ab_test.record_conversion("B", True)
ab_test.record_conversion("B", False)
winner, p_value = ab_test.calculate_significance()
Output:
winner = "No Winner"
p_value ≈ 0.25
Explanation: Version B has one conversion while version A has none, but with only two users per version the difference is not significant (z ≈ -1.15, p ≈ 0.25). The point of this edge case is that version A's zero conversions must not cause a division-by-zero error in the calculation.
Constraints
- `alpha` must be a float between 0.0 and 1.0.
- `version` must be either "A" or "B".
- `converted` must be a boolean (True or False).
- The number of users exposed to each version will be at least 1.
- The code should be reasonably efficient for a moderate number of conversions (e.g., up to 1000 conversions total).
Notes
- You can use the `scipy.stats` library for calculating the z-statistic and p-value if desired, but it's not strictly required; implementing the formula directly is also acceptable.
- Focus on the core logic of A/B testing and statistical significance. Error handling and input validation can be kept relatively simple.
- Consider how to handle edge cases like zero conversions gracefully to avoid errors.
- The goal is to demonstrate your understanding of A/B testing principles and your ability to implement them in Python.
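If you do reach for `scipy.stats` (an optional dependency, as noted above), the survival function of the standard normal converts a z-statistic to a two-sided p-value in one line:

```python
from scipy.stats import norm

z = -0.3727  # e.g. the z-statistic for Example 1's data
p_value = 2 * norm.sf(abs(z))  # two-sided p-value from the standard normal
print(round(p_value, 3))  # -> 0.709
```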