
Python Statistics Collector

This challenge asks you to build a flexible statistics collector in Python. The collector should be able to accept a stream of numerical data and calculate various statistical measures like mean, median, standard deviation, minimum, and maximum. This is a common task in data analysis and provides a good exercise in working with numerical data and implementing statistical algorithms.

Problem Description

You are tasked with creating a StatisticsCollector class in Python. This class should be initialized with an empty list of data points. It should provide the following methods:

  • add(value): Adds a single numerical value to the internal data store. The value should be a number (int or float).
  • mean(): Calculates and returns the arithmetic mean (average) of all data points added so far. Returns None if no data points have been added.
  • median(): Calculates and returns the median of all data points added so far. Returns None if no data points have been added.
  • std_dev(): Calculates and returns the sample standard deviation of all data points added so far. Returns None if fewer than two data points have been added (standard deviation requires at least two values).
  • min(): Returns the minimum value among all data points added so far. Returns None if no data points have been added.
  • max(): Returns the maximum value among all data points added so far. Returns None if no data points have been added.
  • data(): Returns a copy of the internal list of data points.

The class should handle potential errors gracefully, such as attempting to calculate statistics on an empty dataset or adding non-numerical values.
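The "return None on an empty dataset" convention described above can be sketched with two small helpers. The function names mean_or_none and min_or_none are hypothetical and are not part of the required interface; in the actual class, the same checks would live inside mean() and min().

```python
from typing import List, Optional, Union

Number = Union[int, float]


def mean_or_none(values: List[Number]) -> Optional[float]:
    """Return the arithmetic mean, or None when there is no data."""
    if not values:
        return None
    return sum(values) / len(values)


def min_or_none(values: List[Number]) -> Optional[Number]:
    """Return the smallest value, or None when there is no data."""
    # Calling the built-in min() on an empty list raises ValueError,
    # so the empty case is checked first.
    return min(values) if values else None


print(mean_or_none([10, 20, 30]))  # 20.0
print(mean_or_none([]))            # None
```

Checking for the empty case before computing avoids a ZeroDivisionError in the mean and a ValueError from the built-in min() and max().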

Examples

Example 1:

Input:
collector = StatisticsCollector()
collector.add(10)
collector.add(20)
collector.add(30)

Output:
mean() -> 20.0
median() -> 20.0
std_dev() -> 10.0
min() -> 10
max() -> 30
data() -> [10, 20, 30]

Explanation: The collector is initialized and three values are added. The mean is (10+20+30)/3 = 20. The median is the middle value, 20. The sample standard deviation is sqrt(((10-20)^2 + (20-20)^2 + (30-20)^2) / (3-1)) = sqrt(200/2) = 10.
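The sample standard deviation used in this example can be sketched as a standalone function. The name sample_std_dev is an assumption made for illustration; in the collector, the same logic would sit inside std_dev().

```python
import math


def sample_std_dev(values):
    """Sample standard deviation (divides by n - 1), or None if n < 2."""
    n = len(values)
    if n < 2:
        return None
    mean = sum(values) / n
    # Sum of squared deviations from the mean.
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


print(sample_std_dev([10, 20, 30]))  # 10.0
print(sample_std_dev([5, 5, 5]))     # 0.0
```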

Example 2:

Input:
collector = StatisticsCollector()
collector.add(5)
collector.add(5)
collector.add(5)

Output:
mean() -> 5.0
median() -> 5.0
std_dev() -> 0.0
min() -> 5
max() -> 5
data() -> [5, 5, 5]

Explanation: All values are the same, so the standard deviation is 0.

Example 3: (Edge Case)

Input:
collector = StatisticsCollector()
collector.add(1)
collector.add(2)
collector.add(3)
collector.add(4)
collector.add(5)
collector.add(6)
collector.add(7)
collector.add(8)
collector.add(9)
collector.add(10)

Output:
mean() -> 5.5
median() -> 5.5
std_dev() -> 3.0276503540974917
min() -> 1
max() -> 10
data() -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

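Explanation: Demonstrates calculation with a larger dataset that has an even number of values. The median is the average of the two middle values: (5 + 6) / 2 = 5.5. The sample standard deviation is sqrt(82.5 / 9), which is approximately 3.0277.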
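One way to compute the median for both odd and even counts is sketched below. The free function median_of is a hypothetical name, and converting the odd-length result to float is an assumption made only to match the 20.0 and 5.0 outputs shown in the examples.

```python
def median_of(values):
    """Median of a list of numbers, or None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)  # sorted() returns a new list; the input is untouched
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: the single middle value.
        return float(ordered[mid])
    # Even count: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([10, 20, 30]))         # 20.0
print(median_of(list(range(1, 11))))   # 5.5
```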

Constraints

  • The add() method should raise a TypeError if a non-numerical value (not int or float) is passed.
  • All statistical methods (mean(), median(), std_dev(), min(), max()) should return None if the collector has no data points.
  • The std_dev() method should return None if the collector has fewer than two data points.
  • The data() method should return a copy of the internal data list, not the original list itself. This prevents external modification of the collector's internal state.
  • The standard deviation should be calculated using the sample standard deviation formula (dividing by n-1).
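The input-validation and copy constraints above can be sketched as follows. This is a partial outline rather than a full solution: the statistical methods are omitted, the attribute name _data is an assumption, and rejecting bool values is a design choice the problem statement does not specify (bool is a subclass of int in Python).

```python
class StatisticsCollector:
    """Partial sketch covering only add() and data()."""

    def __init__(self):
        self._data = []  # internal store, kept in insertion order

    def add(self, value):
        # Assumption: bool is rejected even though isinstance(True, int) is True.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(
                f"value must be int or float, got {type(value).__name__}"
            )
        self._data.append(value)

    def data(self):
        # Return a shallow copy so callers cannot mutate internal state.
        return list(self._data)


collector = StatisticsCollector()
collector.add(10)
snapshot = collector.data()
snapshot.append(999)     # modifies only the copy
print(collector.data())  # [10]
```

A shallow copy is enough here because the stored values are immutable numbers.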

Notes

  • Consider using Python's built-in functions such as sum() and sorted(), as well as the statistics module, but implement the core logic yourself rather than relying on the module for everything.
  • Think about how to efficiently calculate the median without sorting the entire dataset every time.
  • Pay close attention to edge cases, such as empty datasets and datasets with only one element.
  • Write clear, concise, and well-documented code. Good variable names are important.
  • Test your code thoroughly with various inputs, including edge cases.
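For the note about avoiding a full sort on every median() call, one possible approach is to keep a second list in sorted order as values arrive, using the standard library's bisect.insort. The class name SortedValues is hypothetical. A complete collector would also need a separate insertion-order list so that data() returns values in the order they were added.

```python
import bisect


class SortedValues:
    """Keeps values sorted on insert so the median is a direct lookup."""

    def __init__(self):
        self._sorted = []

    def add(self, value):
        # Binary search finds the position; the list insert still shifts
        # elements, so each add is O(n) in the worst case.
        bisect.insort(self._sorted, value)

    def median(self):
        n = len(self._sorted)
        if n == 0:
            return None
        mid = n // 2
        if n % 2 == 1:
            return float(self._sorted[mid])
        return (self._sorted[mid - 1] + self._sorted[mid]) / 2


values = SortedValues()
for x in (30, 10, 20):
    values.add(x)
print(values.median())  # 20.0
```

This trades a slightly more expensive add() for a constant-time median(). A pair of heaps (a max-heap for the lower half and a min-heap for the upper half) is another common option when insertions are very frequent.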