Python Statistics Calculator
This challenge requires you to build a Python function that can calculate various common statistical measures for a given list of numbers. This is a fundamental skill in data analysis, enabling you to understand the central tendency, dispersion, and distribution of data.
Problem Description
Your task is to create a Python function named calculate_statistics that accepts a list of numbers (integers or floats) and returns a dictionary containing the following statistical calculations:
- Mean: The average of the numbers.
- Median: The middle value when the numbers are sorted. If there's an even number of elements, it's the average of the two middle values.
- Mode: The number that appears most frequently in the list. If there are multiple numbers with the same highest frequency, return the smallest of them.
- Range: The difference between the maximum and minimum values.
- Variance: The average of the squared differences from the Mean. (Use the population variance formula $\frac{\sum (x_i - \mu)^2}{N}$, where $\mu$ is the mean and $N$ is the number of elements.)
- Standard Deviation: The square root of the Variance.
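The median and mode definitions above are the two with branching rules, so a small sketch may help. The helper names median_of and mode_of are hypothetical (they are not part of the challenge), and both assume a non-empty list:

```python
from collections import Counter


def median_of(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: a single middle element, returned as a float for consistency.
        return float(ordered[mid])
    # Even count: average the two elements around the midpoint.
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode_of(values):
    """Most frequent value; ties are broken by returning the smallest candidate."""
    counts = Counter(values)
    highest = max(counts.values())
    return min(value for value, count in counts.items() if count == highest)
```

For instance, mode_of([3, 3, 2, 2, 5]) picks 2, because 2 and 3 tie at two occurrences each and 2 is the smaller.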
Key Requirements:
- The function must accept a single argument, data (a list of numbers).
- The function must return a dictionary where keys are the names of the statistics (e.g., "mean", "median") and values are their calculated results.
- Handle potential edge cases such as empty input lists or lists with a single element.
Expected Behavior:
- For a non-empty list of numbers, the function should return a dictionary with all calculated statistics.
- If the input list is empty, the function should return an empty dictionary.
Edge Cases:
- Empty List: An empty list should result in an empty dictionary.
- Single Element List: For a list with a single element, the mean, median, and mode should equal that element, while the range, variance, and standard deviation should all be 0.
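One possible shape for the whole function is sketched below. It is not a reference solution. It assumes the dictionary keys used in the examples ("mean", "median", "mode", "range", "variance", "standard_deviation") and that the median is always returned as a float:

```python
import math
from collections import Counter


def calculate_statistics(data):
    """Return mean, median, mode, range, population variance and standard deviation."""
    if not data:
        # Empty input: the challenge asks for an empty dictionary.
        return {}

    n = len(data)
    ordered = sorted(data)
    mean = sum(ordered) / n

    # Median: middle element, or the average of the two middle elements.
    mid = n // 2
    if n % 2 == 1:
        median = float(ordered[mid])
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2

    # Mode: smallest value among those with the highest frequency.
    counts = Counter(ordered)
    highest = max(counts.values())
    mode = min(value for value, count in counts.items() if count == highest)

    # Population variance: divide by n, not n - 1.
    variance = sum((x - mean) ** 2 for x in ordered) / n

    return {
        "mean": mean,
        "median": median,
        "mode": mode,
        "range": ordered[-1] - ordered[0],
        "variance": variance,
        "standard_deviation": math.sqrt(variance),
    }
```

No special branch is needed for a single element: the general formulas already give a range, variance, and standard deviation of 0.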
Examples
Example 1:
Input: [1, 2, 3, 4, 5]
Output: {'mean': 3.0, 'median': 3.0, 'mode': 1, 'range': 4, 'variance': 2.0, 'standard_deviation': 1.4142135623730951}
Explanation:
- Mean: (1+2+3+4+5)/5 = 3.0
- Median: The middle element of [1, 2, 3, 4, 5] is 3, returned as the float 3.0.
- Mode: All numbers appear once. The smallest is 1.
- Range: 5 - 1 = 4
- Variance: ((1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2) / 5 = (4 + 1 + 0 + 1 + 4) / 5 = 10 / 5 = 2.0
- Standard Deviation: sqrt(2.0) ≈ 1.414
Example 2:
Input: [10, 20, 10, 30, 20, 10]
Output: {'mean': 16.666666666666668, 'median': 15.0, 'mode': 10, 'range': 20, 'variance': 55.5556, 'standard_deviation': 7.4536} (variance and standard deviation rounded to four decimal places)
Explanation:
- Mean: (10+20+10+30+20+10)/6 = 100/6 ≈ 16.67
- Median: The sorted list is [10, 10, 10, 20, 20, 30], so the median is (10+20)/2 = 15.0
- Mode: 10 appears 3 times, 20 appears 2 times, 30 appears once. 10 is the mode.
- Range: 30 - 10 = 20
- Variance: (3 × (10 - 16.67)^2 + 2 × (20 - 16.67)^2 + (30 - 16.67)^2) / 6 ≈ (133.33 + 22.22 + 177.78) / 6 = 333.33 / 6 ≈ 55.56
- Standard Deviation: sqrt(55.56) ≈ 7.45
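As a cross-check on the variance for this input, the arithmetic can be done exactly with the standard library's fractions.Fraction, which avoids floating-point rounding entirely. This is only a verification sketch, not part of the required solution:

```python
from fractions import Fraction

data = [10, 20, 10, 30, 20, 10]
n = len(data)

# Exact mean as a rational number: 100/6 reduces to 50/3.
mean = Fraction(sum(data), n)

# Exact population variance: the sum of squared deviations divided by n.
variance = sum((x - mean) ** 2 for x in data) / n

print(mean, variance)   # 50/3 500/9
print(float(variance))  # roughly 55.56
```

The exact value 500/9 is about 55.56, and its square root is about 7.45.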
Example 3: (Edge Case: Empty List)
Input: []
Output: {}
Explanation: An empty input list should result in an empty dictionary.
Example 4: (Edge Case: Single Element List)
Input: [7]
Output: {'mean': 7.0, 'median': 7.0, 'mode': 7, 'range': 0, 'variance': 0.0, 'standard_deviation': 0.0}
Explanation: For a single element, the mean, median, and mode are that element, while the range, variance, and standard deviation are all 0.
Constraints
- The input list data will contain only numbers (integers or floats).
- The number of elements in the data list can range from 0 to 1000.
- The values of the numbers in the data list will be between -1000 and 1000.
- Your solution should aim for reasonable efficiency, but explicit time complexity constraints are not imposed beyond what's generally expected for list operations.
Notes
- You are encouraged to implement the calculations yourself rather than relying on external libraries like NumPy or SciPy, unless you want to compare your results. Standard Python libraries like math for sqrt are acceptable.
- Pay close attention to the definition of mode when there are multiple numbers with the same highest frequency.
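- Floating-point results such as the mean, variance, and standard deviation may differ in their last digits depending on the order of operations, so compare them with a tolerance (for example, math.isclose) rather than exact equality.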
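If you do want to compare your results, the standard library's statistics module is one option that needs no external dependency. The sketch below builds reference values for the second example's input. The names reference and matches_reference are hypothetical, and the tolerances are arbitrary choices:

```python
import math
import statistics

data = [10, 20, 10, 30, 20, 10]

reference = {
    "mean": statistics.fmean(data),
    "median": statistics.median(data),
    # statistics.mode returns the first mode encountered, so take the
    # smallest of multimode() to match this challenge's tie-breaking rule.
    "mode": min(statistics.multimode(data)),
    "range": max(data) - min(data),
    # pvariance and pstdev are the population (divide-by-N) versions.
    "variance": statistics.pvariance(data),
    "standard_deviation": statistics.pstdev(data),
}


def matches_reference(result, expected, rel_tol=1e-9):
    """True if every expected statistic is present and equal within tolerance."""
    return all(
        key in result and math.isclose(result[key], value, rel_tol=rel_tol, abs_tol=1e-12)
        for key, value in expected.items()
    )
```

Calling matches_reference(calculate_statistics(data), reference) would then check a hand-written implementation without relying on exact float equality.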