Identifying Unique Customers in Sales Transactions
You are tasked with analyzing sales data to understand your customer base. A common challenge is to identify how many unique customers have made purchases, rather than just counting the total number of transactions. This helps in understanding customer reach and avoids overcounting repeat customers.
Problem Description
You are given a table of sales transactions. Each row in this table represents a single sale and includes information such as the transaction_id, the customer_id who made the purchase, and the amount of the sale. Your goal is to write a query that returns a list of all the unique customer_ids present in the sales table. This means that if a customer has made multiple purchases, their customer_id should only appear once in the result.
Key Requirements:
- Retrieve a list of unique customer identifiers.
- Ensure that each
customer_idappears only once in the output.
Expected Behavior:
The output should be a single column containing customer_ids. If the input table contains duplicate customer_ids, only one instance of each customer_id should be returned.
Edge Cases:
- An empty
salestable should result in an empty output.
Examples
Example 1:
Input sales table:
| transaction_id | customer_id | amount |
|---|---|---|
| 101 | C001 | 50.00 |
| 102 | C002 | 75.00 |
| 103 | C001 | 25.00 |
| 104 | C003 | 100.00 |
| 105 | C002 | 30.00 |
Output:
| customer_id |
|---|
| C001 |
| C002 |
| C003 |
Explanation: Customer C001 and C002 appear multiple times in the input table. The output correctly lists each unique customer only once.
Example 2:
Input sales table:
| transaction_id | customer_id | amount |
|---|---|---|
| 201 | A123 | 10.00 |
| 202 | B456 | 20.00 |
| 203 | A123 | 15.00 |
Output:
| customer_id |
|---|
| A123 |
| B456 |
Explanation: Customer A123 made two transactions, but is listed only once in the output.
Example 3:
Input sales table:
| transaction_id | customer_id | amount |
|---|
Output:
| customer_id |
|---|
Explanation: An empty input table results in an empty output.
Constraints
- The
salestable will contain at least one column namedcustomer_id. - The
customer_idcolumn will store string or integer values representing customer identifiers. - The number of rows in the
salestable can range from 0 to 1,000,000. - The query should execute efficiently, aiming for a time complexity that scales well with the number of rows.
Notes
Think about SQL keywords that are specifically designed to eliminate duplicate rows from a result set. This keyword is typically used immediately after the SELECT statement.