
Extracting Product Information from an E-commerce Website

Web scraping is a powerful technique for automatically extracting data from websites. This challenge will test your ability to use Python and web scraping libraries to gather product information from a simplified e-commerce website. The goal is to build a script that retrieves product names, prices, and descriptions from a given URL and presents the data in a structured format.

Problem Description

You are tasked with creating a Python script that scrapes product data from a mock e-commerce website. The website has a consistent HTML structure for each product listing. Your script should:

  1. Take a URL as input: The URL points to a page containing a list of products.
  2. Fetch and parse the HTML: Use a suitable library (e.g., requests and BeautifulSoup4) to fetch the page at the URL and parse its HTML content.
  3. Locate Product Elements: Identify the HTML elements that contain the product name, price, and description. Assume each product listing follows a consistent pattern.
  4. Extract Data: Extract the text content of these elements for each product.
  5. Store Data: Store the extracted data (product name, price, description) in a list of dictionaries. Each dictionary represents a single product.
  6. Return the Data: Return the list of dictionaries.
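
The six steps above can be sketched as follows. This is a minimal sketch rather than a reference solution: the function names parse_products and scrape_products are hypothetical, the CSS class names are taken from the sample markup in the Notes section, and the 5-second timeout is an assumption based on the stated time limit.

```python
import requests
from bs4 import BeautifulSoup


def parse_products(html):
    """Turn a page of product listings into a list of dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # Assumption: each listing is a <div class="product"> whose children
    # carry the classes product-name, product-price, product-description.
    for item in soup.select("div.product"):
        record = {}
        for field in ("name", "price", "description"):
            element = item.select_one(f".product-{field}")
            # A missing element is stored as an empty string.
            record[field] = element.get_text(strip=True) if element else ""
        products.append(record)
    return products


def scrape_products(url):
    """Fetch url and return its products, or [] if the fetch fails."""
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.RequestException:
        return []
    return parse_products(response.text)
```

Keeping the parsing step separate from the network call means parse_products can be exercised on an HTML string without any network access.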

Expected Behavior:

The script should handle cases where the website is unavailable or where a listing deviates slightly from the expected HTML structure (e.g., a product has no description). A missing field should be stored as an empty string. If no products are found, or if an error occurs during scraping, the script should return an empty list rather than raise an exception.

Examples

Example 1:

Input: "https://www.example.com/products" (Assume this URL returns a page with two products)
Output:
[
    {
        "name": "Awesome T-Shirt",
        "price": "$25.00",
        "description": "A comfortable and stylish t-shirt."
    },
    {
        "name": "Cool Jeans",
        "price": "$75.00",
        "description": "Classic denim jeans for everyday wear."
    }
]
Explanation: The script successfully scraped the name, price, and description for each product on the page and returned them in a list of dictionaries.

Example 2:

Input: "https://www.example.com/products/error" (Assume this URL returns a 404 error)
Output: []
Explanation: The script handles the error and returns an empty list because it couldn't retrieve the page content.

Example 3:

Input: "https://www.example.com/products/missing_description" (Assume this URL returns a page with a single product that has no description element)
Output:
[
    {
        "name": "Product A",
        "price": "$10.00",
        "description": ""
    }
]
Explanation: The script handles the missing description gracefully by assigning an empty string to the description field.
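
The reason this case needs explicit handling is that BeautifulSoup's select_one returns None when nothing matches, so calling get_text on the result directly would raise an AttributeError. One possible way to guard against that is a small helper; the name text_or_empty is hypothetical, and the HTML snippet is an assumed stand-in for the page in Example 3.

```python
from bs4 import BeautifulSoup


def text_or_empty(parent, selector):
    """Return the stripped text of the first match, or "" if there is none."""
    element = parent.select_one(selector)
    return element.get_text(strip=True) if element is not None else ""


# Assumed markup for Example 3: the description paragraph is absent.
html = """
<div class="product">
    <h2 class="product-name">Product A</h2>
    <p class="product-price">$10.00</p>
</div>
"""

item = BeautifulSoup(html, "html.parser").select_one("div.product")
product = {
    "name": text_or_empty(item, ".product-name"),
    "price": text_or_empty(item, ".product-price"),
    "description": text_or_empty(item, ".product-description"),
}
print(product)
# {'name': 'Product A', 'price': '$10.00', 'description': ''}
```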

Constraints

  • URL Length: The input URL will be a string with a maximum length of 200 characters.
  • HTML Structure: Assume a consistent HTML structure for product listings, but the script should be robust enough to handle minor variations (e.g., missing description).
  • Error Handling: The script must handle potential errors such as network errors (e.g., website unavailable) and invalid HTML.
  • Library Usage: You are encouraged to use requests for fetching the HTML and BeautifulSoup4 for parsing. Other libraries are acceptable, but these are recommended.
  • Time Limit: The script should complete within 5 seconds.
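
As a rough illustration of how the URL length and time limit constraints might be enforced, the sketch below validates the URL before fetching and passes explicit timeouts to requests. The names is_acceptable_url and fetch_html, the http/https scheme check, and the 2-second/3-second split are all assumptions. Note that the requests read timeout limits the wait between bytes received, not the total download time, so this approximates the 5-second budget without strictly guaranteeing it.

```python
from urllib.parse import urlparse

import requests

MAX_URL_LENGTH = 200    # from the URL Length constraint
CONNECT_TIMEOUT = 2.0   # assumed split of the 5-second budget
READ_TIMEOUT = 3.0


def is_acceptable_url(url):
    """Check the length limit and require an http(s) URL with a host."""
    if not isinstance(url, str) or len(url) > MAX_URL_LENGTH:
        return False
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_html(url):
    """Return the page HTML as text, or None if it cannot be retrieved."""
    if not is_acceptable_url(url):
        return None
    try:
        response = requests.get(url, timeout=(CONNECT_TIMEOUT, READ_TIMEOUT))
        response.raise_for_status()  # a 404 or 500 counts as a failure
    except requests.RequestException:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return None
    return response.text
```

A caller can then return an empty list whenever fetch_html yields None.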

Notes

  • You will need to install the requests and beautifulsoup4 libraries: pip install requests beautifulsoup4
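  • For testing purposes, you can create a simple HTML file locally and serve it over HTTP (for example, by running python -m http.server in the file's directory), then use the resulting http://localhost URL as input. Note that requests does not fetch file:// URLs out of the box.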
  • Focus on creating a robust and well-structured script that handles potential errors gracefully.
  • The HTML structure of the mock website is as follows (example):
<!DOCTYPE html>
<html>
<head>
    <title>Products</title>
</head>
<body>
    <div class="product">
        <h2 class="product-name">Awesome T-Shirt</h2>
        <p class="product-price">$25.00</p>
        <p class="product-description">A comfortable and stylish t-shirt.</p>
    </div>
    <div class="product">
        <h2 class="product-name">Cool Jeans</h2>
        <p class="product-price">$75.00</p>
        <p class="product-description">Classic denim jeans for everyday wear.</p>
    </div>
</body>
</html>

(Note: This is just an example. The actual HTML structure you use for testing is up to you.)
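
One way to test against real HTTP responses without a live site is a small local server built from the standard library. The sketch below is an assumption about how such a fixture might look, not part of the challenge itself: the names SampleHandler and start_mock_site are hypothetical, the /products path mirrors Example 1, and every other path returns a 404 to mimic Example 2.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Abbreviated copy of the sample markup shown above.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<body>
    <div class="product">
        <h2 class="product-name">Awesome T-Shirt</h2>
        <p class="product-price">$25.00</p>
        <p class="product-description">A comfortable and stylish t-shirt.</p>
    </div>
</body>
</html>
"""


class SampleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/products":
            body = SAMPLE_HTML.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)  # any other path behaves like Example 2

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_mock_site():
    """Start the server on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), SampleHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"
```

After calling start_mock_site(), pass base_url + "/products" to your scraper, and call server.shutdown() followed by server.server_close() once the test is finished.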
