
Concurrent Web Page Fetcher

Asynchronous I/O is crucial for building efficient and responsive applications, especially when dealing with network operations. This challenge asks you to implement a concurrent web page fetcher in Go using goroutines and channels to download multiple web pages simultaneously, demonstrating your understanding of asynchronous programming and Go's concurrency features.

Problem Description

You are tasked with creating a program that fetches the content of multiple web pages concurrently. The program should take a list of URLs as input and download the content of each page in parallel. The fetched content (as a string) and the URL from which it was fetched should be sent to a channel. The main function should then read from this channel and print the URL and the length of the fetched content. Error handling is essential; if a URL cannot be fetched, the error should be printed to standard error, and the program should continue processing other URLs.

Key Requirements:

  • Concurrency: Utilize goroutines to fetch each URL concurrently.
  • Channels: Employ channels to communicate the fetched content and URLs between the goroutines and the main function.
  • Error Handling: Gracefully handle errors during the fetching process and print them to standard error.
  • Output: Print the URL and the length of the fetched content for each successfully fetched page.
  • Efficiency: The program should be designed to maximize concurrency and minimize overall execution time.

Expected Behavior:

The program should accept a slice of URLs as input. It should then launch a goroutine for each URL to fetch its content. The fetched content and the corresponding URL should be sent to a channel. The main function should read from the channel, print the URL and the length of the content, and handle any errors encountered during fetching. The program should complete when all URLs have been processed.

Edge Cases to Consider:

  • Invalid URLs: Handle cases where the provided URLs are malformed or unreachable.
  • Network Errors: Account for potential network errors such as timeouts, connection refused, and DNS resolution failures.
  • Empty Input: Handle the case where the input slice of URLs is empty.
  • Large Number of URLs: Consider the potential for resource exhaustion if a very large number of URLs are provided.

Examples

Example 1:

Input: ["https://www.example.com", "https://www.google.com"]
Output:
www.example.com: 1291
www.google.com: 1379

Explanation: The program fetches the content of example.com and google.com concurrently. It then prints the URL and the length of the content for each.

Example 2:

Input: ["https://www.example.com", "https://invalid-url.com"]
Output:
www.example.com: 1291
2023/10/27 10:00:00 Error fetching https://invalid-url.com: Get "https://invalid-url.com": dial tcp 192.0.2.1:443: connect: connection refused

Explanation: The program fetches example.com successfully. It attempts to fetch invalid-url.com, but encounters a connection error. The error is printed to standard error, and the program continues.

Example 3:

Input: []
Output: (No output)

Explanation: The input slice is empty. The program completes without any errors or output.

Constraints

  • The program should be able to handle at least 10 URLs concurrently without significant performance degradation.
  • The program should gracefully handle network errors and continue processing other URLs.
  • The fetched content should be treated as a string.
  • The URL input will be a slice of strings.
  • The program should complete within a reasonable time (e.g., less than 10 seconds) for a list of 20 URLs.

Notes

  • The net/http package is the recommended way to fetch web pages in Go.
  • Consider using a sync.WaitGroup to ensure that all goroutines complete before the program exits.
  • Channels are essential for safely communicating data between goroutines.
  • Error handling is critical for robustness. Use defer to ensure resources are cleaned up properly.
  • Think about how to structure your code to make it modular and easy to understand. Consider creating a separate function for fetching the content of a single URL.