Hone logo
Hone
Problems

Python Data Archiver

Data archiving is a crucial process for managing large datasets, reducing storage costs, and improving system performance. This challenge requires you to build a Python system that can efficiently archive data from a source to a compressed archive file.

Problem Description

Your task is to implement a Python function, archive_data, that takes a list of file paths and an output archive file path as input. The function should read the content of each specified file, compress it, and store it within a single archive file. The archive format should be a common compression format, and the archiving process should handle potential errors gracefully.

Key Requirements:

  • Input: The function should accept two arguments:
    • file_paths: A list of strings, where each string is a valid path to a file that needs to be archived.
    • archive_path: A string representing the desired path for the output archive file.
  • Archiving Mechanism: You should use a standard Python library for creating compressed archives. Common choices include zipfile (for ZIP archives) or tarfile (for TAR and TAR.GZ archives).
  • File Inclusion: Each file specified in file_paths should be added to the archive. The name of the file within the archive should be its base name (e.g., if archiving /home/user/data/report.txt, it should be stored as report.txt inside the archive).
  • Error Handling:
    • If any of the input file_paths do not exist or are inaccessible, the function should log a warning for that specific file and continue archiving the rest.
    • If there are issues writing to the archive_path (e.g., permissions), the function should raise an appropriate exception.
  • Output: The function should return True if the archiving process completes successfully (even with warnings for individual files), and False if a critical error occurs during the archiving process itself (e.g., inability to create the archive file).

Expected Behavior:

When archive_data is called, it will create a compressed archive at archive_path. Each file from file_paths that exists and is readable will be added to this archive. Any non-existent or unreadable files will result in a logged warning, but the archiving of other files will proceed.

Edge Cases:

  • Empty file_paths list: The function should create an empty archive file.
  • Files with special characters in their names.
  • Large file sizes (consider memory usage if reading entire files at once, though most archiving libraries handle this efficiently).
  • Overwriting an existing archive file: The function should overwrite any existing archive at archive_path.

Examples

Example 1:

# Assume these files exist:
# /tmp/file1.txt containing "Hello"
# /tmp/file2.log containing "World"

file_list = ["/tmp/file1.txt", "/tmp/file2.log"]
archive_file = "/tmp/my_archive.zip"

# Call the function
# archive_data(file_list, archive_file)

# Expected state after execution:
# A file named /tmp/my_archive.zip exists.
# When unzipped, it contains:
# - file1.txt with content "Hello"
# - file2.log with content "World"

Explanation: The function will read file1.txt and file2.log, compress them, and store them in my_archive.zip.

Example 2:

# Assume these files exist:
# /tmp/config.ini containing "[settings]\nkey=value"

file_list = ["/tmp/config.ini", "/tmp/non_existent_file.txt"]
archive_file = "/tmp/config_archive.tar.gz"

# Call the function
# archive_data(file_list, archive_file)

# Expected state after execution:
# A file named /tmp/config_archive.tar.gz exists.
# When extracted, it contains:
# - config.ini with content "[settings]\nkey=value"
# A warning will be logged indicating that "/tmp/non_existent_file.txt" could not be found.

Explanation: The function archives config.ini. It logs a warning for non_existent_file.txt but continues to create the archive.

Example 3: (Edge Case: Empty input)

file_list = []
archive_file = "/tmp/empty_archive.zip"

# Call the function
# archive_data(file_list, archive_file)

# Expected state after execution:
# A file named /tmp/empty_archive.zip exists.
# When unzipped, it contains no files.

Explanation: An empty archive is created when no files are provided for archiving.

Constraints

  • The file_paths list will contain between 0 and 100 strings.
  • Each string in file_paths will be a valid file path on the system.
  • The archive_path will be a valid path where the script has write permissions.
  • The total size of all files to be archived will not exceed 1 GB.
  • The function should complete the archiving of up to 100 files within 60 seconds.

Notes

  • You are encouraged to use the zipfile module for creating ZIP archives. This module is part of Python's standard library.
  • Consider using the logging module to report warnings for inaccessible files.
  • Think about how you will open and read files efficiently.
  • The base name of a file (e.g., "report.txt" from "/path/to/report.txt") is what should be stored inside the archive. You can use os.path.basename() for this.
  • Ensure your function handles FileNotFoundError and PermissionError gracefully.
Loading editor...
python