Hone logo
Hone
Problems

Automated Backup System

Imagine you are tasked with building a crucial component for a data management system: an automated backup utility. This utility needs to efficiently select files for backup based on a defined strategy, ensuring that only necessary data is copied, saving storage space and time.

Problem Description

You need to implement a Python function select_files_for_backup that takes a list of available files and a backup strategy, and returns a list of files that should be backed up. The backup strategy is defined by a set of rules.

The function should support the following backup strategies:

  1. Full Backup: Back up all files.
  2. Incremental Backup: Back up only files that have been modified since the last backup. For this problem, we'll assume a simplified scenario: a file is considered modified if its last modification timestamp is newer than a given last_backup_timestamp.
  3. Differential Backup: Back up all files that have been modified since the most recent full backup. For this problem, we'll assume a simplified scenario: a file is considered modified if its last modification timestamp is newer than a given last_full_backup_timestamp.

Key Requirements:

  • The function select_files_for_backup should accept three arguments:
    • available_files: A list of dictionaries, where each dictionary represents a file and contains:
      • name: A string, the name of the file.
      • size: An integer, the size of the file in bytes.
      • last_modified: An integer, a Unix timestamp representing the last modification time.
    • strategy: A string, representing the backup strategy ('full', 'incremental', or 'differential').
    • last_backup_timestamp: An integer (Unix timestamp). This is only relevant for 'incremental' backups. If not provided or None, it implies no previous incremental backup has occurred.
    • last_full_backup_timestamp: An integer (Unix timestamp). This is only relevant for 'differential' backups. If not provided or None, it implies no previous full backup has occurred.
  • The function should return a list of file names (strings) that are selected for backup.
  • The order of files in the output list does not matter.

Edge Cases to Consider:

  • An empty available_files list.
  • The strategy string being an unrecognized value (though for this challenge, assume valid inputs).
  • last_backup_timestamp and last_full_backup_timestamp being None for incremental and differential backups respectively.

Examples

Example 1: Full Backup

Input:
available_files = [
    {'name': 'document.txt', 'size': 1024, 'last_modified': 1678886400}, # March 15, 2023 12:00:00 PM UTC
    {'name': 'image.jpg', 'size': 2048, 'last_modified': 1678886500},   # March 15, 2023 12:01:40 PM UTC
    {'name': 'config.ini', 'size': 512, 'last_modified': 1678886600}     # March 15, 2023 12:03:20 PM UTC
]
strategy = 'full'
last_backup_timestamp = None # Not used for full backup
last_full_backup_timestamp = None # Not used for full backup

Output: ['document.txt', 'image.jpg', 'config.ini']
Explanation: A full backup strategy selects all available files.

Example 2: Incremental Backup

Input:
available_files = [
    {'name': 'document.txt', 'size': 1024, 'last_modified': 1678886400}, # March 15, 2023 12:00:00 PM UTC
    {'name': 'image.jpg', 'size': 2048, 'last_modified': 1678972800},   # March 16, 2023 12:00:00 PM UTC
    {'name': 'report.pdf', 'size': 3072, 'last_modified': 1679059200}    # March 17, 2023 12:00:00 PM UTC
]
strategy = 'incremental'
last_backup_timestamp = 1678972799 # March 16, 2023 11:59:59 AM UTC (just before image.jpg was modified)
last_full_backup_timestamp = None # Not used for incremental backup

Output: ['image.jpg', 'report.pdf']
Explanation: 'document.txt' was last modified before the 'last_backup_timestamp'. 'image.jpg' and 'report.pdf' were modified after it, so they are selected for backup.

Example 3: Differential Backup

Input:
available_files = [
    {'name': 'script.py', 'size': 768, 'last_modified': 1678886400},    # March 15, 2023 12:00:00 PM UTC
    {'name': 'data.csv', 'size': 4096, 'last_modified': 1679059200},     # March 17, 2023 12:00:00 PM UTC
    {'name': 'archive.zip', 'size': 8192, 'last_modified': 1679145600}   # March 18, 2023 12:00:00 PM UTC
]
strategy = 'differential'
last_backup_timestamp = None # Not used for differential backup
last_full_backup_timestamp = 1678972799 # March 16, 2023 11:59:59 AM UTC (timestamp of the last full backup)

Output: ['data.csv', 'archive.zip']
Explanation: 'script.py' was last modified before the 'last_full_backup_timestamp'. 'data.csv' and 'archive.zip' were modified after it, so they are selected for backup.

Example 4: Edge Case - No Previous Backup

Input:
available_files = [
    {'name': 'notes.md', 'size': 256, 'last_modified': 1678886400}
]
strategy = 'incremental'
last_backup_timestamp = None
last_full_backup_timestamp = None

Output: ['notes.md']
Explanation: When 'last_backup_timestamp' is None for an incremental backup, all files are considered new or modified.

Constraints

  • The available_files list will contain at most 1000 files.
  • File names will be unique strings.
  • File sizes will be non-negative integers.
  • Timestamps will be non-negative integers representing Unix timestamps.
  • The strategy string will be one of 'full', 'incremental', or 'differential'.
  • The solution should aim for a time complexity of O(N), where N is the number of available files, as it involves iterating through the files at most once.

Notes

  • You will need to implement the select_files_for_backup function.
  • For incremental backups, if last_backup_timestamp is None, consider it as if no previous incremental backup has ever been made. In this case, all files should be selected.
  • Similarly, for differential backups, if last_full_backup_timestamp is None, consider it as if no previous full backup has ever been made. All files should be selected.
  • The definition of "modified" for incremental and differential backups is solely based on comparing the file's last_modified timestamp with the provided timestamp parameters.
  • Consider how to handle the last_backup_timestamp and last_full_backup_timestamp arguments when they are not relevant to the chosen strategy.
Loading editor...
python