Automated Backup System
Imagine you are tasked with building a crucial component for a data management system: an automated backup utility. This utility needs to efficiently select files for backup based on a defined strategy, ensuring that only necessary data is copied, saving storage space and time.
Problem Description
You need to implement a Python function select_files_for_backup that takes a list of available files and a backup strategy, and returns a list of files that should be backed up. The backup strategy is defined by a set of rules.
The function should support the following backup strategies:
- Full Backup: Back up all files.
- Incremental Backup: Back up only files that have been modified since the last backup. For this problem, we'll assume a simplified scenario: a file is considered modified if its last modification timestamp is newer than a given
last_backup_timestamp. - Differential Backup: Back up all files that have been modified since the most recent full backup. For this problem, we'll assume a simplified scenario: a file is considered modified if its last modification timestamp is newer than a given
last_full_backup_timestamp.
Key Requirements:
- The function
select_files_for_backupshould accept three arguments:available_files: A list of dictionaries, where each dictionary represents a file and contains:name: A string, the name of the file.size: An integer, the size of the file in bytes.last_modified: An integer, a Unix timestamp representing the last modification time.
strategy: A string, representing the backup strategy ('full', 'incremental', or 'differential').last_backup_timestamp: An integer (Unix timestamp). This is only relevant for 'incremental' backups. If not provided orNone, it implies no previous incremental backup has occurred.last_full_backup_timestamp: An integer (Unix timestamp). This is only relevant for 'differential' backups. If not provided orNone, it implies no previous full backup has occurred.
- The function should return a list of file names (strings) that are selected for backup.
- The order of files in the output list does not matter.
Edge Cases to Consider:
- An empty
available_fileslist. - The
strategystring being an unrecognized value (though for this challenge, assume valid inputs). last_backup_timestampandlast_full_backup_timestampbeingNonefor incremental and differential backups respectively.
Examples
Example 1: Full Backup
Input:
available_files = [
{'name': 'document.txt', 'size': 1024, 'last_modified': 1678886400}, # March 15, 2023 12:00:00 PM UTC
{'name': 'image.jpg', 'size': 2048, 'last_modified': 1678886500}, # March 15, 2023 12:01:40 PM UTC
{'name': 'config.ini', 'size': 512, 'last_modified': 1678886600} # March 15, 2023 12:03:20 PM UTC
]
strategy = 'full'
last_backup_timestamp = None # Not used for full backup
last_full_backup_timestamp = None # Not used for full backup
Output: ['document.txt', 'image.jpg', 'config.ini']
Explanation: A full backup strategy selects all available files.
Example 2: Incremental Backup
Input:
available_files = [
{'name': 'document.txt', 'size': 1024, 'last_modified': 1678886400}, # March 15, 2023 12:00:00 PM UTC
{'name': 'image.jpg', 'size': 2048, 'last_modified': 1678972800}, # March 16, 2023 12:00:00 PM UTC
{'name': 'report.pdf', 'size': 3072, 'last_modified': 1679059200} # March 17, 2023 12:00:00 PM UTC
]
strategy = 'incremental'
last_backup_timestamp = 1678972799 # March 16, 2023 11:59:59 AM UTC (just before image.jpg was modified)
last_full_backup_timestamp = None # Not used for incremental backup
Output: ['image.jpg', 'report.pdf']
Explanation: 'document.txt' was last modified before the 'last_backup_timestamp'. 'image.jpg' and 'report.pdf' were modified after it, so they are selected for backup.
Example 3: Differential Backup
Input:
available_files = [
{'name': 'script.py', 'size': 768, 'last_modified': 1678886400}, # March 15, 2023 12:00:00 PM UTC
{'name': 'data.csv', 'size': 4096, 'last_modified': 1679059200}, # March 17, 2023 12:00:00 PM UTC
{'name': 'archive.zip', 'size': 8192, 'last_modified': 1679145600} # March 18, 2023 12:00:00 PM UTC
]
strategy = 'differential'
last_backup_timestamp = None # Not used for differential backup
last_full_backup_timestamp = 1678972799 # March 16, 2023 11:59:59 AM UTC (timestamp of the last full backup)
Output: ['data.csv', 'archive.zip']
Explanation: 'script.py' was last modified before the 'last_full_backup_timestamp'. 'data.csv' and 'archive.zip' were modified after it, so they are selected for backup.
Example 4: Edge Case - No Previous Backup
Input:
available_files = [
{'name': 'notes.md', 'size': 256, 'last_modified': 1678886400}
]
strategy = 'incremental'
last_backup_timestamp = None
last_full_backup_timestamp = None
Output: ['notes.md']
Explanation: When 'last_backup_timestamp' is None for an incremental backup, all files are considered new or modified.
Constraints
- The
available_fileslist will contain at most 1000 files. - File names will be unique strings.
- File sizes will be non-negative integers.
- Timestamps will be non-negative integers representing Unix timestamps.
- The
strategystring will be one of 'full', 'incremental', or 'differential'. - The solution should aim for a time complexity of O(N), where N is the number of available files, as it involves iterating through the files at most once.
Notes
- You will need to implement the
select_files_for_backupfunction. - For incremental backups, if
last_backup_timestampisNone, consider it as if no previous incremental backup has ever been made. In this case, all files should be selected. - Similarly, for differential backups, if
last_full_backup_timestampisNone, consider it as if no previous full backup has ever been made. All files should be selected. - The definition of "modified" for incremental and differential backups is solely based on comparing the file's
last_modifiedtimestamp with the provided timestamp parameters. - Consider how to handle the
last_backup_timestampandlast_full_backup_timestamparguments when they are not relevant to the chosen strategy.