Hone logo
Hone
Problems

SQL Data Replication: Ensuring Data Consistency Across Systems

Data replication is a crucial technique for ensuring high availability, fault tolerance, and improved performance in database systems. This challenge focuses on implementing a basic form of data replication where changes made to a primary table are mirrored to one or more replica tables. This ensures that even if the primary system experiences an issue, a consistent copy of the data is readily available.

Problem Description

Your task is to design and implement a mechanism to replicate data from a SourceTable to one or more ReplicaTables. When a record is inserted, updated, or deleted in the SourceTable, the corresponding operation must be performed on all ReplicaTables to maintain data consistency.

Key Requirements:

  • Insert Replication: When a new record is added to SourceTable, it should be inserted into every ReplicaTable.
  • Update Replication: When an existing record in SourceTable is modified, the corresponding record in every ReplicaTable should be updated with the new values.
  • Delete Replication: When a record is removed from SourceTable, it should also be removed from every ReplicaTable.
  • Schema Consistency: The ReplicaTables must have the same schema (columns and data types) as the SourceTable.
  • Atomicity (Conceptual): Ideally, replication should be an atomic operation, meaning all replica updates succeed or fail together. For this challenge, we'll simulate this by ensuring operations are executed in a transactional manner if your chosen SQL dialect supports it, or by attempting to apply all changes sequentially.

Expected Behavior:

After performing an operation on the SourceTable, querying the ReplicaTables should yield identical results to querying the SourceTable.

Edge Cases to Consider:

  • No changes: If no operation is performed on SourceTable, ReplicaTables should remain unchanged.
  • Multiple operations: How does the replication handle a sequence of inserts, updates, and deletes?
  • Concurrent operations (conceptual): While we won't simulate true concurrency for this challenge, consider how your approach might fare if multiple operations were batched.

Examples

Example 1:

SourceTable (before operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 100   |
+----+----------+-------+

ReplicaTable1 (before operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 100   |
+----+----------+-------+

ReplicaTable2 (before operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 100   |
+----+----------+-------+

Operation: INSERT INTO SourceTable (id, name, value) VALUES (2, 'Beta', 200);

SourceTable (after operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 100   |
| 2  | Beta     | 200   |
+----+----------+-------+

ReplicaTable1 (after operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 100   |
| 2  | Beta     | 200   |
+----+----------+-------+

ReplicaTable2 (after operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 100   |
| 2  | Beta     | 200   |
+----+----------+-------+

Explanation: The new record from SourceTable was successfully inserted into both ReplicaTable1 and ReplicaTable2.

Example 2:

SourceTable (before operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 100   |
| 2  | Beta     | 200   |
+----+----------+-------+

ReplicaTable1 (before operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 100   |
| 2  | Beta     | 200   |
+----+----------+-------+

Operation: UPDATE SourceTable SET value = 150 WHERE id = 1;

SourceTable (after operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 150   |
| 2  | Beta     | 200   |
+----+----------+-------+

ReplicaTable1 (after operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 150   |
| 2  | Beta     | 200   |
+----+----------+-------+

Explanation: The update operation on SourceTable was mirrored to ReplicaTable1, reflecting the change in the 'value' for id 1.

Example 3 (Delete Operation):

SourceTable (before operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 150   |
| 2  | Beta     | 200   |
+----+----------+-------+

ReplicaTable1 (before operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 150   |
| 2  | Beta     | 200   |
+----+----------+-------+

Operation: DELETE FROM SourceTable WHERE id = 2;

SourceTable (after operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 150   |
+----+----------+-------+

ReplicaTable1 (after operation):
+----+----------+-------+
| id | name     | value |
+----+----------+-------+
| 1  | Alpha    | 150   |
+----+----------+-------+

Explanation: The deletion of record with id 2 from SourceTable was also performed on ReplicaTable1.

Constraints

  • Table Count: You will be working with one SourceTable and at least two ReplicaTables.
  • Primary Key: Assume the SourceTable has a single-column primary key (e.g., id). This key will be used to identify records for updates and deletes.
  • Data Types: All columns in ReplicaTables will match the data types of their corresponding columns in SourceTable.
  • Implementation: You are expected to use SQL INSERT, UPDATE, and DELETE statements. Triggers, stored procedures, or other database-specific features are encouraged if they simplify the implementation of the replication logic.
  • No Network Latency Simulation: For this challenge, you do not need to simulate network latency or handle asynchronous replication. The operations on replica tables should occur immediately after the operation on the source table.

Notes

  • Consider how to handle the case where a record might not exist in a replica table during an update or delete operation on the source table. Your implementation should gracefully handle this.
  • The most common approaches involve using database triggers or log-based replication. For this challenge, you can explore implementing replication using either triggers or a procedural approach that queries the source table and applies changes to replicas.
  • Think about how to manage multiple replica tables efficiently. You want to avoid redundant code and ensure that the replication logic can easily scale to include more replica tables in the future.
  • Success will be measured by the consistency of data between the SourceTable and all ReplicaTables after a series of operations.
Loading editor...
plaintext