Hone logo
Hone
Problems

Implementing Full-Text Search in SQL

Full-text search allows users to search for documents (or rows in a database table) based on the words they contain, rather than exact matches. This is crucial for applications like e-commerce product search, content management systems, and knowledge bases where users need to find relevant information quickly. This challenge asks you to implement a basic full-text search functionality using SQL.

Problem Description

You are tasked with implementing full-text search capabilities on a table containing documents. The table has at least two columns: id (an integer representing a unique document identifier) and content (a text field containing the document's content). You need to write SQL queries that allow users to search for documents containing specific keywords or phrases. The search should return the id of the documents that match the search query.

Key Requirements:

  • Keyword Search: The query should return documents containing any of the specified keywords.
  • Phrase Search: The query should return documents containing the exact specified phrase.
  • Case-Insensitivity: The search should be case-insensitive.
  • Stop Word Handling (Optional): Ideally, the search should ignore common "stop words" (e.g., "the", "a", "is") to improve relevance, but this is not strictly required for a basic implementation.
  • Ranking (Optional): While not required, consider how you might rank results based on relevance (e.g., documents containing more keywords should rank higher).

Expected Behavior:

Given a search query (a string of keywords or a phrase), the SQL query should return a list of id values of documents whose content field contains the query. If no documents match, an empty list should be returned.

Edge Cases to Consider:

  • Empty search query: Should return all documents or an empty list (specify your choice).
  • Search query containing special characters: Handle these characters appropriately (e.g., escape them or remove them).
  • Very long search queries: Consider potential performance implications.
  • Documents with empty content: Should these be included in the search results?

Examples

Example 1:

Input: Table: `documents` (id INT, content TEXT)
documents:
| id | content                                  |
|----|-------------------------------------------|
| 1  | "The quick brown fox jumps over the lazy dog" |
| 2  | "A lazy cat sleeps in the sun"            |
| 3  | "Brown foxes are quick"                    |
Query: "lazy"
Output: [2, 1, 3]
Explanation: Documents 1, 2, and 3 all contain the word "lazy" (case-insensitive). The order doesn't matter.

Example 2:

Input: Table: `documents` (id INT, content TEXT)
documents:
| id | content                                  |
|----|-------------------------------------------|
| 1  | "The quick brown fox jumps over the lazy dog" |
| 2  | "A lazy cat sleeps in the sun"            |
| 3  | "Brown foxes are quick"                    |
Query: "quick brown fox"
Output: [1, 3]
Explanation: Only document 1 and 3 contain the exact phrase "quick brown fox".

Example 3:

Input: Table: `documents` (id INT, content TEXT)
documents:
| id | content                                  |
|----|-------------------------------------------|
| 1  | "The quick brown fox jumps over the lazy dog" |
| 2  | "A lazy cat sleeps in the sun"            |
| 3  | "Brown foxes are quick"                    |
Query: "" (empty string)
Output: [1, 2, 3]  (Assuming all documents are returned for an empty query)
Explanation:  An empty query matches all documents.  Alternatively, an empty list could be returned.

Constraints

  • The content field can contain up to 10000 characters.
  • The search query can contain up to 255 characters.
  • The database system should be able to handle a table with up to 10000 rows.
  • The query should execute within a reasonable time (e.g., less than 1 second) for a table with 10000 rows and a typical search query.
  • The SQL dialect should be compatible with standard SQL (e.g., PostgreSQL, MySQL, SQLite). Specify which dialect you are using.

Notes

  • Consider using SQL's built-in full-text search capabilities if available in your chosen database system (e.g., MATCH AGAINST in MySQL, tsvector and tsquery in PostgreSQL).
  • If your database system doesn't have built-in full-text search, you can use LIKE operator with wildcards (%) to implement a basic search. However, be aware that this approach can be less efficient and may not handle complex search requirements well.
  • Think about how to handle special characters in the search query to prevent SQL injection vulnerabilities.
  • For a more advanced implementation, you could explore techniques like stemming and lemmatization to improve search accuracy.
  • Clearly document the SQL dialect you are using and any assumptions you make about the database schema.
  • Focus on correctness and clarity of the SQL query. Performance optimization is a secondary consideration for this challenge.
Loading editor...
plaintext