Sentiment Analysis Inference with a Pre-trained Model
This challenge focuses on implementing machine learning inference in JavaScript using a pre-trained sentiment analysis model. Sentiment analysis is a crucial task in natural language processing, allowing us to determine the emotional tone (positive, negative, or neutral) of a given text. You'll be provided with a simplified model representation and tasked with writing a function to perform inference on new text inputs.
Problem Description
You are given a simplified representation of a sentiment analysis model: a vocabulary, stored as an object that maps each known word to a sentiment score. The model predicts the sentiment of a text by averaging the scores of the words in it, with every word weighted equally and words outside the vocabulary scoring 0. Your task is to implement a function predictSentiment that takes a text string as input and returns a sentiment prediction (either "positive", "negative", or "neutral") based on the provided model.
Key Requirements:
- Tokenization: Split the input text into individual words (tokens).
- Sentiment Scoring: Look up the sentiment score for each word in the vocabulary. If a word is not found in the vocabulary, assign it a score of 0.
- Average Score: Calculate the average of the sentiment scores of all words in the text. Every word carries equal weight, so words that are not in the vocabulary (score 0) still count toward the number of words.
- Sentiment Prediction: Based on the average score, predict the sentiment as follows:
  - If the average score is greater than 0.2, predict "positive".
  - If the average score is less than -0.2, predict "negative".
  - Otherwise, predict "neutral".
- Case Insensitivity: The text should be converted to lowercase before processing.
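The tokenization and case-handling requirements above can be sketched as follows. This is a minimal illustration, not the only valid approach: the helper name tokenize is hypothetical, and it assumes that every character other than a-z should simply be dropped from each whitespace-separated token.

```javascript
// Hypothetical helper: lowercases the text, splits it on runs of whitespace,
// and strips every character that is not a-z from each token.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((token) => token.replace(/[^a-z]/g, ""))
    .filter((token) => token.length > 0); // drop tokens that were only punctuation
}

console.log(tokenize("This is a GREAT movie!"));
// [ 'this', 'is', 'a', 'great', 'movie' ]
```

Under this assumption a contraction such as "don't" becomes the single token "dont"; a tokenizer that splits on punctuation instead would produce two tokens and change the word count used in the average.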
Expected Behavior:
The predictSentiment function should accurately predict the sentiment of a given text based on the provided model. It should handle cases where words are not in the vocabulary gracefully (by assigning a score of 0). The function should be case-insensitive.
Edge Cases to Consider:
- Empty input text.
- Text containing punctuation and special characters (ignore these).
- Text containing words not present in the vocabulary.
- Text with a mix of positive and negative words.
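A short sketch of how these edge cases might play out when scoring. The helper names scoreOf and meanScore are hypothetical, and sampleScores repeats a few entries of the challenge vocabulary only so that the snippet runs on its own. One detail worth noting: a plain JavaScript object inherits keys such as "constructor" from Object.prototype, so an own-property check is a safer way to decide whether a word is in the vocabulary.

```javascript
// Subset of the challenge vocabulary, repeated so the sketch is self-contained.
const sampleScores = { good: 0.7, bad: -0.7, great: 0.8 };

// Hypothetical helper: score of one already-cleaned word, 0 when unknown.
function scoreOf(word, scores) {
  // Own-property check: inherited keys such as "constructor" are not vocabulary words.
  return Object.prototype.hasOwnProperty.call(scores, word) ? scores[word] : 0;
}

// Hypothetical helper: plain mean over every word, known or not.
function meanScore(words, scores) {
  if (words.length === 0) return 0; // empty input: nothing to average
  const total = words.reduce((sum, word) => sum + scoreOf(word, scores), 0);
  return total / words.length;
}

console.log(meanScore([], sampleScores));                       // 0
console.log(meanScore(["good", "but", "bad"], sampleScores));   // 0
console.log(meanScore(["great", "constructor"], sampleScores)); // 0.4
```

In the mixed case the positive and negative scores cancel, so the text would be classified as "neutral" even though it contains strongly scored words.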
Examples
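Example 1:
Input: "This is a great movie!"
Output: "neutral"
Explanation: Only "great" (0.8) is in the vocabulary; the other four words score 0. The average is 0.8 / 5 = 0.16, which does not exceed the 0.2 threshold, so the prediction is "neutral". A shorter text such as "Great movie!" averages 0.8 / 2 = 0.4 and is predicted "positive".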
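Example 2:
Input: "I am feeling very sad today."
Output: "neutral"
Explanation: Only "sad" (-0.6) is in the vocabulary; the other five words score 0. The average is -0.6 / 6 = -0.1, which is not below the -0.2 threshold, so the prediction is "neutral". A shorter text such as "So sad." averages -0.6 / 2 = -0.3 and is predicted "negative".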
Example 3:
Input: "The weather is okay."
Output: "neutral"
Explanation: The text contains "okay", which has a score close to zero. The overall average score will be close to zero, leading to a "neutral" prediction.
Example 4:
Input: ""
Output: "neutral"
Explanation: Empty input should result in a neutral prediction.
Constraints
- The input text will be a string.
- The vocabulary and sentiment scores will be provided as constants.
- The length of the input text can vary.
- The function must return one of the following strings: "positive", "negative", or "neutral".
- Performance is not a critical concern for this challenge. Focus on correctness and readability.
Notes
- You are provided with a simplified model. In a real-world scenario, you would likely use a more sophisticated model and a more robust tokenization process.
- Consider using regular expressions to remove punctuation and special characters from the input text.
- The provided vocabulary and sentiment scores are designed to be simple and illustrative.
- Remember to convert the input text to lowercase before processing.
- The sentiment scores are relative; the absolute values are less important than their signs and magnitudes relative to each other.
- The threshold values (0.2 and -0.2) for sentiment prediction can be adjusted as needed.
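One way to make the thresholds adjustable is to pass them in as a parameter. The sketch below is illustrative only: the helper name classify and its threshold parameter are hypothetical, and it assumes the positive and negative cut-offs stay symmetric around zero.

```javascript
// Hypothetical helper: maps an average score to a sentiment label.
// `threshold` defaults to the 0.2 used in this challenge.
function classify(averageScore, threshold = 0.2) {
  if (averageScore > threshold) return "positive";
  if (averageScore < -threshold) return "negative";
  return "neutral";
}

console.log(classify(0.16));      // neutral
console.log(classify(0.16, 0.1)); // positive
console.log(classify(-0.3));      // negative
```

Lowering the threshold makes the classifier more willing to commit to a label for long texts, where unknown words pull the average toward zero.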
// Simplified model: each known word maps to a sentiment score.
const vocabulary = {
  "great": 0.8,
  "good": 0.7,
  "amazing": 0.9,
  "excellent": 0.85,
  "bad": -0.7,
  "sad": -0.6,
  "terrible": -0.8,
  "awful": -0.75,
  "okay": 0.1,
  "neutral": 0.0,
  "happy": 0.75,
  "angry": -0.65
};

function predictSentiment(text) {
  // Empty input has no words to score.
  if (!text) {
    return "neutral";
  }

  // Lowercase for case-insensitive lookup, then split on runs of whitespace.
  const words = text.toLowerCase().split(/\s+/);

  let totalScore = 0;
  let wordCount = 0;
  for (const word of words) {
    // Strip punctuation, digits, and other non-letter characters.
    const cleanWord = word.replace(/[^a-z]/g, "");
    if (!cleanWord) {
      continue; // Nothing left after cleaning, so the token is not a word.
    }
    // Own-property check: inherited keys such as "constructor" must score 0.
    const isKnown = Object.prototype.hasOwnProperty.call(vocabulary, cleanWord);
    totalScore += isKnown ? vocabulary[cleanWord] : 0;
    wordCount++;
  }

  // Text made up only of punctuation or whitespace contains no words.
  if (wordCount === 0) {
    return "neutral";
  }

  const averageScore = totalScore / wordCount;
  if (averageScore > 0.2) {
    return "positive";
  }
  if (averageScore < -0.2) {
    return "negative";
  }
  return "neutral";
}