Sentiment Analysis Inference with a Pre-trained Model

This challenge focuses on implementing machine learning inference in JavaScript using a pre-trained sentiment analysis model. Sentiment analysis is a common natural language processing task: it determines the emotional tone (positive, negative, or neutral) of a given text. You'll be given a simplified model representation and asked to write a function that performs inference on new text inputs.

Problem Description

You are given a simplified representation of a sentiment analysis model: a vocabulary object that maps each known word to a numeric sentiment score (positive numbers for positive words, negative numbers for negative ones). The model predicts the sentiment of a new text by averaging the scores of the words in the text. Every word carries equal weight, and words missing from the vocabulary contribute a score of 0. Your task is to implement a function predictSentiment that takes a text string as input and returns a sentiment prediction ("positive", "negative", or "neutral") based on the provided model.
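The averaging step can be read as a weighted average in which every token has weight 1. The sketch below is an illustration, not part of the required solution. It assumes a hypothetical helper weightedAverage with an optional weights parameter and shows that uniform weights reduce to the plain mean the challenge uses.

```javascript
// Hypothetical helper: weighted mean of per-token scores.
// When `weights` is omitted, every token gets weight 1, which gives the
// plain arithmetic mean used by the challenge's predictSentiment.
function weightedAverage(scores, weights = scores.map(() => 1)) {
  if (scores.length !== weights.length) {
    throw new Error("scores and weights must have the same length");
  }
  let weightedSum = 0;
  let totalWeight = 0;
  for (let i = 0; i < scores.length; i++) {
    weightedSum += scores[i] * weights[i];
    totalWeight += weights[i];
  }
  // No tokens (or all-zero weights): fall back to a neutral score of 0.
  return totalWeight === 0 ? 0 : weightedSum / totalWeight;
}

// "great movie": "great" scores 0.8, "movie" is unknown and scores 0.
console.log(weightedAverage([0.8, 0]));         // 0.4
// A hypothetical weighting that counts "great" twice as heavily.
console.log(weightedAverage([0.8, 0], [2, 1]));
```

Only the uniform case matters for this challenge. A non-uniform scheme (for example, down-weighting very common words) would be an extension beyond the stated requirements.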

Key Requirements:

Tokenization: Split the input text into individual words (tokens).
Sentiment Scoring: Look up the sentiment score for each word in the vocabulary. If a word is not found in the vocabulary, assign it a score of 0.
Average Score: Calculate the average of the sentiment scores over all words in the text. Every word counts equally, and words not in the vocabulary are included in the count with a score of 0.
Sentiment Prediction: Based on the average score, predict the sentiment as follows:
- If the average score is greater than 0.2, predict "positive".
- If the average score is less than -0.2, predict "negative".
- Otherwise, predict "neutral".
Case Insensitivity: The text should be converted to lowercase before processing.
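The tokenization and case-insensitivity requirements can be sketched as one small helper. The name tokenize is hypothetical; the cleaning rule (keep only the letters a-z in each whitespace-separated token, then drop tokens that end up empty) mirrors the regular expressions in the reference predictSentiment.

```javascript
// Hypothetical helper: lowercase the text, split on any whitespace,
// strip every character outside a-z from each token, and drop tokens
// that become empty (e.g. a standalone "..." or "!!").
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map(token => token.replace(/[^a-z]/g, ""))
    .filter(token => token !== "");
}

console.log(tokenize("This is a GREAT movie!"));
// [ 'this', 'is', 'a', 'great', 'movie' ]
console.log(tokenize("  Okay ... fine!! "));
// [ 'okay', 'fine' ]
```

Because only ASCII letters survive, "don't" becomes "dont" and accented words lose their accented letters. That is acceptable for this toy vocabulary but is one reason a real system would use a more robust tokenizer.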

Expected Behavior:

The predictSentiment function should accurately predict the sentiment of a given text based on the provided model. It should handle cases where words are not in the vocabulary gracefully (by assigning a score of 0). The function should be case-insensitive.

Edge Cases to Consider:

Empty input text.
Text containing punctuation and special characters (strip these from each word; a token made up only of punctuation is ignored entirely).
Text containing words not present in the vocabulary.
Text with a mix of positive and negative words.
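The unknown-word and mixed-sentiment cases both come down to how scores are summed and divided. A minimal sketch, assuming already-cleaned lowercase tokens and a small subset of the challenge vocabulary (scoreTokens and mean are hypothetical helper names):

```javascript
// Subset of the challenge vocabulary, enough for this illustration.
const vocab = { good: 0.7, awful: -0.75, happy: 0.75, sad: -0.6 };

// Score each token; unknown words score 0. The own-property check keeps
// inherited keys such as "constructor" from being treated as vocabulary.
function scoreTokens(tokens) {
  return tokens.map(t =>
    Object.prototype.hasOwnProperty.call(vocab, t) ? vocab[t] : 0
  );
}

// Plain mean; an empty list is treated as 0 (neutral).
function mean(values) {
  return values.length === 0
    ? 0
    : values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Mixed text: the positive and negative words nearly cancel out.
const mixed = scoreTokens(["good", "but", "awful"]);
console.log(mixed);       // [ 0.7, 0, -0.75 ]
console.log(mean(mixed)); // about -0.0167, so "neutral"

// Unknown words dilute the average: one "happy" among four tokens.
console.log(mean(scoreTokens(["happy", "to", "be", "here"]))); // 0.1875

// Empty input has no tokens at all.
console.log(mean(scoreTokens([]))); // 0
```

Note that "happy to be here" lands just under the 0.2 threshold: with this averaging rule, a single sentiment word in a longer sentence often ends up neutral.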

Examples

Example 1:

Input: "This is a great movie!"
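Output: "neutral"
Explanation: The text has five words (this, is, a, great, movie). Only "great" is in the vocabulary, with a score of 0.8; the other four words score 0. The average is 0.8 / 5 = 0.16, which is positive but not greater than 0.2, so the prediction is "neutral". The unknown words dilute the single positive word.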

Example 2:

Input: "I am feeling very sad today."
Output: "neutral"
Explanation: The text has six words (i, am, feeling, very, sad, today). Only "sad" is in the vocabulary, with a score of -0.6. The average is -0.6 / 6 = -0.1, which is not less than -0.2, so the prediction is "neutral".

Example 3:

Input: "The weather is okay."
Output: "neutral"
Explanation: The text has four words (the, weather, is, okay). Only "okay" is in the vocabulary, with a score of 0.1, so the average is 0.1 / 4 = 0.025. That lies between -0.2 and 0.2, leading to a "neutral" prediction.

Example 4:

Input: ""
Output: "neutral"
Explanation: Empty input should result in a neutral prediction.
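Tracing the stated rules (mean over all cleaned words, unknown words scored 0, strict thresholds at 0.2 and -0.2) through the example inputs makes the arithmetic concrete. This is a self-contained sketch using a subset of the vocabulary; averageFor and label are hypothetical names, and the last two inputs are additional ones chosen to show texts that do cross the thresholds.

```javascript
// Subset of the challenge vocabulary covering the words used below.
const scores = { great: 0.8, sad: -0.6, okay: 0.1 };

// Mean score over all cleaned tokens; unknown words count as 0.
function averageFor(text) {
  const tokens = text
    .toLowerCase()
    .split(/\s+/)
    .map(t => t.replace(/[^a-z]/g, ""))
    .filter(t => t !== "");
  if (tokens.length === 0) return 0;
  const total = tokens.reduce(
    (sum, t) =>
      sum + (Object.prototype.hasOwnProperty.call(scores, t) ? scores[t] : 0),
    0
  );
  return total / tokens.length;
}

// Strict comparisons, as in the problem statement.
function label(avg) {
  if (avg > 0.2) return "positive";
  if (avg < -0.2) return "negative";
  return "neutral";
}

for (const text of [
  "This is a great movie!",       // 0.8 / 5
  "I am feeling very sad today.", // -0.6 / 6
  "The weather is okay.",         // 0.1 / 4
  "",                             // no tokens
  "Great movie!",                 // 0.8 / 2
  "So sad.",                      // -0.6 / 2
]) {
  const avg = averageFor(text);
  console.log(JSON.stringify(text), avg.toFixed(3), label(avg));
}
```

Shorter texts such as "Great movie!" (average 0.4) and "So sad." (average -0.3) cross the thresholds because fewer unknown words are diluting the score.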

Constraints

The input text will be a string.
The vocabulary and sentiment scores will be provided as constants.
The length of the input text can vary.
The function must return one of the following strings: "positive", "negative", or "neutral".
Performance is not a critical concern for this challenge. Focus on correctness and readability.

Notes

You are provided with a simplified model. In a real-world scenario, you would likely use a more sophisticated model and a more robust tokenization process.
Consider using regular expressions to remove punctuation and special characters from the input text.
The provided vocabulary and sentiment scores are designed to be simple and illustrative.
Remember to convert the input text to lowercase before processing.
The sentiment scores are illustrative. Their signs and relative magnitudes carry the meaning, but because the average is compared against fixed thresholds, rescaling all the scores would also require rescaling the thresholds.
The threshold values (0.2 and -0.2) for sentiment prediction can be adjusted as needed.
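Since the thresholds may be adjusted, one option is to pass them in rather than hard-coding 0.2. A minimal sketch, assuming a hypothetical classify function with a symmetric threshold parameter that defaults to the challenge's value:

```javascript
// Map an average score to a label. `threshold` is a hypothetical parameter
// defaulting to 0.2: averages strictly above +threshold are "positive",
// strictly below -threshold are "negative", and everything else is "neutral".
function classify(averageScore, threshold = 0.2) {
  if (threshold < 0) {
    throw new RangeError("threshold must be non-negative");
  }
  if (averageScore > threshold) return "positive";
  if (averageScore < -threshold) return "negative";
  return "neutral";
}

console.log(classify(0.16));      // neutral (0.8 / 5, "This is a great movie!")
console.log(classify(0.16, 0.1)); // positive with a looser threshold
console.log(classify(0.2));       // neutral: the boundary itself is not "greater than"
```

Because the comparisons are strict, an average of exactly 0.2 or -0.2 is classified as neutral. An asymmetric variant (separate positive and negative cutoffs) would need two parameters.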

const vocabulary = {
  "great": 0.8,
  "good": 0.7,
  "amazing": 0.9,
  "excellent": 0.85,
  "bad": -0.7,
  "sad": -0.6,
  "terrible": -0.8,
  "awful": -0.75,
  "okay": 0.1,
  "neutral": 0.0,
  "happy": 0.75,
  "angry": -0.65
};

function predictSentiment(text) {
  if (!text) {
    return "neutral";
  }

  const lowercaseText = text.toLowerCase();
  const words = lowercaseText.split(/\s+/).filter(word => word !== ""); // Split on any whitespace and drop empty strings

  let totalScore = 0;
  let wordCount = 0;

  for (const word of words) {
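    // Keep only ASCII letters a-z; punctuation, digits, and symbols are dropped.
    const cleanWord = word.replace(/[^a-z]/g, '');
    if (cleanWord) { // Skip tokens that were nothing but punctuation, e.g. "!!!"
      // Check own properties only, so inherited keys such as "constructor"
      // are treated as unknown words (score 0) instead of Object.prototype members.
      const score = Object.prototype.hasOwnProperty.call(vocabulary, cleanWord)
        ? vocabulary[cleanWord]
        : 0;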
      totalScore += score;
      wordCount++;
    }
  }

  if (wordCount === 0) {
    return "neutral";
  }

  const averageScore = totalScore / wordCount;

  if (averageScore > 0.2) {
    return "positive";
  } else if (averageScore < -0.2) {
    return "negative";
  } else {
    return "neutral";
  }
}
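An alternative design stores the vocabulary in a Map. Map.get only returns entries that were explicitly set, so a word such as "constructor" cannot pick up an inherited Object property; with a plain object and a bare vocabulary[word] lookup, "Great constructor!" would add a function to the running total and produce NaN. This is a sketch, not the reference solution: predictSentimentWithMap and its vocab and threshold parameters are hypothetical, and the scores are copied from the vocabulary above.

```javascript
// Map-backed copy of the challenge vocabulary.
const vocabularyMap = new Map(Object.entries({
  great: 0.8, good: 0.7, amazing: 0.9, excellent: 0.85,
  bad: -0.7, sad: -0.6, terrible: -0.8, awful: -0.75,
  okay: 0.1, neutral: 0.0, happy: 0.75, angry: -0.65,
}));

function predictSentimentWithMap(text, vocab = vocabularyMap, threshold = 0.2) {
  const tokens = (text || "")
    .toLowerCase()
    .split(/\s+/)
    .map(t => t.replace(/[^a-z]/g, "")) // keep ASCII letters only
    .filter(t => t !== "");
  if (tokens.length === 0) return "neutral";

  // Unknown words contribute 0 but still count toward the denominator.
  const total = tokens.reduce((sum, t) => sum + (vocab.get(t) ?? 0), 0);
  const average = total / tokens.length;

  if (average > threshold) return "positive";
  if (average < -threshold) return "negative";
  return "neutral";
}

console.log(predictSentimentWithMap("Great movie!"));       // positive
console.log(predictSentimentWithMap("So sad."));            // negative
console.log(predictSentimentWithMap("Great constructor!")); // positive
```

The ?? operator is used instead of || so that a stored score of 0 (such as "neutral") is read as 0 explicitly; both give the same sum here, but ?? states the intent of "missing means 0" more precisely.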