819. Most Common Word - Leetcode Solution

Problem Description

The "Most Common Word" problem asks you to find the most frequent word in a given paragraph, excluding any words that appear in a provided list of banned words.

You are given:

  • A string paragraph that may contain letters, punctuation, and spaces.
  • A list of strings banned containing words that must be ignored when counting frequencies.

Constraints:

  • Ignore case (treat "Word" and "word" as the same).
  • Ignore punctuation (a word is a maximal run of letters).
  • There is always exactly one most frequent word that is not banned.
  • The answer must not be a banned word, even if a banned word appears most often.

Your task is to return the most frequent non-banned word from the paragraph.

Thought Process

The problem is about counting word frequencies, but with a few twists:

  • We must handle case insensitivity, so "Bob" and "bob" are the same.
  • Punctuation (like "ball," or "hit!") should not be part of the word.
  • Some words are banned and must not be counted.

At first, you might think to split the paragraph by spaces and count each word, but that would leave punctuation attached. So, we need a way to extract only the words.

Once we have the words, we can count their occurrences, but we must skip any that are banned. The most efficient way to do this is to use a hash map (dictionary) to count occurrences and a set for fast banned word lookup.

The brute-force approach would be to repeatedly count each word's occurrences, but that is slow. Using a hash map lets us count in a single pass.
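To see why naive splitting falls short, here is a quick Python sketch contrasting it with regex extraction (the sample paragraph is the one used in the walkthrough below):

```python
import re

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."

# Splitting on spaces keeps punctuation attached to the words:
naive = paragraph.lower().split()            # contains 'ball,' and 'hit.'

# Extracting runs of letters with a regex strips the punctuation:
clean = re.findall(r"[a-z]+", paragraph.lower())
```

With the naive split, "ball," and "ball" would be counted as two different words; the regex version avoids that entirely.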

Solution Approach

Let's break the solution into clear steps:

  1. Normalize the Text:
    • Convert the paragraph to all lowercase, so comparisons are case-insensitive.
    • Replace all non-letter characters (punctuation) with spaces, so only words remain.
  2. Split into Words:
    • Split the cleaned paragraph into a list of words using spaces.
  3. Prepare Banned Set:
    • Convert the banned list into a set for O(1) lookup.
  4. Count Word Frequencies:
    • Use a hash map (dictionary) to count the frequency of each word, skipping any banned words.
  5. Find the Most Common Word:
    • Iterate over the hash map to find the word with the highest count that is not banned.

Why use a hash map? Because updating a word's count is O(1) on average, so the whole paragraph is counted in a single pass, even when it is large.

Why use a set for banned words? Because membership testing in a set is also O(1), so checking whether a word is banned adds no meaningful overhead.
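Steps 3 through 5 above can be sketched in a few lines of Python (the word list and banned set here are illustrative, assuming the text has already been normalized and split):

```python
from collections import Counter

# Illustrative input: words already lowercased and stripped of punctuation
words = ["bob", "hit", "a", "ball", "the", "hit", "ball"]
banned_set = {"hit"}                 # set: O(1) membership test

# Single pass over the words; each Counter update is O(1) on average
counts = Counter(w for w in words if w not in banned_set)

# Step 5: the key with the highest count
most_common = max(counts, key=counts.get)
```

Note that banned words are filtered out before counting, so they can never win the final max.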

Example Walkthrough

Sample Input:

  • paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
  • banned = ["hit"]

Step-by-step:

  1. Normalize: Convert to lowercase and replace punctuation.
    Result: "bob hit a ball the hit ball flew far after it was hit"
  2. Split into words:
    ["bob", "hit", "a", "ball", "the", "hit", "ball", "flew", "far", "after", "it", "was", "hit"]
  3. Prepare banned set: {"hit"}
  4. Count frequencies (skipping "hit"):
    • "bob": 1
    • "a": 1
    • "ball": 2
    • "the": 1
    • "flew": 1
    • "far": 1
    • "after": 1
    • "it": 1
    • "was": 1
  5. The word with the highest count is "ball" (2 times), which is not banned.

Output: "ball"
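The walkthrough above can be checked end to end with a few lines of Python:

```python
import re
from collections import Counter

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]

# Normalize and split (steps 1-2), then count non-banned words (steps 3-4)
words = re.findall(r"[a-z]+", paragraph.lower())
counts = Counter(w for w in words if w not in set(banned))

# Step 5: most_common(1) returns [(word, count)] for the top entry
answer = counts.most_common(1)[0][0]
print(answer)  # ball
```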

Time and Space Complexity

Brute-force approach:

  • For each unique word, rescan the word list to count its appearances: O(W * M), where W is the number of words and M is the number of unique words.

Optimized approach (using hash map):

  • Normalizing and splitting the paragraph: O(N), where N is the length of the paragraph.
  • Counting frequencies: O(W), as we visit each word once.
  • Finding the most common word: O(M), where M is the number of unique, non-banned words.
  • Total Time Complexity: O(N)
  • Space Complexity: O(M) for the hash map plus O(B) for the banned set, where B is the number of banned words.

Why? Because we store each unique word and each banned word only once, and we process the paragraph in a single pass.

Summary

To solve the "Most Common Word" problem efficiently, we:

  • Normalize the text to handle case and punctuation.
  • Split the text into words and ignore banned ones using a set.
  • Count occurrences with a hash map for speed.
  • Return the most frequent, non-banned word.

This approach is elegant because it leverages efficient data structures (a hash map and a set) to ensure fast lookups and counting, making it suitable even for large paragraphs.

Code Implementation

Python:

import re
from collections import Counter

class Solution:
    def mostCommonWord(self, paragraph: str, banned: list[str]) -> str:
        # Extract lowercase words; \w+ matches runs of word characters,
        # so punctuation and spaces are skipped
        words = re.findall(r'\w+', paragraph.lower())
        banned_set = set(banned)
        counts = Counter(word for word in words if word not in banned_set)
        return counts.most_common(1)[0][0]

C++:

#include <string>
#include <unordered_set>
#include <unordered_map>
#include <vector>
#include <cctype>

using namespace std;

class Solution {
public:
    string mostCommonWord(string paragraph, vector<string>& banned) {
        unordered_set<string> banned_set(banned.begin(), banned.end());
        unordered_map<string, int> count;
        string word, res;
        int max_count = 0;
        for (size_t i = 0; i <= paragraph.size(); ++i) {
            // cast to unsigned char: isalpha/tolower on a negative char is UB
            if (i < paragraph.size() && isalpha(static_cast<unsigned char>(paragraph[i]))) {
                word += tolower(static_cast<unsigned char>(paragraph[i]));
            } else if (!word.empty()) {
                if (!banned_set.count(word)) {
                    count[word]++;
                    if (count[word] > max_count) {
                        max_count = count[word];
                        res = word;
                    }
                }
                word = "";
            }
        }
        return res;
    }
};

Java:

import java.util.*;

class Solution {
    public String mostCommonWord(String paragraph, String[] banned) {
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
        Map<String, Integer> count = new HashMap<>();
        // Split on runs of non-word characters; a leading delimiter can
        // produce an empty first token, so check length before counting
        String[] words = paragraph.toLowerCase().split("\\W+");
        for (String word : words) {
            if (word.length() > 0 && !bannedSet.contains(word)) {
                count.put(word, count.getOrDefault(word, 0) + 1);
            }
        }
        String res = "";
        int max = 0;
        for (String word : count.keySet()) {
            if (count.get(word) > max) {
                max = count.get(word);
                res = word;
            }
        }
        return res;
    }
}

JavaScript:

/**
 * @param {string} paragraph
 * @param {string[]} banned
 * @return {string}
 */
var mostCommonWord = function(paragraph, banned) {
    let bannedSet = new Set(banned.map(w => w.toLowerCase()));
    let words = paragraph.toLowerCase().match(/\w+/g) || []; // guard against a null match
    let count = {};
    let maxWord = '';
    let maxCount = 0;
    for (let word of words) {
        if (!bannedSet.has(word)) {
            count[word] = (count[word] || 0) + 1;
            if (count[word] > maxCount) {
                maxCount = count[word];
                maxWord = word;
            }
        }
    }
    return maxWord;
};