Want Help Cracking FAANG?

(Then click this)

×
Back to Question Bank

1078. Occurrences After Bigram - Leetcode Solution

Problem Description

The "Occurrences After Bigram" problem asks you to analyze a string of words and find all words that immediately follow a specific pair of consecutive words (a bigram).

Given:

  • A string text containing words separated by spaces.
  • Two target words: first and second.
Your task is to find every occurrence in text where the word first is immediately followed by second, and collect the word that comes right after this pair (if any).

Constraints:
  • The input string will have at least two words.
  • There may be multiple, overlapping occurrences of the bigram.
  • Return all valid third words in the order they appear.
Example:
text = "we will we will rock you", first = "we", second = "will"
The output should be ["we", "rock"] because "we will we" and "we will rock" are the two occurrences.

Thought Process

To solve this problem, let's first break down what we need:

  • We have a string of words, and we need to look for every place where two specific words occur one after another.
  • For each such occurrence, we want the next word (if it exists).
A brute-force way would be to check every possible pair of consecutive words, and if they match first and second, grab the following word.

This is a classic sliding window problem, where we can scan the words with a window of size 3 and check if the first two match our targets.

We don't need to use any advanced data structures; a simple loop over the list of words is sufficient. Optimization comes from realizing that we only need to look at each word once, and can ignore any pairs that don't match.

Solution Approach

Let's walk through the steps to solve this problem efficiently:

  1. Split the text: Use the space character to split text into an array (or list) of words.
  2. Iterate with a sliding window:
    • Start a loop from the first word up to the third-last word (since we always look at three consecutive words).
    • At each index i, check if words[i] is equal to first and words[i+1] is equal to second.
  3. Collect results:
    • If both match, append words[i+2] to the result list (since it's the word that follows the bigram).
  4. Return the result: After the loop, return the list of collected words.
Why this works:
  • We only ever look at each possible bigram (pair of words) once.
  • We're careful not to go out of bounds by stopping the loop early enough.
  • This approach is both simple and efficient for the problem size.

Example Walkthrough

Let's use the input:
text = "alice is a good girl she is a good student"
first = "a", second = "good"

Step-by-step:

  • Split text: ["alice", "is", "a", "good", "girl", "she", "is", "a", "good", "student"]
  • Loop from index 0 to 7 (since we need at least 3 words for each check):
  • At each step:
    • i=0: "alice" "is" → not "a" "good"
    • i=1: "is" "a" → not "a" "good"
    • i=2: "a" "good" → YES! Next word is "girl" → add "girl"
    • i=3: "good" "girl" → not "a" "good"
    • i=4: "girl" "she" → not "a" "good"
    • i=5: "she" "is" → not "a" "good"
    • i=6: "is" "a" → not "a" "good"
    • i=7: "a" "good" → YES! Next word is "student" → add "student"
  • Result: ["girl", "student"]
This matches the expected answer.

Time and Space Complexity

Brute-force approach:

  • If you checked every possible triplet of words, you would have O(n) time, where n is the number of words.
  • But since the optimized approach also does this in a single pass, it's just as efficient.
Optimized approach:
  • Time Complexity: O(n), where n is the number of words in text. We loop through the list once.
  • Space Complexity: O(n) for storing the split words, plus O(k) for the output list (where k is the number of matches).
Why?
  • Each word is checked at most once for being part of a bigram.
  • No nested loops or extra data structures are needed.

Summary

The "Occurrences After Bigram" problem is a great example of a simple sliding window approach. By splitting the input into words and scanning with a window of size three, we can efficiently find all words that follow a given bigram. The solution is elegant because it makes a single pass, uses minimal space, and is easy to reason about, making it suitable even for large input sizes.

Code Implementation

class Solution:
    def findOcurrences(self, text: str, first: str, second: str) -> list:
        words = text.split()
        res = []
        for i in range(len(words) - 2):
            if words[i] == first and words[i+1] == second:
                res.append(words[i+2])
        return res
      
class Solution {
public:
    vector<string> findOcurrences(string text, string first, string second) {
        vector<string> words, res;
        istringstream iss(text);
        string word;
        while (iss >> word) {
            words.push_back(word);
        }
        for (int i = 0; i + 2 < words.size(); ++i) {
            if (words[i] == first && words[i+1] == second) {
                res.push_back(words[i+2]);
            }
        }
        return res;
    }
};
      
class Solution {
    public String[] findOcurrences(String text, String first, String second) {
        String[] words = text.split(" ");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 2 < words.length; i++) {
            if (words[i].equals(first) && words[i+1].equals(second)) {
                result.add(words[i+2]);
            }
        }
        return result.toArray(new String[0]);
    }
}
      
var findOcurrences = function(text, first, second) {
    const words = text.split(' ');
    const res = [];
    for (let i = 0; i < words.length - 2; i++) {
        if (words[i] === first && words[i+1] === second) {
            res.push(words[i+2]);
        }
    }
    return res;
};