792. Number of Matching Subsequences - Leetcode Solution

Code Implementation

from collections import defaultdict, deque

class Solution:
    def numMatchingSubseq(self, s: str, words: list[str]) -> int:
        waiting = defaultdict(deque)
        for word in words:
            waiting[word[0]].append(iter(word[1:]))
        count = 0
        for c in s:
            old_waiting = waiting[c]
            waiting[c] = deque()
            for it in old_waiting:
                nxt = next(it, None)
                if nxt:
                    waiting[nxt].append(it)
                else:
                    count += 1
        return count

#include <vector>
#include <string>
#include <unordered_map>
#include <queue>
using namespace std;

class Solution {
public:
    int numMatchingSubseq(string s, vector<string>& words) {
        unordered_map<char, queue<pair<string::iterator, string::iterator>>> waiting;
        for (const string& word : words) {
            waiting[word[0]].emplace(word.begin() + 1, word.end());
        }
        int count = 0;
        for (char c : s) {
            auto old_queue = std::move(waiting[c]);
            while (!old_queue.empty()) {
                auto it_pair = old_queue.front(); old_queue.pop();
                if (it_pair.first == it_pair.second) {
                    ++count;
                } else {
                    waiting[*it_pair.first].emplace(it_pair.first + 1, it_pair.second);
                }
            }
        }
        return count;
    }
};

import java.util.*;

class Solution {
    public int numMatchingSubseq(String s, String[] words) {
        Map<Character, Queue<Iterator<Character>>> waiting = new HashMap<>();
        for (char c = 'a'; c <= 'z'; c++) waiting.put(c, new LinkedList<>());
        for (String word : words) {
            waiting.get(word.charAt(0)).offer(word.chars().skip(1).mapToObj(i -> (char)i).iterator());
        }
        int count = 0;
        for (char c : s.toCharArray()) {
            Queue<Iterator<Character>> old = waiting.get(c);
            int size = old.size();
            for (int i = 0; i < size; i++) {
                Iterator<Character> it = old.poll();
                if (it.hasNext()) {
                    waiting.get(it.next()).offer(it);
                } else {
                    count++;
                }
            }
        }
        return count;
    }
}

var numMatchingSubseq = function(s, words) {
    const waiting = {};
    for (let i = 0; i < 26; i++) waiting[String.fromCharCode(97 + i)] = [];
    for (const word of words) {
        waiting[word[0]].push([word, 1]);
    }
    let count = 0;
    for (const c of s) {
        const old = waiting[c];
        waiting[c] = [];
        for (const [word, idx] of old) {
            if (idx === word.length) {
                count++;
            } else {
                waiting[word[idx]].push([word, idx + 1]);
            }
        }
    }
    return count;
};

Problem Description

You are given a string s and an array of strings words. Your task is to determine how many words in words are subsequences of s.

A word is a subsequence of s if you can delete some (or no) characters from s without changing the order of the remaining characters to form the word.
Each word in words should be checked independently.
You may not reuse characters in s for the same position in a word.
Return the total number of matching subsequences.

Constraints:

There is only one valid answer for each test case.
Each element in words must be checked separately; do not reuse elements.

Thought Process

The most straightforward way to solve this problem is to check, for each word, if it appears as a subsequence in s. This can be done by iterating through s and trying to match the characters of the word in order. However, this brute-force approach, especially if s and words are large, can be inefficient.

To optimize, we need to avoid redundant work. Notice that many words may share prefixes, and many words may be waiting for the same next character in s. If we can process all words waiting for the same character together as we scan through s, we can significantly reduce the number of operations.

The key insight is to use a "bucket" or "waiting list" for each character. Each bucket keeps track of which words (or their iterators) are waiting for that character next. As we process each character in s, we update the waiting lists accordingly.

Solution Approach

Here is a step-by-step breakdown of the optimized algorithm:

Initialize waiting lists:
- For each word in words, put it (or an iterator/pointer to its next character) into a waiting list keyed by the first character it needs to match.
Process the main string s:
- For each character c in s (left to right):
- Take all words (or iterators) currently waiting for c.
- For each such word:
  - If the word has more characters, move it to the waiting list for its next required character.
  - If the word is complete (no more characters to match), increment the count of matching subsequences.
Return the result:
- After processing all characters in s, return the total count of words that matched as subsequences.

This approach is efficient because each character in each word is processed exactly once, and we leverage hash maps (or arrays) for fast lookups.

Example Walkthrough

Sample Input:
s = "abcde"
words = ["a", "bb", "acd", "ace"]

Initialize waiting lists:
- 'a': ["a", "acd", "ace"]
- 'b': ["bb"]
Process 'a' in s:
- "a" is complete (matches), count = 1
- "acd" now waits for 'c'
- "ace" now waits for 'c'
Process 'b':
- "bb" now waits for 'b'
Process 'c':
- "acd" now waits for 'd'
- "ace" now waits for 'e'
Process 'd':
- "acd" is complete, count = 2
Process 'e':
- "ace" is complete, count = 3

"bb" is not a subsequence (no further 'b' in s), so the final answer is 3.

Time and Space Complexity

Brute-force approach:
- For each word, scan through s to check if it is a subsequence.
- Time: O(N * L), where N = number of words, L = length of s.
Optimized bucket approach:
- Each character of each word is processed exactly once as we scan through s.
- Time: O(L + total_chars_in_words), where L = length of s.
- Space: O(total_chars_in_words) for the waiting lists and iterators.

The optimized approach is much faster, especially when words contains many strings or s is long.

Summary

In this problem, the key insight is to process all words in words in a way that minimizes redundant work by grouping them according to the next character they need to match. By using a hash map (or array) of waiting lists for each character, we efficiently move iterators through the words as we scan s, counting matches as we go. This elegant solution leverages data structures to turn a potentially slow brute-force approach into a highly efficient one, making it suitable for large inputs.

AlgoMap.io

Want Help Cracking FAANG?