940. Distinct Subsequences II - LeetCode Solution

Code Implementation

class Solution:
    def distinctSubseqII(self, s: str) -> int:
        MOD = 10 ** 9 + 7
        ends = [0] * 26  # ends[c] = # of subseqs ending with chr(c + ord('a'))
        for ch in s:
            idx = ord(ch) - ord('a')
            total = sum(ends) % MOD
            ends[idx] = (total + 1) % MOD
        return sum(ends) % MOD

#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int distinctSubseqII(string s) {
        const int MOD = 1e9 + 7;
        vector<int> ends(26, 0);
        for (char ch : s) {
            int idx = ch - 'a';
            long total = 0;
            for (int cnt : ends) total = (total + cnt) % MOD;
            ends[idx] = (total + 1) % MOD;
        }
        long res = 0;
        for (int cnt : ends) res = (res + cnt) % MOD;
        return (int)res;
    }
};

class Solution {
    public int distinctSubseqII(String s) {
        int MOD = 1000000007;
        int[] ends = new int[26];
        for (char ch : s.toCharArray()) {
            int idx = ch - 'a';
            int total = 0;
            for (int cnt : ends) total = (total + cnt) % MOD;
            ends[idx] = (total + 1) % MOD;
        }
        int res = 0;
        for (int cnt : ends) res = (res + cnt) % MOD;
        return res;
    }
}

var distinctSubseqII = function(s) {
    const MOD = 1e9 + 7;
    let ends = Array(26).fill(0);
    for (let ch of s) {
        let idx = ch.charCodeAt(0) - 'a'.charCodeAt(0);
        let total = ends.reduce((a, b) => (a + b) % MOD, 0);
        ends[idx] = (total + 1) % MOD;
    }
    return ends.reduce((a, b) => (a + b) % MOD, 0);
};

Problem Description

Given a string s, return the number of distinct non-empty subsequences of s. Since the answer may be very large, return it modulo 10^9 + 7.

A subsequence is a sequence that can be derived from s by deleting some (or no) characters without changing the order of the remaining characters.
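Two subsequences are considered distinct if they are different strings. Subsequences that spell the same string are counted once, even if they come from different positions in s. For example, in "aba" the subsequence "a" can be taken from the first or the last character, but it is counted only once.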
You must not count the empty subsequence.
Constraints: 1 <= s.length <= 2000, s consists only of lowercase English letters.

Thought Process

At first glance, the problem asks us to count all possible distinct subsequences of a string. A brute-force approach would involve generating all possible subsequences and then filtering out duplicates, but this quickly becomes infeasible as the input size grows. For a string of length n, there are up to 2^n possible subsequences, so we need a smarter way.
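For small inputs, the brute-force idea is still useful as a reference to check faster solutions against. Below is a minimal Python sketch. The function name distinct_subseq_brute is hypothetical, and because it enumerates all 2^n - 1 non-empty index subsets, it is only practical for short strings.

```python
from itertools import combinations


def distinct_subseq_brute(s: str) -> int:
    """Count distinct non-empty subsequences by enumerating every index subset.

    Exponential in len(s); intended only for small inputs or testing.
    """
    seen = set()
    n = len(s)
    for k in range(1, n + 1):
        # combinations() yields index tuples in increasing order,
        # so joining the characters preserves their original order.
        for idxs in combinations(range(n), k):
            seen.add("".join(s[i] for i in idxs))
    return len(seen) % (10**9 + 7)


print(distinct_subseq_brute("abc"))  # 7
print(distinct_subseq_brute("aba"))  # 6
```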

The key insight is to look at how subsequences are formed as we process the string from left to right. For each new character, every existing subsequence can be extended by that character to form a new subsequence, and the character can also start a subsequence on its own. However, if the character has appeared before, some of these extensions repeat subsequences that were already counted, and every such repeat ends with that character. This leads us to dynamic programming, where we track the number of distinct subsequences ending with each character.
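The double-counting problem is easy to see with a counter that extends every subsequence by each new character but never deduplicates. In this minimal sketch (naive_count is a hypothetical helper, not part of the solution), each character doubles the count and adds one for the character on its own. For "aba" it reports 7, while the true answer is 6, because "a" is produced twice: once by the first character and once by the last.

```python
def naive_count(s: str) -> int:
    """Extend every subsequence by each new character, ignoring duplicates."""
    total = 0
    for _ in s:
        # Old subsequences, plus each of them extended by the new
        # character, plus the new character on its own.
        total = 2 * total + 1
    return total


print(naive_count("abc"))  # 7: correct, since no letter repeats
print(naive_count("aba"))  # 7: wrong, the true answer is 6
```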

Solution Approach

We use a dynamic programming approach. The idea is to keep an array ends of size 26 (for each lowercase letter), where ends[c] represents the number of distinct subsequences ending with character c.

Initialize ends as an array of zeros.
For each character ch in s:
- Compute the total number of distinct subsequences so far by summing all values in ends.
- The number of new subsequences ending with ch is total + 1 (the "+1" is for the subsequence containing only ch itself).
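- Set ends[ch] to this new value, overwriting its previous count instead of adding to it. This is correct because every subsequence previously counted as ending with ch was built by appending an earlier ch to some subsequence (or by ch on its own), and all of those subsequences still exist. Appending the current ch to them reproduces every old string, so the new value already includes them; adding the old count would count them twice.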
After processing all characters, the answer is the sum of all elements in ends.
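Reduce modulo 10^9 + 7 at each step: the problem asks for the answer modulo 10^9 + 7, and reducing early keeps intermediate sums from overflowing fixed-width integers.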

Each distinct subsequence is counted exactly once, in the entry for its last character, and no subsequence is ever generated explicitly.
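To check the overwrite rule, the counts in ends can be replaced by the actual sets of subsequences they stand for. This is an illustrative sketch for small inputs only; distinct_subseq_sets and by_last are hypothetical names. by_last[c] holds the distinct subsequences ending with c (so its size plays the role of ends[c]), and the assert checks that the old set for ch is always contained in the new one, which is why overwriting counts each string once.

```python
def distinct_subseq_sets(s: str) -> int:
    """Set-based version of the ends-array DP, for small inputs only."""
    by_last = {}  # last character -> set of distinct subsequences ending with it
    for ch in s:
        # Every distinct subsequence seen so far, whatever its last letter.
        everything = set().union(*by_last.values())
        new_ending = {x + ch for x in everything} | {ch}
        # The previous set for ch is a subset of the new one, so replacing
        # it (rather than merging counts) never loses or duplicates a string.
        assert by_last.get(ch, set()) <= new_ending
        by_last[ch] = new_ending
    # Sets for different last letters are disjoint, so sizes can be summed.
    return sum(len(v) for v in by_last.values())
```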

Example Walkthrough

Let's walk through the algorithm for the string s = "abc":

Start with ends = [0]*26.
Process 'a':
- Total subsequences so far: 0
- New subsequences ending with 'a': 0 + 1 = 1
- ends['a'] = 1
Process 'b':
- Total subsequences so far: 1
- New subsequences ending with 'b': 1 + 1 = 2
- ends['b'] = 2
Process 'c':
- Total subsequences so far: 1 + 2 = 3
- New subsequences ending with 'c': 3 + 1 = 4
- ends['c'] = 4
Final answer: ends['a'] + ends['b'] + ends['c'] = 1 + 2 + 4 = 7

The 7 distinct non-empty subsequences are: "a", "b", "c", "ab", "ac", "bc", "abc".
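The "abc" walkthrough never repeats a letter, so it never exercises the overwrite step. A small tracing version of the Python solution (trace is a hypothetical helper) shows what happens on "aba":

```python
def trace(s: str) -> int:
    """Run the ends-array DP and print the nonzero entries after each step."""
    MOD = 10**9 + 7
    ends = [0] * 26
    for ch in s:
        idx = ord(ch) - ord("a")
        total = sum(ends) % MOD
        ends[idx] = (total + 1) % MOD  # overwrite, do not add
        snapshot = {chr(i + ord("a")): v for i, v in enumerate(ends) if v}
        print(f"after {ch!r}: total before = {total}, ends = {snapshot}")
    return sum(ends) % MOD


print(trace("aba"))
```

This prints:

after 'a': total before = 0, ends = {'a': 1}
after 'b': total before = 1, ends = {'a': 1, 'b': 2}
after 'a': total before = 3, ends = {'a': 4, 'b': 2}
6

The final ends['a'] = 4 covers "a", "aa", "ba" and "aba". The "a" counted after the first step is included in this new value rather than added on top of it, so the answer is 4 + 2 = 6.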

Time and Space Complexity

Brute-force: there are 2^n - 1 non-empty choices of positions, and building and deduplicating the resulting strings takes O(n * 2^n) time and up to O(n * 2^n) space, which is infeasible for n = 2000.
Optimized DP approach:
- Time Complexity: O(n * 26) (for each character, we sum a 26-length array), which is effectively O(n) since 26 is constant.
- Space Complexity: O(26) for the ends array, i.e., O(1) extra space.

For the maximum input length n = 2000, this amounts to only about 2000 * 26 = 52,000 additions.
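The 26-element sum per character can be removed by keeping the sum of ends up to date as entries are overwritten: when ends[c] changes from old to cur, the total changes by cur - old. This is a variant not taken from the solutions above, sketched in Python under the same assumption of lowercase input; distinct_subseq_running_total is a hypothetical name. It only removes the constant factor, since O(n * 26) is already O(n).

```python
def distinct_subseq_running_total(s: str) -> int:
    """Same DP as the ends array, but the sum of ends is maintained incrementally."""
    MOD = 10**9 + 7
    ends = [0] * 26
    total = 0  # invariant: total == sum(ends) % MOD
    for ch in s:
        idx = ord(ch) - ord("a")
        cur = (total + 1) % MOD
        # Replace the old contribution of ch with the new one.
        # Python's % always returns a value in [0, MOD) for positive MOD,
        # even when total + cur - ends[idx] is negative.
        total = (total + cur - ends[idx]) % MOD
        ends[idx] = cur
    return total
```

In C++ or Java the same subtraction can go negative and the remainder keeps the sign, so use a 64-bit type (long long in C++, long in Java) and add MOD before taking the final remainder.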

Summary

The problem of counting distinct non-empty subsequences can be solved efficiently using dynamic programming. By tracking the number of subsequences ending with each character and updating these counts as we traverse the string, we avoid redundant calculations and exponential blowup. The key insight is to realize how new subsequences are formed and how to avoid double-counting. The final approach is both elegant and practical, leveraging simple data structures and modular arithmetic.

AlgoMap.io
