1759. Count Number of Homogenous Substrings - Leetcode Solution

Code Implementation

class Solution:
    def countHomogenous(self, s: str) -> int:
        MOD = 10**9 + 7
        result = 0
        count = 1
        for i in range(1, len(s) + 1):
            if i < len(s) and s[i] == s[i - 1]:
                count += 1
            else:
                result = (result + count * (count + 1) // 2) % MOD
                count = 1
        return result

class Solution {
public:
    int countHomogenous(string s) {
        const int MOD = 1e9 + 7;
        long long result = 0;
        int count = 1;
        for (int i = 1; i <= s.size(); ++i) {
            if (i < s.size() && s[i] == s[i - 1]) {
                count++;
            } else {
                result = (result + 1LL * count * (count + 1) / 2) % MOD;
                count = 1;
            }
        }
        return (int)result;
    }
};

class Solution {
    public int countHomogenous(String s) {
        int MOD = 1000000007;
        long result = 0;
        int count = 1;
        for (int i = 1; i <= s.length(); i++) {
            if (i < s.length() && s.charAt(i) == s.charAt(i - 1)) {
                count++;
            } else {
                result = (result + (long)count * (count + 1) / 2) % MOD;
                count = 1;
            }
        }
        return (int)result;
    }
}

var countHomogenous = function(s) {
    const MOD = 1e9 + 7;
    let result = 0;
    let count = 1;
    for (let i = 1; i <= s.length; i++) {
        if (i < s.length && s[i] === s[i - 1]) {
            count++;
        } else {
            result = (result + count * (count + 1) / 2) % MOD;
            count = 1;
        }
    }
    return result;
};

Problem Description

Given a string s, you are asked to count the number of homogenous substrings in s. A homogenous substring is a substring where all characters are the same. Substrings are contiguous sequences of characters within the string.

For example, in s = "abbcccaa", the homogenous substrings include single letters ("a", "b", "c"), as well as longer substrings like "bb", "ccc", "aa", etc.

Return the total number of homogenous substrings in s, modulo 10^9 + 7.
Constraints:
- 1 <= s.length <= 10^5
- s consists of lowercase English letters only.

Thought Process

The first idea that might come to mind is to consider all possible substrings and check if each one is homogenous. However, generating all substrings would be extremely inefficient for large strings because the number of substrings grows quadratically with the length of the string.

Instead, let's think about what defines a homogenous substring: it is a substring where all characters are the same. If you have a run of consecutive identical characters, say k of them, then every substring within this run is homogenous. For example, for "aaa", the homogenous substrings are "a" (3 times), "aa" (2 times), and "aaa" (1 time), for a total of 6 substrings.

So, the problem reduces to finding all runs of consecutive identical characters and, for each run of length k, calculating the number of homogenous substrings it contains.

Solution Approach

The optimal solution involves scanning the string once and counting the length of each run of consecutive identical characters. For each run, we compute the number of homogenous substrings using the formula for the sum of the first k natural numbers: k * (k + 1) / 2.

Initialize a result variable to accumulate the answer and a counter to track the current run's length.
Iterate through the string, starting from the second character.
If the current character matches the previous one, increment the run counter.
If it does not match (or we have reached the end of the string), calculate the number of homogenous substrings in the previous run using count * (count + 1) / 2 and add it to the result.
Reset the counter for the new run.
After processing the string, return the result modulo 10^9 + 7.

This approach ensures we only traverse the string once, making it efficient even for large inputs.

Example Walkthrough

Let's walk through the input s = "abbcccaa":

Start with the first character 'a' (run length = 1).
Next, 'b' is different, so count substrings in 'a': 1 * (1 + 1) / 2 = 1. Add 1 to result. Start new run with 'b' (run length = 1).
Next, 'b' again, increment run (run length = 2).
Next, 'c' is different, so count substrings in 'bb': 2 * (2 + 1) / 2 = 3. Add 3 to result. Start new run with 'c' (run length = 1).
Next, 'c' again, increment run (run length = 2).
Next, 'c' again, increment run (run length = 3).
Next, 'a' is different, so count substrings in 'ccc': 3 * (3 + 1) / 2 = 6. Add 6 to result. Start new run with 'a' (run length = 1).
Next, 'a' again, increment run (run length = 2).
End of string, so count substrings in 'aa': 2 * (2 + 1) / 2 = 3. Add 3 to result.
Total: 1 + 3 + 6 + 3 = 13.

Thus, the answer for "abbcccaa" is 13.

Time and Space Complexity

Brute-force approach: Checking every possible substring would require O(n^2) time and O(1) extra space, which is too slow for n up to 10^5.
Optimized approach: Only a single pass through the string is needed, so the time complexity is O(n), where n is the length of the string. Space complexity is O(1), since only a few variables are used regardless of input size.

Summary

The key insight is that all homogenous substrings are contained within runs of identical characters. By counting the length of each run and using a simple mathematical formula, we can efficiently compute the total number of homogenous substrings in linear time. This approach is both elegant and practical for large inputs, avoiding the inefficiency of brute-force substring enumeration.

AlgoMap.io

Want Help Cracking FAANG?