class Solution:
def countHomogenous(self, s: str) -> int:
MOD = 10**9 + 7
result = 0
count = 1
for i in range(1, len(s) + 1):
if i < len(s) and s[i] == s[i - 1]:
count += 1
else:
result = (result + count * (count + 1) // 2) % MOD
count = 1
return result
class Solution {
public:
int countHomogenous(string s) {
const int MOD = 1e9 + 7;
long long result = 0;
int count = 1;
for (int i = 1; i <= s.size(); ++i) {
if (i < s.size() && s[i] == s[i - 1]) {
count++;
} else {
result = (result + 1LL * count * (count + 1) / 2) % MOD;
count = 1;
}
}
return (int)result;
}
};
class Solution {
public int countHomogenous(String s) {
int MOD = 1000000007;
long result = 0;
int count = 1;
for (int i = 1; i <= s.length(); i++) {
if (i < s.length() && s.charAt(i) == s.charAt(i - 1)) {
count++;
} else {
result = (result + (long)count * (count + 1) / 2) % MOD;
count = 1;
}
}
return (int)result;
}
}
var countHomogenous = function(s) {
const MOD = 1e9 + 7;
let result = 0;
let count = 1;
for (let i = 1; i <= s.length; i++) {
if (i < s.length && s[i] === s[i - 1]) {
count++;
} else {
result = (result + count * (count + 1) / 2) % MOD;
count = 1;
}
}
return result;
};
Given a string s
, you are asked to count the number of homogenous substrings in s
. A homogenous substring is a substring where all characters are the same. Substrings are contiguous sequences of characters within the string.
For example, in s = "abbcccaa"
, the homogenous substrings include single letters ("a"
, "b"
, "c"
), as well as longer substrings like "bb"
, "ccc"
, "aa"
, etc.
s
, modulo 10^9 + 7
.1 <= s.length <= 10^5
s
consists of lowercase English letters only.The first idea that might come to mind is to consider all possible substrings and check if each one is homogenous. However, generating all substrings would be extremely inefficient for large strings because the number of substrings grows quadratically with the length of the string.
Instead, let's think about what defines a homogenous substring: it is a substring where all characters are the same. If you have a run of consecutive identical characters, say k
of them, then every substring within this run is homogenous. For example, for "aaa"
, the homogenous substrings are "a"
(3 times), "aa"
(2 times), and "aaa"
(1 time), for a total of 6 substrings.
So, the problem reduces to finding all runs of consecutive identical characters and, for each run of length k
, calculating the number of homogenous substrings it contains.
The optimal solution involves scanning the string once and counting the length of each run of consecutive identical characters. For each run, we compute the number of homogenous substrings using the formula for the sum of the first k
natural numbers: k * (k + 1) / 2
.
count * (count + 1) / 2
and add it to the result.10^9 + 7
.This approach ensures we only traverse the string once, making it efficient even for large inputs.
Let's walk through the input s = "abbcccaa"
:
'a'
(run length = 1).'b'
is different, so count substrings in 'a'
: 1 * (1 + 1) / 2 = 1
. Add 1 to result. Start new run with 'b'
(run length = 1).'b'
again, increment run (run length = 2).'c'
is different, so count substrings in 'bb'
: 2 * (2 + 1) / 2 = 3
. Add 3 to result. Start new run with 'c'
(run length = 1).'c'
again, increment run (run length = 2).'c'
again, increment run (run length = 3).'a'
is different, so count substrings in 'ccc'
: 3 * (3 + 1) / 2 = 6
. Add 6 to result. Start new run with 'a'
(run length = 1).'a'
again, increment run (run length = 2).'aa'
: 2 * (2 + 1) / 2 = 3
. Add 3 to result.1 + 3 + 6 + 3 = 13
.
Thus, the answer for "abbcccaa"
is 13.
O(n^2)
time and O(1)
extra space, which is too slow for n
up to 10^5
.
O(n)
, where n
is the length of the string. Space complexity is O(1)
, since only a few variables are used regardless of input size.
The key insight is that all homogenous substrings are contained within runs of identical characters. By counting the length of each run and using a simple mathematical formula, we can efficiently compute the total number of homogenous substrings in linear time. This approach is both elegant and practical for large inputs, avoiding the inefficiency of brute-force substring enumeration.