Given a string S
, find the length of the longest substring that appears at least twice in S
. The two occurrences of the substring may overlap. If there is no such repeating substring, return 0
.
Constraints:
S
consists only of lowercase English letters.S
is between 1 and 1500, inclusive.Input: S = "abcd" Output: 0 Input: S = "abbaba" Output: 2
You must find the length of the longest substring that repeats at least twice in S
. Substrings must be contiguous, and the repeated occurrences can overlap.
The naive approach is to check every possible substring and see if it appears at least twice. However, this is highly inefficient, especially for longer strings, because there are a large number of substrings, and checking each one would take too long.
To improve, we can think about ways to efficiently search for repeating substrings. One way is to use hashing (such as Rabin-Karp) to quickly compare substrings of the same length. Another idea is to use binary search to find the maximum possible length of a repeating substring, which can reduce the number of checks we need to make.
The key insight is that for a given length L
, we can efficiently check if any substring of length L
appears more than once using a set or hash map to store seen substrings. If we can check for a given L
quickly, we can use binary search to find the largest such L
.
We'll use a combination of binary search and hashing to solve the problem efficiently:
left = 1
and right = len(S) - 1
. The answer must be between these values.
left <= right
, do:
mid = (left + right) // 2
. Check if there is any substring of length mid
that occurs more than once in S
.
left = mid + 1
.
right = mid - 1
.
L
, slide a window of size L
over the string.
L
.
L
for which a repeat exists is our answer.
This approach is efficient because binary search reduces the number of lengths to check (logarithmic in the length of the string), and hashing allows us to check for repeats among all substrings of a given length in linear time.
Let's walk through the input S = "abbaba"
:
left = 1
, right = 5
.
mid = 3
.
right = 2
.
mid = 1
.
left = 2
.
mid = 2
.
left = 3
.
left > right
, so the answer is the last found length with repeats, which is 2
.
This shows how binary search narrows down the possible lengths, and hashing allows us to check for repeats efficiently.
The optimized approach is much more efficient and suitable for the problem's constraints.
To find the length of the longest repeating substring in a string, we use binary search to efficiently narrow down the possible lengths and hashing to quickly check for repeated substrings of a given length. This combination leverages the strengths of both algorithms, making the solution fast and effective for large inputs. The main insight is to treat the problem as a search over substring lengths, and to use efficient data structures for checking repeats.
class Solution:
def longestRepeatingSubstring(self, S: str) -> int:
def search(L):
seen = set()
base = 256
mod = 2**63 - 1
hash_val = 0
for i in range(L):
hash_val = (hash_val * base + ord(S[i])) % mod
seen.add(hash_val)
baseL = pow(base, L, mod)
for i in range(1, len(S) - L + 1):
hash_val = (hash_val * base - ord(S[i-1]) * baseL + ord(S[i+L-1])) % mod
if hash_val in seen:
return True
seen.add(hash_val)
return False
left, right = 1, len(S) - 1
ans = 0
while left <= right:
mid = (left + right) // 2
if search(mid):
ans = mid
left = mid + 1
else:
right = mid - 1
return ans
class Solution {
public:
int longestRepeatingSubstring(string S) {
int n = S.length();
int left = 1, right = n - 1, ans = 0;
while (left <= right) {
int mid = (left + right) / 2;
if (hasRepeat(S, mid)) {
ans = mid;
left = mid + 1;
} else {
right = mid - 1;
}
}
return ans;
}
private:
bool hasRepeat(const string& S, int L) {
unordered_set<long long> seen;
long long hash = 0, base = 256, mod = (1LL << 61) - 1;
long long baseL = 1;
for (int i = 0; i < L; ++i) {
hash = (hash * base + S[i]) % mod;
baseL = (baseL * base) % mod;
}
seen.insert(hash);
for (int i = 1; i + L <= S.size(); ++i) {
hash = (hash * base - S[i-1] * baseL % mod + mod) % mod;
hash = (hash + S[i+L-1]) % mod;
if (seen.count(hash)) return true;
seen.insert(hash);
}
return false;
}
};
class Solution {
public int longestRepeatingSubstring(String S) {
int n = S.length();
int left = 1, right = n - 1, ans = 0;
while (left <= right) {
int mid = (left + right) / 2;
if (hasRepeat(S, mid)) {
ans = mid;
left = mid + 1;
} else {
right = mid - 1;
}
}
return ans;
}
private boolean hasRepeat(String S, int L) {
HashSet<Long> seen = new HashSet<>();
long hash = 0, base = 256, mod = (1L << 61) - 1;
long baseL = 1;
for (int i = 0; i < L; ++i) {
hash = (hash * base + S.charAt(i)) % mod;
baseL = (baseL * base) % mod;
}
seen.add(hash);
for (int i = 1; i + L <= S.length(); ++i) {
hash = (hash * base - S.charAt(i-1) * baseL % mod + mod) % mod;
hash = (hash + S.charAt(i+L-1)) % mod;
if (seen.contains(hash)) return true;
seen.add(hash);
}
return false;
}
}
var longestRepeatingSubstring = function(S) {
function search(L) {
let seen = new Set();
let base = 256, mod = 2 ** 53 - 1;
let hash = 0;
for (let i = 0; i < L; ++i) {
hash = (hash * base + S.charCodeAt(i)) % mod;
}
seen.add(hash);
let baseL = 1;
for (let i = 0; i < L; ++i) baseL = (baseL * base) % mod;
for (let i = 1; i + L <= S.length; ++i) {
hash = (hash * base - S.charCodeAt(i-1) * baseL % mod + mod) % mod;
hash = (hash + S.charCodeAt(i+L-1)) % mod;
if (seen.has(hash)) return true;
seen.add(hash);
}
return false;
}
let left = 1, right = S.length - 1, ans = 0;
while (left <= right) {
let mid = Math.floor((left + right) / 2);
if (search(mid)) {
ans = mid;
left = mid + 1;
} else {
right = mid - 1;
}
}
return ans;
};