Given a non-empty string s
, encode it such that its encoded length is as short as possible. The encoding rule is: k[encoded_string]
, where the encoded_string
inside the square brackets is repeated exactly k
times. If encoding does not make the string shorter, return the original string.
For example, "aaa"
can be encoded as "3[a]"
but since "3[a]"
is not shorter than "aaa"
, we keep the original. However, "aaaaa"
can be encoded as "5[a]"
.
s
(1 ≤ s.length ≤ 160)
The encoding must be valid and use the format k[encoded_string]
for repeated substrings, and you may use encoding recursively within encoded_string
.
At first glance, we might think of simply finding repeated substrings and compressing them. However, the challenge is that repetitions can be nested, and sometimes encoding a substring recursively can result in a shorter overall encoding. Brute-force approaches, such as trying every possible substring and encoding, would be too slow for large strings.
To optimize, we can use dynamic programming. The idea is to build up solutions for shorter substrings and use them to solve for longer substrings. For every substring, we try all possible splits and all possible repeated patterns, and choose the shortest encoding at each step.
This approach is similar to solving the "word break" problem or optimal matrix chain multiplication, where we build up from small to large segments.
dp[i][j]
as the shortest encoded string for substring s[i:j+1]
.
s[i:j+1]
, try every split point k
between i
and j
and set dp[i][j]
as the minimum of dp[i][k] + dp[k+1][j]
.
repeat_count[encoded_pattern]
where encoded_pattern
is the shortest encoding of that smaller substring.
dp[0][n-1]
(the shortest encoding for the whole string).
The DP approach ensures that all possible encodings are considered, and by always selecting the shortest, we guarantee the optimal solution.
Let's consider the input s = "ababab"
.
"ab"
, there's no shorter encoding, so dp[0][1] = "ab"
.
"abab"
(i=0, j=3
), we can split at k=1
("ab" + "ab"
), but also notice that "abab"
is 2 * "ab"
. So, 2[ab]
is shorter than "abab"
.
"ababab"
(i=0, j=5
), we can split at k=3
("abab" + "ab"
), or notice that "ababab"
is 3 * "ab"
. So, 3[ab]
is the shortest encoding.
The final answer for "ababab"
is "3[ab]"
.
O(n^2)
substrings. For each, we try all possible splits (O(n)
) and check for repeated patterns (O(n)
), resulting in O(n^3)
time complexity.
O(n^2)
space.
For n = 160
, this approach is efficient enough.
This problem requires minimizing the encoded length of a string using a specific repeat encoding format. By using dynamic programming, we efficiently explore all possible encodings for all substrings, always selecting the shortest. The solution is elegant because it systematically builds up from simple cases, avoids redundant calculations, and guarantees optimality by always choosing the minimal encoding at every step.
class Solution:
def encode(self, s: str) -> str:
n = len(s)
dp = [["" for _ in range(n)] for _ in range(n)]
for l in range(1, n+1): # length of substring
for i in range(n - l + 1):
j = i + l - 1
substr = s[i:j+1]
dp[i][j] = substr
# Try all possible splits
for k in range(i, j):
if len(dp[i][k] + dp[k+1][j]) < len(dp[i][j]):
dp[i][j] = dp[i][k] + dp[k+1][j]
# Try to encode by repeating pattern
for k in range(1, l//2 + 1):
if l % k == 0:
repeat = substr[:k]
if repeat * (l // k) == substr:
encoded = str(l // k) + "[" + dp[i][i+k-1] + "]"
if len(encoded) < len(dp[i][j]):
dp[i][j] = encoded
return dp[0][n-1]
class Solution {
public:
string encode(string s) {
int n = s.size();
vector<vector<string>> dp(n, vector<string>(n, ""));
for (int l = 1; l <= n; ++l) {
for (int i = 0; i + l - 1 < n; ++i) {
int j = i + l - 1;
string substr = s.substr(i, l);
dp[i][j] = substr;
// Try all possible splits
for (int k = i; k < j; ++k) {
if (dp[i][k].size() + dp[k+1][j].size() < dp[i][j].size()) {
dp[i][j] = dp[i][k] + dp[k+1][j];
}
}
// Try to encode by repeating pattern
for (int k = 1; k <= l/2; ++k) {
if (l % k == 0) {
string repeat = substr.substr(0, k);
string repeated = "";
for (int t = 0; t < l/k; ++t) repeated += repeat;
if (repeated == substr) {
string encoded = to_string(l/k) + "[" + dp[i][i+k-1] + "]";
if (encoded.size() < dp[i][j].size()) {
dp[i][j] = encoded;
}
}
}
}
}
}
return dp[0][n-1];
}
};
class Solution {
public String encode(String s) {
int n = s.length();
String[][] dp = new String[n][n];
for (int l = 1; l <= n; l++) {
for (int i = 0; i + l - 1 < n; i++) {
int j = i + l - 1;
String substr = s.substring(i, j + 1);
dp[i][j] = substr;
// Try all possible splits
for (int k = i; k < j; k++) {
if (dp[i][k].length() + dp[k+1][j].length() < dp[i][j].length()) {
dp[i][j] = dp[i][k] + dp[k+1][j];
}
}
// Try to encode by repeating pattern
for (int k = 1; k <= l/2; k++) {
if (l % k == 0) {
String repeat = substr.substring(0, k);
StringBuilder repeated = new StringBuilder();
for (int t = 0; t < l / k; t++) repeated.append(repeat);
if (repeated.toString().equals(substr)) {
String encoded = (l / k) + "[" + dp[i][i + k - 1] + "]";
if (encoded.length() < dp[i][j].length()) {
dp[i][j] = encoded;
}
}
}
}
}
}
return dp[0][n - 1];
}
}
var encode = function(s) {
const n = s.length;
const dp = Array.from({length: n}, () => Array(n).fill(""));
for (let l = 1; l <= n; l++) {
for (let i = 0; i + l - 1 < n; i++) {
let j = i + l - 1;
let substr = s.substring(i, j + 1);
dp[i][j] = substr;
// Try all possible splits
for (let k = i; k < j; k++) {
if ((dp[i][k] + dp[k+1][j]).length < dp[i][j].length) {
dp[i][j] = dp[i][k] + dp[k+1][j];
}
}
// Try to encode by repeating pattern
for (let k = 1; k <= Math.floor(l/2); k++) {
if (l % k === 0) {
let repeat = substr.substring(0, k);
if (repeat.repeat(l / k) === substr) {
let encoded = (l / k) + "[" + dp[i][i + k - 1] + "]";
if (encoded.length < dp[i][j].length) {
dp[i][j] = encoded;
}
}
}
}
}
}
return dp[0][n-1];
};