Given the root of a binary tree, you are asked to find all duplicate subtrees.
A duplicate subtree is defined as a subtree that appears more than once in the tree, where two subtrees are considered duplicate if they have the same structure and the same node values.
For each kind of duplicate subtree, you need to return the root node of any one of them.
Constraints:
[1, 5000]
.-200 <= Node.val <= 200
To solve this problem, we need to identify all subtrees that occur more than once in the given binary tree.
A brute-force approach would be to compare every subtree with every other subtree, but this is highly inefficient, especially for large trees.
Instead, we can notice that if two subtrees are identical, then their serialized (string or tuple) representations will also be identical.
Thus, we can represent each subtree by a unique serialization (for example, a string that encodes the structure and values of the subtree).
By traversing the tree and recording the serialization of each subtree, we can use a hash map (dictionary) to count how many times each unique subtree appears.
If a serialization appears more than once, we've found a duplicate subtree.
This shift from comparing trees directly to comparing their serializations is the key optimization that makes the problem tractable.
We solve the problem using a post-order traversal, serializing each subtree, and using a hash map to record the count of each serialization. Here's a step-by-step breakdown:
We use post-order traversal so that we serialize children before their parent, ensuring that subtree serializations are built up from the leaves. The hash map allows us to check for duplicates in constant time per subtree.
Sample Input:
1 / \ 2 3 / / \ 4 2 4 / 4Step-by-step:
4
is serialized as 4,#,#
.2
subtree becomes 2,4,#,#,#
.2
subtree (under 3
) also becomes 2,4,#,#,#
.4
(under 3
) is again 4,#,#
.3
subtree becomes 3,2,4,#,#,#,4,#,#
.1
subtree is 1,2,4,#,#,#,3,2,4,#,#,#,4,#,#
.4,#,#
and 2,4,#,#,#
both have a count greater than 1. We add the corresponding root nodes to the result.2
subtree and any one 4
subtree.
Brute-force approach:
The optimized approach is much more efficient and scalable for large trees.
In this problem, we efficiently detect duplicate subtrees by serializing each subtree and using a hash map to count occurrences.
This approach avoids expensive subtree comparisons and leverages fast lookups, making it suitable for large binary trees.
The key insight is that identical subtrees will have identical serializations, so we can detect duplicates by comparing these serialized representations.
By using post-order traversal, we ensure that each subtree's serialization is built from its children upward.
This method is both elegant and efficient, demonstrating how a clever representation can transform a brute-force problem into a tractable one.
# Definition for a binary tree node.
class TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val
self.left = left
self.right = right
class Solution:
def findDuplicateSubtrees(self, root):
from collections import defaultdict
count = defaultdict(int)
result = []
def serialize(node):
if not node:
return "#"
serial = "{},{},{}".format(
node.val, serialize(node.left), serialize(node.right))
count[serial] += 1
if count[serial] == 2:
result.append(node)
return serial
serialize(root)
return result
// Definition for a binary tree node.
struct TreeNode {
int val;
TreeNode *left;
TreeNode *right;
TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};
class Solution {
public:
vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
unordered_map<string, int> count;
vector<TreeNode*> result;
serialize(root, count, result);
return result;
}
string serialize(TreeNode* node, unordered_map<string, int>& count, vector<TreeNode*>& result) {
if (!node) return "#";
string serial = to_string(node->val) + "," +
serialize(node->left, count, result) + "," +
serialize(node->right, count, result);
if (++count[serial] == 2)
result.push_back(node);
return serial;
}
};
// Definition for a binary tree node.
public class TreeNode {
int val;
TreeNode left;
TreeNode right;
TreeNode(int x) { val = x; }
}
class Solution {
public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
Map<String, Integer> count = new HashMap<>();
List<TreeNode> result = new ArrayList<>();
serialize(root, count, result);
return result;
}
private String serialize(TreeNode node, Map<String, Integer> count, List<TreeNode> result) {
if (node == null) return "#";
String serial = node.val + "," + serialize(node.left, count, result) + "," + serialize(node.right, count, result);
count.put(serial, count.getOrDefault(serial, 0) + 1);
if (count.get(serial) == 2) {
result.add(node);
}
return serial;
}
}
/**
* Definition for a binary tree node.
* function TreeNode(val, left, right) {
* this.val = (val===undefined ? 0 : val)
* this.left = (left===undefined ? null : left)
* this.right = (right===undefined ? null : right)
* }
*/
/**
* @param {TreeNode} root
* @return {TreeNode[]}
*/
var findDuplicateSubtrees = function(root) {
const count = new Map();
const result = [];
function serialize(node) {
if (!node) return "#";
const serial = node.val + "," + serialize(node.left) + "," + serialize(node.right);
count.set(serial, (count.get(serial) || 0) + 1);
if (count.get(serial) === 2) {
result.push(node);
}
return serial;
}
serialize(root);
return result;
};