Want Help Cracking FAANG?

(Then click this)

×
Back to Question Bank

652. Find Duplicate Subtrees - Leetcode Solution

Problem Description

Given the root of a binary tree, you are asked to find all duplicate subtrees. A duplicate subtree is defined as a subtree that appears more than once in the tree, where two subtrees are considered duplicate if they have the same structure and the same node values.

For each kind of duplicate subtree, you need to return the root node of any one of them.

Constraints:

  • The number of nodes in the tree is in the range [1, 5000].
  • -200 <= Node.val <= 200
Note: The returned list of roots can be in any order, but each duplicate subtree should be represented only once.

Thought Process

To solve this problem, we need to identify all subtrees that occur more than once in the given binary tree. A brute-force approach would be to compare every subtree with every other subtree, but this is highly inefficient, especially for large trees.

Instead, we can notice that if two subtrees are identical, then their serialized (string or tuple) representations will also be identical. Thus, we can represent each subtree by a unique serialization (for example, a string that encodes the structure and values of the subtree).

By traversing the tree and recording the serialization of each subtree, we can use a hash map (dictionary) to count how many times each unique subtree appears. If a serialization appears more than once, we've found a duplicate subtree.

This shift from comparing trees directly to comparing their serializations is the key optimization that makes the problem tractable.

Solution Approach

We solve the problem using a post-order traversal, serializing each subtree, and using a hash map to record the count of each serialization. Here's a step-by-step breakdown:

  1. Serialize Subtrees: For each node, recursively serialize its left and right subtrees, then combine the serializations with the current node's value to create a unique string or tuple for the subtree rooted at that node.
  2. Use a Hash Map: Store each subtree serialization in a hash map (dictionary) with the count of how many times it has occurred.
  3. Detect Duplicates: When a serialization is found for the second time (i.e., the count becomes 2), record the current node as the root of a duplicate subtree.
  4. Return Results: After traversing the entire tree, return the list of root nodes for all detected duplicate subtrees.

We use post-order traversal so that we serialize children before their parent, ensuring that subtree serializations are built up from the leaves. The hash map allows us to check for duplicates in constant time per subtree.

Example Walkthrough

Sample Input:

        1
       / \
      2   3
     /   / \
    4   2   4
       /
      4
    
Step-by-step:
  1. Start at the leaves. Each 4 is serialized as 4,#,#.
  2. The left 2 subtree becomes 2,4,#,#,#.
  3. The right 2 subtree (under 3) also becomes 2,4,#,#,#.
  4. The right 4 (under 3) is again 4,#,#.
  5. The right 3 subtree becomes 3,2,4,#,#,#,4,#,#.
  6. The root 1 subtree is 1,2,4,#,#,#,3,2,4,#,#,#,4,#,#.
  7. Now, in our hash map, 4,#,# and 2,4,#,#,# both have a count greater than 1. We add the corresponding root nodes to the result.
Result: Return the root nodes of any one 2 subtree and any one 4 subtree.

Time and Space Complexity

Brute-force approach:

  • Time Complexity: O(N^2 * H), where N is the number of nodes and H is the height of the tree, because comparing each pair of subtrees can take up to O(H) time.
  • Space Complexity: O(N) for storing all subtrees.
Optimized approach (using serialization and hash map):
  • Time Complexity: O(N), since each node is visited once and serialization is built in O(1) per node (if using tuple or string concatenation).
  • Space Complexity: O(N), for storing the serializations and the result.

The optimized approach is much more efficient and scalable for large trees.

Summary

In this problem, we efficiently detect duplicate subtrees by serializing each subtree and using a hash map to count occurrences. This approach avoids expensive subtree comparisons and leverages fast lookups, making it suitable for large binary trees.

The key insight is that identical subtrees will have identical serializations, so we can detect duplicates by comparing these serialized representations. By using post-order traversal, we ensure that each subtree's serialization is built from its children upward.

This method is both elegant and efficient, demonstrating how a clever representation can transform a brute-force problem into a tractable one.

Code Implementation

# Definition for a binary tree node.
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

class Solution:
    def findDuplicateSubtrees(self, root):
        from collections import defaultdict
        count = defaultdict(int)
        result = []

        def serialize(node):
            if not node:
                return "#"
            serial = "{},{},{}".format(
                node.val, serialize(node.left), serialize(node.right))
            count[serial] += 1
            if count[serial] == 2:
                result.append(node)
            return serial

        serialize(root)
        return result
      
// Definition for a binary tree node.
struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};

class Solution {
public:
    vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
        unordered_map<string, int> count;
        vector<TreeNode*> result;
        serialize(root, count, result);
        return result;
    }
    
    string serialize(TreeNode* node, unordered_map<string, int>& count, vector<TreeNode*>& result) {
        if (!node) return "#";
        string serial = to_string(node->val) + "," + 
                        serialize(node->left, count, result) + "," +
                        serialize(node->right, count, result);
        if (++count[serial] == 2)
            result.push_back(node);
        return serial;
    }
};
      
// Definition for a binary tree node.
public class TreeNode {
    int val;
    TreeNode left;
    TreeNode right;
    TreeNode(int x) { val = x; }
}

class Solution {
    public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
        Map<String, Integer> count = new HashMap<>();
        List<TreeNode> result = new ArrayList<>();
        serialize(root, count, result);
        return result;
    }
    
    private String serialize(TreeNode node, Map<String, Integer> count, List<TreeNode> result) {
        if (node == null) return "#";
        String serial = node.val + "," + serialize(node.left, count, result) + "," + serialize(node.right, count, result);
        count.put(serial, count.getOrDefault(serial, 0) + 1);
        if (count.get(serial) == 2) {
            result.add(node);
        }
        return serial;
    }
}
      
/**
 * Definition for a binary tree node.
 * function TreeNode(val, left, right) {
 *     this.val = (val===undefined ? 0 : val)
 *     this.left = (left===undefined ? null : left)
 *     this.right = (right===undefined ? null : right)
 * }
 */
/**
 * @param {TreeNode} root
 * @return {TreeNode[]}
 */
var findDuplicateSubtrees = function(root) {
    const count = new Map();
    const result = [];
    
    function serialize(node) {
        if (!node) return "#";
        const serial = node.val + "," + serialize(node.left) + "," + serialize(node.right);
        count.set(serial, (count.get(serial) || 0) + 1);
        if (count.get(serial) === 2) {
            result.push(node);
        }
        return serial;
    }
    
    serialize(root);
    return result;
};