Want Help Cracking FAANG?

(Then click this)

×
Back to Question Bank

287. Find the Duplicate Number - Leetcode Solution

Problem Description

The "Find the Duplicate Number" problem is a classic algorithmic challenge. You are given an array nums containing n + 1 integers, where each integer is in the range [1, n] (inclusive). There is only one repeated number in nums, but it could appear more than once. Your task is to find and return the duplicate number.

  • Constraints:
    • There is exactly one duplicated number in the array.
    • You cannot modify the input array (the array is read-only).
    • You must use constant extra space (O(1) space).
    • The runtime complexity must be less than O(n2).
    • You cannot use elements twice for matching (i.e., do not reuse elements).

Example:
Input: nums = [1,3,4,2,2]
Output: 2

Thought Process

At first glance, the problem seems to invite a brute-force solution: compare every pair of elements and return the duplicate. However, this approach is too slow (O(n2)), especially for large arrays.

Next, you might consider using a hash set to track seen numbers, but the problem requires constant extra space, so this is not allowed.

The key to solving this efficiently is to realize that the array's properties are similar to a linked list with a cycle: since there are n + 1 numbers in the range [1, n], at least one number must repeat (by the pigeonhole principle). This insight allows us to use Floyd's Tortoise and Hare (Cycle Detection) algorithm, which finds cycles in linked lists using two pointers and constant space.

Solution Approach

Here’s how we can use Floyd’s Cycle Detection algorithm to find the duplicate number:

  1. Model the Array as a Linked List:
    • Treat each value in nums as a “pointer” to the next index (i.e., at index i, the next index is nums[i]).
    • Because there is a duplicate, this structure must contain a cycle (the duplicate points to the same index as another value).
  2. Initialize Two Pointers:
    • Start both slow and fast at the start of the array (e.g., slow = nums[0], fast = nums[nums[0]]).
  3. Find the Intersection Point:
    • Move slow by one step and fast by two steps until they meet. This guarantees they meet inside the cycle.
  4. Find the Entrance to the Cycle (Duplicate Number):
    • Reset slow to the start of the array. Move both slow and fast one step at a time. The point where they meet again is the duplicate number.

Why This Works: The cycle's entry point corresponds to the duplicate number because that's where two "pointers" (indices) point to the same value.

Example Walkthrough

Let's walk through the solution with the input nums = [3,1,3,4,2].

  1. Model as Linked List:
    • Index 0 → 3
    • Index 1 → 1
    • Index 2 → 3
    • Index 3 → 4
    • Index 4 → 2
  2. Initialize:
    • slow = nums[0] = 3
    • fast = nums[nums[0]] = nums[3] = 4
  3. First Iteration:
    • slow = nums[3] = 4
    • fast = nums[nums[4]] = nums[2] = 3
  4. Second Iteration:
    • slow = nums[4] = 2
    • fast = nums[nums[3]] = nums[4] = 2
  5. Now, slow == fast (both are 2). Intersection found.
  6. Find Entrance to Cycle:
    • Reset slow to index 0: slow = nums[0] = 3; fast remains at 2.
    • Move both one step at a time:
    • slow = nums[3] = 4; fast = nums[2] = 3
    • slow = nums[4] = 2; fast = nums[3] = 4
    • slow = nums[2] = 3; fast = nums[4] = 2
    • slow = nums[3] = 4; fast = nums[2] = 3
    • Eventually, both meet at index 3 (value 3), which is the duplicate.

Result: The duplicate number is 3.

Time and Space Complexity

  • Brute-force Approach:
    • Time Complexity: O(n2) — Compare every pair of elements.
    • Space Complexity: O(1) — No extra space, but too slow.
  • Using a Hash Set:
    • Time Complexity: O(n)
    • Space Complexity: O(n) — Not allowed by constraints.
  • Floyd’s Tortoise and Hare (Optimized Solution):
    • Time Complexity: O(n) — Each pointer traverses at most 2n steps.
    • Space Complexity: O(1) — Uses only two pointers, meets the problem requirement.

Code Implementation

class Solution:
    def findDuplicate(self, nums):
        # Phase 1: Find the intersection point
        slow = nums[0]
        fast = nums[0]
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]
            if slow == fast:
                break
        # Phase 2: Find the entrance to the cycle
        slow = nums[0]
        while slow != fast:
            slow = nums[slow]
            fast = nums[fast]
        return slow
      
class Solution {
public:
    int findDuplicate(vector<int>& nums) {
        int slow = nums[0];
        int fast = nums[0];
        // Phase 1
        do {
            slow = nums[slow];
            fast = nums[nums[fast]];
        } while (slow != fast);
        // Phase 2
        slow = nums[0];
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
        }
        return slow;
    }
};
      
class Solution {
    public int findDuplicate(int[] nums) {
        int slow = nums[0];
        int fast = nums[0];
        // Phase 1
        do {
            slow = nums[slow];
            fast = nums[nums[fast]];
        } while (slow != fast);
        // Phase 2
        slow = nums[0];
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
        }
        return slow;
    }
}
      
var findDuplicate = function(nums) {
    let slow = nums[0];
    let fast = nums[0];
    // Phase 1
    do {
        slow = nums[slow];
        fast = nums[nums[fast]];
    } while (slow !== fast);
    // Phase 2
    slow = nums[0];
    while (slow !== fast) {
        slow = nums[slow];
        fast = nums[fast];
    }
    return slow;
};
      

Summary

The "Find the Duplicate Number" problem is elegantly solved using Floyd’s Tortoise and Hare cycle detection algorithm. By treating the array as a linked list where each value points to the next index, we exploit the duplicate to detect a cycle and find its entrance — the duplicate number. This method is efficient, requiring only O(n) time and O(1) space, and demonstrates how classic algorithms can be applied to array problems when direct approaches are not allowed by the constraints.