Want Help Cracking FAANG?

(Then click this)

×
Back to Question Bank

1157. Online Majority Element In Subarray - Leetcode Solution

Code Implementation

import random
import bisect
from collections import defaultdict

class MajorityChecker:

    def __init__(self, arr):
        self.arr = arr
        self.idx_map = defaultdict(list)
        for i, num in enumerate(arr):
            self.idx_map[num].append(i)

    def query(self, left, right, threshold):
        for _ in range(20):
            rand_idx = random.randint(left, right)
            candidate = self.arr[rand_idx]
            idx_list = self.idx_map[candidate]
            l = bisect.bisect_left(idx_list, left)
            r = bisect.bisect_right(idx_list, right)
            if r - l >= threshold:
                return candidate
        return -1
      
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <random>

using namespace std;

class MajorityChecker {
    vector<int> arr;
    unordered_map<int, vector<int>> idx_map;
    mt19937 rng;
public:
    MajorityChecker(vector<int>& arr): arr(arr), rng(random_device{}()) {
        for (int i = 0; i < arr.size(); ++i)
            idx_map[arr[i]].push_back(i);
    }
    
    int query(int left, int right, int threshold) {
        uniform_int_distribution<int> dist(left, right);
        for (int t = 0; t < 20; ++t) {
            int rand_idx = dist(rng);
            int candidate = arr[rand_idx];
            const vector<int>& idx_list = idx_map[candidate];
            auto l = lower_bound(idx_list.begin(), idx_list.end(), left);
            auto r = upper_bound(idx_list.begin(), idx_list.end(), right);
            if (r - l >= threshold)
                return candidate;
        }
        return -1;
    }
};
      
import java.util.*;

class MajorityChecker {
    int[] arr;
    Map<Integer, List<Integer>> idxMap;
    Random rand;

    public MajorityChecker(int[] arr) {
        this.arr = arr;
        idxMap = new HashMap<>();
        rand = new Random();
        for (int i = 0; i < arr.length; ++i) {
            idxMap.computeIfAbsent(arr[i], k -> new ArrayList<>()).add(i);
        }
    }

    public int query(int left, int right, int threshold) {
        for (int t = 0; t < 20; ++t) {
            int randIdx = left + rand.nextInt(right - left + 1);
            int candidate = arr[randIdx];
            List<Integer> idxList = idxMap.get(candidate);
            int l = Collections.binarySearch(idxList, left);
            int r = Collections.binarySearch(idxList, right + 1);
            if (l < 0) l = -l - 1;
            if (r < 0) r = -r - 1;
            if (r - l >= threshold)
                return candidate;
        }
        return -1;
    }
}
      
class MajorityChecker {
    constructor(arr) {
        this.arr = arr;
        this.idxMap = new Map();
        for (let i = 0; i < arr.length; ++i) {
            if (!this.idxMap.has(arr[i])) this.idxMap.set(arr[i], []);
            this.idxMap.get(arr[i]).push(i);
        }
    }
    
    query(left, right, threshold) {
        for (let t = 0; t < 20; ++t) {
            let randIdx = left + Math.floor(Math.random() * (right - left + 1));
            let candidate = this.arr[randIdx];
            let idxList = this.idxMap.get(candidate);
            let l = this.lowerBound(idxList, left);
            let r = this.upperBound(idxList, right);
            if (r - l >= threshold) return candidate;
        }
        return -1;
    }

    lowerBound(arr, target) {
        let l = 0, r = arr.length;
        while (l < r) {
            let m = (l + r) >> 1;
            if (arr[m] < target) l = m + 1;
            else r = m;
        }
        return l;
    }
    upperBound(arr, target) {
        let l = 0, r = arr.length;
        while (l < r) {
            let m = (l + r) >> 1;
            if (arr[m] <= target) l = m + 1;
            else r = m;
        }
        return l;
    }
}
      

Problem Description

You are given an array arr of integers. You need to design a data structure that, given queries of the form (left, right, threshold), efficiently returns the majority element in the subarray arr[left..right] (inclusive), if it occurs at least threshold times. If no such element exists, return -1.

  • The majority element in a subarray is the value that appears at least threshold times within the range [left, right].
  • Each query is independent, and multiple queries will be performed.
  • Constraints:
    • 1 <= arr.length <= 2 * 10^4
    • 1 <= arr[i] <= 10^9
    • queries may be up to 10^4

The solution must be efficient enough to handle many queries, and brute-force enumeration of the subarray for each query is too slow.

Thought Process

Let's start by considering the brute-force approach: for each query, count the frequency of every element in the subarray arr[left..right]. If any element occurs at least threshold times, return it; otherwise, return -1.

However, this approach is inefficient, especially for large arrays and many queries, because each query could take O(N) time. We need to optimize.

Key insight: The majority element (if one exists) must appear frequently in the subarray. If we randomly pick indices in the subarray, the majority element is more likely to be picked than others. This suggests a probabilistic (random sampling) approach.

To check the frequency of a candidate quickly, we can precompute and store the list of indices for each value in arr. Then, we can binary search this list to count how many times the candidate appears in the queried range.

This approach combines random sampling (to efficiently guess a likely majority) with fast lookup (using binary search on precomputed positions).

Solution Approach

The solution uses a combination of pre-processing and random sampling to answer queries efficiently.

  1. Preprocessing:
    • Build a mapping from each unique value in arr to a sorted list of its indices. This allows us to quickly count how many times a value appears in any subarray using binary search.
  2. Answering a Query:
    • For a query (left, right, threshold), repeat the following up to 20 times:
      • Randomly select an index between left and right.
      • Let candidate be the value at that index.
      • Using the precomputed index list for candidate, use binary search to count how many times candidate appears in [left, right].
      • If the count is at least threshold, return candidate as the answer.
    • If no candidate is found after 20 attempts, return -1.
  3. Why Random Sampling?
    • If a majority element exists, it occupies a significant portion of the subarray. Thus, random sampling has a high probability of picking it within a small number of trials.
  4. Why Index Map with Binary Search?
    • Counting occurrences in a range is reduced to two binary searches, making it O(log N) per check, much faster than scanning the subarray.

Example Walkthrough

Suppose arr = [1,2,1,2,1,2,2] and we receive the query (0, 6, 4).

  1. The subarray is [1,2,1,2,1,2,2].
  2. We randomly pick an index, say index 1 (value 2). We check how many times 2 appears between indices 0 and 6.
    • The precomputed index list for 2 is [1,3,5,6].
    • Using binary search, we find that 2 appears at positions 1, 3, 5, and 6 (all within the range), so count = 4.
    • Since 4 ≥ threshold, we return 2 as the answer.
  3. If we had picked an index with value 1, we would check the list [0,2,4], which has only 3 occurrences in the range, which is less than threshold.
  4. If all 20 samples fail to find a candidate, we would return -1, but as shown, a majority is likely to be found quickly.

Time and Space Complexity

  • Brute-force:
    • For each query, O(N) time to count frequencies in the subarray.
    • For Q queries: O(QN) time, which is too slow for large N and Q.
  • Optimized (Random Sampling + Index Map):
    • Preprocessing: O(N), where N is the length of arr, to build the index map.
    • Each query:
      • Up to 20 random picks, each requiring O(log K) time for binary search (K is the number of occurrences of the candidate).
      • Overall, each query is O(20 log N) ≈ O(log N) in practice.
    • Space: O(N) for storing the index map.

Summary

The problem of finding the majority element in an online subarray query is efficiently solved by combining random sampling with precomputed index lists for fast frequency checks. This approach leverages the high probability of sampling a majority element and uses binary search for quick counting. The result is a solution that is both elegant and performant, able to handle large arrays and many queries with ease.