class Solution:
def hIndex(self, citations: List[int]) -> int:
n = len(citations)
paper_counts = [0] * (n+1)
for c in citations:
paper_counts[min(n, c)] += 1
h = n
papers = paper_counts[n]
while papers < h:
h -= 1
papers += paper_counts[h]
return h
# Time: O(n), Space: O(n)
The H-Index is a metric that attempts to measure both the productivity and citation impact of a researcher’s publications. Given a list of citations (where each element represents how many times a paper has been cited), the goal is to determine the highest number h
such that the researcher has published h
papers each cited at least h
times.
This problem is a common interview question that requires understanding of counting and sorting concepts, and it has both intuitive and optimized solutions.
A naive approach might sort the array and check each position to determine how many papers meet the citation threshold. However, this isn't efficient enough for large datasets.
A better strategy uses counting. Since the H-Index can never be greater than the total number of papers n
, we can create a counting array of size n + 1
where each index i
represents how many papers have exactly i
citations. Any citation count higher than n
is treated as n
to simplify handling.
First, initialize an array paper_counts
of size n + 1
with all zeros. Then, for each citation c
in the input array:
c >= n
, increment paper_counts[n]
.paper_counts[c]
.
Once the counting array is populated, we work backwards from h = n
down to 0
, cumulatively adding up how many papers have at least h
citations. The first h
where the number of such papers is at least h
is our answer.
n + 1
.This technique avoids sorting and leverages frequency counting, which allows us to quickly determine how many papers meet each citation threshold without scanning the original array multiple times. It's especially useful for large datasets and maintains linear time performance.
The H-Index problem demonstrates how counting and bounding techniques can be used to solve problems involving thresholds and rankings efficiently. Understanding this approach provides a strong foundation for similar optimization problems involving frequency analysis and numeric constraints.