Rabin-Karp Algorithm for Pattern Searching - GeeksforGeeks

Last Updated : 25 Jul, 2025

Given two strings txt (the text) and pat (the pattern), consisting of lowercase English letters, find all 0-based starting indices where pat occurs as a substring in txt.

Examples: 

Input: txt = "geeksforgeeks", pat = "geeks"
Output: [0, 8]
Explanation: The string "geeks" occurs at indices 0 and 8 in the text.

Input: txt = "aabaacaadaabaaba", pat = "aaba"
Output: [0, 9, 12]
Explanation: The string "aaba" occurs at indices 0, 9 and 12 in the text.

[Naive Approach] Brute Force Pattern Matching - O(n × m) Time and O(1) Space

The simplest way to solve this problem is to check for the pattern pat at every possible position in the text txt.

  • Slide the pattern over the text one character at a time.
  • For each position i from 0 to n - m (where n is the length of txt, m is the length of pat), compare the substring txt[i ... i + m - 1] with pat.
  • If they match, store index i as an occurrence.
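
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the article's original code, and the function name naive_search is an assumption chosen for this example.

```python
def naive_search(txt: str, pat: str) -> list[int]:
    """Return all 0-based indices where pat occurs in txt (brute force)."""
    n, m = len(txt), len(pat)
    matches = []
    for i in range(n - m + 1):        # every window start from 0 to n - m
        if txt[i:i + m] == pat:       # compare txt[i ... i + m - 1] with pat
            matches.append(i)
    return matches


print(naive_search("geeksforgeeks", "geeks"))     # [0, 8]
print(naive_search("aabaacaadaabaaba", "aaba"))   # [0, 9, 12]
```

Each slice comparison costs up to m character checks, which is where the O(n × m) bound comes from.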

[Expected Approach] Rabin-Karp with Single Rolling Hash

The Naive String Matching algorithm compares every substring of the text that has the pattern's length against the pattern, character by character.

Like the Naive algorithm, the Rabin-Karp algorithm also checks every substring. Unlike the Naive algorithm, it first compares the hash value of the pattern with the hash value of the current substring of the text, and compares individual characters only when the two hash values match. So the Rabin-Karp algorithm needs to calculate hash values for the following strings.

  • Pattern itself
  • Every substring of the text of length m, where m is the length of the pattern.

How is Hash Value calculated in Rabin-Karp?

The hash value in Rabin-Karp is calculated using a rolling hash function, which allows efficient hash updates as the pattern slides over the text. Instead of recalculating the entire hash for each substring, the rolling hash lets us remove the contribution of the old character and add the new one in constant time.

A string is converted into a numeric hash using a polynomial rolling hash. For a string s of length n, the hash is computed as:

=> hash(s) = (s[0] * p^(n-1) + s[1] * p^(n-2) + ... + s[n-1] * p^0) % mod

Where:

  • s[i] is the numeric value of the i-th character ('a' = 1, 'b' = 2, ..., 'z' = 26)
  • p is a small prime number (commonly 31 or 37)
  • mod is a large prime number (like 1e9 + 7) to avoid overflow and reduce hash collisions
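
As a rough sketch of the formula, the hash can be evaluated left to right (Horner's rule) so that no explicit powers are needed. The constants P = 31 and MOD = 10**9 + 7 and the helper names char_value, poly_hash and roll are assumptions made for this example. The roll helper illustrates the constant-time update described earlier: drop the leading character, then append the new one.

```python
P = 31               # assumed small prime base
MOD = 10**9 + 7      # assumed large prime modulus


def char_value(c: str) -> int:
    """Map 'a' -> 1, 'b' -> 2, ..., 'z' -> 26."""
    return ord(c) - ord("a") + 1


def poly_hash(s: str) -> int:
    """Compute (s[0]*P^(n-1) + ... + s[n-1]*P^0) % MOD."""
    h = 0
    for c in s:
        h = (h * P + char_value(c)) % MOD
    return h


def roll(h: int, old: str, new: str, high_power: int) -> int:
    """Slide a window one step; high_power must be P^(m-1) % MOD."""
    h = (h - char_value(old) * high_power) % MOD   # remove the old leading char
    return (h * P + char_value(new)) % MOD         # shift and append the new char


print(poly_hash("abc"))   # 1*31^2 + 2*31 + 3 = 1026
# Sliding "abc" -> "bcd" gives the same value as hashing "bcd" directly.
print(roll(poly_hash("abc"), "a", "d", pow(P, 2, MOD)) == poly_hash("bcd"))   # True
```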

This approach allows us to compute hash values of substrings in constant time using precomputed powers and prefix hashes.

Hash Recurrence Relation:

Let preHash[i] represent the hash of the prefix substring s[0...i].

Then the recurrence is: preHash[i] = (preHash[i - 1] * p + s[i]) % mod

Where:

  • preHash[0] = s[0]
  • s[i] is the numeric value of the i-th character ('a' = 1, 'b' = 2, ..., 'z' = 26)
  • p is the base, the same small prime used in the hash formula (commonly 31 or 37)
  • All operations are done modulo mod to keep the values bounded
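
A minimal sketch of this recurrence, assuming a base of 31 and a modulus of 10**9 + 7 (the function name build_prefix_hashes is hypothetical):

```python
def build_prefix_hashes(s: str, base: int = 31, mod: int = 10**9 + 7) -> list[int]:
    """pre_hash[i] holds the hash of the prefix s[0...i]."""
    pre_hash = []
    h = 0
    for c in s:
        value = ord(c) - ord("a") + 1     # 'a' = 1, ..., 'z' = 26
        h = (h * base + value) % mod      # starts at s[0] because h begins at 0
        pre_hash.append(h)
    return pre_hash


print(build_prefix_hashes("abc"))   # [1, 33, 1026]
```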

How to Compute Substring Hash in O(1):

Since we have computed preHash[] array:
=> preHash[i] → the hash of the prefix s[0...i]
=> power[i] → (p^i) % mod, for all required i

Now to compute the hash of a substring s[l...r] (from index l to r), you use:

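hash(s[l...r]) = (preHash[r] - preHash[l - 1] * power[r - l + 1] % mod + mod) % mod

The added mod keeps the result non-negative in languages where % can return a negative value (such as C++ and Java).

if l == 0: hash(s[0...r]) = preHash[r]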
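
Putting the pieces together, the following is one possible Python sketch of the single-hash search. It is not the article's original listing. The constants, the helper names, and the decision to verify characters on every hash match (as described at the start of this section) are assumptions of this example.

```python
P = 31               # assumed small prime base
MOD = 10**9 + 7      # assumed large prime modulus


def prefix_hashes(s: str) -> list[int]:
    pre, h = [], 0
    for c in s:
        h = (h * P + ord(c) - ord("a") + 1) % MOD
        pre.append(h)
    return pre


def substring_hash(pre: list[int], power: list[int], l: int, r: int) -> int:
    """Hash of s[l...r] in O(1) from prefix hashes and powers."""
    if l == 0:
        return pre[r]
    # Python's % already returns a non-negative result here.
    return (pre[r] - pre[l - 1] * power[r - l + 1]) % MOD


def rabin_karp(txt: str, pat: str) -> list[int]:
    n, m = len(txt), len(pat)
    if m == 0 or m > n:
        return []
    power = [1] * (n + 1)                     # power[i] = P^i % MOD
    for i in range(1, n + 1):
        power[i] = power[i - 1] * P % MOD
    pre_txt = prefix_hashes(txt)
    pat_hash = prefix_hashes(pat)[-1]         # hash of the whole pattern
    matches = []
    for i in range(n - m + 1):
        if substring_hash(pre_txt, power, i, i + m - 1) == pat_hash:
            if txt[i:i + m] == pat:           # verify to rule out a collision
                matches.append(i)
    return matches


print(*rabin_karp("geeksforgeeks", "geeks"))
```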

Output
0 8 

Time Complexity: O(n + m). We compute prefix hashes and powers for both text and pattern in O(n + m), then slide a window over the text and compare each substring hash in O(1). If every hash match is additionally verified character by character, the worst case rises to O(n × m).
Auxiliary Space: O(n + m). We store prefix hashes and power arrays for both text and pattern, plus O(k) space for the result, where k is the number of matches (at most n).

[Efficient Approach] Rabin-Karp with Double Hashing

The idea behind double hashing is to reduce the probability of hash collisions by computing two hashes with different bases and moduli, and only considering a match if both hashes match.

Why Single Hash Can Fail (Hash Collisions):

When using a single hash function, there's always a chance that two different substrings may produce the same hash value. This is called a hash collision.

Why does it happen?
=> We're computing hashes modulo a large number (e.g., 10^9 + 7)
=> But since the number of possible substrings is huge, different substrings might accidentally result in the same hash after taking the modulo.

Consequence: If two different substrings have the same hash and hash equality alone is treated as a match, the algorithm reports a false positive. To rule this out, Rabin-Karp must verify the actual characters whenever the hashes match, which slows down performance.

Need for Double Hashing:

To reduce the probability of collisions, we use double hashing: we compute two independent hashes with different base values (p1, p2) and different moduli (mod1, mod2).

How it helps:

=> Now, two substrings are considered equal only if both hash values match.
=> The probability of two different substrings colliding in both hash functions is extremely low, roughly 1/(mod1 × mod2), which is practically negligible.
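
A hedged Python sketch of the double-hash variant follows. The specific bases (31, 37), the moduli (10**9 + 7, 10**9 + 9) and all function names are assumptions for illustration. A window is reported as a match when both hash values agree, without a character-level check.

```python
BASES = (31, 37)                    # assumed small prime bases p1, p2
MODS = (10**9 + 7, 10**9 + 9)       # assumed large prime moduli mod1, mod2


def prefix_hashes(s, base, mod):
    pre, h = [], 0
    for c in s:
        h = (h * base + ord(c) - ord("a") + 1) % mod
        pre.append(h)
    return pre


def powers(n, base, mod):
    power = [1] * (n + 1)           # power[i] = base^i % mod
    for i in range(1, n + 1):
        power[i] = power[i - 1] * base % mod
    return power


def window_hash(pre, power, l, r, mod):
    if l == 0:
        return pre[r]
    return (pre[r] - pre[l - 1] * power[r - l + 1]) % mod


def rabin_karp_double(txt, pat):
    n, m = len(txt), len(pat)
    if m == 0 or m > n:
        return []
    tables, pat_hashes = [], []
    for base, mod in zip(BASES, MODS):
        tables.append((prefix_hashes(txt, base, mod), powers(n, base, mod), mod))
        pat_hashes.append(prefix_hashes(pat, base, mod)[-1])
    matches = []
    for i in range(n - m + 1):
        pair = [window_hash(pre, power, i, i + m - 1, mod)
                for pre, power, mod in tables]
        if pair == pat_hashes:      # a match only if BOTH hashes agree
            matches.append(i)
    return matches


print(*rabin_karp_double("geeksforgeeks", "geeks"))
```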

Output
0 8 

Time Complexity: O(n + m). We compute prefix hashes and powers for both text and pattern in O(n + m), then slide a window over the text and compare each pair of substring hashes in O(1).
Auxiliary Space: O(n + m). We store prefix hashes and power arrays for both text and pattern, plus O(k) space for the result, where k is the number of matches (at most n).

Limitations of Rabin-Karp Algorithm

When the hash value of the pattern matches the hash value of a window of the text but the window is not actually equal to the pattern, it is called a spurious hit. Each spurious hit costs an extra character-by-character comparison, so many of them increase the running time of the algorithm. To minimize spurious hits, we use a good hash function, for example one with a large prime modulus.
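
To make spurious hits concrete, the sketch below separates real matches from hash-only matches. It uses a deliberately tiny modulus of 5, an assumption chosen only to force collisions, and it rehashes every window from scratch for clarity instead of rolling the hash. The name find_hits is hypothetical.

```python
def find_hits(txt: str, pat: str, base: int, mod: int):
    """Return (real_matches, spurious_hits) as two lists of window starts."""
    def h(s: str) -> int:
        value = 0
        for c in s:
            value = (value * base + ord(c) - ord("a") + 1) % mod
        return value

    n, m = len(txt), len(pat)
    target = h(pat)
    matches, spurious = [], []
    for i in range(n - m + 1):
        window = txt[i:i + m]
        if h(window) == target:                  # hash hit
            if window == pat:
                matches.append(i)                # genuine occurrence
            else:
                spurious.append(i)               # same hash, different text
    return matches, spurious


# Tiny modulus: "abaa" collides with "aaba" at indices 1 and 10.
print(find_hits("aabaacaadaabaaba", "aaba", 31, 5))           # ([0, 9, 12], [1, 10])
# Large prime modulus: no spurious hits on this input.
print(find_hits("aabaacaadaabaaba", "aaba", 31, 10**9 + 7))   # ([0, 9, 12], [])
```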

Related Articles: 
Searching for Patterns | Set 1 (Naive Pattern Searching) 
Searching for Patterns | Set 2 (KMP Algorithm)

