In many programming problems involving strings, we often need to search for occurrences of a pattern inside a text. A classic approach like the naive string matching algorithm checks for the pattern at every index of the text, leading to a time complexity of O(n·m), where n is the length of the text and m is the length of the pattern. This is inefficient for large inputs.
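For reference, here is a minimal Python sketch of that naive approach. The name `naive_search` is illustrative and not from the original. It tries every starting index and compares up to m characters at each one, which is where the O(n·m) bound comes from.

```python
def naive_search(text, pattern):
    """Return all 0-based indices where pattern occurs in text, in O(n*m) time."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every possible starting index in the text.
    for i in range(n - m + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i)
    return matches


print(naive_search("aabxaabxaa", "aab"))  # [0, 4]
```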
To address this, several efficient string matching algorithms exist. One of them is the Z-Algorithm, which performs pattern matching in linear time: it finds all occurrences of the pattern in O(n + m) time.
The Z-Algorithm revolves around computing something called the Z-array.
Let’s say we have a string s of length n.
The Z-array Z[0..n-1] stores, for each index i, the length of the longest substring starting at i that is also a prefix of the string s.
In simpler terms:
Z[i] tells us how many characters from position i onwards match with the beginning of the string.
Let’s take a small example to understand this better before we move on to the construction of the Z-array.
Example: s = "aabxaab".
C++
Z[0] = 0  // by definition; we don't compare the full string with itself
Now, we compute the remaining values from index 1 to 6.
Z[1] = 1 → Only the first character 'a' matches with the prefix.
Z[2] = 0 → 'b' does not match the first character of the prefix 'a'.
Z[3] = 0 → 'x' does not match the first character of the prefix 'a'.
Z[4] = 3 → The substring "aab" matches the prefix "aab".
Z[5] = 1 → Only 'a' matches with the first character of the prefix.
Z[6] = 0 → 'b' does not match the first character of the prefix 'a'.
Final Z-array
C++
Z[] = [0, 1, 0, 0, 3, 1, 0]
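The worked example can be checked mechanically. The sketch below computes a single Z-value straight from the definition, assuming the Z[0] = 0 convention used above; `z_at` is a hypothetical helper name used only for illustration.

```python
def z_at(s, i):
    """Z[i] by definition: length of the longest common prefix of s and s[i:]."""
    if i == 0:
        return 0  # convention used in this article
    length = 0
    # Extend while the substring at i keeps agreeing with the prefix of s.
    while i + length < len(s) and s[length] == s[i + length]:
        length += 1
    return length


s = "aabxaab"
print([z_at(s, i) for i in range(len(s))])  # [0, 1, 0, 0, 3, 1, 0]
```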
Calculation of Z Array
The naive approach to compute the Z-array checks, for each index i, how many characters from s[i...] match the prefix starting at s[0]. This can lead to O(n²) time in the worst case. However, using the Z-algorithm, we can compute all Z[i] values in O(n) time.
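To see where the O(n²) bound comes from, the following sketch builds the whole Z-array naively and counts character comparisons. The function name and the counter are illustrative additions. On a string made of a single repeated letter, every suffix matches the prefix all the way to the end, so the count grows as n(n - 1)/2.

```python
def naive_z_array(s):
    """O(n^2) Z-array by direct comparison; also counts character comparisons."""
    n = len(s)
    z = [0] * n
    comparisons = 0
    for i in range(1, n):
        # Start from scratch at every index: nothing is reused.
        while i + z[i] < n:
            comparisons += 1
            if s[z[i]] != s[i + z[i]]:
                break
            z[i] += 1
    return z, comparisons


for n in (8, 16, 32):
    z, comparisons = naive_z_array("a" * n)
    print(n, comparisons)  # n*(n-1)/2 comparisons for a run of one repeated letter
```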
While computing the Z-array, we maintain a window [l, r], known as the Z-box: among all segments found so far that match a prefix of the string, it is the one whose right end r lies furthest to the right.
l is the starting index of the current Z-box (where the prefix match begins).
r is the ending index (the farthest position that matches the prefix).
Specifically, s[l...r] matches s[0...(r - l)].
This window helps us reuse previous computations to optimize Z-array construction.
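The sketch below follows the same steps as the zFunction implementations later in this article, but also records the Z-box after each index so its movement can be inspected. The name `z_function_trace` and the trace format (i, l, r, Z[i]) are assumptions made for this illustration.

```python
def z_function_trace(s):
    """Same steps as the article's zFunction, but records (i, l, r, z[i]) per index."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0  # current Z-box: s[l..r] matches s[0..r-l]
    trace = []
    for i in range(1, n):
        if i <= r:
            # Inside the box: start from the mirrored value at k = i - l.
            z[i] = min(r - i + 1, z[i - l])
        # Extend by direct comparison.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Move the box if this match ends further right.
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
        trace.append((i, l, r, z[i]))
    return z, trace


z, trace = z_function_trace("aabxaab")
for i, l, r, zi in trace:
    print(f"i={i}  box=[{l}, {r}]  Z[{i}]={zi}")
```

For "aabxaab", the box becomes [4, 6] at i = 4 (s[4...6] = "aab" = s[0...2]), so i = 5 and i = 6 start from Z[1] and Z[2] instead of comparing from scratch. At i = 3 the update yields the empty window l = 3, r = 2, because the condition i + Z[i] - 1 > r also holds when Z[i] = 0; this is harmless, since the next index already lies past r.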
Why is the Z-box helpful?
When processing index i, there are two possibilities:
If i > r (i is outside the Z-box):
=> Compare the prefix with the substring beginning at i, character by character.
=> Store the number of matching characters in Z[i].
=> Update the window [l, r] to this new matching segment: l = i and r = i + Z[i] - 1.
If i ≤ r (i is inside the Z-box):
=> Let k = i - l be the position corresponding to i within the prefix. Since s[l...r] matches s[0...(r - l)], the characters s[i...r] equal s[k...(r - l)].
=> Use Z[k] as a reference, and compare it with the remaining length of the window, r - i + 1.
-> If Z[k] < r - i + 1, the match starting at k ends strictly inside the window, so Z[i] = Z[k].
-> Otherwise, at least r - i + 1 characters are already known to match, so start from Z[i] = r - i + 1 and compare characters beyond r to extend the match.
=> After extending, update the window [l, r] if the match now ends past r.
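The two cases above can also be written as explicit branches. This is a sketch for illustration (`z_function_explicit` is a hypothetical name); the implementations that follow fold both branches into a single min(r - i + 1, Z[k]) followed by an extension loop, which computes the same values.

```python
def z_function_explicit(s):
    """Z-array with the Z-box cases written as explicit branches."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i > r:
            # Case 1: i is outside the Z-box, so compare from scratch.
            length = 0
            while i + length < n and s[length] == s[i + length]:
                length += 1
            z[i] = length
            if length > 0:
                l, r = i, i + length - 1
        else:
            k = i - l                # mirror position of i inside the prefix
            remaining = r - i + 1    # characters of the Z-box left from i
            if z[k] < remaining:
                # Case 2a: the match at k ends strictly inside the box; copy it.
                z[i] = z[k]
            else:
                # Case 2b: at least `remaining` characters match; extend past r.
                length = remaining
                while i + length < n and s[length] == s[i + length]:
                    length += 1
                z[i] = length
                l, r = i, i + length - 1
    return z


print(z_function_explicit("aabxaab"))  # [0, 1, 0, 0, 3, 1, 0]
```

Branch 2a makes no character comparisons at all, which is exactly the reuse the Z-box is meant to provide.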
C++
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

vector<int> zFunction(const string& s) {
    int n = s.length();
    vector<int> z(n);
    int l = 0, r = 0;

    for (int i = 1; i < n; i++) {
        if (i <= r) {
            int k = i - l;

            // Case 2: reuse the previously computed value
            z[i] = min(r - i + 1, z[k]);
        }

        // Try to extend the Z-box beyond r
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }

        // Update the [l, r] window if extended
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }

    return z;
}

int main() {
    string s = "aabxaab";
    vector<int> z = zFunction(s);

    for (size_t i = 0; i < z.size(); ++i) {
        cout << z[i] << " ";
    }
    return 0;
}
Java
import java.util.ArrayList;

public class GfG {

    public static ArrayList<Integer> zFunction(String s) {
        int n = s.length();
        ArrayList<Integer> z = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            z.add(0);
        }

        int l = 0, r = 0;
        for (int i = 1; i < n; i++) {
            if (i <= r) {
                int k = i - l;

                // Case 2: reuse the previously computed value
                z.set(i, Math.min(r - i + 1, z.get(k)));
            }

            // Try to extend the Z-box beyond r
            while (i + z.get(i) < n && s.charAt(z.get(i)) == s.charAt(i + z.get(i))) {
                z.set(i, z.get(i) + 1);
            }

            // Update the [l, r] window if extended
            if (i + z.get(i) - 1 > r) {
                l = i;
                r = i + z.get(i) - 1;
            }
        }

        return z;
    }

    public static void main(String[] args) {
        String s = "aabxaab";
        ArrayList<Integer> z = zFunction(s);
        for (int x : z) {
            System.out.print(x + " ");
        }
    }
}
Python
def zFunction(s):
    n = len(s)
    z = [0] * n
    l, r = 0, 0

    for i in range(1, n):
        if i <= r:
            k = i - l

            # Case 2: reuse the previously computed value
            z[i] = min(r - i + 1, z[k])

        # Try to extend the Z-box beyond r
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1

        # Update the [l, r] window if extended
        if i + z[i] - 1 > r:
            l = i
            r = i + z[i] - 1

    return z


if __name__ == "__main__":
    z = zFunction("aabxaab")
    print(" ".join(map(str, z)))
C#
using System;
using System.Collections.Generic;

public class GfG {

    public static List<int> zFunction(string s) {
        int n = s.Length;
        List<int> z = new List<int>(new int[n]);
        int l = 0, r = 0;

        for (int i = 1; i < n; i++) {
            if (i <= r) {
                int k = i - l;

                // Case 2: reuse the previously computed value
                z[i] = Math.Min(r - i + 1, z[k]);
            }

            // Try to extend the Z-box beyond r
            while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
                z[i]++;
            }

            // Update the [l, r] window if extended
            if (i + z[i] - 1 > r) {
                l = i;
                r = i + z[i] - 1;
            }
        }

        return z;
    }

    public static void Main() {
        string s = "aabxaab";
        List<int> result = zFunction(s);
        Console.WriteLine(string.Join(" ", result));
    }
}
JavaScript
function zFunction(s) {
    let n = s.length;
    let z = new Array(n).fill(0);
    let l = 0, r = 0;

    for (let i = 1; i < n; i++) {
        if (i <= r) {
            let k = i - l;

            // Case 2: reuse the previously computed value
            z[i] = Math.min(r - i + 1, z[k]);
        }

        // Try to extend the Z-box beyond r
        while (i + z[i] < n && s.charAt(z[i]) === s.charAt(i + z[i])) {
            z[i]++;
        }

        // Update the [l, r] window if extended
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }

    return z;
}

// Driver Code
const z = zFunction("aabxaab");
console.log(z.join(" "));
Output
0 1 0 0 3 1 0
Time Complexity: O(n)
Auxiliary Space: O(n)
Why This Works in Linear Time
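The key to the linear running time is that every successful character comparison in the extension loop pushes the match past the current right end r of the Z-box, after which r is moved forward to the new end. Each index i also has at most one failed comparison: the one that stops its loop.
Since r only moves forward and never exceeds n - 1, there are at most n successful comparisons and at most n failed ones, so the total work is O(n).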
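One way to see this bound in practice is to instrument the loop. The sketch below is an illustrative variant of the article's zFunction with a hypothetical `comparisons` counter; it returns the number of character comparisons alongside the Z-array. On "aaaaaaaa" it makes 7 comparisons (the naive method needs 28), and on any input the count is at most 2n.

```python
def z_function_counted(s):
    """Linear Z-array construction that also counts character comparisons."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    comparisons = 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n:
            comparisons += 1
            if s[z[i]] != s[i + z[i]]:
                break  # at most one failed comparison per index i
            z[i] += 1  # each success extends the match past the old r
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z, comparisons


print(z_function_counted("a" * 8))  # ([0, 7, 6, 5, 4, 3, 2, 1], 7)
```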
How Z-array Helps in Pattern Matching
Given two strings, text and pattern, consisting of lowercase English letters, find all 0-based starting indices at which pattern occurs as a substring of text.
The key idea is to build a new string by concatenating the pattern and the text, separated by a special delimiter (e.g., $) that doesn't appear in either string. The delimiter stops any match from running past the end of the pattern, so no Z-value in the text part can exceed the pattern length.
We construct a new string as:
s = pattern + '$' + text
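The separator matters because, without it, the prefix being compared against is the pattern followed immediately by the text, so a Z-value at a text position can run beyond m and an exact Z[i] == m test misses that match. The sketch below uses a hypothetical `z_array` helper that mirrors the article's zFunction to compare both constructions for pattern "aa" and text "aaa"; it assumes '$' occurs in neither string.

```python
def z_array(s):
    """Z-array (same steps as the article's zFunction)."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


pattern, text = "aa", "aaa"
m = len(pattern)

with_sep = z_array(pattern + "$" + text)   # text starts at index m + 1
without_sep = z_array(pattern + text)      # text starts at index m

hits_with = [i - m - 1 for i in range(m + 1, len(with_sep)) if with_sep[i] == m]
hits_without = [i - m for i in range(m, len(without_sep)) if without_sep[i] == m]
print(hits_with)     # [0, 1]
print(hits_without)  # [1]  (the match at text index 0 has Z = 3 > m and is missed)
```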
We then compute the Z-array for this combined string.
For each position i after the separator (i > m, where m is the length of the pattern), Z[i] is the length of the longest prefix of the pattern that matches the text starting at text index i - (m + 1).
So, whenever we find a position i such that:
Z[i] == length of pattern
it means the entire pattern matches the text starting at position:
match position = i - (pattern length + 1)
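Applying these formulas to the example used below (pattern "aab", text "aabxaabxaa", so m = 3) gives the following sketch; `z_array` is again an illustrative helper mirroring the article's zFunction. It prints, for each position after the separator, the corresponding text index and its Z-value.

```python
def z_array(s):
    """Z-array (same steps as the article's zFunction)."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


pattern, text = "aab", "aabxaabxaa"
m = len(pattern)
s = pattern + "$" + text
z = z_array(s)

# Text index j sits at combined index j + m + 1, so i - (m + 1) maps back.
for i in range(m + 1, len(s)):
    j = i - (m + 1)
    marker = "  <-- match" if z[i] == m else ""
    print(f"i={i:2d}  text[{j}]={text[j]!r}  Z[i]={z[i]}{marker}")
```

Only i = 4 and i = 8 have Z[i] = 3, which map back to text positions 0 and 4.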
C++
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

// Z-function to compute Z-array
vector<int> zFunction(const string& s) {
    int n = s.length();
    vector<int> z(n);
    int l = 0, r = 0;

    for (int i = 1; i < n; i++) {
        if (i <= r) {
            int k = i - l;

            // Case 2: reuse the previously computed value
            z[i] = min(r - i + 1, z[k]);
        }

        // Try to extend the Z-box beyond r
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }

        // Update the [l, r] window if extended
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }

    return z;
}

// Function to find all occurrences of pattern in text
vector<int> search(const string& text, const string& pattern) {
    string s = pattern + '$' + text;
    vector<int> z = zFunction(s);
    vector<int> pos;
    int m = pattern.size();

    for (int i = m + 1; i < (int)z.size(); i++) {
        if (z[i] == m) {
            // pattern match starts here in text
            pos.push_back(i - m - 1);
        }
    }
    return pos;
}

int main() {
    string text = "aabxaabxaa";
    string pattern = "aab";

    vector<int> matches = search(text, pattern);

    for (int pos : matches)
        cout << pos << " ";

    return 0;
}
Java
import java.util.ArrayList;

public class GfG {

    // Z-function to compute Z-array
    static ArrayList<Integer> zFunction(String s) {
        int n = s.length();
        ArrayList<Integer> z = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            z.add(0);
        }

        int l = 0, r = 0;
        for (int i = 1; i < n; i++) {
            if (i <= r) {
                int k = i - l;

                // Case 2: reuse the previously computed value
                z.set(i, Math.min(r - i + 1, z.get(k)));
            }

            // Try to extend the Z-box beyond r
            while (i + z.get(i) < n && s.charAt(z.get(i)) == s.charAt(i + z.get(i))) {
                z.set(i, z.get(i) + 1);
            }

            // Update the [l, r] window if extended
            if (i + z.get(i) - 1 > r) {
                l = i;
                r = i + z.get(i) - 1;
            }
        }

        return z;
    }

    // Function to find all occurrences of pattern in text
    static ArrayList<Integer> search(String text, String pattern) {
        String s = pattern + '$' + text;
        ArrayList<Integer> z = zFunction(s);
        ArrayList<Integer> pos = new ArrayList<>();
        int m = pattern.length();

        for (int i = m + 1; i < z.size(); i++) {
            if (z.get(i) == m) {
                // pattern match starts here in text
                pos.add(i - m - 1);
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        String text = "aabxaabxaa";
        String pattern = "aab";

        ArrayList<Integer> matches = search(text, pattern);

        for (int pos : matches)
            System.out.print(pos + " ");
    }
}
Python
def zFunction(s):
    n = len(s)
    z = [0] * n
    l, r = 0, 0

    for i in range(1, n):
        if i <= r:
            k = i - l

            # Case 2: reuse the previously computed value
            z[i] = min(r - i + 1, z[k])

        # Try to extend the Z-box beyond r
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1

        # Update the [l, r] window if extended
        if i + z[i] - 1 > r:
            l = i
            r = i + z[i] - 1

    return z


def search(text, pattern):
    s = pattern + '$' + text
    z = zFunction(s)
    pos = []
    m = len(pattern)

    for i in range(m + 1, len(z)):
        if z[i] == m:
            # pattern match starts here in text
            pos.append(i - m - 1)

    return pos


if __name__ == '__main__':
    text = 'aabxaabxaa'
    pattern = 'aab'

    matches = search(text, pattern)

    for pos in matches:
        print(pos, end=' ')
C#
using System;
using System.Collections.Generic;

public class GfG {

    // Z-function to compute Z-array
    static List<int> zFunction(string s) {
        int n = s.Length;
        List<int> z = new List<int>(new int[n]);
        int l = 0, r = 0;

        for (int i = 1; i < n; i++) {
            if (i <= r) {
                int k = i - l;

                // Case 2: reuse the previously computed value
                z[i] = Math.Min(r - i + 1, z[k]);
            }

            // Try to extend the Z-box beyond r
            while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
                z[i]++;
            }

            // Update the [l, r] window if extended
            if (i + z[i] - 1 > r) {
                l = i;
                r = i + z[i] - 1;
            }
        }

        return z;
    }

    // Function to find all occurrences of pattern in text
    static List<int> search(string text, string pattern) {
        string s = pattern + '$' + text;
        List<int> z = zFunction(s);
        List<int> pos = new List<int>();
        int m = pattern.Length;

        for (int i = m + 1; i < z.Count; i++) {
            if (z[i] == m) {
                // pattern match starts here in text
                pos.Add(i - m - 1);
            }
        }
        return pos;
    }

    public static void Main() {
        string text = "aabxaabxaa";
        string pattern = "aab";

        List<int> matches = search(text, pattern);

        foreach (int pos in matches)
            Console.Write(pos + " ");
    }
}
JavaScript
function zFunction(s) {
    let n = s.length;
    let z = new Array(n).fill(0);
    let l = 0, r = 0;

    for (let i = 1; i < n; i++) {
        if (i <= r) {
            let k = i - l;

            // Case 2: reuse the previously computed value
            z[i] = Math.min(r - i + 1, z[k]);
        }

        // Try to extend the Z-box beyond r
        while (i + z[i] < n && s[z[i]] === s[i + z[i]]) {
            z[i]++;
        }

        // Update the [l, r] window if extended
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1;
        }
    }

    return z;
}

function search(text, pattern) {
    let s = pattern + '$' + text;
    let z = zFunction(s);
    let pos = [];
    let m = pattern.length;

    for (let i = m + 1; i < z.length; i++) {
        if (z[i] === m) {
            // pattern match starts here in text
            pos.push(i - m - 1);
        }
    }
    return pos;
}

// Driver Code
let text = 'aabxaabxaa';
let pattern = 'aab';

let matches = search(text, pattern);
console.log(matches.join(" "));
Output
0 4
Time Complexity: O(n + m), where n is the length of the text and m is the length of the pattern, since the combined string and its Z-array are processed linearly.
Auxiliary Space: O(n + m), used to store the combined string and the Z-array.
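A quick way to gain confidence in the search routine is to cross-check it against the naive method on random inputs. The sketch below is an illustrative test harness, not part of the original; `z_array`, `z_search` and `naive_search` are hypothetical names, and it assumes a two-letter alphabet {a, b}, a fixed seed, and a non-empty pattern.

```python
import random


def z_array(s):
    """Z-array (same steps as the article's zFunction)."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


def z_search(text, pattern):
    """Z-based search, as described in the article."""
    s = pattern + "$" + text
    z = z_array(s)
    m = len(pattern)
    return [i - m - 1 for i in range(m + 1, len(s)) if z[i] == m]


def naive_search(text, pattern):
    """Reference answer: slice comparison at every starting index."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


random.seed(0)
for _ in range(1000):
    text = "".join(random.choice("ab") for _ in range(random.randint(0, 30)))
    pattern = "".join(random.choice("ab") for _ in range(random.randint(1, 5)))
    assert z_search(text, pattern) == naive_search(text, pattern)
print("all random tests passed")
```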
Advantages of Z-Algorithm
Linear Time Complexity for pattern matching.
Uses prefix comparison, avoiding re-evaluation of matched characters.
Easier to code than KMP; works directly with prefix matches.
Useful for preprocessing in multiple string problems beyond pattern matching.
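As one example of such preprocessing, the Z-array gives the smallest period of a string: the smallest p ≥ 1 for which the suffix starting at p equals a prefix of s, i.e. p + Z[p] = n (or p = n if no such p exists). The sketch below is an illustration chosen here, not taken from the original; `smallest_period` and `z_array` are hypothetical names.

```python
def z_array(s):
    """Z-array (same steps as the article's zFunction)."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


def smallest_period(s):
    """Smallest p >= 1 such that s[i] == s[i + p] for every valid i."""
    n = len(s)
    z = z_array(s)
    for p in range(1, n):
        if p + z[p] == n:  # the suffix starting at p equals a prefix of s
            return p
    return n  # no shorter period: the whole string is the period


print(smallest_period("abcabcab"))  # 3
```

If the period p also divides n, the string is its first p characters repeated n / p times, e.g. "abab" = "ab" repeated twice.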
Real-Life Applications
Search Tools in Text Editors (e.g., VS Code, Sublime)
Plagiarism Detection Systems (detect repeated blocks)