Algorithm Implementation/Strings/Longest common substring

Common dynamic programming implementations of the longest common substring algorithm run in O(nm) time. These implementations also use O(nm) storage. The astute reader will notice that only the previous column of the grid storing the dynamic state is ever used in computing the next column. Thus, these algorithms can be altered to require only O(n) storage. By swapping references between two 1D arrays, this can be done without copying the state data from one array to another. In the listings on this page that keep the full grid, this optimization is left as an exercise to the reader.
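As a rough illustration of the reduced-storage idea described above, the following TypeScript sketch keeps only two 1D arrays and swaps their references after each pass. The function name lcsLengthTwoRows is hypothetical and not taken from any listing on this page; the sketch also swaps the inputs so that the arrays are sized by the shorter string, which is an assumption of convenience rather than a requirement.

```typescript
// Minimal sketch: length of the longest common substring using two
// rolling arrays instead of the full O(nm) grid.
// The name lcsLengthTwoRows is hypothetical.
function lcsLengthTwoRows(a: string, b: string): number {
  // Size the arrays by the shorter string to keep storage small.
  if (a.length < b.length) {
    [a, b] = [b, a];
  }
  let prev: number[] = new Array<number>(b.length + 1).fill(0);
  let curr: number[] = new Array<number>(b.length + 1).fill(0);
  let best = 0;

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      if (a.charCodeAt(i - 1) === b.charCodeAt(j - 1)) {
        // Extend the common run that ended at (i - 1, j - 1).
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > best) best = curr[j];
      } else {
        curr[j] = 0;
      }
    }
    // Swap the references; no state is copied between the arrays.
    [prev, curr] = [curr, prev];
  }
  return best;
}
```

Index 0 of both arrays is never written, so it stays 0 and acts as the boundary column after every swap.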

For large n, faster algorithms based on rolling hashes exist that run in O(n log n) time and require O(n log n) storage.
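One common way to build such an algorithm is to binary-search on the answer length and use a polynomial rolling hash to test whether any substring of that length occurs in both strings. The sketch below follows that approach; it is not necessarily the specific algorithm referred to above. All names (BASE, MOD, windowHashes, commonOfLength, longestCommonSubstringHashed) and the choice of base and modulus are assumptions made for illustration. Hash matches are confirmed by direct comparison, so collisions cannot produce a wrong answer.

```typescript
// Assumed hash parameters; any reasonable base and large prime modulus would do.
const BASE = 257;
const MOD = 1000000007;

// hashes[i] is the rolling hash of s.substring(i, i + len). Requires 1 <= len.
function windowHashes(s: string, len: number): number[] {
  const hashes: number[] = [];
  let high = 1; // BASE^(len - 1) mod MOD, weight of the leading character
  for (let k = 1; k < len; k++) high = (high * BASE) % MOD;
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    if (i >= len) {
      // Drop the character that slides out of the window.
      h = (h + MOD - (s.charCodeAt(i - len) * high) % MOD) % MOD;
    }
    h = (h * BASE + s.charCodeAt(i)) % MOD;
    if (i >= len - 1) hashes.push(h);
  }
  return hashes;
}

// Returns the start index in a of a common substring of length len, or -1.
function commonOfLength(a: string, b: string, len: number): number {
  const hashesA = windowHashes(a, len);
  const buckets = new Map<number, number[]>();
  for (let i = 0; i < hashesA.length; i++) {
    const bucket = buckets.get(hashesA[i]);
    if (bucket) bucket.push(i);
    else buckets.set(hashesA[i], [i]);
  }
  const hashesB = windowHashes(b, len);
  for (let j = 0; j < hashesB.length; j++) {
    const candidates = buckets.get(hashesB[j]);
    if (!candidates) continue;
    const piece = b.substring(j, j + len);
    for (let c = 0; c < candidates.length; c++) {
      // Verify the match to rule out hash collisions.
      if (a.startsWith(piece, candidates[c])) return candidates[c];
    }
  }
  return -1;
}

// Binary search on the length: if a common substring of length L exists,
// one of every shorter length exists too.
function longestCommonSubstringHashed(a: string, b: string): string {
  let lo = 0;
  let hi = Math.min(a.length, b.length);
  let bestStart = 0;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2); // upper midpoint so the loop terminates
    const start = commonOfLength(a, b, mid);
    if (start >= 0) {
      lo = mid;
      bestStart = start;
    } else {
      hi = mid - 1;
    }
  }
  return a.substring(bestStart, bestStart + lo);
}
```

The running time quoted for this family of algorithms is an expected bound; an adversarial input that forces many hash collisions would make the verification step slower.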

Suffix trees can be used to achieve an O(n+m) run time at the cost of extra storage and algorithmic complexity.

Length of Longest Substring
Given two non-empty strings as parameters, this method will return the length of the longest substring common to both parameters. A variant, below, returns the actual string.

Retrieve the Longest Substring
This example uses the out keyword to pass in a string reference which the method will set to a string containing the longest common substring.

The extra complexity in this method keeps the number of new String objects created to a minimum. This matters in C# because strings are immutable: every time a string field is reassigned, the old string remains in memory until the garbage collector runs. Some effort was therefore put into keeping the number of new strings low.

The algorithm might be simplified (left as an exercise to the reader) by tracking only the start position of the substring (in, say, str1, or in both str1 and str2) and leaving it to the caller to extract the string using this position and the returned length. Such a variant may prove more useful, too, as the actual locations in the subject strings would be identified.
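A minimal sketch of that simplified variant, written in TypeScript rather than C#, might look like the following. The interface CommonSubstring, the function findLongestCommonSubstring, and the field names are hypothetical; positions are zero-based, and the full O(nm) table is kept for clarity.

```typescript
// Hypothetical result type: where the match starts in each string, and its length.
interface CommonSubstring {
  length: number;
  start1: number; // zero-based start in str1
  start2: number; // zero-based start in str2
}

function findLongestCommonSubstring(str1: string, str2: string): CommonSubstring {
  // table[i][j] = length of the common run ending at str1[i - 1] and str2[j - 1].
  const table: number[][] = [];
  for (let i = 0; i <= str1.length; i++) {
    table.push(new Array<number>(str2.length + 1).fill(0));
  }
  const best: CommonSubstring = { length: 0, start1: 0, start2: 0 };

  for (let i = 1; i <= str1.length; i++) {
    for (let j = 1; j <= str2.length; j++) {
      if (str1.charCodeAt(i - 1) === str2.charCodeAt(j - 1)) {
        const run = table[i - 1][j - 1] + 1;
        table[i][j] = run;
        if (run > best.length) {
          // Record positions only; no intermediate strings are built.
          best.length = run;
          best.start1 = i - run;
          best.start2 = j - run;
        }
      }
    }
  }
  return best;
}
```

The caller extracts the string only if it is needed, for example with str1.substring(r.start1, r.start1 + r.length) for a result r. When several substrings share the maximum length, this sketch reports the first one encountered.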

COBOL
This algorithm uses no extra storage, but it runs in O(mnl) time, where m and n are the lengths of the two strings and l is the length of the longest common substring. The two strings to compare should be placed in WS-TEXT1 and WS-TEXT2, and their lengths placed in WS-LEN1 and WS-LEN2, respectively. The outputs of this routine are MAX-LEN, the length of the longest common substring; WS-LOC1, the location within WS-TEXT1 where it starts; and WS-LOC2, the location within WS-TEXT2 where it starts.
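The same no-extra-storage idea can be sketched in TypeScript as follows. This is not a translation of the COBOL routine; it is an assumed equivalent that tries every pair of start positions and extends the match character by character. The function name bruteForceCommonSubstring is hypothetical, the result fields mirror MAX-LEN, WS-LOC1 and WS-LOC2, and locations here are zero-based (with -1 meaning no common character was found).

```typescript
// Brute-force sketch: constant extra storage, O(m * n * l) time.
function bruteForceCommonSubstring(
  text1: string,
  text2: string
): { maxLen: number; loc1: number; loc2: number } {
  let maxLen = 0;
  let loc1 = -1;
  let loc2 = -1;

  for (let i = 0; i < text1.length; i++) {
    for (let j = 0; j < text2.length; j++) {
      // Extend the match starting at (i, j) for as long as characters agree.
      let k = 0;
      while (
        i + k < text1.length &&
        j + k < text2.length &&
        text1.charCodeAt(i + k) === text2.charCodeAt(j + k)
      ) {
        k++;
      }
      if (k > maxLen) {
        maxLen = k;
        loc1 = i;
        loc2 = j;
      }
    }
  }
  return { maxLen, loc1, loc2 };
}
```

The inner while loop never runs longer than the longest common substring, which is where the factor l in the running time comes from.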

Java
Java adaptation of the C# code for retrieving the longest substring

JavaScript
Variant to return the longest common substring and offset along with the length

TypeScript
Brute force as per the other algorithms on this page, but storage is O(n) (two arrays of length n) rather than the O(mn) storage that the other implementations require