## How to find the longest continuous subsequence whose reverse is also a subsequence

Actually this is the longest common sub*string* problem applied to the sequence and its reverse: http://en.wikipedia.org/wiki/Longest_common_substring_problem

This is distinct from longest common sub*sequence*: http://en.wikipedia.org/wiki/Subsequence#Substring_vs._subsequence

## Longest common sub-sequence with a certain property?

First let's reduce the number of states: Let `f(i, j, d)`

be the length of the longest common zig-zag sequence starting at position i in the first string and position j in the second string and starting with direction d (up or down).

We have the recurrence

`f(i, j, up) >= MAX(i' > i, j' > j : f(i', j', up))`

if s1[i] = s2[j]:

f(i, j, up) >= MAX(i' > i, j' > j, s1[i'] > x : f(i', j', down))

an similar for the `down`

direction. Solving this in a straightforward way

will lead to a runtime of something like O(n^{4} · W) where W is the range of integers in the array. W is not polynomially bounded, so we definitely want to get rid of this factor, and ideally a couple of n factors along the way.

To solve the first part, you have to find the maximum f(i', j', up) with

i' > i and j' > j. This is a standard standard 2-d orthogonal range maximum query.

For the second case, you need to find the maximum (i', j', down) with i' > i, j' > j and s1[i'] > s1[i]. That is a range maximum query in the rectangle (i, ∞) x (j, ∞) x (s1[i], ∞).

Now having 3 dimensions here looks scary. However, if we process the states in say, decreasing order of i, then we can get rid of one dimension.

We thus reduced the problem to a range query in the rectangle (j, ∞) x (s1[i], ∞). Coordinate compression gets the dimension of values down to O(n).

You can use a 2-d data structure such as a range tree or binary-indexed tree to solve both kinds of range queries in O(log^{2} n). The total runtime will be O(n^{2} · log^{2} n).

You can get rid of one log factor using fractional cascading, but that is associated with a high constant factor. The runtime is then only one log-factor short of that for finding the longest common subsequence, which seems like a lower-bound for our problem.

## Longest Common Sub-sequence of N sequences (for diff purposes)

Yes, you're missing the correctness: there is no guarantee that the LCS of a pair of strings will have any overlap whatsoever with the LCS of the set overall. Consider this example:

`aaabb1xyz`

aaabb2xyz

cccdd1xyz

cccdd2xyz

If you pair these in the given order, you'll get LCSs of `aaabb`

and `cccdd`

, missing the `xyz`

for the set.

If, as you say, the strings are almost all identical, perhaps the differences aren't a problem for you. If the not-identical strings are very similar to the "median" string, then your incremental solution will work well enough for your purposes.

Another possibility is to do LCS on random pairs of strings until that median string emerges; then you start from that common point, and you should have a "good enough" solution.

### Related Topics

Is the C# Compiler Smart Enough to Optimize This Code

Task<> Does Not Contain a Definition for 'Getawaiter'

How Can Xml Documentation for Web API Include Documentation from Beyond the Main Project

Private Inner Classes in C# - Why Aren't They Used More Often

Dependency Injection VS Service Location

In C#, How to Cast a List<Child> to List<Parent>

Configurationmanager.Appsettings - How to Modify and Save

Having the Output of a Console Application in Visual Studio Instead of the Console

How to Pass an Object to Httpclient.Postasync and Serialize as a JSON Body

.Net Core 3.0: Razor Views Don't Automatically Recompile on Change

Copying Files into the Application Folder at Compile Time

Linq Order By, Group by and Order by Each Group