Longest Increasing Subsequence

Longest K Sequential Increasing Subsequences

This answer is modified from my answer to a similar question at Computer Science Stackexchange.

The LIS problem with at most k exceptions admits a O(n log² n) algorithm using Lagrangian relaxation. When k is larger than log n this improves asymptotically on the O(nk log n) DP, which we will also briefly explain.

Let DP[a][b] denote the length of the longest increasing subsequence with at most b exceptions (positions where the previous integer is larger than the next one) ending at element b a. This DP is not involved in the algorithm, but defining it makes proving the algorithm easier.

For convenience we will assume that all elements are distinct and that the last element in the array is its maximum. Note that this does not limit us, as we can just add m / 2n to the mth appearance of every number, and append infinity to the array and subtract one from the answer. Let V be the permutation for which 1 <= V[i] <= n is the value of the ith element.

To solve the problem in O(nk log n), we maintain the invariant that DP[a][b] has been calculated for b < j. Loop j from 0 to k, at the jth iteration calculating DP[a][j] for all a. To do this, loop i from 1 to n. We maintain the maximum of DP[x][j-1] over x < i and a prefix maximum data structure that at index i will have DP[x][j] at position V[x] for x < i, and 0 at every other position.

We have DP[i][j] = 1 + max(DP[i'][j], DP[x][j-1]) where we go over i', x < i, V[i'] < V[i]. The prefix maximum of DP[x][j-1] gives us the maximum of terms of the second type, and querying the prefix maximum data structure for prefix [0, V[i]] gives us the maximum of terms of the first type. Then update the prefix maximum and prefix maximum data structure.

Here is a C++ implementation of the algorithm. Note that this implementation does not assume that the last element of the array is its maximum, or that the array contains no duplicates.


#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

// Fenwick tree for prefix maximum queries
class Fenwick {
private:
vector<int> val;
public:
Fenwick(int n) : val(n+1, 0) {}

// Sets value at position i to maximum of its current value and
void inc(int i, int v) {
for (++i; i < val.size(); i += i & -i) val[i] = max(val[i], v);
}

// Calculates prefix maximum up to index i
int get(int i) {
int res = 0;
for (++i; i > 0; i -= i & -i) res = max(res, val[i]);
return res;
}
};

// Binary searches index of v from sorted vector
int bins(const vector<int>& vec, int v) {
int low = 0;
int high = (int)vec.size() - 1;
while(low != high) {
int mid = (low + high) / 2;
if (vec[mid] < v) low = mid + 1;
else high = mid;
}
return low;
}

// Compresses the range of values to [0, m), and returns m
int compress(vector<int>& vec) {
vector<int> ord = vec;
sort(ord.begin(), ord.end());
ord.erase(unique(ord.begin(), ord.end()), ord.end());
for (int& v : vec) v = bins(ord, v);
return ord.size();
}

// Returns length of longest strictly increasing subsequence with at most k exceptions
int lisExc(int k, vector<int> vec) {
int n = vec.size();
int m = compress(vec);
vector<int> dp(n, 0);
for (int j = 0;; ++j) {
Fenwick fenw(m+1); // longest subsequence with at most j exceptions ending at this value
int max_exc = 0; // longest subsequence with at most j-1 exceptions ending before this
for (int i = 0; i < n; ++i) {
int off = 1 + max(max_exc, fenw.get(vec[i]));
max_exc = max(max_exc, dp[i]);

dp[i] = off;
fenw.inc(vec[i]+1, off);
}
if (j == k) return fenw.get(m);
}
}

int main() {
int n, k;
cin >> n >> k;

vector<int> vec(n);
for (int i = 0; i < n; ++i) cin >> vec[i];

int res = lisExc(k, vec);
cout << res << '\n';
}

Now we will return to the O(n log² n) algorithm. Select some integer 0 <= r <= n. Define DP'[a][r] = max(DP[a][b] - rb), where the maximum is taken over b, MAXB[a][r] as the maximum b such that DP'[a][r] = DP[a][b] - rb, and MINB[a][r] similarly as the minimum such b. We will show that DP[a][k] = DP'[a][r] + rk if and only if MINB[a][r] <= k <= MAXB[a][r]. Further, we will show that for any k exists an r for which this inequality holds.

Note that MINB[a][r] >= MINB[a][r'] and MAXB[a][r] >= MAXB[a][r'] if r < r', hence if we assume the two claimed results, we can do binary search for the r, trying O(log n) values. Hence we achieve complexity O(n log² n) if we can calculate DP', MINB and MAXB in O(n log n) time.

To do this, we will need a segment tree that stores tuples P[i] = (v_i, low_i, high_i), and supports the following operations:

  1. Given a range [a, b], find the maximum value in that range (maximum v_i, a <= i <= b), and the minimum low and maximum high paired with that value in the range.

  2. Set the value of the tuple P[i]

This is easy to implement with complexity O(log n) time per operation assuming some familiarity with segment trees. You can refer to the implementation of the algorithm below for details.

We will now show how to compute DP', MINB and MAXB in O(n log n). Fix r. Build the segment tree initially containing n+1 null values (-INF, INF, -INF). We maintain that P[V[j]] = (DP'[j], MINB[j], MAXB[j]) for j less than the current position i. Set DP'[0] = 0, MINB[0] = 0 and MAXB[0] to 0 if r > 0, otherwise to INF and P[0] = (DP'[0], MINB[0], MAXB[0]).

Loop i from 1 to n. There are two types of subsequences ending at i: those where the previous element is greater than V[i], and those where it is less than V[i]. To account for the second kind, query the segment tree in the range [0, V[i]]. Let the result be (v_1, low_1, high_1). Set off1 = (v_1 + 1, low_1, high_1). For the first kind, query the segment tree in the range [V[i], n]. Let the result be (v_2, low_2, high_2). Set off2 = (v_2 + 1 - r, low_2 + 1, high_2 + 1), where we incur the penalty of r for creating an exception.

Then we combine off1 and off2 into off. If off1.v > off2.v set off = off1, and if off2.v > off1.v set off = off2. Otherwise, set off = (off1.v, min(off1.low, off2.low), max(off1.high, off2.high)). Then set DP'[i] = off.v, MINB[i] = off.low, MAXB[i] = off.high and P[i] = off.

Since we make two segment tree queries at every i, this takes O(n log n) time in total. It is easy to prove by induction that we compute the correct values DP', MINB and MAXB.

So in short, the algorithm is:

  1. Preprocess, modifying values so that they form a permutation, and the last value is the largest value.

  2. Binary search for the correct r, with initial bounds 0 <= r <= n

  3. Initialise the segment tree with null values, set DP'[0], MINB[0] and MAXB[0].

  4. Loop from i = 1 to n, at step i

    • Querying ranges [0, V[i]] and [V[i], n] of the segment tree,
    • calculating DP'[i], MINB[i] and MAXB[i] based on those queries, and
    • setting the value at position V[i] in the segment tree to the tuple (DP'[i], MINB[i], MAXB[i]).
  5. If MINB[n][r] <= k <= MAXB[n][r], return DP'[n][r] + kr - 1.

  6. Otherwise, if MAXB[n][r] < k, the correct r is less than the current r. If MINB[n][r] > k, the correct r is greater than the current r. Update the bounds on r and return to step 1.

Here is a C++ implementation for this algorithm. It also finds the optimal subsequence.

    #include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
using ll = long long;
const int INF = 2 * (int)1e9;

pair<ll, pair<int, int>> combine(pair<ll, pair<int, int>> le, pair<ll, pair<int, int>> ri) {
if (le.first < ri.first) swap(le, ri);
if (ri.first == le.first) {
le.second.first = min(le.second.first, ri.second.first);
le.second.second = max(le.second.second, ri.second.second);
}
return le;
}

// Specialised range maximum segment tree
class SegTree {
private:
vector<pair<ll, pair<int, int>>> seg;
int h = 1;

pair<ll, pair<int, int>> recGet(int a, int b, int i, int le, int ri) const {
if (ri <= a || b <= le) return {-INF, {INF, -INF}};
else if (a <= le && ri <= b) return seg[i];
else return combine(recGet(a, b, 2*i, le, (le+ri)/2), recGet(a, b, 2*i+1, (le+ri)/2, ri));
}
public:
SegTree(int n) {
while(h < n) h *= 2;
seg.resize(2*h, {-INF, {INF, -INF}});
}
void set(int i, pair<ll, pair<int, int>> off) {
seg[i+h] = combine(seg[i+h], off);
for (i += h; i > 1; i /= 2) seg[i/2] = combine(seg[i], seg[i^1]);
}
pair<ll, pair<int, int>> get(int a, int b) const {
return recGet(a, b+1, 1, 0, h);
}
};

// Binary searches index of v from sorted vector
int bins(const vector<int>& vec, int v) {
int low = 0;
int high = (int)vec.size() - 1;
while(low != high) {
int mid = (low + high) / 2;
if (vec[mid] < v) low = mid + 1;
else high = mid;
}
return low;
}

// Finds longest strictly increasing subsequence with at most k exceptions in O(n log^2 n)
vector<int> lisExc(int k, vector<int> vec) {
// Compress values
vector<int> ord = vec;
sort(ord.begin(), ord.end());
ord.erase(unique(ord.begin(), ord.end()), ord.end());
for (auto& v : vec) v = bins(ord, v) + 1;

// Binary search lambda
int n = vec.size();
int m = ord.size() + 1;
int lambda_0 = 0;
int lambda_1 = n;
while(true) {
int lambda = (lambda_0 + lambda_1) / 2;
SegTree seg(m);
if (lambda > 0) seg.set(0, {0, {0, 0}});
else seg.set(0, {0, {0, INF}});

// Calculate DP
vector<pair<ll, pair<int, int>>> dp(n);
for (int i = 0; i < n; ++i) {
auto off0 = seg.get(0, vec[i]-1); // previous < this
off0.first += 1;

auto off1 = seg.get(vec[i], m-1); // previous >= this
off1.first += 1 - lambda;
off1.second.first += 1;
off1.second.second += 1;

dp[i] = combine(off0, off1);
seg.set(vec[i], dp[i]);
}

// Is min_b <= k <= max_b?
auto off = seg.get(0, m-1);
if (off.second.second < k) {
lambda_1 = lambda - 1;
} else if (off.second.first > k) {
lambda_0 = lambda + 1;
} else {
// Construct solution
ll r = off.first + 1;
int v = m;
int b = k;
vector<int> res;
for (int i = n-1; i >= 0; --i) {
if (vec[i] < v) {
if (r == dp[i].first + 1 && dp[i].second.first <= b && b <= dp[i].second.second) {
res.push_back(i);
r -= 1;
v = vec[i];
}
} else {
if (r == dp[i].first + 1 - lambda && dp[i].second.first <= b-1 && b-1 <= dp[i].second.second) {
res.push_back(i);
r -= 1 - lambda;
v = vec[i];
--b;
}
}
}
reverse(res.begin(), res.end());
return res;
}
}
}

int main() {
int n, k;
cin >> n >> k;

vector<int> vec(n);
for (int i = 0; i < n; ++i) cin >> vec[i];

vector<int> ans = lisExc(k, vec);
for (auto i : ans) cout << i+1 << ' ';
cout << '\n';
}

We will now prove the two claims. We wish to prove that

  1. DP'[a][r] = DP[a][b] - rb if and only if MINB[a][r] <= b <= MAXB[a][r]

  2. For all a, k there exists an integer r, 0 <= r <= n, such that MINB[a][r] <= k <= MAXB[a][r]

Both of these follow from the concavity of the problem. Concavity means that DP[a][k+2] - DP[a][k+1] <= DP[a][k+1] - DP[a][k] for all a, k. This is intuitive: the more exceptions we are allowed to make, the less allowing one more helps us.

Fix a and r. Set f(b) = DP[a][b] - rb, and d(b) = f(b+1) - f(b). We have d(k+1) <= d(k) from the concavity of the problem. Assume x < y and f(x) = f(y) >= f(i) for all i. Hence d(x) <= 0, thus d(i) <= 0 for i in [x, y). But f(y) = f(x) + d(x) + d(x + 1) + ... + d(y - 1), hence d(i) = 0 for i in [x, y). Hence f(y) = f(x) = f(i) for i in [x, y]. This proves the first claim.

To prove the second, set r = DP[a][k+1] - DP[a][k] and define f, d as previously. Then d(k) = 0, hence d(i) >= 0 for i < k and d(i) <= 0 for i > k, hence f(k) is maximal as desired.

Proving concavity is more difficult. For a proof, see my answer at cs.stackexchange.

Longest Increasing subsequence length in NlogN.[Understanding the Algo]

tail[i] is the minimal end value of the increasing subsequence (IS) of length i+1.

That's why tail[0] is the 'smallest value' and why we can increase the value of LIS (length++) when the current value is bigger than end value of the current longest sequence.

Let's assume that your example is the starting values of the input:

input = 2, 5, 3, 7, 11, 8, 10, 13, 6, ...

After 9 steps of our algorithm tail looks like this:

tail = 2, 3, 6, 8, 10, 13, ...

What does tail[2] means? It means that the best IS of length 3 ends with tail[2]. And we could build an IS of length 4 expanding it with the number that is bigger than tail[2].

tail[0] = 2, IS length = 1: 2, 5, 3, 7, 11, 8, 10, 13, 6

tail[1] = 3, IS length = 2: 2, 5, 3, 7, 11, 8, 10, 13, 6

tail[2] = 6, IS length = 3: 2, 5, 3, 7, 11, 8, 10, 13, 6
tail[3] = 8, IS length = 4: 2, 5, 3, 7, 11, 8, 10, 13, 6

tail[4] = 10,IS length = 5: 2, 5, 3, 7, 11, 8, 10, 13, 6

tail[5] = 13,IS length = 6: 2, 5, 3, 7, 11, 8, 10, 13, 6

This presentation allows you to use binary search (note that defined part of tail is always sorted) to update tail and to find the result at the end of the algorithm.



Related Topics



Leave a reply



Submit