Tuesday, April 27, 2010

Dynamic Programming: Longest Common Substring

The idea how to solve Longest Common Substring problem by Dynamic Programming could be found here:
My implementation is published below.

Next ideas additionally could be used to reduce the memory usage:
  • Keep only the last and current row
  • Store only non-zero values in the rows (hash tables)

/**
* Longest Common Substring.<BR>
* <pre>
* | ls(i-1,j-1) + 1, if a[i]=b[j]
* ls(i,j) = |
* | 0, else
* Boundary conditions: let think that ls[i][-1] = 0, ls[-1][j] = 0
* </pre>
*/

public class LongestCommonSubstring {
public LongestCommonSubstring(String a, String b) {
this.a = a;
this.b = b;

initDp();
}

// strings a, b
private String a;
private String b;

/**
* ls(i,j) - maximum length of common strings that end at a[i] and b[j].
*/

private int ls[][];

// max ls - stores max length
private int maxLs;
private int maxI;
private int maxJ;

private void updateMaxLs(int ls, int i, int j) {
if (maxLs < ls) {
maxLs = ls;
maxI = i;
maxJ = j;
}
}

private void initMaxLs() {
maxLs = 0;
maxI = -1;
maxJ = -1;
}

private void initDp() {
initMaxLs();

// init with 0 by default
ls = new int[a.length() + 1][b.length() + 1];

// so this could be skipped
for (int i = 0; i <= a.length(); i++)
ls[i][0] = 0;
for (int j = 0; j <= b.length(); j++)
ls[0][j] = 0;

for (int i = 1; i <= a.length(); i++) {
for (int j = 1; j <= b.length(); j++) {
ls[i][j] = (a.charAt(i - 1) != b.charAt(j - 1)) ? 0 : ls[i - 1][j - 1] + 1;
updateMaxLs(ls[i][j], i, j);
}
}
}

public void printLs() {
System.out.printf("%1$c ", ' ');
for (int j = 0; j <= b.length(); j++)
System.out.printf("%1$c ", (j == 0) ? ' ' : b.charAt(j - 1));
System.out.printf("\n");

for (int i = 0; i <= a.length(); i++) {
System.out.printf("%1$c ", (i == 0) ? ' ' : a.charAt(i - 1));

for (int j = 0; j <= b.length(); j++) {
System.out.printf("%1$d ", ls[i][j]);
}
System.out.printf("\n");
}
}

public String getLcs() {
int size = ls[maxI][maxJ];
char sb[] = new char[size];

for(int l=0; l<size; l++)
sb[l] = a.charAt(maxI-size+l);
return new String(sb);
}

public static void main(String[] args) {
LongestCommonSubstring lcs = new LongestCommonSubstring("hello", "aloha");
lcs.printLs();
System.out.printf("LCS=%1$s\n\n", lcs.getLcs());

lcs = new LongestCommonSubstring("baba", "abba");
lcs.printLs();
System.out.printf("LCS=%1$s\n\n", lcs.getLcs());

lcs = new LongestCommonSubstring("Quick Search algorithm", "Tuned Boyer-Moore algorithm");
lcs.printLs();
System.out.printf("LCS=%1$s\n\n", lcs.getLcs());
}
}

Program output could be found below:
    a l o h a 
0 0 0 0 0 0
h 0 0 0 0 1 0
e 0 0 0 0 0 0
l 0 0 1 0 0 0
l 0 0 1 0 0 0
o 0 0 0 2 0 0
LCS=lo

a b b a
0 0 0 0 0
b 0 0 1 1 0
a 0 1 0 0 2
b 0 0 2 1 0
a 0 1 0 0 2
LCS=ba

T u n e d B o y e r - M o o r e a l g o r i t h m
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
r 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0
l 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0
g 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0
o 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 5 0 0 0 0 0
r 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 2 0 0 0 0 0 0 6 0 0 0 0
i 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0
t 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0
h 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0
m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10
LCS= algorithm

Monday, April 26, 2010

Dynamic Programming

Introductions to the Dynamic Programming (DP) could be found here:
Main characteristics of DP are:
To solve DP problem we need to do next steps:
  • define what we are calculating;
  • define recursion formula;
  • define boundary condition;
  • define calculation order.
DP problems are often used in Programming Contests

Sunday, April 11, 2010

Google Code Jam: Egg Drop Solution

The problem definition could be found at Code Jam web site.

Dynamic programming:
  1. The main idea is to understand dependency between current drop and next/prev drops.
  2. Let F(D, B) - function returning Fmax number of floors in a building when Solvable(F, D, B) is true.
  3. Let do a drop. We know the result for this drop: egg has been dropped or egg has not been dropped. So we covered +1 floor. Now we have D-1 drops left.
  4. If egg has not been dropped then we still have B breaks left and for all floors from 1 to current egg will not be dropped as well. To get value of Fmax we only need to estimate how many floors upstairs we could cover to have Solvable(F, D, B) as true. We could think that we are on the ground and required value could be evaluated by F(D-1, B).
  5. If egg has been dropped then we have B-1 breaks left. The situation below current floor is still unknown but to estimate how many floors downstairs we could cover and have Solvable(F, D, B) as true we could use F(D-1, B-1).
  6. Taking all above into an account we have F(D, B) = 1 + F(D-1, B) + F(D-1, B-1)
Implementation:
  1. Next facts could help to have efficient implementation.
  2. F(D, 1) = D - because to have Solvable(F, D, B) = true we should start from 1-st floor and continue dropping from next floor one by one.
  3. F(1, B) = 1 - because to have Solvable(F, D, B) = true we should start from 1-st floor and no more drops left after first attempt.
  4. F(x, x) = 2^x -1 - because to have Solvable(F, D, B) = true and to have optimal solution we should use divide and conquer approach
  5. F(x, y) = F(x, x), for all y > x - the same as for (4) and more possible breaks don't affect the result.
  6. We need to calculate F(D, B) only for B=1..32, because all other values could be evaluated by (5) or will be -1, because they will be greater then 2^32 (4294967296)
  7. It is enough to have array[1..Z][1..32], where F(Z, 2) > 4294967296 values to have fast, cache-based implementation of F(D, B).
    (I don't know how to calculate Z theoretically, but practically Z=10000 is not enough and Z=100000 is enough)

F(D, B) - Illustration of facts discussed above


My solution is published below.

Main functions:
  • solve - solves the problem
  • initFCache - inits F-cache
  • getF - gets Fmax from F-cache
  • getD - gets Dmin based on F-cache
  • getB - gets Bmin based on F-cache

public class EggDrop {
...

private Scanner scanner;
private PrintWriter writer;

public EggDrop(InputStream is, OutputStream os) {
scanner = new Scanner(is);
writer = new PrintWriter(os);
}
...

private static final long MAX_F_VALUE = 4294967296l;
private static final int LARGE_F_VALUE = -1;
private static final int MAX_B_INDEX = 32;

private static final int F_CACHE_SIZE = 100000;
private long[][] fCache;

/**
* Solve the problem
*/

public void solve() {
fCache = new long[F_CACHE_SIZE][MAX_B_INDEX];
initFCache();

int n = scanner.nextInt();

for (int i = 1; i <= n; i++) {
// int is enough to store all numbers
// see Integer.MAX_VALUE, 2,000,000,000 < 2,147,483,647
int F = scanner.nextInt();
int D = scanner.nextInt();
int B = scanner.nextInt();

writer.print("Case #");
writer.print(i + ": ");

long Fmax = getF(D, B);
int Dmin = getD(F, B, D);
int Bmin = getB(F, D, B);

writer.printf("%1$d %2$d %3$d\n", Fmax, Dmin, Bmin);
}
}

/**
* Get Fmax from F cache
*
* @param d D
* @param b B
* @return Fmax
*/

private long getF(int d, int b) {
if (b > MAX_B_INDEX)
b = MAX_B_INDEX;

if (b == 1)
return d;

if (d > F_CACHE_SIZE)
return -1;

return fCache[d - 1][b - 1];
}

/**
* Get Dmin based on F cache
*
* @param f F
* @param b B
* @param dMax D
* @return Dmin
*/

private int getD(long f, int b, int dMax) {
for (int d = 1; d <= dMax; d++) {
long maxF = getF(d, b);
if ((maxF == LARGE_F_VALUE) || (maxF >= f))
return d;
}
throw new IllegalStateException(String.format("D not found, F=%1$d, B=%2$d, Dmax=%3$d", f, b, dMax));
}

/**
* Get Bmin based on F cache
*
* @param f F
* @param d D
* @param bMax B
* @return Bmin
*/

private int getB(long f, int d, int bMax) {
for (int b = 1; b <= bMax; b++) {
long maxF = getF(d, b);
if ((maxF == LARGE_F_VALUE) || (maxF >= f))
return b;
}
throw new IllegalStateException(String.format("B not found, F=%1$d, D=%2$d, max B=%3$d", f, d, bMax));
}

/**
* Init F cache. DP.<BR>
* F(D, B) = F(D-1, B) + 1 + F(D-1, B-1)<BR>
* if F(D, B) > 4294967296 then F(D, B) = -1
*/

private void initFCache() {
Arrays.fill(fCache[0], 1);

for (int d = 1; d < F_CACHE_SIZE; d++) {
fCache[d][0] = d + 1;
for (int b = 1; b < MAX_B_INDEX; b++) {
if ((fCache[d - 1][b] == LARGE_F_VALUE) || (fCache[d - 1][b - 1] == LARGE_F_VALUE))
fCache[d][b] = LARGE_F_VALUE;
else {
fCache[d][b] = fCache[d - 1][b] + 1 + fCache[d - 1][b - 1];
if (fCache[d][b] >= MAX_F_VALUE)
fCache[d][b] = LARGE_F_VALUE;
}
}
}
}
}

See also other posts in Code Jam