Sunday, May 31, 2009

Suffix Trees: Java Code as Separate Classes

In this post - Suffix Trees: Java Code as Separate Classes.
Download java source code.

Please refer previous posts about Suffix Trees:
Class diagram:

contains and indexOf functions:
public boolean contains(String str) {
    return indexOf(str) >= 0;
}
   public int indexOf(String str) {
    if (str.length() == 0)
        return -1;
   
    int index = -1;
    Node node = root;
   
    int i = 0;
    while (i<str.length()) {
        if ((node == null) || (i == text.length()))
            return -1;
   
        Edge edge = node.findEdge(str.charAt(i));
        if (edge == null)
            return -1;
   
        index = edge.getBeginIndex()-i;
        i++;
   
        for(int j=edge.getBeginIndex()+1; j<=edge.getEndIndex(); j++) {
            if (i == str.length())
                break;
            if (text.charAt(j) != str.charAt(i))
                return -1;
            i++;
        }
        node = edge.getEndNode();
    }
    return index;
}

Future plans:
  • Generalized Suffix Tree;
  • Longest Common SubString;

More information about Suffix Trees Use Cases you can find in:

Friday, May 29, 2009

Stored Procedures in MySQL 5.x: log / trace messages

There is no standard way to log / trace message in MySQL.
But there is a trick - execute SQL queries like next:
SELECT 'log message' AS 'title';

So to have Log Message feature we need next stored procedures:
  • procedure to execute dynamic SQL statements
  • procedure to create SELECT SQL query and simulate log output

See the code below:

DELIMITER $$


-- PROCEDURE pLog
-- outputs log message
-- Params:
-- sTitle - title
-- sMsg - log message
DROP PROCEDURE IF EXISTS `pLog` $$
CREATE PROCEDURE `pLog`(IN sTitle VARCHAR(255), IN sMsg VARCHAR(255))
BEGIN
DECLARE strSQL VARCHAR(512);

SET strSQL = CONCAT('SELECT ''', sMsg, ''' AS ''', sTitle, '''');
CALL pExecuteImmediate(strSQL);
END $$


-- PROCEDURE pExecuteImmediate
-- executes dynamic SQL statement
-- Params:
-- tSQLStmt - SQL statement to be executed
DROP PROCEDURE IF EXISTS `pExecuteImmediate` $$
CREATE PROCEDURE `pExecuteImmediate`(IN tSQLStmt TEXT)
BEGIN
SET @executeImmediateSQL = tSQLStmt;
PREPARE executeImmediateSTML FROM @executeImmediateSQL;
EXECUTE executeImmediateSTML;
DEALLOCATE PREPARE executeImmediateSTML;
END $$


DELIMITER ;

Function pCheckCurrentUserRoot contains examples of using pLog

DELIMITER $$


-- FUNCTION pCheckCurrentUserRoot
-- checks if we are connected as root user
DROP PROCEDURE IF EXISTS `pCheckCurrentUserRoot` $$
CREATE PROCEDURE `pCheckCurrentUserRoot`()
BEGIN
DECLARE strUser VARCHAR(50);

CALL pLog('INFO', 'pCheckCurrentUserRoot...');

SELECT USER() INTO strUser;
CALL pLog('INFO', CONCAT('user: ', strUser));

IF (strUser LIKE 'root%') THEN
CALL pLog('INFO', 'current user is root');
ELSE
CALL pLog('INFO', 'current user is not root');
END IF;
END $$


DELIMITER ;

Additional information about Stored Procedures in MySQL 5.x could be found here:

Sunday, May 24, 2009

Suffix Trees: Refactored Java Code

In this post I will try to describe refactored java code.
Please refer previous post about Suffix Trees - Suffix Trees: Java Ukkonen's Algorithm

Tree structure:

Tree consists of:
  • nodes;
  • edges.
Edge has next data:
  • reference to start node;
  • reference to end node;
  • sub string begin index;
  • sub string end index.
Main methods:
  • split edge by a Suffix and return a new Node

Node
has next data:
  • reference to suffix node;
  • hash map of edge references;
  • name.
Main methods:
  • add Edge into hash map
  • remove Edge from hash map
  • find Edge in hash map by character

Suffix
is important for building tree. It has next data:
  • reference to origin node;
  • sub string begin index;
  • sub string end index.
Main methods:
  • canonize

All classes are wrapped by SuffixTree class that creates a tree. It has:
  • input text;
  • root Node.

Class diagram:



Future plans:
  • suffix trees use cases
  • further refactoring

Download Suffix Trees Refactored Java Code.
Several test runs are listed below:

book
Edge  Start   End   Suf   First Last  String
  1     0     1     -1    0     3     book
  3     0     3     0     1     1     o
  5     0     5     -1    3     3     k
  2     3     2     -1    2     3     ok
  4     3     4     -1    3     3     k

bookke
Edge  Start   End   Suf   First Last  String
  8     0     8     -1    5     5     e
  1     0     1     -1    0     5     bookke
  3     0     3     0     1     1     o
  6     0     6     0     3     3     k
  2     3     2     -1    2     5     okke
  4     3     4     -1    3     5     kke
  7     6     7     -1    5     5     e
  5     6     5     -1    4     5     ke

cacao
Edge  Start   End   Suf   First Last  String
  3     0     3     5     0     1     ca
  5     0     5     0     1     1     a
  7     0     7     -1    4     4     o
  1     3     1     -1    2     4     cao
  4     3     4     -1    4     4     o
  2     5     2     -1    2     4     cao
  6     5     6     -1    4     4     o

googol
Edge  Start   End   Suf   First Last  String
  5     0     5     3     0     1     go
  3     0     3     0     1     1     o
  8     0     8     -1    5     5     l
  1     5     1     -1    2     5     ogol
  6     5     6     -1    5     5     l
  4     3     4     -1    3     5     gol
  2     3     2     -1    2     5     ogol
  7     3     7     -1    5     5     l

abababc
Edge  Start   End   Suf   First Last  String
  9     0     9     0     1     1     b
  11    0     11    -1    6     6     c
  7     0     7     9     0     1     ab
  10    9     10    -1    6     6     c
  5     9     5     7     2     3     ab
  8     7     8     -1    6     6     c
  3     7     3     5     2     3     ab
  6     5     6     -1    6     6     c
  2     5     2     -1    4     6     abc
  4     3     4     -1    6     6     c
  1     3     1     -1    4     6     abc