2009/06/27

Interpreter for a Domain-Specific Language (DSL)

A long time ago I had the idea of unifying data-centric DSLs and algorithm-centric DSLs (or, if you wish, of creating something that plays well with both algorithms and data). It is possible to unite data and algorithms, but in most cases this is done by embedding either code into markup or unescaped data into program code.

Let's define a term. Data and code mixed together (in any way) we will call spice.

My idea was (and is): there should be a way to implement a DSL for spice such that:
  • There is no semantic difference between data and code.
  • Both are treated equally by the language (no unescaping or similar).
What is the classic way to turn data into instructions (execute code)?
  • The parser validates the textual data and, based on it, generates a (mostly) tree-like data structure.
  • The interpreter "executes" the data structure by traversing it, depending on the instructions it encounters and on some (either global or local) state.
I thought it might be possible either to unify these steps or to skip the first, so that we would work only with data and would not have to parse and validate text. In my opinion, instructions in a tree structure are best represented using XML. Please don't think of <this and="that"/> when you read "XML". Just think of a nicely standardized way to store data in trees, annotated with attributes, with the addition of namespaces.
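
To make the idea concrete, here is a minimal sketch of interpreting an XML tree directly, with no custom parser in between. It is not PineTree and says nothing about PineTree's syntax: the instruction names (add, mul, num) and the class TreeInterpreterSketch are invented for this illustration, and only the JDK's standard DOM API is assumed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical mini-interpreter: element names act as instructions,
// attributes and child elements act as their data.
class TreeInterpreterSketch {

    // Evaluates one element by recursively evaluating its child elements.
    static double eval(Element e) {
        String op = e.getTagName();
        if (op.equals("num")) {
            return Double.parseDouble(e.getAttribute("value"));
        }
        if (!op.equals("add") && !op.equals("mul")) {
            throw new IllegalArgumentException("unknown instruction: " + op);
        }
        double result = op.equals("mul") ? 1.0 : 0.0; // neutral element
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text nodes such as whitespace
            }
            double value = eval((Element) child);
            result = op.equals("add") ? result + value : result * value;
        }
        return result;
    }

    // Loads the tree with the stock XML parser and interprets its root element.
    static double run(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return eval(root);
        } catch (Exception ex) {
            throw new IllegalStateException("could not load the tree", ex);
        }
    }

    public static void main(String[] args) {
        String program =
                "<add><num value=\"2\"/><mul><num value=\"3\"/><num value=\"4\"/></mul></add>";
        System.out.println(run(program)); // prints 14.0, i.e. 2 + (3 * 4)
    }
}
```

The "program" and its "data" are the same kind of node here: a num element is data when it is read and an instruction when it is evaluated.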

So why not just define a way to interpret a tree? That's what I did.

PineTree is the (perhaps only temporary) name for my project.

Think of it as an XML programming language. Stay tuned, I'll blog about this later.

2009/06/24

What is the sense of KDE 4?

I have been using Linux as my primary operating system for over 8 years now.

Since then, I have welcomed every evolutionary step. With one exception: KDE 4.

The maintenance of KDE 3 is terrible, since most developers seem to spend their time on KDE 4. What do I get from it? A shiny new user interface? Yes. More functionality? No, less! More stability? No, less! It seems to me that even Gnome offers more functionality than KDE 4 (and that is saying something).

I lost my faith in KDE. I hope Gnome will become more attractive for power users soon, so I can switch to it.

2009/06/19

Sun's JVM is the best JVM

I've benchmarked the following JVMs:
  • JRockit (Oracle, formerly BEA)
  • IBM JVM
  • Sun JVM (well, also Oracle)
Currently I have no time to provide concrete benchmark results, but one thing can be said already: JRockit and IBM's VM are so bad in terms of memory or performance that I can't understand the reason for their existence.

2009/06/15

Tagged File Search (Java + Desktop)

A few months ago I started to play around with Java on the desktop.

Java is perfect for developing desktop apps
(with a few ugly exceptions, such as no support for global hotkeys).

Being a fan of tagging and linked data structures, I've extracted most classes from my tagging-related web project, and restructured them as a standalone API. Now I'll use this API for the web project and the new desktop app.

(Needless to say that I'm using the latest Netbeans trunk snapshot as my IDE)

What does my new shiny tool do?

It indexes your HDD (this can be limited to certain directories) and lets you specify tags for files and directories. It resides in the system tray, and when you click it, you see a dark rectangular text field which lets you search for tags and partial file names.


Click on the screenshot to get a first impression of what the tool could look like. (Currently, tagging is not supported at all.)

2009/05/02

Tagging

I've done some experiments with tags:
  • custom "types" of tags
  • automatic tagging of nodes according to their textual content
  • implied tag trees (which tags tend to occur together?)
  • ...
Here's a nice screenshot of tags sorted by their usage frequency.



Note: names of tags and nodes are chosen by random concatenation of vowels and consonants ;)

2009/01/16

Custom URIs & new property system

Now I'm using customized URI schemes for identifying resources.

There are two access types: relative and absolute.

For relative URIs the authority is left out:
node:/1234567/1
For absolute URIs the authority is specified:
node://localhost:8000/1234567/1
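
As a small illustration, both forms can be taken apart with the JDK's java.net.URI class, which accepts custom schemes such as node:. This is only a sketch: the class NodeUriSketch and the helper hasAuthority are hypothetical names, and "relative" and "absolute" are used in the sense above (authority left out or specified).

```java
import java.net.URI;

class NodeUriSketch {

    // "Absolute" in the sense used here: the URI names an authority (host and port).
    static boolean hasAuthority(URI uri) {
        return uri.getAuthority() != null;
    }

    public static void main(String[] args) {
        URI relative = URI.create("node:/1234567/1");
        URI absolute = URI.create("node://localhost:8000/1234567/1");

        System.out.println(relative.getScheme());   // node
        System.out.println(hasAuthority(relative)); // false
        System.out.println(relative.getPath());     // /1234567/1

        System.out.println(hasAuthority(absolute)); // true
        System.out.println(absolute.getHost());     // localhost
        System.out.println(absolute.getPort());     // 8000
        System.out.println(absolute.getPath());     // /1234567/1
    }
}
```

Both forms share the same path, so a relative URI can be resolved against whichever server is currently in use.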

Also note the new property system:
It supports basic data types, property lists and URIs (custom and HTTP).

XML Injection Web Framework

What do famous (and not-so-famous) web frameworks have in common? They offer tons of features, no matter if you need them or not. Some features, sadly, limit the frameworks in their extensibility.

I found no web framework which is able to offer most of its basic features in XML-based web markup languages other than XHTML (such as SVG).

I am especially interested in solutions which allow creating content programmatically (as Wicket does). But such solutions are mostly optimized for HTML only.

So I wrote my own web framework ;)

It offers very basic functionality compared with existing frameworks, but has (from my point of view) important advantages:

  • It allows programmatically generated content to be anything you want it to be
  • It injects dynamically generated content into static XML files (so the result can be transformed into anything else using, say, XSLT)
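
The injection idea can be sketched with plain DOM calls. This is not the framework's code: the placeholder element x:content, its namespace urn:example:inject and the class XmlInjectionSketch are all assumptions made up for this example. It replaces each placeholder in a static SVG template with a programmatically created element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlInjectionSketch {
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String INJECT_NS = "urn:example:inject"; // hypothetical placeholder namespace

    // Parses the static template and swaps every placeholder for generated content.
    static Document inject(String templateXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(templateXml)));

            // Copy the hits first: the NodeList may be live and change during replacement.
            NodeList found = doc.getElementsByTagNameNS(INJECT_NS, "content");
            List<Node> placeholders = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                placeholders.add(found.item(i));
            }
            for (Node placeholder : placeholders) {
                Element circle = doc.createElementNS(SVG_NS, "circle"); // the "dynamic" content
                circle.setAttribute("r", "10");
                placeholder.getParentNode().replaceChild(circle, placeholder);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("injection failed", e);
        }
    }

    // Serializes the result; from here it could also be fed into an XSLT transformation.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String template = "<svg xmlns=\"" + SVG_NS + "\" xmlns:x=\"" + INJECT_NS + "\">"
                + "<x:content/></svg>";
        System.out.println(serialize(inject(template)));
    }
}
```

Because nothing here depends on the host markup being HTML, the same approach works for any XML vocabulary.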

I don't plan to publish it separately (yet), but who knows, maybe one day...

2008/11/15

XQuery disappointments - Nightmare of unindexed data...

I am very disappointed with XQuery implementations. There are some data collections in this world (very few!) which have more than 100'000 elements ;) Without indexing the data, searches become slower as the amount of data grows (worse than linear!).

I played around a bit with Saxon, XQJ-EA-RI, Lucene and some of Java's out-of-the-box XML APIs.

Negative notes:
  • The XQuery standard doesn't specify indexing features (WHY?)
  • XQJ-EA-RI does not implement any indexing features (no wonder, when none are specified).
  • I have read about a few commercial apps which do indexing for XQuery. (But assuming that Java's ecosystem is open source, I dare to ignore their existence.)

What I tried was indexing XML with Lucene. Obvious problems:
  • Lucene's search syntax is neither standardized nor powerful. (name:TextGoesHere*)
  • Retrieved data is not XML. It's a bunch of garbage. (Think of a Map<String, String>.) You just have a lot of named fields, each of them having a string value. Wow! That's the middle ages compared to lovely structured and typed XML data! (If I wanted the middle ages, I'd use SQL.)

I also tried plain XQuery (without Lucene):
  • Slow! So slow that to mount a denial-of-service attack on the server one would only have to send 10 queries in parallel ;)
  • XQuery needs either a single pre-aggregated XML document, or some magical fn:collection(...) which points at a directory (oh no!) and reads all XML files from there (magic!).

So I have the choice: either work with indexed (therefore fast) searches which produce data garbage, or work without indexing (therefore slow) and give up as soon as a database grows beyond 100'000 entries.
Because I want my app to be scalable, I decided to work with data garbage rather than limit the allowed number of entries.

The following steps are needed to do so:

Indexing:
  • Index XML data using Lucene (yes, using that futuristic concept of named fields). Index all fields which are relevant for searching. Don't forget to add a field named "ID" to save the XML ID (or some customized ID) of each XML object!

Searching:
  • Run the Lucene query (such as "name:Hello*").
  • This gives you a list of 'documents' (those strange objects which hold all the fields). Ask each document for its field named "ID".
  • Convert each ID to an XML object (note that the XML objects should be stored separately rather than aggregated, so that an object can be retrieved quickly by ID using, let's say, a Map<Long,SomeXmlObj>).
  • Aggregate all resulting XML objects into one XML document. Don't forget to wrap them all in a root element.
  • Process this result with XQuery.
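
The steps above can be sketched with nothing but the JDK. Be aware of the substitutions: a sorted TreeMap with a prefix lookup stands in for the Lucene index, and XPath 1.0 (javax.xml.xpath) stands in for XQuery, so this shows the shape of the pipeline rather than the real Lucene or XQuery APIs. The class IndexedXmlSearchSketch and its methods are hypothetical names, and the names passed to add are assumed not to need XML escaping.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IndexedXmlSearchSketch {
    // ID -> XML object, stored separately so that lookup by ID is cheap.
    private final Map<Long, String> store = new HashMap<>();
    // name -> IDs; a stand-in for the Lucene index on the "name" field.
    private final TreeMap<String, List<Long>> nameIndex = new TreeMap<>();

    // Indexing: store the object and index its name together with its ID.
    void add(long id, String name) {
        store.put(id, "<obj id=\"" + id + "\" name=\"" + name + "\"/>");
        nameIndex.computeIfAbsent(name, k -> new ArrayList<>()).add(id);
    }

    // Searching, part 1: a prefix lookup (like name:hello*) yields candidate IDs.
    List<Long> idsByNamePrefix(String prefix) {
        List<Long> ids = new ArrayList<>();
        for (List<Long> hit : nameIndex.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            ids.addAll(hit);
        }
        return ids;
    }

    // Searching, part 2: fetch the candidates by ID, wrap them in a root element,
    // and run the structured query on the aggregate only.
    List<String> search(String prefix, String xpath) {
        StringBuilder doc = new StringBuilder("<root>");
        for (long id : idsByNamePrefix(prefix)) {
            doc.append(store.get(id));
        }
        doc.append("</root>");
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    xpath, new InputSource(new StringReader(doc.toString())),
                    XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        IndexedXmlSearchSketch s = new IndexedXmlSearchSketch();
        s.add(0, "helloYou");
        s.add(1, "helloWorld");
        s.add(2, "helloUniverse");
        s.add(3, "someName");
        System.out.println(s.search("hello", "/root/obj[@id >= 1]"));
        // prints [helloUniverse, helloWorld] (candidates arrive in index order)
        System.out.println(s.search("hello", "/root/obj[string-length(@name) < 11]"));
        // prints [helloWorld, helloYou]
    }
}
```

The expensive structured query only ever sees the candidates the index returned, which is where the speed comes from.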

Observations on this concept:
  • It's fast!
  • It is slow when you search for "*" (all elements) in the Lucene query, because this results in aggregating ALL objects into one document.
  • It's inflexible. What if I want to evaluate ALL objects using XQuery (without filtering candidate objects with Lucene)?
  • It's not XQuery; it's rather a mix of some self-invented Lucene pseudo-query plus XQuery.

Use case 1)

Let there be the following XML objects:

<obj id="0" name="helloYou"> (content) </obj>
<obj id="1" name="helloWorld"> (content) </obj>
<obj id="2" name="helloUniverse"> (content) </obj>
<obj id="3" name="someName"> (content) </obj>

(Now imagine 100'000 more such objects.)


Our Lucene-query:

name:hello*

Our XQuery:

for $n in //obj
where $n/@id >= 1
return $n


Our result:

<obj id="1" name="helloWorld"> (content) </obj>
<obj id="2" name="helloUniverse"> (content) </obj>


Use case 2)

Assuming the same XML data and the same Lucene-query...

New XQuery:

for $n in //obj
where string-length($n/@name) < 11
return $n


New result:

<obj id="0" name="helloYou"> (content) </obj>
<obj id="1" name="helloWorld"> (content) </obj>


EOF

2008/10/26

Mini Memento Framework

Today, I want to introduce a simple Memento implementation.

It's simple and easy to use. No cast from Object here, no null comparison there; just pure, beautiful Java 5 with annotations in the foreground (and reflection in the background, pssst!)

In my main project, I'm using action queues. There are a lot of possible failures while processing such a queue, but only one successful state (at the end). I needed a way to restore the initial state of all involved objects whose state had been changed. So I started three or four different (classic) approaches (just generic classes and interfaces), but none of them was elegant and simple enough to fit my 'requirements'. I found no way to enforce type strictness.

As a last chance, I tried to declare a custom annotation and create a class which represents the 'memento', the state manager of objects.

First, I'll show you how to use the annotation:

public class StatefulObject
        implements IStateful {

    @Stateful
    protected String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public void afterRestoreState() {
        // in our case, do nothing
    }
}

Here, the field "name" is declared to be related to the state of our object.

The following code demonstrates how to store and restore states:

StatefulObject o1 = new StatefulObject();

Memento<StatefulObject> m =
         new Memento<StatefulObject>(StatefulObject.class);

o1.setName("old");
m.addNewState(o1); // save the state

o1.setName("new");
m.restoreInitialState(o1); // restore the state (name is back to "old")


To make this possible, we need a container (memento) which stores the states of the objects. It's a lot of (mostly trivial) code.

In words: you traverse a class's fields using Java reflection (this is done once in the constructor). To store and restore states, you use an Object[] array. To make the memento work as a stack, you could use Stack<Object[]>.
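
A minimal sketch of such a container, written from the description above, could look like this. It is an illustration, not the actual implementation: the bodies of Stateful, IStateful and Memento are guesses based on how they are used in this post, the class Named exists only for the demo, a Deque is used in place of Stack, and restoreInitialState is assumed to rewind to the oldest saved snapshot.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Marks a field as part of an object's restorable state.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Stateful {}

interface IStateful {
    void afterRestoreState();
}

class Memento<T extends IStateful> {
    private final List<Field> fields = new ArrayList<>();
    private final Deque<Object[]> states = new ArrayDeque<>(); // newest snapshot first

    // Collects the @Stateful fields once (declared fields only, no superclasses).
    Memento(Class<T> type) {
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Stateful.class)) {
                f.setAccessible(true);
                fields.add(f);
            }
        }
    }

    // Pushes a shallow snapshot: references are copied, not the objects behind them.
    void addNewState(T target) {
        Object[] snapshot = new Object[fields.size()];
        try {
            for (int i = 0; i < snapshot.length; i++) {
                snapshot[i] = fields.get(i).get(target);
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        states.push(snapshot);
    }

    // Assumption: "initial" means the oldest snapshot; later ones are discarded.
    void restoreInitialState(T target) {
        Object[] initial = states.peekLast();
        if (initial == null) {
            throw new IllegalStateException("no state saved");
        }
        try {
            for (int i = 0; i < initial.length; i++) {
                fields.get(i).set(target, initial[i]);
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        states.clear();
        target.afterRestoreState();
    }
}

// Hypothetical class used only for the demo.
class Named implements IStateful {
    @Stateful
    private String name;

    String getName() { return name; }
    void setName(String name) { this.name = name; }

    @Override
    public void afterRestoreState() { }
}

class MementoDemo {
    public static void main(String[] args) {
        Named o = new Named();
        Memento<Named> m = new Memento<>(Named.class);
        o.setName("old");
        m.addNewState(o);
        o.setName("new");
        m.restoreInitialState(o);
        System.out.println(o.getName()); // prints "old"
    }
}
```

The type parameter is what gives the type strictness: a Memento<Named> only accepts Named instances, and no cast from Object appears in the calling code.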

Important note: there is a conceptual problem with "@Stateful" fields: when they're restored, they get overwritten. That means references to the previous values of such fields become stale! Some abc==someStatefulField comparisons will no longer hold after a restore. Keep that in mind!

2008/10/18

JAX-RS serves JAXB objects!


I decided to play around with JAX-RS. It's a really nice API. Browsing the Internet for random information about it, I discovered its ability to serve JAXB-marshalled objects out of the box! Since I'm already using JAXB for persistence, it took only a few minutes to convert the old "Info servlet" to a RESTful resource. The screenshot shows it in action.

2008/10/17

To REST or not to REST?


Currently I use custom servlets (the screenshot shows the SVG servlet). But I'm thinking about switching to JAX-RS (not yet part of Java EE), which is an API for RESTful web services. It enables such nice things as mapping URLs to resources.

2008/10/14

Apache Tomcat 6 vs GlassFish 3

It seems that the previously mentioned memory leak is GlassFish's fault. I ran the same test (this time even for two hours) using the Tomcat web server. No memory leak. It consumes less memory (100MB instead of 250MB) and is faster (10000 requests per second instead of 5000).

Server test result (using Apache Tomcat 6):

Client test result (using Apache Tomcat 6):

Strangely, when a different server is used, the client also behaves completely differently.

We have to keep in mind that GlassFish 3 "is not suitable for production deployments" (as mentioned here).