Wednesday, April 10, 2013

Java Streams vs C# LINQ vs Java6

A while back I ran into an article comparing C# LINQ to the upcoming Java8 Streams API: http://blog.informatech.cr/2013/03/24/java-streams-preview-vs-net-linq/.

I'm not really experienced with C#, but I have a feeling that the language as a whole is quite verbose and, well, bad? Except for LINQ, which works magic with monadic comprehensions.

I've been coding in Java for most of my career, so I know a thing or two about it:
  1. it's really verbose.
  2. it's really, really, verbose.
  3. it's not nearly as verbose as you think. It's often just bad practice and inferior style that makes it so verbose and incomprehensible.
Since Java is still the language of my enterprise day job, I decided to ease my pain a bit by implementing my own functional utility library. I got a bit carried away and ended up with some annotation processors that bring something like poor man's first-class functions into Java.

Since the library often provides rather clean ways to express one's intent, I wanted to see how it compares to LINQ and Java Streams. So here it goes: the examples from the Informatech blog, supplemented with examples in plain old Java6 (released in late 2006) using my functional library and its annotation processors.

Challenge 1: Filtering


LINQ

string[] names = { "Sam", "Pamela", "Dave", "Pascal", "Erik" };
List<string> filteredNames = names.Where(c => c.Contains("am"))
                                  .ToList();

Java Streams

String[] names = {"Sam","Pamela", "Dave", "Pascal", "Erik"};
List<String> filteredNames = stream(names)
                 .filter(c -> c.contains("am"))
                 .collect(toList());

Java6

String[] names = { "Sam", "Pamela", "Dave", "Pascal", "Erik" };
List<String> filteredNames = newList(filter(names, contains("am")));
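The Java6 version relies on helpers from the library (`filter`, `contains`, `newList`) whose code isn't shown here. As a rough illustration only, this is a minimal sketch of how such helpers could be written in plain Java without lambdas. The `Predicate` interface and the `FilterSketch` class are hypothetical names, not the library's actual API, and `filter` is assumed to return a lazy `Iterable` (which would explain the `newList` wrapper).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical single-method predicate type (the real library's type may differ).
interface Predicate<T> {
    boolean accept(T candidate);
}

final class FilterSketch {
    private FilterSketch() {}

    // Predicate factory: true when the string contains the given fragment.
    static Predicate<String> contains(final String fragment) {
        return new Predicate<String>() {
            public boolean accept(String candidate) {
                return candidate.contains(fragment);
            }
        };
    }

    // Lazy filter: elements are tested only while the result is being iterated.
    static <T> Iterable<T> filter(final T[] source, final Predicate<? super T> pred) {
        return new Iterable<T>() {
            public Iterator<T> iterator() {
                return new Iterator<T>() {
                    private int index = 0;

                    // Advances index to the next matching element, or to the end.
                    private void skipNonMatching() {
                        while (index < source.length && !pred.accept(source[index])) {
                            index++;
                        }
                    }

                    public boolean hasNext() {
                        skipNonMatching();
                        return index < source.length;
                    }

                    public T next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return source[index++];
                    }

                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }

    // Materializes any Iterable into a fresh ArrayList.
    static <T> List<T> newList(Iterable<T> elements) {
        List<T> result = new ArrayList<T>();
        for (T element : elements) {
            result.add(element);
        }
        return result;
    }
}
```

With the sample names, `newList(filter(names, contains("am")))` yields "Sam" and "Pamela".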

Challenge 2: Indexed Filtering


LINQ

string[] names = { "Sam", "Pamela", "Dave", "Pascal", "Erik" };
var nameList = names.Where((c, index) => c.Length <= index + 1).ToList();

Java Streams

String[] names = {"Sam","Pamela", "Dave", "Pascal", "Erik"};
 
List<String> nameList;
Stream<Integer> indices = intRange(1, names.length).boxed();
nameList = zip(indices, stream(names), SimpleEntry::new)
            .filter(e -> e.getValue().length() <= e.getKey())
            .map(Entry::getValue)
            .collect(toList());

Java6

String[] names = { "Sam", "Pamela", "Dave", "Pascal", "Erik" };
List<String> nameList = newList(map(filter(zipWithIndex(names), pred),
                                    Transformers.<String> right()));

static boolean pred(Map.Entry<Integer, String> candidate) {
    return candidate.getValue().length() <= candidate.getKey() + 1;
}
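The Java Streams snippets in this post target the 2013 preview builds, and helpers such as `intRange` and `zip` did not ship in that form. For comparison, a sketch of the same indexed filter against the API as released in Java 8 can iterate over indices with `IntStream.range` instead of zipping (the `IndexedFilter` class and its method name are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class IndexedFilter {
    private IndexedFilter() {}

    // Keeps names whose length is at most their zero-based index + 1,
    // mirroring the LINQ Where((c, index) => ...) overload.
    static List<String> shortEnoughForPosition(String[] names) {
        return IntStream.range(0, names.length)             // indices 0 .. length-1
                .filter(i -> names[i].length() <= i + 1)    // test each name at its index
                .mapToObj(i -> names[i])                    // back to the names themselves
                .collect(Collectors.toList());
    }
}
```

With the sample names only "Erik" (length 4 at index 4) passes the test.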

Challenge 3: Selecting/Mapping


LINQ

List<string> nameList1 = new List<string>() { "Anders", "David", "James",
                                              "Jeff", "Joe", "Erik" };
nameList1.Select(c => "Hello! " + c).ToList()
         .ForEach(c => Console.WriteLine(c));

Java Streams

List<String> nameList1 = asList("Anders", "David", "James",
                                "Jeff", "Joe", "Erik");
nameList1.stream()
     .map(c -> "Hello! " + c)
     .forEach(System.out::println);

Java6

List<String> nameList1 = newList("Anders", "David", "James", "Jeff", "Joe", "Erik");
foreach(map(nameList1, prepend("Hello! ")),
            PrintStream_.println8.apply(System.out));

Challenge 4: Selecting Many/Flattening


LINQ

Dictionary<string, List<string>> map = new Dictionary<string,List<string>>();
map.Add("UK", new List<string>() {"Birmingham", "Bradford", "Liverpool"});
map.Add("USA", new List<string>() {"NYC", "New Jersey", "Boston", "Buffalo"});
var cities = map.SelectMany(c => c.Value).ToList();

Java Streams

Map<String, List<String>> map = new LinkedHashMap<>();
map.put("UK", asList("Birmingham","Bradford","Liverpool"));
map.put("USA", asList("NYC","New Jersey","Boston","Buffalo"));
 
FlatMapper<Entry<String, List<String>>,String> flattener;
flattener = (entry,consumer) -> { entry.getValue().forEach(consumer); };
 
List<String> cities = map.entrySet()
             .stream()
             .flatMap( flattener )
             .collect(toList());

Java6

Map<String, List<String>> map = newMap(
    Pair.of("UK", newList("Birmingham", "Bradford", "Liverpool")),
    Pair.of("USA", newList("NYC", "New Jersey", "Boston", "Buffalo")));
List<String> cities = newList(flatten(map.values()));
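Here `flatten` turns the map's values (a collection of lists) into one sequence of cities. A minimal sketch of a lazy version, assuming a hypothetical `FlattenSketch.flatten` that accepts any `Iterable` of `Iterable`s (the real library's signature may differ):

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class FlattenSketch {
    private FlattenSketch() {}

    // Lazily concatenates the inner iterables: nothing is copied until iteration.
    static <T> Iterable<T> flatten(final Iterable<? extends Iterable<? extends T>> nested) {
        return new Iterable<T>() {
            public Iterator<T> iterator() {
                final Iterator<? extends Iterable<? extends T>> outer = nested.iterator();
                return new Iterator<T>() {
                    private Iterator<? extends T> inner = Collections.<T>emptyList().iterator();

                    public boolean hasNext() {
                        // Move on to the next non-empty inner iterable if needed.
                        while (!inner.hasNext() && outer.hasNext()) {
                            inner = outer.next().iterator();
                        }
                        return inner.hasNext();
                    }

                    public T next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return inner.next();
                    }

                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

Because the result is lazy, wrapping it in something like `newList` is what actually walks the nested lists.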

Challenge 5: Taking an Arbitrary Number of Items


LINQ

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 };
var first4 = numbers.Take(4).ToList();

Java Streams

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,13 };
 
List<Integer> firstFour;
firstFour = stream(numbers).limit(4)
                           .boxed()
                           .collect(toList());

Java6

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 };
List<Integer> firstFour = newList(take(newArray(numbers), 4));



Challenge 6: Taking Items Based on Predicate


LINQ

string[] moreNames = { "Sam", "Samuel", "Dave", "Pascal", "Erik",  "Sid" };
var sNames = moreNames.TakeWhile(c => c.StartsWith("S"));

Java Streams

String[] names  = { "Sam","Samuel","Dave","Pascal","Erik","Sid" };
 
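List<String> found;
// Note: partitioningBy keeps every element matching the predicate, so unlike
// TakeWhile this also returns "Sid" instead of stopping at "Dave".
found = stream(names).collect(partitioningBy( c -> c.startsWith("S")))
                     .get(true);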

Java6

String[] names = { "Sam", "Samuel", "Dave", "Pascal", "Erik", "Sid" };
List<String> found = newList(takeWhile(names, startsWith("S")));
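Note that `takeWhile` stops at the first element that fails the predicate, whereas the `partitioningBy` workaround in the Streams version keeps every match. A hypothetical eager sketch (the `Condition` interface and `TakeWhileSketch` class are illustrative names, not the library's API) makes the difference visible: it returns only "Sam" and "Samuel", while partitioning would also include "Sid".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical predicate type standing in for the library's own.
interface Condition<T> {
    boolean holds(T value);
}

final class TakeWhileSketch {
    private TakeWhileSketch() {}

    // Returns the longest prefix whose elements all satisfy the condition;
    // stops at the first failing element even if later ones would match.
    static <T> List<T> takeWhile(T[] source, Condition<? super T> condition) {
        List<T> prefix = new ArrayList<T>();
        for (T element : source) {
            if (!condition.holds(element)) {
                break;
            }
            prefix.add(element);
        }
        return prefix;
    }

    // Condition factory: true when the string starts with the given prefix.
    static Condition<String> startsWith(final String prefix) {
        return new Condition<String>() {
            public boolean holds(String value) {
                return value.startsWith(prefix);
            }
        };
    }
}
```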

Challenge 7: Skipping an Arbitrary Number of Items


LINQ

string[] vipNames = { "Sam", "Samuel", "Samu", "Remo", "Arnold","Terry" };
var skippedList = vipNames.Skip(3).ToList();//Leaving the first 3.

Java Streams

String[] vipNames = { "Sam", "Samuel", "Samu", "Remo", "Arnold","Terry" };
 
List<String> skippedList;
skippedList = stream(vipNames).substream(3).collect(toList());

Java6

String[] vipNames = { "Sam", "Samuel", "Samu", "Remo", "Arnold", "Terry" };
List<String> skippedList = newList(drop(vipNames, 3));

Challenge 8: Skipping Items Based on Predicate


LINQ

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20 };
var skippedList = numbers.SkipWhile(c => c < 10);

Java Streams

//With current streams API I found no way to implement this idiom.

Java6

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20 };
List<Integer> skippedList = newList(dropWhile(newArray(numbers), lessThan(10)));
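Java 8 indeed shipped without such an operation, but `dropWhile` (along with `takeWhile`) was later added to `Stream` and `IntStream` in Java 9. On a Java 9+ JDK the idiom could be sketched as follows (the wrapper class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SkipWhileJava9 {
    private SkipWhileJava9() {}

    // Requires Java 9+: drops leading elements while the predicate holds,
    // then keeps everything from the first non-matching element onwards.
    static List<Integer> skipWhileBelow(int[] numbers, int limit) {
        return Arrays.stream(numbers)
                .dropWhile(n -> n < limit)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

For the sample array and a limit of 10 this yields 10, 11, 12 and 20.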

Challenge 9: Ordering/Sorting Elements


LINQ

string[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = friends.OrderBy(c => c).ToArray();

Java Streams

String[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = stream(friends).sorted().toArray(String[]::new);

Java6

String[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = newArray(sort(friends), String.class);

Challenge 10: Ordering/Sorting Elements by Specific Criterion


LINQ

string[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = friends.OrderBy(c => c.Length).ToArray();

Java Streams

String[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = stream(friends)
           .sorted(comparing((ToIntFunction<String>)String::length))
           .toArray(String[]::new);

Java6

String[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = newArray(sort(friends, by(String_.length)), String.class);

Challenge 11: Ordering/Sorting Elements by Multiple Criteria


LINQ

string[] fruits = {"grape", "passionfruit", "banana",
                   "apple", "orange", "raspberry",
                   "mango", "blueberry" };
 
//Sort the strings first by their length and then alphabetically,
//keeping length as the primary ordering.
var sortedFruits = fruits.OrderBy(fruit =>fruit.Length)
                         .ThenBy(fruit => fruit);

Java Streams

String[] fruits = {"grape", "passionfruit", "banana","apple",
                   "orange", "raspberry","mango", "blueberry" };
 
Comparator<String> comparator;
comparator = comparing((Function<String,Integer>)String::length,
                       Integer::compare)
            .thenComparing((Comparator<String>)String::compareTo);
 
fruits = stream(fruits).sorted(comparator)
                       .toArray(String[]::new);

Java6

String[] fruits = { "grape", "passionfruit", "banana", "apple",
                    "orange", "raspberry", "mango", "blueberry" };
fruits = newArray(sort(fruits, by(String_.length).then(
                               byNatural())), String.class);
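In the `Comparator` API as released in Java 8, the casts from the preview version are no longer needed. A sketch using `Comparator.comparingInt` and `thenComparing` (the `MultiCriteriaSort` class is an illustrative name):

```java
import java.util.Arrays;
import java.util.Comparator;

final class MultiCriteriaSort {
    private MultiCriteriaSort() {}

    static String[] sortFruits(String[] fruits) {
        // Primary key: length; ties are broken by natural (alphabetical) order.
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Comparator<String> byLengthThenAlpha = byLength.thenComparing(Comparator.<String>naturalOrder());
        return Arrays.stream(fruits)
                .sorted(byLengthThenAlpha)
                .toArray(String[]::new);
    }
}
```

For the sample fruits this gives apple, grape, mango, banana, orange, blueberry, raspberry, passionfruit.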

Challenge 12: Grouping by a Criterion


LINQ

string[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna",  "Barsha"};
var groups = names.GroupBy(c => c.Length);

Java Streams

String[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna",  "Barsha"};
 
Map<Integer,List<String>> groups;
groups = stream(names).collect(groupingBy(String::length));

Java6

String[] names = { "Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha" };
Map<Integer, List<String>> groups = groupBy(names, String_.length);
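`groupBy` buckets the names by the value of a key function. A minimal sketch of such a helper, with a hypothetical `KeyFunction` interface standing in for the library's generated `String_.length`. Whether the real library preserves insertion order is not stated; this sketch uses a `LinkedHashMap` so that it does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key-extractor type (illustrative, not the library's API).
interface KeyFunction<T, K> {
    K apply(T value);
}

final class GroupBySketch {
    private GroupBySketch() {}

    // Buckets elements by key; keys keep first-seen order and each
    // bucket keeps the original element order.
    static <T, K> Map<K, List<T>> groupBy(T[] source, KeyFunction<? super T, ? extends K> key) {
        Map<K, List<T>> groups = new LinkedHashMap<K, List<T>>();
        for (T element : source) {
            K k = key.apply(element);
            List<T> bucket = groups.get(k);
            if (bucket == null) {
                bucket = new ArrayList<T>();
                groups.put(k, bucket);
            }
            bucket.add(element);
        }
        return groups;
    }

    // Example key function: the length of a string.
    static final KeyFunction<String, Integer> LENGTH = new KeyFunction<String, Integer>() {
        public Integer apply(String value) {
            return value.length();
        }
    };
}
```

For the sample names the keys come out as 3, 6, 4, 5, with "Samuel" and "Barsha" sharing the bucket for 6.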

Challenge 13: Filter Distinct Elements


LINQ

string[] songIds = {"Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1"};
//This works because string implements IEquatable<string> (value equality)
var uniqueSongIds = songIds.Distinct();

Java Streams

String[] songIds = {"Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1"};
//according to Object.equals
stream(songIds).distinct();

Java6

String[] songIds = { "Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1" };
newSet(songIds);

Challenge 14: Union of Two Sets


LINQ

List<string> friends1 = new List<string>() {"Anders", "David","James",
                                            "Jeff", "Joe", "Erik"};
List<string> friends2 = new List<string>() { "Erik", "David", "Derik" };
var allMyFriends = friends1.Union(friends2);

Java Streams

List<String> friends1 = asList("Anders","David","James","Jeff","Joe","Erik");
List<String> friends2 = asList("Erik","David","Derik");
Stream<String> allMyFriends = concat(friends1.stream(),
                                     friends2.stream()).distinct();

Java6

List<String> friends1 = newList("Anders", "David", "James", "Jeff", "Joe", "Erik");
List<String> friends2 = newList("Erik", "David", "Derik");
Set<String> allMyFriends = union(newSet(friends1), newSet(friends2));

Challenge 15: First Element


LINQ

string[] otherFriends = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
string firstName = otherFriends.First();
string firstNameConditional = otherFriends.First(c => c.Length == 5);

Java Streams

String[] otherFriends = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
Optional<String> found = stream(otherFriends).findFirst();
 
Optional<String> maybe = stream(otherFriends).filter(c -> c.length() == 5)
                                             .findFirst();
if(maybe.isPresent()) {
   //do something with found data
}

Java6

String[] otherFriends = { "Sam", "Danny", "Jeff", "Erik", "Anders", "Derik" };
Option<String> found = headOption(otherFriends);
Option<String> maybe = find(otherFriends, String_.length.andThen(equalTo(5)));
for (String m: maybe) {
    // ...
}
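The `for (String m: maybe)` loop works as an "if present" check, presumably because the library's `Option` implements `Iterable` with zero or one element. A minimal hypothetical `Option` along those lines (not the library's actual class), including a `headOption` helper:

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical minimal Option: an Iterable holding zero or one value,
// so a for-each loop body runs only when a value is present.
final class Option<T> implements Iterable<T> {
    private final T value; // null means "none"

    private Option(T value) {
        this.value = value;
    }

    static <T> Option<T> some(T value) {
        if (value == null) {
            throw new IllegalArgumentException("some(null)");
        }
        return new Option<T>(value);
    }

    static <T> Option<T> none() {
        return new Option<T>(null);
    }

    boolean isDefined() {
        return value != null;
    }

    public Iterator<T> iterator() {
        return value == null
                ? Collections.<T>emptyIterator()
                : Collections.singleton(value).iterator();
    }

    // The first element of an array, if any
    // (a null first element is not supported in this sketch).
    static <T> Option<T> headOption(T[] array) {
        return array.length == 0 ? Option.<T>none() : some(array[0]);
    }
}
```

The design choice is that "maybe absent" becomes a collection of size zero or one, so ordinary iteration replaces explicit `isPresent()` checks.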

Challenge 16: Generate a Range of Numbers


LINQ

var multiplesOfEleven = Enumerable.Range(1, 100).Where(c => c % 11 == 0);

Java Streams

IntStream multiplesOfEleven = intRange(1,100).filter(n -> n % 11 == 0);

Java6

Iterable<Integer> multiplesOfEleven = filter(range(1, 100), mod(11).andThen(equalTo(0)));
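Predicates such as `mod(11).andThen(equalTo(0))` and `String_.length.andThen(equalTo(5))` are built by function composition. A minimal sketch of the idea, using a hypothetical abstract `Fn` class with an `andThen` method (the library's real function types and generated metamodel are not shown in this post):

```java
// Hypothetical one-argument function type with composition; it only
// illustrates the idea behind mod(11).andThen(equalTo(0)) and is not
// the library's actual class.
abstract class Fn<A, B> {
    abstract B apply(A a);

    // Returns a function that applies this one first and 'next' to its result.
    <C> Fn<A, C> andThen(final Fn<? super B, ? extends C> next) {
        final Fn<A, B> first = this;
        return new Fn<A, C>() {
            C apply(A a) {
                return next.apply(first.apply(a));
            }
        };
    }

    // n -> n % divisor
    static Fn<Integer, Integer> mod(final int divisor) {
        return new Fn<Integer, Integer>() {
            Integer apply(Integer n) {
                return n % divisor;
            }
        };
    }

    // value -> expected.equals(value)
    static <T> Fn<T, Boolean> equalTo(final T expected) {
        return new Fn<T, Boolean>() {
            Boolean apply(T actual) {
                return expected.equals(actual);
            }
        };
    }

    // Stand-in for a generated String_.length: s -> s.length()
    static final Fn<String, Integer> LENGTH = new Fn<String, Integer>() {
        Integer apply(String s) {
            return s.length();
        }
    };
}
```

Usage: `Fn.mod(11).andThen(Fn.equalTo(0))` is a `Fn<Integer, Boolean>` that is true for 22 and false for 23, and `Fn.LENGTH.andThen(Fn.equalTo(5))` accepts "Danny" but not "Sam".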

Challenge 17: All


LINQ

string[] persons = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
bool x = persons.All(c => c.Length == 5);

Java Streams

String[] persons = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
boolean x = stream(persons).allMatch(c -> c.length() == 5);

Java6

String[] persons = { "Sam", "Danny", "Jeff", "Erik", "Anders", "Derik" };
boolean x = forAll(persons, String_.length.andThen(equalTo(5)));

Challenge 18: Any


LINQ

string[] persons = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
bool x = persons.Any(c => c.Length == 5);

Java Streams

String[] persons = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
boolean x = stream(persons).anyMatch(c -> c.length() == 5);

Java6

String[] persons = { "Sam", "Danny", "Jeff", "Erik", "Anders", "Derik" };
boolean x = exists(persons, String_.length.andThen(equalTo(5)));

Challenge 19: Zip


LINQ

string[] salutations = {"Mr.", "Mrs.", "Ms", "Master"};
string[] firstNames = {"Samuel", "Jenny", "Joyace", "Sam"};
string lastName = "McEnzie";
 
salutations.Zip(firstNames, (sal, first) => sal + " " + first)
           .ToList()
           .ForEach(c => Console.WriteLine(c + " " + lastName));

Java Streams

String[] salutations = {"Mr.", "Mrs.", "Ms", "Master"};
String[] firstNames = {"Samuel", "Jenny", "Joyace", "Sam"};
String lastName = "McEnzie";
 
zip(
    stream(salutations),
    stream(firstNames),
    (sal,first) -> sal + " " +first)
.forEach(c -> { System.out.println(c + " " + lastName); });

Java6

String[] salutations = { "Mr.", "Mrs.", "Ms", "Master" };
String[] firstNames = { "Samuel", "Jenny", "Joyace", "Sam" };
String lastName = "McEnzie";

foreach(map(zip(salutations, firstNames, repeat(lastName)), mkString(" ")),
        PrintStream_.println8.apply(System.out));

Conclusion

Based on these examples, I have a funny feeling that the Java8 Streams API is going to be a failure. And since developers will not be able to extend it with useful constructs, it may well end up being just another nail in the coffin.

Of these examples, I personally find the Java6 code the most readable, even with its oddities, most of which are caused by the original author's decision to use ints (instead of Integers) and Lists (instead of Iterables). The ability to do this has been around since 2007, and Java8 will be released in... 2014?

I'm a bit biased, though, so what do you think?

Saturday, January 5, 2013

Haskell and non-blocking asynchronous IO

Here begins my journey to the magnificent world of Haskell.

I was chatting with a co-worker a while back about the influence of programming languages on code quality etc. He mentioned Node.js and working with Promises. I kind of responded that I don't really like Promise hell or callback hell and would rather just say what I want sequentially.

A while back I started implementing a chat server and client. In Haskell. Just for fun and to learn the language. Googling for examples I quickly wrote something like

acceptLoop socket = do
    (h,_,_) <- accept socket
    hSetBuffering h NoBuffering
    forkIO $ incoming h
    acceptLoop socket

The forkIO function kind of scared me, and I decided to come back to it later and find out how to do asynchronous non-blocking IO in Haskell, since it's such a buzzword nowadays.

Well, it turns out Haskell is one of those rare languages that does non-blocking IO by default. The forkIO function doesn't spawn a new operating system thread, but a lightweight (i.e. green) thread instead. They claim that one can spawn tens of thousands of concurrent threads on a regular laptop. A regular OS thread can be spawned with the forkOS function if needed (this requires linking with GHC's -threaded runtime).

Basically this means that I have been programming with non-blocking IO all the time without realizing it. It's still sequential, but could parallel operations be added easily, and without all the hassle with promises?

Let's first define some long running operation, pretending that it's fetching something over a slow network connection, or whatever:

-- some long-running "remote" operation
longRemoteOperation :: String -> IO String
longRemoteOperation a = do
    -- random delay of 1.000-1.001 s (threadDelay takes microseconds),
    -- so that parallel operations finish in random order
    d <- getStdRandom (randomR (1000000,1001000))
    threadDelay d
    putStr a
    return a

Synchronous (that is, sequential) operations would be the basic case. This function performs n operations one after another:

-- runs n operations synchronously
sync :: Int -> IO ()
sync 0 = return ()
sync n = do
    _ <- longRemoteOperation (show n)
    sync (pred n)

The two asynchronous versions (green threads and native threads) need a hack to prevent the program from exiting before all the threads are finished. Please forgive me...

-- runs n operations asynchronously using Haskell green threads
greenThread :: Int -> IO ()
greenThread = async forkIO

-- runs n operations asynchronously using native OS threads
osThread :: Int -> IO ()
osThread = async forkOS

-- a hack to wait until all threads are finished before exiting program 
async :: (IO () -> IO t) -> Int -> IO ()
async forkMode n = do 
        mvars <- replicateM n $ run $ longRemoteOperation "*"
        forM_ mvars takeMVar
     where 
        run f = do 
            x <- newEmptyMVar 
            _ <- forkMode $ (void f) `finally` putMVar x () 
            return x

The previous functions can be used to find out how many threads my poor little laptop can handle, but they do not resemble the way async operations are normally written. So let's write two more functions to see how parallel operation differs from sequential in practice:

-- runs 5 operations sequentially
sequential :: IO ()
sequential = do
    a1 <- longRemoteOperation "1"
    [a2, a3, a4] <- mapM longRemoteOperation ["2", "3", "4"]
    a5 <- longRemoteOperation "5"
    putStrLn $ foldl1 (++) [a1, a2, a3, a4, a5]

-- runs one operation, then 3 parallel, then one more
parallel :: IO ()
parallel = do
    a1 <- longRemoteOperation "1"
    [a2, a3, a4] <- mapConcurrently longRemoteOperation ["2", "3", "4"]
    a5 <- longRemoteOperation "5"
    putStrLn $ foldl1 (++) [a1, a2, a3, a4, a5]

Whoa, hold on a second! The difference is like one word? And no meddling with promises? 

Before we get too excited, I have to admit that this only demonstrates the basic case of performing three operations in parallel and continuing only when all three are finished. More complicated workflows might also complicate the code, but my poor imagination couldn't come up with realistic requirements, so I'm satisfied with this. Please see Control.Concurrent.Async for more information.

Let's add a main function and do some timing to make sure everything is happening as we expect:

-- module declaration and imports, for completeness...
module Main where

import System.Environment (getArgs)
import Control.Exception (finally)
import Control.Concurrent
import Control.Concurrent.Async (mapConcurrently)
import Control.Monad (forM_, replicateM, void)
import System.Random (getStdRandom, randomR)


main :: IO ()
main = do
    args <- getArgs
    case args of
        ["sync", n]    -> sync (read n)
        ["green", n]   -> greenThread $ read n
        ["os", n]      -> osThread $ read n
        ["sequential"] -> sequential
        ["parallel"]   -> parallel
        _       -> return ()

Let's first try the simple synchronous version with five operations. In each case the code prints a thread-number (or a star) when the thread finishes:

mac:asyncIO inferior$ time ./asyncIO "sync" 5
54321
real	0m5.013s
user	0m0.006s
sys	0m0.010s

The whole thing took five seconds, as expected. Next, the forked versions:

mac:asyncIO inferior$ time ./asyncIO "green" 5
*****
real	0m1.020s
user	0m0.004s
sys	0m0.006s
mac:asyncIO inferior$ time ./asyncIO "os" 5
*****
real	0m1.008s
user	0m0.003s
sys	0m0.005s

Both green threads and native threads run similarly, and take about one second, as expected. But how about if we increase the number of threads?

mac:asyncIO inferior$ time ./asyncIO "green" 2000 > /dev/null

real	0m1.041s
user	0m0.041s
sys	0m0.033s
mac:asyncIO inferior$ time ./asyncIO "os" 2000 > /dev/null

real	0m1.504s
user	0m0.554s
sys	0m0.511s

With 2000 threads the green-thread version still finishes in about a second, but the native threads take about 50% longer.

Now if I try with 3000 native threads I get: asyncIO: user error (Cannot create OS thread.)
Unfortunately this seems to be the OS limit:

mac:asyncIO inferior$ sysctl kern.num_taskthreads
kern.num_taskthreads: 2048

Anyone know how to increase the limit on a Mac?

Still, 20000 and 100000 green threads perform really nicely, and I suspect that, whatever the limits, 100000 native threads would kill my laptop =)

mac:asyncIO inferior$ time ./asyncIO "green" 20000 > /dev/null

real	0m1.331s
user	0m0.380s
sys	0m0.243s
mac:asyncIO inferior$ time ./asyncIO "green" 100000 > /dev/null

real	0m2.889s
user	0m1.905s
sys	0m1.037s

We still have the two "regular programming style" functions remaining. Let's verify that they run as expected. Again, each operation prints its number when it finishes. Finally, all numbers are printed once more as the "complete result". See the code if you can't figure out my explanation...

mac:asyncIO inferior$ time ./asyncIO "sequential"
1234512345

real	0m5.011s
user	0m0.005s
sys	0m0.009s
mac:asyncIO inferior$ time ./asyncIO "parallel"
1324512345

real	0m3.012s
user	0m0.005s
sys	0m0.008s

Indeed, sequential takes five seconds and always prints the numbers in order, whereas parallel takes three seconds as expected, and the order of the second, third and fourth digits changes randomly, even though the final result is always in the correct order.

Haskell seems to make this stuff really easy. Yes, I know, not everything in Haskell is easy...

Feel free to leave a Node.js example to the comments. We'll see which one is more readable ;)



Tuesday, August 21, 2012

Simple construction of common queries with JPA2

In the previous post I demonstrated an API for executing queries. Now we need some queries. Due to some odd design choices, the JPA2 Criteria API isn't exactly the easiest API for query construction. Maybe we could utilize its metamodel to create an easier, statically typed way to construct simple queries. If it could help with, let's say, 75% of all the queries in a large application, it might be useful.

So I made a couple of small utility classes for constructing queries. Simple, statically typed and syntactically almost a joy (well...) to read.

Here are methods for querying all entities of a certain type, a single entity with a specific ID, or multiple entities matching a set of IDs:
<E extends EntityBase<?>> CriteriaQuery<E> all(Class<E> entityClass);

<E extends EntityBase<?>> CriteriaQuery<E> single(Id<E> id);

<E extends EntityBase<?>, ID extends Id<? super E>>
CriteriaQuery<E> ofIds(Iterable<ID> ids, Class<E> entityClass);

When the query result type is an entity, we can transform the query in a few ways related to that entity, since we can dig out the selection or root object from the query behind the scenes. We can, for example, project the query to the ID or to any single attribute. These are just modifications to the underlying select clause:
<E extends EntityBase<?>> CriteriaQuery<Id<E>> id(CriteriaQuery<E> query);

<E extends EntityBase<?>, A extends Attribute<? super E, ?> & Bindable<R>, R> 
CriteriaQuery<R> value(A attribute, CriteriaQuery<E> query);

We can also add restrictions, that is, modify the where clause. There's nothing really fancy happening here, but the true usefulness may come from common restrictions specific to the application in question:
<E extends EntityBase<?>, T> 
CriteriaQuery<E> attributeEquals(SingularAttribute<? super E, T> attribute, 
                                 Option<T> value, 
                                 CriteriaQuery<E> query);

<E extends EntityBase<?>, A> 
CriteriaQuery<E> attributeIn(SingularAttribute<? super E, A> attribute, 
                             Iterable<A> values, 
                             CriteriaQuery<E> query);

<E extends EntityBase<?>> 
CriteriaQuery<E> exclude(Id<E> idToExclude, CriteriaQuery<E> query);

<E extends EntityBase<?>, ID extends Id<E>> 
CriteriaQuery<E> exclude(Iterable<ID> idsToExclude, CriteriaQuery<E> query);

<E extends EntityBase<?> & Activatable, A> 
CriteriaQuery<E> active(CriteriaQuery<E> query);

<E extends EntityBase<?>> 
CriteriaQuery<E> attributeStartsWith(SingularAttribute<? super E, String> attr, 
                                     String value, 
                                     CriteriaQuery<E> query);

Here's a way to use the metamodel attributes to construct a simple query performing consecutive inner joins:
<E extends EntityBase<?>, 
R1 extends EntityBase<?>,
A1 extends Attribute<? super E, ?> & Bindable<R1>>
CriteriaQuery<R1> related(E entity, A1 r1);

<E extends EntityBase<?>, 
R1 extends EntityBase<?>, 
R2 extends EntityBase<?>, 
A1 extends Attribute<? super E, ?> & Bindable<R1>, 
A2 extends Attribute<? super R1, ?> & Bindable<R2>>
CriteriaQuery<R2> related(E entity, A1 r1, A2 r2);

//...
// similar methods for more attributes

So, this is just a way to provide a bit less insane syntax for common queries. Together with paging and sorting from the query execution interface it might actually cover the most common needs.

Here's an example of querying certain municipality names of employees from a department:
// first find out the ID for Turku. One DB query, single value resultset:
Id<Municipality> turkuId = dao.get(
    Restrict.attributeEquals(Municipality_.name, Some("Turku"),
      Query.all(Municipality.class)),
  Project.id());

// we have a department to start with. No DB queries at this point:
Department dep = dao.getProxy(someDepId);

// query for the names of the home municipalities of employees from dep,
// excluding Turku for whatever reason, considering only active municipalities
// (whatever that means...), ordering by postal code and taking page 5. 
// Single DB query, only string values in the resultset.
List<String> municipalityNames = dao.getList(
    Restrict.active(
      Restrict.exclude(turkuId,
        Query.related(dep, Department_.employees, Employee_.homeMunicipality))),
  Page.of(5),
  Order.by(Municipality_.postalCode),
  Project.value(Municipality_.name));

The pure JPA2 Criteria Queries are almost impossible to read due to the design choices that were made. Even the simplest query cannot be constructed with a single expression. There are some third-party libraries that provide a more sensible way to construct queries, for example QueryDSL from Mysema. However, that kind of approach requires a big leap to what is practically another query language. It might give much more readable queries, but at the same time we may lose the possibility to create useful abstractions if the library doesn't provide enough extension points. Most often they don't, although I don't have any first-hand experience with QueryDSL.

The alternative approach presented here suffers from somewhat awkward syntax and limited applicability, but on the other hand it is only a thin wrapper around the Criteria API and doesn't impose any limitations. In the unfortunate case that a project team decides to actually use JPA2 Criteria Queries, adopting this kind of approach for query construction is not a giant leap.

Earlier we went through a way to execute queries various ways with paging, ordering and simple projections. Now we have looked at a way to construct and modify simple queries without enormous pain and without external libraries. Next up, querying complex projections from an arbitrary CriteriaQuery.

Friday, August 10, 2012

Java Persistence API 2. Still useless?

In my day job I have the "privilege" to use JPA (Hibernate in practice) for persistence. So when JPA2 was released I was eager to find out if they had actually corrected their mistakes and made a framework comparable to its competitors. Well, they hadn't.

Don't get me wrong. I actually don't hate Hibernate. I just hate the ORM part of it, since ORM is broken. Abstractions are a must, and SQL-level is IMHO a bit too low-level for "regular stuff". Hibernate (or an alternative of your choice, I don't really care) is useful for stuff like typing, query generation and since the introduction of JPA2 Metamodel also for describing the table structure at the application level.

Shortly after Hibernate support for JPA2 was officially released we decided to try it out in a real project. We chose to use the new Criteria API due to static type safety and pure interest. Well, as a colleague put it, the Criteria API is write-only: nearly unreadable. Who's the idiot that decided to make it the way it is?

Anyway, composability - as we all know - is the mother of all software design patterns. Functional programming languages are composable by nature. Pure SQL is composable by nature. So, how come people tend to completely forget composability when programming in Java?

I spent hours and hours trying to figure out how to create composable queries with the Criteria API, but I just couldn't come up with anything useful. Every strategy seemed equally awkward. Had they really created yet another non-composable SQL API?

Yes, I believe they had. So I decided to try something else. If I cannot reuse queries to construct other queries, maybe I could at least reuse the whole queries? I decided to separate the queries from their execution (if this feels somewhat obvious to you then I guess I'm just way behind you). This way I could create an arbitrary query of an entity E and use it to query for an E, many E:s, count or existence of E:s or - this is the good part - any (trivial) projection of E. And the same with either paging or trivial sorting or both...

Assume we have an arbitrarily complex query returning rows of Department. Then we can use that same query for different use cases:
Page page = Page.FIRST;
CriteriaQuery<Department> q = ...;
Department dep                  = dao.get(q);
Option<Department> maybeDep     = dao.find(q);
long amountOfDeps               = dao.count(q);
boolean depsExist               = dao.exists(q);
Option<Department> firstDep     = dao.findFirst(q, Order.by(Department_.name));
Collection<Department> allDeps  = dao.getList(q);
List<Department> pageOfDeps     = dao.getList(q, page, Order.by(Department_.name));
                                  dao.removeAll(q);
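The idea of keeping a query separate from how it is executed doesn't depend on JPA. As a rough, in-memory analog (all names here are hypothetical, and a plain `java.util.function.Predicate` over a list stands in for a `CriteriaQuery`), a single query definition can be reused for get, find, count, exists and paged listing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// In-memory analog only: a "query" is a Predicate over stored rows, and each
// method below decides how that same query is executed (hypothetical names).
final class InMemoryDao<T> {
    private final List<T> rows;

    InMemoryDao(List<T> rows) {
        this.rows = new ArrayList<>(rows);
    }

    // All matching rows, in storage order.
    List<T> getList(Predicate<? super T> query) {
        List<T> result = new ArrayList<>();
        for (T row : rows) {
            if (query.test(row)) {
                result.add(row);
            }
        }
        return result;
    }

    // Exactly one match expected, in the spirit of get() above.
    T get(Predicate<? super T> query) {
        List<T> matches = getList(query);
        if (matches.size() != 1) {
            throw new IllegalStateException("expected exactly one result, got " + matches.size());
        }
        return matches.get(0);
    }

    // Zero or one match; more than one is an error, in the spirit of find() above.
    Optional<T> find(Predicate<? super T> query) {
        List<T> matches = getList(query);
        if (matches.size() > 1) {
            throw new IllegalStateException("expected at most one result, got " + matches.size());
        }
        if (matches.isEmpty()) {
            return Optional.empty();
        }
        return Optional.ofNullable(matches.get(0));
    }

    long count(Predicate<? super T> query) {
        return getList(query).size();
    }

    // Stops at the first match instead of collecting all of them.
    boolean exists(Predicate<? super T> query) {
        for (T row : rows) {
            if (query.test(row)) {
                return true;
            }
        }
        return false;
    }

    // Zero-based page of the given size; a page past the end is empty.
    List<T> getList(Predicate<? super T> query, int page, int pageSize) {
        List<T> matches = getList(query);
        int from = Math.min(page * pageSize, matches.size());
        int to = Math.min(from + pageSize, matches.size());
        return new ArrayList<>(matches.subList(from, to));
    }
}
```

The query itself never knows whether it will be counted, paged or fetched; that decision belongs entirely to the caller.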

I made a really simple API to execute the queries. The idea was to restrict the possibilities of the developer as much as possible, so that there's no chance for screw-ups and thus less need for testing. I dislike testing and I loathe TDD since it's completely distorted as a way to think, but that rant is for another blog post...

The API also supports execution of native and HQL queries, but their usage is limited since they don't contain the metadata needed to do stuff. The idea was that the business logic could just pick a query and execute it (or some projection etc of it) without needing to know its implementation. But on the other hand, it's nice that the compiler complains for example when the query implementation is changed to not support projections.

I use type signatures as much as possible to restrict how the specific queries can be executed. For example, remove-method can only be used for queries resulting in Removable entities, ordering can be used only for queries resulting in entities, and with the help of the metamodel, projections and sorting can only be made to existing attributes.

Here are the methods for executing the previous queries. Please correct me if the signatures are sub-optimal:
<T> T get(CriteriaQuery<T> query) throws NoResultException, NonUniqueResultException;

<T> Option<T> find(CriteriaQuery<T> query) throws NonUniqueResultException;

long count(CriteriaQuery<?> query);

boolean exists(CriteriaQuery<?> query);

<E extends EntityBase<?>> Option<E>
findFirst(CriteriaQuery<E> query,
          Iterable<? extends Order<? super E,?>> ordering);

<T> Collection<T> getList(CriteriaQuery<T> query);

<E extends EntityBase<?>> List<E>
getList(CriteriaQuery<E> query,
        Page page,
        Iterable<? extends Order<? super E, ?>> ordering);

<ID extends Id<E>, E extends EntityBase<ID> & Removable>
void removeAll(CriteriaQuery<E> query);

And here's one more method, along with an example of querying a projection:
<E extends EntityBase<?>,R> Collection<R>  
getList(CriteriaQuery<E> query, ConstructorMeta_<E,R> constructor);

class DepartmentDto {
  DepartmentDto(Id<Department> id, String name, Set<Manager> managers) {...}
}

CriteriaQuery<Department> query = ...;
Collection<DepartmentDto> dto =
    dao.getList(query, DepartmentDto_.c1(Department_.id,
                                         Department_.name,
                                         Department_.managers));
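The interesting part of the projection is that the compiler checks the attribute types against the DTO constructor. A rough Scala sketch of that idea follows; Attribute, Projection.c3 and the hand-written Department_ object are hypothetical stand-ins for the generated metamodel and DepartmentDto_.c1, and managers are plain strings to keep it short.

```scala
// Hypothetical metamodel attribute: a name plus a typed getter.
final case class Attribute[E, A](name: String, get: E => A)

// A projection turns an entity E into a result R.
final case class Projection[E, R](run: E => R)

object Projection {
  // Hypothetical analogue of DepartmentDto_.c1: the attribute types must match
  // the constructor's parameter types (and order), or this does not compile.
  def c3[E, A, B, C, R](ctor: (A, B, C) => R)(
      a: Attribute[E, A], b: Attribute[E, B], c: Attribute[E, C]): Projection[E, R] =
    Projection(e => ctor(a.get(e), b.get(e), c.get(e)))
}

final case class Department(id: Long, name: String, managers: Set[String])
final case class DepartmentDto(id: Long, name: String, managers: Set[String])

// Hand-written stand-in for a generated metamodel.
object Department_ {
  val id = Attribute[Department, Long]("id", _.id)
  val name = Attribute[Department, String]("name", _.name)
  val managers = Attribute[Department, Set[String]]("managers", _.managers)
}

val toDto = Projection.c3(DepartmentDto.apply _)(
  Department_.id, Department_.name, Department_.managers)

val dtos = List(Department(1, "IT", Set("Matt Manager"))).map(toDto.run)

// Does not compile, since the attribute order no longer matches the constructor:
// Projection.c3(DepartmentDto.apply _)(Department_.name, Department_.id, Department_.managers)
```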


There were some problems, as there always are. Apparently the Criteria API is not designed in a way that allows queries to be modified freely. So we had to make sure that the queries were always constructed with the parameterless variant, CriteriaBuilder.createQuery(), so that they result in Object, and then cast to the correct type. Not a real problem, but a bit of a nasty hack. Later I removed that limitation by copying the queries when needed, but apparently they are not designed to be copied either =) So the whole thing might mysteriously fail some day with complex queries. Welcome to the mutable, stateful world of Java, filled with horrible APIs...

In the end, I'm really satisfied with this query-execution separation, since it greatly increased the reusability of our queries while still remaining statically type safe. In the next blog post I will present "the next step towards LINQ": how to construct queries with minimal (well, sort of) pain yet statically (well, mostly) typed. It turns out that we can easily construct queries for whole entity hierarchies (or something...), populating DTOs through constructors type safely, without n+1 problems. The approach has some limitations, but it might well be enough for 90% (or not...) of queries, which would be a blast =)

Thursday, January 12, 2012

Statically typed Vector and Matrix algebra

I took a course on Machine Learning a while back, just for fun. Stanford University seems to have really put some effort into free courses on the Internet, since Artificial Intelligence and Introduction to Databases were also time well spent.

The exercises in Octave reminded me of Matlab and the pain I had to endure while studying at Tampere University of Technology. It would be so much easier if the editor actually complained when trying to multiply matrices of incompatible dimensions etc. Could Scala perhaps be used to provide some static type safety to matrix operations?

Well, you guessed it.

Creating a linear algebra library wouldn't be the most exciting project (well, not this time, anyway...), so I decided to make a thin wrapper for Scalala. I was not striving for a full-featured library but rather a proof of concept, so I only implemented a few operations.

I'm not smart enough to come up with the required type algebra, so I shamelessly copied the hard parts from here. Hopefully some day I'm going to understand all those lines that went almost unmodified through my clipboard...

Haskell is another great language and can provide more or less similar type safety. Since I don't speak Haskell that well, I'll let you read about a Haskell implementation here.

I made my code available on GitHub. Feel free to use it as you will. The name net.lahteenmaki.scalam is due to my total lack of imagination, sorry about that.

Here's a demo. First of all, only a single import is needed to use all the functionality:
scala> import net.lahteenmaki.scalam._
import net.lahteenmaki.scalam._

Create some regular vectors containing integers:
scala> val v2 = Vector(1,2)
v2: net.lahteenmaki.scalam.RowVector[Int,D2] = 1  2 

scala> val v3 = Vector(1,2,3)
v3: net.lahteenmaki.scalam.RowVector[Int,D3] = 1  2  3 

...or doubles. Actually, any element type T that has a scalala.scalar.Scalar[T]:
scala> Vector(1.0,2.0)
res1: net.lahteenmaki.scalam.RowVector[Double,D2] = 1.00000   2.00000 

Trying to create a vector with differing element types gives a compiler error:
scala> Vector(1,2.0)
<console>:11: error: T is not a scalar value
              Vector(1,2.0)
                    ^

Transposing a row vector creates a column vector of the same dimension:
scala> v2.T
res3: net.lahteenmaki.scalam.ColumnVector[Int,D2] =
1
2

I included some implicits to create vectors from tuples:
scala> (1,2).T
res4: net.lahteenmaki.scalam.ColumnVector[Int,D2] =
1
2

There's nothing special about scalar multiplication, except that the element type changes the same way as in Scalala:
scala> v2*2
res5: net.lahteenmaki.scalam.RowVector[Int,D2] = 2  4 

scala> v2*2.0
res6: net.lahteenmaki.scalam.RowVector[Double,D2] = 2.00000   4.00000 

Addition should retain the dimensions and only be allowed for vectors of the same dimension:
scala> v2 + v2
res7: net.lahteenmaki.scalam.RowVector[Int,D2] = 2  4 

scala> Vector(1,2) + Vector(1.0,2.0)
res8: net.lahteenmaki.scalam.RowVector[Double,Succ[Succ[D0]]] = 2.00000   4.00000 

scala> v2 + v3
<console>:13: error: overloaded method value + with alternatives:
 [B](other: net.lahteenmaki.scalam.RowVector[B,D2])
    (implicit o: v2.BinOp[B,scalala.operators.OpAdd])
    net.lahteenmaki.scalam.RowVector[B,D2]
 <and>
 [B](other: net.lahteenmaki.scalam.Matrix[B,D1,D2])
    (implicit o: v2.BinOp[B,scalala.operators.OpAdd])
    net.lahteenmaki.scalam.Matrix[B,D1,D2]
 cannot be applied to (net.lahteenmaki.scalam.RowVector[Int,D3])
              v2 + v3
                 ^
Yes, we did get a compile-time error. Splendid.
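The core idea behind this compile-time error can be sketched in a few lines: encode the dimension as a phantom type built from type-level natural numbers, and have + accept only an operand with the very same dimension type. This is a minimal sketch, not scalam's actual implementation; Nat, D0, Succ and Vec are illustrative names, elements are fixed to Int, and the constructor is left public for brevity.

```scala
// Type-level natural numbers (Peano encoding); these types are never instantiated.
sealed trait Nat
sealed trait D0 extends Nat
sealed trait Succ[N <: Nat] extends Nat
type D1 = Succ[D0]
type D2 = Succ[D1]
type D3 = Succ[D2]

// A row vector whose length only lives in the phantom type parameter N.
final class Vec[N <: Nat](val elems: Vector[Int]) {
  // Only vectors with exactly the same dimension type N are accepted.
  def +(other: Vec[N]): Vec[N] =
    new Vec[N](elems.zip(other.elems).map { case (a, b) => a + b })

  override def toString: String = elems.mkString("  ")
}

object Vec {
  // Smart constructors tie the runtime length to the dimension type.
  def apply(a: Int, b: Int): Vec[D2] = new Vec[D2](Vector(a, b))
  def apply(a: Int, b: Int, c: Int): Vec[D3] = new Vec[D3](Vector(a, b, c))
}

val v2 = Vec(1, 2)
val v3 = Vec(1, 2, 3)
val sum = v2 + v2

// Does not compile (type mismatch on the dimension type):
// v2 + v3
```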

Vector multiplication is also only defined for compatible sizes:
scala> v2 * v2.T
res10: net.lahteenmaki.scalam.Matrix[Int,D1,D1] = 5 

scala> v2 * v2
<console>:12: error: Could not find a way to  values of type
net.lahteenmaki.scalam.RowVector[Int,D2] and scalala.operators.OpMulMatrixBy
              v2 * v2
                 ^

scala> v2 * v3
<console>:13: error: Could not find a way to  values of type
net.lahteenmaki.scalam.RowVector[Int,D3] and scalala.operators.OpMulMatrixBy
              v2 * v3
                 ^
Again, the compiler won't let me multiply a row vector by another row vector. Nice.
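Multiplication can use the same phantom types with two dimensions: an R x C matrix only accepts a C x K operand and yields an R x K result, and a row vector is simply a 1 x N matrix. Again, this is a minimal sketch with illustrative names (Mat, ToInt, ones), not scalam's API; ToInt recovers a runtime size from a dimension type so that ones can allocate the matrix.

```scala
// Type-level natural numbers, as in the previous sketch.
sealed trait Nat
sealed trait D0 extends Nat
sealed trait Succ[N <: Nat] extends Nat
type D1 = Succ[D0]
type D2 = Succ[D1]
type D3 = Succ[D2]

// Recovers the runtime Int of a type-level number by implicit recursion.
trait ToInt[N <: Nat] { def value: Int }
object ToInt {
  implicit val zero: ToInt[D0] = new ToInt[D0] { val value = 0 }
  implicit def succ[N <: Nat](implicit n: ToInt[N]): ToInt[Succ[N]] =
    new ToInt[Succ[N]] { val value = n.value + 1 }
}

// An R x C matrix of Ints; R and C are phantom dimension types.
final class Mat[R <: Nat, C <: Nat](val rows: Vector[Vector[Int]]) {
  // (R x C) * (C x K) = (R x K): the inner dimension C must match at compile time.
  def *[K <: Nat](other: Mat[C, K]): Mat[R, K] = {
    val cols = other.rows.transpose
    new Mat[R, K](rows.map(r => cols.map(c => r.zip(c).map { case (a, b) => a * b }.sum)))
  }

  def T: Mat[C, R] = new Mat[C, R](rows.transpose)
}

// Builds an R x C matrix of ones, sized from the dimension types.
def ones[R <: Nat, C <: Nat](implicit r: ToInt[R], c: ToInt[C]): Mat[R, C] =
  new Mat[R, C](Vector.fill(r.value, c.value)(1))

val m22 = ones[D2, D2]
val m23 = ones[D2, D3]
val product: Mat[D2, D3] = m22 * m23

val v2 = new Mat[D1, D2](Vector(Vector(1, 2))) // a row vector is a 1 x 2 matrix
val dot = v2 * v2.T                             // (1 x 2) * (2 x 1) = 1 x 1

// Neither of these compiles, because the inner dimensions differ:
// m23 * m22
// v2 * v2
```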

How about concatenating vectors?
scala> v2 ++ v3
res13: net.lahteenmaki.scalam.RowVector[Int,Add[D2,D3]] = 1  2  1  2  3 

scala> val v: RowVector[Int,D5] = v2 ++ v3
v: net.lahteenmaki.scalam.RowVector[Int,D5] = 1  2  1  2  3 

scala> v2 ++ v2.T
<console>:12: error: type mismatch;
 found   : net.lahteenmaki.scalam.ColumnVector[Int,D2]
 required: net.lahteenmaki.scalam.Matrix[Int,D1,?]
              v2 ++ v2.T
                       ^
The compiler can deduce the dimension of the result, and won't let me concatenate a row vector with a column vector. Just what I wanted.
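Concatenation needs the compiler to add two dimensions. One way to do that, sketched below, is a Sum type class in the "Aux" style popularized by shapeless, where implicit search computes the result type. This is not necessarily how scalam does it (its output shows an Add[D2,D3] type instead), and all names here are illustrative. Note that the `Vec[D5]` annotation is checked by the compiler rather than cast.

```scala
// Type-level natural numbers, as in the earlier sketches.
sealed trait Nat
sealed trait D0 extends Nat
sealed trait Succ[N <: Nat] extends Nat
type D1 = Succ[D0]
type D2 = Succ[D1]
type D3 = Succ[D2]
type D4 = Succ[D3]
type D5 = Succ[D4]

// Type-level addition: an instance of Sum.Aux[A, B, O] exists exactly when A + B = O.
trait Sum[A <: Nat, B <: Nat] { type Out <: Nat }
object Sum {
  type Aux[A <: Nat, B <: Nat, O <: Nat] = Sum[A, B] { type Out = O }

  // 0 + B = B
  implicit def zero[B <: Nat]: Aux[D0, B, B] = new Sum[D0, B] { type Out = B }

  // (A + 1) + B = (A + B) + 1
  implicit def succ[A <: Nat, B <: Nat, O <: Nat](implicit s: Aux[A, B, O]): Aux[Succ[A], B, Succ[O]] =
    new Sum[Succ[A], B] { type Out = Succ[O] }
}

final class Vec[N <: Nat](val elems: Vector[Int]) {
  // The result dimension O is inferred by the compiler through the Sum instance.
  def ++[M <: Nat, O <: Nat](other: Vec[M])(implicit s: Sum.Aux[N, M, O]): Vec[O] =
    new Vec[O](elems ++ other.elems)
}

object Vec {
  def apply(a: Int, b: Int): Vec[D2] = new Vec[D2](Vector(a, b))
  def apply(a: Int, b: Int, c: Int): Vec[D3] = new Vec[D3](Vector(a, b, c))
}

val v2 = Vec(1, 2)
val v3 = Vec(1, 2, 3)
val v5: Vec[D5] = v2 ++ v3

// Does not compile, since 2 + 3 is not 4:
// val v4: Vec[D4] = v2 ++ v3
```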

Then the classic over-indexing case:
scala> v2[D1]
res15: Int = 1

scala> v2[D2]
res16: Int = 2

scala> v2[D3]
<console>:12: error: Cannot prove that
D3#Compare[D2]#Match[True,True,False,Bool] =:= True.
              v2[D3]
                ^
Spectacular. The compiler won't let me get element n+1 of an n-dimensional vector.
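The bounds check can be phrased as implicit evidence that 1 <= I <= N. Here's a minimal sketch with illustrative names (LTEq, ToInt, at), not scalam's implementation; it uses an explicit v2.at[D1] call instead of the v2[D1] syntax shown above.

```scala
// Type-level natural numbers, as in the earlier sketches.
sealed trait Nat
sealed trait D0 extends Nat
sealed trait Succ[N <: Nat] extends Nat
type D1 = Succ[D0]
type D2 = Succ[D1]
type D3 = Succ[D2]

// Recovers the runtime Int of a type-level number by implicit recursion.
trait ToInt[N <: Nat] { def value: Int }
object ToInt {
  implicit val zero: ToInt[D0] = new ToInt[D0] { val value = 0 }
  implicit def succ[N <: Nat](implicit n: ToInt[N]): ToInt[Succ[N]] =
    new ToInt[Succ[N]] { val value = n.value + 1 }
}

// Evidence that A <= B; no instance can be derived when A > B.
trait LTEq[A <: Nat, B <: Nat]
object LTEq {
  implicit def zero[B <: Nat]: LTEq[D0, B] = new LTEq[D0, B] {}
  implicit def succ[A <: Nat, B <: Nat](implicit ev: LTEq[A, B]): LTEq[Succ[A], Succ[B]] =
    new LTEq[Succ[A], Succ[B]] {}
}

final class Vec[N <: Nat](val elems: Vector[Int]) {
  // 1-based index I, accepted only if 1 <= I <= N can be proven at compile time.
  def at[I <: Nat](implicit lower: LTEq[D1, I], upper: LTEq[I, N], i: ToInt[I]): Int =
    elems(i.value - 1)
}

val v2 = new Vec[D2](Vector(10, 20))
val first = v2.at[D1]
val second = v2.at[D2]

// Does not compile, since there is no LTEq[D3, D2]:
// v2.at[D3]
```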

The same operations can be implemented for matrices, along with some helper methods for constructing simple matrices:
scala> val m22 = Matrix.ones[Int,D2]
m22: net.lahteenmaki.scalam.Matrix[Int,D2,D2] =
1  1 
1  1 

scala> val m23 = Matrix.ones[Int,D2,D3]
m23: net.lahteenmaki.scalam.Matrix[Int,D2,D3] =
1  1  1 
1  1  1 

scala> Matrix.zeros[Double,D2]
res18: net.lahteenmaki.scalam.Matrix[Double,D2,D2] =
 0.00000   0.00000 
 0.00000   0.00000 

scala> Matrix.rand[D5,D5]
res19: net.lahteenmaki.scalam.Matrix[Int,D5,D5] =
8   6   10  2   2  
3   2   11  1   15
10  1   18  9   5  
11  5   8   10  18 
0   17  2   12  24 

scala> m22.T
res20: net.lahteenmaki.scalam.Matrix[Int,D2,D2] =
1  1 
1  1 

scala> m22 + m22
res21: net.lahteenmaki.scalam.Matrix[Int,D2,D2] =
2  2 
2  2 

scala> m22 + m23
<console>:13: error: type mismatch; 
found   : net.lahteenmaki.scalam.Matrix[Int,D2,D3] 
required: net.lahteenmaki.scalam.Matrix[?,D2,D2]
              m22 + m23
                    ^

scala> m22 * 5.5
res23: net.lahteenmaki.scalam.Matrix[Double,D2,D2] =
 5.50000   5.50000 
 5.50000   5.50000 

scala> m22 * m23
res24: net.lahteenmaki.scalam.Matrix[Int,D2,D3] =
2  2  2 
2  2  2 

scala> m22 * v2
<console>:13: error: Could not find a way to  values of type
 net.lahteenmaki.scalam.RowVector[Int,D2] and scalala.operators.OpMulMatrixBy
              m22 * v2
                 ^

scala> v3 * Matrix.rand[D1,D5]
<console>:12: error: Could not find a way to  values of type
 net.lahteenmaki.scalam.Matrix[Int,D1,D5] and scalala.operators.OpMulMatrixBy
              v3 * Matrix.rand[D1,D5]
                ^

scala> m23 * m22
<console>:13: error: Could not find a way to  values of type
 net.lahteenmaki.scalam.Matrix[Int,D2,D2] and scalala.operators.OpMulMatrixBy
              m23 * m22
                 ^

scala> m23[D1,D1]
res28: Int = 1

scala> m23[D2,D3]
res29: Int = 1

scala> m23[D3,D3]
<console>:12: error: Cannot prove that
 D3#Compare[D2]#Match[True,True,False,Bool] =:= True.
              m23[D3,D3]
                 ^

Everything works for small vectors and matrices, but how about bigger ones? I actually only declared dimensions from D1 to D22, but one could always declare more, or probably generate them:
scala> val v7 = Vector(1,2,3,4,5,6,7)
v7: net.lahteenmaki.scalam.RowVector[Int,D7] = 1  2  3  4  5  6  7 

scala> val v21 = v7 ++ v7 ++ v7
v21: net.lahteenmaki.scalam.RowVector[Int,Add[Add[D7,D7],D7]] =
 1  2  3  4  5  6  7  1  2  3  4  5  6  7  1  2  3  4  5  6  7 

scala> val v23 = v21 ++ Vector(22,23)
v23: net.lahteenmaki.scalam.RowVector[Int,Add[Add[Add[D7,D7],D7],D2]] =
 1  2  3  4  5  6  7  1  2  3  4  5  6  7  1  2  3  4  5  6  7  22  23 

scala> v23[D23]
<console>:14: error: not found: type D23
              v23[D23]
                  ^
<console>:14: error: Cannot prove that
 (Add[Add[Add[D7,D7],D7],D2],)#Match[True,True,False,Bool] =:= True.
              v23[D23]
                 ^

scala> type D23 = Succ[D22]
defined type alias D23

scala> v23[D23]
res32: Int = 23

So, this is nice. Almost too good to be true?

There are some issues, of course. You probably noticed right from the beginning that the error messages produced aren't exactly helpful for an average programmer. This might be improved if Scala introduced more features like @implicitNotFound that could be used to provide the compiler with custom error messages.
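For reference, @implicitNotFound is placed on the type class itself, and its ${...} placeholders are replaced with the actual type arguments in the error message. Here's a minimal sketch attaching such a message to a hypothetical InBounds evidence type (illustrative names, not scalam's code); the exact rendering of the types in the message depends on the compiler version.

```scala
import scala.annotation.implicitNotFound

// Type-level natural numbers, as in the earlier sketches.
sealed trait Nat
sealed trait D0 extends Nat
sealed trait Succ[N <: Nat] extends Nat
type D1 = Succ[D0]
type D2 = Succ[D1]
type D3 = Succ[D2]

// Evidence that 1 <= I <= N, carrying the zero-based position of index I.
@implicitNotFound("Index ${I} is out of bounds for a vector of dimension ${N}")
trait InBounds[I <: Nat, N <: Nat] { def position: Int }

object InBounds {
  // Index 1 is valid for any non-empty vector.
  implicit def first[N <: Nat]: InBounds[D1, Succ[N]] =
    new InBounds[D1, Succ[N]] { val position = 0 }

  // If I fits in N, then I + 1 fits in N + 1.
  implicit def next[I <: Nat, N <: Nat](implicit ev: InBounds[I, N]): InBounds[Succ[I], Succ[N]] =
    new InBounds[Succ[I], Succ[N]] { val position = ev.position + 1 }
}

final class Vec[N <: Nat](val elems: Vector[Int]) {
  def at[I <: Nat](implicit ev: InBounds[I, N]): Int = elems(ev.position)
}

val v3 = new Vec[D3](Vector(10, 20, 30))
val last = v3.at[D3]

// Should fail to compile with the custom message above:
// new Vec[D2](Vector(1, 2)).at[D3]
```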

Also, in cases where the dimension changes, the compiler does not reduce the resulting dimension to a simple type like D5, but instead gives out cryptic Add[Add[...]] signatures, which need to be manually annotated with "readable" types, if needed. This might just be an issue with my implementation, though, I don't know.

Perhaps the biggest problem might turn out to be performance. Compiling Scala is already a heavy job, and handling types for a 10000x10000 matrix might just be beyond any possible compiler optimizations.

Tuesday, June 28, 2011

Composable querying with Scala

It's been a while since my last post, but finally I was able to find enough time to re-implement the whole querying-thing and experiment with composability. My initial implementation was partly mutable (orgh, sorry about that...) which made true composability somewhat difficult (well, impossible, I guess).

I've been using JPA2 (Java Persistence API, version 2) for a while now at my day job, and I finally figured out a way to make the metamodel and the criteria API at least somewhat useful. This was done by separating the construction of the queries from actually executing them, which conveniently allowed me to use the same query with different projections, pagination etc.

But no matter what I tried I just couldn't figure out how to construct the queries by composing them from reusable parts. The predicate seems to be the biggest unit of reusability, but it's more of a joke than anything useful. My current belief is that JPA2 just cannot be used to create composable queries.

Now welcome Scala. LINQ in .NET also boasts of composability, and why shouldn't it, since I guess it really works. Scala has its for-comprehension, and behold, it can be used to create composable queries. Here are some examples that actually work (well, since SQL generation isn't one of the most interesting problems in software science, my generated SQL might be more or less incorrect, but it does seem to work correctly on an H2 database):

object Queries {
  def wellPaidEmployees(es: View[Employee]) = for {
    e <- es if e.salary.isDefined && e.salary.get > 3000
  } yield e

  def namesAndSalariesOf(es: View[Employee]) = for {
    e <- es
  } yield (e.fullName, e.salary)

  def namesAndSalariesOfWellPaidEmployees(es: View[Employee]) =
      namesAndSalariesOf(wellPaidEmployees(es))

  def increasedSalariesForDepartment(ds: View[Department]) = for {
    d <- ds
    e <- d.employees if e.salary.isDefined
  } yield (e.fullName, "Old salary: ", e.salary.get,
           "New salary: ", e.salary.get + e.salary.get*2%42)

  val itDepartment = for (d <- Departments if d.name == "IT") yield d

  val rndDepartment = for {
    d <- Departments if d.name.toLowerCase contains "research"
  } yield d

  val rndEmployees = for (d <- rndDepartment; e <- d.employees) yield e

  val wellPaidRnDEmployees = wellPaidEmployees(rndEmployees)

  val exceptionalSalaries = for {
    e <- Employees if e.fullName contains "Bill"
  } yield e.salary.get

  val employeesFromITandRndDepartments = (for {
    d1 <- itDepartment
    d2 <- rndDepartment
    e1 <- d1.employees
    e2 <- d2.employees
  } yield Set(e1, e2)).flatten.distinct

  val amountOfEmployeesFromITandRndDepartments =
      employeesFromITandRndDepartments.size

  val exceptionalSalariesFromRndDepartment = for {
    e <- rndEmployees if exceptionalSalaries contains e.salary.get
  } yield e.salary

  val namesAndSalariesOfWellPaidRnDEmployees =
      namesAndSalariesOfWellPaidEmployees(rndEmployees)

  val increasedSalariesForRnD = increasedSalariesForDepartment(rndDepartment)
}

Now look at that, it's just beautiful! I can use independent queries as part of other queries, or define incomplete queries that can be completed by providing the missing parts. Also, there's no need for the actual database session while constructing the queries. Since all the queries are immutable, they can be defined as singleton values.

And all these compile and work if I throw away all SQL stuff and just use case classes and in-memory collections. I only need to change View[_] to Traversable[_] or declare something like type View[E] = Traversable[E]. The last of the queries, when executed, generates SQL like this:

SELECT e48.fullName, 'Old salary: ', e48.salary, 'New salary: ',
      (e48.salary+MOD((e48.salary*2),42))
FROM (SELECT d49.*
      FROM Department d49
      WHERE LOWER(d49.name) LIKE '%research%') d50
INNER JOIN Employee e48 ON d50.id=e48.department_id
WHERE e48.salary IS NOT NULL
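For the in-memory variant mentioned above, here's a self-contained sketch of a few of the queries. It assumes `type View[E] = Seq[E]` (Seq rather than Traversable, which is deprecated in Scala 2.13) and simple case classes, with sample data borrowed from the "Querying with Scala" post.

```scala
// Assumption: an in-memory "view" is just a Seq.
type View[E] = Seq[E]

final case class Employee(fullName: String, salary: Option[Int])
final case class Department(name: String, employees: View[Employee])

val bill = Employee("Bill Biller", Some(4500))
val sarah = Employee("Sarah Surrender", Some(3000))
val jack = Employee("Jack Janitor", Some(2000))
val jill = Employee("Jill Jitter", None)

val departments: View[Department] = Seq(
  Department("Research and Development", Seq(bill, sarah)),
  Department("IT", Seq(jack, jill)))

def wellPaidEmployees(es: View[Employee]) = for {
  e <- es if e.salary.isDefined && e.salary.get > 3000
} yield e

def namesAndSalariesOf(es: View[Employee]) = for {
  e <- es
} yield (e.fullName, e.salary)

// Composition is plain function application.
def namesAndSalariesOfWellPaidEmployees(es: View[Employee]) =
  namesAndSalariesOf(wellPaidEmployees(es))

val rndDepartment = for {
  d <- departments if d.name.toLowerCase.contains("research")
} yield d

val rndEmployees = for (d <- rndDepartment; e <- d.employees) yield e

val namesAndSalariesOfWellPaidRnDEmployees =
  namesAndSalariesOfWellPaidEmployees(rndEmployees)
```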

While these examples already demonstrate some implemented "SQL features", I guess I'm now going to spend some time implementing a bunch more stuff to see if I run into trouble. After that, it would be fascinating to try querying XML...

Monday, February 14, 2011

Querying with Scala

Let's say we have a simple domain model with departments and employees (behold my imagination...). Forget all persistence or SQL related stuff, let's just have it all in-memory:

object InMemory {
  case class Employee(name: String, salary: Option[Int])
  case class Department(name: String, employees: Set[Employee])

  val jack = Employee("Jack Janitor", Some(2000))
  val jill = Employee("Jill Jitter", None)
  val matt = Employee("Matt Manager", Some(3250))
  val sarah = Employee("Sarah Surrender", Some(3000))
  val bill = Employee("Bill Biller", Some(4500))

  def employees = Seq(jack, jill, matt, sarah, bill)
  def departments = Seq(Department("Research and Development", Set(bill, sarah)),
                        Department("IT", Set(jack, jill)), 
                        Department("Management", Set(matt)))
}

How about querying the data?

Since the language of this example is Scala, I would like to write the queries in Scala. Had I implemented this in Java, I would want to walk the object graph, iterating over collections. But I cannot just go and iterate through all the employees of a department to find those whose salary is high enough, since in real life that might cause all the employees of the department to be loaded from the database, in some cases one by one. So I'm forced to use some silly JPQL or a criteria query to give the system the power to properly optimize my actions. The important thing here is that what I really want to do is not to iterate through a collection, but to declare that I'm interested in employees belonging to a certain department. The iteration is just how this problem gets implemented in Java. As a friend of mine said, I'm over-specifying the problem by performing the iteration.

Scala does not force this over-specification. I can use for-comprehension for querying, which is quite abstract regarding what's actually happening behind the scenes:

import InMemory._

val wellPaidEmployees = for {
  d <- departments
  e <- d.employees if e.salary.isDefined && e.salary.get > 3000
} yield e

val namesAndSalariesOfRnDEmployees = for {
  d <- departments if d.name startsWith "Research"
  e <- d.employees
} yield (e.name, e.salary)

val underpaidEmployees = for {
  e <- employees if !e.salary.isDefined || e.salary.get + 100 < 3300
} yield (e.name, e.salary.getOrElse(0) % 42)
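For-comprehensions like these are syntactic sugar: the compiler rewrites them into withFilter, flatMap and map calls, so any type providing those methods can be "queried" this way. A quick check of that equivalence, using case classes and data modeled on the InMemory object (with Seq instead of Set for the employees, so the result order is predictable):

```scala
final case class Employee(name: String, salary: Option[Int])
final case class Department(name: String, employees: Seq[Employee])

val bill = Employee("Bill Biller", Some(4500))
val sarah = Employee("Sarah Surrender", Some(3000))
val jack = Employee("Jack Janitor", Some(2000))

val departments = Seq(
  Department("Research and Development", Seq(bill, sarah)),
  Department("IT", Seq(jack)))

// The query as written with a for-comprehension...
val sugared = for {
  d <- departments if d.name.startsWith("Research")
  e <- d.employees
} yield (e.name, e.salary)

// ...and roughly what the compiler turns it into: only these three methods are needed.
val desugared = departments
  .withFilter(d => d.name.startsWith("Research"))
  .flatMap(d => d.employees.map(e => (e.name, e.salary)))
```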

The syntax Scala offers is actually so abstract that it gives no indication that I'm picking stuff from in-memory collections. This immediately raises an interesting question: what exactly is needed to move this data to an SQL database?

First of all, the case classes defining the model are a bit too in-memory-specific. Let's change them a bit:

import engine._
import engine.Types._
import engine.Scalaq._
object External {
  class Department extends Table {
    val name = $[String]
    val employees: ->*[Employee] = ->*(_.department)
  }
  class Employee extends Table {
    val name = $[String]
    val salary = ?[Int]
    val department: ->[Department] = ->(_.employees)
  }
  def departments = new Department
  def employees = new Employee
}

This is actually declaring the same information, but it also builds a model of the model, i.e. a meta model. Forgive my choice of "names" for defining properties and relations; I have a bad habit of sometimes stripping away unnecessary characters =). Now by changing the import of InMemory to External in the query examples, the same code compiles. This is exactly what I want. The type of the data storage should not affect my queries, since I'm not querying the database, I'm querying the data.

At this point you might be thinking: Hey, this idiot is trying to build yet another tool to abstract away SQL completely from the application. That's not my intention at all. Abstraction is always a compromise. When we abstract away the fact that our data store is an SQL database, we give away a bunch of tools it provides. There are and always will be queries so complex or so resource-hungry that one just cannot give a satisfying implementation without assuming an SQL backend. At some point that's not enough, and one needs to know it's an Oracle 11g database. Therefore, every abstraction like this should only strive to solve 95% or so of the cases.

Back to the queries. After changing the import clause, the for-comprehensions don't return the actual data anymore; they return objects containing the information needed to later construct the actual query against the data store, whatever it is. You might have noticed that none of the example code had anything related to SQL (well, the base class name Table should probably be something else...). If we add some JDBC-connection-related helper methods (not listed), we can actually perform these queries against a database:

val Seq(a,b,c) = transaction("jdbc:h2:mem:test") { implicit c =>
  import engine.sql._
  val session = new Session with H2Dialect
  import session._
  execute(generateDDL(departments, employees))
  testData foreach execute
  Seq(executeQuery(generateQuery(wellPaidEmployees)),
      executeQuery(generateQuery(namesAndSalariesOfRnDEmployees)),
      executeQuery(generateQuery(underpaidEmployees)))
}

First, the SQL schema is created based on the model definition and populated with some test data. Then the SQL corresponding to each query is generated and the resulting strings are executed. Printing the three resulting values, a, b and c, shows the actual results of the queries.
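To make the idea of queries returning descriptions instead of data more concrete, here's a deliberately tiny sketch. Column, Predicate, EmployeeRow and Query are hypothetical and far simpler than the engine described here (a single table, no joins, no escaping of values). The point is only that withFilter and map receive functions that build SQL fragments from a meta model row instead of evaluating anything, and that the resulting immutable queries compose.

```scala
// A guard in a for-comprehension must produce a Predicate here, not a Boolean.
final case class Predicate(sql: String)

// A typed column; the type parameter A restricts which operators are allowed.
final case class Column[A](alias: String, name: String) {
  def sql: String = s"$alias.$name"
  def >(value: Int)(implicit ev: A =:= Int): Predicate = Predicate(s"$sql > $value")
  def like(pattern: String)(implicit ev: A =:= String): Predicate =
    Predicate(s"$sql LIKE '$pattern'")
}

// Hypothetical meta model of an Employee table.
final class EmployeeRow(alias: String) {
  val name = Column[String](alias, "name")
  val salary = Column[Int](alias, "salary")
}

// An immutable query description; withFilter and map return new descriptions.
final case class Query[R](table: String, alias: String, row: R,
                          filters: List[Predicate], select: List[String]) {
  def withFilter(p: R => Predicate): Query[R] = copy(filters = filters :+ p(row))
  def map(f: R => List[Column[_]]): Query[R] = copy(select = f(row).map(_.sql))

  def toSql: String = {
    val where =
      if (filters.isEmpty) "" else filters.map(_.sql).mkString(" WHERE ", " AND ", "")
    s"SELECT ${select.mkString(", ")} FROM $table $alias$where"
  }
}

def employees: Query[EmployeeRow] =
  Query("Employee", "e", new EmployeeRow("e"), Nil, List("e.*"))

// Queries compose: wellPaid can refine any employee query.
def wellPaid(q: Query[EmployeeRow]) = for {
  e <- q if e.salary > 3000
} yield List(e.name, e.salary)

val allWellPaid = wellPaid(employees)
val wellPaidJs = wellPaid(for (e <- employees if e.name.like("J%")) yield List(e.name))

// Does not compile, since name is a String column:
// for (e <- employees if e.name > 3000) yield List(e.name)
```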

The current implementation of the engine is rather simple, with a few hundred lines of somewhat readable Scala. This means that although implicits are used quite heavily, the concept as a whole is still quite easy to comprehend.

So, is this somehow revolutionary? Hell no. It's a simple example performing simple queries. All the important stuff like composability or alternative data stores are still missing. On the other hand, does e.g. JPA have those properties?

Various nice features can be spotted in this implementation (or could be, if you looked at my source code):
  • pure, static, compiled Scala
  • statically and strongly typed (one cannot compare a string to an integer, or directly use an optional value...)
  • DDL generation
  • some basic SQL features including inner joins, comparisons, string matching and some arithmetic functions
  • possibility to pass data store specific parameters (like max length of varchar) to the model properties
  • custom types ("user types")
Aggregate functions seem also implementable, though I don't yet have them finished. Composition is something I must experiment with soon since it's an important feature. Other experiments include inserts/updates, populating objects with the data easily, some other data store types... These might bring some additional noise to the model declaration but hopefully keep the queries abstract.

There is a project called ScalaQuery which has implemented something like this. I do not like its approach, though, which is stated in its overview on the web site (I highlighted the annoying parts):
ScalaQuery is an API / DSL (domain specific language) built on top of JDBC for accessing relational databases in Scala.
I consider basic querying to be an abstract thing having no relation to the type of the data store, but ScalaQuery ties itself to things like JDBC and SQL. This is also visible in its syntax. I haven't yet found a need to make that kind of deviation from regular Scala, but it might be that I just haven't been there yet.

The examples I've given are just my initial experiments, and the syntax is most likely going to change at least slightly. I'm hoping that additional features won't force me to add any verbosity, though. I will post a working jar file later so that you can try it if you're interested. I will also post all the source code in the future, when I'm done experimenting.

If you have any thoughts on this kind of abstract querying in Scala, I'd be glad to hear them. Now I'm heading to JFokus, see you there.

Saturday, February 5, 2011

Greetings!

I felt like I needed a place to write down some thoughts and views on programming. So, I decided to create a blog. I was going to write a wiki-style site in Scala, but since my free time is awfully limited, it would have taken years =) Well, I do have such a site for sharing some events from the lives of my two boys, but its functionality is quite limited...

I hope I can find the time to write something every now and then. Currently I'm experimenting with how Scala could be used for LINQ-style abstract queries, of which I hope to write something soon. There is already a similar project called ScalaQuery, but I didn't like some of the approaches that it seemed to take so I decided to write my own =). There's also a paper called ScalaQL, which I haven't read yet, but I suspect I have re-invented some of the same wheels. Might be wrong though...