Thursday, January 12, 2012

Statically typed Vector and Matrix algebra

I took a course on Machine Learning a while back, just for fun. Stanford University seems to have really put some effort into free courses in the Internet, since Artificial Intelligence and Introduction to Databases were also time well spent.

The exercises in Octave reminded me of Matlab and the pain I had to endure while studying in Tampere University of Technology. It would be so much easier if the editor actually complained when trying to multiply matrices of incompatible dimensions etc. Could Scala perhaps be used to provide some static type safety to matrix operations?

Well, you guessed it.

Creating a linear algebra library wouldn't be the most exciting project (well, not this time, anyway...) so I decided to make a thin wrapper for Scalala. I was not striving for a full-featured library, but instead more like a proof-of-concept, so I only implemented a few operations.

I'm not smart enough to come up with the required type algebra, so I shamelessly copied the hard parts from here. Hopefully some day I'm going to understand all those lines that went almost unmodified through my clipboard...

Haskell is another great language and can provide more or less similar type safety. Since I don't speak Haskell that well, I'll let you read about a Haskell implementation here.

I made my code available in GitHub. Feel free to use it as you will. The name net.lahteenmaki.scalam is due to my total lack of imagination, sorry about that.

Here's a demo. First of all, only a single import is needed to use all the functionality
scala> import net.lahteenmaki.scalam._
import net.lahteenmaki.scalam._

Create some regular vectors containing integers
scala> val v2 = Vector(1,2)
v2: net.lahteenmaki.scalam.RowVector[Int,D2] = 1  2 

scala> val v3 = Vector(1,2,3)
v3: net.lahteenmaki.scalam.RowVector[Int,D3] = 1  2  3 

or doubles. Actually anything scalala.scalar.Scalar[T]
scala> Vector(1.0,2.0)
res1: net.lahteenmaki.scalam.RowVector[Double,D2] = 1.00000   2.00000 

Trying to create a vector with differing element types gives a compiler error
scala> Vector(1,2.0)
<console>:11: error: T is not a scalar value
              Vector(1,2.0)
                    ^

Transposing a row vector creates a column vector of the same dimension
scala> v2.T
res3: net.lahteenmaki.scalam.ColumnVector[Int,D2] =1 2 

I included some implicits to create vectors from tuples
scala> (1,2).T
res4: net.lahteenmaki.scalam.ColumnVector[Int,D2] =1 2 

There's nothing special in scalar multiplication, except that the element types change similar to Scalala
scala> v2*2
res5: net.lahteenmaki.scalam.RowVector[Int,D2] = 2  4 

scala> v2*2.0
res6: net.lahteenmaki.scalam.RowVector[Double,D2] = 2.00000   4.00000 

Addition should retain the dimensions and be only allowed to vectors of the same dimension
scala> v2 + v2
res7: net.lahteenmaki.scalam.RowVector[Int,D2] = 2  4 

scala> Vector(1,2) + Vector(1.0,2.0)
res8: net.lahteenmaki.scalam.RowVector[Double,Succ[Succ[D0]]] = 2.00000   4.00000 

scala> v2 + v3
<console>:13: error: overloaded method value + with alternatives:
 [B](other: net.lahteenmaki.scalam.RowVector[B,D2])
    (implicit o: v2.BinOp[B,scalala.operators.OpAdd])
    net.lahteenmaki.scalam.RowVector[B,D2]
 <and>
 [B](other: net.lahteenmaki.scalam.Matrix[B,D1,D2])
    (implicit o: v2.BinOp[B,scalala.operators.OpAdd])
    net.lahteenmaki.scalam.Matrix[B,D1,D2]
 cannot be applied to (net.lahteenmaki.scalam.RowVector[Int,D3])
              v2 + v3
                 ^
Yes, we did get a compile time error. Splendid.

Vector multiplication is also only defined for compatible sizes
scala> v2 * v2.T
res10: net.lahteenmaki.scalam.Matrix[Int,D1,D1] = 5 

scala> v2 * v2
<console>:12: error: Could not find a way to  values of type
net.lahteenmaki.scalam.RowVector[Int,D2] and scalala.operators.OpMulMatrixBy
              v2 * v2
                 ^

scala> v2 * v3
<console>:13: error: Could not find a way to  values of type
net.lahteenmaki.scalam.RowVector[Int,D3] and scalala.operators.OpMulMatrixBy
              v2 * v3
                 ^
Again, the compiler won't let me multiply a row vector with another one. Nice.

How about concatenating vectors?
scala> v2 ++ v3
res13: net.lahteenmaki.scalam.RowVector[Int,Add[D2,D3]] = 1  2  1  2  3 

scala> val v: RowVector[Int,D5] = v2 ++ v3
v: net.lahteenmaki.scalam.RowVector[Int,D5] = 1  2  1  2  3 

scala> v2 ++ v2.T
<console>:12: error: type mismatch;
 found   : net.lahteenmaki.scalam.ColumnVector[Int,D2]
 required: net.lahteenmaki.scalam.Matrix[Int,D1,?]
              v2 ++ v2.T
                       ^
The compiler can deduce the dimension of the result, and won't let me concatenate a row vector with a column vector. Just what I wanted.

Then the classic over-indexing case
scala> v2[D1]
res15: Int = 1

scala> v2[D2]
res16: Int = 2

scala> v2[D3]
<console>:12: error: Cannot prove that
D3#Compare[D2]#Match[True,True,False,Bool] =:= True.
              v2[D3]
                ^
Spectacular. The compiler won't let me get an element n+1 from an n-dimensional vector.

Same operations can be implemented for matrices, as well as some helper methods for constructing simple matrices
scala> val m22 = Matrix.ones[Int,D2]
m22: net.lahteenmaki.scalam.Matrix[Int,D2,D2] =
1  1 
1  1 

scala> val m23 = Matrix.ones[Int,D2,D3]
m23: net.lahteenmaki.scalam.Matrix[Int,D2,D3] =
1  1  1 
1  1  1 

scala> Matrix.zeros[Double,D2]
res18: net.lahteenmaki.scalam.Matrix[Double,D2,D2] =
 0.00000   0.00000 
 0.00000   0.00000 

scala> Matrix.rand[D5,D5]
res19: net.lahteenmaki.scalam.Matrix[Int,D5,D5] =
8   6   10  2   2  
3   2   11  1   15
10  1   18  9   5  
11  5   8   10  18 
0   17  2   12  24 

scala> m22.T
res20: net.lahteenmaki.scalam.Matrix[Int,D2,D2] =
1  1 
1  1 

scala> m22 + m22
res21: net.lahteenmaki.scalam.Matrix[Int,D2,D2] =
2  2 
2  2 

scala> m22 + m23
<console>:13: error: type mismatch; 
found   : net.lahteenmaki.scalam.Matrix[Int,D2,D3] 
required: net.lahteenmaki.scalam.Matrix[?,D2,D2]
              m22 + m23
                    ^

scala> m22 * 5.5
res23: net.lahteenmaki.scalam.Matrix[Double,D2,D2] =
 5.50000   5.50000 
 5.50000   5.50000 

scala> m22 * m23
res24: net.lahteenmaki.scalam.Matrix[Int,D2,D3] =
2  2  2 
2  2  2 

scala> m22 * v2
<console>:13: error: Could not find a way to  values of type
 net.lahteenmaki.scalam.RowVector[Int,D2] and scalala.operators.OpMulMatrixBy
              m22 * v2
                 ^

scala> v3 * Matrix.rand[D1,D5]
<console>:12: error: Could not find a way to  values of type
 net.lahteenmaki.scalam.Matrix[Int,D1,D5] and scalala.operators.OpMulMatrixBy
              v3 * Matrix.rand[D1,D5]
                ^

scala> m23 * m22
<console>:13: error: Could not find a way to  values of type
 net.lahteenmaki.scalam.Matrix[Int,D2,D2] and scalala.operators.OpMulMatrixBy
              m23 * m22
                 ^

scala> m23[D1,D1]
res28: Int = 1

scala> m23[D2,D3]
res29: Int = 1

scala> m23[D3,D3]
<console>:12: error: Cannot prove that
 D3#Compare[D2]#Match[True,True,False,Bool] =:= True.
              m23[D3,D3]
                 ^

Everything is working for small vectors and matrices, but how about bigger ones? I actually only declared dimensions from D1 to D22, but one could always declare more, probably generate them
scala> val v7 = Vector(1,2,3,4,5,6,7)
v7: net.lahteenmaki.scalam.RowVector[Int,D7] = 1  2  3  4  5  6  7 

scala> val v21 = v7 ++ v7 ++ v7
v21: net.lahteenmaki.scalam.RowVector[Int,Add[Add[D7,D7],D7]] =
 1  2  3  4  5  6  7  1  2  3  4  5  6  7  1  2  3  4  5  6  7 

scala> val v23 = v21 ++ Vector(22,23)
v23: net.lahteenmaki.scalam.RowVector[Int,Add[Add[Add[D7,D7],D7],D2]] =
 1  2  3  4  5  6  7  1  2  3  4  5  6  7  1  2  3  4  5  6  7  22  23 

scala> v23[D23]
<console>:14: error: not found: type D23
              v23[D23]
                  ^
<console>:14: error: Cannot prove that
 (Add[Add[Add[D7,D7],D7],D2],)#Match[True,True,False,Bool] =:= True.
              v23[D23]
                 ^

scala> type D23 = Succ[D22]
defined type alias D23

scala> v23[D23]
res32: Int = 23

So, this is nice. Almost too good to be true?

There are some issues, of course. You probably noticed already in the beginning that the produced error messages aren't exactly helpful for an average programmer. This might be improved if Scala introduced more features like @implicitNotFound that could be used to provide the compiler with custom error messages.

Also, in cases where the dimension changes, the compiler cannot deduce the resulting dimension, but instead gives out the cryptic Add[Add[...]] signatures which need to be manually casted to "readable" signatures, if needed. This might be just an issue with my implementation, though, I don't know.

Perhaps the biggest problem might turn out to be performance. Compiling Scala is already a heavy job, and handling types for a 10000x10000 matrix might just be beyond any possible compiler optimizations.

No comments:

Post a Comment