Monday, April 5, 2010

PInvoke native C/C++ libraries in F#

In this blog post, we explore PInvoke in F#. Before getting to the PInvoke call itself, we build a small matrix multiplication example into a DLL, which matters because in most cases we don't have a pre-built DLL to call.

PInvoke stands for Platform Invoke, i.e. calling native libraries from managed code. The two major reasons to use PInvoke are: 1) native performance, as native code is usually faster than .NET managed code, especially for numerical computing routines, and 2) existing, well-tested native libraries can be used from .NET without being rewritten.

Of course, it also has some disadvantages: 1) each PInvoke call costs extra instructions, which does not matter as long as PInvoke calls are not made too frequently, and 2) native code is no longer safe, e.g. there are no managed exceptions and no checks on memory access.
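To make the second point concrete, here is a minimal sketch of one common way to cope with it: the exported function validates its raw pointers and turns any C++ exception into a return code, so nothing is thrown across the native boundary. The names Status, mean_impl and mean_checked are hypothetical and only for illustration, and the export attribute is left out for brevity.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical status codes returned to the managed caller.
enum Status { STATUS_OK = 0, STATUS_BAD_ARGUMENT = 1, STATUS_FAILURE = 2 };

// Ordinary C++ code that may throw.
static double mean_impl(const std::vector<double>& xs)
{
    if (xs.empty()) throw std::invalid_argument("empty input");
    double s = 0.0;
    for (double x : xs) s += x;
    return s / xs.size();
}

// Boundary function: checks the raw pointers and converts exceptions
// into status codes instead of letting them escape.
extern "C" int mean_checked(const double* data, int n, double* out)
{
    if (data == nullptr || out == nullptr || n < 0) return STATUS_BAD_ARGUMENT;
    try {
        *out = mean_impl(std::vector<double>(data, data + n));
        return STATUS_OK;
    } catch (const std::invalid_argument&) {
        return STATUS_BAD_ARGUMENT;
    } catch (...) {
        return STATUS_FAILURE;
    }
}
```

The managed side would then check the returned int and raise its own exception if it is non-zero.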

There are several tutorials online for C#: here, here and here. However, these focus on existing Windows DLLs: their motivation for PInvoke is that some functions are not available in .NET, so they call an existing native DLL such as the Win32 API. In this series we care about performance instead. Here PInvoke means that we rewrite the performance-critical part in C/C++, or compile an existing library (possibly writing a C wrapper for it along the way), and call it using PInvoke. So we should start with the native code, as we don't have the native DLL yet.

Matrix multiplication

C and C++ are slightly different, but a C++ compiler usually compiles C code well. So we'll work in the C++ mode of the VC++ compiler, i.e. all files have the .cpp extension.

First we write mattrixproduct.cpp:

__declspec(dllexport) int minus(int a, int b)
{
    return a - b;
}

extern "C" __declspec(dllexport) int add(int a, int b)
{
    return a + b;
}

// a is an-by-am, b is bn-by-bm, c is an-by-bm, all stored row by row
extern "C" __declspec(dllexport)
void matmul(double *a, int an, int am,
            double *b, int bn, int bm,
            double *c)
{
    int i, j, k;

    for (i=0; i<an; i++)
        for (j=0; j<bm; j++) {
            double s = 0;
            for (k=0; k<am; k++)
                s += a[i*am + k] * b[k*bm + j];
            c[i*bm+j] = s;
        }
}

__declspec(dllexport) is the attribute we must add to each function to tell the compiler to export it from the DLL. extern "C" tells the compiler to give the function C linkage, so its name is not mangled. We compile with the command line:
cl.exe /LD mattrixproduct.cpp

to get mattrixproduct.dll. We can also create a new Visual C++ project and set the target to .dll.

Here’s the information for this dll: (use command line: link /DUMP /EXPORTS mattrixproduct.dll)

ordinal hint RVA      name
      1    0 00001000 ?minus@@YAHHH@Z = ?minus@@YAHHH@Z (int __cdecl minus(int,int))
      2    1 00001010 add = _add
      3    2 00001020 matmul = _matmul

We can see that the minus function (without extern "C") is exported under a mangled, hard-to-read name, while the other two are exported under their plain names.
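If we would rather have minus exported under its plain name too, one option is to put all the exported functions inside a single extern "C" block. The sketch below assumes this; MATLIB_API is a hypothetical macro name that expands to the export attribute only on Windows, so the same file also compiles with other compilers.

```cpp
// MATLIB_API is a hypothetical macro name; it expands to the MSVC export
// attribute on Windows and to nothing elsewhere.
#if defined(_WIN32)
#  define MATLIB_API __declspec(dllexport)
#else
#  define MATLIB_API
#endif

// Everything inside this block gets C linkage, so the exported names are
// plain "add" and "minus" rather than decorated C++ names.
extern "C" {

MATLIB_API int add(int a, int b)
{
    return a + b;
}

MATLIB_API int minus(int a, int b)
{
    return a - b;
}

}  // extern "C"
```

With a DLL built this way, the EntryPoint for minus on the F# side would simply be "minus".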

Now that we have the DLL, let's move on to PInvoke in F#. Here is the code:

open System
open System.Runtime.InteropServices
open Microsoft.FSharp.NativeInterop
open Microsoft.FSharp.Math

module Native =
    // the exports are __cdecl (see the dump above), so we state the calling convention
    [<System.Runtime.InteropServices.DllImport(@"mattrixproduct.dll",
        EntryPoint="add", CallingConvention=CallingConvention.Cdecl)>]
    extern int add(int a, int b);

    // notice the weird (mangled) name for minus
    [<System.Runtime.InteropServices.DllImport(@"mattrixproduct.dll",
        EntryPoint="?minus@@YAHHH@Z", CallingConvention=CallingConvention.Cdecl)>]
    extern int minus(int a, int b);

    [<System.Runtime.InteropServices.DllImport(@"mattrixproduct.dll",
        EntryPoint="matmul", CallingConvention=CallingConvention.Cdecl)>]
    extern void matmul(double *a, int an, int am, double *b, int bn,
                       int bm, double *c);


let a = Native.add(10,20)
let b = Native.minus(10,20)
let A = matrix [[1.0;2.0;]; [3.0;4.0;]]
let B = matrix [[3.0;2.0;1.0]; [1.0;1.0;1.0]]
let C = Matrix.zero 2 3
// pin the matrices so the native code can access their data through raw pointers
let Ap = PinnedArray2.of_matrix(A)
let Bp = PinnedArray2.of_matrix(B)
let Cp = PinnedArray2.of_matrix(C)
Native.matmul(Ap.Ptr, 2, 2, Bp.Ptr, 2, 3, Cp.Ptr)
Ap.Free()
Bp.Free()
Cp.Free()

printfn "a = %A\nb = %A\nC = %A" a b C

There are several things to mention:

1. Basic types such as int and double, as in add and minus, don't need any special treatment.

2. The hard part is the array. This introduces the concept of data marshaling: .NET arrays and native arrays are different, so they cannot be mapped to each other directly. This is why we open the interop namespaces. Another thing is that data marshaling does not support 2-dimensional arrays directly! This is why I used a 1-D array in the C implementation. High-performance code usually uses only 1-D arrays, so this inconvenience does not matter much to us. To marshal the array, we get a one-dimensional pointer to the matrix data using PinnedArray2.of_matrix, which is defined in the F# PowerPack (in NativeArrayExtensions.fs), and then pass this pointer in the function call. Pinning keeps the garbage collector from moving the data while the native code is using it, which is why each pinned array is released with Free() after the call.
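To see what the native side receives, here is a small sketch of the same 2x2 by 2x3 product written directly with flat arrays, assuming the matrix data is laid out row by row. The names matmul_checked and sample_product are hypothetical. The loop is the one from matmul, repeated so the sketch stands alone, plus a check that the inner dimensions agree (the original matmul accepts bn but never uses it).

```cpp
#include <vector>

// a is an-by-am, b is bn-by-bm, c is an-by-bm, all row-major.
// Returns 0 on success, 1 if the inner dimensions do not match.
extern "C" int matmul_checked(const double* a, int an, int am,
                              const double* b, int bn, int bm,
                              double* c)
{
    if (am != bn) return 1;
    for (int i = 0; i < an; i++)
        for (int j = 0; j < bm; j++) {
            double s = 0;
            for (int k = 0; k < am; k++)
                s += a[i * am + k] * b[k * bm + j];  // element (i,k) times element (k,j)
            c[i * bm + j] = s;
        }
    return 0;
}

// Multiplies the two matrices from the F# example, written as flat arrays.
std::vector<double> sample_product()
{
    const double A[] = {1.0, 2.0,         // row 0
                        3.0, 4.0};        // row 1
    const double B[] = {3.0, 2.0, 1.0,    // row 0
                        1.0, 1.0, 1.0};   // row 1
    std::vector<double> C(2 * 3, 0.0);
    matmul_checked(A, 2, 2, B, 2, 3, C.data());
    return C;  // row-major: {5, 4, 3, 13, 10, 7}
}
```

Element (i, j) of an n-by-m matrix sits at index i*m + j, which is exactly the indexing used inside the loop.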

The performance

Let's now test the performance of the wrapped native matrix multiplication. First we write a test on both small and big matrices:

let mm A B =
    let Ap = PinnedArray2.of_matrix(A)
    let Bp = PinnedArray2.of_matrix(B)
    let C = Matrix.zero (A.NumRows) (B.NumCols)
    let Cp = PinnedArray2.of_matrix(C)
    Native.matmul(Ap.Ptr, A.NumRows, A.NumCols, Bp.Ptr, B.NumRows, B.NumCols, Cp.Ptr)
    Ap.Free()
    Bp.Free()
    Cp.Free()
    C

let test() =
    // use small matrices
    let A = matrix [[1.0;2.0;]; [3.0;4.0;]]
    let B = matrix [[3.0;2.0;1.0]; [1.0;1.0;1.0]]
    tic()
    mm A B |> ignore
    toc("my native matrix *")
    A * B |> ignore
    toc("F# *")

    // use big matrices
    let r = new System.Random()
    let A1 = Matrix.init (92*112) 280 (fun i j -> r.NextDouble())
    let A' = A1.Transpose
    tic()
    let C1 = mm A' A1
    toc("my native matrix *")
    let C2 = A' * A1
    toc("F# *")

    // true if the native result equals the F# result
    C1.Equals(C2)

The timing functions tic() and toc() are defined in Part IV of the matrix and linear algebra series. Run it in F# interactive:

> test();;
my native matrix * 1.758400 ms
F# * 0.005100 ms
my native matrix * 7207.073800 ms
F# * 16524.670200 ms
val it : bool = true

First, the return value is true, indicating that my C implementation gives the same result as the F# one. In the small matrix case, the overhead of PInvoke is obvious. In the big matrix case, the performance boost from native code is also obvious: about 7.2 s versus 16.5 s, i.e. roughly 2.3 times faster.

Note 1: C++ class

So far so good, but that is because we haven't touched real C++ yet! Classes and templates are more complex.

To work with these, one method is to write C wrappers for the C++ classes. This method is commonly used in practice.

We can also directly add __declspec(dllexport) to the class definition. However, creating objects via constructors is then not supported from the managed side. One pattern is to write two static methods: one for creating an object and the other for freeing it. Member functions are called by providing the this pointer explicitly. If you know how to do object-style programming in C, this pattern will be familiar to you. In this style, what we can access is equivalent to what C offers. As this pattern requires modifying existing class definitions (although only a little), it is less commonly used, so I don't give a detailed example of it here.
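For the first approach (C wrappers), a minimal sketch might look like the following. The Accumulator class and the accumulator_* functions are hypothetical and exist only for illustration, and WRAP_API is an assumed macro name. The class itself is left untouched; the wrapper exposes a create function, a free function, and one flat function per method, each taking the object as an opaque handle.

```cpp
#include <cstddef>
#include <new>

// WRAP_API is a hypothetical macro: C linkage everywhere, plus the
// export attribute on Windows.
#if defined(_WIN32)
#  define WRAP_API extern "C" __declspec(dllexport)
#else
#  define WRAP_API extern "C"
#endif

// A hypothetical C++ class that we would like to use from F#.
class Accumulator {
public:
    void add(double x) { sum_ += x; ++count_; }
    double mean() const { return count_ == 0 ? 0.0 : sum_ / count_; }
private:
    double sum_ = 0.0;
    std::size_t count_ = 0;
};

// Creates an object and returns it as an opaque handle
// (a null pointer if the allocation fails).
WRAP_API void* accumulator_create()
{
    return new (std::nothrow) Accumulator();
}

// Each method becomes a flat function that takes the handle explicitly.
WRAP_API void accumulator_add(void* handle, double x)
{
    static_cast<Accumulator*>(handle)->add(x);
}

WRAP_API double accumulator_mean(void* handle)
{
    return static_cast<Accumulator*>(handle)->mean();
}

// Destroys an object created by accumulator_create.
WRAP_API void accumulator_free(void* handle)
{
    delete static_cast<Accumulator*>(handle);
}
```

On the F# side the handle would be declared as nativeint, and the caller is responsible for calling accumulator_free exactly once per created object.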

Note 2: Pure C

The above example is given in C++ because in most cases we are dealing with C++ files, and C files can usually be compiled as C++ anyway.

For pure C files (with the .c extension), things are actually easier: just remove extern "C".
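If the same source file has to compile both as C and as C++, a common idiom is to guard extern "C" with the __cplusplus macro instead of removing it by hand. Here is a minimal sketch of that idiom; the dot function and the MATLIB_API macro are hypothetical names used only for illustration.

```cpp
// Hypothetical export macro: MSVC needs __declspec(dllexport) in C and C++ alike.
#if defined(_WIN32)
#  define MATLIB_API __declspec(dllexport)
#else
#  define MATLIB_API
#endif

// __cplusplus is defined only by C++ compilers, so a C compiler never
// sees the extern "C" lines.
#ifdef __cplusplus
extern "C" {
#endif

MATLIB_API double dot(const double* x, const double* y, int n);

#ifdef __cplusplus
}
#endif

// The definition uses only constructs that are valid in both languages.
double dot(const double* x, const double* y, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
```

Either way the function is exported under the plain name dot, so the F# declaration does not change.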

Using existing C/C++ code without PInvoke

We can use C++/CLI to compile a C/C++ library, without modifying anything, into an unsafe managed DLL. I will write a separate blog post on this later.

Remark

In this tutorial, we covered the basics of P/Invoke in F#. We didn't touch two things:

1) Marshaling complex parameters, e.g. different string formats, non-array pointers, C++ classes, etc.

2) MEMORY! MEMORY! MEMORY! The hard and dangerous part of P/Invoke is memory management. I don't have enough experience with this part yet. Currently I just follow the style in Math-Provider.
