Pascal Warrior's Journey

Thursday, February 5, 2015

Comparing FPC CGI vs FPC Embedded Web Server Application

Recently, I just tested koding.com and quite satisfied with its service. Despite free, you get a full blown OS with address that can be accessed publicly for hosting your web app. Suddenly, I think this is a good time for comparison game again.

The red corner will be FPC CGI runs though Apache and the blue one will be FPC Embedded Web Server. The source is hidden (it's my private project that eventually will be a commercial web), but the tested page is a simple templated page. The testing is done by loadimpact.com with following result:

Quick analysis shows that embedded web server serves slightly less requests than CGI on Apache. Scrolling down a little it also shows that Apache serving time is 2 - 2.5 seconds per request, while embedded web server is consistently serves in 3 seconds per request, with 3.2 seconds in the first request probably due to loading. Scrolling down again is what surprises me. The CGI has several failures while the embedded web server successfully delivers all the requests. Note that to be fair all resources (CSS) is served by the CGI as well, just like the embedded web server.

The conclusion is, embedded web server does its job well, despite it doesn't support Keep-Alive header (for persistent connection) yet. It serves a number of requests similar to Apache and it delivers all requests completely. It seems to be good enough for small to medium traffic web applications.

Saturday, December 20, 2014

Data access benchmark: direct query, dOPF object query, dOPF entity query

It's been a long time since my last post (over a year!). In this post I'd like to share some benchmark result of data access using Free Pascal. The setup is as below:

Machine: HP Pavilion 14-n038tx

OS: Manjaro Linux x86-64 0.8.11 + first update + flash plugin critical security update, kernel 3.17.4-1-MANJARO
CPU: Intel Core i5-4200u 1.6 GHz (Turbo Boost up to 2.6 GHz)
RAM: 4 GB (stock) + 4 GiB (additional) = 7.6 GiB
HDD: Hitachi HTS54757 Rev A50A 750 GB ~= 698 GiB

Development Environment:

FPC trunk x86_64-linux svn rev 29179
dOPF release 3.2.0 (commit c4966fba8bdf75fc637901edeebcd9e9d2a2521a)
mysql56dyn unit from mysql package
mysql56conn unit from fcl-db package

Benchmark Environment

MariaDB 10.0.15, default settings
Table 1: 2600 rows
Table 2: 2654 rows
Select query that joins table 1 and table 2 on an integer key

Actual table information is hidden, because they're real table used in my company's product, which is of course a proprietary product. A little info: table 2 has a foreign key to table 1, because it's meant as multilingual information for table 1 rows.

Benchmark flow is simple:

Connect to db
Query
Loop over the result while creating JSON array of object
Write it out (for output correctness comparison)

The result is, well, interesting as well as confusing. Why? Here's the result when I limit my select query result (in the SQL statement) to 2651 rows:

Looks normal, isn't it? A framework will have overhead so direct query will always be faster. But wait, what if I change the limit to 2652 or even without one:

Dafuq? How come dOPF ones get faster? At this stage, I thought the resulting txt files of dOPF might be broken, so I diff them with direct query one. Nope, they're all the same. I have no explanation at the moment because I haven't got the time to dig in what dOPF (or probably SQLdb as its connector) do to make this weird result.

dOPF surely has a big advantage over direct query because it's easy to change backend dbms simply by changing the value of TdSQLdbConnector.Driver. If the result is also faster than direct query, then there's no any other reason to use direct query for portable, cross dbms programming solution.

dOPF as a framework is very modular, it supports:

manual SQL
entities (result row is automatically converted as Pascal object, with automatic memory management)
SQL builder (automatic SQL generation from given conditions)
OPF itself (no SQL required in your code, you can save, load and access objects just like any other Pascal objects)

You're not forced to use one, feel free to mix any of them as you like.

If you want to try yourself, the whole package (source files + benchmark script) is available here. You should be able to run this on any supported platforms, provided you have bash (>3.0) & bc installed (because the benchmark script depends on it). Don't forget to edit dbconst.inc to fill it with your database name and query.

Feel free to add more persistence frameworks (tiOPF for instance) or libraries (ZEOS for instance) to make more data available.

Monday, October 14, 2013

Android Programming with Lazarus through Custom Drawn Interface

OK, so you've read that it's possible to write Android applications with Lazarus and Free Pascal. You go straight to the wiki, follow the bunch of steps there and FAIL! And then you start grumbling and doubting whether it's really possible or just a joke.

Listen up, dude. The Android support in FPC is still in development, and a pure arm-android target has just been added a couple of months ago. This target is available for those experienced enough with FPC (bootstrapping the compiler, set options for building, etc.) and not lazy to do the whole setup. Most problems come from those who don't read and do the steps thoroughly, possibly skipping important part. So if you're one of them, either change your behavior or wait until the support is available in the stable release.

I will try to explain step by step setting up FPC trunk with arm-android target support, followed by setting up Lazarus to support building Android application. Note that it's all done on Linux (Kubuntu 13.04) 32-bit, but it should work for any supported host platforms.

First thing first, latest stable FPC

FPC is a bootstrapping compiler, and it's guaranteed that latest stable version will be able to build trunk and next stable version. No guarantee for older version or between revisions of trunk, and things can be broken anytime on trunk. At this time of writing, latest stable FPC is of version 2.6.2. So grab that one if yours is not.

Next, Android NDK

For arm-android target, FPC makes use of external assembler and linker provided by Android NDK. Mine is still version r8e, but looking at the changelog version 9 should work just fine. Extract it anywhere you want, we will refer to this location as {ndk.dir}. To be sure, right under {ndk.dir} there should be README.TXT and RELEASE.TXT.

Let's identify the tools we need:

{ndk.dir}/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-as (assembler)
{ndk.dir}/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-ld (linker)

If you want, you can change the part after /toolchains/ in case you want to use androideabi 4.6 or 4.7. Look at the corresponding directory you have in your {ndk.dir}.

Open up your fpc.cfg file, by default it should contain the line:

-XP$FPCTARGET-

This line tells the compiler to prepend any external tools called with $FPCTARGET- (note the dash), so when the compiler wants to call "as", for arm-android target, it will call "arm-android-as" instead. As you can see, the name is then inconsistent with the NDK tools name. The solution is to create symbolic links for the tools with names expected by the compiler. For hosts that don't support symbolic links (e.g. Windows), you can create a small exe wrapper for the tools, or simply rename the tool. Put the symbolic links / wrappers somewhere in PATH (I put it in /usr/bin/).

Ensure you do it correctly by verifying the output of ls -l `which arm-android-<toolname>` (*nix only). It looks like this on my system (real directory replaced with {ndk.dir}):

$ ls -l `which arm-android-as`
lrwxrwxrwx 1 root root 120 Mar 8 2013 /usr/bin/arm-android-as -> {ndk.dir}/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-as
$ ls -l `which arm-android-ld`

lrwxrwxrwx 1 root root 120 Mar 8 2013 /usr/bin/arm-android-ld -> {ndk.dir}/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-ld

Try executing arm-android-as and arm-android-ld in terminal or command prompt to ensure it works.

Next, FPC trunk

Get FPC trunk either from svn (I won't teach how to use svn, go find tutorial somewhere) or Free Pascal's FTP. In case of svn, here's the address: http://svn.freepascal.org/svn/fpc/trunk

Build FPC for arm-android target

Using your terminal, go to FPC trunk directory and execute the following:

make crossall OS_TARGET=android CPU_TARGET=arm CROSSOPT='-Cp<ARM Arch> -Cf<ARM VFP>'

<ARM Arch> defines the ARM architecture you want to compile for, my device is ARMv6, so I use -CpARMv6.
<ARM VFP> defines the Vector Floating Point unit you want to use for floating point calculation, for at least ARMv6, VFPv2 and VFPv3 are available. The default is to use soft-float, which is very slow as the calculation is performed by software. Since I seldom use floating point, soft-float is fine for me, so I don't pass any -Cf option.

If everything goes well, it's time to install. Execute the following (still in the same FPC trunk folder):

make crossinstall OS_TARGET=android CPU_TARGET=arm INSTALL_PREFIX=<directory of your choice>

Feel free to choose any directory you want, but ensure it fulfills the standard requirement (no space in the file path). I suggest installing to the same host FPC directory so you can easily share fpc.cfg. FPC directory structure is made such that it's possible to install cross compiler (and the respective units) in the same tree as the host compiler. The fpc driver can then be used to query which ppc[ross]XXX to call.

If everything goes well, test the compiler. Execute the following:

fpc -Parm -Tandroid

It should output something like:

Free Pascal Compiler version 2.7.1 [2013/09/21] for arm
Copyright (c) 1993-2013 by Florian Klaempfl and others
Fatal: No source file name in command line
Fatal: Compilation aborted

Error: /usr/bin/ppcrossarm returned an error exitcode

Next, Android SDK

Grab Android SDK if you haven't, r22 should be fine. We just need the SDK tools, so no need to waste time and bandwidth downloading the ADT bundle. I will refer to the SDK installation directory as {sdk.dir}. To be sure, right under {ndk.dir} there should be SDK README.TXT.

Test AndroidLCL example, yay!

Go to your Lazarus installation directory (I will refer it as {lazarus.dir} from now on) and open examples/androidlcl/androidlcltest.lpi. Now open Project->Project Options, ensure in Target Platform OS is set to android and CPU is set to arm (or just pick the respective build mode). If upon FPC trunk building you use -Cf option, specify the same option in Other. You might need to also set it in Tools->Configure Build Lazarus dialog. Now press the Run->Build menu. If you get:

Trying to use a unit which was compiled with a different FPU mode

Then you don't put the -Cf option correctly. Remember you will need to put it for both your project (through Project Options dialog) and LCL (and its dependencies, through Configure Build Lazarus).

If everything goes well, you will get android/libs/armeabi/liblclapp.so in the project folder.

Get Ant

Android SDK uses ant build tool for building apk, so you'll need to install it as well.

Build the APK

Go to android folder under androidlcl project folder, and open the build.xml. Inside, you will see 2 loadproperties and 1 property tags. These points to files you will need to edit to match your SDK installation. Mine is below:

local.properties contains the sdk.dir which you should fill with {sdk.dir} (actual value where you install it, of course).

default.properties contains the target android API level. The complete list can be seen here. Note that you have to install the respective SDK platform through Android SDK manager.

ant.properties contains key.store and key.alias which is required for release version of your apk. For debug version, it's not required and the apk builder utility will assign a debug key on its own.

If all set, execute:

ant debug

in android folder. The resulting .apk will be in android/bin folder named LCLExample-debug.apk. Install that and enjoy.

From this point forward, you can make use of androidlcl structure as a template. General Java package structure and Android build system knowledge will be required to change the package name.

l
It

Friday, January 18, 2013

Top Down Agile Program Development

This time I would like to post something out of coding world, but rather something about software engineering.

Traditional agile program development commonly uses bottom up approach, where one builds the lowest level functionality first then goes up further. API changes is not an uncommon things to happen during agile development, which could lead to code rewriting problems at higher level(s). For example, consider the following function:

function Init(var R: TSomeRecord): Boolean;

which somehow in the middle of development gets changed to:

procedure Init(var R: TSomeRecord);

with purpose of raising an exception instead of returning boolean value to indicate successfulness of the operation.

If this change happens in far future, then all codes using this routine must be rewritten. The problem could arise because we usually don't think how we are going to use an API, or a combination/series of them, when we design. We only think of the API as itself, standalone, away from how it will be used. Therefore, to cover the problem, we could use alternative development method, that is the top down approach.

This method requires you to build program from its highest level first (usually user interface), then goes down further. The idea of this method is largely based on test driven development, when you write test cases first prior to implementing the test unit. The difference lies in the fact that test driven development is still bottom up in the large, yet it's top down for the unit to be implemented. This method ASSUMES the lower level API already exists, regardless of its existence. So, when you code you're thinking of using the API, possibly with other API, together to do some tasks. Because of this, the changes in the future would be minimal or even none. When something goes wrong when you test, the chance is that source of error is at lower level, which means less code to change (one place instead of several).

However, this method is not magical (in fact, none is), as with other methods it has disadvantages as well. You can't see immediate result of your program until the lowest level is implemented, though you can simply put "not implemented yet" output for some functionalities. Functions with deep dependencies (e.g.: requires function X which in turn requires function Y which in turn requires function Z and so on) will be the least ones visible. This method also doesn't play well with incremental approach, which is designed to be bottom up.

Final words: Choose your weapon wisely ;)

Monday, December 31, 2012

Brook Framework, a new web application framework for Free Pascal

Recently, a new post in Lazarus forum surprised me. Somebody, OK, Silvio, announced his new web application framework for Free Pascal with over 15000 lines of code written, having integration with the great Greyhound data access framework, named Brook Framework. OK, so what's interesting from this framework? Keep reading.

Architecture: routes, actions and methods

Even though it's built on top of fcl-web, it doesn't make use of fcl-web architecture. Brook uses the concept of routes as commonly found in frameworks for other programming languages. For each request path (that /something/and/probably/longer thing) you want to support, you register a class (TBrookAction descendant) to handle the request path. The class itself implements method for each HTTP method the class will support (commonly GET and POST, but other methods like HEAD and PUT are also supported). This is, IMHO, a more structured yet flexible way than module-action architecture as used by fcl-web. FYI, the module-action architecture has a hardcoded request path in the form of /module/action or ?module=module-name&action=action-name. This makes the request path difficult to be made search engine friendly because you have to pass additional parameters via param1=value1&param2=value2&param3=value3 and so on. OTOH, Brook allows you to register path in almost free form (taken from TBrookAction.Register documentation):

* - Allows any path. Example:

TMyAction.Register('*');

Can be called as http://localhost/cgi-bin/cgi1, http://localhost/cgi-bin/cgi1/foo/ etc;

/ - Adds an slash to the end of the URL if does not exist. Example:

TMyAction.Register('/foo/');

Can be called as http://localhost/cgi-bin/cgi1/foo or http://localhost/cgi-bin/cgi1/foo/. When called as http://localhost/cgi-bin/cgi1/foo, it will automatically redirected to http://localhost/cgi-bin/cgi1/foo/. If the pathinfo is different from /foo a 404 page is returned;

: - Creates variables URL. Their values can be read from the property Values. Example:

TMyAction.Register('/foo/:myvar');

Creates the "myvar", that can be read from the property Values, e.g:

Write(Values['myvar'].AsString);

NOTE: Two actions can't be registered with the same pattern except when they are called by means of different HTTP methods.

Data access integration

Actions could optionally have direct data access. To do that, the action must descend from TBrookDBAction instead of TBrookAction. The action could then read/write data from/to database (or something else) while serving a request. Actually, my experience in a good MVC structure teaches me not to tie the request handler and the data persistence layer. But since this is optional in Brook, I can live with it. Besides, it can be a good thing for quick development.

Enough talk...

So let's get started, we'll be using Lazarus for easiness.

Download brook here
Extract it somewhere, open packages/brookex.lpk with Lazarus and install, this will register entries for easily creating new brook projects in Project->New Project menu
Next, pick up one of the available data access backends. Since I'm currently playing a lot with Greyhound, I pick up brookgreyhoundrt.lpk. You may pick something else if you like such as ZEOS or ADS backend.
After Lazarus restart, pick Project->New Project menu, you'll see 2 new entries named "Simple CGI Application" and "Full CGI / FastCGI Application". Pick up the latter as it provides more features
A dialog with form will appear, the fields are intuitive so I guess I don't have to explain. Just fill the form and press Next
Another dialog will appear. Here you can set the actions you want and their respective path, optionally setting which one will be the default (if no specific path given the request). There's a button "Patterns help" that redirects to TBrookAction.Register documentation exactly like in the previous section. When you're done, press Next
Congratulations! Simply skip (doh)
A project will be created with one unit per action you register, and a bunch of predefined files: 404.html, 500.html, Brokers.pas and the .lpr. The most important file is Brokers.pas. This unit acts as a central configuration settings. So, whatever configuration you need, set it here. You'll see that it already registers the 404 and 500 page. For FastCGI application, you can set port here by using:
```
TBrookFCGIApplication(BrookApp.Instance).Port := {Your port number here};
```
Don't forget to add BrookApplication and BrookFCLFCGIBroker to the uses clause
Now open up an action unit and you'll see the Get method is already overriden with a default content. You can edit that later to produce html or whatever output you want. For now, we just want to test that it works
Build the project and run (in case of FastCGI) or copy to your webserver's cgi directory (in case of CGI)
Now go to your browser and type the url to your application, I personally use FastCGI with Nginx on port 8080, and if my action is /index/, I'll type in my browser: http://localhost:8080/index/
If you see your output, then you've managed to make it work. Feel free to improve

Pascal for web? Why not? ;)

Sunday, June 24, 2012

Encryption / Decryption and Asynchronous Socket Programming

Back again, it's been a long time since my last post due to lack of free time and laziness :p

Recently, I've got some posts in Lazarus / Free Pascal forums asking for some incompletely documented features, namely the (en|de)cryption unit (blowfish) and asynchronous socket (from fcl-net). Free Pascal is shipped with huge powerful libraries which are mostly, unfortunately, undocumented. Through this post, I hope I can help document it a bit through examples (I'm still lazy for real documentation commit :p). Let's start, shall we?

Blowfish, the cryptography unit

current documentation: http://www.freepascal.org/docs-html/fcl/blowfish/index.html

This unit implements encryption / decryption classes with keys, and is able to apply it on any TStream descendant. For easiness, we'll use TStringStream for the example. On to the code:

{$mode objfpc}{$H+}

uses
  Classes,
  BlowFish;

var
  en: TBlowFishEncryptStream;
  de: TBlowFishDeCryptStream;
  s1,s2: TStringStream;
  key,value,temp: String;
begin
  { 1 }
  key := 'testkey';
  value := 'this is a string';
  { 2 }
  s1 := TStringStream.Create('');
  en := TBlowFishEncryptStream.Create(key,s1);
  { 3 }
  en.WriteAnsiString(value);
  en.Free;
  WriteLn('encrypted: ' + s1.DataString);
  { 4 }
  s2 := TStringStream.Create(s1.DataString);
  s1.Free;
  { 5 }
  de := TBlowFishDeCryptStream.Create(key,s2);
  { 6 }
  temp := de.ReadAnsiString;
  WriteLn('decrypted: ' + temp);
  
  de.Free;
  s2.Free;
end.

Explanations per curly brackets:

First, we prepare the key (key) and data to be encrypted (value)
Next, we create a TBlowFishEncryptStream instance, providing the key and stream to write the encrypted data into (s1)
Now we write the unencrypted data. For testing, we output the encrypted data. You'll see that it would be a bunch of weird bytes
Next, we will try to decrypt the data back to its original form. First, we create another TStringStream, this time we give the encrypted data as the stream data
Then we create a TBlowFishDeCryptStream instance, providing the key that was used to encrypt the data and the stream from which the encrypted data would be read
Next, read the data and output it. You'll see it's the original 'this is a string'

So easy, huh? On to the next one.

fcl-net, the undocumented treasure

current documentation: err.. none

This package offers a lot of networking things: asychronous socket, dns resolver, HTTP servlet, socket streams, etc. We would concentrate only on the asychronous socket (and implicitly, socket streams). At first glance, it looks uneasy to use. I have to dig in the sources to see how it works and guess how to use it. We will implement a server with multiple client support. To stay focus, the client will only connect, send a 'hello' message, then disconnects. The server would display notification for an incoming connection, the message sent by the client, and when the client disconnects. The server can only be terminated with Ctrl+C. Jump in to the server code:

{$mode objfpc}{$H+}

uses
  { 1 }
  {$ifdef unix}cthreads,{$endif}
  Classes,SysUtils,Sockets,fpAsync,fpSock;

type
  { 2 }
  TClientHandlerThread = class(TThread)
  private
    FClientStream: TSocketStream;
  public
    constructor Create(AClientStream: TSocketStream);
    procedure Execute; override;
  end;
  { 3 }
  TTestServer = class(TTCPServer)
  private
    procedure TestOnConnect(Sender: TConnectionBasedSocket; AStream: TSocketStream);
  public
    constructor Create(AOwner: TComponent); override;
  end;
{ 4 }
function AddrToString(Addr: TSockAddr): String;
begin
  Result := NetAddrToStr(Addr.sin_addr) + ':' + IntToStr(Addr.sin_port);
end;

{ TClientHandlerThread }
{ 5 }
constructor TClientHandlerThread.Create(AClientStream: TSocketStream);
begin
  inherited Create(false);
  FreeOnTerminate := true;
  FClientStream := AClientStream;
end;
{ 6 }
procedure TClientHandlerThread.Execute;
var
  Msg : String;
  Done: Boolean;
begin
  Done := false;
  repeat
    try
      Msg := FClientStream.ReadAnsiString;
      WriteLn(AddrToString(FClientStream.PeerAddress) + ': ' + Msg);
    except
      on e: EStreamError do begin
        Done := true;
      end;
    end;
  until Done;
  WriteLn(AddrToString(FClientStream.PeerAddress) + ' disconnected');
end;

{ TTestServer }
{ 7 }
procedure TTestServer.TestOnConnect(Sender: TConnectionBasedSocket; AStream: TSocketStream);
begin
  WriteLn('Incoming connection from ' + AddrToString(AStream.PeerAddress));
  TClientHandlerThread.Create(AStream);
end;
{ 8 }
constructor TTestServer.Create(AOwner: TComponent);
begin
  inherited;
  OnConnect := @TestOnConnect;
end;

{ main }
{ 9 }
var
  ServerEventLoop: TEventLoop;
begin
  ServerEventLoop := TEventLoop.Create;
  with TTestServer.Create(nil) do begin
    EventLoop := ServerEventLoop;
    Port := 12000;
    WriteLn('Serving...');
    Active := true;
    EventLoop.Run;
  end;
  ServerEventLoop.Free;
end.

It's a bit long, so take a breath:

We will need each client to be handled in its own thread, so we need cthreads unit for *nix OSes
The client handler thread, it would work on the given client socket stream
The server, we will create an override constructor to hook when a client connects
Helper routine to get ip:port as string
Overriden constructor for the thread, will call the inherited constructor (with false argument to indicate the thread shouldn't be in suspended state), setting the object to free itself whenever the execution has finished, and assign the socket stream to a private attribute
The core of the thread. Will try to read what the client sends and output it in the server log until the client disconnects
OnConnect handler, prints notification message whenever a client connects and create a handler thread for it
Overriden constructor for the server, assigns the OnConnect handler
Main program. Create event loop object for the server, creates the server, assigning event loop and port to listen, and ready to accept connections...

Phew, go on to the client. This time it's simpler:

{$mode objfpc}{$H+}

uses
  Classes,SysUtils,Sockets,fpAsync,fpSock;

var
  ClientEventLoop: TEventLoop;
begin
  ClientEventLoop := TEventLoop.Create;
  with TTCPClient.Create(nil) do begin
    EventLoop := ClientEventLoop;
    Host := '127.0.0.1';
    Port := 12000;
    Active := true;
    EventLoop.Run;
    Stream.WriteAnsiString('Hello');
    Active := false;
  end;
  ClientEventLoop.Free;
end.

Not numbered since it's only a single main code block. First it creates event loop for the client, create the client, assigning event loop, host and port of the server to connect, activate the connection, send a 'Hello' message to the server and disconnects. Clear enough, eh?

OK, that's all for now. Got any questions? Just ask :)

Friday, January 14, 2011

Big mistake: wrong process model

This is a lesson learn for application developer. Remember my last post about compiler with LLVM backend? It doesn't end happily. I didn't manage to connect all the components, even some of them are either incomplete or not implemented at all. I pass the trial session badly. Even though I managed to graduate, but the ending is not good.

So, after the trial session, here comes the revision session. I crazily decided to rewrite the compiler FROM SCRATCH. But this time, I use feature based iterative incremental process model (previously component based iterative incremental). First, only single expression is allowed. Parser OK, AST OK, semantic checking OK. Next, assignment statement. Parser OK, AST OK, semantic checking OK. Next, compound assignment statement, return statement, method, class and so on. Ended with code generator. Guess what? Something that I didn't manage to finish in 3 months is finished in 1 week!

If only I realize this faster... maybe I'll get A (well, I finally get A- which is not too bad).

Friday, December 24, 2010

LLVM IR Builder in Object Pascal

I'm about to graduate from my university (January 2011 if there's no more problems), and as a final assigment (though it's optional in my faculty, but it would be a great experience and honor to have one) I choose to implement a compiler. One of our labs, Formal Method in Software Engineering (or simple FMSE), is the lab where my supervisor gets involved. Therefore, the compiler I'm writing would probably based on a project they're (or have been) working on. Yep, I was given LinguSQL language to implement.

Previously, the language had a compiler, that generates Java code (bad choice IMO) which is then compiled by a Java compiler. The problem with this approach, as in normal Java application, is the HUGE runtime environment that must be distributed if someone wants to use the application. Furthermore, since Java uses interpreted bytecode (don't count GCJ, I even believe less than 10 persons in my campus know that thing exists), the performance is at maximum only 1/3 of native binaries (someone in OSDEV forum ever said). Last but not least, the execution isn't trivial. One must type "java xxx" in order to execute the program.

As a native application (deve)lov(|p)er (read: developer and lover), I decided to implement a compiler for this language that generates native binaries. However, I don't have enough experience in generating native binaries (or at least native assembly). So, remembering an option, I asked my supervisor what if the compiler generates LLVM assembly language (also called LLVM Intermediate Representation or IR)? Do you what he said? "What is LLVM?" (doh). OK, so after bla bla bla, he accepted my choice. The advantages of generating LLVM assembly instead of native assembly are:

A LOT of optimizations for free
It can be compiled to native assembly for MANY platforms
Easy integration with existing libraries

Despite those advantages, there are also disadvantages:

It uses SSA format
Written in C++, more specifically, it officially only supports G++! (there are some hacks to use MSVC but... still it's not official)
Most important one: it doesn't have Object Pascal frontend

The SSA format is not actually a disadvantage, but it's just a little harder to generate code for. But the last one is really a show stopper... or a challenge depending on how you look at it :)

So the work begins. I continue the previous research, the previous Java based compiler that generates Java code uses JavaCC, followed by JavaCUP, and finally UUAG. The first two are parser generators, with some differences, mainly JavaCC generates LL parsers, while JavaCUP generates LALR one. Both are BAD. I always find parser generators are bad since we have no idea whether it's correct or not, and the grammar can't be deduced from the code (except for recursive descent parser generators like Coco/R). The last one is a Haskell based product, which actually runs like recursive descent parser, only in functional languages they're called parser combinators. The last one is quite good, with one important problem: when a parsing error happens, the parser tries to find all possible corrections, therefore slows down the parsing and eats resources. This behavior can't be customized easily and that's what makes me writing the whole thing from scratch using classic approach: a true recursive descent parser. This is the best parser I've ever learned, since it's the most flexible one (there are tons of way to handle parsing error and that's totally up to you, with many methods possibly combined or used specifically for certain productions) and still shows the grammar in its code.

Come to the code generation part, the problem I stated above must be covered. I create my own LLVM IR Builder to generate LLVM assembly language. Due to the SSA structure, it's a bit difficult, but I managed to create it quite successful with beautiful modular architecture. It can now generate modules consisting of functions and global variables, where each functions can have local variables, labels (for branch and loop), arithmetic instructions, memory instructions, etc. It's not yet complete, but already capable of generating simple programs. I'll put it in my bitbucket account when I think it's quite production ready.

Wants some code? OK:

program llvmirbuildertest;

{$mode objfpc}{$H+}

uses
  llvmirbuilder;

var
  x,y,l,s,a,b: TLLVMSymbol;
  c: TLLVMConstant;
  cl: TLLVMCallInstruction;
begin
  x := TLLVMSymbol.Create('x',lltInteger,true);
  y := TLLVMSymbol.Create('y',lltInteger);
  c := TLLVMConstant.Create('255',lltInteger);
  l := TLLVMLoadInstruction.Create('tmp',lltInteger,x);
  s := TLLVMStoreInstruction.Create('tmp',lltInteger,y,x);
  cl:= TLLVMCallInstruction.Create('func',lltInteger);
  a := TLLVMAddInstruction.Create('a',lltInteger,x,c);
  b := TLLVMSubInstruction.Create('b',lltInteger,c,y);
  WriteLn(l.GenerateCode);
  WriteLn(s.GenerateCode);
  WriteLn(cl.GenerateCode);
  WriteLn(a.GenerateCode);
  WriteLn(b.GenerateCode);
  a.Free;
  b.Free;
  cl.Free;
  s.Free;
  l.Free;
  c.Free;
  y.Free;
  x.Free;
end.

and the generated LLVM IR:

%tmp = load i32 * @x
store i32 %y, i32 * @x
call i32 @func()
%a = add i32 @x, 255
%b = sub i32 255, %y

Note that it's a partial code, so compiling this with llvm-as would absolutely produce an error.

Saturday, December 18, 2010

Using Google Maps service in a Pascal based Web Application

My final team assignment of Information Retrieval (IR) class allows us to create any kind of IR system, and we decided to create a food ingredients and restaurant search system. It displays ingredients for a chosen food and then displays 10 top restaurants that provide the food. For the first one, it's simply a database approach. But the latter is a Geographic IR (GIR). Instead of creating it from scratch (which could take probably a whole life), we decided to use google maps service.

Due to the requirement that the system must be accessible from web, it's splitted into two parts: client and server. The client part is coded by my friend and the server part is mine. The client part is coded with ExtJS and we use JSON to communicate between the client and server. Since the server is a Service Oriented Application (SOA), it's no problem what language it's written in. And since I'm a Pascal geek (you can call me maniac if you want), I choose to write it in Pascal.

To create the server, I use fpWeb components available from standard Lazarus distribution. The server itself connects to Google Maps service to look for the coordinates. Hmm... how can the server do this? Luckily, there are two powerful networking components and libraries for Pascal, namely lNet and Synapse. I used to lNet due to its component based approach, however after struggling a bit, lNet seems to leak some features I need (esp. proxy support because my campus' internet access is protected by that). Then another Pascalian told me that it's very easy to do with Synapse. And so it goes:

with THTTPSend.Create do
  try
    ProxyHost := ConfigFile.ReadString('proxy','host','');
    ProxyPort := ConfigFile.ReadString('proxy','port','');
      URL := Format('http://maps.google.com/maps/geo?q=%s&output=json&oe=utf8&sensor=true' +
        '&key=%s',[QueryStr,GoogleAPIKey]);
      if not HTTPMethod('GET',URL) then begin
        AResponse.Content := Format('Error %d: %s',[ResultCode,ResultString]);
      end else begin
        AResponse.Contents.LoadFromStream(Document);
      end;
      AppendToLog(Headers.Text + AResponse.Content);
  finally
      Free;
    end;
  Handled := true;

The server uses TIniFile (as ConfigFile in above code) to store and retrieve proxy information so it can be adjusted without recompilation. QueryStr is the string that we want Google Maps to look for and GoogleAPIKey is the... err.. API key to access Google Maps. FYI, I'm using version 2 of the API which still requires a key, version 3 doesn't need it anymore.

It doesn't yet show anything useful, only the original JSON from Google Maps. Later, this JSON must be processed and merged with database result, and then sent to the client to be processed further. But after all... it's done in Pascal :)