Friday, January 18, 2013

Top Down Agile Program Development

This time I would like to post something outside the coding world, namely something about software engineering.

Traditional agile program development commonly uses a bottom-up approach, where one builds the lowest-level functionality first and then works upward. API changes are not uncommon during agile development, and they can force code at the higher level(s) to be rewritten. For example, consider the following function:

function Init(var R: TSomeRecord): Boolean;

which somehow in the middle of development gets changed to:

procedure Init(var R: TSomeRecord);

with the purpose of raising an exception instead of returning a boolean value to indicate whether the operation succeeded.
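To see what that change means for callers, here is a minimal sketch. The field inside TSomeRecord and the names InitOld and InitNew are made up for illustration (the two versions need different names to live in one program); only the two signatures come from the text above.

```pascal
program ApiChangeDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical record; the post never shows TSomeRecord's fields.
  TSomeRecord = record
    Value: Integer;
  end;

// Old style: success is reported through the return value.
function InitOld(var R: TSomeRecord): Boolean;
begin
  R.Value := 42;
  Result := true;
end;

// New style: a failure would be reported by raising an exception.
procedure InitNew(var R: TSomeRecord);
begin
  R.Value := 42;
end;

var
  R: TSomeRecord;
begin
  // Call site written against the old API.
  if InitOld(R) then
    WriteLn('old API ok: ', R.Value)
  else
    WriteLn('old API failed');

  // The same call site after the change: if/else becomes try/except.
  try
    InitNew(R);
    WriteLn('new API ok: ', R.Value);
  except
    on E: Exception do
      WriteLn('new API failed: ', E.Message);
  end;
end.
```

Every call site shaped like the first one has to be turned into the second shape by hand, which is the rewriting cost in question.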

If this change happens far into the project, then all code using this routine must be rewritten. The problem arises because we usually don't think about how we are going to use an API, or a combination/series of them, when we design it. We only think of the API by itself, standalone, away from how it will be used. Therefore, to address the problem, we could use an alternative development method: the top-down approach.

This method requires you to build the program from its highest level first (usually the user interface), then work downward. The idea is largely based on test-driven development, where you write the test cases before implementing the unit under test. The difference is that test-driven development is still bottom-up in the large, even though it's top-down for the unit being implemented. This method ASSUMES the lower-level API already exists, whether or not it really does. So, when you code, you're thinking about using the API, possibly together with other APIs, to do some task. Because of this, future changes should be minimal or even none. When something goes wrong in testing, chances are the source of the error is at a lower level, which means less code to change (one place instead of several).

However, this method is not magical (in fact, none is); like other methods, it has disadvantages as well. You can't see immediate results from your program until the lowest level is implemented, though you can simply output "not implemented yet" for some functionality. Functions with deep dependencies (e.g. one that requires function X, which in turn requires function Y, which in turn requires function Z, and so on) will be the least visible ones. This method also doesn't play well with the incremental approach, which is designed to be bottom-up.
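A tiny sketch of what that looks like in practice. ShowInvoice and FetchTotal are hypothetical names: the top-level routine is written first, against the API it wishes it had, and the lower level is only a stub for now.

```pascal
program TopDownDemo;

{$mode objfpc}{$H+}

// Hypothetical lower-level routine, stubbed until it is really written.
function FetchTotal(const Customer: String): String;
begin
  Result := 'not implemented yet';
end;

// Top-level code comes first and already fixes how FetchTotal is called.
procedure ShowInvoice(const Customer: String);
begin
  WriteLn('Invoice for ', Customer);
  WriteLn('Total: ', FetchTotal(Customer));
end;

begin
  ShowInvoice('Alice');
end.
```

The program already runs end to end; filling in FetchTotal later doesn't touch ShowInvoice as long as the signature stays the same.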

Final words: Choose your weapon wisely ;)

Monday, December 31, 2012

Brook Framework, a new web application framework for Free Pascal

Recently, a new post in the Lazarus forum surprised me. Somebody, OK, Silvio, announced his new web application framework for Free Pascal, named Brook Framework, with over 15000 lines of code written and integration with the great Greyhound data access framework. OK, so what's interesting about this framework? Keep reading.

Architecture: routes, actions and methods

Even though it's built on top of fcl-web, it doesn't make use of the fcl-web architecture. Brook uses the concept of routes, as commonly found in frameworks for other programming languages. For each request path (that /something/and/probably/longer thing) you want to support, you register a class (a TBrookAction descendant) to handle it. The class itself implements a method for each HTTP method it will support (commonly GET and POST, but other methods like HEAD and PUT are also supported). This is, IMHO, a more structured yet flexible way than the module-action architecture used by fcl-web. FYI, the module-action architecture has a hardcoded request path in the form of /module/action or ?module=module-name&action=action-name. This makes it difficult to make the request path search engine friendly, because you have to pass additional parameters via param1=value1&param2=value2&param3=value3 and so on. OTOH, Brook allows you to register paths in almost free form (taken from the TBrookAction.Register documentation):

* - Allows any path. Example:

TMyAction.Register('*');

Can be called as http://localhost/cgi-bin/cgi1, http://localhost/cgi-bin/cgi1/foo/ etc;

/ - Adds a slash to the end of the URL if it does not exist. Example:

TMyAction.Register('/foo/');

Can be called as http://localhost/cgi-bin/cgi1/foo or http://localhost/cgi-bin/cgi1/foo/. When called as http://localhost/cgi-bin/cgi1/foo, it will be automatically redirected to http://localhost/cgi-bin/cgi1/foo/. If the pathinfo is different from /foo, a 404 page is returned;

: - Creates URL variables. Their values can be read from the property Values. Example:

TMyAction.Register('/foo/:myvar');

Creates the variable "myvar", which can be read from the property Values, e.g.:

Write(Values['myvar'].AsString);

NOTE: Two actions can't be registered with the same pattern except when they are called by means of different HTTP methods.
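To get a feel for how a ':' pattern can turn a path into variables, here is a toy matcher. This is NOT Brook's code, and MatchRoute and SplitPath are names I made up; it only handles literal segments and ':name' segments, nothing else from the list above.

```pascal
program RouteMatchDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Splits a path such as '/foo/bar' into its '/'-separated segments.
procedure SplitPath(const Path: String; Segments: TStrings);
begin
  Segments.Clear;
  Segments.StrictDelimiter := true;
  Segments.Delimiter := '/';
  Segments.DelimitedText := Path;
end;

// Toy matcher (hypothetical, not Brook's implementation): compares pattern
// and path segment by segment and stores ':name' segments in Vars.
function MatchRoute(const Pattern, Path: String; Vars: TStrings): Boolean;
var
  P, S: TStringList;
  i: Integer;
begin
  Result := false;
  Vars.Clear;
  P := TStringList.Create;
  S := TStringList.Create;
  try
    SplitPath(Pattern, P);
    SplitPath(Path, S);
    if P.Count <> S.Count then
      Exit;
    for i := 0 to P.Count - 1 do
      if (Length(P[i]) > 1) and (P[i][1] = ':') then
        // ':name' captures whatever the path has in this position
        Vars.Values[Copy(P[i], 2, Length(P[i]))] := S[i]
      else if P[i] <> S[i] then
        Exit; // a literal segment differs, so no match
    Result := true;
  finally
    P.Free;
    S.Free;
  end;
end;

var
  Vars: TStringList;
begin
  Vars := TStringList.Create;
  try
    if MatchRoute('/foo/:myvar', '/foo/hello', Vars) then
      WriteLn('myvar = ', Vars.Values['myvar']);
    if not MatchRoute('/foo/:myvar', '/bar/hello', Vars) then
      WriteLn('/bar/hello does not match');
  finally
    Vars.Free;
  end;
end.
```

In Brook itself you never write this; you just call Register and read Values, as in the documentation excerpt.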

Data access integration

Actions can optionally have direct data access. To do that, the action must descend from TBrookDBAction instead of TBrookAction. The action can then read/write data from/to a database (or something else) while serving a request. Actually, my experience with good MVC structure has taught me not to tie the request handler to the data persistence layer. But since this is optional in Brook, I can live with it. Besides, it can be a good thing for quick development.

Enough talk...

So let's get started. We'll be using Lazarus for convenience.

  1. Download brook here
  2. Extract it somewhere, open packages/brookex.lpk with Lazarus and install it. This will register entries for easily creating new Brook projects in the Project->New Project menu
  3. Next, pick one of the available data access backends. Since I'm currently playing a lot with Greyhound, I pick brookgreyhoundrt.lpk. You may pick something else if you like, such as the ZEOS or ADS backend.
  4. After Lazarus restarts, pick the Project->New Project menu. You'll see 2 new entries named "Simple CGI Application" and "Full CGI / FastCGI Application". Pick the latter as it provides more features
  5. A dialog with a form will appear. The fields are intuitive, so I guess I don't have to explain them. Just fill in the form and press Next
  6. Another dialog will appear. Here you can set the actions you want and their respective paths, optionally setting which one will be the default (used when the request gives no specific path). There's a "Patterns help" button that shows the TBrookAction.Register documentation, exactly as in the previous section. When you're done, press Next
  7. Congratulations! Simply skip (doh)
  8. A project will be created with one unit per action you registered, plus a bunch of predefined files: 404.html, 500.html, Brokers.pas and the .lpr. The most important file is Brokers.pas. This unit acts as the central place for configuration settings. So, whatever configuration you need, set it here. You'll see that it already registers the 404 and 500 pages. For a FastCGI application, you can set the port here by using:
    TBrookFCGIApplication(BrookApp.Instance).Port := {Your port number here};
    
    Don't forget to add BrookApplication and BrookFCLFCGIBroker to the uses clause
  9. Now open up an action unit and you'll see the Get method is already overridden with default content. You can edit that later to produce HTML or whatever output you want. For now, we just want to test that it works
  10. Build the project and run it (in case of FastCGI) or copy it to your web server's cgi directory (in case of CGI)
  11. Now go to your browser and type the URL of your application. I personally use FastCGI with Nginx on port 8080, so if my action is /index/, I'll type in my browser: http://localhost:8080/index/
  12. If you see your output, then you've managed to make it work. Feel free to improve it
  13. Pascal for web? Why not? ;)

Sunday, June 24, 2012

Encryption / Decryption and Asynchronous Socket Programming


Back again, it's been a long time since my last post due to lack of free time and laziness :p

Recently, I've come across some posts in the Lazarus / Free Pascal forums asking about some incompletely documented features, namely the (en|de)cryption unit (blowfish) and asynchronous sockets (from fcl-net). Free Pascal ships with huge, powerful libraries which are, unfortunately, mostly undocumented. Through this post, I hope I can help document them a bit through examples (I'm still too lazy for a real documentation commit :p). Let's start, shall we?

Blowfish, the cryptography unit

current documentation: http://www.freepascal.org/docs-html/fcl/blowfish/index.html

This unit implements encryption / decryption classes with keys, and is able to apply them to any TStream descendant. For simplicity, we'll use TStringStream for the example. On to the code:

{$mode objfpc}{$H+}

uses
  Classes,
  BlowFish;

var
  en: TBlowFishEncryptStream;
  de: TBlowFishDeCryptStream;
  s1,s2: TStringStream;
  key,value,temp: String;
begin
  { 1 }
  key := 'testkey';
  value := 'this is a string';
  { 2 }
  s1 := TStringStream.Create('');
  en := TBlowFishEncryptStream.Create(key,s1);
  { 3 }
  en.WriteAnsiString(value);
  en.Free;
  WriteLn('encrypted: ' + s1.DataString);
  { 4 }
  s2 := TStringStream.Create(s1.DataString);
  s1.Free;
  { 5 }
  de := TBlowFishDeCryptStream.Create(key,s2);
  { 6 }
  temp := de.ReadAnsiString;
  WriteLn('decrypted: ' + temp);
  
  de.Free;
  s2.Free;
end.

Explanations, keyed to the numbers in curly brackets:

  1. First, we prepare the key (key) and data to be encrypted (value)
  2. Next, we create a TBlowFishEncryptStream instance, providing the key and stream to write the encrypted data into (s1)
  3. Now we write the unencrypted data and free the encrypting stream so the last block gets written out. For testing, we output the encrypted data. You'll see that it's a bunch of weird bytes
  4. Next, we will try to decrypt the data back to its original form. First, we create another TStringStream, this time giving it the encrypted data as the stream data
  5. Then we create a TBlowFishDeCryptStream instance, providing the key that was used to encrypt the data and the stream from which the encrypted data will be read
  6. Next, read the data and output it. You'll see it's the original 'this is a string'
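Since the encrypted bytes aren't printable, a variation that dumps them as hex may be easier on your terminal. This is just a sketch using the same two classes with a TMemoryStream in between; the try/finally blocks and the hex loop are my additions.

```pascal
program BlowfishHexDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, BlowFish;

var
  Cipher: TMemoryStream;
  En: TBlowFishEncryptStream;
  De: TBlowFishDeCryptStream;
  B: Byte;
  Plain: String;
begin
  Cipher := TMemoryStream.Create;
  try
    // Encrypt into the memory stream.
    En := TBlowFishEncryptStream.Create('testkey', Cipher);
    try
      En.WriteAnsiString('this is a string');
    finally
      En.Free; // freeing writes out the last (padded) block
    end;

    // Dump the encrypted bytes as hex instead of raw characters.
    Cipher.Position := 0;
    Write('encrypted (hex): ');
    while Cipher.Read(B, 1) = 1 do
      Write(IntToHex(B, 2));
    WriteLn;

    // Rewind and decrypt with the same key.
    Cipher.Position := 0;
    De := TBlowFishDeCryptStream.Create('testkey', Cipher);
    try
      Plain := De.ReadAnsiString;
    finally
      De.Free;
    end;
    WriteLn('decrypted: ', Plain);
  finally
    Cipher.Free;
  end;
end.
```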
So easy, huh? On to the next one.

fcl-net, the undocumented treasure

current documentation: err.. none

This package offers a lot of networking things: asynchronous sockets, a DNS resolver, an HTTP servlet, socket streams, etc. We will concentrate only on the asynchronous socket (and implicitly, socket streams). At first glance, it doesn't look easy to use. I had to dig into the sources to see how it works and guess how to use it. We will implement a server with multiple-client support. To stay focused, the client will only connect, send a 'hello' message, then disconnect. The server will display a notification for an incoming connection, the message sent by the client, and a notification when the client disconnects. The server can only be terminated with Ctrl+C. Jump into the server code:

{$mode objfpc}{$H+}

uses
  { 1 }
  {$ifdef unix}cthreads,{$endif}
  Classes,SysUtils,Sockets,fpAsync,fpSock;

type
  { 2 }
  TClientHandlerThread = class(TThread)
  private
    FClientStream: TSocketStream;
  public
    constructor Create(AClientStream: TSocketStream);
    procedure Execute; override;
  end;
  { 3 }
  TTestServer = class(TTCPServer)
  private
    procedure TestOnConnect(Sender: TConnectionBasedSocket; AStream: TSocketStream);
  public
    constructor Create(AOwner: TComponent); override;
  end;
{ 4 }
function AddrToString(Addr: TSockAddr): String;
begin
  // sin_port is stored in network byte order, so convert it before printing
  Result := NetAddrToStr(Addr.sin_addr) + ':' + IntToStr(NToHs(Addr.sin_port));
end;

{ TClientHandlerThread }
{ 5 }
constructor TClientHandlerThread.Create(AClientStream: TSocketStream);
begin
  inherited Create(false);
  FreeOnTerminate := true;
  FClientStream := AClientStream;
end;
{ 6 }
procedure TClientHandlerThread.Execute;
var
  Msg : String;
  Done: Boolean;
begin
  Done := false;
  repeat
    try
      Msg := FClientStream.ReadAnsiString;
      WriteLn(AddrToString(FClientStream.PeerAddress) + ': ' + Msg);
    except
      on e: EStreamError do begin
        Done := true;
      end;
    end;
  until Done;
  WriteLn(AddrToString(FClientStream.PeerAddress) + ' disconnected');
end;

{ TTestServer }
{ 7 }
procedure TTestServer.TestOnConnect(Sender: TConnectionBasedSocket; AStream: TSocketStream);
begin
  WriteLn('Incoming connection from ' + AddrToString(AStream.PeerAddress));
  TClientHandlerThread.Create(AStream);
end;
{ 8 }
constructor TTestServer.Create(AOwner: TComponent);
begin
  inherited;
  OnConnect := @TestOnConnect;
end;

{ main }
{ 9 }
var
  ServerEventLoop: TEventLoop;
begin
  ServerEventLoop := TEventLoop.Create;
  with TTestServer.Create(nil) do begin
    EventLoop := ServerEventLoop;
    Port := 12000;
    WriteLn('Serving...');
    Active := true;
    EventLoop.Run;
  end;
  ServerEventLoop.Free;
end.

It's a bit long, so take a breath:

  1. We need each client to be handled in its own thread, so we need the cthreads unit on *nix OSes
  2. The client handler thread, which works on the given client socket stream
  3. The server; we override the constructor to hook the event fired when a client connects
  4. Helper routine to get ip:port as a string
  5. Overridden constructor for the thread. It calls the inherited constructor (with a false argument to indicate the thread shouldn't start suspended), sets the object to free itself whenever execution has finished, and assigns the socket stream to a private field
  6. The core of the thread. It tries to read what the client sends and outputs it to the server log until the client disconnects
  7. OnConnect handler, prints a notification message whenever a client connects and creates a handler thread for it
  8. Overridden constructor for the server, assigns the OnConnect handler
  9. Main program. Creates the event loop object for the server, creates the server, assigns the event loop and the port to listen on, and gets ready to accept connections...
Phew, on to the client. This time it's simpler:
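A side note on the ip:port helper (item 4): the sin_port field of a socket address is kept in network byte order, so it wants an NToHs call before being printed. A minimal sketch with a hand-filled address (the address and port here are just sample values):

```pascal
program PortOrderDemo;

{$mode objfpc}{$H+}

uses
  Sockets, SysUtils;

var
  Addr: TSockAddr;
begin
  FillChar(Addr, SizeOf(Addr), 0);
  Addr.sin_family := AF_INET;
  Addr.sin_port := HToNs(12000);               // stored in network byte order
  Addr.sin_addr := StrToNetAddr('127.0.0.1');

  // On a little-endian machine the raw field shows byte-swapped digits.
  WriteLn('raw field : ', Addr.sin_port);
  // Converting back to host order gives the port we put in.
  WriteLn('converted : ', NetAddrToStr(Addr.sin_addr) + ':' +
    IntToStr(NToHs(Addr.sin_port)));
end.
```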

{$mode objfpc}{$H+}

uses
  Classes,SysUtils,Sockets,fpAsync,fpSock;

var
  ClientEventLoop: TEventLoop;
begin
  ClientEventLoop := TEventLoop.Create;
  with TTCPClient.Create(nil) do begin
    EventLoop := ClientEventLoop;
    Host := '127.0.0.1';
    Port := 12000;
    Active := true;
    EventLoop.Run;
    Stream.WriteAnsiString('Hello');
    Active := false;
  end;
  ClientEventLoop.Free;
end.

Not numbered, since it's only a single main code block. First it creates the event loop for the client, creates the client, assigns the event loop and the host and port of the server to connect to, activates the connection, sends a 'Hello' message to the server and disconnects. Clear enough, eh?

OK, that's all for now. Got any questions? Just ask :)

Friday, January 14, 2011

Big mistake: wrong process model

This is a lesson learned for application developers. Remember my last post about a compiler with an LLVM backend? It didn't end happily. I didn't manage to connect all the components, and some of them were either incomplete or not implemented at all. I passed the trial session, but badly. Even though I managed to graduate, the ending was not good.

So, after the trial session came the revision session. I crazily decided to rewrite the compiler FROM SCRATCH. But this time, I used a feature-based iterative incremental process model (previously it was component-based iterative incremental). First, only a single expression was allowed. Parser OK, AST OK, semantic checking OK. Next, the assignment statement. Parser OK, AST OK, semantic checking OK. Next, compound assignment statement, return statement, method, class and so on, ending with the code generator. Guess what? Something that I didn't manage to finish in 3 months was finished in 1 week!

If only I had realized this sooner... maybe I would have gotten an A (well, I finally got an A-, which is not too bad).

Friday, December 24, 2010

LLVM IR Builder in Object Pascal

I'm about to graduate from my university (January 2011 if there are no more problems), and as a final assignment (it's optional in my faculty, but it would be a great experience and an honor to have one) I chose to implement a compiler. One of our labs, Formal Method in Software Engineering (or simply FMSE), is the lab where my supervisor is involved. Therefore, the compiler I'm writing would probably be based on a project they're (or have been) working on. Yep, I was given the LinguSQL language to implement.

Previously, the language had a compiler that generates Java code (bad choice IMO), which is then compiled by a Java compiler. The problem with this approach, as with any normal Java application, is the HUGE runtime environment that must be distributed if someone wants to use the application. Furthermore, since Java uses interpreted bytecode (don't count GCJ, I even believe fewer than 10 people on my campus know that thing exists), the performance is at most only 1/3 of native binaries (as someone on the OSDEV forum once said). Last but not least, execution isn't trivial. One must type "java xxx" in order to execute the program.

As a native application (deve)lov(|p)er (read: developer and lover), I decided to implement a compiler for this language that generates native binaries. However, I don't have enough experience in generating native binaries (or at least native assembly). So, remembering an option, I asked my supervisor: what if the compiler generates LLVM assembly language (also called LLVM Intermediate Representation or IR)? Do you know what he said? "What is LLVM?" (doh). OK, so after bla bla bla, he accepted my choice. The advantages of generating LLVM assembly instead of native assembly are:
  1. A LOT of optimizations for free
  2. It can be compiled to native assembly for MANY platforms
  3. Easy integration with existing libraries
Despite those advantages, there are also disadvantages:
  1. It uses SSA format
  2. Written in C++, more specifically, it officially only supports G++! (there are some hacks to use MSVC but... still it's not official)
  3. Most important one: it doesn't have an Object Pascal frontend
The SSA format is not actually a disadvantage, but it's just a little harder to generate code for. But the last one is really a show stopper... or a challenge depending on how you look at it :)

So the work began. I continued the previous research: the previous Java-based compiler that generates Java code used JavaCC, followed by JavaCUP, and finally UUAG. The first two are parser generators, with some differences, mainly that JavaCC generates LL parsers while JavaCUP generates LALR ones. Both are BAD. I always find parser generators bad, since we have no idea whether the result is correct or not, and the grammar can't be deduced from the code (except for recursive descent parser generators like Coco/R). The last one is a Haskell-based product, which actually works like a recursive descent parser, only in functional languages they're called parser combinators. It is quite good, with one important problem: when a parsing error happens, the parser tries to find all possible corrections, which slows down the parsing and eats resources. This behavior can't be customized easily, and that's what made me write the whole thing from scratch using the classic approach: a true recursive descent parser. This is the best kind of parser I've ever learned, since it's the most flexible one (there are tons of ways to handle parsing errors and that's totally up to you, with many methods possibly combined or used specifically for certain productions) and it still shows the grammar in its code.

Coming to the code generation part, the problem I stated above must be addressed. I created my own LLVM IR Builder to generate LLVM assembly language. Due to the SSA structure, it's a bit difficult, but I managed to create it quite successfully, with a beautiful modular architecture. It can now generate modules consisting of functions and global variables, where each function can have local variables, labels (for branches and loops), arithmetic instructions, memory instructions, etc. It's not yet complete, but it's already capable of generating simple programs. I'll put it in my bitbucket account when I think it's reasonably production ready.

Want some code? OK:

program llvmirbuildertest;

{$mode objfpc}{$H+}

uses
  llvmirbuilder;

var
  x,y,l,s,a,b: TLLVMSymbol;
  c: TLLVMConstant;
  cl: TLLVMCallInstruction;
begin
  x := TLLVMSymbol.Create('x',lltInteger,true);
  y := TLLVMSymbol.Create('y',lltInteger);
  c := TLLVMConstant.Create('255',lltInteger);
  l := TLLVMLoadInstruction.Create('tmp',lltInteger,x);
  s := TLLVMStoreInstruction.Create('tmp',lltInteger,y,x);
  cl:= TLLVMCallInstruction.Create('func',lltInteger);
  a := TLLVMAddInstruction.Create('a',lltInteger,x,c);
  b := TLLVMSubInstruction.Create('b',lltInteger,c,y);
  WriteLn(l.GenerateCode);
  WriteLn(s.GenerateCode);
  WriteLn(cl.GenerateCode);
  WriteLn(a.GenerateCode);
  WriteLn(b.GenerateCode);
  a.Free;
  b.Free;
  cl.Free;
  s.Free;
  l.Free;
  c.Free;
  y.Free;
  x.Free;
end.
and the generated LLVM IR:
%tmp = load i32 * @x
store i32 %y, i32 * @x
call i32 @func()
%a = add i32 @x, 255
%b = sub i32 255, %y
Note that this is only partial code, so assembling it with llvm-as would definitely produce an error.
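As for why SSA is a little harder to generate code for: every register may be assigned only once, so the generator has to invent a fresh name for each intermediate result. A toy sketch of that bookkeeping follows. It is NOT the llvmirbuilder unit above; NewTemp and EmitBinary are hypothetical helpers, and %x and %y are assumed to be existing i32 registers.

```pascal
program SsaSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  TempCount: Integer = 0;

// Returns a fresh register name; in SSA each name is assigned exactly once.
function NewTemp: String;
begin
  Inc(TempCount);
  Result := '%t' + IntToStr(TempCount);
end;

// Emits one binary instruction and returns the register holding its result.
function EmitBinary(const Op, Lhs, Rhs: String): String;
begin
  Result := NewTemp;
  WriteLn(Result, ' = ', Op, ' i32 ', Lhs, ', ', Rhs);
end;

var
  Sum, Diff: String;
begin
  // (x + 255) - y
  Sum := EmitBinary('add', '%x', '255');
  Diff := EmitBinary('sub', Sum, '%y');
  WriteLn('ret i32 ', Diff);
end.
```

Each call hands back the name of its result, and the next instruction uses that name as an operand, so no register is ever written twice.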

Saturday, December 18, 2010

Using Google Maps service in a Pascal based Web Application

My final team assignment for the Information Retrieval (IR) class allowed us to create any kind of IR system, and we decided to create a food ingredient and restaurant search system. It displays the ingredients for a chosen food and then displays the top 10 restaurants that serve the food. The first part is simply a database approach. But the latter is Geographic IR (GIR). Instead of creating it from scratch (which could probably take a whole lifetime), we decided to use the Google Maps service.

Due to the requirement that the system must be accessible from the web, it's split into two parts: client and server. The client part is coded by my friend and the server part is mine. The client part is coded with ExtJS and we use JSON to communicate between the client and server. Since the server is a Service Oriented Application (SOA), it doesn't matter what language it's written in. And since I'm a Pascal geek (you can call me a maniac if you want), I chose to write it in Pascal.

To create the server, I use the fpWeb components available in the standard Lazarus distribution. The server itself connects to the Google Maps service to look up the coordinates. Hmm... how can the server do this? Luckily, there are two powerful networking libraries for Pascal, namely lNet and Synapse. I'm used to lNet due to its component-based approach; however, after struggling a bit, lNet seems to lack some features I need (esp. proxy support, because my campus' internet access sits behind a proxy). Then another Pascalian told me that it's very easy to do with Synapse. And so it goes:

with THTTPSend.Create do
  try
    ProxyHost := ConfigFile.ReadString('proxy','host','');
    ProxyPort := ConfigFile.ReadString('proxy','port','');
    URL := Format('http://maps.google.com/maps/geo?q=%s&output=json&oe=utf8&sensor=true' +
      '&key=%s',[QueryStr,GoogleAPIKey]);
    if not HTTPMethod('GET',URL) then begin
      AResponse.Content := Format('Error %d: %s',[ResultCode,ResultString]);
    end else begin
      AResponse.Contents.LoadFromStream(Document);
    end;
    AppendToLog(Headers.Text + AResponse.Content);
  finally
    Free;
  end;
Handled := true;

The server uses TIniFile (ConfigFile in the above code) to store and retrieve proxy information, so it can be adjusted without recompilation. QueryStr is the string that we want Google Maps to look for and GoogleAPIKey is the... err.. API key to access Google Maps. FYI, I'm using version 2 of the API, which still requires a key; version 3 doesn't need it anymore.

It doesn't show anything useful yet, only the original JSON from Google Maps. Later, this JSON must be processed and merged with the database result, and then sent to the client to be processed further. But after all... it's done in Pascal :)
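For completeness, a minimal sketch of the config side on its own. The file name server.ini and the sample values in the comment are assumptions; only the section and key names come from the code above.

```pascal
program ProxyConfigDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, IniFiles;

// Assumed layout of server.ini (file name and values are examples only):
//   [proxy]
//   host=proxy.example.edu
//   port=8080

var
  ConfigFile: TIniFile;
begin
  ConfigFile := TIniFile.Create('server.ini');
  try
    // The third argument is the default used when the key is missing,
    // so an absent file simply yields empty strings.
    WriteLn('proxy host: "', ConfigFile.ReadString('proxy','host',''), '"');
    WriteLn('proxy port: "', ConfigFile.ReadString('proxy','port',''), '"');
  finally
    ConfigFile.Free;
  end;
end.
```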