[Maypole] Maypole::HTTPD

Simon Cozens simon@simon-cozens.org
Fri, 29 Oct 2004 15:55:00 +0100



I have no intention of releasing or documenting this, but if you want it, it's
at http://cvs.simon-cozens.org/viewcvs.cgi/Maypole-HTTPD/

It's a lightweight httpd for Maypole applications. I'd be delighted if someone
could make this suitable for release. If you want more information about how
it works and how to use it, see the attached Perl Journal article.

-- 
Gu sa-sur bi nu-ha-za sila-a KU.  -- Sumerian saying

Attachment: webgui.pod


=head1 Using the web as a GUI

I could never get on with GUI programming at all. I don't think very
well in terms of the event loop paradigm. I don't want to spend a lot of
time laying out widgets and connections, but I don't like the look of
those toolkits like Tk that require you to pack widgets together.
And to cap it all, I at least like to pretend that my applications are
cross-platform, and most GUI widget sets just aren't.

At the same time, I've been using HTML and CSS for pretty much
everything - layout of documents for printing, presentation slides, you
name it. And of course I'd been writing lots of web applications with
Maypole. Why shouldn't I use a web browser to provide the GUI to a
nominally web-based application instead of writing a true GUI program?

Of course, it's hardly a new idea. ActiveState's Komodo is an example of
a sophisticated application based on top of the Mozilla browser
platform. I recently had to write a Windows application, but to develop
it on the Mac, so I chose to write it using HTML and CSS for the
display, JavaScript for the client side, and a Maypole backend to
connect the whole thing to a database. The application runs in a web
browser, using a local web server, but the end user doesn't need to know
or care - from their point of view, a window pops up on the screen and
they interact with it.

This of course requires a local web server, a copy of Perl, and almost
half of CPAN, some things which Windows is notorious for not providing.
Additionally, we don't really want the user to go through a laborious
process of installing and setting up all these complicated systems.
Ideally, we want the single executable to do everything itself, with no
installation required. To make this happen, we're going to have to write
our own web server, and package it all up - the server, the application,
the browser, the templates to be displayed and everything else, into a
single binary.

Let's first look at the web server.

=head2 The Web Server

I'll start by saying that none of the ideas that I've used in this article are
original; we all stand on the shoulders of giants. There are many modules and
methods for creating a web server in Perl, but I've used the
C<standalone_httpd> from the RT web application
(http://www.bestpractical.com/rt/). RT is trying to do the same sort of thing
that we're doing - having a web server that only knows how to talk to the RT
application, so that it can all be bundled into a single program.

C<standalone_httpd> is a simply-designed server, with the emphasis on portability
and speed. Let's take a look at how it's constructed and how we adapt it for
our Maypole application. We'll be talking about Maypole for our purposes, but
similar considerations would be applicable to any situation where we're trying
to build a compact web server around an application.

First, we use the old-fashioned C<Socket> operations to bind to the web server
port and listen for connections. It may be ugly, but it's fast, and that's
what counts here - we're looking for a real-time response, just like you'd
get with a conventional GUI application without the overhead of making HTTP
connections, so we need to cut down as much extraneous stuff as possible.

    my $port = shift;
    my $tcp  = getprotobyname('tcp');

    socket( HTTPDaemon, PF_INET, SOCK_STREAM, $tcp ) or die "socket: $!";
    setsockopt( HTTPDaemon, SOL_SOCKET, SO_REUSEADDR, pack( "l", 1 ) )
      or warn "setsockopt: $!";
    bind( HTTPDaemon, sockaddr_in( $port, INADDR_ANY ) ) or die "bind: $!";
    listen( HTTPDaemon, SOMAXCONN ) or die "listen: $!";

    print("You can connect to your RT server at http://localhost:$port/\n");

Now we've set up the listening socket, we can take requests:

    while (1) {

        for ( ; accept( Remote, HTTPDaemon ); close Remote ) {
            *STDIN  = *Remote;
            *STDOUT = *Remote;
            chomp( $_ = <STDIN> );

We accept the remote socket, and then set up standard input and standard output
to read from and print to that respectively; this mimics the usual CGI
environment. We also read the first line of the HTTP request from the socket.
Again, we could use C<HTTP::Request> to do this, but we need to keep it lean
and lightweight.

From this line of the request, we can read off the method, the URI, any
GET parameters, and check that we're looking at a valid request:

            my ( $method, $request_uri, $proto, undef ) = split;

            my ( $file, undef, $query_string ) =
              ( $request_uri =~ /([^?]*)(\?(.*))?/ );    # split at ?

            last if ( $method !~ /^(GET|POST|HEAD)$/ );
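
To see what those three lines pull out of a request, here is a small
self-contained sketch. The helper name C<parse_request_line> and the example
URL are my own inventions for illustration; the splitting logic is the same
as in the loop above.

```perl
use strict;
use warnings;

# Parse the first line of an HTTP request the same way the server loop does.
# parse_request_line is a hypothetical helper, not part of RT or Maypole.
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                     # strip the line ending, CRLF or LF
    my ( $method, $request_uri, $proto ) = split ' ', $line;
    return unless defined $request_uri;       # malformed: no URI at all
    return unless $method =~ /^(?:GET|POST|HEAD)$/;

    # Everything before the first "?" is the path; the rest is the query string.
    my ( $file, undef, $query_string ) = $request_uri =~ /([^?]*)(\?(.*))?/;
    return {
        method       => $method,
        path         => $file,
        query_string => defined $query_string ? $query_string : '',
        proto        => $proto,
    };
}

my $req = parse_request_line("GET /song/view/3?format=xml HTTP/1.0\r\n");
print "$req->{method} $req->{path} [$req->{query_string}]\n";
```

A request with any other method makes the helper return nothing, which
corresponds to the C<last> in the original loop.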

Next we dispatch to a function which turns all of these things into the
kind of CGI environment variables that we would expect:

            build_cgi_env( method       => $method,
                           query_string => $query_string,
                           path         => $file,
                           port         => $port,
                           peername     => "localhost",
                           peeraddr     => "127.0.0.1",
                           localname    => "localhost",
                           request_uri  => $request_uri );


We won't go into all the details of how that does its job, but we should know
that at this point, our program looks very much like an ordinary CGI script. So
it shouldn't be much of a surprise that the RT standalone HTTP server now just
creates a C<CGI> object and runs it through its C<HTML::Mason> handler, which
does all the processing and spits out the output to the client:

            RT::ConnectToDatabase();
            my $cgi = CGI->new();
            print "HTTP/1.0 200 OK\n";    # probably OK by now
            eval { $h->handle_cgi_object($cgi); };
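
For the curious, a function like C<build_cgi_env> might look roughly like the
following. This is my guess at its shape and not RT's actual code; the
argument names follow the call shown earlier, and the environment variable
names are the usual CGI ones.

```perl
use strict;
use warnings;

# A guess at the shape of such a function, not RT's actual code: copy the
# request details into the environment variables a CGI program reads.
sub build_cgi_env {
    my %args = @_;

    # Clear any request headers left over from the previous request.
    delete @ENV{ grep { /^HTTP_/ } keys %ENV };

    $ENV{GATEWAY_INTERFACE} = 'CGI/1.1';
    $ENV{SERVER_PROTOCOL}   = 'HTTP/1.0';
    $ENV{REQUEST_METHOD}    = $args{method};
    $ENV{REQUEST_URI}       = $args{request_uri};
    $ENV{PATH_INFO}         = $args{path};
    $ENV{QUERY_STRING}      = defined $args{query_string} ? $args{query_string} : '';
    $ENV{SERVER_NAME}       = $args{localname};
    $ENV{SERVER_PORT}       = $args{port};
    $ENV{REMOTE_HOST}       = $args{peername};
    $ENV{REMOTE_ADDR}       = $args{peeraddr};
    return;
}

build_cgi_env(
    method       => 'GET',
    request_uri  => '/song/list?sort=title',    # made-up example URL
    path         => '/song/list',
    query_string => 'sort=title',
    port         => 8080,
    peername     => 'localhost',
    peeraddr     => '127.0.0.1',
    localname    => 'localhost',
);
print "$ENV{REQUEST_METHOD} $ENV{PATH_INFO} ?$ENV{QUERY_STRING}\n";
```

Once these variables are set, any CGI library that reads C<%ENV> sees what
looks like an ordinary CGI request.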

And that's basically it - a web server that contains everything it needs to
respond to a request and hand it over to RT. Now we want to modify this so
that instead of running a C<HTML::Mason> handler, it runs our Maypole
application.

=head2 Adjustments for Maypole

We wrap this program up into C<Maypole::HTTPD>, and customize the part that
responds to the CGI request. Maypole already has a CGI driver, C<CGI::Maypole>,
so it's reasonable to use that. However, Maypole uses C<CGI::Simple>, and it
turns out for some reason that C<CGI::Simple> doesn't like our CGI environment;
additionally, the RT server always returns C<200 OK>, but we might not want to
do that on some occasions. Finally, a Mason request will automatically handle
static files that need to be served from the application, such as logos, CSS
and XSL files, and so on, but we don't have code in Maypole to handle this, so
we need to be able to serve files as well as pass things through the Maypole
process. Thankfully, in the application I had, I knew that every URL containing
C</static/> related to a static file we needed to serve up.

So we'll begin by laying out the things our code will need to do: serve
a file, or pass the request to Maypole and send the output.

    if ($path =~ /static/) { return $self->serve($path) }

    print "HTTP/1.1 200 OK\n"; 
    # Do something Maypole here

Let's deal with serving files, which is the normal use of a web server but
rather incidental to what we're doing. With C<serve>, we're given a path,
and we need to turn this into a file and serve it up with the correct MIME
type. 

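    use File::Spec::Functions qw(canonpath);
    use File::MMagic;
    use URI::Escape;

    sub serve {
        my ($self, $path) = @_;
        $path = "./".canonpath(uri_unescape($path));
        # canonpath leaves a ".." that follows a directory name alone on
        # Unix, so refuse any path that still contains one
        if ($path !~ m{(?:^|[\\/])\.\.(?:[\\/]|\z)}
            and -f $path and open my $fh, '<', $path) {
            binmode $fh;
            print "HTTP/1.1 200 OK\n";
            my $magic = File::MMagic->new();
            print "Content-type: ", $magic->checktype_filename($path), "\n\n";
            print <$fh>;
            close $fh;
            return;
        }
        # 404 is the "Not Found" status; the blank line ends the headers
        print "HTTP/1.1 404 Not Found\n\n";
    }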

We're using three common CPAN modules here. C<File::Spec::Functions> not
only handles filenames in a platform-agnostic way; its C<canonpath>
function also helps us stop file access attacks. If the user asks for
C<http://localhost/../../../etc/passwd>, we need to stop that.
C<canonpath> sees that the path is absolute and strips the leading
C<../>s, leaving us with C<./etc/passwd>, which hopefully won't be found.
Be aware that on Unix C<canonpath> does not collapse a C<..> that follows
a directory name, as in C</static/../../etc/passwd>, so paths like that
have to be rejected separately.
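
A quick way to see what C<canonpath> does and doesn't strip is to run it on
a couple of hostile paths. The sketch below uses C<File::Spec::Unix>
directly so that the result doesn't depend on the platform you try it on;
C<is_safe_path> is a hypothetical helper of my own, not something from the
server.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Use the Unix rules explicitly so the result does not depend on the
# platform this snippet happens to run on.
my $leading  = File::Spec::Unix->canonpath('/../../../etc/passwd');
my $interior = File::Spec::Unix->canonpath('/static/../../etc/passwd');

print "leading:  $leading\n";
print "interior: $interior\n";

# is_safe_path is a hypothetical helper: accept a path only if no ".."
# component survives canonicalisation.
sub is_safe_path {
    my ($path) = @_;
    my $dotdots = grep { $_ eq '..' } File::Spec::Unix->splitdir($path);
    return $dotdots == 0;
}

print "leading is ",  ( is_safe_path($leading)  ? "safe" : "unsafe" ), "\n";
print "interior is ", ( is_safe_path($interior) ? "safe" : "unsafe" ), "\n";
```

The leading C<../>s disappear from the first path, while the second comes
back unchanged, which is why a check along these lines is worth having
before the file is opened.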

C<URI::Escape> allows us to convert the filenames from their encoded form
- with C<%20> for space, for instance - to the form that the filenames would
take on the disk. If after these two measures, we can open a filehandle,
then we have a file to serve and we can finally send the C<OK> status code.

At this point, we need to know what MIME type to send to the browser,
so that the file can be displayed properly; a PNG file to be used as a logo,
for instance, needs to be served with type C<image/png>. The C<File::MMagic>
module sniffs the first few bytes of a filehandle and determines the
appropriate MIME type to send. Then we can send the payload of the file,
and all is fine.
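
If "sniffing the first few bytes" sounds mysterious, the idea can be shown
with a toy version in plain Perl. This is not how C<File::MMagic> is
implemented, and it knows only three formats; C<sniff_type> and its
signature table are made up for the illustration.

```perl
use strict;
use warnings;

# A toy version of content sniffing: compare the leading bytes of the data
# against a few well-known file signatures. File::MMagic knows far more
# formats; sniff_type and its table are illustrative only.
my @signatures = (
    [ "\x89PNG\x0d\x0a\x1a\x0a" => 'image/png'  ],
    [ "GIF87a"                  => 'image/gif'  ],
    [ "GIF89a"                  => 'image/gif'  ],
    [ "\xff\xd8\xff"            => 'image/jpeg' ],
);

sub sniff_type {
    my ($bytes) = @_;
    for my $sig (@signatures) {
        my ( $magic, $type ) = @$sig;
        return $type if substr( $bytes, 0, length $magic ) eq $magic;
    }
    return 'application/octet-stream';    # fallback when nothing matches
}

print sniff_type("\x89PNG\x0d\x0a\x1a\x0a\0\0\0\x0dIHDR"), "\n";
print sniff_type("body { color: black }"), "\n";
```

A real sniffer also has to recognise text formats such as CSS and HTML,
which is a good reason to leave the job to a module.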

Next, the more common case of processing a request through Maypole. To
make this happen, we need to know in our main loop the name of the
Maypole application to call, we need to ensure it's based on
C<CGI::Maypole>, and then we can use the handy C<run> method to process
the request much like Mason's C<handle_cgi_object>. So we modify our
C<main_loop> to take an application name as well as a port:

    sub main_loop {
        my ($self, $module, $port) = @_;
        $port ||= 8080;

Next we check the application is loaded, and then fiddle it so that it's
based on C<CGI::Maypole>:

        $module->require;    # class method supplied by UNIVERSAL::require
        { no strict;
            local *isa = *{$module."::ISA"};
            unshift @isa, "CGI::Maypole"
                unless $isa[0] eq "CGI::Maypole"
        }

Finally, when we come to handle the request, we just need to say

        $module->run;

and we have a working server. 
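
The C<@ISA> trick is easier to follow in isolation. In this sketch the
classes are stand-ins I've made up: C<Demo::Driver> plays the part of
C<CGI::Maypole> and C<Demo::App> the part of the application.
C<make_cgi_based> is likewise a hypothetical name.

```perl
use strict;
use warnings;

# Stand-ins for the real classes: Demo::Driver plays the part of
# CGI::Maypole, Demo::App the part of the application. Both are made up.
package Demo::Driver;
sub run { my $class = shift; return "$class handled by " . __PACKAGE__ }

package Demo::Base;
sub new { return bless {}, shift }

package Demo::App;
our @ISA = ('Demo::Base');    # the application's original parent class

package main;

# make_cgi_based is a hypothetical helper name.
sub make_cgi_based {
    my ( $module, $driver ) = @_;
    no strict 'refs';         # we need a symbolic reference to the @ISA array
    my $isa = \@{ $module . '::ISA' };
    unshift @$isa, $driver unless @$isa && $isa->[0] eq $driver;
    return;
}

make_cgi_based( 'Demo::App', 'Demo::Driver' );
print Demo::App->run, "\n";
```

Because the driver class goes on the front of C<@ISA>, its C<run> is found
first, and calling the helper twice does not add the class twice.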

=head2 The client

To give the impression that this is not a client-server application but
a standard GUI application, we need to write a wrapper program that
starts up the server, starts up a web browser, and points it at the right
address. This is where we need to be slightly platform specific, but
thankfully the driver script is very short. Here's the driver for the
application I was writing, called "Songbee":

    use Songbee;
    use Maypole::HTTPD;

    $x = fork or Maypole::HTTPD->main_loop("Songbee");
    system("firefox http://localhost:8080/");
    kill 1, $x, $$;

This works well enough on both Windows and Unix; it forks a process to
run the web server part, and then runs the web browser. When the web
browser is done, it kills both processes. It needs to do this because
under ActiveState Perl on Windows the "forked" process isn't really
forked; it's just a thread of the main process, so we need to kill
C<$$> as well.
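
The C<fork or ...> idiom in the driver relies on C<fork> returning 0 in the
child and the child's process ID in the parent. Here is a sketch of the same
start, wait and clean-up sequence with the server and browser replaced by
stand-ins. It assumes a Unix-like system, and it sends C<TERM> to the child
only, where the driver sends signal 1 to both processes.

```perl
use strict;
use warnings;

# fork returns 0 in the child and the child's process ID in the parent,
# which is why "fork or do_something()" runs do_something only in the child.
pipe( my $reader, my $writer ) or die "pipe: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: stands in for the web server's main loop.
    close $reader;
    print {$writer} "child $$ serving\n";
    close $writer;
    sleep 60;        # pretend to serve until we are killed
    exit 0;
}

# Parent: stands in for the code that runs the browser and then cleans up.
close $writer;
my $line = <$reader>;
print "parent $$ heard: $line";

kill 'TERM', $pid;   # the "browser" has exited, so stop the server
waitpid $pid, 0;
print "child $pid reaped\n";
```

The C<waitpid> is there so the child doesn't linger as a zombie; the real
driver gets away without it because the parent exits straight afterwards.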

Now we come to the most difficult bit - working out how to package
together all these elements, plus all the associated data, into a single
file.

=head2 PARring the code together

This is where Autrijus Tang's "PAR" comes in. PAR stands for Perl
ARchive, and is a Perl analogue of Java's JAR system - essentially a Zip
file of a program and everything that Perl needs to run it. 

At its very simplest, C<PAR> is just a mechanism that allows you to read
modules from inside a zip file. Once you've created the zip file, like
so:

    % zip modules.par lib/Songbee.pm lib/Songbee/HTTPD.pm ...

you can use the C<PAR> module to treat it as an include path:

    use PAR;
    use lib "modules.par"; # Now we can find Songbee and friends
    use Songbee::HTTPD;

Of course, just loading F<Songbee.pm> and the other files is no good if
you don't have the modules that they depend on. Thankfully, there's a
very helpful tool called C<Module::ScanDeps> which reports on the
dependencies of a given Perl program. So running it on the driver that
we wrote earlier, we get a whole raft of dependencies that are going to
need to go into our PAR when we run the program on a "clean" Windows
computer without Perl installed:

    % scandeps.pl songbee.pl
    'Class::DBI::Loader'                       => '0.02',
    'Songbee'                                  => 'undef',
    'Songbee::HTTPD'                           => 'undef',
    'Compress::Zlib'                           => '1.32',
    'CGI::Simple'                              => '0.075',
    'Maypole'                                  => '1.5',
    'CGI::Simple::Cookie'                      => '0.02',
    'CGI::Simple::Util'                        => '0.002',
    'Class::DBI::ColumnGrouper'                => 'undef',
    'Class::Data::Inheritable'                 => '0.02',
    ...

Now all we need to do is put these things together - the driver, the
archive of the modules, the automated dependency scanning - so that we
run one command and we end up with an archive which contains the program
and everything we need to run it. Thankfully, PAR does that too.

PAR comes with a binary called C<pp>, the Perl Packager. This does
everything that we need, such that we can say:

    % pp -a -o songbee.par songbee.pl

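This will create C<songbee.par> from F<songbee.pl> and all its dependent
Perl modules. (If your C<pp> rejects this command line, check
C<pp --help>: its documentation lists C<-p>, or C<--par>, as the switch
that stops at a C<.par> file.) Now we can use the PAR Loader, C<parl>,
to run this: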

    % parl songbee.par

and we find that... it doesn't work. Unfortunately, C<pp> only
statically analyses the program for modules that are used or required;
it knows nothing about modules that are required dynamically. For
instance, C<Songbee> uses SQLite as its database, but this is only
determined at runtime - nowhere is there an explicit C<use DBD::SQLite>,
so the module is not picked up by C<pp>. We can provide a list of
additional modules for C<pp> to pick up by mentioning them on the
command line:

    % pp -a -o songbee.par -MDBD::SQLite -M... songbee.pl
    
But since there are a lot of them, I found it easier just to add
explicit use statements to the driver:

    use DBD::SQLite;
    use DBIx::ContextualFetch;
    use Class::DBI::Loader;
    use Class::DBI::Loader::SQLite;
    use Class::DBI::SQLite;
    use Class::DBI::Relationship::HasA;
    use Class::DBI::Relationship::HasMany;
    use Maypole::Model::CDBI;
    use Maypole::View::TT;
    use Template::Plugin::XSLT;
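
Why does a scan of the source miss these modules? A small sketch shows the
pattern. The module name is assembled while the program runs, so it never
appears as a literal C<use> or C<require> for a scanner to find.
C<load_backend> and its configuration hash are hypothetical, and
C<Text::Wrap> is used only because it ships with Perl.

```perl
use strict;
use warnings;

# The module name only exists once the program is running, so a tool that
# reads the source text has no literal "use" or "require" to find.
# load_backend and the config hash are hypothetical.
sub load_backend {
    my ($config) = @_;
    my $module = join '::', $config->{family}, $config->{backend};
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    require $file;
    return $module;
}

my $loaded = load_backend( { family => 'Text', backend => 'Wrap' } );
print "loaded $loaded from $INC{'Text/Wrap.pm'}\n";
```

A database layer that builds a driver's module name from a connection
string is doing much the same thing, hence the explicit C<use> lines.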

Now everything works. Well, sort of. That last line, 
C<use Template::Plugin::XSLT>, also pulls in C<XML::LibXML> 
and C<XML::LibXSLT>, and they in turn require some dynamically loaded C
libraries to be available. 

This is no problem for C<pp>, so long as we inform it, and we can use
the C<-l> switch to point it at the libraries in question:

    pp -a -l c:\perl\bin\libxml2.dll -l c:\perl\bin\libxslt_win32.dll
        -l c:\perl\bin\libexslt_win32.dll -o songbee.par songbee.pl

(It was at this point I switched to a batch file to construct my PAR
files.)

Now we've got rid of most of the dependencies into the one PAR file:
what remains outside are the templates, the browser, and, of course,
Perl itself. Thankfully, the last bit is easy to get rid of - by
dropping the C<-a> option, C<pp> will no longer simply produce a C<.par> file
but will also bundle up the Perl interpreter with it too, and produce a
standalone executable:

    pp -l c:\perl\bin\libxml2.dll -l c:\perl\bin\libxslt_win32.dll
       -l c:\perl\bin\libexslt_win32.dll -o songbee.exe songbee.pl

We run this program, the browser window pops up, the templates are
loaded and work, and the end user just sees an application on their
screen. All is well. 

Now the final piece of the puzzle is to hide all the data inside
the C<.exe> as well.

=head2 PARring data

PAR provides us with a way of packaging up files, and, indeed, entire
directories inside our C<PAR> Zip files, as well as the Perl modules
that live in there. When a C<PAR>-based application runs, PAR extracts
the contents of the Zip file to a temporary directory. It then provides
a hook into the C<@INC> mechanism so that module files can be found via
the temporary directory. Additionally, it puts the name of the temporary
directory in the environment variable C<PAR_TEMP>, and provides the
subroutine C<PAR::read_file> to read a data file from the archive.

So the first problem is getting all the data files into the archive. I
did this by creating a manifest file like so:

    static
    custom
    factory
    playitem
    playlist
    song
    workship.db
    firefox.exe
    ...

I could then feed this to C<pp> with the C<-A> parameter. Most of the entries 
in this file are directories, but C<pp> includes all the files in them
recursively.

Now we have a significantly larger PAR file, but we're not using the
data in it yet. To do this, we could fix our application to use
C<PAR::read_file> every time it wants to open a data file, but this is
pretty difficult - as well as rewriting the part of the web server that
serves up static files, we'd have to reach into the bits of Maypole that
look for templates. 

A much easier way is to simply change to the directory that all the
data's in. We add this to our driver:

    $ENV{PAR_TEMP} && chdir($ENV{PAR_TEMP});

And of course, everything will work without further modification.
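
If you'd rather find out straight away when the C<chdir> fails, the
one-liner can be expanded a little. This is a sketch only;
C<enter_data_dir> is a hypothetical helper, and the scratch directory just
simulates what PAR would set up when the packaged program runs.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# enter_data_dir is a hypothetical helper: change into PAR's extraction
# directory when there is one, and say so loudly if that fails.
sub enter_data_dir {
    my $dir = $ENV{PAR_TEMP};
    return 0 unless defined $dir && length $dir;    # not running from a PAR
    chdir $dir or die "Cannot chdir to $dir: $!";
    return 1;
}

# Simulate a packaged run by pointing PAR_TEMP at a scratch directory.
my $start = getcwd();
{
    local $ENV{PAR_TEMP} = tempdir( CLEANUP => 1 );
    enter_data_dir() and print "now in ", getcwd(), "\n";
}
chdir $start or die "Cannot return to $start: $!";
```

When C<PAR_TEMP> isn't set, as during development, the helper does nothing
and the program keeps running from wherever it was started.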

=head2 The Proof of the Pudding

Now we can serve up files, start the Firefox browser, and everything
else, in the right place - with all of the code and data coming out of
the single C<.exe> file produced by C<pp>. 

As a test - and since this is exactly what I need to do when I deploy
the program - I sent the executable to a friend who I knew didn't have
Perl, Firefox or anything else installed; he double-clicked the nice
icon, and up popped a window. No messy installation, tedious set-up, or
anything. 

By using HTML elements as the GUI, I've saved myself a lot of bother
with GUI programming and been able to use Maypole to get the application
coded quickly, and by using this client-server mechanism I've been able
to develop on Macintosh, run on Linux and ship to friends on Windows.
