
Programming Overview
====================

This document explains how a Quixote application is structured.  Be sure
you have read the "Understanding the demo" section of demo.txt first --
this explains a lot of Quixote fundamentals.

There are three components to a Quixote application:

1) A driver script, usually a CGI or FastCGI script.  This is
   the interface between your web server (e.g. Apache) and the bulk of
   your application code.

   The driver script is responsible for creating a Quixote publisher
   customized for your application and invoking its publishing loop.

2) A configuration file.  This file specifies various features of the
   Publisher class, such as how errors are handled and where the
   log files are written.  Read through quixote/config.py for the
   full list of configuration settings.
   
   The most important configuration parameters are:
      URL_PREFIX        prefix of URLs that will be directed to Quixote
                        (must also be set in your web server's
                        configuration; this is only used when Quixote
                        generates internal redirects)
      ERROR_EMAIL       e-mail address to which errors will be mailed
      ERROR_LOG         file to which errors will be logged

3) Finally, the bulk of the code will be in a Python package or module,
   called the root namespace.  The Quixote publisher will be set up to
   start traversing at the root namespace.


Driver script
-------------

The driver script is the interface between your web server and Quixote's
"publishing loop", which in turn is the gateway to your application
code.  Thus, there are two things that your Quixote driver script must
do:

  * create a Quixote publisher -- that is, an instance of the Publisher
    class provided by the quixote.publish module -- and customize it for
    your application

  * invoke the Quixote publishing loop by calling the 'publish_cgi()'
    method of the publisher

The publisher is responsible for translating URLs to Python objects and
calling the appropriate function, method, or PTL template to retrieve
the information and/or carry out the action requested by the URL.

The most important application-specific customization done by the driver
script is to set the root namespace of your application.  Broadly
speaking, a namespace is any Python object with attributes.  The most
common namespaces are modules, packages, and class instances.  The root
namespace of a Quixote application is usually a Python package, although
for a small application it could be a regular module.

The driver script can be very simple; for example, here is a
trimmed-down version of demo.cgi, the driver script for the Quixote
demo:

-- demo.cgi ------------------------------------------------------------
from quixote import enable_ptl, Publisher

enable_ptl()
app = Publisher("quixote.demo")
app.publish_cgi()
------------------------------------------------------------------------

(Whether you call this demo.cgi, demo.fcgi, demo.py, or whatever is up
to you and your web server.)

That's almost the simplest possible case -- there's no
application-specific configuration info apart from the root namespace.
(The only way to make this simpler would be to remove the enable_ptl()
call.  This would remove the ability to import PTL modules, which is at
least half the fun with Quixote.)

Here's a slightly more elaborate example, for a hypothetical database of
books:

-- books.cgi -----------------------------------------------------------
from quixote import enable_ptl, Publisher

# Install the PTL import hook, so we can use PTL modules in this app
enable_ptl()

# Create a Publisher instance with the default configuration.
pub = Publisher('books')

# Read a config file to override some default values.
pub.read_config('/www/conf/books.conf')

# Enter the publishing main loop
pub.publish_cgi()
------------------------------------------------------------------------

The application code is kept in a package named simply 'books' in this
example, so its name is provided as the root namespace when creating the
Publisher instance.

The SessionPublisher class in quixote.publish can also be used; it
provides session tracking.  The changes required to use
SessionPublisher would be:

-- [variation on books.cgi] --------------------------------------------
...
from quixote.publish import SessionPublisher
from quixote.session import SessionManager
...
pub = SessionPublisher('books')
pub.set_session_manager(SessionManager())
...
------------------------------------------------------------------------

It's also possible to subclass the Publisher or SessionManager classes
in order to provide some specialized behaviour necessary for your
application.  Some uses for this would be:

  * SessionManager stores user sessions in an in-memory
    dictionary, so all sessions are lost when the driver script
    terminates.  A plain CGI driver script terminates after handling
    every request, so FastCGI is essential for applications that rely
    on these non-persistent sessions.  Even so, a long-running
    FastCGI process sometimes terminates: you might need to restart it
    to reload modified application code, or your system might crash.

    If sessions are used to contain important data, such as the contents
    of a shopping cart, they should be stored persistently.  Currently,
    Quixote leaves it up to the application writer (i.e. you) to add
    persistence to the basic session management mechanism.  You can do
    this by subclassing SessionManager and Publisher.

  * The default behaviour on an uncaught exception is to record
    the time, the traceback, and the contents of the request.  This is
    written to the configured error log (the ERROR_LOG configuration
    variable), and mailed to the configured e-mail address (the
    ERROR_EMAIL config variable).  If you want some different
    behaviour, you would have to subclass Publisher or SessionPublisher
    and override the finish_failed_request() method.
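
The storage side of the first point can be illustrated without relying
on Quixote's own classes.  The following is a minimal sketch, assuming
that what needs replacing is an in-memory dictionary keyed by session
ID.  ShelfSessionStore is a hypothetical name and not part of Quixote,
and the session ID and contents are made up; how such a store would be
wired into a SessionManager subclass is not shown.  It uses the
standard shelve module so that sessions survive a restart of the
driver process.

```python
import os
import shelve
import tempfile


class ShelfSessionStore:
    """Hypothetical dict-like session store backed by a shelve file.

    Not part of Quixote: it only illustrates replacing an in-memory
    dictionary with storage that outlives the driver process.
    """

    def __init__(self, filename):
        self.filename = filename

    def __setitem__(self, session_id, session):
        # Open and close the file on every access so data reaches disk.
        with shelve.open(self.filename) as db:
            db[session_id] = session

    def __getitem__(self, session_id):
        with shelve.open(self.filename) as db:
            return db[session_id]

    def __contains__(self, session_id):
        with shelve.open(self.filename) as db:
            return session_id in db

    def __delitem__(self, session_id):
        with shelve.open(self.filename) as db:
            del db[session_id]


# Simulate two runs of a driver script sharing one session file.
path = os.path.join(tempfile.mkdtemp(), "sessions")
first_run = ShelfSessionStore(path)
first_run["abc123"] = {"cart": ["Don Quixote"]}

second_run = ShelfSessionStore(path)   # a restarted process would start here
restored = second_run["abc123"]
```

The sketch ignores locking, so it would need more care if several
driver processes shared the same file.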

This document won't try to explain how to write subclasses of the
Quixote classes; read the docstrings in the code for detailed
explanations.

Getting the driver script to actually run is between you and your web
server.  See the configuration.txt document for help, especially with
Apache (which is the only web server we currently know anything about).


Configuration file
------------------

In the "books.cgi" driver script, configuration information is read
from a file by this line:
    pub.read_config('/www/conf/books.conf')

You should never edit the default values in quixote/config.py, because
your edits will be lost if you upgrade to a newer Quixote version.  You
should certainly read it, though, to understand what all the
configuration variables are.

The configuration file contains Python code, which is executed
using Python's built-in function execfile().  Since it's Python code,
it's easy to set config variables:

-- books.conf ----------------------------------------------------------
ACCESS_LOG = "/www/log/access/books.log" 
DEBUG_LOG = "/www/log/books-debug.log"
ERROR_LOG = "/www/log/books-error.log"
------------------------------------------------------------------------

You can also execute arbitrary Python code to figure out what the
variables should be.  The following example changes some settings to
be more convenient for a developer when the WEB_MODE environment
variable is the string 'DEVEL':

-- [config file excerpt] -----------------------------------------------
import os

web_mode = os.environ["WEB_MODE"]
if web_mode == "DEVEL":
    DISPLAY_EXCEPTIONS = 1
    SECURE_ERRORS = 0
    RUN_ONCE = 1
elif web_mode in ("STAGING", "LIVE"):
    DISPLAY_EXCEPTIONS = 0
    SECURE_ERRORS = 1
    RUN_ONCE = 0
else:
    raise RuntimeError("unknown server mode: %s" % web_mode)
------------------------------------------------------------------------

At the MEMS Exchange, we use this flexibility to display tracebacks in
DEVEL mode, to redirect generated e-mails to a staging address in
STAGING mode, and to enable all features in LIVE mode.
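
To make the mechanism described in this section concrete, here is a
rough sketch of executing a file of Python code and collecting the
variables it sets.  It is not Quixote's read_config() implementation:
load_settings is a hypothetical helper, keeping only upper-case names
is an assumption made for the example, and exec() stands in for
execfile(), which newer Python versions no longer provide.

```python
import os
import tempfile


def load_settings(filename, defaults):
    """Run a Python config file and return defaults overridden by it."""
    with open(filename) as f:
        source = f.read()
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    settings = dict(defaults)
    for name, value in namespace.items():
        # Assumption: only ALL-CAPS names count as settings.
        if name.isupper():
            settings[name] = value
    return settings


# Write a small config file, then load it on top of some defaults.
conf_path = os.path.join(tempfile.mkdtemp(), "books.conf")
with open(conf_path, "w") as f:
    f.write('ERROR_LOG = "/www/log/books-error.log"\n'
            'DISPLAY_EXCEPTIONS = 1\n')

settings = load_settings(conf_path, {"DISPLAY_EXCEPTIONS": 0, "RUN_ONCE": 0})
```

Values set in the file win over the defaults, and anything the file
does not mention keeps its default value.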


Application code
----------------

Finally, we reach the most complicated part of a Quixote application.
However, thanks to Quixote's design, everything you've ever learned
about designing and writing Python code is applicable, so there are no
new hoops to jump through.  The only new language to learn is PTL, which
is simply Python with a novel way of generating function return values
-- see PTL.txt for details.

An application's code lives in a Python package that contains both .py
and .ptl files.  Complicated logic should be in .py files, while .ptl
files, ideally, should contain only the logic needed to render your Web
interface and basic objects as HTML.  As long as your driver script
calls enable_ptl(), you can import PTL modules (.ptl files) just as if
they were Python modules.

Quixote's publisher will start at the root of this package, and will
treat the rest of the URL as a path into the package's contents.  Here
are some examples, assuming that the URL_PREFIX is "/q", your web server
is set up to rewrite "/q" requests as calls to (e.g.)
"/www/cgi-bin/books.cgi", and the root package for your application is
"books":

  http://.../q/         call         books._q_index()
  http://.../q/other    call         books.other(), if books.other
                                     is callable (e.g. a function or
                                     method)
  http://.../q/other    redirect to  /q/other/, if books.other is a
                                     namespace (e.g. a module or sub-package)
  http://.../q/other/   call         books.other._q_index(), if books.other
                                     is a namespace

One of Quixote's design goals is "Be explicit."  Therefore there's no
complicated rule for remembering which functions in a module are public;
you just have to list them all in the _q_exports variable, which should
be a list of strings naming the public functions.  You don't need to
list the _q_index function as being public; that's assumed.  For example,
if 'foo()' is a function to be exported (via Quixote to the web) from your
application's namespace, you should have this somewhere in that
namespace (i.e. at module level in a module or __init__.py file):
  _q_exports = ['foo']
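
The rules above can be modelled in a few lines.  This is a simplified
sketch and not Quixote's actual traversal code: resolve() and
NotExported are hypothetical names, types.SimpleNamespace objects stand
in for modules and packages, the path is what remains after the
URL_PREFIX has been removed, and the redirect for a namespace requested
without a trailing slash is left out.

```python
from types import SimpleNamespace


class NotExported(Exception):
    """Raised when a path component is not listed in _q_exports."""


def resolve(root, path):
    """Map a URL path such as '/other/' to an object under root."""
    obj = root
    for name in path.lstrip("/").split("/"):
        if name == "":
            # A trailing slash (or an empty path) selects _q_index,
            # which is public without being listed.
            return obj._q_index
        if name not in getattr(obj, "_q_exports", []):
            raise NotExported(name)
        obj = getattr(obj, name)
    return obj


def index(request):
    return "book list"

def foo(request):
    return "foo page"

def secret(request):
    return "not for the web"

def other_index(request):
    return "other index"

# Stand-ins for the 'books' package and its 'other' sub-package.
other = SimpleNamespace(_q_exports=[], _q_index=other_index)
books = SimpleNamespace(_q_exports=["foo", "other"], _q_index=index,
                        foo=foo, other=other, secret=secret)

page = resolve(books, "/foo")(None)
```

Here resolve(books, "/secret") raises NotExported even though
books.secret exists, because 'secret' is not listed in _q_exports.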

When a function is callable from the web, it must expect a single
parameter, which will be an instance of the HTTPRequest class.  This
object contains everything Quixote could discover about the current HTTP
request -- CGI environment variables, form data, cookies, etc.  When
using SessionPublisher, request.session is a Session object for the user
agent making the request.

The function should return a string; all PTL templates return a string
automatically.  request.response is an HTTPResponse instance, which has
methods for setting the content-type of the function's output,
generating an HTTP redirect, specifying arbitrary HTTP response headers,
and other common tasks.  (Actually, the request object also has a method
for generating a redirect.  It's usually better to use this -- i.e. to
write "request.redirect(...)" -- because generating a redirect correctly
requires knowledge of the request, and only the request object has that
knowledge.  "request.response.redirect(...)" only works if you supply an
absolute URL, e.g. "http://www.example.com/foo/bar".)

Use
  pydoc quixote.http_request
  pydoc quixote.http_response
to view the documentation for the HTTPRequest and HTTPResponse classes,
or consult the source code for all the gory details.

There are exactly two ways to affect how Quixote traverses a URL to
determine how to handle it: '_q_access()' and '_q_getname()'.

_q_access(request)

  If this function is present in a module, it will be called before
  attempting to traverse any further.  It can look at the contents of
  request and decide if the traversal can continue; if not, it should
  raise quixote.errors.AccessError (or a subclass), and Quixote will
  return a 403 Forbidden HTTP status code.  The return value is
  ignored if _q_access() doesn't raise an exception.

  For example, in the MEMS Exchange code, we have some sets of pages
  that are only accessible to signed-in users of a certain type.  The
  _q_access() function looks like this:

    def _q_access (request):
        if request.session.user is None:
            raise NotLoggedInError("You must be signed in to view reports.")
        if not (request.session.user.is_MX() or
                request.session.user.is_fab()):
            raise MXAccessError("You don't have access to the reports page.")

  This is less error-prone than having to remember to add checks to 
  every single public function.
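
The effect of the hook can be sketched as follows.  This is a
self-contained model rather than Quixote code: AccessError here is a
local stand-in for quixote.errors.AccessError, call_page() is a
hypothetical helper that plays the publisher's part, and the request
objects are simple stand-ins carrying only a session with a user.

```python
from types import SimpleNamespace


class AccessError(Exception):
    """Local stand-in for quixote.errors.AccessError."""


def _q_access(request):
    # The guard from the text, reduced to the signed-in check.
    if request.session.user is None:
        raise AccessError("You must be signed in to view reports.")


def summary(request):
    return "report summary"


# A namespace with one guard and one public page.
reports = SimpleNamespace(_q_access=_q_access, _q_exports=["summary"],
                          summary=summary)


def call_page(namespace, name, request):
    """Run the namespace's _q_access hook, if any, then call the page."""
    hook = getattr(namespace, "_q_access", None)
    if hook is not None:
        hook(request)          # return value ignored; only exceptions matter
    return getattr(namespace, name)(request)


anonymous = SimpleNamespace(session=SimpleNamespace(user=None))
signed_in = SimpleNamespace(session=SimpleNamespace(user="alice"))
page = call_page(reports, "summary", signed_in)
```

Calling call_page(reports, "summary", anonymous) raises AccessError
before summary() is ever reached, so the page itself needs no check.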


_q_getname(request, component)

  This function translates an arbitrary string into an object that we
  continue traversing.  This is very handy; it lets you put
  user-space objects into your URL-space, eliminating the need for
  digging ID strings out of a query, or checking PATH_INFO after
  Quixote's done with it.  But it is a compromise with security: it
  opens up the traversal algorithm to arbitrary names not listed in
  _q_exports.  You should therefore be extremely paranoid about
  checking the value of 'component'.

  'request' is the request object, as it is everywhere else;
  'component' is a string containing the next chunk of the path.
  _q_getname() should return some object that can be traversed
  further, so it should have a _q_index() method, a _q_exports
  attribute, and optionally _q_access() or its own _q_getname().
  We generally write special classes for this purpose, though you
  could choose a particular module and return that instead.

  For example, we want people to be able to go to
  http://.../q/run/250/ to view run #250.  This is more readable than
  the alternatives '/q/run/?id=250' or even '/q/run?250'.  The
  corresponding function and class look like this:

    def _q_getname (request, component):
        return RunUI(request, component)

    class RunUI:
        _q_exports = ['details']

        def __init__ (self, request, component):
            run_id = int(component)
            run_db = get_run_database()
            self.run = run_db.get_run(run_id)
            if not self.run.can_access(request.session.user):
                raise MXAccessError("You are not allowed to access run %d." %
                                    run_id)

        def _q_index (self, request):
            ...
        def details (self, request):
            ...

  The __init__() method is actually much longer, and is very paranoid
  about checking whether the value of 'component' is actually a number,
  if the run exists, and if the user is permitted to view that run.


-- 
A.M. Kuchling    <akuchlin@mems-exchange.org>
Neil Schemenauer <nascheme@mems-exchange.org>
Greg Ward        <gward@mems-exchange.org>


