Rich Text Format (RTF) documents from your web2py application

We needed to generate Microsoft Word .doc files from a web application based on web2py. Creating a .doc file from scratch in Python is no small task, but luckily for us RTF (Rich Text Format) files can be opened natively by Word and cover everything we needed.

Web2py includes the pyrtf library. Pyrtf is a set of Python classes that make it possible to produce RTF documents from Python programs. The library has no external dependencies and has proven reliable and fast.

The code snippet below sits inside a web2py function: it imports the library, initializes it, and then creates a simple RTF document.

After adding the line ‘Certified Staff Evaluation’, the function returns the newly created document. It is also possible to create tables and include images; documentation for the library and example RTF generation files can be found on the PyRTF project site. (Note that web2py bundles a slightly older version of pyrtf.)

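from gluon.contrib.pyrtf import *

doc     = Document()
ss      = doc.StyleSheet
section = Section()
doc.Sections.append( section )

p = Paragraph( ss.ParagraphStyles.Heading1 )
p.append( 'Certified Staff Evaluation' )
section.append( p )

# render the Document to an RTF string before returning it;
# dumps() is the helper bundled with gluon.contrib.pyrtf
response.headers['Content-Type'] = 'text/rtf'
return dumps( doc )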
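
For intuition about what the library writes out, an RTF file is plain text made of control words inside braces. The sketch below builds a one-paragraph document by hand; it is not pyrtf output, and the helper names `rtf_escape` and `minimal_rtf` are made up for this illustration.

```python
def rtf_escape(text):
    """Escape the three characters that have special meaning in RTF."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def minimal_rtf(heading):
    """Build a one-paragraph RTF document by hand (not pyrtf output)."""
    return (
        "{\\rtf1\\ansi\\deff0"          # header: RTF version 1, ANSI, default font 0
        "{\\fonttbl{\\f0 Arial;}}"      # font table with a single font
        "\\f0\\b\\fs32 " + rtf_escape(heading) + "\\b0\\par"  # bold 16pt line
        "}"
    )


rtf_text = minimal_rtf("Certified Staff Evaluation")
```

Saving `rtf_text` to a file with an .rtf extension should give something Word can open, which is why generating RTF is a workable stand-in for generating .doc files.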

web2py, Google App Engine, and DatastoreTimeoutException

While web2py runs great on the Google App Engine, there are several gotchas that can cause a lot of headaches if you aren’t aware of them. [Note to readers: this post assumes an advanced understanding of web2py and Python.]

Google App Engine will throw a timeout error (DatastoreTimeoutException) if your web2py function takes too long to execute. This can happen with a long database update or query. The average time to trigger a timeout appears to be about 30 seconds.

One approach to resolve this, if you can’t refactor your query or update operation to guarantee it will stay under 30 seconds or 1,000 records, is to create a progress display page that periodically calls a web2py function to incrementally perform the operation you are attempting.

The example I’ll use is a fairly complex report that runs against a large number of records and can take several minutes to execute.

We have a summary_report view that allows the user to choose multiple filter options (year, school, etc). On submit, the controller summary_reports is called, the request.vars from the user selections are saved as session variables, and the view summary_report_display is called. The variable session.summary_m is the chunk size for our query; you can increase it to speed up the report, but going too high can cause a single call to hit the 30-second limit.

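def summary_reports():

    if request.vars.teacher_select:

        # NOTE: the assignments below were missing from the source and are
        # reconstructed from the session names used later in this post.

        #stores running values from report generator
        session.dictGraphVariables = {}

        #user selections (form field names assumed to match the session names)
        for name in ('school_select', 'semester_select', 'year_select',
                     'class_select', 'teacher_select', 'observer_select',
                     'datestart', 'dateend'):
            session[name] = request.vars.get(name, '')

        #used in the partial query
        session.summary_i = 0       # index of the next chunk
        session.summary_m = 100     # chunk size (illustrative value)
        session.summary_break = 0   # set to 1 once all rows are processed

        #counter for number of report results
        session.summary_overallcount = 0

        redirect(URL('summary_report_display'))

    return dict()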
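
One rough way to pick a value for session.summary_m is to time how long a single row takes to process and keep each call well inside the limit. The sketch below is only a back-of-the-envelope estimate: `safe_chunk_size`, the safety factor, and the timing figures are assumptions for illustration, while the 30-second and 1,000-record figures are the limits mentioned earlier.

```python
def safe_chunk_size(seconds_per_row, time_limit=30.0, safety=0.5, max_rows=1000):
    """Estimate the largest chunk that fits in a fraction of the time limit."""
    budget = time_limit * safety            # leave headroom for slow requests
    rows = int(budget / seconds_per_row)    # rows that fit in the budget
    return max(1, min(rows, max_rows))      # at least 1, never above the row cap


# Example: if each row takes about a quarter of a second to process
chunk = safe_chunk_size(0.25)
print(chunk)  # 60
```

Measured timings on App Engine vary from request to request, so treat the result as a starting point and lower it if timeouts still appear.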

Our summary_report_display view has a javascript function defined that calls our web2py function get_progress_on_summary_report every 3 seconds; get_progress_on_summary_report does the heavy lifting.

{{extend 'layout.html'}}
<h1>Generating Reports</h1>
<h2>This may take a while if you have a lot of walkthroughs - but you can monitor progress below...</h2>
<div id="progress"></div>
<div id="summary_report_show"></div>

Here is the code in the web2py controller for summary_report_display that sets up the progress bar display and the javascript call to get_progress_on_summary_report every 3 seconds.

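progress = DIV(_id="progress")
wrapper = DIV(progress,_style="width:400px;")
summary_report_show = DIV(_id="summary_report_show")

def summary_report_display():

    # server-side callback that the page polls every 3 seconds
    callback = js.call_function(get_progress_on_summary_report)
    # initialise the jQuery UI progress bar once the page is ready
    page.ready(jq(progress).progressbar( dict( ) )() )
    # (the line that schedules `callback` on a 3-second timer is missing from the source)
    return dict(wrapper=wrapper)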

The get_progress_on_summary_report function in the web2py controller does most of the work.

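import datetime
from functools import reduce  # built in on Python 2; imported for Python 3

def get_progress_on_summary_report():
    # NOTE: lines marked "assumed" were missing from the source; the table and
    # field names on those lines (t_walkthrough, f_school, ...) are hypothetical.

    if session.summary_break==0:

        #base query includes all walkthroughs
        wt = db.t_walkthrough                                            # assumed
        queries = [wt.id > 0]                                            # assumed

        #build the query list based on user selections
        #if user hasn't selected 'All' for an individual search/filter term, then add to the query
        if session.school_select!='All':
            queries.append(wt.f_school == session.school_select)         # assumed
        if session.semester_select!='All':
            queries.append(wt.f_semester == session.semester_select)     # assumed
        if session.year_select!='All':
            queries.append(wt.f_year == session.year_select)             # assumed
        if session.class_select!='All':
            queries.append(wt.f_class == session.class_select)           # assumed
        if session.teacher_select!='All':
            queries.append(wt.f_teacher == session.teacher_select)       # assumed
        if session.observer_select!='All':
            queries.append(wt.f_observer == session.observer_select)     # assumed

        #handle start / end date if user entered same
        if session.datestart:
            date_object = datetime.datetime.strptime(str(session.datestart), '%Y-%m-%d')
            queries.append(wt.f_date >= date_object)                     # assumed
        if session.dateend:
            date_objectEnd = datetime.datetime.strptime(str(session.dateend), '%Y-%m-%d')
            queries.append(wt.f_date <= date_objectEnd)                  # assumed

        #create query from list
        summary_report_query = reduce(lambda a,b:(a&b),queries)

        #pull the next chunk of rows
        rows = db(summary_report_query).select(limitby=(session.summary_i*session.summary_m,(session.summary_i+1)*session.summary_m))
        session.summary_i += 1                                           # assumed
        session.summary_overallcount += len(rows)                        # assumed

        for r in rows:

            # do something with r
            list_walkthrough_result = db(db.t_walkthrough_result.f_walkthrough == r.id).select()  # assumed
            for w_record_result in list_walkthrough_result:
                result = w_record_result.f_result
                if result is not None:
                    # tally each result value (assumed; the original body is missing)
                    session.dictGraphVariables[result] = session.dictGraphVariables.get(result, 0) + 1

        if len(rows)<1:                                                  # comparison value assumed
            #no more rows: change state so later calls do nothing
            session.summary_break = 1                                    # assumed
            if session.summary_overallcount > 0:
                return_value='Report generation complete - there are ' + str(session.summary_overallcount) + ' results. <a href="summary_reports_complete">Click Here to View Graphs</a>'
            else:
                return_value='Report generation complete - no results were found.'  # assumed wording

            return jq(summary_report_show).html(return_value)()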
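
The reduce step that combines the filter list can be tried outside web2py. In the sketch below, plain Python sets of record ids stand in for DAL query objects (both support the `&` operator); the records, field names, and the `ids_where` and `build_filter` helpers are made up for this illustration.

```python
from functools import reduce

# Toy records standing in for walkthrough rows (hypothetical data).
RECORDS = [
    {"id": 1, "school": "North", "year": "2010"},
    {"id": 2, "school": "North", "year": "2011"},
    {"id": 3, "school": "South", "year": "2010"},
]


def ids_where(field, value):
    """Return the set of record ids whose `field` equals `value`."""
    return set(r["id"] for r in RECORDS if r[field] == value)


def build_filter(selections):
    """Combine one condition per non-'All' selection using `&`."""
    # Base condition: every record, like the base query over all walkthroughs.
    conditions = [set(r["id"] for r in RECORDS)]
    for field, value in selections.items():
        if value != "All":
            conditions.append(ids_where(field, value))
    # Fold the list into a single condition, pairwise, left to right.
    return reduce(lambda a, b: a & b, conditions)


north_only = build_filter({"school": "North", "year": "All"})
north_2010 = build_filter({"school": "North", "year": "2010"})
```

Starting the list with a base condition means reduce always has at least one element to work with, even when every filter is left at 'All'.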

The core of the iterative process is the query below; it pulls a chunk of rows (size defined by summary_m) each cycle.

rows = db(summary_report_query).select(limitby=(session.summary_i*session.summary_m,(session.summary_i+1)*session.summary_m))

This performs our operation in chunks. Once there are no more rows, the function changes state and returns a link to the summary_reports_complete page, which parses the results and presents them to the user; the link is written into the summary_report_show div. If no results are produced, the operation tells the user instead.

This approach will let you perform queries or db updates against a very large set of records without triggering the timeout.
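
The polling cycle can be simulated in plain Python to check the bookkeeping. This is a sketch under stated assumptions: a dict stands in for the web2py session, a list slice stands in for select(limitby=...), and `fetch_chunk` and `poll_once` are hypothetical names.

```python
def fetch_chunk(records, i, m):
    """Mimic select(limitby=(i*m, (i+1)*m)) with a list slice."""
    return records[i * m:(i + 1) * m]


def poll_once(session, records):
    """One polling call: process the next chunk and report progress."""
    if session["summary_break"]:
        return "done"
    rows = fetch_chunk(records, session["summary_i"], session["summary_m"])
    session["summary_i"] += 1
    session["summary_overallcount"] += len(rows)
    if not rows:
        # No more rows: change state so later calls do nothing.
        session["summary_break"] = 1
        return "complete: %d results" % session["summary_overallcount"]
    return "processed %d so far" % session["summary_overallcount"]


session = {"summary_i": 0, "summary_m": 4, "summary_break": 0,
           "summary_overallcount": 0}
records = list(range(10))  # stand-in for ten database rows

messages = []
while not session["summary_break"]:
    messages.append(poll_once(session, records))
```

With ten rows and a chunk size of four, the loop makes four calls: three that process rows and a final one that finds an empty chunk and flips summary_break.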

PDFs from GAE using web2py & pdfcrowd

Recently we used pdfcrowd to generate PDFs from several of our applications running on the Google App Engine.

Integration is very straightforward. We use the web2py Python framework on top of GAE, so these steps reflect that. In our example we also want our Python function to return the content as a PDF file.

  1. Register at pdfcrowd
  2. Download their python client library
  3. Copy the source of the client library to the modules directory of your web2py application
  4. Use the code snippet example below to initialize and call the pdfcrowd library
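
import StringIO

def print_observation_formpdfcrowd():

   # in-memory buffer that receives the generated PDF
   out = StringIO.StringIO()

   pdfcrowd = local_import('pdfcrowd')

   client = pdfcrowd.Client("username", "apikey")

   # our html to convert
   html='<head></head><body>My HTML Layout</body>'

   client.convertHtml(html, out)

   # prepare PDF to download:
   # (header lines assumed; they are missing from the source)
   pdf_file_name = 'evaluation.pdf'
   response.headers['Content-Type'] = 'application/pdf'
   response.headers['Content-Disposition'] = 'attachment; filename=%s' % pdf_file_name

   return out.getvalue()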