web2py, Google App Engine, and DatastoreTimeoutException

While web2py runs well on Google App Engine, there are several gotchas that can cause a lot of headaches if you aren't aware of them. [Note to readers: this post assumes an advanced understanding of web2py and Python.]

Google App Engine will throw a timeout error (DatastoreTimeoutException) if your web2py function takes too long to execute, which can happen with a long database update or query. In practice, the timeout seems to trigger after about 30 seconds.

If you can't refactor your query or update so that it is guaranteed to stay under 30 seconds (or 1,000 records), one way around the timeout is to create a progress page that periodically calls a web2py function, with each call performing one small increment of the operation.
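A minimal sketch of that idea in plain Python, with no web2py or App Engine involved: a dict stands in for the web2py session, and each call to the hypothetical `process_next_chunk` function handles one slice of the records and remembers where it stopped. The `while` loop plays the role of the browser's periodic timer. Unlike the real report, which re-runs a datastore query on every call, this version just slices an in-memory list.

```python
from typing import Callable, Dict, List, Sequence


def process_next_chunk(state: Dict[str, int], records: Sequence, chunk_size: int,
                       handle: Callable) -> bool:
    """Process one chunk of records; return True once everything is done.

    `state` stands in for the web2py session: it survives between calls,
    so each call only needs to know where the previous one stopped.
    """
    start = state.get("position", 0)
    chunk = records[start:start + chunk_size]
    for record in chunk:
        handle(record)
    state["position"] = start + len(chunk)
    # a short (or empty) chunk means there is nothing left to process
    return len(chunk) < chunk_size


if __name__ == "__main__":
    session_state: Dict[str, int] = {}
    processed: List[int] = []
    calls = 0
    done = False
    while not done:  # stands in for the browser timer
        done = process_next_chunk(session_state, list(range(10)), 4, processed.append)
        calls += 1
    print(calls, processed)  # prints: 3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because all progress lives in the state dict, any single call can be cut short without losing work already done by earlier calls.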

The example I'll use is a fairly complex report that runs against a large number of records and can take several minutes to execute.

We have a summary_report view that lets the user choose multiple filter options (year, school, etc.). On submit, the summary_reports controller function is called: it saves the user's selections from request.vars as session variables and then redirects to summary_report_display. The variable session.summary_m is the chunk size for our query. Increasing it speeds up the report, but setting it too high can push a single call past the 30-second limit.


@auth.requires_login()
def summary_reports():

    if request.vars.teacher_select:

        #stores running values from report generator
        session.dictGraphVariables={}

        #user selections
        session.year_select=request.vars.year_select
        session.school_select=request.vars.school_select
        session.class_select=request.vars.class_select
        session.teacher_select=request.vars.teacher_select
        session.semester_select=request.vars.semester_select
        session.observer_select=request.vars.observer_select
        session.datestart=request.vars.datestart
        session.dateend=request.vars.dateend

        session.includearchive=request.vars.includearchive

        #used in the partial query
        session.summary_i=0
        session.summary_m=300
        session.summary_break=0

        #counter for number of report results
        session.summary_overallcount=0

        redirect(URL('summary_report_display'))

    return dict()
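The chunk size in session.summary_m is a row count, so choosing it means guessing how long each row takes to process. A different approach, not the one this post uses, is to bound each call by elapsed time instead: keep handling rows until a budget comfortably below the 30-second limit is spent, then save the position for the next call. A minimal sketch, assuming a hypothetical 20-second default budget and an injectable clock so the function can be tested:

```python
import time
from typing import Callable, Sequence


def process_with_time_budget(records: Sequence, start: int, handle: Callable,
                             budget_s: float = 20.0,
                             clock: Callable[[], float] = time.monotonic) -> int:
    """Handle records[start:] until the time budget runs out.

    Returns the index to resume from on the next call; it equals
    len(records) once everything has been processed. The budget is only
    checked between records, so one very slow record can still overrun it.
    """
    deadline = clock() + budget_s
    position = start
    while position < len(records) and clock() < deadline:
        handle(records[position])
        position += 1
    return position
```

In a controller, the returned index would be saved in the session (for example in a hypothetical session.summary_pos) so the next timer call can resume from it.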

The summary_report_display page calls our web2py function get_progress_on_summary_report every 3 seconds through a JavaScript timer, and get_progress_on_summary_report does the heavy lifting. The view itself only provides placeholder divs for the progress bar and the result message:


{{extend 'layout.html'}}
<h1>Generating Reports</h1>
<h2>This may take a while if you have a lot of walkthroughs - but you can monitor progress below...</h2>
<div id="progress"></div>
<div id="summary_report_show"></div>

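Here is the code in the web2py controller for summary_report_display. It sets up the progress bar display and the JavaScript call to get_progress_on_summary_report every 3 seconds. Note that page, js, and jq are not built-in web2py objects; they come from a client-side helper module that isn't shown in this post.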

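# module-level DIVs, so get_progress_on_summary_report can target them too
progress = DIV(_id="progress")
wrapper = DIV(progress,_style="width:400px;")
summary_report_show = DIV(_id="summary_report_show")

@auth.requires_login()
def summary_report_display():

    # load jQuery UI (for the progress bar widget) and its theme
    page.include("http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/jquery-ui.min.js")
    page.include("http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/themes/ui-darkness/jquery-ui.css")
    # client-side call to get_progress_on_summary_report
    callback = js.call_function(get_progress_on_summary_report)
    # turn #progress into a progress bar on page load; the initial value is
    # just the current second (0-59), so it is only a placeholder
    page.ready(jq(progress).progressbar( dict(value=request.now.second) )() )
    # run the callback every 3000 ms (3 seconds)
    page.ready(js.timer(callback,3000))
    return dict(wrapper=wrapper)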
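To get a feel for how long the page will keep polling, here is a back-of-the-envelope estimate. It assumes the loop stops on the first chunk shorter than summary_m (so a row count that is an exact multiple of summary_m costs one extra, empty call), that the timer fires every 3 seconds, and that each call finishes before the next tick. Both helper names are hypothetical.

```python
def polls_needed(total_rows: int, chunk_size: int) -> int:
    """Number of timer calls until one call sees a chunk shorter than chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # one call per full chunk, plus one final call that gets a partial
    # (or empty) chunk and ends the report
    return total_rows // chunk_size + 1


def estimated_seconds(total_rows: int, chunk_size: int, interval_s: float = 3.0) -> float:
    """Approximate wall-clock time when consecutive polls are interval_s apart."""
    return polls_needed(total_rows, chunk_size) * interval_s


print(polls_needed(10000, 300), estimated_seconds(10000, 300))  # prints: 34 102.0
```

So 10,000 matching walkthroughs at the chunk size of 300 would take about 34 polls, or roughly 102 seconds. Doubling summary_m roughly halves that, as long as each call still stays under the time limit.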

The get_progress_on_summary_report function in the web2py controller does most of the work.


# at the top of the controller file
import datetime
from functools import reduce

def get_progress_on_summary_report():

    # once the report has finished, later timer calls do nothing
    if session.summary_break==0:

        queries=[]

        #base query includes all walkthroughs
        queries.append(db.t_walkthrough.id>0)

        #build the query list based on user selections

        #if user hasn't selected 'All' for an individual search/filter term, then add to the query
        if session.school_select!='All':
            queries.append(db.t_walkthrough.school==int(session.school_select))

        if session.semester_select!='All':
            queries.append(db.t_walkthrough.semester==session.semester_select)

        if session.year_select!='All':
            queries.append(db.t_walkthrough.f_year==session.year_select)

        if session.class_select!='All':
            queries.append(db.t_walkthrough.t_class==int(session.class_select))

        if session.teacher_select!='All':
            queries.append(db.t_walkthrough.teacher==int(session.teacher_select))

        if session.observer_select!='All':
            queries.append(db.t_walkthrough.observer==int(session.observer_select))

        #handle start / end dates if the user entered them (also safe when they are None)
        if session.datestart:
            date_object = datetime.datetime.strptime(str(session.datestart), '%Y-%m-%d')
            queries.append(db.t_walkthrough.f_date>=date_object)

        if session.dateend:
            date_objectEnd = datetime.datetime.strptime(str(session.dateend), '%Y-%m-%d')
            queries.append(db.t_walkthrough.f_date<=date_objectEnd)

        #create a single AND query from the list
        summary_report_query = reduce(lambda a,b:(a&b),queries)

        #fetch only the current chunk of summary_m rows
        rows = db(summary_report_query).select(limitby=(session.summary_i*session.summary_m,(session.summary_i+1)*session.summary_m))

        for r in rows:

            #count the walkthrough and tally its result values for the graphs
            session.summary_overallcount+=1
            list_walkthrough_result=db(db.t_walkthrough_result.walkthrough==r.id).select()
            for w_record_result in list_walkthrough_result:
                #skip empty results
                if w_record_result.f_result is not None:
                    if w_record_result.f_result in session.dictGraphVariables:
                        session.dictGraphVariables[w_record_result.f_result]+=1.0
                    else:
                        session.dictGraphVariables[w_record_result.f_result]=1.0

        #a short (or empty) chunk means there are no more rows
        if len(rows)<session.summary_m:
            session.summary_break=1
            if session.summary_overallcount>0:
                return_value='Report generation complete - there are ' + str(session.summary_overallcount) + ' results.<br /><a href="summary_reports_complete">Click Here to View Graphs</a>'
            else:
                return_value='Report generation complete - there are ' + str(session.summary_overallcount) + ' results.<br /><br />'

            return jq(summary_report_show).html(return_value)()
        else:
            #more rows may remain: move to the next chunk on the next timer call
            session.summary_i+=1
            return
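Building the query follows a reusable pattern: collect one condition per active filter, treat 'All' as "no condition", and fold the list together with AND using reduce. Here is the same pattern on plain Python dicts instead of DAL fields, so it runs without web2py; the helper names and sample data are illustrative, and the sketch skips the int() conversions the real code needs for reference fields. The always-true base condition plays the role of db.t_walkthrough.id>0, which also guarantees that reduce never receives an empty list.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


def field_equals(field: str, value: object) -> Predicate:
    """Condition that a record's field equals value (like db.table.field == value)."""
    return lambda record: record.get(field) == value


def build_filter(selections: Dict[str, object]) -> Predicate:
    """AND together one condition per filter that is not set to 'All'."""
    conditions: List[Predicate] = [lambda record: True]  # base condition, like id > 0
    for field, value in selections.items():
        if value != 'All':
            conditions.append(field_equals(field, value))
    # fold the list into one predicate, just as the controller folds queries with &
    return reduce(lambda a, b: (lambda record: a(record) and b(record)), conditions)


walkthroughs = [{"school": 1, "f_year": "2010"},
                {"school": 2, "f_year": "2010"},
                {"school": 1, "f_year": "2011"}]
match = build_filter({"school": 1, "f_year": "All"})
print([w for w in walkthroughs if match(w)])
```

Creating each condition through field_equals gives every lambda its own field and value, avoiding Python's late-binding surprise when closures are built in a loop.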

The core of the iterative process is the query below; it pulls one chunk of rows (of size summary_m) on each cycle.

rows = db(summary_report_query).select(limitby=(session.summary_i*session.summary_m,(session.summary_i+1)*session.summary_m))

This performs our operation in chunks. Once there are no more rows, the function changes state and writes a link to the summary_reports_complete page into the summary_report_show div; that page parses the results and presents them to the user. If the report produced no results, the message says so instead.
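In web2py's DAL, limitby=(a, b) returns rows a through b-1, much like the Python slice rows[a:b]. The sketch below replays the limitby arithmetic from the query above on an in-memory list, together with the stop-on-a-short-chunk rule; the helper name chunk_bounds and the 700-row list are illustrative assumptions.

```python
from typing import List, Tuple


def chunk_bounds(summary_i: int, summary_m: int) -> Tuple[int, int]:
    """The (start, stop) pair the query passes as limitby for chunk summary_i."""
    return (summary_i * summary_m, (summary_i + 1) * summary_m)


rows: List[int] = list(range(700))  # stand-in for 700 matching walkthroughs
summary_m = 300
summary_i = 0
while True:
    start, stop = chunk_bounds(summary_i, summary_m)
    chunk = rows[start:stop]  # the rows limitby=(start, stop) would return
    print(summary_i, (start, stop), len(chunk))
    if len(chunk) < summary_m:  # short chunk: nothing left after this one
        break
    summary_i += 1
# prints:
# 0 (0, 300) 300
# 1 (300, 600) 300
# 2 (600, 900) 100
```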

This approach lets you run queries or database updates against a very large set of records without triggering the timeout.
