While web2py runs great on Google App Engine, there are several gotchas that can cause a lot of headaches if you aren’t aware of them. [Note to readers: this post assumes an advanced understanding of web2py and Python.]
Google App Engine will throw a timeout error (DatastoreTimeoutException) if your web2py function takes too long to execute. This can happen with a long database update or query. The timeout appears to trigger after about 30 seconds on average.
If you can’t refactor your query or update operation to guarantee it stays under 30 seconds or 1,000 records, one approach is to create a progress display page that periodically calls a web2py function, which performs the operation incrementally.
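The general shape of that approach can be sketched in plain Python, with no web2py or App Engine involved. This is only an illustration under stated assumptions: the `state` dictionary stands in for the session, and `new_state`, `step` and the key names are hypothetical, not part of the report code.

```python
# Plain-Python stand-in for the pattern; every name here is hypothetical.
# `state` plays the role of the session, which survives between calls.

def new_state(chunk_size):
    """Build the state that must persist from one call to the next."""
    return {"chunk": 0, "chunk_size": chunk_size, "done": False, "processed": 0}

def step(state, records):
    """Process one chunk and return a progress message."""
    if state["done"]:
        return "complete: %d records" % state["processed"]
    start = state["chunk"] * state["chunk_size"]
    batch = records[start:start + state["chunk_size"]]
    state["processed"] += len(batch)  # the real per-record work would go here
    if len(batch) < state["chunk_size"]:
        # A short (or empty) chunk means there is nothing left to fetch.
        state["done"] = True
        return "complete: %d records" % state["processed"]
    state["chunk"] += 1
    return "working: %d records so far" % state["processed"]

# The progress page would call step() on a timer; a loop simulates that here.
records = list(range(1050))
state = new_state(chunk_size=300)
messages = []
while not state["done"]:
    messages.append(step(state, records))
```

Each call does a bounded amount of work, so no single request runs long enough to hit the timeout. When the record count is an exact multiple of the chunk size, one extra call with an empty chunk is needed to detect the end.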
The example I’ll use is a fairly complex report that runs against a large number of records and can take several minutes to execute.
We have a summary_report view that lets the user choose multiple filter options (year, school, etc.). On submit, the controller function summary_reports is called; it saves the user selections from request.vars as session variables and redirects to summary_report_display. The variable session.summary_m is the chunk size for our query. You can increase it to speed up the report, but going too high can cause a single call to hit the 30-second limit.
@auth.requires_login()
def summary_reports():
    if request.vars.teacher_select:
        #stores running values from report generator
        session.dictGraphVariables={}
        #user selections
        session.year_select=request.vars.year_select
        session.school_select=request.vars.school_select
        session.class_select=request.vars.class_select
        session.teacher_select=request.vars.teacher_select
        session.semester_select=request.vars.semester_select
        session.observer_select=request.vars.observer_select
        session.datestart=request.vars.datestart
        session.dateend=request.vars.dateend
        session.includearchive=request.vars.includearchive
        #used in the partial query: chunk index, chunk size, finished flag
        session.summary_i=0
        session.summary_m=300
        session.summary_break=0
        #counter for number of report results
        session.summary_overallcount=0
        redirect(URL('summary_report_display'))
    return dict()
Our summary_report_display view has a JavaScript function that calls our web2py function get_progress_on_summary_report every 3 seconds; get_progress_on_summary_report does the heavy lifting.
{{extend 'layout.html'}}
<h1>Generating Reports</h1>
<h2>This may take a while if you have a lot of walkthroughs - but you can monitor progress below...</h2>
<div id="progress"></div>
<div id="summary_report_show"></div>
Here is the code in the web2py controller for summary_report_display. It sets up the progress bar display and the JavaScript call to get_progress_on_summary_report every 3 seconds.
#page, js and jq are assumed to be set up elsewhere (e.g. in a model file)
progress = DIV(_id="progress")
wrapper = DIV(progress,_style="width:400px;")
summary_report_show = DIV(_id="summary_report_show")

@auth.requires_login()
def summary_report_display():
    page.include("http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/jquery-ui.min.js")
    page.include("http://ajax.googleapis.com/ajax/libs/jqueryui/1.7.2/themes/ui-darkness/jquery-ui.css")
    #call get_progress_on_summary_report from the browser every 3000 ms
    callback = js.call_function(get_progress_on_summary_report)
    page.ready(jq(progress).progressbar( dict(value=request.now.second) )() )
    page.ready(js.timer(callback,3000))
    return dict(wrapper=wrapper)
The get_progress_on_summary_report function in the web2py controller does most of the work.
import datetime
from functools import reduce  #builtin in Python 2, needs this import in Python 3

def get_progress_on_summary_report():
    if session.summary_break==0:
        queries=[]
        #base query includes all walkthroughs
        queries.append(db.t_walkthrough.id>0)
        #build the query list based on user selections
        #if user hasn't selected 'All' for an individual search/filter term, then add to the query
        if session.school_select!='All':
            queries.append(db.t_walkthrough.school==int(session.school_select))
        if session.semester_select!='All':
            queries.append(db.t_walkthrough.semester==session.semester_select)
        if session.year_select!='All':
            queries.append(db.t_walkthrough.f_year==session.year_select)
        if session.class_select!='All':
            queries.append(db.t_walkthrough.t_class==int(session.class_select))
        if session.teacher_select!='All':
            queries.append(db.t_walkthrough.teacher==int(session.teacher_select))
        if session.observer_select!='All':
            queries.append(db.t_walkthrough.observer==int(session.observer_select))
        #handle start / end date if the user entered them
        if session.datestart:
            date_object = datetime.datetime.strptime(str(session.datestart), '%Y-%m-%d')
            queries.append(db.t_walkthrough.f_date>=date_object)
        if session.dateend:
            date_objectEnd = datetime.datetime.strptime(str(session.dateend), '%Y-%m-%d')
            queries.append(db.t_walkthrough.f_date<=date_objectEnd)
        #create query from list
        summary_report_query = reduce(lambda a,b:(a&b),queries)
        #fetch only the current chunk of rows
        rows = db(summary_report_query).select(limitby=(session.summary_i*session.summary_m,(session.summary_i+1)*session.summary_m))
        for r in rows:
            session.summary_overallcount+=1
            list_walkthrough_result=db(db.t_walkthrough_result.walkthrough==r.id).select()
            for w_record_result in list_walkthrough_result:
                #tally each recorded result, skipping empty ones
                if w_record_result.f_result is not None:
                    if w_record_result.f_result in session.dictGraphVariables:
                        session.dictGraphVariables[w_record_result.f_result]+=1.0
                    else:
                        session.dictGraphVariables[w_record_result.f_result]=1.0
        #NOTE: the original listing is garbled from here to the final return_value;
        #the branching and the progress message are a reconstruction based on the description below
        if len(rows)<session.summary_m:
            #a short chunk means there are no more rows, so change state
            session.summary_break=1
            return_value='Report generation complete - there are ' + str(session.summary_overallcount) + ' results.<br/>'
            if session.summary_overallcount>0:
                return_value+='<a href="summary_reports_complete">Click Here to View Graphs</a>'
        else:
            #more rows may remain, so move on to the next chunk
            session.summary_i+=1
            return_value='Processed ' + str(session.summary_overallcount) + ' walkthroughs so far...'
        return jq(summary_report_show).html(return_value)()
    else:
        return
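The way the filter list is folded into a single query with reduce can be illustrated without a database. In the sketch below, plain predicate functions stand in for DAL query objects and `and` stands in for the `&` operator; `build_filter` and the sample rows are hypothetical and only meant to show the pattern.

```python
from functools import reduce

def build_filter(selections):
    """Combine the chosen filters into one predicate; 'All' adds no restriction."""
    # Base predicate that accepts everything, like db.t_walkthrough.id>0.
    predicates = [lambda row: True]
    for field, value in selections.items():
        if value != 'All':
            # Bind field and value as defaults so each lambda keeps its own pair.
            predicates.append(lambda row, f=field, v=value: row[f] == v)
    # Fold the list pairwise, the same way reduce(lambda a,b:(a&b),queries) does.
    return reduce(lambda a, b: (lambda row: a(row) and b(row)), predicates)

rows = [
    {"school": 1, "semester": "Fall"},
    {"school": 2, "semester": "Fall"},
    {"school": 1, "semester": "Spring"},
]
keep = build_filter({"school": 1, "semester": "All"})
matched = [r for r in rows if keep(r)]
```

Starting the list with a condition that always holds means reduce never receives an empty list, even when every selection is 'All'.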
The core of the iterative process is the query below, which pulls a chunk of rows (size defined by summary_m) each cycle.
rows = db(summary_report_query).select(limitby=(session.summary_i*session.summary_m,(session.summary_i+1)*session.summary_m))
This performs our operation in chunks. Once there are no more rows, the function changes state and returns a link to the summary_reports_complete page, which parses the results and presents them to the user. If no results are produced, the function tells the user so.
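The limitby argument takes a (start, stop) pair of row offsets, so consecutive chunk numbers produce adjacent windows. A small sketch of that arithmetic, where `limitby_window` is a hypothetical helper and 300 is the chunk size used above:

```python
def limitby_window(i, m):
    """Return the (start, stop) offsets for chunk number i with chunk size m."""
    return (i * m, (i + 1) * m)

# The first three windows for a chunk size of 300.
windows = [limitby_window(i, 300) for i in range(3)]
# windows == [(0, 300), (300, 600), (600, 900)]
```

Each window starts where the previous one stops, so no row is skipped or read twice as long as the underlying result set does not change between calls.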
This approach will let you perform queries or db updates against a very large set of records without triggering the timeout.