Introduction
This story starts with a bug report from a client about an API that returns a list of the content in their account. Because they have a large amount of content, the API's performance was poor, taking at least 20 seconds to serve each request.
First impressions
After confirming the problem (it was horribly slow) I jumped into the code. Below is a simplified example of how the code worked, with comments to help explain what each part does.
```python
def get_content_api(limit=1000):
    # Fetch all data from the database
    content = get_all_content()

    # Remove hidden content
    content = [c for c in content if not c["hidden"]]

    # Apply the result size limit
    content = content[:limit]

    # Prepare the data ready to be returned
    for item in content:
        item["date"] = item["date"].isoformat()
        item["url"] = "%s/content/%s" % (get_domain(), item["id"])

    # Return the data
    return JsonResponse(content)
```
There is an immediate problem that jumped out at me. The function get_all_content fetches every piece of content for that client from the database (in this case over 100,000 items), and only then does the code apply the filter and page limit, cutting the result down to just the first 1,000 items.
This is clearly a huge problem: we have to wait while the database fetches all of this data and transfers it over the network to the application server, only for 99% of it to be thrown away immediately. Databases are designed to do exactly this kind of filtering efficiently, so the filtering and limiting should be happening in the query itself. Happily it's a fairly quick fix.
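As a rough sketch of what that fix could look like, assuming a Django-style ORM with a Content model and a hidden flag (names I'm inventing here, since the real query layer isn't shown), the filter and limit can be pushed into the query so only the rows we actually need ever leave the database:

```python
# Hypothetical sketch only: the real project may use raw SQL or a
# different ORM. The point is that the WHERE clause and LIMIT run in
# the database, so at most `limit` rows cross the network.
def get_visible_content(limit=1000):
    return list(
        Content.objects.filter(hidden=False).values("id", "date")[:limit]
    )
```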
Benchmarking
But the most important rule of performance optimisation is to benchmark the code first, to make sure any improvements are aimed at the actual cause of the problem. Even if the problem seems obvious.
Especially if the problem seems obvious.
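Nothing sophisticated is needed for this: a throwaway timer around each step is enough to show where the time goes. A minimal sketch of the kind of thing I mean (the labels and structure are mine, not the project's):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Print how long the wrapped block took.
    start = time.perf_counter()
    yield
    print("%s: %.3f s" % (label, time.perf_counter() - start))

# Usage inside the view:
#   with timed("fetch"):
#       content = get_all_content()
#   with timed("prepare"):
#       ... the transformation loop ...
```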
Below are the results, showing how long was spent in each part of the code.
```python
def get_content_api(limit=1000):
    # Fetch all data from the database          -> ~100 ms
    content = get_all_content()

    # Remove hidden content                     -> negligible
    content = [c for c in content if not c["hidden"]]

    # Apply the result size limit               -> negligible
    content = content[:limit]

    # Prepare the data ready to be returned     -> ~20 seconds (!)
    for item in content:
        item["date"] = item["date"].isoformat()
        item["url"] = "%s/content/%s" % (get_domain(), item["id"])

    # Return the data
    return JsonResponse(content)
```
Much to my surprise, get_all_content takes only 100 ms to fetch 10,000+ items from the database. That could definitely be improved, but clearly something else needs to be looked at first.
The real problem
Once all the data has been fetched, the code does some simple transformations on each item to prepare the data to be returned. Or not so simple, as it turns out, since it takes multiple seconds to run through all the items.
On further inspection the problem became apparent: the call to get_domain(). This function queries the database to look up the domain the client uses so that it can build the "url" field. The amount of data it fetches is tiny, but it is called once per item, and the network latency multiplied by the number of items being processed is enough to completely ruin the performance.
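To put rough numbers on it (my own back-of-envelope figures, not measurements from the report): with a limit of 1,000 items, even a modest 20 ms database round trip per get_domain() call works out at 1,000 × 20 ms = 20 seconds, which lines up with the response times the client was seeing.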
Much to my joy, this is probably the easiest performance enhancement I have ever made. The domain is the same for every item, so it only needs to be called once!
```python
def get_content_api(limit=1000):
    # Fetch all data from the database
    content = get_all_content()

    # Remove hidden content
    content = [c for c in content if not c["hidden"]]

    # Apply the result size limit
    content = content[:limit]

    # Prepare the data ready to be returned
    domain = get_domain()  # one database lookup instead of one per item
    for item in content:
        item["date"] = item["date"].isoformat()
        item["url"] = "%s/content/%s" % (domain, item["id"])

    # Return the data
    return JsonResponse(content)
```
With that small change performance went from 20 seconds to just 120 milliseconds.
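Hoisting the call out of the loop was the right fix here. If get_domain() were called from lots of other places, another option (a sketch of my own, not what I shipped) would be to memoise it so repeated calls skip the database entirely:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_domain():
    # First call hits the database; later calls return the cached value.
    # Beware: the cache lives for the lifetime of the process, so a
    # changed domain wouldn't show up until a restart or cache_clear().
    return fetch_domain_from_db()  # hypothetical helper
```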
Result: Another happy customer.
Conclusion
So the moral of the story? Always check you are applying the performance improvements to the right part of the code. No matter how sure you are.
And yes, the API still fetches all the data and filters it in the code. As much as I'd like to fix that properly, there isn't time to go around making performance improvements that save a few milliseconds when everyone is happy to wait 120 ms.