11K rows, 83 columns, many joins, export as excel/csv over the edge
My main table seems to have grown beyond what the current approach can handle: dumping the whole table to a spreadsheet for the client's weekly report.
Increasing the PHP memory limit and the Apache execution time to 10 minutes kept things working through a few increases in table size, but no longer does.
I'm using a lengthMenu entry of up to 15,000 items to draw the whole table in DataTables in the browser, then exporting to XLS/CSV so that all joined sub-tables are exploded as text.
Seems that Join.php is the culprit, a large dataset being the trigger.
Server: Amazon EC2 t2.micro, 1 GB RAM, Bitnami HHVM stack
php.ini: memory_limit = 768M
Apache (via php-fpm-apache.conf) has an increased script execution time:
<Proxy "unix:/opt/bitnami/php/var/run/www.sock|fcgi://www-fpm" timeout=600>
The program "top" shows memory usage running very close to 1 GB when the request to draw the whole table executes.
Even at 10 minutes I still get
[proxy_fcgi:error] [pid 8086:tid 139767471523584] (70007)The timeout specified has expired:
which the user sees as a 503 error page and invalid JSON.
So I don't think increasing the execution time past 10 minutes will help the server any.
I'm looking for ideas on how to get the server-side CSV generation to succeed. Is it simply a matter of needing more RAM?
thanks.
Replies
I will send a link to the debug data via email.
Hi,
Thanks for the details and e-mail. The issue here, as you say, is that the database is getting a little too large to easily dump out to the client-side with the PHP libraries for Editor.
The Join class is always going to be the one that runs out of memory first, due to how it operates: rather than having the SQL server perform the join, for portability between the various SQL engines that the libraries support, it pulls the information from the database and does the join in PHP. That of course costs memory. I hadn't expected it to hit the limit with as few as 15k records (although you do have a lot of columns), but it depends on what the join information is, how many joins there are, etc.
If possible, leftJoin is the way to go since that join is done in SQL, but it only allows a single item, rather than the one-to-many relationship of the Join class.
I think this is one where the pre-built libraries are being pushed beyond what they were designed for. Indeed, creating a 15k row Excel file on the client-side isn't going to be fast either.
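For reference, a one-to-one leftJoin with the Editor PHP libraries looks roughly like the sketch below (the table and field names are made up for illustration); the join is expressed in SQL and run by the database, rather than assembled row by row in PHP memory:

```php
// Minimal sketch only - assumes the usual Editor bootstrap, e.g.
//   include( "lib/DataTables.php" );
//   use DataTables\Editor, DataTables\Editor\Field;

Editor::inst( $db, 'main' )
    ->fields(
        Field::inst( 'main.title' ),
        Field::inst( 'main.category_id' ),
        Field::inst( 'categories.name' )
    )
    ->leftJoin( 'categories', 'categories.id', '=', 'main.category_id' )
    ->process( $_POST )
    ->json();
```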
It is around this point that I think you'll need to start looking into generating the Excel / CSV file on the server and providing a way for the client-side to download it. You could send the parameters from the last server-side processing draw request so the file could be created with the same sorting, etc., if that is required.
Even if we were to find a way to improve the Join memory performance (I will take a look at this), sooner rather than later I think you'll just hit exactly the same issue as the data continues to expand. For server-side processing tables, this really needs a server-side solution.
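As a very rough sketch of the "same sorting" idea (assuming the standard server-side processing request parameters are posted to the export script, and using a hypothetical column whitelist), the export could turn the last draw's ordering into an ORDER BY clause:

```php
// Rough sketch only - assumes the standard DataTables server-side
// processing parameters (order[i][column], order[i][dir],
// columns[i][data]) are posted to this export script.

$allowed = array('id', 'title', 'created');   // hypothetical column whitelist

$orderBy = array();
if ( isset($_POST['order']) ) {
    foreach ( $_POST['order'] as $o ) {
        $idx = (int) $o['column'];
        $col = isset($_POST['columns'][$idx]['data'])
            ? $_POST['columns'][$idx]['data']
            : null;
        $dir = ( isset($o['dir']) && strtolower($o['dir']) === 'desc' )
            ? 'DESC'
            : 'ASC';

        // Only accept known column names to avoid SQL injection
        if ( $col !== null && in_array($col, $allowed, true) ) {
            $orderBy[] = "`$col` $dir";
        }
    }
}

$orderSql = $orderBy ? ' ORDER BY ' . implode(', ', $orderBy) : '';
```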
Does that sound do-able?
Allan
Thanks Allan,
I'm not quite sure how to build what you are proposing: capturing the query that the server is attempting for the operation that is timing out, converting it to a leftJoin-based query that MySQL can execute directly, and exporting the CSV via that direct query? I could have a button that executes this query and generates the CSV file on the server for download later?
So it would be a "dump all" button that performed the export on the server.
A server-only solution would be fine; I'm just not quite sure how to do the "generate the query" bit.
thanks.
I have created a SQL query that is the same as the big join the table would be doing, and it stalls MySQL as well. So I might just be exposing how poorly my growing DB schema has been designed.
The query joins to 28 tables for 28 of the 83 columns, so that will hurt memory for sure. This is not DT territory, but would rewriting it as left joins make it efficient enough?
Just to quickly update the conversation here so it doesn't look truncated: we talked by e-mail, and it sounds like the left join rewrite of the query was spot on. "Blindingly fast" :-)
In general, if you have a one-to-one relationship, then a left join is the way forward for optimal performance.
Regards,
Allan
Yes, just to complete the thread.
I moved from using the browser to generate the CSV file (by getting the whole table drawn in one browser window, which ended up using Join.php in the PHP backend libraries) to a custom PHP script that runs a hand-coded equivalent SQL query with LEFT JOIN ... ON, plus a loop with fputcsv to handle special characters, generating the CSV on the fly. It was very fast indeed: an instant download of the whole table.
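For anyone finding this later, that approach could look something like the sketch below. The DSN, credentials, table and column names are invented for illustration; the real script uses the application's own schema.

```php
<?php
// Sketch of a standalone export script: run one hand-written LEFT JOIN
// query and stream the result straight to the browser as CSV.

$pdo = new PDO(
    'mysql:host=localhost;dbname=app;charset=utf8',   // hypothetical DSN
    'dbuser',
    'dbpass'
);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The joins are done by MySQL itself, one LEFT JOIN per lookup table,
// instead of being assembled in PHP memory.
$sql = "
    SELECT m.id, m.title, c.name AS category, s.name AS status
    FROM   main m
    LEFT JOIN categories c ON c.id = m.category_id
    LEFT JOIN statuses   s ON s.id = m.status_id
    ORDER BY m.id
";

header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="export.csv"');

$out  = fopen('php://output', 'w');
$stmt = $pdo->query($sql);

$first = $stmt->fetch(PDO::FETCH_ASSOC);
if ($first !== false) {
    // Header row taken from the result set's column names
    fputcsv($out, array_keys($first));
    fputcsv($out, $first);

    // fputcsv quotes commas, quotes and newlines inside field values,
    // so special characters in the data are handled for us.
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        fputcsv($out, $row);
    }
}

fclose($out);
```

If memory were still tight, an unbuffered query (setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false) would also keep the full result set out of PHP memory while streaming.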