Speeding up large tables without server-side processing

Thefinaleofseem · Posts: 11 · Questions: 0 · Answers: 0
edited August 2013 in DataTables 1.9
I've been involved in a project that uses DataTables 1.9.4. We've meshed it together with AngularJS and have been happy with the results. However, the datasets being loaded into it have continued to grow to the point where things are getting a bit ugly. We're looking at close to 10k records across 10-15 columns in some cases, maybe even more.

As you can imagine, that has bogged down the loading of tables pretty badly, sometimes requiring 2-3 clicks through the "slow script" warning before a table loads. I've made a few tweaks, such as enabling deferred rendering, disabling bSortClasses, and going through the whole thing with the Firefox JS profiler to fix some slow functions in our formatting code. That has certainly helped, but larger tables are still bringing up the slow script warning. The profiler shows the vast majority of the time being spent in jQuery, but it's difficult to tell exactly why or how.

Server-side processing might be viable down the road, but it would take far too much work and mean gutting much of what has been built so far, so it's not going to happen for some time. I played with setTimeout() and adding data to the table a bit at a time after loading an initial chunk, i.e., if a table will have 500+ rows, load the first 500, render the table with them, and then use setTimeout to load the remainder in 100-row chunks to avoid blocking the browser (roughly the approach sketched below). This sort of worked, but adding rows after the table is built is quite slow compared to simply building it outright, and the browser was chugging a bit between additions. I'm wondering if I can work around this with a Web Worker, but doing so without seriously overhauling the implementation doesn't seem possible.
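
Stripped down, what I tried looks roughly like this (getPreprocessedRows() is a hypothetical stand-in for our real data source):

    // Build the table with an initial slice of rows, then feed the rest
    // in via setTimeout so the browser gets a chance to breathe.
    var aAllRows = getPreprocessedRows();
    var INITIAL = 500, CHUNK = 100;

    var oTable = $('#ourTable').dataTable({
        aaData: aAllRows.slice(0, INITIAL)
        // ...the rest of the settings listed below...
    });

    function addChunk(iStart) {
        if (iStart >= aAllRows.length) { return; }
        // fnAddData takes an array of rows; the second argument redraws.
        oTable.fnAddData(aAllRows.slice(iStart, iStart + CHUNK), true);
        setTimeout(function () { addChunk(iStart + CHUNK); }, 0);
    }
    setTimeout(function () { addChunk(INITIAL); }, 0);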

Are there any further tips to speed this thing up?

For a bit of additional detail, here are the settings we're using (I can't post the code as there's simply too much around it):

Columns/ColumnDefs/Column data: loaded in from preprocessed JSON
Sorting: tables have default sorting; tried presorting with the setTimeout approach, still a bit chuggy
bJQueryUI: true
bAutoWidth: false
bSortClasses: false
bDeferRender: true
sPaginationType: "full_numbers"
iDisplayLength: 20
oLanguage: {...}
oTableTools: {...}
fnDrawCallback: function() { /* a few checks and tweaks depending on the table shown */ }
fnRowCallback: function() { /* styles applied depending on data to conditionally include/exclude some rows after the table is built, e.g., a checkbox to hide certain rows */ }

There are some additional methods that run afterward to insert custom elements and styles once the table is set up.
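
Pulled together, the initialisation looks roughly like this (a sketch only - aColumnDefs and aDataSet stand in for the preprocessed JSON, and the callback bodies are elided):

    var oTable = $('#ourTable').dataTable({
        aoColumnDefs: aColumnDefs,   // from preprocessed JSON
        aaData: aDataSet,            // from preprocessed JSON
        bJQueryUI: true,
        bAutoWidth: false,
        bSortClasses: false,
        bDeferRender: true,
        sPaginationType: "full_numbers",
        iDisplayLength: 20,
        oLanguage: { /* ... */ },
        oTableTools: { /* ... */ },
        fnDrawCallback: function () {
            // a few checks and tweaks depending on the table shown
        },
        fnRowCallback: function (nRow, aData) {
            // styles applied depending on row data, e.g. a class that a
            // checkbox later uses to hide certain rows
            return nRow;
        }
    });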

Replies

  • allan · Posts: 63,498 · Questions: 1 · Answers: 10,470 · Site admin
    Depending on what you have in fnRowCallback and fnDrawCallback, they can sometimes be very slow. However, ultimately, yes, a web worker probably is a good approach to adding chunks of data. Having said that, 10k records shouldn't bog it down too much. This example has 50,000 rows and is quite fast: http://datatables.net/release-datatables/extras/Scroller/large_js_source.html .

    So I'm not too sure what is slowing your table down.

    Allan
  • Thefinaleofseem · Posts: 11 · Questions: 0 · Answers: 0
    edited August 2013
    It depends on the browser version, really. Modern browsers don't do too badly, but we also have to support some older ones (thankfully, not quite so old that they won't support a web worker), which is generally where we're seeing the slow script warnings. Some of these tables are close to 20 columns, too.

    In this table, fnDrawCallback only executes if a particular table is selected, so it's not especially relevant. fnRowCallback is much the same. The only thing these would do in many cases is an equality check against a string.

    There are a number of post-construction actions (attaching listeners and classes, perhaps adding a custom UI element to the header, adding per-column filters, etc.), but these don't seem to be where the worst of it is going; commenting them out does little to help. It should also be noted that the table is not built when the page loads in most cases. Usually the table is empty until the user clicks on an element, which then destroys the existing table and creates a new one with the data.

    I'm still tossing the web worker idea around in my head, but I'm not sure how it could work unless I prepare the entire table's data in a separate thread and keep a loading indicator up until it's complete, then pass it back and build the table (something like the sketch below). It would keep the browser from blocking, at least. Adding data in chunks to an existing table is pretty slow (definitely slower than building and formatting rows prior to insertion), and it seems to get worse as the table gets larger. I don't know if a web worker can get around that without pulling the whole job into a new thread, and then the question remains of how to deal with that...
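
    Something like this is what I have in mind (just a sketch; a worker can't touch the DOM, so it can only prepare the row data - format.worker.js, formatRows(), and rawJsonRows are hypothetical stand-ins for our own code):

        // Main thread: hand the raw JSON to a worker and only build the
        // table once the formatted rows come back, so the UI never blocks.
        var worker = new Worker('format.worker.js');
        worker.onmessage = function (e) {
            $('#ourTable').dataTable({
                aaData: e.data // rows formatted off the main thread
                // ...the rest of our settings...
            });
        };
        worker.postMessage(rawJsonRows);

        // format.worker.js: no DOM access in here, so it prepares the
        // data rather than building the table itself.
        self.onmessage = function (e) {
            self.postMessage(formatRows(e.data));
        };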
  • allan · Posts: 63,498 · Questions: 1 · Answers: 10,470 · Site admin
    Out of interest, could you try the 1.10-pre development version of DataTables and see if that helps you at all? There are a number of changes that will speed performance up (and also likely a couple which will slow it back down - but overall it should be an increase!). I expect the DOM interaction is the slowest part of the table, though (you don't have any custom filters or sorting or anything, do you?).

    1.10-pre is here: https://github.com/DataTables/DataTables/tree/1_10_wip/media/js

    Allan
  • Thefinaleofseem · Posts: 11 · Questions: 0 · Answers: 0
    The table has a few default sort columns, but the difference in time between those and presorting the data before the table is built is nonexistent. We're not using custom sort types. A few tables have default filters, but not all of them, and that doesn't seem to make a great deal of difference, either; they're pretty vanilla filters based on a class attached during construction, and tables without this filtering still stall out on giant datasets. The profiler shows the vast majority of time being eaten up within jQuery.

    I tried the dev version. There's a minor speed improvement, but from the quick check I did, I don't think it goes very far into the double-digit percentage range. Definitely helpful, but still not the full picture. I suppose that comes with the territory when dealing with older and slower JavaScript engines. I'll have to do some more tinkering with a web worker to see if I can figure out how to break this thing up.
  • allan · Posts: 63,498 · Questions: 1 · Answers: 10,470 · Site admin
    It would be interesting to know why it is spending a lot of time in jQuery. DataTables is obviously a jQuery plug-in, but I've tried to optimise its use of jQuery where possible.

    I think I'd need a link to the page to be able to offer any more help.

    Thanks,
    Allan
  • Thefinaleofseem · Posts: 11 · Questions: 0 · Answers: 0
    edited August 2013
    Unfortunately, I don't have a page I can link to. However, after going through the profilers with the unminified versions of everything, I think I may have found a couple of potential culprits. Bear in mind that these appear to be pretty negligible on a modern browser (a 5k+ row table will render in less than 1.5 seconds) but can bog down older ones:

    _fnBuildSearchRow: This strips out any HTML when constructing the search array, but a table can have HTML in every row, and 5k+ calls to jQuery's .html() function add up. Commenting this bit out improved rendering speed by about 20%. Many tables may have HTML in every row even though the original dataset doesn't contain that formatting, i.e., aoColumnDefs/aoColumns is used to add the HTML. Would it be possible to pass a separate dataset for the search array? Doing so and excluding any HTML-adding functions could speed things up considerably. Maybe this option already exists and I've missed it.

    In addition, I'm seeing quite a bit of jQuery.extend() in the profiler as part of the _fnAddData() method. This is taking a bit over 10% of the overall time (along with another .extend() in the each() loop at line 6366 that's eating another 10%, though I can't pinpoint exactly where). Looking through the source, it appears the data is being copied so it can be manipulated and eventually placed into the table without disturbing the original dataset. In our particular case, though, I don't believe any data is changed: the formatting functions we use don't modify the dataset directly but return new data based on it. It would depend on whether there's a clear enough use case to justify it, but would it be possible to draw directly from the original dataset without copying, perhaps controlled by a flag or some checks beforehand?

    I could be completely off the rails with these, as I'm not hugely familiar with the DataTables source and I'm not an expert in JavaScript, but I figured they might be worth mentioning. This sort of thing may also be relatively unique to the specific project I'm working on.

    Another possibility that springs to mind is to break up the table initialization/construction in the DataTables plugin itself with web workers. I would think there are a number of processes that could run in parallel. Again, that's just me speculating.
  • allan · Posts: 63,498 · Questions: 1 · Answers: 10,470 · Site admin
    edited August 2013
    - _fnBuildSearchRow - this whole function has been removed in 1.10, but there is still an HTML decoder present. It now uses a statically created `div`, so it won't need to create a new element each time, which will save a bit of time, but it still does an HTML write and then a read. I'm not aware of a faster way to decode HTML entities in JavaScript, unfortunately.

    > Would it be possible to pass a separate dataset for the search array?

    Yes - you can use mRender / mData to send back filter specific data: http://datatables.net/blog/Orthogonal_data
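
    For example, something along these lines (schematic only - the markup is just a stand-in for whatever your formatting functions produce):

        aoColumnDefs: [ {
            aTargets: [ 0 ],
            mData: "name",
            mRender: function ( data, type, full ) {
                // Return plain text for filtering and sorting so the
                // search array never contains HTML for this column.
                if ( type === 'filter' || type === 'sort' ) {
                    return data;
                }
                // 'display' (and anything else) gets the marked-up version.
                return '<strong>' + data + '</strong>';
            }
        } ]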

    - .extend() in _fnAddData - the extend has been completely removed in 1.10 and not replaced - the original data is used and not copied as it was in 1.9 and earlier.

    - Web workers - yes, it is an option, but since most people's browsers don't support web workers, it's unlikely to be in DataTables any time soon.

    Allan
  • allan · Posts: 63,498 · Questions: 1 · Answers: 10,470 · Site admin
    edited August 2013
    Following on from the HTML decode aspect, I've been looking at this a little more. I'm still not aware of any way other than running it through a DOM element (a regex, I thought, is nigh on impossible (*edit* - talking rubbish - see below), since < and > can be valid in the search string, but tag stripping would typically remove them; named entities and numeric entities make it a bit more difficult).

    I've put this little test case together to see what the best way is: http://jsperf.com/html-decode . It currently only uses two methods, jQuery and direct DOM, and obviously the direct DOM is much faster. I've just committed that change into DataTables 1.10.
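
    The direct DOM approach is essentially just this (a sketch - the names are illustrative, not the ones in the source):

        // Reuse a single statically created node rather than building a
        // new jQuery object for every row.
        var nDecode = document.createElement('div');

        function fnHtmlDecode( sData ) {
            nDecode.innerHTML = sData;
            // textContent strips the tags and decodes the entities;
            // innerText is the fallback for older IE.
            return nDecode.textContent || nDecode.innerText;
        }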

    Allan
  • allanallan Posts: 63,498Questions: 1Answers: 10,470 Site admin
    A bit more on this topic... :-).

    I've added two more options to the jsperf test case:

    3. Using a bit of regex to remove tags and then decoding via a text box. This is massively faster than DOM / jQuery in Chrome / Safari, a bit faster in Firefox, and faster in IE also, though not by much.

    4. Using pure regex - a function to decode numbers and another to decode entities. This is very fast in all browsers.

    So putting option 4 in is very tempting, but there are 252 named HTML entities (according to Wikipedia - http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references ). Having those all defined in the DataTables source would be bad news, since they would take up a fair amount of space.

    I guess one option would be to have a list of some known entities (the common ones) and then fall back to another method if an unknown entity is found (probably just using jQuery in that case since the code required is tiny).
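
    Roughly what I have in mind (just a sketch, untested - the entity list is illustrative rather than a final set, and hex numeric entities are omitted for brevity):

        // Decode decimal numeric entities with a regex, look up a small
        // table of common named entities, and fall back to the (slower)
        // DOM route only when an unknown entity turns up.
        var oEntities = { amp: '&', lt: '<', gt: '>', quot: '"', nbsp: '\u00a0' };

        function fnDecodeEntities( sData ) {
            return sData.replace( /&#(\d+);|&(\w+);/g, function (sMatch, sNum, sName) {
                if ( sNum ) {
                    return String.fromCharCode( +sNum );
                }
                if ( oEntities[ sName ] !== undefined ) {
                    return oEntities[ sName ];
                }
                // Unknown named entity - the tiny jQuery fallback.
                return $('<div/>').html( sMatch ).text();
            } );
        }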

    Need to think about this a bit more before committing it in though.

    Any thoughts on the subject are very welcome!

    Regards,
    Allan
  • Thefinaleofseem · Posts: 11 · Questions: 0 · Answers: 0
    Gah, I didn't make the connection between the type variable in the formatting functions and the search array. Applying a check for sort or filter just solved that problem! It still gets a few errant results with "&" in them, but it's much improved. As a side benefit, this can also work around a few formatting functions that might be a tad slow themselves, since the function may only need to return a few relevant bits of text.

    I would think that with the considerable speed benefit of regex, keeping a certain number of more common entities would definitely be preferable. One or two dozen will do little to affect the size of a minified plugin. That's just one person's opinion, though.

    Thanks for all the help! Between this, the sort classes, and deferred rendering, the display of the tables has been dramatically improved. It's still a few seconds for the largest ones, but I think the slow script warning might just go away completely at this point.
This discussion has been closed.