Tips For Using DataTables with VERY Large Data Sets

elite-rob · Posts: 26 · Questions: 0 · Answers: 0
edited February 2012 in General
Hello All,

DataTables is amazing... we all know that. :)

But, my developers are telling me they are running into some performance issues when using DataTables with really large data sets (more than 100,000 rows).

We generally use the "Infinite Scrolling" function, but apparently that just isn't working too well.

My gut tells me there are people in this forum who have used DataTables with really large data sets, so I'm wondering if anyone has any advice, tips, tricks, best practices or a general push in the right direction with regards to something we should try.

In case it makes a difference, the application is in .Net and uses MS SQL as the DB.

Any insight is greatly appreciated!

I really need our application to be blazing fast, so any way we can get this working with the "need for speed" would be great!

Cheers! Thanks in advance!

Replies

  • allan · Posts: 63,439 · Questions: 1 · Answers: 10,459 · Site admin
    edited February 2012
    With really large data sets it is worth considering server-side processing: http://datatables.net/release-datatables/examples/data_sources/server_side.html . With this method, since all the processing is done on the server by an SQL engine which is designed for exactly this sort of thing (unlike JavaScript!), it is super fast and can cope with massive data sets (I've seen DataTables using 20 million+ records). The trade-off is that you need an XHR request for each table draw, which might or might not be acceptable given your setup.

    I'd also, personally, tend to avoid infinite scrolling - I don't like it as an interface paradigm for a table - personally I much prefer Scroller: http://datatables.net/extras/scroller/ . That can be used with large data sets with data being loaded by Ajax and deferred rendering enabled.

    Also turn off bSortClasses as that adds a lot of DOM overhead to large tables.

    Allan
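For reference, enabling server-side processing with the 1.9-era options discussed in this thread is mostly an initialisation change on the client. A minimal sketch, assuming a hypothetical `#report` table and `/reports/data` endpoint (neither is from this thread):

```javascript
// Minimal DataTables 1.9-style init for server-side processing.
// "/reports/data" is a placeholder; the real endpoint must speak the
// server-side protocol (sEcho, iDisplayStart, iDisplayLength, etc.).
$('#report').dataTable({
    "bServerSide": true,            // every draw asks the server for rows
    "sAjaxSource": "/reports/data", // endpoint returning the JSON protocol
    "bDeferRender": true,           // only build DOM nodes for rows shown
    "bSortClasses": false           // per Allan: avoid the extra DOM work
});
```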
  • elite-rob
    Thanks Allan!

    As always, your help is so greatly appreciated.

    I've passed your note along to my developers and I hope this is exactly what we needed!

    Cheers!
  • elite-rob
    Hi Allan,

    I just had a long meeting with my development team because they are a little stuck and despite their very best efforts, they can't seem to figure it out.

    As per your suggestion (because we are dealing with large data sets) they successfully implemented Server-Side Processing using the documentation (http://datatables.net/usage/server-side) and example (http://datatables.net/release-datatables/examples/data_sources/server_side.html).

    That is all working well.

    The issue we run into is when we try to combine Server-Side Processing with Scroller (http://datatables.net/extras/scroller/).
    [Previously we were trying Infinite Scrolling (http://datatables.net/release-datatables/examples/basic_init/scroll_y_infinite.html) but switched our focus to "scroller" based on your previous post.]

    My developers are basically saying that with Scroller, it seems like they have to pre-populate the data into a text file. This way as it scrolls it pulls data from the text file.

    But, with "server-side processing" it is not generating an actual file, it is simply pulling from the DB as the data is required.

    If we use a paging function (with previous and next buttons) then they can get this to work. This is because on-click, it triggers the server-side processing script to go get the next batch of data.

    The core issue is that we cannot figure out how to trigger the server-side processing without a paging function. No matter what we try, we cannot seem to get server-side processing to load the next set of data "on SCROLL" as opposed to "on CLICK".

    Is it possible that server-side processing and scroller just can't be mixed?

    Or, are we just missing something?

    We've certainly been trying our best but if you have any ideas or can give us a push in the right direction, that would be great.

    As always, I appreciate your help!

    Cheers!
  • allan (Site admin)
    Have you looked at this example: http://datatables.net/release-datatables/extras/Scroller/server-side_processing.html - Scroller and server-side processing should play nicely together. In fact there should be no change needed from a non-Scroller table - just enable Scroller.

    Allan
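As the linked example suggests, combining the two is again mostly an initialisation change: Scroller issues the same Ajax requests paging does, just triggered by scroll position rather than Previous/Next clicks. A hedged sketch, with the endpoint and viewport height as placeholders:

```javascript
// Server-side processing plus Scroller (DataTables 1.9-era options).
// Adding "S" to sDom activates the Scroller extra, which fires new
// server-side requests as the user scrolls through the table.
$('#report').dataTable({
    "bServerSide": true,
    "sAjaxSource": "/reports/data",  // hypothetical endpoint
    "sDom": "frtiS",                 // "S" enables Scroller
    "bDeferRender": true,
    "sScrollY": "400px"              // scrolling viewport height
});
```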
  • elite-rob
    Hi Allan,

    Thanks for the super fast response. As always, you are the *best*! =)

    I'm not sure if my developers have seen that example, but I'm shooting it over to them right now.

    It really does seem like there isn't much of a change to combine Scroller and Server-Side Processing, so I wonder what roadblock they were hitting.

    I did want to ask you one other thing that I'm still a bit confused about...

    My understanding is that with Server-Side Processing, it doesn't have to generate the entire report in full on the server-side but rather can pass the data to the user's browser as needed.

    I thought this was the reason we get a big speed boost... so instead of having to require the server to generate the full list of data for 100,000 rows, it begins to generate the data, but as soon as it gets to 25 rows, it starts passing that to the user's browser (in DataTables).

    If that is the case, then what I don't understand is how in this example (http://datatables.net/release-datatables/extras/Scroller/server-side_processing.html) it knows that there are a total of 10,000 entries.

    Conceivably the server is still compiling the data even as it's passing information to DataTables, so it wouldn't actually know the total amount of data in the data set.

    Am I thinking about this the wrong way?
    Does the server actually have to generate the FULL report prior to starting to send information to DataTables?

    Thanks again...

    Cheers!
  • allan (Site admin)
    > what I don't understand is how in this example it knows that there are a total of 10,000 entries.

    Because of the iTotalRecords and iTotalDisplayRecords parameters reported back by the server. Have a look at: http://datatables.net/usage/server-side . Also this one: http://datatables.net/usage/#data_sources .

    > it begins to generate the data, but as soon as it gets to 25 rows, it starts passing that to the user's browser (in DataTables).

    Is this data in a static file, or in a DB?

    Allan
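To make the two counts concrete, here is a small sketch of the server's job in the 1.9 protocol, written over an in-memory array as a stand-in for the SQL queries a real implementation would run: iTotalRecords is the unfiltered row count, iTotalDisplayRecords the count after filtering, and aaData contains only the requested page. This is how the client "knows" there are 10,000 entries without ever receiving them all.

```javascript
// Sketch of a 1.9-protocol server-side response built from an
// in-memory data set (rows are arrays of cell values).
function serverSideResponse(data, params) {
    // Filter first: iTotalDisplayRecords counts the filtered set.
    var filtered = params.sSearch
        ? data.filter(function (row) {
              return row.some(function (cell) {
                  return String(cell).indexOf(params.sSearch) !== -1;
              });
          })
        : data;

    return {
        "sEcho": params.sEcho,                   // echoed back for draw sequencing
        "iTotalRecords": data.length,            // total rows before filtering
        "iTotalDisplayRecords": filtered.length, // rows after filtering
        "aaData": filtered.slice(params.iDisplayStart,
                                 params.iDisplayStart + params.iDisplayLength)
    };
}
```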
  • elite-rob
    Hi Allan,

    Thank you for your continued assistance. I know I always say it, but your help is greatly appreciated!

    I've seen "iTotalRecords" and "iTotalDisplayRecords" in the documentation, so I'm familiar with those.

    To answer your other question, the data is in a DB, and NOT in a static file.

    My developers did suggest that we first query the DB to create a static file, and then let DataTables pull (using server-side processing) from that file. The issue with that is sometimes we have big data sets, and then we have to wait for our server to first build the static file, and then wait again for the data to appear inside DataTables. So, it ultimately results in a lot of 'waiting'.

    That's why we were hoping to somehow pull the data directly from the DB and feed it to the user's browser even while the DB is still crunching out the remainder of the data set. This way we cut out the whole step of having to generate the whole file first, so the user can start to see data a lot faster.

    Truthfully, I'm open to any ideas you may have. I just know that we want to put our best foot forward on this so that the data gets into the user's hands (via DataTables) as quickly as possible.

    Thanks in advance!
  • allan (Site admin)
    > even while the DB is still crunching out the remainder of the data set

    This is the bit that I don't really get. What would the DB be doing after it has fulfilled the request from the client? Each request should only be getting 10, 25 (or whatever) rows at a time. 100,000 isn't that huge to be honest - I've seen DataTables with server-side processing working with 20 million records no problem at all (although obviously the complexity of the query will affect the performance).

    Allan
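"Each request only gets a page" in practice means turning iDisplayStart / iDisplayLength into a paged DB query. Since this thread targets MS SQL of this era (no OFFSET/FETCH before SQL Server 2012), a sketch of assembling a ROW_NUMBER()-based page query, with made-up table and column names:

```javascript
// Build a T-SQL paged query using the ROW_NUMBER() pattern.
// "report_rows" and its columns are hypothetical. A real
// implementation must parameterise (not concatenate) anything
// user-supplied, to avoid SQL injection.
function buildPageQuery(displayStart, displayLength, sortColumn) {
    var first = displayStart + 1;            // ROW_NUMBER() is 1-based
    var last = displayStart + displayLength;
    return (
        "SELECT id, name, amount FROM (" +
        " SELECT id, name, amount," +
        " ROW_NUMBER() OVER (ORDER BY " + sortColumn + ") AS rn" +
        " FROM report_rows" +
        ") AS paged WHERE rn BETWEEN " + first + " AND " + last
    );
}
```

With this, scrolling to rows 26-50 costs one small query, regardless of whether the table holds 100,000 or 20 million rows (given a suitable index on the sort column).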
  • elite-rob
    Hi Allan,

    Sorry for the confusion...

    Let me try to elaborate, but please do keep in mind that I'm certainly open to all feedback and suggestions about our process.

    The thinking was as follows:

    We run a query on the DB that will generate 100,000 rows of data
    (In order to run this query in full, it's going to take 60 seconds)

    Instead of waiting the full 60 seconds for the DB to compile all the data (and therefore making the customer wait with a 'loading' screen) we wanted to start sending data to DataTables even while the DB is still compiling everything.

    So, after 3 seconds, when the DB has compiled 500 rows of data, we can immediately send that to DataTables so it starts displaying. This way the user can begin to see the first bits of data, even while our DB is still preparing the rest of the results.

    When we do this, however, we do not necessarily know the total number of records that will be compiled in the DB query so we cannot pass that value to DataTables. (This is because we're starting to show data before the DB has finished running the query.)

    Does that make sense?
    Or is my thinking sort of flawed?

    Do we need the DB to finish running the query in full and only then start passing data to DataTables using the server-side method and Scroller like you showed me here: http://datatables.net/release-datatables/extras/Scroller/server-side_processing.html

    This way we know the total number of records before anything starts.

    Let me know... thanks!
  • allan (Site admin)
    > We run a query on the DB that will generate 100,000 rows of data

    Do you run that query for each user, each time they load a page? What is that query needed for - is it combining multiple sources, or pulling information from a log file or something?

    The problem with not waiting for your DB table to be ready is that it is going to be working with incomplete information - i.e. a filter or a sort might not provide the expected results.

    Allan
  • elite-rob
    Hi Allan,

    Sorry for the slow reply. I've been trapped in meetings all day.

    When an end-user loads the page containing DataTables, we query our database to compile the requested report. Sometimes the reports get pretty big and can take a while to build completely.

    That's why I was asking about sending information to DataTables even as the report is being generated by the DB. This way the user wouldn't get stuck waiting for the DB to complete processing the data set and then for the data to be sent to DataTables.

    But, after reading your comment:
    [quote]The problem with not waiting for your DB table to be ready is that it is going to be working with incomplete information - i.e. a filter or a sort might not provide the expected results.[/quote]

    I do understand why any effort to jump the gun and display data before we have the whole data set could pose serious problems, because we won't know the total row count or be able to perform proper searching, sorting, etc.

    So, I guess the best move is to try to optimize our DB query as much as possible so it loads the data set quickly. Then, use the Server-Side Processing functionality of DataTables so that the heavy lifting is handled by our MS SQL server and not the end-user's browser.

    Does that seem like the best plan and implementation?
    Sometimes our reports only have a couple hundred rows and other times it can be a couple hundred thousand.... from everything I read on your site, it seems like for the bigger data sets, doing the server side processing will really make the whole thing run better (without the risk of crashing the user's browser).

    Assuming that makes sense and my developers can make server-side processing work, then after that we'll try to enhance the UI with scroller (http://datatables.net/release-datatables/extras/Scroller/server-side_processing.html).

    I know I always say it, but thank you so much for your help! Hopefully you saw my donation, since I didn't like seeing $0 for this week. I'll for sure donate more in Q2 because I want to help this project continue any way I can.

    Cheers!
  • allan (Site admin)
    Yes, I think that sounds like the way I would suggest doing it. It might be that you can use triggers etc. to update the data set based on other events in the database, so its data stays current and lookups are super fast when required by the user viewing a DataTable. The more you can optimise the read that DataTables does, the faster it will be for the end user.

    Thanks very much for the donation - hugely appreciated :-)

    Allan
This discussion has been closed.