Server-side record count & pagination performance with large datasets

Server-side record count & pagination performance with large datasets

usedlensusedlens Posts: 15Questions: 5Answers: 1
edited August 2020 in Free community support

OnKeyUp Search and pagination is extremely fast here https://usedlens.co.uk using MariaDB

But instead of a straight 55,000 records I'm going to have 1,000,000 records and join to another table with 50,000,000 records.

I've only just started this expansion, I'm already noticing performance issues.

I have three queries

  1. Get results limit 10/25/100 - scaling well so far
  2. Get count of filter results - gets faster as search refines, but not great
  3. Get count of all results (a large subset of the table that the user is searching) - doesn't scale well

I wonder how people have handled large data sets / pagination and record counts.

I see the example on the site has 5,000,000 records with scrolling
https://datatables.net/extensions/scroller/examples/initialisation/server-side_processing.html

Answers

  • allanallan Posts: 63,213Questions: 1Answers: 10,415 Site admin

    That particular example is a bit misleading in terms of performance for getting the data count since it is just using a client-side function to generate the data if you look at the code.

    That said, I have seen DataTables used with 20 million + records. It all comes down to how well the database can perform the required count. In MariaDB I thought 3 would actually be a really fast operation since it stores information about the row count, which some other databases do not. What query are you using to perform that count?

    Allan

  • usedlensusedlens Posts: 15Questions: 5Answers: 1
    edited August 2020

    Hi @allan

    The "total count of records" is not based on the entire table record set but a subsection determined by the country you've selected and whether you're showing local or international results.

    Here's an example of the three queries in order (dynamic options would change things)

    WITH productids AS (select t2.productid from ul_product t2 where ( productorigin = ? ) ORDER BY productname asc )
    SELECT t1.*, retailername, retailersite, retailervarname, TIMESTAMPDIFF(HOUR, productdate, ?) as timeinhours, countryname, countryisotwo, region
    FROM ul_product t1
    INNER JOIN ul_country ON t1.productorigin = ul_country.countryid
    INNER JOIN ul_retailer ON t1.retailerid = ul_retailer.retailerid
    WHERE 1=1
    AND productid IN (select productid from productids)
    AND replace(concat_ws(' ', productname, productkeywords), '/', '') REGEXP '(\bleica)|(leica\b)'
    LIMIT 99 OFFSET 0

    WITH productids AS (select t2.productid from ul_product t2 where ( productorigin = ? ) )
    SELECT COUNT(*) as total
    FROM ul_product t1
    INNER JOIN ul_country ON t1.productorigin = ul_country.countryid
    INNER JOIN ul_retailer ON t1.retailerid = ul_retailer.retailerid
    WHERE 1=1
    AND productid IN (select productid from productids)
    AND replace(concat_ws(' ', productname, productkeywords), '/', '') REGEXP '(\bleica)|(leica\b)'

    WITH productids AS (select t2.productid from ul_product t2 where ( productorigin = ? ) ) SELECT COUNT(*) as total
    FROM ul_product t1
    WHERE 1=1
    AND productid IN (select productid from productids)

    I was trialing the scolling implementation, which I quite like - but having a filtered count is still important to it's functionality.

  • usedlensusedlens Posts: 15Questions: 5Answers: 1

    I could actually find ways to cache the third query, but not the second one - there's too many variations as you type search terms.

    If the final table size is reasonable I could consider create a memory table out of ul_product for use on the front end to speed it up.

  • allanallan Posts: 63,213Questions: 1Answers: 10,415 Site admin

    What I'd suggest is doing the same as in this post - enable debug mode (.debug(true) in PHP, since that thread is about .NET) and then EXPLAIN the queries that are being run. Hopefully it will indicate where an index might make all the difference.

    Allan

  • usedlensusedlens Posts: 15Questions: 5Answers: 1

    For debugging I go direct to my data.cfc and get it through ColdFusion.

    I'm feeling hopeful anyway, I made some SQL improvements this morning.

    Query 1) The ordering didn't work in this position, then indexed the one of the sorting columns that was a performance issue
    Query 2) WITH works great for the first query, but not for the count in the second. Updated second count query to just use joins. (For certain queries this went from 4 seconds to 100ms)
    Query 3) Also removed the WITH and included as joins, also cached the 'total' query count for a minute.

This discussion has been closed.