fuzzySearch on a 44,000-row, 3-column table gets too slow to be useful


tiger1955 (Posts: 2, Questions: 1, Answers: 0)

Link to Test: https://apis.zadapi.com/ICD10
Description of problem:

I have a table with 44,000 rows and 3 columns: the 1st column is 3 to 6 characters, the 2nd is up to about 80 characters, and the 3rd is either a single 4-letter word or blank.

DataTables works great for me, but I wanted to use fuzzySearch to get more hits, especially when the search text contains typos or is an inexact match. I have found that fuzzySearch is too slow (even with threshold=0.5). Is there anything that can be done to speed it up? Perhaps a simpler algorithm than Damerau-Levenshtein, or some preprocessing step?
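One common way to cut the cost of an edit-distance search is to cap the distance: if only matches within a few edits matter, the dynamic-programming table can be abandoned as soon as every cell in a row exceeds the cap, and words whose length differs by more than the cap can be rejected without computing anything. The sketch below is not the fuzzySearch plugin's code; it is a standalone illustration of the optimal string alignment variant of Damerau-Levenshtein (an adjacent transposition counts as one edit), and the names boundedOsaDistance, cellMatches and maxDist are hypothetical. It compares the query against each word of a cell rather than the whole cell, on the assumption that an 80-character description would otherwise never be within a few edits of a short query.

```typescript
// Optimal string alignment (restricted Damerau-Levenshtein) distance with a cap.
// Returns maxDist + 1 as soon as the distance is known to exceed maxDist.
function boundedOsaDistance(a: string, b: string, maxDist: number): number {
  // The length difference is a lower bound on the edit distance.
  if (Math.abs(a.length - b.length) > maxDist) return maxDist + 1;

  const n = a.length;
  const m = b.length;
  // Three rolling rows: two rows back (for transpositions), previous, current.
  let prevPrev = new Array<number>(m + 1).fill(0);
  let prev = Array.from({ length: m + 1 }, (_, j) => j);
  let curr = new Array<number>(m + 1).fill(0);

  for (let i = 1; i <= n; i++) {
    curr[0] = i;
    let rowMin = curr[0];
    for (let j = 1; j <= m; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      let d = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d = Math.min(d, prevPrev[j - 2] + 1); // adjacent transposition
      }
      curr[j] = d;
      if (d < rowMin) rowMin = d;
    }
    // No later row can have a smaller minimum, so stop once the cap is exceeded.
    if (rowMin > maxDist) return maxDist + 1;
    // Rotate buffers: this row becomes "previous", the old "previous" becomes "two back".
    [prevPrev, prev, curr] = [prev, curr, prevPrev];
  }
  return Math.min(prev[m], maxDist + 1);
}

// Hypothetical helper: does any word of the cell lie within maxDist edits of the query?
function cellMatches(query: string, cell: string, maxDist: number): boolean {
  const q = query.toLowerCase();
  return cell
    .toLowerCase()
    .split(/\s+/)
    .some((word) => word.length > 0 && boundedOsaDistance(q, word, maxDist) <= maxDist);
}
```

With maxDist = 1, 'breats' matches a cell containing 'breast' (one transposition), while words of very different length are rejected by the length check before any table is built.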

Finally, when fuzzySearch is used, why isn't the table automatically sorted by the 'similarity' column?
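The behaviour asked about here, sorting by similarity only while a fuzzy term is active and falling back to the natural order otherwise, can be sketched as a pure function. This is an illustration, not DataTables or plugin code: displayOrder and the score callback are hypothetical, and the scorer would be whatever similarity measure the search uses. Ties keep their original position so that equally similar rows don't shuffle between keystrokes.

```typescript
// Returns rows in display order without mutating the input.
// Empty search term: natural table order is kept.
// Non-empty term: rows are sorted by descending similarity score,
// with ties broken by original position so the result is stable.
function displayOrder<T>(
  rows: T[],
  term: string,
  score: (row: T, term: string) => number,
): T[] {
  if (term.trim() === "") return rows.slice();
  return rows
    .map((row, originalIndex) => ({ row, originalIndex, score: score(row, term) }))
    .sort((a, b) => b.score - a.score || a.originalIndex - b.originalIndex)
    .map((entry) => entry.row);
}
```

Inside DataTables the same idea would presumably be wired to its order() API, pointing it at the similarity column while a search term is present; which column that is depends on the plugin setup and is not shown here.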

You can try it at the link above: type in 'breats' (rather than "breast"), and it is too slow.
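A preprocessing step often paired with edit-distance search is a q-gram index (here, character bigrams): built once when the data loads, it lets each keystroke narrow 44,000 rows down to those that share enough bigrams with the query, and only those candidates then need the expensive distance calculation. This is a sketch under stated assumptions, not part of DataTables; buildBigramIndex, candidateRows and the minShared threshold are hypothetical, and a threshold set too high can drop genuine typo matches, so it would need tuning against real queries. For 'breats', whose bigrams are br, re, ea, at and ts, any row containing 'breast' shares br, re and ea.

```typescript
// Distinct character bigrams of a lower-cased string, e.g. "breast" -> br, re, ea, as, st.
function bigrams(s: string): string[] {
  const t = s.toLowerCase();
  const out = new Set<string>();
  for (let i = 0; i + 1 < t.length; i++) out.add(t.slice(i, i + 2));
  return Array.from(out);
}

// Built once when the data is loaded: bigram -> ids of the rows containing it.
function buildBigramIndex(rows: string[]): Map<string, number[]> {
  const index = new Map<string, number[]>();
  rows.forEach((text, rowId) => {
    for (const g of bigrams(text)) {
      const list = index.get(g);
      if (list) list.push(rowId);
      else index.set(g, [rowId]);
    }
  });
  return index;
}

// Per keystroke: ids of rows sharing at least minShared distinct bigrams with the query.
function candidateRows(index: Map<string, number[]>, query: string, minShared: number): number[] {
  const counts = new Map<number, number>();
  for (const g of bigrams(query)) {
    for (const rowId of index.get(g) ?? []) {
      counts.set(rowId, (counts.get(rowId) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, shared]) => shared >= minShared)
    .map(([rowId]) => rowId)
    .sort((a, b) => a - b);
}
```

The surviving candidates would then go through a proper distance check, such as a capped Damerau-Levenshtein comparison, so the filter only has to be cheap, not exact.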

Answers

  • colin (Posts: 15,240, Questions: 1, Answers: 2,599)

    Is there anything that can be done to speed this up? Perhaps use some simpler algorithm than Damerau-Levenshtein, or some other preprocessing step?

    Possibly. We'd be happy to take a pull request if you're able to find a faster algorithm.

    Finally, when fuzzySearch is used, why is the table not automatically sorted by the 'similarity' column?

    The sort order is set either when the table is initialised (the order option) or when the user clicks on a column header. A search on its own won't cause a column to be selected for sorting.

    Colin

  • tiger1955 (Posts: 2, Questions: 1, Answers: 0)

    As for sorting: when fuzzySearch is not enabled, the default sort (natural table order) is appropriate, but when fuzzySearch is in use it would be better if the table automatically switched to sorting by the "similarity" column, and then returned to the default behaviour once fuzzySearch is disabled again. It's just a small usability improvement; perhaps I can achieve it by setting "order" to use the similarity column, but I am not sure.

    With 44,000 rows, fuzzySearch is proving slow for me. I will have to revisit it once I have the basic usage working.

  • allan (Posts: 63,831, Questions: 1, Answers: 10,518, Site admin)

    Looks like it has been disabled in your example, but I'm not too surprised that it would be slow with that many rows due to the amount of additional processing involved. With that many rows it would be time to start thinking about server-side processing if your data set is going to continue growing.

    That said, if you were able to enable the fuzzy search on your example again, I could profile it and check that we are doing things as efficiently as we can.

    Allan
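Allan's suggestion of server-side processing moves filtering and paging to the server, so the browser only receives one page of rows per draw. A minimal sketch of the server's part of that exchange follows, assuming the request has already been parsed into a plain object: draw, start, length and search[value] are the parameters DataTables sends, and draw, recordsTotal, recordsFiltered and data are the fields it expects back. The HTTP layer is left out, and serverSidePage, SsRequest and the matches callback (where a fuzzy matcher would plug in) are hypothetical names.

```typescript
// Parsed server-side request; searchValue holds the incoming search[value] parameter.
interface SsRequest {
  draw: number;
  start: number;
  length: number;
  searchValue: string;
}

// Response shape expected by DataTables in server-side mode.
interface SsResponse<T> {
  draw: number;
  recordsTotal: number;
  recordsFiltered: number;
  data: T[];
}

function serverSidePage<T>(
  rows: T[],
  req: SsRequest,
  matches: (row: T, term: string) => boolean,
): SsResponse<T> {
  const term = req.searchValue.trim();
  // Only filter when there is a search term; otherwise every row is a match.
  const filtered = term === "" ? rows : rows.filter((row) => matches(row, term));
  // A length of -1 means "return all records".
  const end = req.length < 0 ? filtered.length : req.start + req.length;
  return {
    draw: req.draw, // echoed back so the client can discard out-of-order responses
    recordsTotal: rows.length,
    recordsFiltered: filtered.length,
    data: filtered.slice(req.start, end),
  };
}
```

The slow matching work still happens, but on the server and once per draw, and only one page of results crosses the network.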

This discussion has been closed.