De-duplicating entries from JS-sourced data

GregP

Context: I have a web worker that "subscribes" to data updates from the back end via a websocket; the updates are then published to a view rendered with DataTables.net. To increase efficiency, added data is pushed down the websocket as a partial record (i.e., one row only).

Since the data is decoupled from the DataTable (i.e., it's not using the Ajax API; it's basically a JavaScript source), I can't simply add the row and redraw the table: if two pushes of the same row arrive, I end up with duplicate rows.

So, I whipped up a little plugin:

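$.fn.dataTable.Api.register('deDupe', function (dataSet, idColumn) {
  idColumn = idColumn || 0;

  for (var i = 0; i < dataSet.length; i++) {
    // Re-read the ID column on each pass: removing a row shifts the
    // positions of the rows after it, so a copy taken before the loop
    // would point at the wrong rows after the first removal.
    var columnData = this.columns().data()[idColumn];

    // Incoming rows carry their ID in `.id`; column `idColumn` must hold
    // the same value for the rows already in the table.
    var thisId = dataSet[i].id;
    var matchedIndex = columnData.indexOf(thisId);
    if (matchedIndex > -1) {
      var matchedRow = $(this.rows().nodes()[matchedIndex]);
      this.row(matchedRow).remove();
    }
  }

  // return "this" for chaining
  return this;
});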

(Don't laugh at me for taking comfort in old-fashioned "for" loops!)

And then in usage, I do this:

myTable.deDupe(dataSet).rows.add(dataSet).draw();

The method that receives the data and updates the table already has the incoming row, contained in "dataSet". I can also pass a COLUMN index for the column that holds the unique ID used for de-duping (it defaults to 0, as in the call above). Note that rather than dropping an incoming duplicate, I remove the previous row and then add the incoming one. Sometimes the row contains updated data, and I only care about de-duping at the ID level, not at the content level: the latest data should always win.
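The ID-level, latest-wins rule can be checked apart from DataTables with a plain function. This is a minimal sketch, assuming (as the plugin does via dataSet[i].id) that every row object carries its ID in an `id` property; the name `mergeLatestById` is hypothetical. Unlike the plugin, it also collapses duplicates that arrive within the same incoming batch, which rows.add(dataSet) would otherwise add twice.

```javascript
// Hypothetical helper: merge incoming rows into existing rows by ID,
// keeping only the most recent version of each row ("latest wins").
// Assumes every row object has an `id` property.
function mergeLatestById(existingRows, incomingRows) {
  var byId = new Map();

  // Existing rows first, in their current order.
  existingRows.forEach(function (row) {
    byId.set(row.id, row);
  });

  // Incoming rows: delete any older entry before setting the new one, so a
  // replacement lands after the rows that were kept (remove, then add).
  incomingRows.forEach(function (row) {
    byId.delete(row.id);
    byId.set(row.id, row);
  });

  return Array.from(byId.values());
}

// Usage: row 2 is updated, and only the last push of row 3 survives.
var result = mergeLatestById(
  [{ id: 1, name: 'a' }, { id: 2, name: 'b' }],
  [{ id: 2, name: 'b2' }, { id: 3, name: 'c' }, { id: 3, name: 'c2' }]
);
// result → [{ id: 1, name: 'a' }, { id: 2, name: 'b2' }, { id: 3, name: 'c2' }]
```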

So a few questions:

  1. Is there a better way of doing this? I would've loved to see a core option (in other words, an initialization option), but I didn't come across one. It seems like relatively common functionality. Am I crazy here, reinventing the wheel?

  2. In the conditional that checks for a match, there's this line: var matchedRow = $(this.rows().nodes()[matchedIndex]); This strikes me as wacky, but I couldn't get my head around the other ways of identifying a row for the .remove() chain. I already know the index of the row, so I thought something like matchedRow = this.rows(matchedIndex) would work, but I can't find that or anything similar. So not only am I digging into rows().nodes(), but the result is then wrapped in jQuery before being passed into .row(matchedRow). It seems heavily nested and wrapped for something I feel I should already have direct access to.
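For reference, DataTables' rows() also accepts a function as its row selector: it is called as function (idx, data, node) for each row, and the rows for which it returns true are selected. That lets a plugin match on row data directly, with no display-position lookup and no jQuery wrapping. A minimal sketch, assuming object-based rows whose ID lives in an `id` property; the name `deDupeById` is hypothetical.

```javascript
// Hypothetical variant of the deDupe plugin that selects rows by their data
// instead of by display position. Assumes object-based rows with an `id`.
function deDupeById(dataSet) {
  // Collect the IDs of the incoming rows once.
  var incomingIds = new Set(dataSet.map(function (row) {
    return row.id;
  }));

  // A function row selector is called for every row; returning true selects it.
  this.rows(function (idx, data) {
    return incomingIds.has(data.id);
  }).remove();

  // Return the API instance for chaining, like the original plugin.
  return this;
}

// Register it only when DataTables is loaded (skipped in a plain Node run).
if (typeof $ !== 'undefined' && $.fn && $.fn.dataTable) {
  $.fn.dataTable.Api.register('deDupeById()', deDupeById);
}
```

Usage would mirror the original call: myTable.deDupeById(dataSet).rows.add(dataSet).draw();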

Don't get me wrong; the code "works", and fixing it is probably a micro-optimization. But it strikes me as awfully convoluted and has a code smell I'd like to avoid for the sake of maintenance developers.

Answers

  • kthorngren

    You might be able to achieve the same thing with the filter() API. This simple example gets the row indexes of the matching rows using filter() and then removes those rows; in this case the match looks for "Ashton Cox".
    http://live.datatables.net/gimefafe/1/edit

    Kevin

  • GregP

    I think you're right that there's probably a solution to be found with filter(). I need a function that accepts the current API instance as well as the incoming data, filters out any existing rows that match the incoming data, and then adds the incoming data.

    I think the sample code also addresses the jQuery-wrapped rows().nodes()[index] line. I'll have to dig a bit. Thanks for your input!
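Kevin's filter() suggestion, combined with the function GregP describes, might look like the sketch below. It follows the common rows().indexes().filter() pattern rather than reproducing the linked example: take every row index, keep those whose row data has an ID present in the incoming batch, remove them, then add the incoming rows and redraw once. `replaceRows` is a hypothetical name, and rows are again assumed to be objects with an `id` property.

```javascript
// Hypothetical helper: replace any existing rows that share an ID with the
// incoming rows, then add the incoming rows and redraw.
// Assumes object-based row data with an `id` property.
function replaceRows(api, dataSet) {
  var incomingIds = dataSet.map(function (row) {
    return row.id;
  });

  // rows().indexes() yields every row index; filter() keeps the ones whose
  // data carries one of the incoming IDs.
  var indexes = api.rows().indexes().filter(function (rowIdx) {
    return incomingIds.indexOf(api.row(rowIdx).data().id) !== -1;
  });

  // Remove the stale rows by row index, then add the new versions.
  api.rows(indexes.toArray()).remove();
  api.rows.add(dataSet).draw();
  return api;
}
```

Usage: replaceRows(myTable, dataSet); Because rows are selected by row index rather than by position in the displayed order, removing one row does not invalidate the indexes of the others.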

This discussion has been closed.