De-duplicating entries from JS-sourced data
Context: I have a web worker that "subscribes" to data updates from the back end via a websocket; the updates are then published to a view rendered via DataTables.net. To increase efficiency, added data is pushed down the websocket as a partial record (i.e. one row only).
Since the data is decoupled from the DataTable (i.e. it's not using the Ajax API; it's basically a JavaScript source), I cannot just add the row and render the table. In the event that two pushes of the same row are triggered, I end up with duplicate rows.
So, I whipped up a little plugin:
$.fn.dataTable.Api.register('deDupe', function (dataSet, idColumn) {
    idColumn = idColumn || 0;
    var columnData = this.columns().data()[idColumn];
    for (var i = 0; i < dataSet.length; i++) {
        var thisId = dataSet[i].id;
        var matchedIndex = columnData.indexOf(thisId);
        if (matchedIndex > -1) {
            var matchedRow = $(this.rows().nodes()[matchedIndex]);
            this.row(matchedRow).remove();
        }
    }
    // return "this" for chaining
    return this;
});
(Don't laugh at me for taking comfort in old fashioned "for" loops!)
And then in usage, I do this:
myTable.deDupe(dataSet).rows.add(dataSet).draw();
The method that receives the data and then updates the table already has the incoming row, which is contained in "dataSet". The plugin also accepts an optional column index (defaulting to 0) for the column that holds the unique ID used for de-duping. Note that in code execution, rather than dropping an incoming duplicate, I remove the previous row and then add the incoming one. There are cases where the row contains updated data, and I really only care about de-duping at the ID level, not the contents level: always just blindly use the latest data.
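To make the "latest data wins" rule concrete, here is a minimal sketch in plain JavaScript, independent of DataTables. The name upsertById and the idKey parameter are hypothetical, and it assumes each row is an object with an "id" property:

```javascript
// Hypothetical helper (not part of DataTables): merge incoming rows into
// existing rows, letting the newest row win whenever two rows share an ID.
function upsertById(existing, incoming, idKey) {
    idKey = idKey === undefined ? 'id' : idKey;
    var byId = new Map();
    existing.concat(incoming).forEach(function (row) {
        // Deleting first moves a replaced ID to the end of the Map,
        // mirroring the "remove the old row, then add the new one" behaviour.
        byId.delete(row[idKey]);
        byId.set(row[idKey], row);
    });
    return Array.from(byId.values());
}

var merged = upsertById(
    [{ id: 1, name: 'old' }, { id: 2, name: 'kept' }],
    [{ id: 1, name: 'new' }]
);
console.log(merged);
// → [ { id: 2, name: 'kept' }, { id: 1, name: 'new' } ]
```

This only illustrates the intended semantics; the plugin above applies the same rule to the table's rows rather than to a plain array.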
So a few questions:
Is there a better way of doing this? I would've loved to see a core option (in other words, an initialization option), but I didn't come across one. It seems like relatively common functionality. Am I crazy here, re-inventing the wheel?
In the conditional checking if there's a match, you see this line:
var matchedRow = $(this.rows().nodes()[matchedIndex]);
This strikes me as wacky, but I couldn't get my head around the other ways of identifying a row to be used in the .remove() chain. In my mind, I already know the index of the row, so I thought something like matchedRow = this.rows(matchedIndex) would work, but I can't find that or something similar. So not only am I digging into rows().nodes(), but then the whole thing is wrapped up in jQuery before being passed into .row(matchedRow). It seems so heavily nested and wrapped for something I feel like I should already have direct access to.
Don't get me wrong; the code "works" and fixing it is probably a micro-optimization. But it strikes me as awfully convoluted and gives me some code smell, which I want to avoid for the sake of maintenance developers.
Answers
You might be able to achieve the same by using the filter() API. This simple example shows getting the row indexes of the matching rows using filter(). It then removes the row. In this case the match is looking for "Ashton Cox". http://live.datatables.net/gimefafe/1/edit
Kevin
I think you're right that there's probably a solution to be found somewhere inside filter. I need to have a function that accepts the current API instance as well as the incoming data, then filter out any existing rows based on the incoming data, then add the incoming data.
I think the sample code also addresses the jQuery-wrapped 'nodes[index]' line. Will have to dig a bit. Thanks for your input!
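For anyone landing here later, the function described above (take the API instance plus the incoming data, drop any existing rows with matching IDs, then add the incoming data) could be sketched roughly as follows. Note this uses a function as the rows() selector rather than filter(), which is a swapped-in technique and not what the linked example does. The name removeById and the idKey parameter are hypothetical, and it assumes the ID is readable from each row's data as data[idKey] (an "id" property for object rows, or a column index for array rows):

```javascript
// Hypothetical helper: removeById and idKey are illustrative names,
// not part of the DataTables API.
function removeById(api, dataSet, idKey) {
    idKey = idKey === undefined ? 'id' : idKey;
    // Collect the incoming IDs once so each existing row is a single lookup.
    var incomingIds = new Set(dataSet.map(function (row) {
        return row[idKey];
    }));
    // A function used as the row selector is called with (index, data, node)
    // for each row; rows for which it returns true are selected.
    api.rows(function (idx, data) {
        return incomingIds.has(data[idKey]);
    }).remove();
    return api; // return the API instance for chaining
}

// Register as a plugin method only when jQuery and DataTables are loaded.
if (typeof jQuery !== 'undefined' && jQuery.fn && jQuery.fn.dataTable) {
    jQuery.fn.dataTable.Api.register('deDupe()', function (dataSet, idKey) {
        return removeById(this, dataSet, idKey);
    });
}
```

With DataTables loaded, usage would stay the same as in the question: myTable.deDupe(dataSet).rows.add(dataSet).draw(); Because the selector works on the row data directly, there is no need to look up nodes or wrap anything in jQuery, and every existing row with a matching ID is removed in one pass.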