CRM strips out ASCII characters in Zip file creation leading to corrupt Excel file

billmalcolm  Posts: 5, Questions: 2, Answers: 0

Test case:
Use the attached build of datatables.min.js, named "datatables-no-ascii.js", and test an Excel download.
The attached version uses JSZip 3.10.1, DataTables 2.3.2, Buttons 3.2.4, Column visibility 3.2.4, HTML5 export 3.2.4, DateTime 1.5.6, FixedHeader 4.0.3, Responsive 3.0.5, SearchBuilder 1.8.3, SearchPanes 2.3.4, Select 3.0.1

Result:
The downloaded Excel file is corrupt.

Likely cause:
The CRM application strips out non-UTF-8-encoded characters, which are used to create the zip file headers.

Code section from the DataTables CDN version:
23: [ function (e, t, r) { "use strict"; (r.LOCAL_FILE_HEADER = "PK"),
// control characters after PK (they're also stripped out in this preview, but if you
// look at the js file, they are there ...
Example from our version after being sanitized:
(r.LOCAL_FILE_HEADER = "PK"),
// after being sanitized, the control characters are removed, causing a corrupt download

Suggested Solution:
Store these as byte arrays instead of string literals
Example (node.js):
r.LOCAL_FILE_HEADER = Buffer.from([0x50, 0x4B, 0x03, 0x04]); // PK[ETX][EOT]
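
`Buffer` is a Node.js API, whereas the export runs in the browser and the signatures in the bundle are plain strings. A minimal browser-side sketch of the same idea, assuming the signatures have to stay strings, builds them from character codes so that the source contains only printable ASCII. `signatureFromBytes` is a hypothetical helper, not part of JSZip:

```javascript
// Hypothetical helper (not part of JSZip): build a signature string from
// byte values so the source file never contains a raw control character.
function signatureFromBytes(bytes) {
  return String.fromCharCode.apply(null, bytes);
}

// "PK" followed by ETX (0x03) and EOT (0x04)
var LOCAL_FILE_HEADER = signatureFromBytes([0x50, 0x4b, 0x03, 0x04]);

// The result is the same 4-character string as the escaped literal.
console.log(LOCAL_FILE_HEADER === "PK\x03\x04"); // true
```

Because the control characters only come into existence at run time, a sanitizer that rewrites the source text has nothing to strip.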

Context:
We bundle DataTables into a CDN that our CRM already uses, and found that at some point the CRM strips out all unescaped ASCII control characters and non-UTF-8 characters, which leads to a corrupted Excel file. We do not control that part of the application. I could patch the byte arrays in myself, but would rather not have to re-apply the change every time we upgrade.

Finally:
It looks like there were other small substitutions made, which can be found by comparing this file with the version supplied by the DataTables CDN. I am not sure yet whether these affect other areas. I fully realize we can just use the DataTables CDN instead, but was hoping this might be an easy enough fix for us to continue the way we've been going. Thanks for all you do!
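
One way to locate such substitutions is to list every character outside printable ASCII in the original CDN file, since those are the candidates a sanitizer might remove. The sketch below is only an illustration: `findNonPrintable` is a hypothetical helper, and the two sample strings stand in for the real files.

```javascript
// Hypothetical helper: list every character outside printable ASCII
// (tab, LF and CR are allowed) together with its offset in the text.
function findNonPrintable(source) {
  var hits = [];
  for (var i = 0; i < source.length; i++) {
    var code = source.charCodeAt(i);
    var isWhitespace = code === 0x09 || code === 0x0a || code === 0x0d;
    if ((code < 0x20 && !isWhitespace) || code > 0x7e) {
      hits.push({ offset: i, code: "0x" + code.toString(16).padStart(2, "0") });
    }
  }
  return hits;
}

// Stand-ins for the two files. The escapes here produce raw control
// characters at run time, as they would appear in the minified bundle.
var minifiedLike = 'r.LOCAL_FILE_HEADER="PK\x03\x04"';
var sanitizedLike = 'r.LOCAL_FILE_HEADER="PK"';

console.log(findNonPrintable(minifiedLike));
console.log(findNonPrintable(sanitizedLike).length);
```

In Node.js the real files could be loaded with `fs.readFileSync(path, "utf8")` and passed to the same function. Any hit reported for the CDN version but missing from the CRM-hosted copy points at a place where content was removed.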

Answers

  • allan  Posts: 64,918, Questions: 1, Answers: 10,751, Site admin

    Can you give me a link to the file with unescaped UTF-8 code points, or otherwise tell me how to load it? I've just had a look at this file and its downloaded equivalent and they contain:

    exports.LOCAL_FILE_HEADER = "PK\x03\x04";
    exports.CENTRAL_FILE_HEADER = "PK\x01\x02";
    exports.CENTRAL_DIRECTORY_END = "PK\x05\x06";
    exports.ZIP64_CENTRAL_DIRECTORY_LOCATOR = "PK\x06\x07";
    exports.ZIP64_CENTRAL_DIRECTORY_END = "PK\x06\x06";
    exports.DATA_DESCRIPTOR = "PK\x07\x08";
    

    I don't immediately see any unescaped multibyte character in the file, which I suspect is what is causing you problems.

    For info, this part of the code is from JSZip, an external library, which I use to create the XLSX file (which is just a zip). If you are using Excel export, it will typically be included in datatables.js (unless you have an external reference for it).

    Allan

  • billmalcolm  Posts: 5, Questions: 2, Answers: 0

    Hi Allan!
    Thank you for such a quick response, and good call. I didn't realize that was part of a separate extension. This is really an issue on our end and it may be unreasonable to ask you to address it, but if you're curious....

    Here's a link to the file on our cdn.

    If you search for one of those consts, you'll see r.LOCAL_FILE_HEADER="PK" with the last two chars looking like these [][].

    I may be completely out of my depth here, so if I have misspoken or gotten something completely wrong, I apologize.

  • allan  Posts: 64,918, Questions: 1, Answers: 10,751, Site admin

    Interesting - I rather suspect that is part of the minification process! The non-minified file has ASCII-only characters:

    The minified file, however, uses UTF-8 multi-byte characters:

    It will save a byte per character, which is why the minifier will be doing that. In fact, I've just found this issue on the topic. It looks like that's how CDNJS are serving it as well.

    It looks like I could modify my minification options to stop that from happening, but at the cost of a slightly larger file.

    If your CRM is stripping out multi-byte characters (which as I say, seems like an error to me), then one quick solution would be for you to load JSZip unminified and then just drop the jszip-3.10.1/ from the DataTables CDN url. That should work around the problem without loading all files non-minified.

    Allan
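
For anyone who has to keep hosting the minified bundle behind a sanitizer, another option is to re-escape the file before uploading it, so that no raw control or non-ASCII characters remain. The following is a rough sketch, not something proposed in the thread: `escapeNonPrintable` is a hypothetical helper, and it assumes such characters only occur inside ordinary string literals, where the raw character and its escape are equivalent. It is not safe for raw characters in identifiers or in tagged template literals.

```javascript
// Hypothetical pre-upload step: rewrite raw control characters and
// non-ASCII characters as \xNN or \uNNNN escapes.
// Assumption: they only occur inside ordinary string literals.
// Tab, LF and CR are left alone so the layout of the file is unchanged.
function escapeNonPrintable(source) {
  return source.replace(/[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\uffff]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16);
    return hex.length <= 2
      ? "\\x" + hex.padStart(2, "0")
      : "\\u" + hex.padStart(4, "0");
  });
}

// Stand-in for the minified bundle: "PK" followed by two raw control characters.
var minifiedLike = 'r.LOCAL_FILE_HEADER="PK' + String.fromCharCode(3, 4) + '"';

var escaped = escapeNonPrintable(minifiedLike);
console.log(escaped); // r.LOCAL_FILE_HEADER="PK\x03\x04"
```

The escaped text evaluates to the same string values as the original, but contains only printable ASCII, which matches the form shown for the non-minified file earlier in the thread.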
