Excel/CSV export for long number-like strings (large numbers)

Excel/CSV export for long number-like strings (large numbers)

lisarushlisarush Posts: 102Questions: 17Answers: 0

We have had issues with large number-like values in export to Excel. Specifically, we have values which are all digits -- but they are really more a string than a number. When the export happens, these "numbers" are rounded by Excel and the data is changed. For example:
* 8335201430007628 gets changed to 8335201430007630 (rounding on last digit causes last 2 digits to change)
* 8335201430210032 gets changed to 8335201430210030

I modified the source code to handle our case as below. Thoughts?

in button.excelHtml5 action:

   if ( ! cell ) {
      // If number is large, treat it as a string so it won't get rounded by Excel.
      var treatNumberAsText = false;
      if ( row[i].match && $.trim(row[i]).match(/^-?\d{15,30}$/) ) {
         treatNumberAsText = true;
      }

      // Detect numbers...
      if ( !treatNumberAsText && ... ) {
      ...
   }

in buttons.csvHtml5:

DataTable.ext.buttons.csvHtml5 = {
   ...
   extension: '.txt',   // .csv (use .txt extension to force user to process column types on opening in Excel -- this prevents Excel from rounded large strings that look like numbers)
}

Answers

  • allanallan Posts: 63,678Questions: 1Answers: 10,498 Site admin

    That's interesting - thanks for posting this! I'll experiment a little bit locally, but the Excel patch shouldn't be an issue. I would have assumed that Excel would use an IEEE 754 number, but perhaps that isn't the case...

    For the CSV one - rather than changing the default, it would probably be better to do:

    {
      extend: 'csv',
      extension: '.txt'
    }
    

    in your use of the CSV export button. I feel that the CSV button should use the .csv extension by default.

    I'll post back once I've looked into the Excel one.

    Regards,
    Allan

  • colincolin Posts: 15,240Questions: 1Answers: 2,599

    Hi @lisarush ,

    Another fix is to use this from this thread, this shows the data then as expected in Excel. Oddly enough, it worked fine on LibreOffice as it was!

    Cheers,

    Colin

  • allanallan Posts: 63,678Questions: 1Answers: 10,498 Site admin

    I've just been looking into this, and the number is written into the Excel file that is generated by Buttons at full precision. The issue comes when opening the file in Excel.

    It would appear that Excel has a 15 number precision limit for large numbers. That exactly matches what you are seeing.

    A workaround for this would be for the export to check the size of the number and export it as a string rather than a number, but I'm not sure that is the right thing to do or not since it is formatting a number as a string, which is wrong, but this limitation in Excel is rounding the number.

    @lisarush - given that you are working with this size of numbers, what has been your experience on using them in Excel?

    Thanks,
    Allan

  • lisarushlisarush Posts: 102Questions: 17Answers: 0

    I know the issue is with opening in Excel -- and not dataTables writing it, per se. However, since Excel has this 15-digit limitation -- I do think that dataTables needs to consider this when writing the values. [[The question is: what data type should the data be? Is the inherent data a number (where maybe it's ok to round it)? or is the inherent data a string value (where it's not ok to round)? maybe a way to specify data type?]]

    In our case, the numbers are not really numbers -- they are customer account numbers which are essentially a string of digits -- and every digit is important. Losing precision in our case in just simply wrong.

    My solution could maybe be improved -- but it does specifically handle strings of digits longer than 15 digits.

  • colincolin Posts: 15,240Questions: 1Answers: 2,599

    Hi @lisarush ,

    That workaround I mentioned above then would work for you - did you give that demo a try?

    Cheers,

    Colin

  • allanallan Posts: 63,678Questions: 1Answers: 10,498 Site admin

    I agree with Colin - since you know you want the number to be treated as a string, using the customize callback to basically set the cell to be a string would be a good way to do it.

    I'm not certain if Buttons own export should do that automatically or not. As far as I can tell from the spec it is doing the right thing - writing the number as a number. LibreOffice appears to open it without the 15 digit issue, so its Excel specific, which I why a customize callback might be the best option.

    Allan

  • lisarushlisarush Posts: 102Questions: 17Answers: 0

    Using a workaround similar to what Colin suggested is something that I considered -- way back when I added my workaround a couple years ago (yes, I should I have mentioned it then). However, I did not feel it was the best solution to customize the specific table. But to be fair, I just went and played with it again. Respectfully, I still don't feel it's the best way to handle this situation.

    1) I do still want the other number columns to be treated as numbers (could just add the '\u200C' to appropriate column -- but this would not be generic)
    As a note, when I tried this... when you open the spreadsheet in Excel, it looks like the other real # columns (not my account# column) -- are essentially the same (different # of decimal digits shown in spreadsheet -- but same raw value when you click in the cell). However, if you try to do "Format cells" on the data to change the # of decimal places -- Excel doesn't complain but it doesn't actually do anything. So, it appears as if that character (which as a user I don't see anywhere in Excel) is interfering & keeping me from formatting it as I wish. Not desirable.

    2) I have numerous tables with this data & it seems sloppy/repetitive to have to add 'customizeData' like this for each table (In my case, I could put this in a central location as I already extend dataTables to provide consistent configuration across our website. But only if I applied it to all cells, which is not what I want.)

    3) Even though this is essentially Excel's (documented) limitation, it's a little too easy to just say "not my problem". (When developing a website, that's like saying "well, IE is a pain, so I'm not going to support it -- although wouldn't that be nice! and something I've wanted to say many times!) Even though someone could open the document in a different spreadsheet program, I'm guessing that the majority of users will be using Excel. Yes, I can fix it, but it seems like the root of the issue is knowing that Excel has this limitation and providing the best experience to your dataTables users. How many people have to discover this limitation on their own? and then figure out a workaround?

    My two cents.

  • allanallan Posts: 63,678Questions: 1Answers: 10,498 Site admin

    All good points!

    I'm not actually a fan of modifying the data in this way either - its a formatting issue, not a data one, so we should be using the customize method to change the formatting - for example:

                customize: function( xlsx ) {
                    var sheet = xlsx.xl.worksheets['sheet1.xml'];
     
                    $('row c[r^="A"]', sheet).attr( 's', '0' );
                }
    

    Style 0 is the built in plain text option in Excel. Interestingly Excel sees that number in the string as shows it in exponential notation - although it does retain the precision! This is perhaps the best of the evil options...

    I take your point as well about the fact that Excel is popular and we should work towards its limitations (I'm surprised that we've hit any actually!). That said, although not as populate, we shouldn't punish users of other packages such as LibreOffice with a sub-par file that needs to work for limitations in another package.

    I'm going to experiment with the styles a bit to see if I can get it to stay in plain text.

    Allan

  • lisarushlisarush Posts: 102Questions: 17Answers: 0

    You're absolutely right. It is a formatting issue. (my workaround isn't great either)
    Going back to the idea of... maybe a way/option to specify data type on the column? something that could be added just to the affected column's <td>?

    Let me know if you have any good ideas. Thanks.

  • allanallan Posts: 63,678Questions: 1Answers: 10,498 Site admin

    The Excel output has the option of matching patterns which is used for the current formatting output. That isn't currently plug-in extendable, although longer term I think it probably should be (its just that adding custom styles is a bit of a nightmare since you need to do it in the XML directly).

    I think the trick is going to be to find the correct formatter for the string value to have it look like a number, but actually be a string.

    Allan

  • allanallan Posts: 63,678Questions: 1Answers: 10,498 Site admin

    I've just tried a little experiment in Excel - if you simply type "1234567890123456" then it automatically gets rounded and the precision lost (rather than detecting it as a string and formatting it as such).

    As Excel itself has this behaviour, for the moment I'm going to leave it as is. There is a workaround using the customize callback if you know you are going to be working with such large numbers.

    That said, if this crops up for anyone else, please post back here to let us know so we can see how many people it does effect.

    Regards,
    Allan

  • Jim LIJim LI Posts: 3Questions: 0Answers: 0

    It seem that the function is working but the exported excel need to be repaired.

    customize: function( xlsx ) {
    var sheet = xlsx.xl.worksheets['sheet1.xml'];
    $('row c[r^="A"]', sheet).attr( 't', 's' );
    }

  • allanallan Posts: 63,678Questions: 1Answers: 10,498 Site admin

    If you are using IE, please try the nightly version which should fix it. If you aren't, please link to a test case showing the issue.

    Allan

  • Jim LIJim LI Posts: 3Questions: 0Answers: 0

    It's good to hear that Datatables has finally provided a solution.

    BTW, I come here to say that after tried a lot of times, I've figured out myself.

    Edit datatables.js , search for

    <xf numFmtId="2" fontId="0" fillId="0" borderId="0" applyFont="1" applyFill="1" applyBorder="1" xfId="0" applyNumberFormat="1"/>
    

    add a line after that with

    '<xf numFmtId="49" fontId="0" fillId="0" borderId="0" applyFont="1" applyFill="1" applyBorder="1" xfId="0" applyNumberFormat="1"/>'+
    

    Then scroll up a little and find <cellXfs count="67"> ,change 67 to 68

    Customize code:

    customize: function( xlsx ) {
                                var sheet = xlsx.xl.worksheets['sheet1.xml'];
                                $('row c[r^="M"]:gt(0)', sheet).attr( 't', 'inlineStr' );  //add :gt(0) to skip column title
                                $('row c[r^="M"]:gt(0)', sheet).attr( 's', '67' );  //apply for new style
                                $('row c[r^="M"]:gt(0)', sheet).append('<is><t xml:space="preserve"></t></is>');
                                $('row c[r^="M"]:gt(0) > is > t', sheet).each(function () {
                                    $(this).text($(this).parent().siblings("v").text());  //paste text to new node
                                    $(this).parent().siblings("v").remove();  //remove useless v node
                                });
                            }
    
This discussion has been closed.