Record Export Size Limitation - solution?

Record Export Size Limitation - solution?

tharristharris Posts: 15Questions: 0Answers: 0
edited June 2010 in TableTools
I've got a web app that has around 2000 records and a dozen columns. After trying to save using TableTools it's pretty obvious I've hit the flash limitation on the amount of data you can process (as discussed in the "TableTools v1.0.2 - Save as Excel, CSV, copy and print!" announcement thread). I've been looking on the forums for solutions (edit- for keeping things client-side, though allen's suggestion on the aforementioned thread about hacking tabletools to post to a server is potentially doable), but haven't seen anything yet. Are there any known work-arounds for this issue?

Replies

  • tharristharris Posts: 15Questions: 0Answers: 0
    edited June 2010
    Hmm, it would appear this is only an issue on IE7, since Mozilla works fine. I see 1.1.4 has chunking, which should fix the javascript/flash bridge, so I'm at a bit of a loss here.

    Edit - It's looking increasingly like IE7 is just slow as dirt going through the fnGetDataTablesData function in tabletools. Still, that doesnt explain why it saves empty excel sheets for larger datasets. I've noticed that sometimes it will output the previous set of rows if I filter, export, remove filter, and export again, or just switch from a smaller output to a larger output. Also, if I just restart IE7 and try straight for the output it has 1 cell populated - the contents are "ÿþ" for some reason. Ideas, anyone?
  • allanallan Posts: 63,489Questions: 1Answers: 10,470 Site admin
    Hi tharris,

    Are you using 1.1.4 now? UTF support wasn't present in 1.0.x.

    One thing you will find with saving to XLS (and CSV but to a much latter extent) is that the flash processing is _VERY_ slow. This is because it needs to deal with the strings bit by bit - since this isn't a built in function for flash to do what is needed. So 2000 records is likely to crush most computers when doing an XLS export - sorry to say this!

    So I think a server-side option might be quite a good answer for dealing with so many records. I'm working on TableTools 2 at the moment which will allow for a lot more flexibility with buttons, including an option to send the DataTables data to the server-side for processing there (PDF for example, or proper XLS creation, rather than a UTF16-LE CSV file with an XLS extension!). This scripts will need to be created (and won't be part of the TableTools distribution), although I am thinking about setting up a site just for TableTools to promote its flexibility. If you fancy having a look at an early development version, please let me know: http://datatables.net/contact .

    Regards,
    Allan
  • tharristharris Posts: 15Questions: 0Answers: 0
    edited June 2010
    Hmm, I hadn't realized that flash might be at fault, since on FireFox the save worked fine (took a second or two for the dialog to appear, but even that wasn't very appreciable to users). As a result, I simply assumed it was an inefficiency in the javascript engine for IE7. I'm currently running 1.1.4.

    The interesting thing that I've noticed so far is that while at the end of fnGetDataTablesData the function returns sData, which when sent to an alert box has all the data for excel, but when you open the saved file you get "" in A1 as the only saved value for .csv files and "
  • allanallan Posts: 63,489Questions: 1Answers: 10,470 Site admin
    You might well be right that IE will be going slow - I had forgotten that I put in a change a little while ago which 'decodes' HTML entities before sending them to flash - this is quite a slow process, but I tried to have it occur only when an ampersand is found in a string - so if yu have lots of entities it might be that. If not, then a JS profiler might be of some use.

    The 'odd' characters that you are seeing are the Byte Order Mark (BOM) which indicated the file's character encoding. CSV is saved as UTF8 and XLS as UTF16-LE. The text editor you are using isn't UTF aware I guess - or the file has been corrupted - it might be something funny going on in flash when using large data sets.

    The approach I've taken with TableTools 2 for the server-side option is that the data will be generated and collected on the client-side from DataTables (for taking account of filtering etc), but instead of sending it to an SWF file, it sends it to the server using Ajax. It might mean you've got a large POST request - but hopefully it would be worth it.

    Regarding the speed up question - for large data sets, server-side processing really is the way to go. Although how big a 'large' data set is, depends on the browser. I have on problem dealing with many thousands of rows in Safari etc, but IE gets crunched. Processor speed on each machine makes a difference as well. If it's unacceptable in IE7 and that's a key browser for you, I'd say look at server-side processing. DataTables is fairly optimised in how it works already - although if anyone has any suggestions they are fairly welcome, and I'm always looking for ways to improve it myself.

    The way DataTables works at the moment is that it reads in the HTML from the page, putting it into it's cache and manipulates the DOM on each draw - adding and removing elements as required. It's sort of 'just in time' for each draw - although the initial read does happen 'online'. It might be possible to take that offline - an interesting thought for the future.

    Regards,
    Allan
  • tharristharris Posts: 15Questions: 0Answers: 0
    I've run the code through firebug's profiler, and it looks like most of the time taken is in the nested for loops in fnGetDataTablesData that sanitize the html in the visible table columns for conversion.

    If I had to guess the data is somehow corrupted in flash, though I haven't looked at what tabletools does outside the tabletools.js for 1.1.4. I'm using Excel 2007 to open the saved files. Curiously, the save feature works fine in FireFox even for larger (2000ish rows with around 12-18 columns) datasets, so it may be a quirk unique to IE7 and/or it's version of Flash 10. Since I have no development experience with Flash, I'll leave the speculation as to how, why, or even if this may be to others.

    It does look like server-side processing is the way to go. I'm planning on setting up some test code to work out the how-to stuff for getting tabletools to post back to the server, the server to generate excel or a csv file, and then send it back to the user, which hopefully shouldnt be too bad, but I'm not sure how I'll handle the print and copy to clipboard options, since they run into the same problem in fnGetDataTablesData. I might just disable everything but send to excel.

    I'm loading the page using AJAX and JSON, so there isn't any html beyond the table headers, though I may try generating some and then initializing the datatable to see how the performance differs at page load. I doubt there will be a big difference though. Thanks for your input!
  • tharristharris Posts: 15Questions: 0Answers: 0
    Making a bit of progress. The code below is client side only and fairly preliminary, and the send to excel stuff only works on IE (which doesnt matter, since that's the only browser I have to worry about). I'm also working on another version that uses server side processing,the beginnings of which are included below(ie, a converson to JSON in JavaScript and how to deserialize in C#), but until I can sort out an alternative to Response.write for use with webmethods, or do something like open a new window and pass it the JSON so it can return an excel file, this is all i've got.
    [code]
    <%@ Page Language="C#" AutoEventWireup="true" CodeFile="testTableExcel.aspx.cs" Inherits="testTableExcel" %>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


    Test Table - Excel









    var oTable;
    $(document).ready(function(){
    populateTable(10);
    TableToolsInit.sSwfPath = "swf/ZeroClipboard.swf";
    oTable = $('#tblData').dataTable({
    "bJQueryUI": true,
    "bAutoWidth": false,
    //Make it look decent and change the default sort to column 1
    "sPaginationType": "full_numbers",
    "iDisplayLength": 25,
    "aaSorting": [[ 1, 'asc' ]],
    /* Fix formatting issues related to TableTools plugin */
    "sDom": 'T<"clear"><"H"lfr>t<"F"ip>'
    });});
    //Populate #tblData with generic example data from 1 to numRows
    function populateTable(numRows){
    var rows = "";
    for(var i = 1; i
  • tharristharris Posts: 15Questions: 0Answers: 0
    Example SendToExcel via webmethod - This stuff only covers how to get the data onto the server. I still have to work out how to use a webmethod to turn it into an XML file and send it back.
    [code]

    //server side requires a webmethod "SendToExcel" function at testTableExcel.aspx code behind as well as
    //a serializeable Employee object with properties matching the JSON field names. See below for an example.
    function sendToServer(){
    var sData = [];
    var oSettings = oTable.fnSettings();
    //Generate JSON string representation
    for ( var i=0, iLen=oSettings.aiDisplay.length ; i
  • tharristharris Posts: 15Questions: 0Answers: 0
    edited June 2010
    Well, in case anyone else runs into similar problems, here's my completed example solution (code in the posts below). After a lot of research, I finally tracked down everything I needed. The process is fairly simple and it should be pretty flexible, and with a bit more work you could turn it into something very generic and extremely reusable - ie, plug and play. The end result is an xml file returned as an XLS attachment in a new window.

    The aspx page formats the data associated with the filtered table rows into JSON and makes an AJAX post back to it's code-behind on the server to turn it into standard XML, which is saved as a session variable.

    When the AJAX is successfully completed, the success function opens a new window and opens XMLHandler.ashx/ProcessRequest (a generic request handler's ProcessRequest method).

    The ProcessRequest function inside XMLHandler.ashx takes the session variable and attaches the various headers before returning the XML as a XLS attachment for excel.

    A few potential weaknesses here: you could probably run into problems if you try outputting more than 65536 rows and open in Office 2003 (1,048,576 if you're in 2007). Also, since the current setup is just a generic XML file, office throws up a few messages while it figures out what to do with the data. Finally, you lose the plug-and-play nature of tabletools 1.1.4. On the plus side, Excel 2007 adds column filters and row highlighting automatically, which is nice.

    I'm sure there are ways to make this considerably more elegant, but until I see someone post such a solution, this works for now. Load times on IE7 (after the initial table load, which is painful but workable up to a few thousand records) are great - tabletools took a few minutes and failed to output the file, whereas this takes maybe 5 seconds. Still, on browers with faster javascript engines I'd say stick with tabletools and save yourself the trouble - the latest version of firefox (3.6.4, as I recall) takes about the same amount of time with tabletools, and you don't have to muck about getting everything set up. Also, for small datasets the overhead far outweighs the benefits, so if it doesnt take at least 3-5 seconds with tabletools this will just slow you down.

    Edit - Just a side note, keep in mind that this doesnt cover all the cases such as for nested arrays or objects (ie, Employee with another object inside - ProjectHistory, for example). I'll post a more general case below.
  • tharristharris Posts: 15Questions: 0Answers: 0
    edited June 2010
    [code]
    <%@ Page Language="C#" AutoEventWireup="true" CodeFile="testTableExcel.aspx.cs" Inherits="testTableExcel" %>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


    Test Table - Excel









    var oTable;
    $(document).ready(function(){
    populateTable(6000);
    TableToolsInit.sSwfPath = "swf/ZeroClipboard.swf";
    oTable = $('#tblData').dataTable({
    "bJQueryUI": true,
    "bAutoWidth": false,
    //Make it look decent and change the default sort to column 1
    "sPaginationType": "full_numbers",
    "iDisplayLength": 25,
    "aaSorting": [[ 1, 'asc' ]],
    /* Fix formatting issues related to TableTools plugin */
    "sDom": 'T<"clear"><"H"lfr>t<"F"ip>'
    });
    }
    );
    //Populate #tblData with generic example data from 1 to numRows
    function populateTable(numRows){
    var rows = "";
    for(var i = 1; i
  • tharristharris Posts: 15Questions: 0Answers: 0
    edited June 2010
    Code-behind

    Edit - This is the final version. generateXML is generic - pass in a class and it will generate an xml representation of the class properties for each item in the list (only for initialized properties, which is nice if you only occasionally use some of the properties or want to exclude them by not passing them back in the JSON) and return it as a string. From here just extract generateXML into a shared class and you're good to go.
    [code]
    using System;
    using System.Web;
    using System.Web.Services;
    using System.Collections.Generic;
    using System.Xml;
    using System.IO;
    using System.Web.Script.Serialization;
    using System.Reflection;

    public partial class testTableExcel : System.Web.UI.Page
    {
    protected void Page_Load(object sender, EventArgs e)
    {

    }

    [WebMethod(true)]
    public static void SendToExcel(object containers)
    {
    //extract JSON objects - Employee is a container class with properties matching the JSON
    JavaScriptSerializer ser = new JavaScriptSerializer();
    List items = ser.Deserialize(containers.ToString());

    //convert the JSON into an XML file, then save it to a session variable
    HttpContext.Current.Session["XLSOutput"] = generateXML(items);

    }

    //Takes a List of a generic class and extracts an XML representation of its initialized properties
    //Returns XML as a string
    public static string generateXML(IEnumerable inputList)
    {
    //originally based on the example from http://www.primaryobjects.com/CMS/Article76.aspx
    using (MemoryStream stream = new MemoryStream())
    {
    // Create an XML document. Write our specific values into the document.
    XmlTextWriter xmlWriter = new XmlTextWriter(stream, System.Text.Encoding.UTF8);
    xmlWriter.WriteStartDocument();
    xmlWriter.WriteStartElement("Output");

    //Extract the list of properties
    Type t = typeof(T);
    PropertyInfo[] pi = t.GetProperties();

    //output XML representation of each class
    foreach (T item in inputList)
    {
    //if the value doesnt exist, skip it. might not catch all edge cases though...
    xmlWriter.WriteStartElement(t.Name);
    foreach (PropertyInfo p in pi)
    {
    if (p.GetValue(item, null) != null)
    {
    xmlWriter.WriteElementString(p.Name, p.GetValue(item, null).ToString());
    }
    }
    xmlWriter.WriteEndElement();
    }

    xmlWriter.WriteEndElement();
    xmlWriter.WriteEndDocument();

    // To be safe, flush the document to the memory stream.
    xmlWriter.Flush();

    //extract the xml from the stream
    stream.Position = 0;
    StreamReader sr = new StreamReader(stream);
    return sr.ReadToEnd();
    }
    }
    }
    [/code]
  • tharristharris Posts: 15Questions: 0Answers: 0
    edited June 2010
    generic handler file: XMLHander.ashx
    [code]
    <%@ WebHandler Language="C#" Class="XMLHandler" %>

    using System;
    using System.Web;
    using System.Web.Script.Serialization;
    using System.Collections.Generic;
    using System.Xml;
    using System.IO;
    using System.Web.SessionState;

    public class XMLHandler : IHttpHandler, IRequiresSessionState
    {
    //Output the contents of the session variable XLSOutput as an xls attachment for downloading
    //XLSOutput should be updated to match table contents BEFORE this request is made!!!
    public void ProcessRequest(HttpContext context)
    {
    context.Response.ContentType = "xml";
    context.Response.AppendHeader("Content-Disposition", "attachment; filename=MyExportedFile.xls");
    context.Response.ContentType = "application/octet-stream";
    context.Response.ContentEncoding = System.Text.Encoding.UTF8;
    context.Response.Write(context.Session["XLSOutput"]);
    }

    public bool IsReusable
    {
    get
    {
    return false;
    }
    }

    }
    [/code]
  • tharristharris Posts: 15Questions: 0Answers: 0
    edited June 2010
    More general case of the generate XML code. Still doesnt hit everything, like enums and arrays, but still much improved, and it's where I plan to leave this chunk of code. One thing I don't like is that it doesnt have any way to specify output format, but that's a problem for another day.

    [code]
    public static string GenerateXML(IList inputList)
    {
    //originally based on the example from http://www.primaryobjects.com/CMS/Article76.aspx
    using (MemoryStream stream = new MemoryStream())
    {
    // Create an XML document. Write our specific values into the document.
    XmlTextWriter xmlWriter = new XmlTextWriter(stream, System.Text.Encoding.UTF8);
    xmlWriter.WriteStartDocument();
    xmlWriter.WriteStartElement("Output");

    WriteElementList(inputList as IList, xmlWriter);

    xmlWriter.WriteEndElement();
    xmlWriter.WriteEndDocument();

    // To be safe, flush the document to the memory stream.
    xmlWriter.Flush();

    //extract the xml from the stream
    stream.Position = 0;
    StreamReader sr = new StreamReader(stream);
    return sr.ReadToEnd();
    }
    }

    private static void WriteElementList(IList itemList, XmlTextWriter xmlWriter)
    {
    Type t = typeof(T);
    foreach (T item in itemList)
    {
    xmlWriter.WriteStartElement(t.Name);
    //may not handle arrays. In theory, treat them as lists...
    if (typeof(System.Collections.IList).IsAssignableFrom(t))
    {
    foreach (Type interfaceType in t.GetInterfaces())
    {
    if (interfaceType.IsGenericType && (interfaceType.GetGenericTypeDefinition() == typeof(IList<>)))
    {
    WriteElementList(item as IList, xmlWriter); // is a match
    }
    }
    }
    else
    {
    PropertyInfo[] pi = t.GetProperties();
    WriteElement(item, pi, xmlWriter);
    }
    xmlWriter.WriteEndElement();
    }
    }

    private static void WriteElement(T item, PropertyInfo[] pi, XmlTextWriter xmlWriter)
    {
    //Extract the list of properties
    Type t = typeof(T);
    foreach (PropertyInfo p in pi)
    {
    Type t2 = p.PropertyType;
    if (p.CanRead && p.GetValue(item, null) != null)
    {
    //either output a primitive/string/datetime or handle it as an object/list of objects
    if (t2.IsPrimitive || t2.UnderlyingSystemType.Name == "String" || t2.UnderlyingSystemType.Name == "DateTime")
    {
    //don't write if the content is String.Empty
    if (!(t2.UnderlyingSystemType.Name == "String" && p.GetValue(item, null).Equals(String.Empty)))
    {
    xmlWriter.WriteElementString(p.Name, p.GetValue(item, null).ToString());
    }
    }
    else
    {
    //wrap in a new element tag
    xmlWriter.WriteStartElement(t2.Name);
    //arrays can be treated similarly, though probably with less safety checks
    if (typeof(System.Collections.IList).IsAssignableFrom(t))
    {
    foreach (Type interfaceType in t.GetInterfaces())
    {
    if (interfaceType.IsGenericType && (interfaceType.GetGenericTypeDefinition() == typeof(IList<>)))
    {
    WriteElementList(item as IList, xmlWriter); // is a match
    }
    }
    }
    else
    {
    PropertyInfo[] pi2 = t2.GetProperties();
    WriteElement(p.GetValue(item, null), pi2, xmlWriter);
    }
    xmlWriter.WriteEndElement();
    }
    }
    }
    }
    [/code]
  • allanallan Posts: 63,489Questions: 1Answers: 10,470 Site admin
    That's awesome! Thanks very much for posting your code! :-)

    Regards,
    Allan
  • kanngardkanngard Posts: 1Questions: 0Answers: 0
    IE 7 don't like concatenating strings. One solution is to rewrite the methods used (_fnGetDataTablesData) to use arrays instead. I.e. use:
    [code]
    var sData = [];
    //...
    sData.push(string);
    //...
    return sData.join('');
    [/code]
    instead of:
    [code]
    var sData = '';
    //...
    //...
    sData = sData + string;
    //...
    return sData;
    [/code]
This discussion has been closed.