Potential Bug report: DataTables Editor does not decode all HTML entities

ps@raytion

(Potential because one could discuss whether this is a bug or expected behavior.)

When using DataTables Editor (tested with versions 1.5.6 and 1.7.3; the behavior probably goes back to 1.5.3), HTML entity decoding of (non-DOM) data to be displayed in input fields only works on a very narrow set of entities (the ones that the provided server-side libraries output?). Variants of the decoded entities, and all other entities, are left as is.

Are there any plans to extend the decoding function or make it pluggable? If not, could someone mention the restriction in the documentation (at least on Manual / Security and Reference / Options / fields.entityDecode), for those customers who cannot or do not want to use the provided server-side libraries?
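
To illustrate what "variants" and "other entities" mean here, the following is a minimal sketch of a broader decoder. It is not Editor's implementation, and the names `decodeEntities` and `NAMED_ENTITIES` are hypothetical. It assumes that decimal and hexadecimal numeric references should be decoded generically, and it carries only a small sample of named entities; a complete decoder would need the full HTML named-entity table (for example via a library such as he).

```javascript
// Hypothetical helper for illustration; not part of DataTables Editor.
// Sample table only: a real decoder needs the full HTML named-entity list.
const NAMED_ENTITIES = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'",
  euml: 'ë'
};

function decodeEntities(input) {
  // One pass over the string, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return String(input).replace(pattern, (match, body) => {
    if (body[0] === '#') {
      // Numeric reference: "#39" (decimal) or "#x27" (hexadecimal).
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave references beyond the Unicode range untouched.
      if (codePoint > 0x10ffff) {
        return match;
      }
      return String.fromCodePoint(codePoint);
    }
    // Named reference: decode only names in the sample table, keep the rest.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}
```

With this sketch, `&#39;`, `&#x27;` and `&apos;` all decode to the same apostrophe, while a name missing from the table is passed through unchanged.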

Steps to reproduce the problem:

  1. Open http://live.datatables.net/befuxafo/5/ or
    1. download the attached HTML file,
    2. put editor.dataTables.css plus dataTables.editor.js in the same folder (except for the usual CDNs, no external servers are needed),
    3. and open the HTML file.
  2. Check that the single table row contains the string "I'm a tëst string"
  3. Select the table row
  4. Click the Edit button

Expected result

The input field contains the string "I'm a tëst string"

Actual result

The input field contains the string "I'm a tëst string"

Browsers tested in

  • Firefox 60 (Ubuntu and Windows Server 2012R2)
  • Chrome 66 (Ubuntu and Windows Server 2012R2)
  • Internet Explorer 11 (Windows Server 2012R2)
  • Microsoft Edge 16 (Windows 10)

(Looking at the code, I assume the browser version does not actually matter.)

Remarks

  • Sending decoded data from the server to Editor, while still sending encoded data to DataTables to avoid problems with XSS, is not really an option for us.
  • In our application, the data come from an AJAX call to a Java server. I chose a JS object for the example to simplify the example and make it less dependent on network access; the error happens as long as the data does not come from the DOM.
  • The entities simulate a string escaping library that always outputs ASCII-safe content, even if the final output will use UTF-8. Avoiding such a library (and only using the entities DataTables Editor decodes) is possible for us, but may not be possible for other customers.

Replies

  • allan (Site admin)

    The problem here is that I think you are using the .NET libraries for Editor (rather than the PHP or Node.js ones, but I don't think you explicitly state that), and they use Microsoft's AntiXSS library which is really aggressive about encoding entities. I don't believe that encoding ë as an entity is actually all that useful, since UTF-8 is used virtually everywhere now.

    One option is to disable the XSS protection in the .NET libraries for Editor, and instead use the text renderer in DataTables to make sure that there isn't a potential attack vector there.

    The other would be to modify the Editor libraries to expand the entity decoding. That currently isn't pluggable I'm afraid.

    Regards,
    Allan
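
The first option above (unencoded data from the server, escaping at display time) might look roughly like the following sketch. It relies on the built-in `$.fn.dataTable.render.text()` renderer in DataTables; the function name `initTable`, the `#example` selector and the `name` property are placeholders, and the sketch assumes jQuery and DataTables are already loaded and that the server sends raw, unencoded strings.

```javascript
// Sketch only: `$` is jQuery with DataTables loaded. The selector and the
// `name` data property are placeholders, not taken from the original report.
function initTable($, rows) {
  return $('#example').DataTable({
    data: rows,
    columns: [
      // The text renderer escapes HTML when the cell is displayed, so the
      // underlying data can stay unencoded.
      { data: 'name', render: $.fn.dataTable.render.text() }
    ]
  });
}
```

The renderer has to be added to every column that can contain untrusted text, which is the "nobody can forget it" concern raised in the next reply.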

  • ps@raytion

    We are using Java (with our own backend library), sorry for not mentioning that earlier.

    I agree that encoding non-ASCII characters is mostly pointless with the near-universal UTF-8 support. I thought I had seen an encoding library that assumes US-ASCII as the target charset in the last three years, but I cannot find it anymore. So the encoded ë may or may not be a problem anymore. (For us, it definitely is not.)

    Outputting unencoded strings and using a non-standard renderer may be a solution, but I'll have to discuss that with the other developers involved. Our backend library encodes strings by default and requires a bit of work from the developer to output unencoded strings, especially for multiple columns. (This is intentional; it makes doing the right thing (do not forget to encode output) easy, while still making the more difficult thing (output HTML to be used in the table directly) possible.) For the frontend, we do not have anything comparable (to add the non-standard renderer by default so nobody can forget it) yet.

    Modifying dataTables.editor.js to expand the entity decoding (e.g., by using a library like he) seems to be the favored solution in the team so far. From a technical perspective, it does not look too difficult, either. But we are pretty sure the Editor commercial license (v1.3) does not allow such modifications (since that would materially change the contents of dataTables.editor.js and therefore the Software). So that option is out of the question for the moment.

  • allan (Site admin)

    You can modify the software to suit your needs. I see your point related to the license, but I don't consider that a material change and you have access to the source code for a reason :). Only if you were to resell those changes would there be an issue.

    It's a fair point - perhaps it should attempt to decode all entities (or have an option to). There are some cases where people might want to keep them as entities though.

    Thanks for posting this!

    Allan

  • ps@raytion

    We do not plan on reselling Editor itself to anyone, but we do plan to use it in web applications that we sell to customers and that run on the customers' servers, and we take the licenses of all third-party libraries seriously. Assuming our customers may not modify the software we deliver to them, does the normal license cover the distribution of the modified Editor with our web applications, or would we need an OEM license?

This discussion has been closed.