Prior to 2007-08(?), there were several XML libraries that handled canonicalization very differently.
The community does understand that quite a bit of the current XML importer handling might have to be refactored to deal properly with non-compliant C14N XML documents and to give users more options in our UI. I hope this helps you and others on this issue understand what the community is generally asking for, especially concerning XML importer options. For XML documents in the wild that are not compliant, we want users to have as many options as necessary to parse and import the data, so that they can extract it and use it in our grid to analyze, convert, transform, export, etc. This is basically about those smart defaults, and our XML importer might need changes to its existing defaults. My opinion, though, is that the Preserve Empty Strings and Trim Whitespace defaults are better left as-is, because MOST of the XML encountered in the wild will be C14N-compliant and will not have an issue.
I think that's basically what we'd get with preserveEmptyStrings=False, trimWhitespace=True. Making the default behavior match might be a reasonable approach.
More broadly, the OpenRefine philosophy is to let users decide how to import, giving them more options with smart defaults that can be changed or toggled. We don't try to force users and tell them "you must supply valid XML to import that strictly conforms to all C14N rules". Instead, we want to allow users to ignore some of those rules, which is what Jackson's various options allow and which we don't fully expose yet but want to. In OpenRefine, we let users decide how things are normalized. That's our philosophy: giving users more power and control to make the best decisions for themselves.
Jackson will try to normalize attribute values when "validating" (per C14N rules). A lot of the existing smart defaults won't work when you toggle some of those other options for the Jackson XML parser when it's asked to be a "validating" parser. So the first thing is that we always check for valid formatting (JSON, XML, etc.). In the case of XML there are strictness concerns, where we actually want to provide MORE options to the user, not fewer (Jackson allows lots of options that we don't currently expose in the UI and would like to), without changing the existing smart defaults for preserving empty strings, trimming whitespace, etc.
The problem exposed in this issue concerns non-compliant (non-valid) C14N XML documents sometimes found in the wild. In general, we provide smart defaults, but always try to expose more options so that users can work with their messy data within valid formats. We need to improve the importer so that it can deal with those default settings nicely, as per my previous paragraph.
Please correct me if I'm going wrong anywhere. Also, just for confirmation: as per the above discussion, along with these changes, the new default parser configuration settings (in the case of XML) will be preserveEmptyStrings=False, trimWhitespace=True.
The default is, and always has been, that we preserve empty string values during import. Users are allowed to preserve and keep empty string values if they want, regardless of whether the source was JSON, XML, CSV, etc. If a particular importer has trouble creating or preserving empty strings, then we need to change and improve it. Those empty values will automatically display as blank cells in the preview, and will thus be stored as blank (empty string values), not null, in cells in our grid after project creation. When a user unchecks Preserve Empty Strings, they are asking us not to store blanks, and instead to leave the cell without a value, which makes it a null cell.
Prior to OpenRefine 3.0, null and blank cells were treated as equal. This changed after 3.0: we now give users much more choice in null and blank cell handling, including new importer and facet options for treating nulls and blanks differently, as well as new display options (the All menu over the first column) to show or hide null values in cells.
The empty check on value should only be performed when parser is an instance of XmlImporter.XmlParser.
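To make the null-versus-blank distinction above concrete, here is a minimal sketch in Python. It is not OpenRefine's actual code (OpenRefine is written in Java): the function name `normalize_cell` and its keyword arguments are hypothetical stand-ins for the preserveEmptyStrings and trimWhitespace options, and the order of operations (trim first, then the empty check) is an assumption.

```python
from typing import Optional


def normalize_cell(
    raw: Optional[str],
    preserve_empty_strings: bool = True,
    trim_whitespace: bool = True,
) -> Optional[str]:
    """Illustrative only: map a raw imported value to a cell value.

    Returns None for a null cell (no value assigned) and "" for a
    blank cell (an empty string is stored).
    """
    if raw is None:
        # Nothing was present in the source, so the cell stays null.
        return None
    # Assumption: trimming happens before the empty check.
    value = raw.strip() if trim_whitespace else raw
    if value == "" and not preserve_empty_strings:
        # The user asked not to store blanks: leave the cell null.
        return None
    return value


# Preserving empty strings: a whitespace-only value becomes a blank cell.
blank = normalize_cell("   ", preserve_empty_strings=True, trim_whitespace=True)
# Not preserving them: the same value becomes a null cell.
null = normalize_cell("   ", preserve_empty_strings=False, trim_whitespace=True)
```

With trimming turned off in this sketch, a whitespace-only value is not considered empty and is kept verbatim whichever way the empty-string option is set.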