xmlsave — Export or import dataset in XML format 3
doctype(excel) specifies that an XML file will be exported using Microsoft’s SpreadsheetML
DTD. SpreadsheetML is Microsoft’s name for the Excel XML format. Specifying this document type
produces a generic spreadsheet with the variable names in the first row, followed by the data. The
file can be imported by any version of Microsoft Excel that supports the SpreadsheetML format.
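The following is a minimal Python sketch of the layout that doctype(excel) describes. It uses the standard library's xml.etree.ElementTree to build one Worksheet whose first Row holds the variable names and whose remaining rows hold the data. The function name to_spreadsheetml, the sheet name, and the sample values are hypothetical. The sketch also omits the styles and document properties that real SpreadsheetML files usually carry, and it is not how xmlsave itself is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML (Excel 2003 XML) elements and ss:* attributes
SS = "urn:schemas-microsoft-com:office:spreadsheet"
ET.register_namespace("ss", SS)


def q(name):
    """Qualify a local name with the SpreadsheetML namespace."""
    return f"{{{SS}}}{name}"


def to_spreadsheetml(varnames, rows, sheet_name="Sheet1"):
    """Build a minimal SpreadsheetML workbook: variable names first, then data."""
    workbook = ET.Element(q("Workbook"))
    worksheet = ET.SubElement(workbook, q("Worksheet"), {q("Name"): sheet_name})
    table = ET.SubElement(worksheet, q("Table"))

    def add_row(values):
        row = ET.SubElement(table, q("Row"))
        for value in values:
            cell = ET.SubElement(row, q("Cell"))
            # Tag ints and floats as Number; everything else is written as String
            is_num = isinstance(value, (int, float)) and not isinstance(value, bool)
            data = ET.SubElement(cell, q("Data"),
                                 {q("Type"): "Number" if is_num else "String"})
            data.text = str(value)

    add_row(varnames)    # first row: variable names
    for r in rows:       # followed by one row per observation
        add_row(r)
    return ET.tostring(workbook, encoding="unicode")


xml_text = to_spreadsheetml(["make", "price"],
                            [["AMC Concord", 4099], ["AMC Pacer", 4749]])
```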
dtd, when combined with doctype(dta), embeds the necessary DTD in the XML file so that a
validating parser in another application can verify the dta XML format. This option is rarely used,
however, because it increases file size with information that is purely optional.
legible adds indentation and other optional formatting to the XML file, making it easier for a person
to read. This extra formatting is unnecessary, however, and in larger datasets it can significantly
increase the file size.
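To show the size cost of legible, the Python sketch below serializes the same hypothetical XML tree twice. The first copy is written compactly, and the second is indented with ElementTree.indent (Python 3.9 or later). The element names data, o, and v are invented for illustration and are not Stata's dta schema.

```python
import xml.etree.ElementTree as ET


def build_tree(n_obs):
    """Hypothetical tree: one <o> element per observation holding one <v> value."""
    root = ET.Element("data")
    for i in range(n_obs):
        obs = ET.SubElement(root, "o")
        ET.SubElement(obs, "v").text = str(i)
    return root


compact = ET.tostring(build_tree(1000), encoding="unicode")

tree = build_tree(1000)
ET.indent(tree, space="  ")  # Python 3.9+: inserts newlines and indentation in place
legible = ET.tostring(tree, encoding="unicode")

# Same content, larger file: the extra characters are whitespace only
print(len(compact), len(legible))
```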
replace permits xmlsave to overwrite an existing filename.xml.
Options for xmluse
doctype(dta | excel) specifies the DTD to use when loading data from filename.xml. Although
doctype() is optional, its use is encouraged. If it is omitted, xmluse determines the document type
of filename.xml automatically and displays a note stating which document type was used to translate
the file. Automatic determination is not guaranteed to succeed, so specifying the document type
explicitly prevents ambiguity between the various XML formats. It also improves speed: the data are
passed over only once to load them, instead of twice when the document type must first be
determined. In larger datasets, this advantage can be noticeable.
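The Python sketch below shows why an explicit doctype() saves work. Without it, a loader must first inspect the file to decide its type and only then load it. The detection rule is an assumption made for illustration: a root Workbook element in the SpreadsheetML namespace means excel, and a root element named dta means dta. The sketch also stops at the root element, whereas the text above says xmluse makes a second pass over the data. guess_doctype and load are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"


def guess_doctype(xml_bytes):
    """Guess the document type from the root element only (assumed heuristic)."""
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        # The first "start" event belongs to the root element
        if elem.tag == f"{{{SS}}}Workbook":
            return "excel"
        if elem.tag == "dta":   # assumed root element of Stata's dta XML
            return "dta"
        return None
    return None


def load(xml_bytes, doctype=None):
    """Return (doctype, root); without doctype, a detection pass runs first."""
    if doctype is None:
        doctype = guess_doctype(xml_bytes)   # extra pass, skipped when doctype is given
        if doctype is None:
            raise ValueError("cannot determine document type; specify doctype")
    root = ET.fromstring(xml_bytes)          # the pass that actually loads the data
    return doctype, root
```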
doctype(dta) specifies that an XML file will be loaded using Stata’s dta format. This document
type closely follows Stata’s binary .dta format (see [P] file formats .dta).
doctype(excel) specifies that an XML file will be loaded using Microsoft’s SpreadsheetML DTD.
SpreadsheetML is the term given by Microsoft to the Excel XML format.
sheet("sheetname") imports the worksheet named sheetname. Because an Excel file can contain
multiple worksheets, sheet() specifies which one to load. By default, the first worksheet in
filename.xml is imported.
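The following Python sketch shows the sheet() rule. It picks the Worksheet whose ss:Name attribute matches the requested name, or the first Worksheet when no name is given. pick_worksheet is a hypothetical helper. For brevity, it ignores ss:Index, which SpreadsheetML uses to skip empty cells, so it only suits densely filled sheets.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def pick_worksheet(xml_text, sheet=None):
    """Return the cell texts of the named worksheet, or of the first one by default."""
    root = ET.fromstring(xml_text)
    sheets = root.findall("ss:Worksheet", NS)
    if not sheets:
        raise ValueError("no worksheets found")
    if sheet is None:
        ws = sheets[0]                      # default: first worksheet in the file
    else:
        matches = [s for s in sheets if s.get(f"{{{SS}}}Name") == sheet]
        if not matches:
            raise ValueError(f"worksheet {sheet!r} not found")
        ws = matches[0]
    # One list of cell texts per row (empty cells skipped via ss:Index are not handled)
    return [[d.text for d in row.iter(f"{{{SS}}}Data")]
            for row in ws.iterfind("ss:Table/ss:Row", NS)]
```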
cells(upper-left:lower-right) specifies a cell range within an Excel worksheet to load. By default,
the entire worksheet is loaded, even if portions of it are empty. cells() is often necessary because
the data are offset within a spreadsheet or because only some of the data need to be loaded.
Cell-range notation follows the letter-for-column, number-for-row convention common to spreadsheet
applications. The following are valid examples:
. xmluse filename, doctype(excel) cells(A1:D100)
. xmluse filename, doctype(excel) cells(C23:AA100)
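The cell-range convention can be sketched in Python. Letters are read as a base-26 column number (A=1, Z=26, AA=27), and digits are read as the row. parse_cells and col_to_index are hypothetical helpers, and rejecting a range whose corners are reversed is this sketch's own choice. Under these rules, parse_cells("C23:AA100") returns ((23, 3), (100, 27)), which is a block of 78 rows by 25 columns.

```python
import re


def col_to_index(letters):
    """Convert spreadsheet column letters to a 1-based index: A=1, Z=26, AA=27."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_cells(spec):
    """Parse 'upper-left:lower-right' (e.g. 'C23:AA100') into 1-based (row, col) bounds."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", spec)
    if m is None:
        raise ValueError(f"invalid cell range: {spec!r}")
    c1, r1, c2, r2 = m.groups()
    top, left = int(r1), col_to_index(c1)
    bottom, right = int(r2), col_to_index(c2)
    if top > bottom or left > right:
        raise ValueError("upper-left cell must precede lower-right cell")
    return (top, left), (bottom, right)
```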
datestring forces all Excel SpreadsheetML date formats to be imported as strings, retaining time
information that would otherwise be lost in the automatic conversion to Stata’s date format. With
this option, the time information can be parsed from the string after loading.
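The following Python sketch shows what datestring preserves. It makes two assumptions. First, SpreadsheetML stores date-times as strings such as 2004-03-15T13:45:00.000. Second, the automatic conversion produces a Stata daily date (days since 01jan1960), which has no time component. to_stata_daily and time_of_day are hypothetical helpers.

```python
from datetime import date, datetime

STATA_EPOCH = date(1960, 1, 1)         # Stata daily dates count days from 01jan1960
SSML_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"   # assumed SpreadsheetML DateTime layout


def to_stata_daily(s):
    """Convert a date-time string to a Stata daily date; the time of day is dropped."""
    dt = datetime.strptime(s, SSML_FORMAT)
    return (dt.date() - STATA_EPOCH).days


def time_of_day(s):
    """With the value kept as a string, the time can still be parsed afterward."""
    dt = datetime.strptime(s, SSML_FORMAT)
    return dt.hour, dt.minute, dt.second
```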
allstring forces Stata to import all Excel SpreadsheetML data as string data. Although SpreadsheetML
records a data type for each cell, nothing constrains the types to be consistent within a column.
When a column mixes data types, the only way to resolve the inconsistency is to import the data as
strings.
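The following Python sketch shows the inconsistency that allstring works around. A column whose cells carry different ss:Type values cannot be stored as a single numeric variable without losing information. In this sketch, non-numeric cells are set to missing (None). That choice and the helper names are assumptions made for illustration; they do not describe xmluse's documented behavior.

```python
def has_mixed_types(cells):
    """cells: (ss_type, text) pairs for one column, e.g. ("Number", "12")."""
    return len({ss_type for ss_type, _ in cells}) > 1


def as_numeric(cells):
    """Force a numeric column: cells not typed as Number become None (missing)."""
    return [float(text) if ss_type == "Number" else None for ss_type, text in cells]


def as_string(cells):
    """allstring-style import: keep every cell's text verbatim."""
    return [text for _, text in cells]


column = [("Number", "12"), ("String", "n/a"), ("Number", "7.5")]
```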