XML File Format (Extensible Markup Language)
XML (Extensible Markup Language) is a text-based markup language for storing and exchanging structured data in a form that is both human-readable and machine-readable. It was developed by a W3C working group and published as a W3C Recommendation in February 1998 as a simplified subset of the older SGML standard. Unlike HTML, XML defines no fixed tags: authors create their own element and attribute names to describe whatever data they need, which is what "extensible" means. Data is organized as a strict tree of nested elements, with a single root element wrapping the entire document. Through the 2000s XML was the dominant format for web APIs, configuration files, and document interchange, until JSON overtook it for lightweight web messaging. XML remains deeply entrenched wherever validation, namespaces, and rich document structure matter: RSS and Atom feeds, SOAP web services, SVG and XHTML, Office Open XML (the basis of DOCX and XLSX), Android layouts, Maven and Spring configuration, and countless enterprise and government data standards. Its accompanying ecosystem (DTDs, XSD schemas, XPath, XSLT, and XQuery) lets you validate, query, and transform XML with a precision that simpler formats cannot match.
Quick Facts
- Extension: .xml
- MIME Type: application/xml (text/xml is also registered)
- Category: document
Advantages
- Strict, well-defined tree structure that is both human- and machine-readable
- Powerful validation via DTD and XML Schema (XSD) enforces structure and data types
- Namespaces allow mixing vocabularies without naming collisions
- Rich tooling for querying and transforming: XPath, XSLT, and XQuery
- Self-documenting custom tags and broad, mature support across every platform
Disadvantages
- Verbose — opening and closing tags make files larger than JSON or CSV
- More complex to parse and write than JSON for simple data exchange
- Typically slower to parse than JSON, and DOM parsing of large files is memory-heavy
- Overkill for lightweight web APIs, where JSON is now preferred
- Security pitfalls such as XXE (XML External Entity) and billion-laughs entity-expansion attacks if parsers are misconfigured
Common Use Cases
- RSS and Atom feeds for blogs, podcasts, and news syndication
- Configuration files (Maven pom.xml, Spring, Android layouts, web.config)
- SOAP web services and enterprise system integration
- Office Open XML documents (the XML inside DOCX, XLSX, and PPTX)
- Data interchange standards in finance, healthcare, government, and geospatial work (e.g. XBRL for financial reporting, HL7 CDA for clinical documents, GPX for GPS tracks)
Technical Details
An XML document begins with an optional prolog (the XML declaration, <?xml version="1.0" encoding="UTF-8"?>, plus any DOCTYPE declaration, comments, or processing instructions), followed by exactly one root element. Elements may nest, carry attributes, and contain text, CDATA sections (raw text exempt from markup parsing), comments, processing instructions, or child elements. A document that obeys the syntax rules (properly nested tags, a single root, quoted attribute values, and escaped reserved characters such as < and &) is "well-formed"; one that also conforms to a DTD or XSD is "valid." Five predefined entities (&lt; &gt; &amp; &apos; &quot;) escape reserved characters, and numeric character references (such as &#233; or &#xE9; for é) encode a character by its Unicode code point. Namespaces, declared with xmlns attributes, qualify names with URIs to prevent collisions. Parsers come in two main styles: DOM, which loads the whole tree into memory, and streaming parsers such as SAX (push) and StAX (pull), which emit events so large files can be processed incrementally. XPath addresses nodes, XSLT transforms documents into other formats, and XQuery queries them.
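The parsing styles and escaping rules above can be illustrated with Python's standard-library xml.etree.ElementTree. This is a minimal sketch, not a reference implementation: the catalog document, the namespace URI http://example.com/catalog, and the helper names titles_dom, titles_streaming, and serialize_text are hypothetical. ElementTree.iterparse is an incremental event-based parser rather than SAX or StAX proper, but it demonstrates the same streaming idea.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: prolog, default namespace, a predefined entity,
# a CDATA section, and a numeric character reference (&#233; is e-acute).
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://example.com/catalog">
  <book id="b1"><title>Tom &amp; Jerry</title></book>
  <book id="b2"><title><![CDATA[1 < 2 & 3 > 2]]></title></book>
  <book id="b3"><title>Caf&#233;</title></book>
</catalog>"""

CATALOG = "http://example.com/catalog"   # hypothetical namespace URI
NS = {"c": CATALOG}                      # prefix used only for our lookups
TITLE = f"{{{CATALOG}}}title"            # ElementTree's {uri}local tag form
BOOK = f"{{{CATALOG}}}book"


def titles_dom(data: bytes) -> list[str]:
    """DOM style: build the whole tree in memory, then query it."""
    root = ET.fromstring(data)
    return [t.text for t in root.findall("c:book/c:title", NS)]


def titles_streaming(data: bytes) -> list[str]:
    """Streaming style: react to 'end' events and discard finished subtrees."""
    titles = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == TITLE:
            titles.append(elem.text)
        elif elem.tag == BOOK:
            elem.clear()  # free the completed <book> to keep memory flat
    return titles


def serialize_text(text: str) -> bytes:
    """Serializing escapes reserved characters with predefined entities."""
    note = ET.Element("note")
    note.text = text
    return ET.tostring(note)


print(titles_dom(DOC))
print(titles_streaming(DOC))
print(serialize_text("a < b & c"))
```

Both functions should return the same titles: the &amp; entity, the CDATA section, and the numeric reference all come back as plain text. Going the other way, ElementTree escapes < and & in text content, so serialize_text("a < b & c") yields <note>a &lt; b &amp; c</note>.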
Frequently Asked Questions about XML
What is the difference between XML and HTML?
XML carries data with custom, author-defined tags and enforces strict syntax, while HTML displays content using a fixed set of predefined tags and is forgiving of errors. XML describes what data is; HTML describes how a page looks. XHTML is HTML reformulated to follow XML's strict rules.
Is XML still used in 2026?
Yes. While JSON has replaced XML for most new web APIs, XML remains the backbone of RSS/Atom feeds, SOAP services, Office documents (DOCX, XLSX), Android and Java configuration, SVG, and many regulated data standards in finance, healthcare, and government.
What does well-formed versus valid XML mean?
Well-formed XML follows the basic syntax rules: one root element, properly nested and closed tags, quoted attributes, and escaped reserved characters. Valid XML is well-formed and also conforms to a schema (DTD or XSD) that defines which elements, attributes, and data types are allowed.
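A well-formedness check needs only a parser: if the document parses, it is well-formed. The sketch below assumes Python's xml.etree.ElementTree, which raises ParseError on malformed input; the helper is_well_formed and the sample strings are hypothetical. It cannot check validity, because the standard library has no DTD or XSD validator; that step needs a schema-aware library such as lxml.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML (syntax only, no schema check)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: one well-formed document and one per broken rule.
samples = {
    "ok": '<order><item qty="2">pen</item></order>',
    "two roots": "<a/><b/>",
    "unquoted attribute": "<item qty=2/>",
    "bad nesting": "<a><b></a></b>",
    "raw ampersand": "<p>Tom & Jerry</p>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```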
How do I convert XML to JSON?
FileChange parses the XML tree and maps elements to JSON keys, repeated elements to arrays, and attributes to keys (often prefixed with @) directly in your browser. Note that XML features like namespaces, comments, and mixed content do not always map cleanly to JSON.
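One possible mapping along these lines can be sketched in Python. This is not FileChange's actual converter: the conventions (attributes as @name keys, text that sits beside attributes or children as a #text key, repeated child tags collapsed into arrays) and the helper names element_to_value and xml_to_json are assumptions for illustration. The sketch is also lossy in the ways the answer warns about: ElementTree drops comments by default, namespaced tags come through as {uri}name keys, and text that follows a child element (mixed content) is discarded.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def element_to_value(elem: ET.Element) -> Any:
    """Convert one element to a JSON-compatible value (assumed conventions)."""
    # Attributes become "@name" keys.
    result: dict[str, Any] = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        value = element_to_value(child)
        if child.tag in result:
            # A repeated tag is collapsed into a list of values.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if not result:
        return text  # leaf without attributes: just its text
    if text:
        result["#text"] = text
    return result


def xml_to_json(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_value(root)}, indent=2)


sample = """<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
  <owner>Ana</owner>
</library>"""
print(xml_to_json(sample))
```

Here the two <book> elements become a JSON array while the single <owner> stays a plain string. This is one reason JSON consumers can see a field change shape depending on whether an element happens to appear once or several times.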
Why is XML considered verbose?
Every piece of data is wrapped in an opening and closing tag (for example <name>Alice</name>), so the same dataset is usually larger than its JSON or CSV equivalent. The upside of that verbosity is self-documenting structure and strong validation.
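The size difference is easy to measure. The sketch below, assuming Python's standard library and a small hypothetical two-person dataset, serializes the same records as compact XML, compact JSON, and CSV and prints their byte counts. Exact ratios depend on tag-name length and nesting depth, so treat it as an illustration rather than a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical dataset used for all three serializations.
people = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]

# XML: every value sits between an opening and a closing tag.
root = ET.Element("people")
for p in people:
    person = ET.SubElement(root, "person")
    for key, value in p.items():
        ET.SubElement(person, key).text = str(value)
xml_bytes = ET.tostring(root)  # no declaration, no indentation

# JSON: keys are repeated per record, but there are no closing tags.
json_bytes = json.dumps(people, separators=(",", ":")).encode("utf-8")

# CSV: field names appear once, in the header row.
csv_text = "name,age\n" + "".join(f"{p['name']},{p['age']}\n" for p in people)
csv_bytes = csv_text.encode("utf-8")

print("XML: ", len(xml_bytes), "bytes")
print("JSON:", len(json_bytes), "bytes")
print("CSV: ", len(csv_bytes), "bytes")
```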
What are XSLT and XPath used for?
XPath is a query language that selects nodes within an XML document using path expressions. XSLT uses XPath to transform XML into another format — such as HTML, plain text, or a differently structured XML — making them the standard tools for reshaping and presenting XML data.
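XPath-style path expressions can be tried without extra dependencies: Python's xml.etree.ElementTree supports a limited subset of XPath in its find and findall methods. The document and variable names below are hypothetical. Full XPath 1.0 and XSLT need a library such as lxml, so this sketch covers node selection only, not transformation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document to query.
DOC = """<library>
  <book lang="en"><title>Emma</title><year>1815</year></book>
  <book lang="fr"><title>Candide</title><year>1759</year></book>
  <book lang="en"><title>Dracula</title><year>1897</year></book>
</library>"""

root = ET.fromstring(DOC)

# Attribute predicate: titles of the English-language books.
english = [t.text for t in root.findall("./book[@lang='en']/title")]

# Child-text predicate: the title of the book whose <year> is 1759.
candide = root.find("./book[year='1759']/title").text

# Positional predicate (1-based, as in XPath): the second book's title.
second = root.find("./book[2]/title").text

print(english, candide, second)
```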