How to parse XML in JavaScript (step-by-step guide)

Learn how to parse XML with JavaScript using methods like DOMParser, xml2js, and streams. Explore techniques for handling large files and writing XML data.


JSON is the go-to format for most web applications today, but XML (Extensible Markup Language) is still alive and well. Many services continue to use this long-standing format to store and share structured data.

XML processing is often necessary when working with this format in your applications. Parsing converts raw XML into a structure your code can easily work with. JavaScript offers several approaches to XML parsing, including built-in browser APIs, Node.js solutions, and third-party libraries.

This guide explains these methods, helping you choose the right tool for your project's needs. We'll cover their strengths, use cases, and how to address common XML parsing challenges in JavaScript projects.

What is XML?

XML is a markup language for storing and sharing structured data. It's still widely used for data exchange between different systems and platforms.

XML and HTML share similarities in structure. Both use tags, attributes, and values to define elements in a document, creating a hierarchy. But they serve different purposes. HTML is designed to display data in web browsers, while XML is meant for storing and transferring data between systems.

XML is both human-readable and easy for machines to parse. This makes it useful for exchanging data between services, especially in Simple Object Access Protocol (SOAP) apps. It's also used for sitemaps and for configuration files, such as the pom.xml file in Maven projects.

XML allows developers to create custom tags, making it adaptable to many data structures. This flexibility has helped XML stay relevant in tech. While JSON is now more common in web apps, XML remains important in many areas of software development.

Understanding the structure of XML files

Before we get into the details of how to parse XML files, let’s first understand the different parts of an XML document.

An XML file typically includes the following elements:

  • Root element: The top-level element in an XML document. It contains all other elements as its descendants and serves as the starting point for navigating the XML structure.
  • Namespace: XML namespaces are unique identifiers for elements and attributes. They prevent naming conflicts when you combine XML from different sources, keeping elements distinct even when the same name is used for different purposes. Namespaces are typically written as URLs, which also hint at the organization or standard that defines the vocabulary. For example:
<book xmlns="http://example.com/books">
  • Attributes: Provide additional information about an element. They are specified within the start tag of an element and consist of a name-value pair.
  • Elements: The primary building blocks of an XML document, each representing a structured piece of information. Elements can be nested within other elements to create a hierarchical structure.
  • Text content: Refers to the textual data enclosed within an element’s start and end tags, which can include plain text, numbers, or other characters.

Here’s an example of an XML file:

<?xml version="1.0" encoding="UTF-8"?>
<vehicles xmlns="http://example.com/vehicles">
  <item>
    <name>Cars</name>
    <description>I love cars.</description>
    <models>
      <model>Range Rover</model>
      <model>Corolla</model>
      <model>BMW</model>
      <model>Toyota</model>
      <model>Tesla</model>
    </models>
  </item>
</vehicles>

These elements are organized hierarchically, making the document easy for humans and computers to understand.

You can use a tool like JSONformatter to view the tree-like structure of XML elements. This works for our example or any XML data you have. Seeing XML this way helps you grasp how elements are organized.

Now that we've covered the components of an XML file, we can use this knowledge to parse XML with various libraries.


Parsing XML in JavaScript

XML parsing is the process of analyzing an XML document and extracting the required data.

Generally, XML parsers perform the following key steps:

  1. Syntax checking: The parser checks the XML document for compliance with the XML syntax rules. If any syntax errors are found, such as missing closing tags or incorrectly formatted attribute values, the parser will report these errors and may fail to continue processing the document.
  2. Tokenization: The parser breaks down the XML document into individual tokens, such as start tags, end tags, attribute names and values, and text content.
  3. Building the document structure: The parser constructs a hierarchical representation of the XML document, typically in a tree structure. From this structure, we can traverse this tree, query specific elements or attributes, and manipulate the data as needed.
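
To make the tokenization step concrete, here is a toy sketch in JavaScript. It is for illustration only: the tokenize function and its token shapes are made up for this example, and it handles only simple fragments with double-quoted attributes (no XML declaration, comments, CDATA, or entities). Those gaps are exactly why real projects rely on tested parsers.

```javascript
// Toy tokenizer for simple XML fragments (illustration only, not a real parser).
function tokenize(xml) {
  // Alternatives: end tag | start tag with optional attributes | text
  const pattern = /<\/([\w:.-]+)\s*>|<([\w:.-]+)((?:\s+[\w:.-]+="[^"]*")*)\s*(\/?)>|([^<]+)/g;
  const tokens = [];

  for (const match of xml.matchAll(pattern)) {
    const [, endName, startName, rawAttributes, selfClosing, text] = match;

    if (endName) {
      tokens.push({ type: 'endTag', name: endName });
    } else if (startName) {
      // Split the raw attribute string into name/value pairs.
      const attributes = {};
      for (const [, name, value] of rawAttributes.matchAll(/([\w:.-]+)="([^"]*)"/g)) {
        attributes[name] = value;
      }
      tokens.push({ type: 'startTag', name: startName, attributes, selfClosing: selfClosing === '/' });
    } else if (text.trim() !== '') {
      // Whitespace-only text between tags is skipped.
      tokens.push({ type: 'text', value: text.trim() });
    }
  }
  return tokens;
}

console.log(tokenize('<item id="1"><name>Cars</name></item>'));
```

For this input, the function logs five tokens: the start tags for item and name, the text Cars, and then the two end tags. A real parser would go on to check that the tags nest correctly and build the tree from them.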

JavaScript XML parsers

Writing your own XML parser is generally challenging.

XML enforces strict syntax rules: a single missing closing tag or unquoted attribute value makes a document unusable. Moreover, its self-describing, nested structure makes it difficult to split an XML document into chunks that can be parsed in parallel.

Unless you have a compelling reason, it is more efficient to resort to standard and tested parser libraries and APIs to avoid the complexity of the parsing process.

JavaScript, like other languages, provides several APIs and libraries that you can use to parse XML files. Each one of them has its unique tradeoffs.

Some are optimized for speed, and some for memory; whichever you choose to work with will largely depend on your project requirements.

In this guide, we'll cover the following:

  1. DOMParser API
  2. xml2js
  3. Using Streams
  4. SAX Parser

Method 1: Using the DOMParser API

The Document Object Model (DOM) represents the structure of a document as a tree of nodes, where each node corresponds to elements, attributes, or values in the markup.

This tree structure allows parsers to programmatically access, manipulate, and modify XML content.

JavaScript provides the DOMParser API, which parses XML source into an XML document that you can traverse and query for further processing.

Now, let's take a look at how we can parse XML using this API. First, create a local development environment that we'll use to work on the code examples while following along with this guide.

To do that, run the following commands in your terminal to create a demo project directory and two files: index.js and test.xml.

mkdir parsing-xml
cd parsing-xml

# For Linux/Unix systems:
touch index.js test.xml

# For Windows:
echo. > index.js
echo. > test.xml

Then, paste the XML example we looked at earlier into test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<vehicles xmlns="http://example.com/vehicles">
  <item>
    <name>Cars</name>
    <description>I love cars.</description>
    <models>
      <model>Range Rover</model>
      <model>Corolla</model>
      <model>BMW</model>
      <model>Toyota</model>
      <model>Tesla</model>
    </models>
  </item>
</vehicles>

To parse XML data using the DOMParser API, there are a few important points to note.

The DOMParser API is a native Web API supported in modern browsers. However, its parseFromString() method can't load an XML file for you: it expects a string of XML as input, not a file path or URL.

In other words, it can't read files from the file system or make requests to a server to retrieve the XML files. Instead, it's designed to work with strings of XML source code that are already available in memory.

Since you'll usually be working with XML files that need parsing before any further processing, it's important to load them properly. To do that, you'll need to:

  1. Fetch the XML file using the fetch API or another method.
  2. Convert the fetched XML file into a string.
  3. Lastly, pass the string to the DOMParser API instance method, parseFromString() for parsing.

Once you have covered these steps, you can access the parsed XML using DOM methods.

Let's look at an example of how the code might look. In the index.js file, paste in the following code:

async function loadXML() {
    try {
        const response = await fetch('test.xml');
        const xmlString = await response.text();
        const parser = new DOMParser();
        const xmlDoc = parser.parseFromString(xmlString, 'text/xml');

        const name = xmlDoc.querySelector('name').textContent;
        console.log(name);
    } catch (error) {
        console.error('Error loading XML:', error);
    }
}

loadXML()
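
One caveat with the snippet above: in browsers, parseFromString() does not throw when the XML is malformed. Instead, it returns a document containing a <parsererror> element, so a parsing problem surfaces, at best, as an unrelated TypeError later on. A minimal sketch of a stricter wrapper follows. parseXmlStrict is a hypothetical helper name, and the optional parser parameter exists only so the function can be exercised outside a browser.

```javascript
// Parse an XML string and throw if the browser reports a parse error.
// `parser` defaults to a real DOMParser; pass a stand-in when testing outside a browser.
function parseXmlStrict(xmlString, parser = new DOMParser()) {
  const doc = parser.parseFromString(xmlString, 'application/xml');

  // On malformed input, browsers return a document containing <parsererror>.
  const errorNode = doc.querySelector('parsererror');
  if (errorNode) {
    throw new Error(`Invalid XML: ${errorNode.textContent}`);
  }
  return doc;
}
```

Inside loadXML(), you could replace the two parser lines with const xmlDoc = parseXmlStrict(xmlString); so that malformed XML lands in the existing catch block with a clear message.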

Now, in the same project directory, create an index.html file and add this code:

<!DOCTYPE html>
<html>
<head>
    <title>XML Parser</title>
</head>
<body>
    <script src="index.js"></script>
</body>
</html>

At this point, to display the parsed name, open the index.html file in the browser and check the console in the browser's developer tools for the output.

This approach is technically correct. However, if you open index.html directly from your file system, the Fetch API call that loads the XML file won't work, and the browser will most likely throw an error. This is because the files haven't been served by a web server.

Modern browsers use a Same-Origin security policy. This policy prevents web pages from making requests to domains different from the one serving the page. We run into this policy when we use the Fetch API to load an XML file locally. It's a security measure, but it can trip up local development.

When you open an HTML file directly from your file system, the browser treats it as having a unique origin. Any attempt to fetch local files (including the XML file) using JavaScript's Fetch API will be blocked by the Same-Origin Policy. The browser will likely throw a Cross-Origin Request (CORS) Blocked error or a similar security-related error.

To overcome this limitation, we need to serve our files from a web server. This approach makes sure that all our files (HTML, JavaScript, and XML) are served from the same origin, satisfying the Same-Origin Policy requirements.

There are several ways to achieve this:

  1. Use a local development server: Tools like Node.js with Express can be used to set up a simple local server.
  2. Run npx http-server in your terminal to serve your files.
  3. Use a code editor extension: Extensions such as Live Server can spin up a local server with minimal configuration and serve your files straight from the editor.

For this demo, we'll use the Live Server extension in Visual Studio Code. This extension provides a quick and easy way to serve our files locally without complex server setups.

To use Live Server:

  1. Open Visual Studio Code and navigate to the Extensions tab.
  2. Search for Live Server in the extensions marketplace.
  3. Click on the Install button for the Live Server extension.
  4. Once installed, you'll see a Go Live button in the status bar at the bottom of your VS Code window.
  5. Click on the Go Live button to start the local server and automatically open your index.html file in your default web browser.

With the local server running, your browser can now fetch the XML file.

To open the browser's console, you need to access the browser's developer tools. To do that, right-click, select Inspect, and click the Console tab.

Here, you should see the output of your XML parsing script.


This setup lets us use the DOMParser API in a browser to parse XML loaded from a file. This basic step sets the stage for more complex XML processing tasks.

The example above focused solely on parsing XML in the browser. The same approach also works in server-side environments, where you can parse XML through the same DOMParser interface.

Node.js doesn't ship a DOMParser of its own, but packages like xmldom provide an implementation, allowing you to parse XML in Node.js server environments.

To use xmldom, we first need to set up a local server-side development environment. To do so, install Node.js and the Node package manager, npm.

First, create a new project directory for this example, or you can use your current working directory. Then, initialize a new Node.js project by running the following command in your terminal:

npm init -y

Next, install xmldom.

npm install @xmldom/xmldom

Now, in the root directory of your project folder, create a new app.js file.

# For Linux/Unix systems:
touch app.js

# For Windows:
echo. > app.js

Then, paste in the following code:

import fs from 'fs';
import { DOMParser } from '@xmldom/xmldom';

const xml = fs.readFileSync('test.xml', 'utf8');
const parser = new DOMParser();
const doc = parser.parseFromString(xml, 'application/xml');

console.log(doc.getElementsByTagName('name')[0].textContent);

In this example, we perform two main operations:

  1. Use the Node.js fs module to synchronously read the entire XML file (test.xml) into memory as a string.
  2. Create an instance of DOMParser and use its parseFromString method to parse the XML data.

Once parsed, we can access individual elements using DOM traversal methods. In this case, we retrieve the text content of the first <name> element using getElementsByTagName and textContent.

Make sure to update the package.json file by adding the "type": "module" property so that Node.js treats your files as ES modules and accepts the import syntax.

You can run this command on your terminal to test this minimal setup.

node app.js

The output will be the text content of the first <name> element in the XML file, that is: Cars.
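
The same DOM methods can collect every matching element rather than only the first one. As a small sketch, the hypothetical listModels helper below accepts any document-like object that offers getElementsByTagName, such as the doc returned by xmldom in app.js, and returns the text of each <model> element.

```javascript
// Return the text content of every <model> element in a parsed document.
function listModels(doc) {
  // getElementsByTagName returns an array-like list, so convert it with Array.from.
  return Array.from(doc.getElementsByTagName('model'), (node) => node.textContent);
}
```

In app.js, adding console.log(listModels(doc)); after the parsing step should print the five model names from test.xml.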

Method 2: Using xml2js

The xml2js library is a popular Node.js package for parsing XML data. Unlike the DOMParser, which provides a DOM representation, xml2js focuses on transforming XML into a more usable format, making it easier to work with in JavaScript applications.

It's particularly useful for scenarios where you want to quickly access data in your APIs or other services without necessarily dealing with the complexities of manipulating the DOM to access the required data.

Essentially, xml2js parses XML directly into JavaScript objects that map naturally to JSON. Since JSON is a common format used in applications, your consumers (APIs, clients, etc.) can easily ingest and use the parsed data.

To use xml2js, first, install the package:

npm install xml2js

Once installed, to parse XML files, start by reading the file using the fs module and then use xml2js to convert the XML string into a JavaScript object.

Here's an example of how to do this:

import fs from 'fs';
import { parseString } from 'xml2js';

const xml = fs.readFileSync('test.xml', 'utf8');

parseString(xml, (err, result) => {
    if (err) {
        console.error('Error parsing XML:', err);
        return;
    }
    console.log(result);
});

In this code snippet, we read the contents of test.xml and then use the parseString function to convert the XML string into a JavaScript object.

When you run this code, you will get an output like:

{
vehicles: { '$': { xmlns: 'http://example.com/vehicles' }, item: [ [Object] ] }
}

Don't worry, this isn't an error. The parse succeeded; the [Object] placeholder comes from Node.js's console.log, which by default stops expanding nested objects beyond a certain depth (this behavior is highlighted in the xml2js GitHub repository).

To see the complete structure of the parsed XML, you can use the util.inspect method with the depth option set to null.

Go ahead and import the util module, wrap the result object in the util.inspect method, and then log it in the terminal from inside the parseString callback.

import util from 'util';
console.log(util.inspect(result, { depth: null, colors: true }));

This will give you a detailed view of the parsed XML object, including all nested properties.

Let's say you want to access individual items from this parsed object. To do that, you can loop through each of the nested items and log each one of them as follows:

    const items = result.vehicles.item;
    items.forEach(item => {
        console.log('Name:', item.name[0]);
        console.log('Description:', item.description[0]);
        console.log('Models:', item.models[0].model.join(', '));
    });
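
The [0] indexing above is needed because xml2js, with its default settings, wraps every child element in an array, even when there is only one. As a sketch of how you might flatten that shape into something friendlier, the hypothetical summarizeItems helper below works on a hand-written object that mirrors the parsed result of test.xml, so it runs without the library.

```javascript
// Hand-written stand-in for the object xml2js produces for test.xml.
const result = {
  vehicles: {
    $: { xmlns: 'http://example.com/vehicles' },
    item: [
      {
        name: ['Cars'],
        description: ['I love cars.'],
        models: [{ model: ['Range Rover', 'Corolla', 'BMW', 'Toyota', 'Tesla'] }],
      },
    ],
  },
};

// Convert xml2js-style items into plain objects with unwrapped values.
function summarizeItems(parsed) {
  return parsed.vehicles.item.map((item) => ({
    name: item.name[0],
    description: item.description[0],
    models: item.models[0].model, // already an array of strings
  }));
}

console.log(summarizeItems(result));
```

xml2js also accepts an explicitArray: false option, which creates arrays only when an element repeats. Whether that suits you depends on how predictable your documents are, since the same field can then be a string in one document and an array in another.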

xmldom vs xml2js: what's the difference?

We've now looked at two parsing libraries, xmldom and xml2js. There are important tradeoffs to consider when choosing between them.

For starters, xmldom provides a DOM representation of XML as an output. To access and manipulate the data, you need to traverse through the DOM tree using standard DOM methods.

In contrast, xml2js transforms XML directly into JavaScript objects. This simplifies data access and manipulation. This approach is more intuitive, allowing straightforward interaction with the data without needing to navigate a DOM tree.

Performance matters in all software processes, including XML parsing. With large XML documents, parsing into a DOM can use a lot of memory and CPU time. Managing a full DOM structure may slow things down, especially with deeply nested XML.

xml2js, on the other hand, converts XML into plain JavaScript objects without the additional overhead of a DOM representation, which can make parsing and data access faster in applications that primarily read XML data. Keep in mind that it still builds the complete object in memory, so measure with your own documents before relying on either library for very large inputs.

Method 3: Parsing large XML files using the Stream module

When dealing with large XML files, loading the entire document into memory can be inefficient and cause performance bottlenecks. To handle large XML files efficiently, you can use the Stream module to process the XML data in chunks instead of loading it all at once.

The Stream module is a built-in Node.js module that provides a way to handle streaming data. It allows you to read or write data in smaller chunks, which can be particularly useful when working with large files or data streams.

To test the Stream module, create a new large.xml file with the content below. It isn't actually a large XML file, but it's enough for this demo:

<?xml version="1.0" encoding="UTF-8"?>
<vehicles xmlns="http://example.com/vehicles">
    <vehicle type="http://example.com/car">
        <name>Cars</name>
        <description>I love cars.</description>
        <models>
            <model>Range Rover</model>
            <model>Corolla</model>
            <model>BMW</model>
            <model>Toyota</model>
            <model>Tesla</model>
        </models>
    </vehicle>
    <vehicle type="http://example.com/motorcycle">
        <name>Motorcycles</name>
        <description>I also enjoy riding motorcycles.</description>
        <models>
            <model>Harley-Davidson</model>
            <model>Honda</model>
            <model>Yamaha</model>
            <model>Ducati</model>
            <model>Triumph</model>
        </models>
    </vehicle>
    <vehicle type="http://example.com/minivan">
        <name>Minivans</name>
        <description>Minivans .</description>
        <models>
            <model>Honda Odyssey</model>
            <model>Toyota Sienna</model>
            <model>Chrysler Pacifica</model>
            <model>Kia Carnival</model>
            <model>Dodge Grand Caravan</model>
        </models>
    </vehicle>
</vehicles>

Next, create a new file called test-stream.js in the same directory. We'll combine a read stream with xml2js, so copy and paste the following code into that file:

import fs from 'fs';
import { Parser } from 'xml2js';
import util from 'util';

const parser = new Parser();

const readStream = fs.createReadStream('large.xml', 'utf8');

readStream.on('data', (chunk) => {
     parser.parseString(chunk, (err, result) => {
        if (err) {
            console.error('Error parsing XML:', err);
            return;
        }
        console.log('Parsed data:', util.inspect(result, { depth: null, colors: true }));
    });
});

readStream.on('end', () => {
    console.log('process completed');
});

readStream.on('error', (err) => {
    console.error('Error:', err);
});

In this example, we import the fs module, the Parser class from the xml2js library, and the util module. We create a new Parser instance, which will be used to parse the XML data.

Next, we create a ReadStream using fs.createReadStream and pass the file path ('large.xml') and encoding ('utf8'). This stream will read the file in chunks.

We then listen for the data event on the readStream, which is emitted whenever a new chunk of data is received. We can then call the parseString method on the parser instance, passing the chunk of data for parsing. If everything goes smoothly, we take the parsed result and log it out.

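Run the script with the node test-stream.js command to view the results in your terminal.

This implementation reads the example file through a stream and parses the data with xml2js. Note that it works here only because the example file is small enough to arrive as a single chunk (file streams in Node.js read up to 64 KiB per chunk by default). xml2js needs a complete document for each parseString call, so a file that spans several chunks would produce parse errors rather than results.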

For production apps, XML files can be much larger than our example. These often need processing before use. If you're not careful, this can overwhelm your app and slow everything down. Handling large XML files efficiently is important to keep your system running smoothly.

Using streams to read and parse data can help avoid these issues. You can also use packages like Sax.js, which processes XML as a stream. These methods let you handle large XML files more efficiently, keeping your app responsive.
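
As a sketch of what event-based parsing with Sax.js might look like for large.xml, the example below assumes the sax package is installed (npm install sax). collectVehicles and main are hypothetical names chosen for this example. The handler logic only needs an object that emits sax-style opentag, text, and closetag events, and main() loads its dependencies with dynamic import() so that logic can be read and tested on its own.

```javascript
// Attach handlers to any object that emits sax-style events
// ('opentag', 'text', 'closetag'); onVehicle is called once per <vehicle>.
function collectVehicles(saxStream, onVehicle) {
  let current = null; // the vehicle currently being assembled
  let text = '';      // text collected inside the current element

  saxStream.on('opentag', (node) => {
    if (node.name === 'vehicle') {
      current = { type: node.attributes.type, models: [] };
    }
    text = '';
  });

  saxStream.on('text', (chunk) => {
    text += chunk;
  });

  saxStream.on('closetag', (name) => {
    if (!current) return;
    if (name === 'name') current.name = text.trim();
    else if (name === 'model') current.models.push(text.trim());
    else if (name === 'vehicle') {
      onVehicle(current); // hand over the finished record, then let it go
      current = null;
    }
    text = '';
  });
}

// Wire the handlers to a real file stream (assumes `npm install sax`).
async function main() {
  const { createReadStream } = await import('node:fs');
  const { default: sax } = await import('sax');

  const saxStream = sax.createStream(true); // true = strict mode
  collectVehicles(saxStream, (vehicle) => console.log(vehicle));
  saxStream.on('error', (err) => console.error('Parse error:', err.message));

  createReadStream('large.xml').pipe(saxStream);
}
```

Call main() at the end of the file to run it against large.xml. Because each vehicle is handed off as soon as its closing tag arrives, memory use depends on the size of one record rather than the size of the whole file.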

Writing to XML files

Let's switch gears and look at creating XML files from scratch. This means writing data to XML files. The process involves building an XML structure and then serializing it into a file.

There are a few libraries you can use to achieve this, including xml2js and, for more complex XML generation, xmldom.

Let's take a look at an example using xml2js. This library provides a Builder class that makes it easy to convert JavaScript objects into XML.

For this example, create a new file called write-xml.js. We'll use this file to write our XML data.

First, import the fs module and the Builder class from xml2js.

import fs from 'fs';
import { Builder } from 'xml2js';

Next, create a JavaScript object whose shape mirrors the XML document you want to generate. In xml2js, the $ key holds an element's attributes.

const obj = {
  vehicles: {
    item: [
      {
        $: { name: 'Car' },
        carDetails: 'Details about the car'
      },
      {
        $: { name: 'Bike' },
        bikeDetails: 'Details about the bike'
      }
    ]
  }
};

Now, use a Builder instance to convert the JavaScript object into an XML string as follows:

const builder = new Builder();
const xml = builder.buildObject(obj);

Finally, write the XML string to a file using fs.writeFileSync:

fs.writeFileSync('output.xml', xml, 'utf8');
console.log('XML file written.');
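
Builder takes care of escaping special characters for you. If you ever assemble small XML fragments by hand with template strings instead, values need escaping first, or characters such as & and < will produce malformed output. A minimal sketch, with escapeXml as a hypothetical helper name:

```javascript
// Replace the five characters that have special meaning in XML with entities.
function escapeXml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
  return String(value).replace(/[&<>"']/g, (char) => entities[char]);
}

const details = 'Fast & "fun" <to drive>';
console.log(`<carDetails>${escapeXml(details)}</carDetails>`);
// prints: <carDetails>Fast &amp; &quot;fun&quot; &lt;to drive&gt;</carDetails>
```

For anything beyond a one-off fragment, a builder library remains the safer choice, since it also handles nesting and attributes consistently.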

And that's it! That's how you create an XML file from scratch using xml2js. This is a simple example, but it highlights the basic steps in writing data to an XML file.

Common issues and solutions for parsing XML in JavaScript

There are many tools for parsing XML. Whether you're working with small or large files, here are a few key points to keep in mind, along with some common issues with XML parsing and how to fix them:

  1. Choose the right library: The first step is to choose the right XML parsing tool for your project. Different libraries have different strengths and weaknesses, some of which overlap. Make sure to weigh the tradeoffs against your parsing requirements.
  2. Handling namespaces: XML namespaces add complexity by uniquely identifying elements. When parsing namespaced XML, choose a library with namespace support and validate the XML structure against its schema. These steps prevent errors and ensure correct data handling. Online tools can assist with schema validation, improving your application's XML processing capabilities.
  3. Parsing ambiguous or incomplete XML: This often happens due to missing elements or attributes, or when the XML isn't properly structured. Add error handling, such as try-catch blocks or error callbacks, to your code. This lets you catch and manage unexpected XML data and helps prevent your app from crashing.
  4. Performance issues with large files: This can happen when the file is too large to fit into memory, or when the parsing process takes too long. To fix this, use streaming approaches. These let you parse XML data in chunks instead of loading the whole file at once.

Where to go from here

XML parsing has many practical uses. You might explore web scraping with JavaScript and Node.js. Or you could work on more complex applications that process XML data. The skills you've learned here will be a solid foundation for handling structured data in your future JavaScript projects.


Brian Wachira
Building, learning, and writing about different technologies. And yes, occasionally sipping coffee too!
