Until 2003, the word selenium was known only as a chemical element, but in 2004, it became the name of one of the most popular software testing frameworks. Initially designed for cross-browser end-to-end tests, Selenium is a powerful open-source browser automation platform that supports Java, Python, C#, Ruby, JavaScript, and Kotlin.
The name Selenium came from a joke in an email by its creator, Jason Huggins. Wishing to mock his competitor, Mercury Interactive Corporation, Huggins quipped that you could cure mercury poisoning by taking selenium supplements. Thus, the name Selenium caught on, and the rest, as they say, is history.
To build on your understanding of Selenium, it's important to familiarize yourself with XPath.
Introduction to XPath and its relevance to Selenium
XML Path Language (XPath) is a query language designed for selecting nodes from an XML document, which also applies to navigating HTML documents for web automation and testing. In the case of Selenium, XPath allows you to precisely locate elements on web pages, regardless of their position in the document structure. This precision is crucial for interacting with dynamically generated elements that lack unique identifiers.
XPath syntax and types of paths
XPath syntax can be used to navigate through the webpage or locate elements uniquely on the webpage. Below are the basic syntax, methods, and expressions using XPath.
Basic syntax
To use XPath, you need to understand some basic syntax:
//: This selects matching nodes anywhere in the document, no matter how deeply they are nested. Most relative XPaths start with it.
tagName: This specifies the element you want to interact with. It can be input, div, button, etc.
@attribute: This targets a specific attribute of an element, such as @id or @class.
[@attribute='value']: You use this to match an element with the specified attribute value (e.g. [@id='username']).
Simple example
Locate an input element with the id "username": //input[@id='username']
Methods (functions) are not part of the basic syntax, but they let XPath handle more complex matching. A common one is contains(), which finds elements whose text or attribute value contains the string given in the query. Example: //*[contains(text(), 'Search')]. This finds elements that contain the text "Search" in their content. The match is case-sensitive, so it would not find "search".
Expression
The real power of XPath lies in its expressions. An expression combines the syntax (tags, attributes, and methods) to find an element on the webpage. Once you understand expressions, you can write scripts that interact with exactly the elements you want.
Types of paths in XPath
There are two main types of XPath paths:
Absolute XPath: Specifies the exact path from the root node to the desired element. It starts with a single slash, and positions are counted from 1, not 0. Example: /html/body/div[1]/input.
Relative XPath: Most commonly used, it starts with a double slash // and searches for the matching element anywhere in the document. Example: //input[@type='text']
Now that you know about the syntax and have an understanding of the structure, the next section will introduce you to using XPath in Selenium.
Locating elements using XPath
Basic syntax: Use driver.findElement(By.xpath("your_xpath_here")) to locate an element.
Using attributes: Locate an element by its attributes, e.g., //tagname[@attribute='value'].
Using text: Find elements with specific text using //tagname[text()='your_text'].
Contains: Use the contains() function to find elements containing specific attributes or text, e.g., //*[contains(@class, 'your-class-name')].
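The patterns above can be wrapped in small helpers. The following is a minimal sketch, not part of selenium-webdriver: the function names (xpathLiteral, byAttribute, byExactText, byAttributeContains) are hypothetical. It also covers one practical detail: XPath 1.0 string literals have no escape character, so a value that contains quotes needs special handling.

```javascript
// Hypothetical helpers (not part of selenium-webdriver) that build XPath strings.

// Quote a value as an XPath 1.0 string literal. XPath has no escape character,
// so a value containing both quote types is assembled with concat().
function xpathLiteral(value) {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  const separator = `, "'", `;
  const parts = value.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(separator)})`;
}

// Pattern: //tagname[@attribute='value']
function byAttribute(tag, attribute, value) {
  return `//${tag}[@${attribute}=${xpathLiteral(value)}]`;
}

// Pattern: //tagname[text()='your_text']
function byExactText(tag, text) {
  return `//${tag}[text()=${xpathLiteral(text)}]`;
}

// Pattern: //tagname[contains(@attribute, 'fragment')]
function byAttributeContains(tag, attribute, fragment) {
  return `//${tag}[contains(@${attribute}, ${xpathLiteral(fragment)})]`;
}

console.log(byAttribute('input', 'id', 'username'));
console.log(byExactText('a', "Don't have an account?"));
console.log(byAttributeContains('*', 'class', 'your-class-name'));
```

Each returned string can be passed to By.xpath(), for example driver.findElement(By.xpath(byAttribute('input', 'id', 'username'))).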
How to use XPath in Selenium
Using XPath with Selenium lets you locate web elements on a web page. It allows for both simple and complex queries to find elements by their attributes, text content, or hierarchical position. What follows is a guide to using XPath in Selenium, from basics to real-world examples.
Setting up your Selenium environment
If you already have a project folder open in your code editor or terminal, you can skip ahead to installing Selenium and Mocha. If you're starting from scratch, follow the instructions below:
ℹ️
Open your terminal and change to the directory you want to work in. This article uses the desktop: assuming your terminal opens in your home directory, run cd desktop (the folder may be named Desktop on case-sensitive systems). Next, run mkdir xpath-selenium to create a folder called "xpath-selenium", then move into it with cd xpath-selenium. Finally, run npm init -y to set up a Node.js project in the folder.
Install Selenium
Open your folder in your code editor and, in the editor's terminal, install Selenium and Mocha with npm using the command below:
npm install selenium-webdriver mocha
Mocha is a testing library that is great for Behaviour Driven Development (BDD).
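Your package.json file should look like this. If "type": "module" is missing, add it yourself, because the test script later in this article uses ES module imports and top-level await: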
{
  "name": "xpath-selenium",
  "version": "1.0.0",
  "description": "This code is supporting my article on xPath in selenium",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": ["xPath", "Selenium"],
  "author": "ayodele",
  "license": "ISC",
  "dependencies": {
    "selenium-webdriver": "^4.18.1",
    "mocha": "^10.2.0"
  }
}
Real-world examples
Finding a login button by ID:
Suppose the login button has an ID attribute of "loginButton".
XPath: //button[@id='loginButton'].
Use in Selenium: await driver.findElement(By.xpath("//button[@id='loginButton']")).click();. This finds the element and clicks it.
Locating an input field by placeholder text:
If an input field has a placeholder attribute, "Enter your username".
XPath: //input[@placeholder='Enter your username'].
This locates the input field where users can enter their username.
Selecting links by text:
To find a link with the text "Forgot Password?".
XPath: //a[text()='Forgot Password?'].
Useful for locating and clicking on links within a page.
Finding elements by partial text:
For elements that contain the text "Welcome" anywhere.
XPath: //*[contains(text(), 'Welcome')].
This is helpful when the full text may change or only part of it is known.
Using XPath axes for complex relationships:
To find a button labeled "Submit" that directly follows the label "Email".
XPath (one possible expression): //label[text()='Email']/following::button[text()='Submit'][1].
This uses XPath axes to navigate relationships between elements.
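As a rough sketch, the XPath strings from these examples could be combined into one login step. The object xpaths and the function submitUsername are hypothetical names, and the IDs, placeholder, and texts are the ones assumed in the list above. The function receives driver and By from the calling script, so it makes no assumption about how selenium-webdriver is imported.

```javascript
// XPath strings from the examples above, collected in one place.
// The property names are illustrative only.
const xpaths = {
  loginButton: "//button[@id='loginButton']",
  usernameInput: "//input[@placeholder='Enter your username']",
  forgotPasswordLink: "//a[text()='Forgot Password?']",
  welcomeText: "//*[contains(text(), 'Welcome')]",
};

// Sketch of a login step (hypothetical helper).
// `driver` and `By` are expected to come from selenium-webdriver.
async function submitUsername(driver, By, username) {
  // Type the username into the field located by its placeholder
  const input = await driver.findElement(By.xpath(xpaths.usernameInput));
  await input.sendKeys(username);

  // Click the login button located by its ID
  await driver.findElement(By.xpath(xpaths.loginButton)).click();

  // Report whether any element containing "Welcome" is on the page
  const greetings = await driver.findElements(By.xpath(xpaths.welcomeText));
  return greetings.length > 0;
}
```

In a real script, you would call it as await submitUsername(driver, By, 'your-username') after driver.get() has loaded the login page.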
Practice
Create a file, test.js, inside the folder you created above.
import { Browser, Builder, By, Key } from 'selenium-webdriver';

// Start a Chrome session
const driver = await new Builder().forBrowser(Browser.CHROME).build();

try {
  await driver.get("https://apify.com/store");

  // Find the search field by its data-test attribute
  const searchBar = await driver.findElement(By.xpath("//input[@data-test='actor-store-search']"));

  // Type "web scraper" in the search bar and press Enter
  await searchBar.sendKeys('web scraper', Key.ENTER);

  // Look for any element whose text contains "Scraper"
  const textToFind = 'Scraper';
  const elements = await driver.findElements(By.xpath(`//*[contains(text(), '${textToFind}')]`));

  // Check if the text is present on the page
  if (elements.length > 0) {
    console.log(`Text "${textToFind}" is present on the page.`);
  } else {
    console.log(`Text "${textToFind}" is NOT present on the page.`);
  }
} catch (error) {
  console.error("Error:", error);
} finally {
  // Always close the browser, even if a step failed
  await driver.quit();
}
To run the script, run node test.js in your terminal.
The script above opens a Chrome browser, visits apify.com/store, finds the search field by its data-test attribute, types "web scraper" into it, and presses Enter. It then checks whether the text "Scraper" appears on the page.
If it does, the script prints a success message in the console.
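The practice script looks for matches right after pressing Enter, but search results may render a moment later. One way to handle that is an explicit wait. The sketch below uses the driver.wait() and until.elementLocated() calls from selenium-webdriver; the helper name countMatchesWhenReady and the 10-second default are assumptions, and driver, By, and until are passed in by the calling script.

```javascript
// Hypothetical helper: wait until at least one element matches the XPath,
// then return how many elements match. Returns 0 if the wait times out.
// `driver`, `By`, and `until` are the objects exported by selenium-webdriver.
async function countMatchesWhenReady(driver, By, until, xpath, timeoutMs = 10000) {
  const locator = By.xpath(xpath);
  try {
    // Polls the page until the locator matches or the timeout expires
    await driver.wait(until.elementLocated(locator), timeoutMs);
  } catch (error) {
    // Assumption: a timed-out wait rejects with an error named "TimeoutError"
    if (error.name === 'TimeoutError') return 0;
    throw error;
  }
  const elements = await driver.findElements(locator);
  return elements.length;
}
```

In the practice script, this could replace the direct findElements call, for example const count = await countMatchesWhenReady(driver, By, until, "//*[contains(text(), 'Scraper')]"); with until added to the import from 'selenium-webdriver'.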
XPath functions and expressions
XPath functions and expressions provide different ways to query and interact with XML and HTML documents. They are useful in web automation with Selenium for locating elements in complex document structures. Here's an overview of some commonly used XPath functions and expressions, along with examples.
XPath functions
1. text(): Selects the text content of a node (element).
Example: //p[text()='Apify Blog'] selects <p> elements with the text "Apify Blog".
2. contains(): Checks if a string, such as a node's text or an attribute value, contains a specific substring.
Example: //div[contains(@class, 'error-message')] selects the <div> elements whose class attribute contains "error-message".
3. starts-with(): Checks if a string, such as a node's text or an attribute value, starts with a specific string.
Example: //input[starts-with(@name, 'user')] selects <input> elements whose name attribute starts with "user".
4. normalize-space(): Strips leading and trailing whitespace from a string and replaces sequences of whitespace characters with a single space.
Example: //td[normalize-space(text())='Login'] selects <td> elements with the text "Login", ignoring leading/trailing spaces.
5. not(): Returns true if its argument is false, and vice versa.
Example: //input[not(@disabled)] selects <input> elements that are not disabled.
6. position(): Returns the position of a node in a set of nodes.
Example: (//ul)[position()=1]//li selects <li> elements from the first <ul> in the document.
7. last(): Returns the position of the last node in a set of nodes, which lets you select the final item.
Example: //ul/li[last()] selects the last <li> child of each <ul>.
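These functions can be combined inside a single predicate. Because XPath matching is case-sensitive and XPath 1.0 has no lower-case() function, a common workaround is translate(), which maps one set of characters to another. The sketch below builds two such expressions; the function names are hypothetical, and the case-insensitive helper only handles ASCII letters.

```javascript
const UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
const LOWER = 'abcdefghijklmnopqrstuvwxyz';

// Hypothetical helper: match elements whose whitespace-normalized text contains
// `text`, ignoring ASCII case. translate() lowercases the element text, and the
// search text is lowercased here in JavaScript.
function containsTextIgnoreCase(tag, text) {
  if (text.includes("'")) {
    throw new Error('This sketch does not handle single quotes in the text');
  }
  const needle = text.toLowerCase();
  return `//${tag}[contains(translate(normalize-space(.), '${UPPER}', '${LOWER}'), '${needle}')]`;
}

// Hypothetical helper: combine starts-with() and not() with "and" to select
// inputs whose name starts with a prefix and that are not disabled.
function enabledInputsStartingWith(prefix) {
  return `//input[starts-with(@name, '${prefix}') and not(@disabled)]`;
}

console.log(containsTextIgnoreCase('button', 'Search'));
console.log(enabledInputsStartingWith('user'));
```

Passing a specific tag such as 'button' is usually better than '*', because the text of every ancestor (including <body>) also contains the text of its descendants.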
XPath axes
XPath axes define the relationship between the current node and the nodes to be selected. They are used in combination with XPath functions to navigate the page structure.
1. ancestor: Selects all ancestors (parent, grandparent, etc.) of the current node.
Example: //input[@id='login']/ancestor::form selects the <form> element that contains an input with the id "login".
2. descendant: Selects all descendants (children, grandchildren, etc.) of the current node.
Example: //div[@class='container']/descendant::input selects all <input> elements within a <div> with the class "container".
3. following: Selects everything in the document after the closing tag of the current node.
Example: //h2/following::p selects all <p> elements that follow an <h2> element.
4. preceding: Selects all nodes that appear before the current node in the document, except its ancestors.
Example: //h2/preceding::p selects all <p> elements that precede an <h2> element.
Using these functions and expressions in XPath queries helps you precisely locate elements in Selenium, so you can handle both simple and complex web automation tasks.
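Axes are often paired with a positional predicate such as [1]. On forward axes (descendant, following), [1] means the first match in document order; on reverse axes (ancestor, preceding), it means the match closest to the current node. A small sketch with hypothetical helper names:

```javascript
// Nearest <form> that contains the input with the given id.
// ancestor is a reverse axis, so [1] is the closest enclosing form.
function enclosingForm(inputId) {
  return `//input[@id='${inputId}']/ancestor::form[1]`;
}

// First <input> that appears after a label with the given text.
// following is a forward axis, so [1] is the first match in document order.
function firstInputAfterLabel(labelText) {
  return `//label[normalize-space()='${labelText}']/following::input[1]`;
}

// All <input> elements nested anywhere inside a <div> with the given class.
function inputsInside(containerClass) {
  return `//div[@class='${containerClass}']/descendant::input`;
}

console.log(enclosingForm('login'));
console.log(firstInputAfterLabel('Email'));
console.log(inputsInside('container'));
```

These helpers interpolate values directly, so they assume the values contain no single quotes.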
Pros and cons of using XPath in Selenium
While XPath is a useful tool for interacting with web elements in Selenium, it's important to know its disadvantages as well as its benefits.
Pros
XPath can target any element on a webpage because it leverages the underlying HTML structure. If an element exists, there's an XPath expression to find it.
XPath offers filtering capabilities. You can target elements based on various attributes, text content, position, or relationships with other elements. This is useful for complex web pages and dynamic content interaction.
Well-written XPath expressions can be clear and understandable, indicating which element is targeted. This helps code readability and maintainability.
Cons
A major drawback is that XPath expressions, especially absolute ones, are brittle: a minor UI change can break an XPath that relies on specific elements in the path.
Complex XPath expressions can be difficult to understand and maintain, especially for those unfamiliar with XPath syntax.
XPath expressions that are long and complex can slow down test execution compared to simpler selectors like IDs.
Useful tips
It's recommended to use IDs or CSS selectors for locating elements where possible. These are more maintainable in the long run as they rely less on the specific HTML structure. Note that class-based CSS selectors should only be relied on when the class name is human-readable, for example class="primaryBtn" rather than an auto-generated one like class="x938372sna".
When using XPath, go for relative XPaths instead of absolute ones. Relative XPaths target elements based on their position within a smaller section of the HTML tree, making them less prone to breaking when the UI changes.
Use XPath for situations where other selectors are unavailable or insufficient. For example, XPath can be useful for targeting elements based on text content or for complex filtering within web tables.
Use specific attributes to identify elements uniquely whenever possible.
Test your XPath expressions for performance and accuracy.
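For the last tip, a quick check can be scripted. The sketch below (checkLocator is a hypothetical name) counts how many elements an XPath matches and how long the lookup took, using only driver.findElements() and By.xpath(). The timing is a rough wall-clock measurement, not a benchmark.

```javascript
// Hypothetical helper: report how many elements an XPath matches and roughly
// how long the lookup took. `driver` and `By` come from selenium-webdriver.
async function checkLocator(driver, By, xpath) {
  const started = Date.now();
  const matches = await driver.findElements(By.xpath(xpath));
  const elapsedMs = Date.now() - started;
  return {
    xpath,
    count: matches.length,
    // A locator meant for a single element should match exactly once
    unique: matches.length === 1,
    elapsedMs,
  };
}
```

For example, console.log(await checkLocator(driver, By, "//button[@id='loginButton']")) prints the match count and timing for the login button locator used earlier.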
By understanding the strengths and limitations of XPath and employing it strategically, you can utilize its capabilities to effectively automate web interactions in Selenium while minimizing maintenance challenges. As you harness XPath's potential, it's essential to have the right tools at your disposal to streamline your workflow and enhance accuracy.
XPath tools
Below are some helpful tools for working with XPath expressions in Selenium. These can facilitate easier generation, testing, and debugging of your XPath selectors.
Browser extensions
Chrome DevTools (Chrome/Edge): Modern browsers like Chrome and Edge have built-in developer tools with element inspection capabilities. You can right-click on an element, select "Inspect," and then use the Elements panel to examine the HTML structure and copy XPath expressions.
XPath Helper (Firefox/Chrome): This browser extension for Firefox and Chrome simplifies XPath generation. It highlights elements on hover, allows you to copy XPath expressions, and even offers basic testing functionalities.
Online tools
FreeXPath Tester: This online tool lets you paste your HTML code and experiment with different XPath expressions. It highlights matching elements and provides feedback on the validity of your expressions.
XPath alternatives
XPath is an amazing tool but it's not the only option for interacting with web elements in Selenium. Here are some alternatives to consider:
| Locator strategy | Pros | Cons |
| --- | --- | --- |
| CSS selectors | Supported by all browsers and often faster than XPath expressions. | Limited filtering capabilities compared to XPath. |
| ID selectors | A unique and efficient way of locating elements; a unique ID targets exactly one element. | Not all elements have unique IDs assigned. |
| Name selectors | Easier to understand than complex XPath expressions. | Not always reliable, as names might not be unique to an element or readily available. |
| Class name | Can easily target multiple elements with the same class, which is simpler than XPath. | Overusing this can lead to unintended element selection if multiple elements share the same class. |
| Link text | Good for targeting links based on their visible text. | Fragile on dynamic pages where the link text may change. Not suitable for links without displayed text. |
📝 Note:
Prioritize IDs whenever available for the most efficient and reliable targeting.
Reserve XPath for situations where CSS selectors lack the necessary filtering capabilities.
Always strive for clear and concise locators, regardless of the chosen method.
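The priority in this note can be expressed as a small helper. This is only a sketch: chooseLocator is a hypothetical name, while By.id(), By.css(), and By.xpath() are the locator factories from selenium-webdriver, passed in by the calling script.

```javascript
// Hypothetical helper: pick the simplest locator available for an element.
// Prefers an ID, then a CSS selector, and falls back to XPath last.
// `By` is the object exported by selenium-webdriver.
function chooseLocator(By, { id, css, xpath }) {
  if (id) return By.id(id);
  if (css) return By.css(css);
  if (xpath) return By.xpath(xpath);
  throw new Error('No locator information provided');
}
```

Usage might look like driver.findElement(chooseLocator(By, { id: 'loginButton', xpath: "//button[@id='loginButton']" })), which resolves to the ID locator because an ID is available.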
Reasons to consider alternatives to XPath
Other browser automation tools, such as those covered in the next section, often provide built-in methods for interacting with web elements, eliminating the need for complex XPath expressions. They frequently offer features like:
Automatically adjusting locators if the underlying HTML structure changes slightly, reducing maintenance headaches compared to XPath.
Natural language-like commands for interacting with elements, improving readability compared to raw XPath syntax.
Automatic handling of asynchronous elements and page loads, simplifying test code compared to the manual waiting logic needed with XPath in Selenium.
Ultimately, each of these tools has different strengths and weaknesses. Evaluating your needs will help you determine the best alternative to Selenium.
Alternatives to Selenium
Although Selenium is widely used, alternatives like Cypress, Playwright, and Puppeteer offer different approaches and capabilities, such as improved performance, simpler syntax, or better integration with modern development workflows.
Cypress: Cypress focuses on end-to-end testing within the browser, making it ideal for front-end-focused automation. It avoids the need for a complicated WebDriver setup and offers a visual test runner for better debugging. However, it supports fewer browsers than Selenium and lacks features like the mobile device testing available in Selenium. You can learn more about the key differences between Selenium and Cypress here.
Playwright: Playwright supports multiple browsers (Chromium, Firefox, WebKit) natively and offers a recorder for generating test scripts. It leverages modern browser DevTools protocols for faster execution and integrates well with Node.js development environments. If you're wondering which tool to use for web scraping (Playwright or Selenium), you can read this article for more context.
Puppeteer: Puppeteer controls Chrome through the DevTools Protocol from Node.js, making it ideal for scraping or complex browser interactions. It has a strong ecosystem and tight integration with Chrome. If you are learning automation and deciding between Puppeteer and Selenium, you can read this guide to learn about the pros and cons of the two tools.
Conclusion and further reading
Selenium is a useful tool for automation and web scraping. It allows you to interact with web pages like a user would, such as navigating the page, clicking buttons, filling out forms, and collecting the specific data you need.
Ayodele is a Developer Relations engineer with experience in a few other tech skills, such as frontend development, technical writing, early-stage startup advisory, product management, and consulting.