DEV Community

Sony AK

Practical Puppeteer: How to evaluate XPath expression

Today I will share how to evaluate XPath expressions in Puppeteer using the $x API. In addition, we will also use the waitForXPath API.

Before I learned Puppeteer, I mostly used XPath in PHP through its DOMXPath class, and I found it very useful for selecting elements. I feel more comfortable using XPath expressions than CSS selectors. That's just my personal opinion, sorry :)

For those who don't know XPath, here is the definition from Wikipedia:

XPath (XML Path Language) is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C).
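The "compute values" part of that definition is easy to try from Puppeteer. An expression such as count(...) returns a number rather than elements, so one option is to run the browser's standard document.evaluate inside page.evaluate and read the numberValue of the result. This is a minimal sketch; the helper name xpathNumber is hypothetical.

```javascript
// Hypothetical helper: evaluate an XPath expression that yields a number
// (for example count(...)) inside the page and return that number.
async function xpathNumber(page, expression) {
  // The arrow function runs in the browser, where document and XPathResult exist.
  return page.evaluate(
    expr =>
      document.evaluate(expr, document, null, XPathResult.NUMBER_TYPE, null)
        .numberValue,
    expression
  );
}
```

For example, await xpathNumber(page, "count(//span[@class='CountTitle-number'])") would tell you how many matching span elements the page contains.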

In Puppeteer there are two APIs related to XPath. The first is waitForXPath, which works like waitForSelector: it waits for an element matching our XPath expression to appear. The second is the $x method, which evaluates an XPath expression and returns an array of ElementHandle. I will show you a sample later.
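The two calls fit together like this. Below is a minimal sketch with a hypothetical helper, textsByXPath, that waits for the first match and then reads the text of every match. It assumes a Puppeteer version that still has page.waitForXPath and page.$x; more recent releases have deprecated (and later removed) them in favour of XPath selectors written with the xpath/ prefix, for example page.$$('xpath/' + expression), so check the version you have installed.

```javascript
// Sketch only: textsByXPath is a hypothetical helper, and it assumes a
// Puppeteer version that still provides page.waitForXPath and page.$x.
async function textsByXPath(page, expression, timeoutMs = 30000) {
  // Resolves once a node matches the expression; rejects with a
  // TimeoutError if nothing matches within timeoutMs milliseconds.
  await page.waitForXPath(expression, { timeout: timeoutMs });

  // $x evaluates the expression and returns an array of ElementHandle
  // (an empty array when nothing matches).
  const handles = await page.$x(expression);

  const texts = [];
  for (const handle of handles) {
    // Read el.textContent inside the browser for each matched element.
    texts.push(await page.evaluate(el => el.textContent, handle));
    // Release the browser-side reference once we are done with it.
    await handle.dispose();
  }
  return texts;
}
```

Usage would look like const texts = await textsByXPath(page, "//span[@class='CountTitle-number']");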

Enough of the boring stuff. Let's start with a scenario. There is a property website in Indonesia called Lamudi, https://www.lamudi.co.id/newdevelopments/, and I want to get (scrape) the value of the element shown below.


Our target is this element. I want to get its value, 160.

<span class="CountTitle-number">160</span>

Usually we would use a CSS selector like document.querySelector('span[class="CountTitle-number"]'), but here we use the equivalent XPath expression //span[@class="CountTitle-number"] instead.
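One detail worth knowing: both span[class="..."] in CSS and @class="..." in XPath compare the whole attribute string, so neither would match an element with class="CountTitle-number big". The CSS class selector span.CountTitle-number matches by class token instead, and a common XPath 1.0 equivalent uses concat() and normalize-space(). A small sketch with a hypothetical helper, xpathByClass, assuming the class name contains no quote characters:

```javascript
// Hypothetical helper: build an XPath 1.0 expression matching <tag> elements
// that carry className as one of their classes (like the CSS tag.className).
// Assumes className contains no quote characters.
function xpathByClass(tag, className) {
  // Padding both sides with spaces prevents partial matches such as
  // "CountTitle-numberX" matching "CountTitle-number".
  return `//${tag}[contains(concat(' ', normalize-space(@class), ' '), ' ${className} ')]`;
}
```

The returned string can be passed to page.$x or page.waitForXPath like any other expression.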

We can test this expression easily in the browser's Developer Tools console. Try typing this:

$x('//span[@class="CountTitle-number"]');  

The console returns an array containing the matching span element.


OK, nice: the console already finds the element matched by that XPath expression. Now let's write a Puppeteer script that gets this element's text content.

Preparation

npm i puppeteer

The code

The code is self-explanatory, and I hope you can adjust, expand, or improvise on it for your specific needs.

File puppeteer_xpath.js

const puppeteer = require('puppeteer');

(async () => {
    // set some options (headless: false lets us watch
    // the automated browsing session)
    const launchOptions = { headless: false, args: ['--start-maximized'] };

    const browser = await puppeteer.launch(launchOptions);
    const page = await browser.newPage();

    // set the viewport and user agent (just for nicer viewing)
    await page.setViewport({width: 1366, height: 768});
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');

    // go to the target web
    await page.goto('https://www.lamudi.co.id/newdevelopments/');

    // wait for the element defined by the XPath expression to appear on the page
    await page.waitForXPath("(//span[@class='CountTitle-number'])[1]");

    // evaluate the XPath expression for the target element (returns an array of ElementHandle)
    const elHandle = await page.$x("(//span[@class='CountTitle-number'])[1]");

    // read the textContent of the first matched element (via page.evaluate)
    const lamudiNewPropertyCount = await page.evaluate(el => el.textContent, elHandle[0]);

    console.log('Total Property Number is:', lamudiNewPropertyCount);

    // close the browser
    await browser.close();
})();
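A note on the parentheses in (//span[@class='CountTitle-number'])[1]. Without them, //span[@class='CountTitle-number'][1] applies [1] per parent: it selects every matching span that is the first such span among its siblings, which can be several elements. With the parentheses, XPath first collects all matches in document order and then takes the first one. A hypothetical helper, nthMatch, that makes this explicit:

```javascript
// Hypothetical helper: select the n-th match (1-based, as XPath positions are)
// of expression across the whole document, in document order.
function nthMatch(expression, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("XPath positions start at 1");
  }
  // The parentheses make [n] apply to the complete node-set rather than
  // to each parent's children.
  return `(${expression})[${n}]`;
}
```

For example, nthMatch("//span[@class='CountTitle-number']", 1) produces exactly the expression used in the script above.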

Run it

node puppeteer_xpath.js

If everything works, it will print a result like the one below.

Total Property Number is: 160
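If the element never appears (for example, after the site changes its layout), page.waitForXPath rejects once its timeout expires (30 seconds by default) and the script stops with an unhandled error. A minimal sketch of one way to branch on that instead, using a hypothetical helper, textIfExists, that resolves to null on timeout. It assumes the rejection's name property is "TimeoutError", which is the name of Puppeteer's timeout error class; checking with instanceof against the TimeoutError class exported by your Puppeteer version would also work. For an immediate check without waiting, page.$x returns an empty array when nothing matches.

```javascript
// Hypothetical helper: resolve to the element's text, or to null if the
// XPath does not match within timeoutMs. Assumes the timeout rejection has
// err.name === "TimeoutError".
async function textIfExists(page, expression, timeoutMs = 5000) {
  try {
    // waitForXPath resolves with the first matching ElementHandle.
    const handle = await page.waitForXPath(expression, { timeout: timeoutMs });
    return await page.evaluate(el => el.textContent, handle);
  } catch (err) {
    // Treat only timeouts as "not found"; rethrow anything else
    // (closed page, navigation errors, invalid expression, ...).
    if (err && err.name === "TimeoutError") {
      return null;
    }
    throw err;
  }
}
```

The caller can then write if (count === null) { ... } else { ... } to run different code depending on whether the element exists.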

Conclusion

I think Puppeteer's XPath support is very useful for data scraping, since it is sometimes hard to write a CSS selector for a specific use case.
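One such use case is selecting an element by its text and then stepping up to an enclosing element. CSS selectors have no standard way to match on text content, while XPath has normalize-space() and axes such as ancestor::. The sketch below uses two hypothetical helpers: xpathLiteral quotes an arbitrary string as an XPath 1.0 literal (XPath 1.0 string literals have no escape sequences, so a string containing both quote characters has to be assembled with concat()), and ancestorOfText builds the full expression.

```javascript
// Hypothetical helper: quote any string as an XPath 1.0 string literal.
function xpathLiteral(value) {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  // Both quote kinds present: split on single quotes and glue the pieces
  // back together with "'" in between, using concat().
  const parts = value.split("'").map(part => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Hypothetical helper: find <tag> elements whose whitespace-normalized text
// equals text (pass text already normalized), then select the nearest
// <ancestorTag> above each one.
function ancestorOfText(tag, text, ancestorTag) {
  return `//${tag}[normalize-space(.)=${xpathLiteral(text)}]/ancestor::${ancestorTag}[1]`;
}
```

On a reverse axis such as ancestor::, the position [1] refers to the closest ancestor, so ancestorOfText("span", "160", "div") targets the div immediately enclosing the span. The resulting string works with page.$x or page.waitForXPath.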

Thank you, and I hope you enjoyed it. See you again in the next Practical Puppeteer post.

The source code of this sample is available on GitHub: https://github.com/sonyarianto/xpath-on-puppeteer.git


Top comments (8)

Tommy

You actually don't need this line:

let elHandle = await page.$x("(//span[@class='CountTitle-number'])[1]");

More concise:

const element = await page.waitForXPath("(//span[@class='CountTitle-number'])[1]");
const lamudiNewPropertyCount = await page.evaluate(el => el.textContent, element);
Sony AK

Thanks @tohodo nice, noted

Raphael Schweikert

Thanks for this. I love XPath for these kinds of use-cases.
Yes, CSS selectors can be simpler and better understood, but they are also deliberately restricted to keep good run-time characteristics, so they don't bog down the browser during dynamic updates.
So there are lots of things you can do with XPath that are simply not possible with selectors (like finding text nodes, or using axes to select up the tree instead of down).

Sony AK

totally agree with this, XPath to the rescue and full flexibility :)

David K Gurr

thanks, this was helpful

Sony AK

you are welcome :)

ztesterparadise2

Thank you so much, good sir!
I struggled for days to find information this well arranged and simply put.

Alucian Corrêa

Thank you sir.

But what if the XPath does not exist? Is it possible to handle that case? Can you help me?

Something like:

if it doesn't exist, do this

if it exists, do that

Thank you.