Identify css selector string vs XPath string

Question

I'm working on a small querying module (in JS) for HTML, and I want to provide a generic query(selector) function that accepts either a CSS selector or an XPath expression as its string argument.

Regardless of how each kind of selection is done, my problem here is how to identify whether a given string is an XPath expression or a CSS selector. We can assume that the function would be something like this:


function query(selector){
    const selectorKind = identifySelectorKind(selector); // I want to know how to code this particular function

    if(selectorKind==="css") return queryCss(selector);
    if(selectorKind==="xPath") return queryXPath(selector); // Assume both functions exist and work
}
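The question treats queryCss and queryXPath as given. For reference, here is one possible sketch of them on top of the standard DOM APIs querySelectorAll and document.evaluate. It assumes a browser or another environment that provides a DOM; the root parameter is a hypothetical addition not present in the question.

```javascript
// Sketch only: assumes a DOM environment (e.g. a browser).
// `root` is a hypothetical extra parameter; it defaults to the global document.

// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, hard-coded here so
// the functions can be defined even where the XPathResult global is missing.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// CSS: querySelectorAll returns a static NodeList; convert it to an array.
function queryCss(selector, root = document) {
  return Array.from(root.querySelectorAll(selector));
}

// XPath: evaluate() lives on the Document, while the context node may be any node.
function queryXPath(expression, root = document) {
  const doc = root.ownerDocument || root; // a Document's ownerDocument is null
  const result = doc.evaluate(
    expression,
    root,                       // context node for relative expressions
    null,                       // no namespace resolver (plain HTML)
    ORDERED_NODE_SNAPSHOT_TYPE, // nodes in document order, not live
    null                        // let evaluate() create a new XPathResult
  );
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

Returning plain arrays from both helpers means query(selector) gives callers the same type whichever branch is taken.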

My first approach (given my limited knowledge of XPath queries) was to identify the query kind by checking whether the first character is / (here I am assuming all relevant XPath queries begin with /).

So, identifySelectorKind would go a bit like this:

function identifySelectorKind(selector){
    if (selector[0] === "/") return "xPath";
    else return "css";
}

Note that I don't need to validate either CSS or XPath selectors; I only need an unambiguous way to tell them apart. Would this logic be enough? In other words, do all XPath selectors begin with /, and does no CSS selector begin that way? If not, is there a better way, or are there some considerations I should know about?

Alohci · Accepted Answer · 2019-04-14 23:02:09Z

2

You can't necessarily. For example, * is a valid xpath and a valid css selector, but it matches a different set of elements in each.


8 Comments

angrykoala Over a year ago

What's the difference between CSS *, XPath * and XPath //*?

Alohci Over a year ago

CSS * and XPath //* match every element in the tree; XPath * matches the child elements of the context node.

angrykoala Over a year ago

Would my assumptions be valid if all the queries are intended to work like in CSS (without a context node)?

Alohci Over a year ago

An xpath that matched the same elements independently of what the context node is, would have to start with a /, yes.

Alohci Over a year ago

On reflection, there may be one case where a context-node-independent XPath doesn't start with /: when it starts with ancestor-or-self::. For example, in an HTML document, ancestor-or-self::node()//body will match the body element regardless of what the context node is.


Thomas Di G · Accepted Answer · 2023-06-17 11:40:26Z

Searching only for / won't be enough, for sure!

Example CSS selector (that would be a false positive):
nav [itemtype="https://schema.org/BreadcrumbList"]

I'm also writing a utility function that uses either querySelector or XPath, and I need to differentiate the two.

The problem here is that both syntaxes can contain arbitrary strings:
xpath: //*[contains(text(),"string")]
css: *[some-attr="string"]

...so whatever character you use to discriminate, it can always appear in both syntaxes (an XPath expression inside a CSS string is valid, and so is a CSS selector inside an XPath string):
xpath: //*[contains(text(),"a:hover:not(xpath)")]
css: *[xpath-attr="fuuu/xpath/also//here/*"]

The quick and dirty solution I found is to first cut out all the quoted strings, and then test for XPath-only characters (namely / or @).

const isXpath = str =>
    /[\/@]/.test(                       // look for / or @ in...
        str.split(/['"`]/)              // split on any quote character
            .filter((s, i) => !(i % 2)) // keep even-indexed pieces, i.e. text outside quotes
            .join('')                   // the string with its quoted parts removed
    );


isXpath( 'nav [itemtype="https://schema.org/BreadcrumbList"] [itemtype="https://schema.org/ListItem"]' )
//> false 
// Actually search chars on "nav [itemtype=] [itemtype=]"

/!\ Note that this is not perfect: ambiguous cases such as the examples given in this discussion (* or div) will fall back to CSS (isXpath returns false). The quoted-string removal could be improved (what about escaped quotes?), and so could the set of XPath-only characters.
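A possible refinement, sketched here as an illustration rather than a tested library function. Instead of splitting on every quote character, it removes each complete "..." or '...' literal, so a quote of the other kind inside a string (as in "it's") no longer throws off the split. It honours backslash escapes inside literals the way CSS does (XPath 1.0 has no escapes), strips CSS escapes such as \/ or \@ outside strings, and treats a leading ( as XPath, since a CSS selector cannot begin with ( while a grouped XPath such as (//a)[last()] does. Backticks are dropped from the quote set because neither syntax uses them for strings. Inputs like * or div that are valid in both languages remain unresolvable.

```javascript
// Sketch only: a variant of the quick-and-dirty isXpath, not a validator.
// Matches a complete "..." or '...' literal, allowing backslash escapes
// inside it as CSS strings do (XPath 1.0 literals have no escape syntax).
const QUOTED_LITERAL = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/g;

function isXpath(str) {
  const bare = str
    .replace(QUOTED_LITERAL, "") // drop quoted strings, pairing quotes by kind
    .replace(/\\./g, "")         // drop CSS escapes such as \/ or \@ in class names
    .trim();

  // A CSS selector cannot begin with "(", but an XPath can: (//a)[last()]
  if (bare.startsWith("(")) return true;

  // Outside strings and escapes, "/" and "@" should only occur in XPath
  // (CSS comments like /* ... */ are not handled by this sketch).
  return /[\/@]/.test(bare);
}
```

For example, a[title="it's"] a[href="/x"] is classified as CSS by this version, whereas splitting on every quote character leaves /x outside the "quoted" pieces and reports XPath.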

Jack Bashford · Accepted Answer · 2019-04-14 22:35:58Z

0

If you're absolutely sure your XPath selector will always begin with /, then yes, it's fine. Note that an XPath selector doesn't have to begin with a /, but if yours always selects from the root, then it's fine.
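If every XPath you pass really is context-independent, the question's check can be made a little more forgiving. A minimal sketch under that assumption: it ignores leading whitespace and also treats a leading ( as XPath, because grouped expressions like (//a)[last()] begin that way and a CSS selector never does. Expressions starting with ancestor-or-self:: (mentioned in the comments on the first answer) would still be classified as CSS.

```javascript
// Hypothetical variant of the question's identifySelectorKind.
// Assumes every XPath passed in is context-independent, i.e. rooted with "/"
// or wrapped in parentheses for a positional predicate such as (//a)[last()].
function identifySelectorKind(selector) {
  const first = selector.trimStart()[0]; // ignore leading whitespace
  // A CSS selector never begins with "(" and only begins with "/" when it
  // opens with a comment like /* ... */, an edge case this sketch ignores.
  return first === "/" || first === "(" ? "xPath" : "css";
}
```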


2 Comments

angrykoala Over a year ago

Well, I'm not 100% sure. I would like to know the cases in which an XPath query wouldn't start with / or //, considering all are DOM queries.

Zack Morris Over a year ago

@angrykoala, XPath selectors can also start with (. For example, go to duckduckgo.com, open the web inspector console, and enter:

$x(`//*[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '${"Duck".toLowerCase()}')]`)

This matches elements whose text contains "duck" (case-insensitively). But when elements are nested in others, we often want the deepest (last matching) one, which we get by wrapping the query in () and appending [last()]:

$x(`(//*[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '${"Duck".toLowerCase()}')])[last()]`)[0]
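When the search text comes from a variable, it has to end up in the expression as a quoted XPath string literal; otherwise XPath parses it as a path to child elements. XPath 1.0 literals have no escape syntax, so a value containing both ' and " has to be assembled with concat(). The helpers below are hypothetical, not part of any library, and sketch how the comment's case-insensitive query could be built safely.

```javascript
// Hypothetical helpers for building the query from the comment above.
// XPath 1.0 string literals have no escapes, so a value containing both
// ' and " must be assembled with concat().
function xpathLiteral(value) {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  // Split on ' and glue the pieces back together with "'" in between.
  return "concat('" + value.split("'").join(`', "'", '`) + "')";
}

const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const LOWER = "abcdefghijklmnopqrstuvwxyz";

// Selects the last element (in document order) whose first text-node child
// contains `needle`, ignoring ASCII case.
function lastContainingTextXPath(needle) {
  const lit = xpathLiteral(needle.toLowerCase());
  return `(//*[contains(translate(text(), '${UPPER}', '${LOWER}'), ${lit})])[last()]`;
}
```

In a browser console, $x(lastContainingTextXPath("Duck"))[0] would then run the same query as the second example.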
