0

I'm working on a small querying module (in JS) for HTML, and I want to provide a generic query(selector) function that accepts both CSS selectors and XPath expressions as its string argument.

Regardless of how each kind of selection is done, my problem is how to identify whether a given string is an XPath or a CSS selector. We can assume the function looks something like this:


function query(selector){
   const selectorKind = identifySelectorKind(selector); // I want to know how to code this particular function

   if(selectorKind==="css") return queryCss(selector);
   if(selectorKind==="xPath") return queryXPath(selector); // Assume both functions exist and work
}

My first approach (given my limited knowledge of XPath queries) was to identify the query kind by checking whether the first character is / (here I am assuming all relevant XPath queries begin with /).

So, identifySelectorKind would go a bit like this:

function identifySelectorKind(selector){
    if (selector[0] === "/") return "xPath";
    else return "css";
}

Note that I don't need to validate either CSS or XPath selectors; I only need an unambiguous way to tell them apart. Would this logic be enough? In other words, do all XPath selectors begin with /, and does no CSS selector begin that way? If not, is there a better way, or are there considerations I should know about?

3 Answers

2

You can't necessarily. For example, * is a valid xpath and a valid css selector, but it matches a different set of elements in each.

8 Comments

What's the difference between CSS *, XPath * and //*?
CSS * and XPath //* match every element in the tree; XPath * matches only the child elements of the context node.
Would my assumptions be valid if all the queries are intended to work like in CSS (without a context node)?
Yes, an XPath that matches the same elements regardless of the context node would have to start with a /.
On reflection, there may be one case where a context-node-independent XPath doesn't start with a /: when it starts with ancestor-or-self::. For example, in an HTML document, ancestor-or-self::node()//body will match the body element regardless of what the context node is.
1

Searching only for / won't be enough, for sure!

Example CSS selector (that will be a false positive):
nav [itemtype="https://schema.org/BreadcrumbList"]

I'm also writing a utility function that uses either querySelector or XPath, and I need to differentiate the two.

The problem here is that both syntaxes can contain arbitrary strings:
xpath: //*[contains(text(),"string")]
css: *[some-attr="string"]

...so whatever character you use to discriminate, it can always appear in both syntaxes. (An XPath expression inside a CSS string is valid, and so is a CSS selector inside an XPath string):
xpath: //*[contains(text(),"a:hover:not(xpath)")]
css: *[xpath-attr="fuuu/xpath/also//here/*"]

The quick-and-dirty solution I found is to first cut out all the quoted strings, and then test for XPath-only characters (in practice, / or @).

const isXpath = str=>
    /[\/@]/.test(                     // look for / or @ in
        str.split(/['"`]/)            // split on any quote character
            .filter( (s,i)=> !(i%2) ) // keep even-indexed pieces (those outside quotes)
            .join('')                 // the string with quoted parts removed
    )


isXpath( 'nav [itemtype="https://schema.org/BreadcrumbList"] [itemtype="https://schema.org/ListItem"]' )
//> false
// The characters are actually searched for in "nav [itemtype=] [itemtype=]"

/!\ Note that this is not perfect, and some cases remain ambiguous: the examples given in this discussion, * or div, will fall back to CSS (isXpath = false). You could refine the quoted-string removal (what about escaped quotes?) and then the set of XPath characters...
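
On the escaped-quotes question, here is a minimal sketch of one possible refinement, not a definitive fix. Instead of splitting on every quote character, it removes whole quoted strings with a regular expression that tolerates CSS-style backslash escapes. The names stripQuoted and isXpathLike are hypothetical helpers introduced only for this illustration. One caveat: XPath 1.0 string literals have no backslash escapes, so an XPath literal that ends in a backslash would be misread by this pattern.

```javascript
// Hypothetical helper: remove complete quoted strings, treating a backslash
// followed by any character as an escaped character (CSS-style escaping).
const stripQuoted = (str) =>
    str.replace(/"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/g, '');

// Hypothetical variant of isXpath: look for / or @ outside quoted strings.
const isXpathLike = (str) => /[\/@]/.test(stripQuoted(str));

// CSS selector whose attribute value contains an escaped quote and a slash.
console.log(isXpathLike('a[title="it\\"s /here"]'));

// XPath expression whose string literal contains CSS-looking text.
console.log(isXpathLike('//*[contains(text(),"a:hover:not(xpath)")]'));
```

With the split-on-quotes version, the escaped quote in the first example shifts the even/odd bookkeeping, so the slash inside the attribute value ends up being counted as unquoted text. Like the original, this sketch still classifies * and div as CSS.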

Comments

0

If you're absolutely sure your XPath selector will always begin with /, then yes, it's fine. Note that an XPath selector doesn't have to begin with a /, but if yours always selects from the root, then it's fine.

2 Comments

Well, I'm not 100% sure. I would like to know the cases in which an XPath query wouldn't start with / or //, considering all of them are DOM queries.
@angrykoala, XPath selectors can also start with (. For example, go to duckduckgo.com, open the web inspector console, and enter: $x(`//*[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '${"Duck".toLowerCase()}')]`) which matches all elements containing "duck" (case-insensitive). But when elements are nested in others, we often want the deepest (last matching) one, which we get by wrapping the query in () and appending [last()], like $x(`(//*[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '${"Duck".toLowerCase()}')])[last()]`)[0]
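
Putting together the leading forms mentioned in this thread (/, ( and an axis prefix such as ancestor-or-self::), a first-character check could be extended roughly as below. This is only a sketch under the assumption that those are the forms you care about. The XPATH_AXES list and AXIS_PREFIX pattern are names introduced for this illustration, and the axis names are listed explicitly so that a CSS pseudo-element such as p::before is not mistaken for an XPath axis.

```javascript
// The axis names defined by XPath 1.0 (hypothetical constant for this sketch).
const XPATH_AXES = [
    'ancestor-or-self', 'ancestor', 'attribute', 'child',
    'descendant-or-self', 'descendant', 'following-sibling', 'following',
    'namespace', 'parent', 'preceding-sibling', 'preceding', 'self',
];

// Matches a selector that begins with "<axis>::".
const AXIS_PREFIX = new RegExp(`^(?:${XPATH_AXES.join('|')})::`);

function identifySelectorKind(selector) {
    const s = selector.trim();
    // No CSS selector can begin with "/" or "(".
    if (s.startsWith('/') || s.startsWith('(')) return 'xPath';
    // e.g. ancestor-or-self::node()//body
    if (AXIS_PREFIX.test(s)) return 'xPath';
    // Everything else, including ambiguous strings such as "*" or "div".
    return 'css';
}

console.log(identifySelectorKind('(//a)[last()]'));
console.log(identifySelectorKind('ancestor-or-self::node()//body'));
console.log(identifySelectorKind('p::before'));
```

Strings that are valid in both languages, such as * or div, still fall through to CSS, and context-relative XPaths such as ./a are not detected, so this only helps if that default is acceptable for your module.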
