0

I am working on a Node application and need a regex that matches the URL patterns below and extracts information from them. Please suggest possible solutions.

These are the URL patterns:
1) www.mysite.com/Paper/cat_CG10
2) www.mysite.com/White-Copy-Printer-Paper/cat_DP5027
3) www.mysite.com/pen/directory_pen?
4) www.mysite.com/Paper-Mate-Profile-Retractable-Ballpoint-Pens-Bold-Point-Black-Dozen/product_612884
5) www.mysite.com/22222/directory_22222?categoryId=12328

This is what I want from the URLs above:
1) name="cat" value="CG10"
2) name="cat" value="DP5027"
3) name="directory" value="pen"
4) name="product" value="612884"
5) name="directory" value="22222" params={categoryId: 12328}

I want a regex that matches the URL pattern and extracts the name, value, and params from each URL.

3 Answers

1

This function does the trick for the URLs and desired matches you've provided. It also parses any number of query parameters.

Fiddle: http://jsfiddle.net/8a9nK/

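function parseUrl(url)
{
    var split = /^.*\/(cat|directory|product)_([^?]*)\??(.*)$/gi.exec(url);
    // exec() returns null when the url does not match the pattern
    if (!split) return null;
    var final_params = {};
    // Only split when a query string is present; otherwise ''.split('&')
    // yields [''] and an empty key would be added to final_params.
    if (split[3]) {
        split[3].split('&').forEach(function(pair){
            var ps = pair.split('=');
            final_params[ps[0]] = ps[1];
        });
    }
    return {
        name: split[1],
        value: split[2],
        params: final_params
    };
}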

Explanation

^ Start from the beginning of the string
.* Match any number of anything (The beginning of the url we don't care about)
\/ Match a single forward slash (The last one before the things we care about; the backslash only escapes it inside the regex literal)
(cat|directory|product) Match and capture the word cat OR directory OR product (This is our name)
_ Match an underscore (The character separating our name and value)
([^?]*) Match and capture any number of anything EXCEPT a question mark (This is our value)
\?? Match a question mark if it exists, otherwise don't worry about it (The start of a potential query string)
(.*) Match and capture any number of anything (This is the entire query string that we will split into param later)
$ Match the end of the string
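
As a quick check of the pattern explained above, here is a minimal sketch that runs the same regex over the five URLs from the question. The helper name parsePath is hypothetical (it is not part of the answer), and it assumes that a non-matching URL should give null and that a missing query string should give an empty params object. Note that parameter values come back as strings.

```javascript
// Hypothetical helper; the regex is the one explained above.
function parsePath(url) {
    var m = /^.*\/(cat|directory|product)_([^?]*)\??(.*)$/i.exec(url);
    if (!m) { return null; }               // assumption: no match -> null
    var params = {};
    if (m[3]) {                             // skip when there is no query string
        m[3].split('&').forEach(function (pair) {
            var ps = pair.split('=');
            params[ps[0]] = ps[1];
        });
    }
    return { name: m[1], value: m[2], params: params };
}

// The five sample URLs from the question.
var samples = [
    'www.mysite.com/Paper/cat_CG10',
    'www.mysite.com/White-Copy-Printer-Paper/cat_DP5027',
    'www.mysite.com/pen/directory_pen?',
    'www.mysite.com/Paper-Mate-Profile-Retractable-Ballpoint-Pens-Bold-Point-Black-Dozen/product_612884',
    'www.mysite.com/22222/directory_22222?categoryId=12328'
];

samples.forEach(function (u) {
    console.log(JSON.stringify(parsePath(u)));
});
// The last line printed is:
// {"name":"directory","value":"22222","params":{"categoryId":"12328"}}
```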

6 Comments

Can you explain this part: "/^.*\/(cat|directory|product)_([^?]*)\??(.*)$/gi"?
Yep, the solution has been modified to add an explanation.
If isPartial is present, how do I get just that query parameter? For example: www.mysite.com/22222/directory_22222?categoryId=12328&isPartial=true
The function returns an object with the properties name, value, and params. You can access the isPartial parameter like this: var obj = parseUrl('www.mysite.com/22222/directory_22222?categoryId=12328&isPartial=true'); var isPartial = obj.params.isPartial;
I have a URL like staples.com/Duracell-DL2430-30-Volt-Lithium-Battery/…', and my regex is /^.*\/(cat|directory|product)_([^?#]*)\??(.*)$/gi. I am getting a result like {"name":"product","value":"448696","params":{"#id":"'dropdown_37439'"}}, but my expected result is {"name":"product","value":"448696","params":{"id":"'dropdown_37439'"}}.
0

The regex below captures the desired values in match groups 1 and 2:

/^\/[^\/]+\/([^_]+)_([^\/_?]+).*$/

Explained piece by piece on the string /HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop/product_8000:

  • ^: from beginning
  • \/: match a /
  • [^\/]+: match everything until a / (HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop)
  • \/: match a /
  • ([^_]+) match and capture the value before the _ (product)
  • _: match a _
  • ([^\/_?]+) match and capture the value after the _ stopped by a ?, _ or / (8000)
  • .* match until the end - if there is anything
  • $ end

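Example (this variant adds a leading [^\/]+ so that it also consumes the host name used in the question's URLs):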

var re = /^[^\/]+\/[^\/]+\/([^_]+)_([^\/_?]+).*$/;
var matches = re.exec('www.mysite.com/22222/directory_22222?categoryId=12328');
// slice(1) drops the full match and keeps only the captured groups, without mutating matches
console.log(matches.slice(1));

output:

["directory", "22222"]
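
The two regexes above differ only in whether a host name comes before the first slash. As a rough sketch, the variant below accepts both forms by making the leading host part optional, and it returns null instead of throwing when nothing matches. The function name matchNameValue is hypothetical, and the sketch assumes inputs without a scheme such as http://, as in the question.

```javascript
// Hypothetical helper: accepts "www.mysite.com/<slug>/<name>_<value>"
// as well as "/<slug>/<name>_<value>" (the shape of req.url).
function matchNameValue(url) {
    // [^\/]* (zero or more) makes the host part optional.
    var re = /^[^\/]*\/[^\/]+\/([^_]+)_([^\/_?]+)/;
    var m = re.exec(url);
    if (!m) { return null; }               // assumption: no match -> null
    return { name: m[1], value: m[2] };
}

console.log(matchNameValue('www.mysite.com/22222/directory_22222?categoryId=12328'));
console.log(matchNameValue('/HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop/product_8000'));
console.log(matchNameValue('no-slashes-here'));
```

Checking the result for null before reading the groups avoids the TypeError that calling splice on a failed match would raise.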

7 Comments

I tried the example you gave: var match = req.url.match(/^[^\/]+\/[^\/]+\/([^_]+)_([^\/_?]+).*$/); console.log(match); I am getting null in the console. What could be the reason?
I would need to see the URL that did not match to give you an appropriate answer.
This is the URL, which I get from req.url on the Node.js request object: "/HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop/product_8000"
Well, of course: the URL examples in your question start with www.mysite.com, and my regex takes that into account as well (see the example).
Here is the updated regex: /^\/[^\/]+\/([^_]+)_([^\/_?]+).*$/.exec("/HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop/product_8000" ).splice(1)
0

Use the url module to help you; not everything needs to be done with a regex :)

var uri = require( 'url' ).parse( 'www.mysite.com/22222/directory_22222?categoryId=12328', true );

which yields (with other stuff):

{ 
  query: { categoryId: '12328' },
  pathname: 'www.mysite.com/22222/directory_22222'
}

Now, to get the last part of the path:

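uri.pathParams = {};
// Take the last path segment ("directory_22222") and split it on "_"
uri.pathname.split('/').pop().split('_').forEach( function( val, ix, all ){
    // odd indexes hold a value; the element just before it holds its name
    if ( ix & 1 ) { uri.pathParams[ all[ix-1] ] = val; }
} );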

which yields:

{ 
  query: { categoryId: '12328' },
  pathParams: { directory: '22222' },

  ... a bunch of other stuff you don't seem to care about
}
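
The Node.js documentation marks url.parse as a legacy API and points to the WHATWG URL class instead. As a minimal sketch of the same idea with that class: the helper name parseWithUrlClass is hypothetical, and it assumes the input has no scheme (as in the question's URLs), so "http://" is prepended before parsing. The name/value split here uses the first underscore of the last path segment.

```javascript
// Hypothetical helper built on the global WHATWG URL class.
function parseWithUrlClass(input) {
    // Assumption: the input lacks a scheme, so one is added for new URL().
    var uri = new URL('http://' + input);
    var last = uri.pathname.split('/').pop();   // e.g. "directory_22222"
    var sep = last.indexOf('_');
    if (sep === -1) { return null; }            // no "<name>_<value>" segment
    var params = {};
    // URLSearchParams.forEach passes (value, key) to the callback.
    uri.searchParams.forEach(function (value, key) {
        params[key] = value;
    });
    return {
        name: last.slice(0, sep),
        value: last.slice(sep + 1),
        params: params
    };
}

console.log(parseWithUrlClass('www.mysite.com/22222/directory_22222?categoryId=12328'));
```

For a path taken from req.url, which has no host, passing a base as the second argument to new URL would be needed instead of prepending a scheme.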
