How to extract value from a url - node.js

I have a few requests coming in which follow the pattern below
contacts/id/
contacts/x/id/name
contacts/x/y/id/address
contacts/z/address/
I want to extract the value that immediately follows 'contacts'.
In the cases above, that is:
1. id
2. x
3. x
4. z
Here is my regex
(?<=contacts)\/[^\/]+
https://regex101.com/r/ePmv5Y/1
But the match includes the leading '/', e.g. /id, /x, etc.
How do I change the regex to get rid of this leading slash?

We can use match() here:
var urls = ["contacts/id/", "contacts/x/id/name", "contacts/x/y/id/address", "contacts/z/address/"];
for (var i=0; i < urls.length; ++i) {
var output = urls[i].match(/\bcontacts\/(.*?)\//)[1];
console.log(urls[i] + " => " + output);
}

I have a few requests coming in
If you mean http requests, then this is likely the pathname of the requested URL, and they'll start with a /. (This is the value of req.url in a Node.js server.)
To match on a URL pathname, you can use this expression: ^\/contacts\/([^/?]+). Here's a link to another regular expression builder that demonstrates it and includes an explanation for every character: https://regexr.com/6tugf
The [^/?] is a negated set that matches any token which is not a / or a ? and the + means that it matches 1 or more of those tokens. It's important to include the ? because otherwise it could match into the query string portion of the URL — for example, in this URL:
https://domain.tld/contacts/x/id/name?filter=recent # URL
/contacts/x/id/name?filter=recent # req.url in Node.js
/contacts/x/id/name # pathname
?filter=recent # query string
And here's a runnable code snippet demonstrating the same expression, using String.prototype.match():
const contactIdRegexp = /^\/contacts\/([^/?]+)/;
const inputs = [
'/contacts/id/', // id
'/contacts/x/id/name', // x
'/contacts/x/y/id/address', // x
'/contacts/z/address/', // z
'/contacts/x/id/name?filter=recent', // x
];
for (const str of inputs) {
const id = str.match(contactIdRegexp)?.[1];
console.log(id);
}

You can add the / inside the lookbehind:
(?<=contacts\/)[^\/]+
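With the slash inside the lookbehind, the whole match is the value itself, so no capture group is needed. A minimal sketch of how this might be used in Node.js; the helper name firstSegment is made up for illustration, and lookbehind assertions need a reasonably recent Node.js version:

```javascript
// Hypothetical helper: returns the segment right after "contacts/", or null.
function firstSegment(path) {
  const m = path.match(/(?<=contacts\/)[^\/]+/);
  return m ? m[0] : null;
}

const samplePaths = [
  'contacts/id/',
  'contacts/x/id/name',
  'contacts/x/y/id/address',
  'contacts/z/address/',
];
for (const p of samplePaths) {
  console.log(p + ' => ' + firstSegment(p));
}
```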

If you would like to do this without a regex, you can try the following.
// get the URL object
const url = new URL(`${req.protocol}://${req.get('host')}${req.originalUrl}`);
// extract the pathname and split it on "/"
const pathName = url.pathname.split("/");
// the pathname starts with "/", so index 0 is "" and index 1 is "contacts";
// the required value is at index 2
const val = pathName[2];
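The snippet above relies on Express request helpers (req.protocol, req.get(), req.originalUrl). If only the raw req.url string is available, the same idea might look like the sketch below. The base http://localhost is a placeholder that exists only so the relative URL can be parsed, and the function name is hypothetical:

```javascript
// Hypothetical helper: extracts the path segment that follows "contacts".
function segmentAfterContacts(reqUrl) {
  // The base URL is a placeholder; only the pathname is used.
  const { pathname } = new URL(reqUrl, 'http://localhost');
  // filter(Boolean) drops the empty strings produced by leading/trailing slashes.
  const parts = pathname.split('/').filter(Boolean);
  const i = parts.indexOf('contacts');
  return i !== -1 && i + 1 < parts.length ? parts[i + 1] : undefined;
}

console.log(segmentAfterContacts('/contacts/x/id/name?filter=recent')); // x
```

Looking up the position of "contacts" instead of hard-coding index 2 keeps the helper working if the route is mounted under a prefix.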

Related

Is there a regex to be able to match two url's , one that has a wildcard and one that doesn't?

I am writing a program in Node.js with the following scenario.
I have an array of URLs that include wildcards, such as the following:
https://*.example.com/example/login
http://www.example2.com/*/example2/callback
I also have an incoming redirect URL, and I need to validate that it matches one of the URLs in the array above. I was wondering if there is a way, using regex or anything else, to do something like arr.includes(incomingRedirectUrl) and compare the two.
I can match non-wildcard URLs using array.includes(incomingRedirectUrl), but I cannot think of a solution for the entries that contain wildcards.
For example,
https://x.example.com/example/login should work because it matches the first URL in the example above, with the "*" replaced by x.
Is there a way to achieve this, or do I have to break the URLs apart at the "*" (for example with slice) to compare the two?
Thanks in advance for any help.
for (let i = 0; i < arr.length; i++) {
if (arr[i].indexOf('*') !== -1) {
wildcardArr.push(arr[i]);
} else {
noWildcardArr.push(arr[i]);
}
}
// Note: noWildcardArr is checked first because most valid redirect URLs do not contain a wildcard
if (noWildcardArr.includes(incomingRedirectUrl)) {
//Validated correct url, proceed with the next part of my code (this part already works)
} else if (wildcardArr.includes(incomingRedirectUrl)) {
//need to figure out this logic here, not sure if the above is possible without formatting wildcardArr but url should be validated if url matches with wildcard
} else {
log.error('authorize: Bad Request - Invalid Redirect URL');
context.res = {
status: 400,
body: 'Bad Request - Invalid Redirect URL',
};
}
You could compile your URL array into proper regular expressions and then iterate over them to see if one matches. This is similar to what a web framework does to support URL path parameters such as /users/:id.
function makeMatcher(urls) {
const compiled = urls.map(url => {
// regex escape the url but dont escape *
let exp = url.replace(/[-[\]{}()+?.,\\^$|#\s]/g, '\\$&');
// replace * with .+ for the wildcard
exp = exp.replaceAll('*', '.+');
// the expression is used to create the match function
return new RegExp(`^${exp}$`);
});
// return the match function, which returns true if any expression matches
// and false if there is no match at all
return function match(url) {
return compiled.some(regex => regex.test(url));
};
}
const matches = makeMatcher([
'https://*.example.com/example/login',
'http://www.example2.com/*/example2/callback'
]);
// these 2 should match
console.log(matches('https://x.example.com/example/login'));
console.log(matches('http://www.example2.com/foo/example2/callback'));
// this one not
console.log(matches('http://nope.example2.com/foo/example2/callback'));
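One caveat if the matcher is used to validate redirect URLs: `.+` also matches `/`, so the pattern for https://*.example.com/example/login would accept https://evil.test/.example.com/example/login. A possible variant, sketched below, restricts the wildcard to characters other than `/`, `?` and `#`. The name makeStrictMatcher and the host evil.test are made up for illustration, and this narrows the wildcard without being a complete redirect validation:

```javascript
// Variant of makeMatcher in which * cannot cross a "/", "?" or "#".
function makeStrictMatcher(urls) {
  const compiled = urls.map(url => {
    // escape regex metacharacters, but leave * alone
    const escaped = url.replace(/[-[\]{}()+?.,\\^$|#\s]/g, '\\$&');
    // * becomes "one or more characters that are not /, ? or #"
    return new RegExp(`^${escaped.replace(/\*/g, '[^/?#]+')}$`);
  });
  return url => compiled.some(regex => regex.test(url));
}

const strictMatches = makeStrictMatcher([
  'https://*.example.com/example/login',
  'http://www.example2.com/*/example2/callback'
]);

console.log(strictMatches('https://x.example.com/example/login'));          // true
console.log(strictMatches('https://evil.test/.example.com/example/login')); // false
```

For stricter checks, parsing both URLs with the URL class and comparing protocol, hostname and pathname separately may be more robust than a single expression.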

Pattern match in nodejs rest url

In my node app I am using the router.use to do the token validation.
I want to skip validation for few urls, so I want to check if the url matches, then call next();
But the URL I want to skip has a URL parameter.
E.g., this is the URL: /service/:appname/getall.
This has to be matched against /service/blah/getall and give true.
How can this be achieved without splitting the URL by '/'?
Thanks in advance.
The parameters will match :[^/]+ because it is a : followed by anything other than a / 1 or more times.
If you find the parameters in the template and replace them with a regex that will match any string you can do what you asked for.
let template = '/service/:appname/getall'
let url = '/service/blah/getall'
// find params and replace them with regex
template = template.replace(/:[^/]+/g, '([^/]+)')
// the template is now the regex source string '/service/([^/]+)/getall'
// which is essentially '/service/ ANYTHING THAT'S NOT A '/' /getall'
// convert to regex and only match from start to end
template = new RegExp(`^${template}$`)
// ^ = beginning
// $ = end
// the template is now /^\/service\/([^\/]+)\/getall$/
const matches = url.match(template)
// matches will be null if there is no match.
console.log(matches)
// ["/service/blah/getall", "blah"]
// it will be [full_match, param1, param2...]
Edit: use :\w+ instead of :[^/]+ to find the parameters in the template, because:
The name of route parameters must be made up of “word characters” ([A-Za-z0-9_]). https://expressjs.com/en/guide/routing.html#route-parameters
I believe this is true for most parsers so I have updated my answer. The following test data will only work with this updated method.
let template = '/service/:app-:version/get/:amt';
let url = '/service/blah-v1.0.0/get/all';
template = template.replace(/:\w+/g, `([^/]+)` );
template = new RegExp(`^${template}$`);
let matches = url.match(template);
console.log(url);
console.log(template);
console.log(matches);
// Array(4) ["/service/blah-v1.0.0/get/all", "blah", "v1.0.0", "all"]
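The same replacement can also record the parameter names, so the result comes back as an object instead of a positional array. A minimal sketch; matchRoute is a hypothetical helper, and it assumes the literal parts of the template contain no regex metacharacters:

```javascript
// Hypothetical helper: returns an object of route parameters, or null if no match.
function matchRoute(template, url) {
  const names = [];
  // Replace each :name with a capture group and remember the name.
  const source = template.replace(/:(\w+)/g, (_, name) => {
    names.push(name);
    return '([^/]+)';
  });
  const m = url.match(new RegExp(`^${source}$`));
  if (!m) return null;
  const params = {};
  names.forEach((name, i) => { params[name] = m[i + 1]; });
  return params;
}

console.log(matchRoute('/service/:appname/getall', '/service/blah/getall'));
// { appname: 'blah' }
console.log(matchRoute('/service/:appname/getall', '/service/blah/other'));
// null
```

In a middleware, a non-null result for any of the skip templates would be the signal to call next() without validating the token.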

extract a string using regular expression in node

I'm trying to use exec for a regular expression in Node. I know the expression works from testing it with an extension in VS Code, but when I run the Node app it keeps returning null.
str = '\r\nProgram Boot Directory: \\SIMPL\\app01\r\nSource File: C:\\DRI\\DRI\\DRI Conf Room v2 20180419aj\r\nProgram File: DRI Conf Room v2 20180419aj.smw\r\n';
var regex = /\Program File:(.*?)\\/;
var matched = regex.exec(str);
console.log(matched);
I think you don't have to escape the P at the beginning (\P). The string ends with \r\n, so you could match that instead of \\, which matches a literal backslash.
If you don't want the leading whitespace in the first capturing group, you could add \s* to match zero or more whitespace characters: /Program File:\s*(.*?)\r\n/
For example:
str = '\r\nProgram Boot Directory: \\SIMPL\\app01\r\nSource File: C:\\DRI\\DRI\\DRI Conf Room v2 20180419aj\r\nProgram File: DRI Conf Room v2 20180419aj.smw\r\n';
var regex = /Program File:(.*?)\r\n/;
var matched = regex.exec(str);
console.log(matched[0]);
console.log(matched[1]);
You can also use the RegExp constructor:
var str = '\r\nProgram \r\nProgram File: DRI 0180419aj.smw\r\n'
.replace(/[\r\n]/g, ''); // removes the new lines before we search
var pattern = 'Program File.+' // write your raw pattern
var re = new RegExp(pattern); // feed that into the RegExp constructor with flags if needed
var result = re.exec(str); // run your search
console.log(result)
Not really sure what your pattern should do, so I just put one there that matches whatever starts with Program File. If you want all matches, not just the first, add the g flag and call re.exec(str) repeatedly until it returns null:
var re = new RegExp(pattern,'g');
Hope that helps!
The regex syntax you use looks off. Try it like this:
const regex = /^Program File:\s*(.*?)$/gm;
const str = `
Program Boot Directory: \\SIMPL\\app01
Source File: C:\\DRI\\DRI\\DRI Conf Room v2 20180419aj
Program File: DRI Conf Room v2 20180419aj.smw
`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
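On Node.js versions that support String.prototype.matchAll(), the exec loop can be written more compactly. A sketch using the same expression; the shortened input string and the variable names are illustrative only:

```javascript
// Shortened sample input with \r\n line endings, as in the question.
const text = 'Program Boot Directory: \\SIMPL\\app01\r\nProgram File: DRI Conf Room v2 20180419aj.smw\r\n';

// matchAll requires the g flag; m makes ^ and $ match at line boundaries.
const programFiles = [...text.matchAll(/^Program File:\s*(.*?)$/gm)].map(m => m[1]);

console.log(programFiles); // [ 'DRI Conf Room v2 20180419aj.smw' ]
```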

how to display title of websites in node.js

I want to get the titles of several sites, like this:
localhost:1234/index/?url=google.com&url=www.yahoo.com/&url=twitter.com
If I go to this URL, it should crawl all the sites mentioned in the URL and display the title of each website:
- Google
- Yahoo
- Twitter
var Urls = 'localhost:1234/index/?url=google.com&url=www.yahoo.com/&url=twitter.com';
// remove all special characters like '/' '&' and '='
Urls = Urls.replace(/\&/g, '').replace(/\//g, '').replace(/\=/g, '');
// split it based on url
Urls = Urls.split('url');
// remove the first element as it's not required
Urls.shift();
Urls.forEach(function (url) {
//split each element based on '.'
url = url.split('.');
url.forEach(function (ele) {
// if its not 'www' and 'com'
if (ele !== 'www' && ele !== 'com') {
// the title of url
console.log(ele);
}
})
})
You need to remove all the special characters with regular expressions, as shown above. If the URLs contain other suffixes such as ".org" or ".in", those also need to be added to the if condition.
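As an alternative to stripping characters by hand, the built-in URL class can read the repeated url parameters directly. A sketch, assuming an http:// scheme is prepended so the addresses can be parsed; siteLabel is a hypothetical helper. Like the code above, it derives a name from the host and does not fetch the page's actual title:

```javascript
// The scheme is added here as an assumption so that URL can parse the address.
const address = new URL('http://localhost:1234/index/?url=google.com&url=www.yahoo.com/&url=twitter.com');

// getAll returns every value of a repeated query parameter.
const targets = address.searchParams.getAll('url');
console.log(targets); // [ 'google.com', 'www.yahoo.com/', 'twitter.com' ]

// Hypothetical helper: first label of the host name, ignoring a leading "www.".
function siteLabel(target) {
  const host = new URL('http://' + target).hostname.replace(/^www\./, '');
  return host.split('.')[0];
}

console.log(targets.map(siteLabel)); // [ 'google', 'yahoo', 'twitter' ]
```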

Sharepoint > Which characters are *not* allowed in a list?

I've tried looking this up and haven't come up with the answer I'm looking for; I've found what cannot be included in filenames, folder names, and site names... but nothing on actual fields in a list.
I noticed that the percent symbol (%) is one that's not allowed in files/sites/folders. But it also doesn't populate when I try to programmatically add the fields to the list. I am doing this by using a small C# application that sends the data via SharePoint 2010's built-in web services. I can manually enter the character, but it messes up each field in the row if I try it through code.
I've tried some of the escape characters that I've found via Google (_x26), but these don't seem to work either. Has anyone else had an issue with this? If these characters are allowed, how can I escape them when sending the data through a web service call?
Thanks in advance!
Justin
Any characters that aren't allowed when you enter a field name get encoded in the internal name. The format is a little different to what you show - try "_x0026_".
I usually avoid issues with weird internal names by creating the field with no spaces or special characters in the name, then renaming it. When you rename a field, only the display name changes and you keep the simple internal name.
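To illustrate the `_x0026_` form, here is a rough sketch that builds such names, assuming each encoded character becomes `_x` plus four lowercase hex digits of its UTF-16 code unit plus `_`. The helper name is hypothetical, and the set of characters that actually gets encoded is an assumption (here, everything except ASCII letters, digits and underscore), so check the result against the real internal name of your field:

```javascript
// Hypothetical helper; which characters SharePoint really encodes is an assumption.
function encodeInternalName(displayName) {
  return displayName.replace(/[^A-Za-z0-9_]/g, ch =>
    '_x' + ch.charCodeAt(0).toString(16).padStart(4, '0') + '_');
}

console.log(encodeInternalName('R&D')); // R_x0026_D
```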
Characters not allowed in SharePoint file name:
~, #, %, & , *, {, }, \, :, <, >, ?, /, |, "
Pasted from http://chrisbarba.com/2011/01/27/sharepoint-filename-special-characters-not-allowed/
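The list above translates directly into a character class. A small sketch of a check built from exactly those characters; the constant and function names are made up for illustration:

```javascript
// Characters taken from the list above: ~ # % & * { } \ : < > ? / | "
const INVALID_FILE_NAME_CHARS = /[~#%&*{}\\:<>?\/|"]/;

// Hypothetical helper: true if the name contains none of the listed characters.
function isValidFileName(name) {
  return !INVALID_FILE_NAME_CHARS.test(name);
}

console.log(isValidFileName('report-2011.docx')); // true
console.log(isValidFileName('50%_done.docx'));    // false
```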
Am I saying something weird when I state that there usually is a reason for certain characters not being allowed? I don't know which or why, but there probably is a reason.
Since you control which fields need to be there you can also dictate their (internal) names. I'd say follow best practice and name your fields using Camel case. And because you created them, you can just map the fields to the according fields in your data source.
As a follow-on to @Elisa's answer, here's some JavaScript / TypeScript code, implemented on Nintex forms, that helps to prevent users from uploading files with invalid characters in the file name into SharePoint.
Here's the gist of the JavaScript version (note you'll have to obviously adapt for your own needs since this was designed for Nintex) :
//------------------------------------------------------------------------
//JavaScript Version:
//Code from http://cc.davelozinski.com/tips-techniques/nintex-form-tips-techniques/javascript-typescript-for-nintex-forms-to-validate-file-names
//------------------------------------------------------------------------
function validateAttachmentNames(eventObject) {
var textbox$ = NWF$(this);
var attachrowid = this.id.substring(10, 47);
var fileUploadid = attachrowid;
var index = attachrowid.substring(36);
//console.log('index:' + index);
//console.log('attachrowid:' + attachrowid);
//console.log('fileUploadid:' + fileUploadid);
if (index == '') {
attachrowid += '0';
}
var fileName = NWF.FormFiller.Attachments.TrimWhiteSpaces(textbox$.val().replace(/^.*[\\\/]/, ''));
var match = (new RegExp('[~#%\&{}+\|]|\\.\\.|^\\.|\\.$')).test(fileName);
if (match) {
isValid = false;
setTimeout(function () {
NWF$("tr[id^='attachRow']").each(function () {
var arrParts = (NWF$(this).find(".ms-addnew")[0]).href.split('"');
var fileName = arrParts[5];
var attachRow = arrParts[1];
var fileUpload = arrParts[3];
var match = (new RegExp('[~#%\&{}+\|]|\\.\\.|^\\.|\\.$')).test(fileName);
if (match) {
console.log(fileName);
NWF.FormFiller.Attachments.RemoveLocal(attachRow, fileUpload, fileName);
alert('Invalid file: ' + fileName + ' You cannot attach files with the following characters ~ # % & * { } \\ : < > ? / + | \n\nThe file has been removed.');
}
});
}, 500);
}
}

Resources