extract a string using regular expression in node

I'm trying to use exec with a regular expression in Node. I know the expression works because I tested it with an extension in VS Code, but when I run the Node app it keeps returning null.
str = '\r\nProgram Boot Directory: \\SIMPL\\app01\r\nSource File: C:\\DRI\\DRI\\DRI Conf Room v2 20180419aj\r\nProgram File: DRI Conf Room v2 20180419aj.smw\r\n';
var regex = /\Program File:(.*?)\\/;
var matched = regex.exec(str);
console.log(matched);

I think you don't have to escape the P at the beginning. The string ends with \r\n, so you could match that instead of \\, which matches a literal backslash; there is no backslash after Program File: in the string, which is why exec returns null.
If you don't want the leading whitespace in the first capturing group, you could add \s* to match zero or more whitespace characters: /Program File:\s*(.*?)\r\n/
For example:
str = '\r\nProgram Boot Directory: \\SIMPL\\app01\r\nSource File: C:\\DRI\\DRI\\DRI Conf Room v2 20180419aj\r\nProgram File: DRI Conf Room v2 20180419aj.smw\r\n';
var regex = /Program File:(.*?)\r\n/;
var matched = regex.exec(str);
console.log(matched[0]);
console.log(matched[1]);

You can also use the RegExp constructor:
var str = '\r\nProgram \r\nProgram File: DRI 0180419aj.smw\r\n'
.replace(/[\r\n]/g, ''); // removes the new lines before we search
var pattern = 'Program File.+' // write your raw pattern
var re = new RegExp(pattern); // feed that into the RegExp constructor with flags if needed
var result = re.exec(str); // run your search
console.log(result)
I'm not sure what your pattern should do, so I used one that matches everything starting with Program File. If you want all matches, not just the first, change it to
var re = new RegExp(pattern,'g');
Hope that helps!

The regex syntax you use looks off. Try it like this:
const regex = /^Program File:\s*(.*?)$/gm;
const str = `
Program Boot Directory: \\\\SIMPL\\\\app01
Source File: C:\\\\DRI\\\\DRI\\\\DRI Conf Room v2 20180419aj
Program File: DRI Conf Room v2 20180419aj.smw
`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
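As a shorter variant of the loop above, here is a minimal sketch using String.prototype.matchAll, which takes care of lastIndex for you. The helper name extractProgramFiles is made up for this example, and the input is the string from the question.

```javascript
// Sample input taken from the question; lines are separated by \r\n.
const input = '\r\nProgram Boot Directory: \\SIMPL\\app01\r\nSource File: C:\\DRI\\DRI\\DRI Conf Room v2 20180419aj\r\nProgram File: DRI Conf Room v2 20180419aj.smw\r\n';

// extractProgramFiles is a hypothetical helper name, not part of any library.
function extractProgramFiles(text) {
  // ^ with the m flag anchors at the start of each line;
  // [^\r\n]* captures the rest of the line without the line break.
  const pattern = /^Program File:[ \t]*([^\r\n]*)/gm;
  // matchAll requires the g flag and yields one match array per hit.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

console.log(extractProgramFiles(input)); // [ 'DRI Conf Room v2 20180419aj.smw' ]
```

matchAll is available in Node.js 12 and later; on older versions the exec loop is still the way to go.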

Related

How to extract value from a url

I have a few requests coming in which follow the pattern below
contacts/id/
contacts/x/id/name
contacts/x/y/id/address
contacts/z/address/
I want to extract the value which follows right after 'contacts'
In above cases,
1. id
2. x
3. x
4. z
Here is my regex
(?<=contacts)\/[^\/]+
https://regex101.com/r/ePmv5Y/1
But it also matches the leading '/', e.g. /id, /x etc.
How do I change it to get rid of this leading slash?

We can use match() here:
var urls = ["contacts/id/", "contacts/x/id/name", "contacts/x/y/id/address", "contacts/z/address/"];
for (var i=0; i < urls.length; ++i) {
var output = urls[i].match(/\bcontacts\/(.*?)\//)[1];
console.log(urls[i] + " => " + output);
}

I have a few requests coming in
If you mean http requests, then this is likely the pathname of the requested URL, and they'll start with a /. (This is the value of req.url in a Node.js server.)
To match on a URL pathname, you can use this expression: ^\/contacts\/([^/?]+). Here's a link to another regular expression builder that demonstrates it and includes an explanation for every character: https://regexr.com/6tugf
The [^/?] is a negated set that matches any token which is not a / or a ? and the + means that it matches 1 or more of those tokens. It's important to include the ? because otherwise it could match into the query string portion of the URL — for example, in this URL:
https://domain.tld/contacts/x/id/name?filter=recent # URL
/contacts/x/id/name?filter=recent # req.url in Node.js
/contacts/x/id/name # pathname
?filter=recent # query string
And here's a runnable code snippet demonstrating the same expression, using String.prototype.match():
const contactIdRegexp = /^\/contacts\/([^/?]+)/;
const inputs = [
'/contacts/id/', // id
'/contacts/x/id/name', // x
'/contacts/x/y/id/address', // x
'/contacts/z/address/', // z
'/contacts/x/id/name?filter=recent', // x
];
for (const str of inputs) {
const id = str.match(contactIdRegexp)?.[1];
console.log(id);
}

You can add the / inside the lookbehind:
(?<=contacts\/)[^\/]+
See a regex demo.
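For reference, a small sketch of how the lookbehind version could be used from Node.js. The function name firstSegmentAfterContacts is only illustrative, and the inputs are the four paths from the question.

```javascript
// The / is inside the lookbehind, so it is required but not part of the match.
const afterContacts = /(?<=contacts\/)[^\/]+/;

// firstSegmentAfterContacts is a hypothetical helper name.
function firstSegmentAfterContacts(path) {
  const m = path.match(afterContacts);
  // Without the g flag, match() returns the first match (or null).
  return m ? m[0] : null;
}

const samples = [
  'contacts/id/',
  'contacts/x/id/name',
  'contacts/x/y/id/address',
  'contacts/z/address/',
];
for (const p of samples) {
  console.log(p, '=>', firstSegmentAfterContacts(p));
}
```

Lookbehind assertions are a fairly recent addition to JavaScript, so this assumes a current Node.js version; the capture-group answers work everywhere.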

If you would rather do it without a regex, you can try the following.
//get the URL object.
const url = new URL(`${req.protocol}://${req.get('host')}${req.originalUrl}`);
//extract the pathname and split using "/"
const pathName= url.pathname.split("/");
//get the required value using array index.
const val = pathName[2];
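The snippet above relies on Express (req.protocol, req.get and req.originalUrl). Outside Express, a similar sketch can work directly on req.url, assuming it holds the path plus query string; the base URL passed to the URL constructor is a placeholder needed only for parsing, and segmentAfterContacts is a made-up name.

```javascript
// segmentAfterContacts is a hypothetical helper name.
function segmentAfterContacts(reqUrl) {
  // 'http://localhost' is a dummy base so that a relative path can be parsed.
  const { pathname } = new URL(reqUrl, 'http://localhost');
  // filter(Boolean) drops the empty strings produced by leading/trailing slashes.
  const segments = pathname.split('/').filter(Boolean);
  return segments[0] === 'contacts' ? segments[1] ?? null : null;
}

console.log(segmentAfterContacts('/contacts/x/id/name?filter=recent')); // x
console.log(segmentAfterContacts('/contacts/'));                        // null
```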

How do you get the string between two markers even if there are multiple markers inside it in Node.js

I was trying to make a Node.js program that takes a string and gets all of the content inside the braces:
var str = "Hello {world}!";
console.log(getBracketSubstrings(str)); // => ['world']
It works, but when I do:
var str = "Hello {world{!}}";
console.log(getBracketSubstrings(str)); // => ['world{!']
It returns ['world{!'], when I want it to return:
['world{!}']
Is there any way to do this to a string in Node.js?

You could use a pattern with a capture group, matching from { followed by any character except a closing curly brace using [^}]* until you encounter a }.
{([^}]*)}
See a regex demo
const getBracketSubstrings = s => Array.from(s.matchAll(/{([^}]*)}/g), x => x[1]);
console.log(getBracketSubstrings("Hello {world}!"));
console.log(getBracketSubstrings("Hello {world{!}}"));
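Note that [^}]* stops at the first closing brace, so for the nested input it yields 'world{!' rather than the 'world{!}' the question asks for. JavaScript regular expressions cannot match balanced nesting in general, so one option is a small depth counter. The sketch below is one possible approach, not the answer author's method, and getOuterBracketSubstrings is a made-up name.

```javascript
// getOuterBracketSubstrings is a hypothetical helper name.
// It returns the content of every top-level {...} group, keeping nested braces intact.
function getOuterBracketSubstrings(str) {
  const results = [];
  let depth = 0;   // how many unclosed { we are currently inside
  let start = -1;  // index just after the opening { of the current top-level group
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === '{') {
      if (depth === 0) start = i + 1;
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) results.push(str.slice(start, i));
    }
  }
  return results;
}

console.log(getOuterBracketSubstrings('Hello {world}!'));   // [ 'world' ]
console.log(getOuterBracketSubstrings('Hello {world{!}}')); // [ 'world{!}' ]
```

Unbalanced input is handled leniently: a stray } outside any group is ignored, and a group that is never closed is dropped.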

NodeJS RTF ANSI Find and Replace Words With Special Chars

I have a find and replace script that works no problem when the words don't have any special characters. However, there will be a lot of times where there will be special characters since it's finding names. As of now this is breaking the script.
The script looks for {<some-text>} and attempts to replace the contents (as well as remove the braces).
Example:
text.rtf
Here's a name with special char {Kotouč}
script.ts
import * as fs from "fs";
// Ingest the rtf file.
const content: string = fs.readFileSync("./text.rtf", "utf8");
console.log("content::\n", content);
// The string we are looking to match in file text.
const plainText: string = "{Kotouč}";
// Look for all text that matches the pattern `{TEXT_HERE}`.
const anyMatchPattern: RegExp = /{(.*?)}/gi;
const matches: string[] = content.match(anyMatchPattern) || [];
const matchesLen: number = matches.length;
for (let i: number = 0; i < matchesLen; i++) {
// It correctly identifies the targeted text.
const currMatch: string = matches[i];
const isRtfMetadata: boolean = currMatch.endsWith(";}");
if (isRtfMetadata) {
continue;
}
// Here I need a way to escape `plainText` string so that it matches the source.
console.log("currMatch::", currMatch);
console.log("currMatch === plainText::", currMatch === plainText);
if (currMatch === plainText) {
const newContent: string = content.replace(currMatch, "IT_WORKS!");
console.log("newContent:", newContent);
}
}
output
content::
{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Here's a name with special char \{Kotou\uc0\u269 \}.}
currMatch:: {Kotou\uc0\u269 \}
currMatch === plainText:: false
It looks like ANSI escaping, and I've tried using jsesc but that produces a different string, {Kotou\u010D} instead of what the document produces {Kotou\uc0\u269 \}.
How can I dynamically escape the plainText string variable so that it matches what is found in the document?

What I needed was to deepen my knowledge on rtf formatting as well as general text encoding.
The raw RTF text read from the file gives us a few hints:
{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600...
This part of the rtf file metadata tells us a few things.
It is using RTF version 1. The encoding is ANSI, specifically code page 1252 (\ansicpg1252), also known as Windows-1252 or CP-1252, which is:
...a single-byte character encoding of the Latin alphabet
(source)
The valuable piece of information here is that the file uses the Latin alphabet; this will be used later.
Knowing the specific RTF version used, I stumbled upon the RTF 1.5 Spec.
A quick search of that spec for one of the escape sequences I was looking into revealed that \uc0 is an RTF-specific control word. Knowing that, I was able to parse what I was really after, \u269. Now I knew it was Unicode and had a good hunch that \u269 stood for Unicode character code 269. So I looked that up...
The \u269 (char code 269) shows up on this page to confirm. Now I knew the character set and what needed to be done to get the equivalent plain (unescaped) text, and there's a basic SO post I used here to get the function started.
Using all this knowledge I was able to piece it together from there. Here's the full corrected script and its output:
script.ts
import * as fs from "fs";
// Match RTF unicode control sequence: http://www.biblioscape.com/rtf15_spec.htm
const unicodeControlReg: RegExp = /\\uc0\\u/g;
// Extracts the unicode character from an escape sequence with handling for rtf.
const matchEscapedChars: RegExp = /\\uc0\\u(\d{2,6})|\\u(\d{2,6})/g;
/**
* Util function to strip junk characters from string for comparison.
* @param {string} str
* @returns {string}
*/
const cleanupRtfStr = (str: string): string => {
return str
.replace(/\s/g, "")
.replace(/\\/g, "");
};
/**
* Detects escaped unicode and looks up the character by that code.
* @param {string} str
* @returns {string}
*/
const unescapeString = (str: string): string => {
const unescaped = str.replace(matchEscapedChars, (cc: string) => {
const stripped: string = cc.replace(unicodeControlReg, "");
const charCode: number = Number(stripped);
// See unicode character codes here:
// https://unicodelookup.com/#latin/11
return String.fromCharCode(charCode);
});
// Return the string with the escape sequences replaced by their characters.
return unescaped;
};
// Ingest the rtf file.
const content: string = fs.readFileSync("./src/TEST.rtf", "binary");
console.log("content::\n", content);
// The string we are looking to match in file text.
const plainText: string = "{Kotouč}";
// Look for all text that matches the pattern `{TEXT_HERE}`.
const anyMatchPattern: RegExp = /{(.*?)}/gi;
const matches: string[] = content.match(anyMatchPattern) || [];
const matchesLen: number = matches.length;
for (let i: number = 0; i < matchesLen; i++) {
const currMatch: string = matches[i];
const isRtfMetadata: boolean = currMatch.endsWith(";}");
if (isRtfMetadata) {
continue;
}
if (currMatch === plainText) {
const newContent: string = content.replace(currMatch, "IT_WORKS!");
console.log("\n\nnewContent:", newContent);
break;
}
const unescapedMatch: string = unescapeString(currMatch);
const cleanedMatch: string = cleanupRtfStr(unescapedMatch);
if (cleanedMatch === plainText) {
const newContent: string = content.replace(currMatch, "IT_WORKS_UNESCAPED!");
console.log("\n\nnewContent:", newContent);
break;
}
}
output
content::
{\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Here\'92s a name with special char \{Kotou\uc0\u269 \}}
newContent: {\rtf1\ansi\ansicpg1252\cocoartf1671\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\pardirnatural\partightenfactor0
\f0\fs24 \cf0 Here\'92s a name with special char \IT_WORKS_UNESCAPED!}
Hopefully that helps others who aren't familiar with character encoding/escaping and its uses in RTF-formatted documents!
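The original question asked for the opposite direction: escaping plainText so that it matches the document. A rough sketch of that idea follows. It assumes the writer emits \uc0\uN followed by a space for every non-ASCII character, which is only what the sample output above shows for č; the same file encodes the curly apostrophe as \'92, so other characters may be written differently. escapeForRtf is a made-up name.

```javascript
// escapeForRtf is a hypothetical helper name.
// Assumption: non-ASCII characters appear as \uc0\uN followed by a space,
// mirroring the single sequence visible in the sample document.
function escapeForRtf(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i); // UTF-16 code unit
    if (ch === '\\' || ch === '{' || ch === '}') {
      // Backslash and braces are control characters in RTF and get a backslash prefix.
      out += '\\' + ch;
    } else if (code < 128) {
      out += ch;
    } else {
      // \uN takes a signed 16-bit decimal, so values above 32767 become negative.
      const signed = code > 32767 ? code - 65536 : code;
      out += '\\uc0\\u' + signed + ' ';
    }
  }
  return out;
}

// '\u010D' is the character č (code 269).
console.log(escapeForRtf('{Kotou\u010D}')); // \{Kotou\uc0\u269 \}
```

With the escaped form in hand, content.includes(escapeForRtf(plainText)) could replace the unescape-and-compare step, as long as the assumption about the writer holds.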

Pattern match in nodejs rest url

In my Node app I am using router.use to do the token validation.
I want to skip validation for a few URLs, so I want to check whether the URL matches and, if it does, call next();
But the URL I want to skip has a URL parameter.
E.g., this is the URL: /service/:appname/getall.
This has to be matched against /service/blah/getall and return true.
How can this be achieved without splitting the URL by '/'?
Thanks in advance.

The parameters will match :[^/]+ because it is a : followed by anything other than a / 1 or more times.
If you find the parameters in the template and replace them with a regex that will match any string you can do what you asked for.
let template = '/service/:appname/getall'
let url = '/service/blah/getall'
// find params and replace them with a capturing regex group
template = template.replace(/:[^/]+/g, '([^/]+)')
// the template is now the regex source string '/service/([^/]+)/getall',
// which is essentially '/service/ ANYTHING THAT'S NOT A '/' /getall'
// convert to a regex and only match from start to end
template = new RegExp(`^${template}$`)
// ^ = beginning
// $ = end
// the template is now /^\/service\/([^\/]+)\/getall$/
const matches = url.match(template)
// matches will be null if there is no match.
console.log(matches)
// ["/service/blah/getall", "blah"]
// it will be [full_match, param1, param2...]
Edit: match the parameter names with \w+ instead of [^/]+, because:
The name of route parameters must be made up of “word characters” ([A-Za-z0-9_]). https://expressjs.com/en/guide/routing.html#route-parameters
I believe this is true for most parsers so I have updated my answer. The following test data will only work with this updated method.
let template = '/service/:app-:version/get/:amt';
let url = '/service/blah-v1.0.0/get/all';
template = template.replace(/:\w+/g, `([^/]+)` );
template = new RegExp(`^${template}$`);
let matches = url.match(template);
console.log(url);
console.log(template);
console.log(matches);
// Array(4) ["/service/blah-v1.0.0/get/all", "blah", "v1.0.0", "all"]
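The same idea can be wrapped into a reusable check for the skip-validation case from the question. This is a sketch under a few assumptions: the helper names (escapeRegExp, templateToRegExp, matchesTemplate) are made up, parameter names consist of word characters as in the Express quote above, and literal parts of the template are escaped so characters such as . are not treated as regex syntax.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// templateToRegExp is a hypothetical helper name.
function templateToRegExp(template) {
  // Split on the :param placeholders so that only the literal parts get escaped,
  // then join the parts with a capturing group that matches one path segment.
  const source = template
    .split(/:\w+/)
    .map(escapeRegExp)
    .join('([^/]+)');
  return new RegExp(`^${source}$`);
}

// matchesTemplate is a hypothetical helper name.
function matchesTemplate(template, url) {
  return templateToRegExp(template).test(url);
}

console.log(matchesTemplate('/service/:appname/getall', '/service/blah/getall')); // true
console.log(matchesTemplate('/service/:appname/getall', '/service/blah/other'));  // false
```

Inside the router.use callback this could then be used along the lines of: if (matchesTemplate('/service/:appname/getall', req.path)) return next();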

How to use stringByAddingPercentEncodingWithAllowedCharacters() for a URL in Swift 2.0

I was using this, in Swift 1.2
let urlwithPercentEscapes = myurlstring.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
This now gives me a warning asking me to use
stringByAddingPercentEncodingWithAllowedCharacters
I need to use a NSCharacterSet as an argument, but there are so many and I cannot determine what one will give me the same outcome as the previously used method.
An example URL I want to use will be like this
http://www.mapquestapi.com/geocoding/v1/batch?key=YOUR_KEY_HERE&callback=renderBatch&location=Pottsville,PA&location=Red Lion&location=19036&location=1090 N Charlotte St, Lancaster, PA
The URL character sets for encoding each seem to cover only one part of my
URL, i.e.,
The path component of a URL is the component immediately following the
host component (if present). It ends wherever the query or fragment
component begins. For example, in the URL
http://www.example.com/index.php?key1=value1, the path component is
/index.php.
However, I don't want to trim any part of it.
When I used my string directly, for example myurlstring, it would fail.
But when I used the following, there were no issues. It encoded the string with some magic and I could get my URL data.
let urlwithPercentEscapes = myurlstring.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
As it
Returns a representation of the String using a given encoding to
determine the percent escapes necessary to convert the String into a
legal URL string
Thanks

For the given URL string the equivalent to
let urlwithPercentEscapes = myurlstring.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
is the character set URLQueryAllowedCharacterSet
let urlwithPercentEscapes = myurlstring.stringByAddingPercentEncodingWithAllowedCharacters( NSCharacterSet.URLQueryAllowedCharacterSet())
Swift 3:
let urlwithPercentEscapes = myurlstring.addingPercentEncoding( withAllowedCharacters: .urlQueryAllowed)
It encodes everything after the question mark in the URL string.
Since the method stringByAddingPercentEncodingWithAllowedCharacters can return nil, use optional bindings as suggested in the answer of Leo Dabus.

It will depend on your url. If your url is a path you can use the character set
urlPathAllowed
let myFileString = "My File.txt"
if let urlwithPercentEscapes = myFileString.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) {
print(urlwithPercentEscapes) // "My%20File.txt"
}
Creating a Character Set for URL Encoding
urlFragmentAllowed
urlHostAllowed
urlPasswordAllowed
urlQueryAllowed
urlUserAllowed
You can also create your own URL character set:
let myUrlString = "http://www.mapquestapi.com/geocoding/v1/batch?key=YOUR_KEY_HERE&callback=renderBatch&location=Pottsville,PA&location=Red Lion&location=19036&location=1090 N Charlotte St, Lancaster, PA"
let urlSet = CharacterSet.urlFragmentAllowed
.union(.urlHostAllowed)
.union(.urlPasswordAllowed)
.union(.urlQueryAllowed)
.union(.urlUserAllowed)
extension CharacterSet {
static let urlAllowed = CharacterSet.urlFragmentAllowed
.union(.urlHostAllowed)
.union(.urlPasswordAllowed)
.union(.urlQueryAllowed)
.union(.urlUserAllowed)
}
if let urlwithPercentEscapes = myUrlString.addingPercentEncoding(withAllowedCharacters: .urlAllowed) {
print(urlwithPercentEscapes) // "http://www.mapquestapi.com/geocoding/v1/batch?key=YOUR_KEY_HERE&callback=renderBatch&location=Pottsville,PA&location=Red%20Lion&location=19036&location=1090%20N%20Charlotte%20St,%20Lancaster,%20PA"
}
Another option is to use URLComponents to properly create your url

Swift 3.0 (From grokswift)
Creating URLs from strings is a minefield for bugs. Just miss a single / or accidentally URL encode the ? in a query and your API call will fail and your app won’t have any data to display (or even crash if you didn’t anticipate that possibility). Since iOS 8 there’s a better way to build URLs using NSURLComponents and NSURLQueryItems.
func createURLWithComponents() -> URL? {
var urlComponents = URLComponents()
urlComponents.scheme = "http"
urlComponents.host = "www.mapquestapi.com"
urlComponents.path = "/geocoding/v1/batch"
let key = URLQueryItem(name: "key", value: "YOUR_KEY_HERE")
let callback = URLQueryItem(name: "callback", value: "renderBatch")
let locationA = URLQueryItem(name: "location", value: "Pottsville,PA")
let locationB = URLQueryItem(name: "location", value: "Red Lion")
let locationC = URLQueryItem(name: "location", value: "19036")
let locationD = URLQueryItem(name: "location", value: "1090 N Charlotte St, Lancaster, PA")
urlComponents.queryItems = [key, callback, locationA, locationB, locationC, locationD]
return urlComponents.url
}
Below is the code to access the URL using a guard statement.
guard let url = createURLWithComponents() else {
print("invalid URL")
return nil
}
print(url)
Output:
http://www.mapquestapi.com/geocoding/v1/batch?key=YOUR_KEY_HERE&callback=renderBatch&location=Pottsville,PA&location=Red%20Lion&location=19036&location=1090%20N%20Charlotte%20St,%20Lancaster,%20PA

In Swift 3.1, I am using something like the following:
let query = "param1=value1&param2=" + (valueToEncode.addingPercentEncoding(withAllowedCharacters: .alphanumerics) ?? "")
It's safer than .urlQueryAllowed and the others, because it will encode every character other than A-Z, a-z and 0-9. This works better when the value you are encoding may contain special characters like ?, &, =, + and spaces.

In my case, where the last component contained non-Latin characters, I did the following in Swift 2.2:
extension String {
func encodeUTF8() -> String? {
//If I can create an NSURL out of the string nothing is wrong with it
if let _ = NSURL(string: self) {
return self
}
//Get the last component from the string this will return subSequence
let optionalLastComponent = self.characters.split { $0 == "/" }.last
if let lastComponent = optionalLastComponent {
//Get the string from the sub sequence by mapping the characters to [String] then reduce the array to String
let lastComponentAsString = lastComponent.map { String($0) }.reduce("", combine: +)
//Get the range of the last component
if let rangeOfLastComponent = self.rangeOfString(lastComponentAsString) {
//Get the string without its last component
let stringWithoutLastComponent = self.substringToIndex(rangeOfLastComponent.startIndex)
//Encode the last component
if let lastComponentEncoded = lastComponentAsString.stringByAddingPercentEncodingWithAllowedCharacters(NSCharacterSet.alphanumericCharacterSet()) {
//Finally append the original string (without its last component) to the encoded part (encoded last component)
let encodedString = stringWithoutLastComponent + lastComponentEncoded
//Return the string (original string/encoded string)
return encodedString
}
}
}
return nil;
}
}

Swift 4.0
let encodedData = myUrlString.addingPercentEncoding(withAllowedCharacters: CharacterSet.urlHostAllowed)
