Change Image source in Markdown text using Node JS - node.js

I have some Markdown text containing image references, but they use relative paths and I need to change them to absolute paths with a Node.js script. Is there a simple way to do this?
Example:
Source
![](myImage.png?raw=true)
Result
![](www.example.com/myImage.png?raw=true)
I have multiple images in the Markdown content that I need to modify.

Check this working example: https://repl.it/repls/BlackJuicyCommands
First of all, you need to read the file with fs.readFile() and convert the Buffer to a string.
Now that you have access to the text of the file, you need to replace every image path with a new image path. One way to capture it is with a regular expression. You could, for example, look for ](ANY_IMAGE_PATH.ANY_IMAGE_EXTENSION. We're only interested in replacing the ANY_IMAGE_PATH part.
An example regex could be (this could be improved!) (see it live in action here: https://regex101.com/r/xRioBq/1):
const regex = /\]\((.+)(?=(\.(svg|gif|png|jpe?g)))/g
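This will look for a literal ] followed by a literal ( followed by any sequence of characters (that's the .+) until it finds a literal . followed by either svg, gif, png, or jpeg / jpg. The g after the regex is necessary to match all occurrences, not only the first. Note that .+ is greedy, so two images on the same line are captured as one match; this is one of the things that could be improved.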
Javascript/node (< version 9) does not support lookbehind. I've used a regex that includes the ]( before the image path and then will filter it out later (or you could use a more complex regex).
The (.+) captures the image path as a group. You can then use the replace function of the string, which accepts the regex as its first argument. The second argument can be a function that receives the full match as its first argument (in this case ](image_path_without_extension) and any captured groups as the following arguments.
Change the URL with Node's path or url module, using the captured group (below I've called that argument imagePath), and return the result. Because we're also replacing the ](, that should be included in the return value.
const url = require('url');
const replacedText = data.toString().replace(regex, (fullResult, imagePath) => {
  const newImagePath = url.resolve('http://www.example.org', imagePath);
  return `](${newImagePath}`;
});
See the repl example to see it live in action. Note: in the repl example it is written to a different file. Use the same file name if you want to overwrite.
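As a rough alternative sketch (not the code from the repl), the replacement can also be done with a regex that captures the whole image target and stops at the closing parenthesis, so several images on one line are handled separately. The helper name absolutizeImagePaths, the regex, and the base URL are assumptions made for illustration; the WHATWG URL class from Node's url module is used here in place of url.resolve.

```javascript
const { URL } = require('url');

// Hypothetical helper (not from the answer above): prefixes relative image
// targets in Markdown with baseUrl and leaves absolute URLs untouched.
function absolutizeImagePaths(markdown, baseUrl) {
  // Group 1: "![alt](", group 2: the image target up to ")" or whitespace.
  const imageRef = /(!\[[^\]]*\]\()([^)\s]+)/g;
  return markdown.replace(imageRef, (match, prefix, target) => {
    // Skip targets that already have a scheme (http:, data:, ...) or start with //.
    if (/^[a-z][a-z0-9+.-]*:/i.test(target) || target.startsWith('//')) {
      return match;
    }
    return prefix + new URL(target, baseUrl).href;
  });
}

const source = '![](myImage.png?raw=true) and ![logo](img/logo.svg)';
console.log(absolutizeImagePaths(source, 'http://www.example.com/'));
// ![](http://www.example.com/myImage.png?raw=true) and ![logo](http://www.example.com/img/logo.svg)
```

The base URL needs a scheme (http:// or https://) for new URL() to accept it. Image titles and targets containing parentheses are not handled by this sketch.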

I ran into the same issue yesterday, and came up with this solution.
const markdownReplaced = markdown.replace(
/(?<=\]\()(.+)(?=(\)))/g,
(url) => `www.example.com/${url}`,
);
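The regex matches anything that comes after ]( and before ), without including those delimiters in the match, so the match is the original URL itself. Lookbehind requires a Node version that supports it, and because .+ is greedy, two links on the same line are treated as one match.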

Related

How can I transform RegEx categories into plain RegEx?

This question is based on this question. During coding, I got some new things popping up and because the initial question is properly answered, I want to describe my issues in this question.
My goal is to have a RegEx which filters out everything except what meets these requirements:
Alphanumeric allowed
Non-Latin, e.g. Chinese or Japanese, allowed
.,-?!"'=$|<>[]{} allowed
Works with NodeJS 8.9.4
During implementation of the answer from the main question, I've found out, that this only works with newer Node versions (because of the supported ES version). Sadly, our project runs on 8.9.4 which can't be changed in any way. So upgrading is not an option.
I've started searching around and found this page: https://github.com/slevithan/xregexp/blob/master/tools/output/categories.js
With the help of another question, I've tried to put something together which matches my requirements. I came up with:
/[^\(?:[A-Za-z\xAA\xB5\xBA\xC0-\xD6\xD8-\xF6\xF8-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0370-\u0374\u0376\u0377\u037A-\u037D\u037F\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u052F\u0531-\u0556\u0559\u0560-\u0588\u05D0-\u05EA\u05EF-\u05F2\u0620-\u064A\u066E\u066F\u0671-\u06D3\u06D5\u06E5\u06E6\u06EE\u06EF\u06FA-\u06FC\u06FF\u0710\u0712-\u072F\u074D-\u07A5\u07B1\u07CA-\u07EA\u07F4\u07F5\u07FA\u0800-\u0815\u081A\u0824\u0828\u0840-\u0858\u0860-\u086A\u0870-\u0887\u0889-\u088E\u08A0-\u08C9\u0904-\u0939\u093D\u0950\u0958-\u0961\u0971-\u0980\u0985-\u098C\u098F\u0990\u0993-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09BD\u09CE\u09DC\u09DD\u09DF-\u09E1\u09F0\u09F1\u09FC\u0A05-\u0A0A\u0A0F\u0A10\u0A13-\u0A28\u0A2A-\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39\u0A59-\u0A5C\u0A5E\u0A72-\u0A74\u0A85-\u0A8D\u0A8F-\u0A91\u0A93-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0ABD\u0AD0\u0AE0\u0AE1\u0AF9\u0B05-\u0B0C\u0B0F\u0B10\u0B13-\u0B28\u0B2A-\u0B30\u0B32\u0B33\u0B35-\u0B39\u0B3D\u0B5C\u0B5D\u0B5F-\u0B61\u0B71\u0B83\u0B85-\u0B8A\u0B8E-\u0B90\u0B92-\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8-\u0BAA\u0BAE-\u0BB9\u0BD0\u0C05-\u0C0C\u0C0E-\u0C10\u0C12-\u0C28\u0C2A-\u0C39\u0C3D\u0C58-\u0C5A\u0C5D\u0C60\u0C61\u0C80\u0C85-\u0C8C\u0C8E-\u0C90\u0C92-\u0CA8\u0CAA-\u0CB3\u0CB5-\u0CB9\u0CBD\u0CDD\u0CDE\u0CE0\u0CE1\u0CF1\u0CF2\u0D04-\u0D0C\u0D0E-\u0D10\u0D12-\u0D3A\u0D3D\u0D4E\u0D54-\u0D56\u0D5F-\u0D61\u0D7A-\u0D7F\u0D85-\u0D96\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-\u0DC6\u0E01-\u0E30\u0E32\u0E33\u0E40-\u0E46\u0E81\u0E82\u0E84\u0E86-\u0E8A\u0E8C-\u0EA3\u0EA5\u0EA7-\u0EB0\u0EB2\u0EB3\u0EBD\u0EC0-\u0EC4\u0EC6\u0EDC-\u0EDF\u0F00\u0F40-\u0F47\u0F49-\u0F6C\u0F88-\u0F8C\u1000-\u102A\u103F\u1050-\u1055\u105A-\u105D\u1061\u1065\u1066\u106E-\u1070\u1075-\u1081\u108E\u10A0-\u10C5\u10C7\u10CD\u10D0-\u10FA\u10FC-\u1248\u124A-\u124D\u1250-\u1256\u1258\u125A-\u125D\u1260-\u1288\u128A-\u128D\u1290-\u12B0\u12B2-\u12B5\u12B8-\u12BE\u12C0\u12C2-\u12C5\u12C8-\u12D6\u12D8-\
u1310\u1312-\u1315\u1318-\u135A\u1380-\u138F\u13A0-\u13F5\u13F8-\u13FD\u1401-\u166C\u166F-\u167F\u1681-\u169A\u16A0-\u16EA\u16F1-\u16F8\u1700-\u1711\u171F-\u1731\u1740-\u1751\u1760-\u176C\u176E-\u1770\u1780-\u17B3\u17D7\u17DC\u1820-\u1878\u1880-\u1884\u1887-\u18A8\u18AA\u18B0-\u18F5\u1900-\u191E\u1950-\u196D\u1970-\u1974\u1980-\u19AB\u19B0-\u19C9\u1A00-\u1A16\u1A20-\u1A54\u1AA7\u1B05-\u1B33\u1B45-\u1B4C\u1B83-\u1BA0\u1BAE\u1BAF\u1BBA-\u1BE5\u1C00-\u1C23\u1C4D-\u1C4F\u1C5A-\u1C7D\u1C80-\u1C88]+/g
My current example string is:
Test=😕查看
°°^ Marting 10202029 Offline!"§$%&/()!"§$%&/()After this we want to keep the allowed special chars: .,-?!"'=$|<>[]{}
Somehow, the answer from the first question works better than the categories I parsed myself. So there must be something wrong, which I'm unable to find.
At the end, I want to put everything inside a var.replace() command, to replace everything bad with a single whitespace.
For testing, I'm using: https://regexr.com/
You can either use regexpu to transpile the regex into an ES6-compliant regex, or you may go to the Unicode Utilities: UnicodeSet page and get the code point ranges manually.
In your case, paste [^\p{L}\p{N}] into the Input field, check Abbreviate and Escape, then click Show Set. Add .,?!"'=$|<>[\]{}- at the end of the character class. Then double the backslashes (and escape ' or ", whichever is your string literal delimiter; I escaped ' below) and put the result into the pattern_from_uu variable definition in the JavaScript code below. After that, all you need to define the regex is const reg = new RegExp(pattern, "gu") or const reg = new RegExp(pattern, "u"):
const pattern_from_uu = '[^0-9A-Za-z\\u00AA\\u00B2\\u00B3\\u00B5\\u00B9\\u00BA\\u00BC-\\u00BE\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02C1\\u02C6-\\u02D1\\u02E0-\\u02E4\\u02EC\\u02EE\\u0370-\\u0374\\u0376\\u0377\\u037A-\\u037D\\u037F\\u0386\\u0388-\\u038A\\u038C\\u038E-\\u03A1\\u03A3-\\u03F5\\u03F7-\\u0481\\u048A-\\u052F\\u0531-\\u0556\\u0559\\u0560-\\u0588\\u05D0-\\u05EA\\u05EF-\\u05F2\\u0620-\\u064A\\u0660-\\u0669\\u066E\\u066F\\u0671-\\u06D3\\u06D5\\u06E5\\u06E6\\u06EE-\\u06FC\\u06FF\\u0710\\u0712-\\u072F\\u074D-\\u07A5\\u07B1\\u07C0-\\u07EA\\u07F4\\u07F5\\u07FA\\u0800-\\u0815\\u081A\\u0824\\u0828\\u0840-\\u0858\\u0860-\\u086A\\u0870-\\u0887\\u0889-\\u088E\\u08A0-\\u08C9\\u0904-\\u0939\\u093D\\u0950\\u0958-\\u0961\\u0966-\\u096F\\u0971-\\u0980\\u0985-\\u098C\\u098F\\u0990\\u0993-\\u09A8\\u09AA-\\u09B0\\u09B2\\u09B6-\\u09B9\\u09BD\\u09CE\\u09DC\\u09DD\\u09DF-\\u09E1\\u09E6-\\u09F1\\u09F4-\\u09F9\\u09FC\\u0A05-\\u0A0A\\u0A0F\\u0A10\\u0A13-\\u0A28\\u0A2A-\\u0A30\\u0A32\\u0A33\\u0A35\\u0A36\\u0A38\\u0A39\\u0A59-\\u0A5C\\u0A5E\\u0A66-\\u0A6F\\u0A72-\\u0A74\\u0A85-\\u0A8D\\u0A8F-\\u0A91\\u0A93-\\u0AA8\\u0AAA-\\u0AB0\\u0AB2\\u0AB3\\u0AB5-\\u0AB9\\u0ABD\\u0AD0\\u0AE0\\u0AE1\\u0AE6-\\u0AEF\\u0AF9\\u0B05-\\u0B0C\\u0B0F\\u0B10\\u0B13-\\u0B28\\u0B2A-\\u0B30\\u0B32\\u0B33\\u0B35-\\u0B39\\u0B3D\\u0B5C\\u0B5D\\u0B5F-\\u0B61\\u0B66-\\u0B6F\\u0B71-\\u0B77\\u0B83\\u0B85-\\u0B8A\\u0B8E-\\u0B90\\u0B92-\\u0B95\\u0B99\\u0B9A\\u0B9C\\u0B9E\\u0B9F\\u0BA3\\u0BA4\\u0BA8-\\u0BAA\\u0BAE-\\u0BB9\\u0BD0\\u0BE6-\\u0BF2\\u0C05-\\u0C0C\\u0C0E-\\u0C10\\u0C12-\\u0C28\\u0C2A-\\u0C39\\u0C3D\\u0C58-\\u0C5A\\u0C5D\\u0C60\\u0C61\\u0C66-\\u0C6F\\u0C78-\\u0C7E\\u0C80\\u0C85-\\u0C8C\\u0C8E-\\u0C90\\u0C92-\\u0CA8\\u0CAA-\\u0CB3\\u0CB5-\\u0CB9\\u0CBD\\u0CDD\\u0CDE\\u0CE0\\u0CE1\\u0CE6-\\u0CEF\\u0CF1\\u0CF2\\u0D04-\\u0D0C\\u0D0E-\\u0D10\\u0D12-\\u0D3A\\u0D3D\\u0D4E\\u0D54-\\u0D56\\u0D58-\\u0D61\\u0D66-\\u0D78\\u0D7A-\\u0D7F\\u0D85-\\u0D96\\u0D9A-\\u0DB1\\u0DB3-\\u0DBB\\u0DBD\\u0DC0-\\u0DC6\\u0DE6-\\u0DEF\\u0
E01-\\u0E30\\u0E32\\u0E33\\u0E40-\\u0E46\\u0E50-\\u0E59\\u0E81\\u0E82\\u0E84\\u0E86-\\u0E8A\\u0E8C-\\u0EA3\\u0EA5\\u0EA7-\\u0EB0\\u0EB2\\u0EB3\\u0EBD\\u0EC0-\\u0EC4\\u0EC6\\u0ED0-\\u0ED9\\u0EDC-\\u0EDF\\u0F00\\u0F20-\\u0F33\\u0F40-\\u0F47\\u0F49-\\u0F6C\\u0F88-\\u0F8C\\u1000-\\u102A\\u103F-\\u1049\\u1050-\\u1055\\u105A-\\u105D\\u1061\\u1065\\u1066\\u106E-\\u1070\\u1075-\\u1081\\u108E\\u1090-\\u1099\\u10A0-\\u10C5\\u10C7\\u10CD\\u10D0-\\u10FA\\u10FC-\\u1248\\u124A-\\u124D\\u1250-\\u1256\\u1258\\u125A-\\u125D\\u1260-\\u1288\\u128A-\\u128D\\u1290-\\u12B0\\u12B2-\\u12B5\\u12B8-\\u12BE\\u12C0\\u12C2-\\u12C5\\u12C8-\\u12D6\\u12D8-\\u1310\\u1312-\\u1315\\u1318-\\u135A\\u1369-\\u137C\\u1380-\\u138F\\u13A0-\\u13F5\\u13F8-\\u13FD\\u1401-\\u166C\\u166F-\\u167F\\u1681-\\u169A\\u16A0-\\u16EA\\u16EE-\\u16F8\\u1700-\\u1711\\u171F-\\u1731\\u1740-\\u1751\\u1760-\\u176C\\u176E-\\u1770\\u1780-\\u17B3\\u17D7\\u17DC\\u17E0-\\u17E9\\u17F0-\\u17F9\\u1810-\\u1819\\u1820-\\u1878\\u1880-\\u1884\\u1887-\\u18A8\\u18AA\\u18B0-\\u18F5\\u1900-\\u191E\\u1946-\\u196D\\u1970-\\u1974\\u1980-\\u19AB\\u19B0-\\u19C9\\u19D0-\\u19DA\\u1A00-\\u1A16\\u1A20-\\u1A54\\u1A80-\\u1A89\\u1A90-\\u1A99\\u1AA7\\u1B05-\\u1B33\\u1B45-\\u1B4C\\u1B50-\\u1B59\\u1B83-\\u1BA0\\u1BAE-\\u1BE5\\u1C00-\\u1C23\\u1C40-\\u1C49\\u1C4D-\\u1C7D\\u1C80-\\u1C88\\u1C90-\\u1CBA\\u1CBD-\\u1CBF\\u1CE9-\\u1CEC\\u1CEE-\\u1CF3\\u1CF5\\u1CF6\\u1CFA\\u1D00-\\u1DBF\\u1E00-\\u1F15\\u1F18-\\u1F1D\\u1F20-\\u1F45\\u1F48-\\u1F4D\\u1F50-\\u1F57\\u1F59\\u1F5B\\u1F5D\\u1F5F-\\u1F7D\\u1F80-\\u1FB4\\u1FB6-\\u1FBC\\u1FBE\\u1FC2-\\u1FC4\\u1FC6-\\u1FCC\\u1FD0-\\u1FD3\\u1FD6-\\u1FDB\\u1FE0-\\u1FEC\\u1FF2-\\u1FF4\\u1FF6-\\u1FFC\\u2070\\u2071\\u2074-\\u2079\\u207F-\\u2089\\u2090-\\u209C\\u2102\\u2107\\u210A-\\u2113\\u2115\\u2119-\\u211D\\u2124\\u2126\\u2128\\u212A-\\u212D\\u212F-\\u2139\\u213C-\\u213F\\u2145-\\u2149\\u214E\\u2150-\\u2189\\u2460-\\u249B\\u24EA-\\u24FF\\u2776-\\u2793\\u2C00-\\u2CE4\\u2CEB-\\u2CEE\\u2CF2\\u2CF3\\u2CFD\\u2D00-\\u2D25\\u2D27\\u2D2D\
\u2D30-\\u2D67\\u2D6F\\u2D80-\\u2D96\\u2DA0-\\u2DA6\\u2DA8-\\u2DAE\\u2DB0-\\u2DB6\\u2DB8-\\u2DBE\\u2DC0-\\u2DC6\\u2DC8-\\u2DCE\\u2DD0-\\u2DD6\\u2DD8-\\u2DDE\\u2E2F\\u3005-\\u3007\\u3021-\\u3029\\u3031-\\u3035\\u3038-\\u303C\\u3041-\\u3096\\u309D-\\u309F\\u30A1-\\u30FA\\u30FC-\\u30FF\\u3105-\\u312F\\u3131-\\u318E\\u3192-\\u3195\\u31A0-\\u31BF\\u31F0-\\u31FF\\u3220-\\u3229\\u3248-\\u324F\\u3251-\\u325F\\u3280-\\u3289\\u32B1-\\u32BF\\u3400-\\u4DBF\\u4E00-\\uA48C\\uA4D0-\\uA4FD\\uA500-\\uA60C\\uA610-\\uA62B\\uA640-\\uA66E\\uA67F-\\uA69D\\uA6A0-\\uA6EF\\uA717-\\uA71F\\uA722-\\uA788\\uA78B-\\uA7CA\\uA7D0\\uA7D1\\uA7D3\\uA7D5-\\uA7D9\\uA7F2-\\uA801\\uA803-\\uA805\\uA807-\\uA80A\\uA80C-\\uA822\\uA830-\\uA835\\uA840-\\uA873\\uA882-\\uA8B3\\uA8D0-\\uA8D9\\uA8F2-\\uA8F7\\uA8FB\\uA8FD\\uA8FE\\uA900-\\uA925\\uA930-\\uA946\\uA960-\\uA97C\\uA984-\\uA9B2\\uA9CF-\\uA9D9\\uA9E0-\\uA9E4\\uA9E6-\\uA9FE\\uAA00-\\uAA28\\uAA40-\\uAA42\\uAA44-\\uAA4B\\uAA50-\\uAA59\\uAA60-\\uAA76\\uAA7A\\uAA7E-\\uAAAF\\uAAB1\\uAAB5\\uAAB6\\uAAB9-\\uAABD\\uAAC0\\uAAC2\\uAADB-\\uAADD\\uAAE0-\\uAAEA\\uAAF2-\\uAAF4\\uAB01-\\uAB06\\uAB09-\\uAB0E\\uAB11-\\uAB16\\uAB20-\\uAB26\\uAB28-\\uAB2E\\uAB30-\\uAB5A\\uAB5C-\\uAB69\\uAB70-\\uABE2\\uABF0-\\uABF9\\uAC00-\\uD7A3\\uD7B0-\\uD7C6\\uD7CB-\\uD7FB\\uF900-\\uFA6D\\uFA70-\\uFAD9\\uFB00-\\uFB06\\uFB13-\\uFB17\\uFB1D\\uFB1F-\\uFB28\\uFB2A-\\uFB36\\uFB38-\\uFB3C\\uFB3E\\uFB40\\uFB41\\uFB43\\uFB44\\uFB46-\\uFBB1\\uFBD3-\\uFD3D\\uFD50-\\uFD8F\\uFD92-\\uFDC7\\uFDF0-\\uFDFB\\uFE70-\\uFE74\\uFE76-\\uFEFC\\uFF10-\\uFF19\\uFF21-\\uFF3A\\uFF41-\\uFF5A\\uFF66-\\uFFBE\\uFFC2-\\uFFC7\\uFFCA-\\uFFCF\\uFFD2-\\uFFD7\\uFFDA-\\uFFDC\\U00010000-\\U0001000B\\U0001000D-\\U00010026\\U00010028-\\U0001003A\\U0001003C\\U0001003D\\U0001003F-\\U0001004D\\U00010050-\\U0001005D\\U00010080-\\U000100FA\\U00010107-\\U00010133\\U00010140-\\U00010178\\U0001018A\\U0001018B\\U00010280-\\U0001029C\\U000102A0-\\U000102D0\\U000102E1-\\U000102FB\\U00010300-\\U00010323\\U0001032D-\\U0001034A\\U00010350-\\U000
10375\\U00010380-\\U0001039D\\U000103A0-\\U000103C3\\U000103C8-\\U000103CF\\U000103D1-\\U000103D5\\U00010400-\\U0001049D\\U000104A0-\\U000104A9\\U000104B0-\\U000104D3\\U000104D8-\\U000104FB\\U00010500-\\U00010527\\U00010530-\\U00010563\\U00010570-\\U0001057A\\U0001057C-\\U0001058A\\U0001058C-\\U00010592\\U00010594\\U00010595\\U00010597-\\U000105A1\\U000105A3-\\U000105B1\\U000105B3-\\U000105B9\\U000105BB\\U000105BC\\U00010600-\\U00010736\\U00010740-\\U00010755\\U00010760-\\U00010767\\U00010780-\\U00010785\\U00010787-\\U000107B0\\U000107B2-\\U000107BA\\U00010800-\\U00010805\\U00010808\\U0001080A-\\U00010835\\U00010837\\U00010838\\U0001083C\\U0001083F-\\U00010855\\U00010858-\\U00010876\\U00010879-\\U0001089E\\U000108A7-\\U000108AF\\U000108E0-\\U000108F2\\U000108F4\\U000108F5\\U000108FB-\\U0001091B\\U00010920-\\U00010939\\U00010980-\\U000109B7\\U000109BC-\\U000109CF\\U000109D2-\\U00010A00\\U00010A10-\\U00010A13\\U00010A15-\\U00010A17\\U00010A19-\\U00010A35\\U00010A40-\\U00010A48\\U00010A60-\\U00010A7E\\U00010A80-\\U00010A9F\\U00010AC0-\\U00010AC7\\U00010AC9-\\U00010AE4\\U00010AEB-\\U00010AEF\\U00010B00-\\U00010B35\\U00010B40-\\U00010B55\\U00010B58-\\U00010B72\\U00010B78-\\U00010B91\\U00010BA9-\\U00010BAF\\U00010C00-\\U00010C48\\U00010C80-\\U00010CB2\\U00010CC0-\\U00010CF2\\U00010CFA-\\U00010D23\\U00010D30-\\U00010D39\\U00010E60-\\U00010E7E\\U00010E80-\\U00010EA9\\U00010EB0\\U00010EB1\\U00010F00-\\U00010F27\\U00010F30-\\U00010F45\\U00010F51-\\U00010F54\\U00010F70-\\U00010F81\\U00010FB0-\\U00010FCB\\U00010FE0-\\U00010FF6\\U00011003-\\U00011037\\U00011052-\\U0001106F\\U00011071\\U00011072\\U00011075\\U00011083-\\U000110AF\\U000110D0-\\U000110E8\\U000110F0-\\U000110F9\\U00011103-\\U00011126\\U00011136-\\U0001113F\\U00011144\\U00011147\\U00011150-\\U00011172\\U00011176\\U00011183-\\U000111B2\\U000111C1-\\U000111C4\\U000111D0-\\U000111DA\\U000111DC\\U000111E1-\\U000111F4\\U00011200-\\U00011211\\U00011213-\\U0001122B\\U00011280-\\U00011286\\U00011288\\U0001128A-\\U0001128D\\U0
001128F-\\U0001129D\\U0001129F-\\U000112A8\\U000112B0-\\U000112DE\\U000112F0-\\U000112F9\\U00011305-\\U0001130C\\U0001130F\\U00011310\\U00011313-\\U00011328\\U0001132A-\\U00011330\\U00011332\\U00011333\\U00011335-\\U00011339\\U0001133D\\U00011350\\U0001135D-\\U00011361\\U00011400-\\U00011434\\U00011447-\\U0001144A\\U00011450-\\U00011459\\U0001145F-\\U00011461\\U00011480-\\U000114AF\\U000114C4\\U000114C5\\U000114C7\\U000114D0-\\U000114D9\\U00011580-\\U000115AE\\U000115D8-\\U000115DB\\U00011600-\\U0001162F\\U00011644\\U00011650-\\U00011659\\U00011680-\\U000116AA\\U000116B8\\U000116C0-\\U000116C9\\U00011700-\\U0001171A\\U00011730-\\U0001173B\\U00011740-\\U00011746\\U00011800-\\U0001182B\\U000118A0-\\U000118F2\\U000118FF-\\U00011906\\U00011909\\U0001190C-\\U00011913\\U00011915\\U00011916\\U00011918-\\U0001192F\\U0001193F\\U00011941\\U00011950-\\U00011959\\U000119A0-\\U000119A7\\U000119AA-\\U000119D0\\U000119E1\\U000119E3\\U00011A00\\U00011A0B-\\U00011A32\\U00011A3A\\U00011A50\\U00011A5C-\\U00011A89\\U00011A9D\\U00011AB0-\\U00011AF8\\U00011C00-\\U00011C08\\U00011C0A-\\U00011C2E\\U00011C40\\U00011C50-\\U00011C6C\\U00011C72-\\U00011C8F\\U00011D00-\\U00011D06\\U00011D08\\U00011D09\\U00011D0B-\\U00011D30\\U00011D46\\U00011D50-\\U00011D59\\U00011D60-\\U00011D65\\U00011D67\\U00011D68\\U00011D6A-\\U00011D89\\U00011D98\\U00011DA0-\\U00011DA9\\U00011EE0-\\U00011EF2\\U00011FB0\\U00011FC0-\\U00011FD4\\U00012000-\\U00012399\\U00012400-\\U0001246E\\U00012480-\\U00012543\\U00012F90-\\U00012FF0\\U00013000-\\U0001342E\\U00014400-\\U00014646\\U00016800-\\U00016A38\\U00016A40-\\U00016A5E\\U00016A60-\\U00016A69\\U00016A70-\\U00016ABE\\U00016AC0-\\U00016AC9\\U00016AD0-\\U00016AED\\U00016B00-\\U00016B2F\\U00016B40-\\U00016B43\\U00016B50-\\U00016B59\\U00016B5B-\\U00016B61\\U00016B63-\\U00016B77\\U00016B7D-\\U00016B8F\\U00016E40-\\U00016E96\\U00016F00-\\U00016F4A\\U00016F50\\U00016F93-\\U00016F9F\\U00016FE0\\U00016FE1\\U00016FE3\\U00017000-\\U000187F7\\U00018800-\\U00018CD5\\U00018D00-\\U00018
D08\\U0001AFF0-\\U0001AFF3\\U0001AFF5-\\U0001AFFB\\U0001AFFD\\U0001AFFE\\U0001B000-\\U0001B122\\U0001B150-\\U0001B152\\U0001B164-\\U0001B167\\U0001B170-\\U0001B2FB\\U0001BC00-\\U0001BC6A\\U0001BC70-\\U0001BC7C\\U0001BC80-\\U0001BC88\\U0001BC90-\\U0001BC99\\U0001D2E0-\\U0001D2F3\\U0001D360-\\U0001D378\\U0001D400-\\U0001D454\\U0001D456-\\U0001D49C\\U0001D49E\\U0001D49F\\U0001D4A2\\U0001D4A5\\U0001D4A6\\U0001D4A9-\\U0001D4AC\\U0001D4AE-\\U0001D4B9\\U0001D4BB\\U0001D4BD-\\U0001D4C3\\U0001D4C5-\\U0001D505\\U0001D507-\\U0001D50A\\U0001D50D-\\U0001D514\\U0001D516-\\U0001D51C\\U0001D51E-\\U0001D539\\U0001D53B-\\U0001D53E\\U0001D540-\\U0001D544\\U0001D546\\U0001D54A-\\U0001D550\\U0001D552-\\U0001D6A5\\U0001D6A8-\\U0001D6C0\\U0001D6C2-\\U0001D6DA\\U0001D6DC-\\U0001D6FA\\U0001D6FC-\\U0001D714\\U0001D716-\\U0001D734\\U0001D736-\\U0001D74E\\U0001D750-\\U0001D76E\\U0001D770-\\U0001D788\\U0001D78A-\\U0001D7A8\\U0001D7AA-\\U0001D7C2\\U0001D7C4-\\U0001D7CB\\U0001D7CE-\\U0001D7FF\\U0001DF00-\\U0001DF1E\\U0001E100-\\U0001E12C\\U0001E137-\\U0001E13D\\U0001E140-\\U0001E149\\U0001E14E\\U0001E290-\\U0001E2AD\\U0001E2C0-\\U0001E2EB\\U0001E2F0-\\U0001E2F9\\U0001E7E0-\\U0001E7E6\\U0001E7E8-\\U0001E7EB\\U0001E7ED\\U0001E7EE\\U0001E7F0-\\U0001E7FE\\U0001E800-\\U0001E8C4\\U0001E8C7-\\U0001E8CF\\U0001E900-\\U0001E943\\U0001E94B\\U0001E950-\\U0001E959\\U0001EC71-\\U0001ECAB\\U0001ECAD-\\U0001ECAF\\U0001ECB1-\\U0001ECB4\\U0001ED01-\\U0001ED2D\\U0001ED2F-\\U0001ED3D\\U0001EE00-\\U0001EE03\\U0001EE05-\\U0001EE1F\\U0001EE21\\U0001EE22\\U0001EE24\\U0001EE27\\U0001EE29-\\U0001EE32\\U0001EE34-\\U0001EE37\\U0001EE39\\U0001EE3B\\U0001EE42\\U0001EE47\\U0001EE49\\U0001EE4B\\U0001EE4D-\\U0001EE4F\\U0001EE51\\U0001EE52\\U0001EE54\\U0001EE57\\U0001EE59\\U0001EE5B\\U0001EE5D\\U0001EE5F\\U0001EE61\\U0001EE62\\U0001EE64\\U0001EE67-\\U0001EE6A\\U0001EE6C-\\U0001EE72\\U0001EE74-\\U0001EE77\\U0001EE79-\\U0001EE7C\\U0001EE7E\\U0001EE80-\\U0001EE89\\U0001EE8B-\\U0001EE9B\\U0001EEA1-\\U0001EEA3\\U0001EEA5-\\U0001EEA9\\
U0001EEAB-\\U0001EEBB\\U0001F100-\\U0001F10C\\U0001FBF0-\\U0001FBF9\\U00020000-\\U0002A6DF\\U0002A700-\\U0002B738\\U0002B740-\\U0002B81D\\U0002B820-\\U0002CEA1\\U0002CEB0-\\U0002EBE0\\U0002F800-\\U0002FA1D\\U00030000-\\U0003134A.,?!"\'=$|<>[\\]{}-]';
let pattern = pattern_from_uu.replace(/\\U000([a-f\d]+)/gi, "\\u{$1}");
console.log("Your regex is:\n/" + pattern + "/gu");
const texts = ["Test=😕查看","°°^ Marting 10202029 Offline!\"§$%&/()!\"§$%&/()After this we want to keep the allowed special chars: .,-?!\"'=$|<>[]{}"];
const reg = new RegExp(pattern, "gu")
for (const text of texts) {
console.log(text.replace(reg, ""));
}
See the generated regex demo.
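To make the \U000XXXXX to \u{XXXXX} conversion easier to follow, here is a minimal sketch with a deliberately shortened character class. The variable name patternFromUu, the helper sanitize, and the handful of ranges are illustrative assumptions and not the full set generated above. It replaces each disallowed character with a single space, as the question asks.

```javascript
// Shortened, illustrative set: only a few ranges, not the full generated pattern.
const patternFromUu =
  '[^0-9A-Za-z\\u00C0-\\u00D6\\u4E00-\\u9FFF\\U00020000-\\U0002A6DF.,?!"\'=$|<>[\\]{}-]';

// Turn the \U000XXXXX escapes into \u{XXXXX}, which the "u" flag understands.
const pattern = patternFromUu.replace(/\\U000([a-f\d]+)/gi, '\\u{$1}');
const reg = new RegExp(pattern, 'gu');

// Hypothetical helper: replaces every disallowed character with one space.
function sanitize(text) {
  return text.replace(reg, ' ');
}

console.log(sanitize('Test=\u{1F615}\u67E5\u770B')); // Test= 查看
```

With the "u" flag the emoji is matched as one code point, so it becomes one space rather than two.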

Regex .match() not finding matches even though it should

I'm trying to scan a .txt file in a node.js script, and scan its contents for certain pieces of data. The lines I'm interested in getting look mostly like this:
DIBH91643 5/10/2019 108,75
SIR108811 5/10/2019 187,50
SIR108845 5/10/2019 63,75
So I've been trying to match them with a regex without success. Using a regex testing site, I've even confirmed that it should find the matches I'm looking for, but it always returns null when I call data.match(regex). I'm probably missing something basic here, but I can't figure it out for the life of me. This is the code I'm using (in its entirety, since there isn't much):
var fs = require('fs');
let regex = /\w*?(\d+)\s+(\d+\/\d+\/\d+)\s+(\-{0,1}\d+\,\d+)/g;
let ihateregex = /91/g;
fs.readFile('pathToFile/fileToRead.txt',{encoding: 'utf-8'}, (err, data) => {
var result = data.match(regex);
console.log(result);
});
As shown, even an attempt with a simple pattern that is definitely inside the file still returns null. I have looked into other answers here for similar problems, and they all point to deleting bytes from the beginning of the file. I have used vim -b to delete the first 2 bytes - which did look out of place and furthermore printing the entire data with console.log() did actually show 2 weird characters in the beginning of the file, but I get the exact same error.
I can't figure out what I'm missing here.
Try the following regex:
/^[A-Z]*(\d+)\s+(\d+\/\d+\/\d+)\s+(-?\d+,\d+)/gm
Improvements compared to your regex:
^ - start from the start of line,
[A-Z]* instead of \w*? - note that \w matches also digits,
removed \ in front of - and ,,
? instead of {0,1},
added m option (I assume that you want to process all rows, not the first only).
To process the matches I used the following code, using rextester.com, so instead of e.g. console.log(...) it contains print(...):
let data = 'DIBH91643 5/10/2019 108,75\nSIR108811 5/10/2019 187,50\nSIR108845 5/10/2019 63,75'
print("Data: ")
print(data)
let re = /^[A-Z]*(\d+)\s+(\d+\/\d+\/\d+)\s+(-?\d+,\d+)/gm
print("Result: ")
let matches
while ((matches = re.exec(data)) !== null) {
print(matches[1], '_', matches[2], '_', matches[3])
}
For a working example see https://rextester.com/PZU21213
So I've finally figured out what went wrong and I feel extremely stupid for taking so long to figure it out. One thing I've failed to mention even though I should have is that the file I'm reading is one created by an OCR program. An OCR program which, apparently, added an invisible char between each character in the text file, that I only saw when I switched to php (fopen(), fgets(), fclose()) and looked at the source of the page I made.
Once I copied the contents of fileToRead.txt into a newly created fileToRead2.txt (simple copy-paste), it worked perfectly.
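The post does not say what the invisible character was. One plausible cause, offered here only as an assumption, is a file saved as UTF-16 with a byte order mark: read as UTF-8, it shows two odd bytes at the start and a NUL between the characters. Under that assumption, a sketch like the following (decodeText is a hypothetical helper) decodes the Buffer by inspecting the BOM before running the regex:

```javascript
// Hypothetical helper: decodes a Buffer as UTF-16LE when it starts with the
// FF FE byte order mark, otherwise as UTF-8 (dropping a UTF-8 BOM if present).
function decodeText(buffer) {
  if (buffer.length >= 2 && buffer[0] === 0xff && buffer[1] === 0xfe) {
    return buffer.slice(2).toString('utf16le');
  }
  const text = buffer.toString('utf8');
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Simulate a UTF-16LE file with a BOM instead of reading one from disk.
const fileBytes = Buffer.from('\ufeffSIR108811 5/10/2019 187,50', 'utf16le');
const regex = /^[A-Z]*(\d+)\s+(\d+\/\d+\/\d+)\s+(-?\d+,\d+)/gm;

console.log(fileBytes.toString('utf8').match(regex)); // null
console.log(decodeText(fileBytes).match(regex));
```

With a real file, the Buffer would come from fs.readFile() called without an encoding option.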

Negative Look-ahead in Regex doesn't seem to be working

I'm attempting to use Regex to extract a sub-domain from a url that follows a strict pattern. I want to only match urls with subdomains specified, so I'm using a negative look-ahead. This seems to work in many regex evaluators, but when I run in node, both strings get matched. Here's the code:
const defaultDomain = 'https://xyz.domain.com';
const scopedDomain = 'https://xyz.subdomain.domain.com';
const regex = /^https:\/\/xyz\.([^.]+(?!domain))\./
const matchPrefix1 = defaultDomain.match(regex);
const matchPrefix2 = scopedDomain.match(regex);
console.log(matchPrefix1);
console.log(matchPrefix2);
Expected: matchPrefix1 is null and matchPrefix2 results in a match where the first capture group is 'subdomain'
Actual: both matchPrefix1 and matchPrefix2 contain data, with the capture groups coming back as 'domain' and 'subdomain' respectively
Link to regexr (works!): https://regexr.com/42bfn
Link to repl (does not work): https://repl.it/#tomismore/SpiffyFrivolousLaws
What's going on here?
Regexr shows your code working because you didn't add the multiline flag. This causes the start of the second line to not match ^, so the whole second line is ignored. Add the multiline flag to see your regex not working.
I would use this regex:
^https:\/\/xyz\.(?!domain)([^.]+)\.
The change I made is to move the [^.]+ part to after checking (?!domain). Basically, you should check for (?!domain) immediately after matching xyz\..
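The difference can be checked directly in Node. In the sketch below, the two regexes are the ones from the question and from this answer; the helper name subdomainOf is made up for the example. In the original pattern, [^.]+ has already consumed "domain" by the time the lookahead runs, so the lookahead only sees ".com" and succeeds.

```javascript
const original = /^https:\/\/xyz\.([^.]+(?!domain))\./;
const fixed = /^https:\/\/xyz\.(?!domain)([^.]+)\./;

// Hypothetical helper: returns the first capture group, or null if no match.
function subdomainOf(url, regex) {
  const match = url.match(regex);
  return match ? match[1] : null;
}

const defaultDomain = 'https://xyz.domain.com';
const scopedDomain = 'https://xyz.subdomain.domain.com';

console.log(subdomainOf(defaultDomain, original)); // domain
console.log(subdomainOf(defaultDomain, fixed));    // null
console.log(subdomainOf(scopedDomain, fixed));     // subdomain
```

Note that (?!domain) also rejects any subdomain that merely starts with "domain"; if that matters, (?!domain\.) might be a tighter check.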

Remove filename.extension in url using nodejs

I'm a beginner in nodejs, so please excuse me if this question has already been answered. I tried multiple methods, but none of them worked for me.
I'm trying to remove the filename.extension from an http url.
For example:
http://somedomain.com/path1/path2/path3/myfile.txt
to
http://somedomain.com/path1/path2/path3/
The filename in the url is dynamic, so I cannot use "myfile.txt" explicitly in the code.
I'm not using any web frameworks, but I do have the http://stringjs.com library.
Using a Regular Expression:
'http://somedomain.com/hello1/hello2/hello3/myfile.txt'.replace(/\/\w+\.\w+$/, '');
The regular expression matches two runs of word characters separated by a . and preceded by a / (/myfile.txt in your case), which is then replaced by an empty string. This method works in node as well as in pure javascript.
Using node.js' path module:
let path = require('path');
let parsed = path.parse('http://somedomain.com/hello1/hello2/hello3/myfile.txt');
console.log(parsed.dir) // => http://somedomain.com/hello1/hello2/hello3
node.js has a built-in module for parsing paths. It is however not made for parsing URIs, but should work just fine in your case.
Splitting, Slicing and Joining:
let url = 'http://somedomain.com/hello1/hello2/hello3/myfile.txt';
url.split('/').slice(0, -1).join('/');
Split the url at every /, remove the last element from the resulting array (myfile.txt) and join them back together with / as the separator.
You can do it like this:
var path = 'http://somedomain.com/hello1/hello2/hello3/myfile.txt';
path = path.split('/');
path.splice(path.length-1,1);
path = path.join('/');
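Another option, not mentioned in the answers above, is to let Node's WHATWG URL class resolve "." against the address, which yields the containing directory with its trailing slash. This is a minimal sketch; the helper name stripFilename is an assumption for the example.

```javascript
const { URL } = require('url');

// Hypothetical helper: resolves "." relative to the URL, which drops the last
// path segment (and any query string) and keeps the trailing slash.
function stripFilename(urlString) {
  return new URL('.', urlString).href;
}

console.log(stripFilename('http://somedomain.com/path1/path2/path3/myfile.txt'));
// http://somedomain.com/path1/path2/path3/
```

Unlike the regex and split approaches, this leaves a URL that already ends in / unchanged.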

how to deal with special characters in nodejs fs readdir function

I'm reading a directory in nodejs using the fs.readdir() function. You feed it a string containing a path and it returns an array containing all the files inside that directory path in string format. It does not work for me with special characters (like ï).
(I came across this similar issue, however I am on OS X.)
First I created a new dir called encoding and created a file called maïs.md (with my editor Sublime Text).
fs.readdir('encoding', function(err, files) {
console.log(files); // [ 'maïs.md' ]
console.log(files[0]); // maïs.md
console.log(files[0] === 'maïs.md'); // false
console.log(files[0] == 'maïs.md'); // false
console.log(files[0].toString('utf8') === 'maïs.md'); // false
});
The above test works correctly for files without special characters. How can I compare this correctly?
Your character seems to be this one. You should try with
(1) console.log(files[0] == 'ma\u00EFs.md');
(2) console.log(files[0] == 'mai\u0308s.md');
If (1) works, it could mean that the file containing your code is not saved in UTF-8 format, so the node.js engine does not interpret the ï character in your code correctly.
If (2) works, it could mean that the file system gives the node engine the ï character in its decomposed Unicode form (i followed by the combining diaeresis ¨). cf #thejh answer
In case (2), use the unorm library available on npm to normalize the strings before comparing them (or the original UnicodeNormalizer).
https://apple.stackexchange.com/a/10484/23863 looks relevant – it's probably because there are different ways to express ï in utf8.
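In Node versions that support the built-in String.prototype.normalize() (ES2015), the comparison can be sketched without an extra library. The helper name sameFilename is hypothetical, and the two literals stand in for the composed and decomposed forms discussed above.

```javascript
// Hypothetical helper: compares two names after normalizing both to NFC.
function sameFilename(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

const composed = 'ma\u00EFs.md';    // ï as a single code point
const decomposed = 'mai\u0308s.md'; // i followed by a combining diaeresis

console.log(composed === decomposed);            // false
console.log(sameFilename(composed, decomposed)); // true
```

Normalizing both sides means the comparison does not depend on which form the file system or the editor happened to produce.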

Resources