How can I transform RegEx categories into plain RegEx? - node.js

This question is based on this question. During coding, I got some new things popping up and because the initial question is properly answered, I want to describe my issues in this question.
My goal is to have a RegEx which filters out everything, instead of some special requirements:
Alphanumeric allowed
non-Lating e.g. Chinese or Japanese allowed
.,-?!"'=$|<>[]{} allowed
Works with NodeJS 8.9.4
During implementation of the answer from the main question, I've found out, that this only works with newer Node versions (because of the supported ES version). Sadly, our project runs on 8.9.4 which can't be changed in any way. So upgrading is not an option.
I've started searching around and found this page: https://github.com/slevithan/xregexp/blob/master/tools/output/categories.js
With the help of another question, I've tried to build something together which matches my requirements. I came out with:
/[^\(?:[A-Za-z\xAA\xB5\xBA\xC0-\xD6\xD8-\xF6\xF8-\u02C1\u02C6-\u02D1\u02E0-\u02E4\u02EC\u02EE\u0370-\u0374\u0376\u0377\u037A-\u037D\u037F\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03F5\u03F7-\u0481\u048A-\u052F\u0531-\u0556\u0559\u0560-\u0588\u05D0-\u05EA\u05EF-\u05F2\u0620-\u064A\u066E\u066F\u0671-\u06D3\u06D5\u06E5\u06E6\u06EE\u06EF\u06FA-\u06FC\u06FF\u0710\u0712-\u072F\u074D-\u07A5\u07B1\u07CA-\u07EA\u07F4\u07F5\u07FA\u0800-\u0815\u081A\u0824\u0828\u0840-\u0858\u0860-\u086A\u0870-\u0887\u0889-\u088E\u08A0-\u08C9\u0904-\u0939\u093D\u0950\u0958-\u0961\u0971-\u0980\u0985-\u098C\u098F\u0990\u0993-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09BD\u09CE\u09DC\u09DD\u09DF-\u09E1\u09F0\u09F1\u09FC\u0A05-\u0A0A\u0A0F\u0A10\u0A13-\u0A28\u0A2A-\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39\u0A59-\u0A5C\u0A5E\u0A72-\u0A74\u0A85-\u0A8D\u0A8F-\u0A91\u0A93-\u0AA8\u0AAA-\u0AB0\u0AB2\u0AB3\u0AB5-\u0AB9\u0ABD\u0AD0\u0AE0\u0AE1\u0AF9\u0B05-\u0B0C\u0B0F\u0B10\u0B13-\u0B28\u0B2A-\u0B30\u0B32\u0B33\u0B35-\u0B39\u0B3D\u0B5C\u0B5D\u0B5F-\u0B61\u0B71\u0B83\u0B85-\u0B8A\u0B8E-\u0B90\u0B92-\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8-\u0BAA\u0BAE-\u0BB9\u0BD0\u0C05-\u0C0C\u0C0E-\u0C10\u0C12-\u0C28\u0C2A-\u0C39\u0C3D\u0C58-\u0C5A\u0C5D\u0C60\u0C61\u0C80\u0C85-\u0C8C\u0C8E-\u0C90\u0C92-\u0CA8\u0CAA-\u0CB3\u0CB5-\u0CB9\u0CBD\u0CDD\u0CDE\u0CE0\u0CE1\u0CF1\u0CF2\u0D04-\u0D0C\u0D0E-\u0D10\u0D12-\u0D3A\u0D3D\u0D4E\u0D54-\u0D56\u0D5F-\u0D61\u0D7A-\u0D7F\u0D85-\u0D96\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-\u0DC6\u0E01-\u0E30\u0E32\u0E33\u0E40-\u0E46\u0E81\u0E82\u0E84\u0E86-\u0E8A\u0E8C-\u0EA3\u0EA5\u0EA7-\u0EB0\u0EB2\u0EB3\u0EBD\u0EC0-\u0EC4\u0EC6\u0EDC-\u0EDF\u0F00\u0F40-\u0F47\u0F49-\u0F6C\u0F88-\u0F8C\u1000-\u102A\u103F\u1050-\u1055\u105A-\u105D\u1061\u1065\u1066\u106E-\u1070\u1075-\u1081\u108E\u10A0-\u10C5\u10C7\u10CD\u10D0-\u10FA\u10FC-\u1248\u124A-\u124D\u1250-\u1256\u1258\u125A-\u125D\u1260-\u1288\u128A-\u128D\u1290-\u12B0\u12B2-\u12B5\u12B8-\u12BE\u12C0\u12C2-\u12C5\u12C8-\u12D6\u12D8-\u1310\u1312-\u1315\u1318-\u135A\u1380-\u138F\u13A0-\u13F5\u13F8-\u13FD\u1401-\u166C\u166F-\u167F\u1681-\u169A\u16A0-\u16EA\u16F1-\u16F8\u1700-\u1711\u171F-\u1731\u1740-\u1751\u1760-\u176C\u176E-\u1770\u1780-\u17B3\u17D7\u17DC\u1820-\u1878\u1880-\u1884\u1887-\u18A8\u18AA\u18B0-\u18F5\u1900-\u191E\u1950-\u196D\u1970-\u1974\u1980-\u19AB\u19B0-\u19C9\u1A00-\u1A16\u1A20-\u1A54\u1AA7\u1B05-\u1B33\u1B45-\u1B4C\u1B83-\u1BA0\u1BAE\u1BAF\u1BBA-\u1BE5\u1C00-\u1C23\u1C4D-\u1C4F\u1C5A-\u1C7D\u1C80-\u1C88]+/g
My current example string is:
Test=😕查看
°°^ Marting 10202029 Offline!"§$%&/()!"§$%&/()After this we want to keep the allowed special chars: .,-?!"'=$|<>[]{}
Somehow, the answer from the first question works better as the parsed categories from me. So there must be something wrong, which I'm unable to find.
At the end, I want to put everything inside a var.replace() command, to replace everything bad with a single whitespace.
For testing, I'm using: https://regexr.com/

You can either use regexpu to transpile the regex into an ES6-compliant regex, or you may go to the Unicode Utilities: UnicodeSet page and get the code point ranges manually.
In your case, paste [^\p{L}\p{N}] into the Input field, check Abbreviate and Escape, then click Show Set. Add the .,?!"'=$|<>[\]{}- at the end of the character class. Then, double the backslashes (also, escape the ' or ", your string literal delimiter char, I escaped ' below) and put inside the pattern_from_uu variable definition in this JavaScript code and then, all you need to define the regex is const reg = new RegExp(pattern, "gu") or const reg = new RegExp(pattern, "u"):
const pattern_from_uu = '[^0-9A-Za-z\\u00AA\\u00B2\\u00B3\\u00B5\\u00B9\\u00BA\\u00BC-\\u00BE\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02C1\\u02C6-\\u02D1\\u02E0-\\u02E4\\u02EC\\u02EE\\u0370-\\u0374\\u0376\\u0377\\u037A-\\u037D\\u037F\\u0386\\u0388-\\u038A\\u038C\\u038E-\\u03A1\\u03A3-\\u03F5\\u03F7-\\u0481\\u048A-\\u052F\\u0531-\\u0556\\u0559\\u0560-\\u0588\\u05D0-\\u05EA\\u05EF-\\u05F2\\u0620-\\u064A\\u0660-\\u0669\\u066E\\u066F\\u0671-\\u06D3\\u06D5\\u06E5\\u06E6\\u06EE-\\u06FC\\u06FF\\u0710\\u0712-\\u072F\\u074D-\\u07A5\\u07B1\\u07C0-\\u07EA\\u07F4\\u07F5\\u07FA\\u0800-\\u0815\\u081A\\u0824\\u0828\\u0840-\\u0858\\u0860-\\u086A\\u0870-\\u0887\\u0889-\\u088E\\u08A0-\\u08C9\\u0904-\\u0939\\u093D\\u0950\\u0958-\\u0961\\u0966-\\u096F\\u0971-\\u0980\\u0985-\\u098C\\u098F\\u0990\\u0993-\\u09A8\\u09AA-\\u09B0\\u09B2\\u09B6-\\u09B9\\u09BD\\u09CE\\u09DC\\u09DD\\u09DF-\\u09E1\\u09E6-\\u09F1\\u09F4-\\u09F9\\u09FC\\u0A05-\\u0A0A\\u0A0F\\u0A10\\u0A13-\\u0A28\\u0A2A-\\u0A30\\u0A32\\u0A33\\u0A35\\u0A36\\u0A38\\u0A39\\u0A59-\\u0A5C\\u0A5E\\u0A66-\\u0A6F\\u0A72-\\u0A74\\u0A85-\\u0A8D\\u0A8F-\\u0A91\\u0A93-\\u0AA8\\u0AAA-\\u0AB0\\u0AB2\\u0AB3\\u0AB5-\\u0AB9\\u0ABD\\u0AD0\\u0AE0\\u0AE1\\u0AE6-\\u0AEF\\u0AF9\\u0B05-\\u0B0C\\u0B0F\\u0B10\\u0B13-\\u0B28\\u0B2A-\\u0B30\\u0B32\\u0B33\\u0B35-\\u0B39\\u0B3D\\u0B5C\\u0B5D\\u0B5F-\\u0B61\\u0B66-\\u0B6F\\u0B71-\\u0B77\\u0B83\\u0B85-\\u0B8A\\u0B8E-\\u0B90\\u0B92-\\u0B95\\u0B99\\u0B9A\\u0B9C\\u0B9E\\u0B9F\\u0BA3\\u0BA4\\u0BA8-\\u0BAA\\u0BAE-\\u0BB9\\u0BD0\\u0BE6-\\u0BF2\\u0C05-\\u0C0C\\u0C0E-\\u0C10\\u0C12-\\u0C28\\u0C2A-\\u0C39\\u0C3D\\u0C58-\\u0C5A\\u0C5D\\u0C60\\u0C61\\u0C66-\\u0C6F\\u0C78-\\u0C7E\\u0C80\\u0C85-\\u0C8C\\u0C8E-\\u0C90\\u0C92-\\u0CA8\\u0CAA-\\u0CB3\\u0CB5-\\u0CB9\\u0CBD\\u0CDD\\u0CDE\\u0CE0\\u0CE1\\u0CE6-\\u0CEF\\u0CF1\\u0CF2\\u0D04-\\u0D0C\\u0D0E-\\u0D10\\u0D12-\\u0D3A\\u0D3D\\u0D4E\\u0D54-\\u0D56\\u0D58-\\u0D61\\u0D66-\\u0D78\\u0D7A-\\u0D7F\\u0D85-\\u0D96\\u0D9A-\\u0DB1\\u0DB3-\\u0DBB\\u0DBD\\u0DC0-\\u0DC6\\u0DE6-\\u0DEF\\u0E01-\\u0E30\\u0E32\\u0E33\\u0E40-\\u0E46\\u0E50-\\u0E59\\u0E81\\u0E82\\u0E84\\u0E86-\\u0E8A\\u0E8C-\\u0EA3\\u0EA5\\u0EA7-\\u0EB0\\u0EB2\\u0EB3\\u0EBD\\u0EC0-\\u0EC4\\u0EC6\\u0ED0-\\u0ED9\\u0EDC-\\u0EDF\\u0F00\\u0F20-\\u0F33\\u0F40-\\u0F47\\u0F49-\\u0F6C\\u0F88-\\u0F8C\\u1000-\\u102A\\u103F-\\u1049\\u1050-\\u1055\\u105A-\\u105D\\u1061\\u1065\\u1066\\u106E-\\u1070\\u1075-\\u1081\\u108E\\u1090-\\u1099\\u10A0-\\u10C5\\u10C7\\u10CD\\u10D0-\\u10FA\\u10FC-\\u1248\\u124A-\\u124D\\u1250-\\u1256\\u1258\\u125A-\\u125D\\u1260-\\u1288\\u128A-\\u128D\\u1290-\\u12B0\\u12B2-\\u12B5\\u12B8-\\u12BE\\u12C0\\u12C2-\\u12C5\\u12C8-\\u12D6\\u12D8-\\u1310\\u1312-\\u1315\\u1318-\\u135A\\u1369-\\u137C\\u1380-\\u138F\\u13A0-\\u13F5\\u13F8-\\u13FD\\u1401-\\u166C\\u166F-\\u167F\\u1681-\\u169A\\u16A0-\\u16EA\\u16EE-\\u16F8\\u1700-\\u1711\\u171F-\\u1731\\u1740-\\u1751\\u1760-\\u176C\\u176E-\\u1770\\u1780-\\u17B3\\u17D7\\u17DC\\u17E0-\\u17E9\\u17F0-\\u17F9\\u1810-\\u1819\\u1820-\\u1878\\u1880-\\u1884\\u1887-\\u18A8\\u18AA\\u18B0-\\u18F5\\u1900-\\u191E\\u1946-\\u196D\\u1970-\\u1974\\u1980-\\u19AB\\u19B0-\\u19C9\\u19D0-\\u19DA\\u1A00-\\u1A16\\u1A20-\\u1A54\\u1A80-\\u1A89\\u1A90-\\u1A99\\u1AA7\\u1B05-\\u1B33\\u1B45-\\u1B4C\\u1B50-\\u1B59\\u1B83-\\u1BA0\\u1BAE-\\u1BE5\\u1C00-\\u1C23\\u1C40-\\u1C49\\u1C4D-\\u1C7D\\u1C80-\\u1C88\\u1C90-\\u1CBA\\u1CBD-\\u1CBF\\u1CE9-\\u1CEC\\u1CEE-\\u1CF3\\u1CF5\\u1CF6\\u1CFA\\u1D00-\\u1DBF\\u1E00-\\u1F15\\u1F18-\\u1F1D\\u1F20-\\u1F45\\u1F48-\\u1F4D\\u1F50-\\u1F57\\u1F59\\u1F5B\\u1F5D\\u1F5F-\\u1F7D\\u1F80-\\u1FB4\\u1FB6-\\u1FBC\\u1FBE\\u1FC2-\\u1FC4\\u1FC6-\\u1FCC\\u1FD0-\\u1FD3\\u1FD6-\\u1FDB\\u1FE0-\\u1FEC\\u1FF2-\\u1FF4\\u1FF6-\\u1FFC\\u2070\\u2071\\u2074-\\u2079\\u207F-\\u2089\\u2090-\\u209C\\u2102\\u2107\\u210A-\\u2113\\u2115\\u2119-\\u211D\\u2124\\u2126\\u2128\\u212A-\\u212D\\u212F-\\u2139\\u213C-\\u213F\\u2145-\\u2149\\u214E\\u2150-\\u2189\\u2460-\\u249B\\u24EA-\\u24FF\\u2776-\\u2793\\u2C00-\\u2CE4\\u2CEB-\\u2CEE\\u2CF2\\u2CF3\\u2CFD\\u2D00-\\u2D25\\u2D27\\u2D2D\\u2D30-\\u2D67\\u2D6F\\u2D80-\\u2D96\\u2DA0-\\u2DA6\\u2DA8-\\u2DAE\\u2DB0-\\u2DB6\\u2DB8-\\u2DBE\\u2DC0-\\u2DC6\\u2DC8-\\u2DCE\\u2DD0-\\u2DD6\\u2DD8-\\u2DDE\\u2E2F\\u3005-\\u3007\\u3021-\\u3029\\u3031-\\u3035\\u3038-\\u303C\\u3041-\\u3096\\u309D-\\u309F\\u30A1-\\u30FA\\u30FC-\\u30FF\\u3105-\\u312F\\u3131-\\u318E\\u3192-\\u3195\\u31A0-\\u31BF\\u31F0-\\u31FF\\u3220-\\u3229\\u3248-\\u324F\\u3251-\\u325F\\u3280-\\u3289\\u32B1-\\u32BF\\u3400-\\u4DBF\\u4E00-\\uA48C\\uA4D0-\\uA4FD\\uA500-\\uA60C\\uA610-\\uA62B\\uA640-\\uA66E\\uA67F-\\uA69D\\uA6A0-\\uA6EF\\uA717-\\uA71F\\uA722-\\uA788\\uA78B-\\uA7CA\\uA7D0\\uA7D1\\uA7D3\\uA7D5-\\uA7D9\\uA7F2-\\uA801\\uA803-\\uA805\\uA807-\\uA80A\\uA80C-\\uA822\\uA830-\\uA835\\uA840-\\uA873\\uA882-\\uA8B3\\uA8D0-\\uA8D9\\uA8F2-\\uA8F7\\uA8FB\\uA8FD\\uA8FE\\uA900-\\uA925\\uA930-\\uA946\\uA960-\\uA97C\\uA984-\\uA9B2\\uA9CF-\\uA9D9\\uA9E0-\\uA9E4\\uA9E6-\\uA9FE\\uAA00-\\uAA28\\uAA40-\\uAA42\\uAA44-\\uAA4B\\uAA50-\\uAA59\\uAA60-\\uAA76\\uAA7A\\uAA7E-\\uAAAF\\uAAB1\\uAAB5\\uAAB6\\uAAB9-\\uAABD\\uAAC0\\uAAC2\\uAADB-\\uAADD\\uAAE0-\\uAAEA\\uAAF2-\\uAAF4\\uAB01-\\uAB06\\uAB09-\\uAB0E\\uAB11-\\uAB16\\uAB20-\\uAB26\\uAB28-\\uAB2E\\uAB30-\\uAB5A\\uAB5C-\\uAB69\\uAB70-\\uABE2\\uABF0-\\uABF9\\uAC00-\\uD7A3\\uD7B0-\\uD7C6\\uD7CB-\\uD7FB\\uF900-\\uFA6D\\uFA70-\\uFAD9\\uFB00-\\uFB06\\uFB13-\\uFB17\\uFB1D\\uFB1F-\\uFB28\\uFB2A-\\uFB36\\uFB38-\\uFB3C\\uFB3E\\uFB40\\uFB41\\uFB43\\uFB44\\uFB46-\\uFBB1\\uFBD3-\\uFD3D\\uFD50-\\uFD8F\\uFD92-\\uFDC7\\uFDF0-\\uFDFB\\uFE70-\\uFE74\\uFE76-\\uFEFC\\uFF10-\\uFF19\\uFF21-\\uFF3A\\uFF41-\\uFF5A\\uFF66-\\uFFBE\\uFFC2-\\uFFC7\\uFFCA-\\uFFCF\\uFFD2-\\uFFD7\\uFFDA-\\uFFDC\\U00010000-\\U0001000B\\U0001000D-\\U00010026\\U00010028-\\U0001003A\\U0001003C\\U0001003D\\U0001003F-\\U0001004D\\U00010050-\\U0001005D\\U00010080-\\U000100FA\\U00010107-\\U00010133\\U00010140-\\U00010178\\U0001018A\\U0001018B\\U00010280-\\U0001029C\\U000102A0-\\U000102D0\\U000102E1-\\U000102FB\\U00010300-\\U00010323\\U0001032D-\\U0001034A\\U00010350-\\U00010375\\U00010380-\\U0001039D\\U000103A0-\\U000103C3\\U000103C8-\\U000103CF\\U000103D1-\\U000103D5\\U00010400-\\U0001049D\\U000104A0-\\U000104A9\\U000104B0-\\U000104D3\\U000104D8-\\U000104FB\\U00010500-\\U00010527\\U00010530-\\U00010563\\U00010570-\\U0001057A\\U0001057C-\\U0001058A\\U0001058C-\\U00010592\\U00010594\\U00010595\\U00010597-\\U000105A1\\U000105A3-\\U000105B1\\U000105B3-\\U000105B9\\U000105BB\\U000105BC\\U00010600-\\U00010736\\U00010740-\\U00010755\\U00010760-\\U00010767\\U00010780-\\U00010785\\U00010787-\\U000107B0\\U000107B2-\\U000107BA\\U00010800-\\U00010805\\U00010808\\U0001080A-\\U00010835\\U00010837\\U00010838\\U0001083C\\U0001083F-\\U00010855\\U00010858-\\U00010876\\U00010879-\\U0001089E\\U000108A7-\\U000108AF\\U000108E0-\\U000108F2\\U000108F4\\U000108F5\\U000108FB-\\U0001091B\\U00010920-\\U00010939\\U00010980-\\U000109B7\\U000109BC-\\U000109CF\\U000109D2-\\U00010A00\\U00010A10-\\U00010A13\\U00010A15-\\U00010A17\\U00010A19-\\U00010A35\\U00010A40-\\U00010A48\\U00010A60-\\U00010A7E\\U00010A80-\\U00010A9F\\U00010AC0-\\U00010AC7\\U00010AC9-\\U00010AE4\\U00010AEB-\\U00010AEF\\U00010B00-\\U00010B35\\U00010B40-\\U00010B55\\U00010B58-\\U00010B72\\U00010B78-\\U00010B91\\U00010BA9-\\U00010BAF\\U00010C00-\\U00010C48\\U00010C80-\\U00010CB2\\U00010CC0-\\U00010CF2\\U00010CFA-\\U00010D23\\U00010D30-\\U00010D39\\U00010E60-\\U00010E7E\\U00010E80-\\U00010EA9\\U00010EB0\\U00010EB1\\U00010F00-\\U00010F27\\U00010F30-\\U00010F45\\U00010F51-\\U00010F54\\U00010F70-\\U00010F81\\U00010FB0-\\U00010FCB\\U00010FE0-\\U00010FF6\\U00011003-\\U00011037\\U00011052-\\U0001106F\\U00011071\\U00011072\\U00011075\\U00011083-\\U000110AF\\U000110D0-\\U000110E8\\U000110F0-\\U000110F9\\U00011103-\\U00011126\\U00011136-\\U0001113F\\U00011144\\U00011147\\U00011150-\\U00011172\\U00011176\\U00011183-\\U000111B2\\U000111C1-\\U000111C4\\U000111D0-\\U000111DA\\U000111DC\\U000111E1-\\U000111F4\\U00011200-\\U00011211\\U00011213-\\U0001122B\\U00011280-\\U00011286\\U00011288\\U0001128A-\\U0001128D\\U0001128F-\\U0001129D\\U0001129F-\\U000112A8\\U000112B0-\\U000112DE\\U000112F0-\\U000112F9\\U00011305-\\U0001130C\\U0001130F\\U00011310\\U00011313-\\U00011328\\U0001132A-\\U00011330\\U00011332\\U00011333\\U00011335-\\U00011339\\U0001133D\\U00011350\\U0001135D-\\U00011361\\U00011400-\\U00011434\\U00011447-\\U0001144A\\U00011450-\\U00011459\\U0001145F-\\U00011461\\U00011480-\\U000114AF\\U000114C4\\U000114C5\\U000114C7\\U000114D0-\\U000114D9\\U00011580-\\U000115AE\\U000115D8-\\U000115DB\\U00011600-\\U0001162F\\U00011644\\U00011650-\\U00011659\\U00011680-\\U000116AA\\U000116B8\\U000116C0-\\U000116C9\\U00011700-\\U0001171A\\U00011730-\\U0001173B\\U00011740-\\U00011746\\U00011800-\\U0001182B\\U000118A0-\\U000118F2\\U000118FF-\\U00011906\\U00011909\\U0001190C-\\U00011913\\U00011915\\U00011916\\U00011918-\\U0001192F\\U0001193F\\U00011941\\U00011950-\\U00011959\\U000119A0-\\U000119A7\\U000119AA-\\U000119D0\\U000119E1\\U000119E3\\U00011A00\\U00011A0B-\\U00011A32\\U00011A3A\\U00011A50\\U00011A5C-\\U00011A89\\U00011A9D\\U00011AB0-\\U00011AF8\\U00011C00-\\U00011C08\\U00011C0A-\\U00011C2E\\U00011C40\\U00011C50-\\U00011C6C\\U00011C72-\\U00011C8F\\U00011D00-\\U00011D06\\U00011D08\\U00011D09\\U00011D0B-\\U00011D30\\U00011D46\\U00011D50-\\U00011D59\\U00011D60-\\U00011D65\\U00011D67\\U00011D68\\U00011D6A-\\U00011D89\\U00011D98\\U00011DA0-\\U00011DA9\\U00011EE0-\\U00011EF2\\U00011FB0\\U00011FC0-\\U00011FD4\\U00012000-\\U00012399\\U00012400-\\U0001246E\\U00012480-\\U00012543\\U00012F90-\\U00012FF0\\U00013000-\\U0001342E\\U00014400-\\U00014646\\U00016800-\\U00016A38\\U00016A40-\\U00016A5E\\U00016A60-\\U00016A69\\U00016A70-\\U00016ABE\\U00016AC0-\\U00016AC9\\U00016AD0-\\U00016AED\\U00016B00-\\U00016B2F\\U00016B40-\\U00016B43\\U00016B50-\\U00016B59\\U00016B5B-\\U00016B61\\U00016B63-\\U00016B77\\U00016B7D-\\U00016B8F\\U00016E40-\\U00016E96\\U00016F00-\\U00016F4A\\U00016F50\\U00016F93-\\U00016F9F\\U00016FE0\\U00016FE1\\U00016FE3\\U00017000-\\U000187F7\\U00018800-\\U00018CD5\\U00018D00-\\U00018D08\\U0001AFF0-\\U0001AFF3\\U0001AFF5-\\U0001AFFB\\U0001AFFD\\U0001AFFE\\U0001B000-\\U0001B122\\U0001B150-\\U0001B152\\U0001B164-\\U0001B167\\U0001B170-\\U0001B2FB\\U0001BC00-\\U0001BC6A\\U0001BC70-\\U0001BC7C\\U0001BC80-\\U0001BC88\\U0001BC90-\\U0001BC99\\U0001D2E0-\\U0001D2F3\\U0001D360-\\U0001D378\\U0001D400-\\U0001D454\\U0001D456-\\U0001D49C\\U0001D49E\\U0001D49F\\U0001D4A2\\U0001D4A5\\U0001D4A6\\U0001D4A9-\\U0001D4AC\\U0001D4AE-\\U0001D4B9\\U0001D4BB\\U0001D4BD-\\U0001D4C3\\U0001D4C5-\\U0001D505\\U0001D507-\\U0001D50A\\U0001D50D-\\U0001D514\\U0001D516-\\U0001D51C\\U0001D51E-\\U0001D539\\U0001D53B-\\U0001D53E\\U0001D540-\\U0001D544\\U0001D546\\U0001D54A-\\U0001D550\\U0001D552-\\U0001D6A5\\U0001D6A8-\\U0001D6C0\\U0001D6C2-\\U0001D6DA\\U0001D6DC-\\U0001D6FA\\U0001D6FC-\\U0001D714\\U0001D716-\\U0001D734\\U0001D736-\\U0001D74E\\U0001D750-\\U0001D76E\\U0001D770-\\U0001D788\\U0001D78A-\\U0001D7A8\\U0001D7AA-\\U0001D7C2\\U0001D7C4-\\U0001D7CB\\U0001D7CE-\\U0001D7FF\\U0001DF00-\\U0001DF1E\\U0001E100-\\U0001E12C\\U0001E137-\\U0001E13D\\U0001E140-\\U0001E149\\U0001E14E\\U0001E290-\\U0001E2AD\\U0001E2C0-\\U0001E2EB\\U0001E2F0-\\U0001E2F9\\U0001E7E0-\\U0001E7E6\\U0001E7E8-\\U0001E7EB\\U0001E7ED\\U0001E7EE\\U0001E7F0-\\U0001E7FE\\U0001E800-\\U0001E8C4\\U0001E8C7-\\U0001E8CF\\U0001E900-\\U0001E943\\U0001E94B\\U0001E950-\\U0001E959\\U0001EC71-\\U0001ECAB\\U0001ECAD-\\U0001ECAF\\U0001ECB1-\\U0001ECB4\\U0001ED01-\\U0001ED2D\\U0001ED2F-\\U0001ED3D\\U0001EE00-\\U0001EE03\\U0001EE05-\\U0001EE1F\\U0001EE21\\U0001EE22\\U0001EE24\\U0001EE27\\U0001EE29-\\U0001EE32\\U0001EE34-\\U0001EE37\\U0001EE39\\U0001EE3B\\U0001EE42\\U0001EE47\\U0001EE49\\U0001EE4B\\U0001EE4D-\\U0001EE4F\\U0001EE51\\U0001EE52\\U0001EE54\\U0001EE57\\U0001EE59\\U0001EE5B\\U0001EE5D\\U0001EE5F\\U0001EE61\\U0001EE62\\U0001EE64\\U0001EE67-\\U0001EE6A\\U0001EE6C-\\U0001EE72\\U0001EE74-\\U0001EE77\\U0001EE79-\\U0001EE7C\\U0001EE7E\\U0001EE80-\\U0001EE89\\U0001EE8B-\\U0001EE9B\\U0001EEA1-\\U0001EEA3\\U0001EEA5-\\U0001EEA9\\U0001EEAB-\\U0001EEBB\\U0001F100-\\U0001F10C\\U0001FBF0-\\U0001FBF9\\U00020000-\\U0002A6DF\\U0002A700-\\U0002B738\\U0002B740-\\U0002B81D\\U0002B820-\\U0002CEA1\\U0002CEB0-\\U0002EBE0\\U0002F800-\\U0002FA1D\\U00030000-\\U0003134A.,?!"\'=$|<>[\\]{}-]';
let pattern = pattern_from_uu.replace(/\\U000([a-f\d]+)/gi, "\\u{$1}");
console.log("Your regex is:\n/" + pattern + "/gu");
const texts = ["Test=😕查看","°°^ Marting 10202029 Offline!\"§$%&/()!\"§$%&/()After this we want to keep the allowed special chars: .,-?!\"'=$|<>[]{}"];
const reg = new RegExp(pattern, "gu")
for (const text of texts) {
console.log(text.replace(reg, ""));
}
See the generated regex demo.

Related

How can I remove all characters inside angle brackets python?

How can I remove all characters inside angle brackets including the brackets in a string? How can I also remove all the text between ("\r\n") and ("."+"any 3 characters") Is this possible? I am currently using the solution by #xkcdjerry
e.g
body = """Dear Students roads etc. you place a tree take a snapshot, then when you place a\r\nbuilding, take a snapshot. Place at least 5-6 objects and then have 5-6\r\nsnapshots. Please keep these snapshots with you as everyone will be asked\r\nto share them during the class.\r\n\r\nI am attaching one PowerPoint containing instructions and one video of\r\nexplanation for your reference.\r\n\r\nKind regards,\r\nTeacher Name\r\n zoom_0.mp4\r\n<https://drive.google.com/file/d/1UX-klOfVhbefvbhZvIWijaBdQuLgh_-Uru4_1QTkth/view?usp=drive_web>"""
d = re.compile("\r\n.+?\\....")
body = d.sub('', body)
a = re.compile("<.*?>")
body = a.sub('', body)
print(body)```
For some reason the output is fine except that it has:
```gle.com/file/d/1UX-klOfVhbefvbhZvIWijaBdQuLgh_-Uru4_1QTkth/view?usp=drive_web>
randomly attached to the end How can I fix it.
Answer
Your problem can be solved by a regex:
Put this into the shell:
import re
a=re.compile("<.*?>")
a.sub('',"Keep this part of the string< Remove this part>Keep This part as well")
Output:
'Keep this part of the stringKeep This part as well'
Second question:
import re
re.compile("\r\n.*?\\..{3}")
a.sub('',"Hello\r\nFilename.png")
Output:
'Hello'
Breakdown
Regex is a robust way of finding, replacing, and mutating small strings inside bigger ones, for further reading,consult https://docs.python.org/3/library/re.html. Meanwhile, here are the breakdowns of the regex information used in this answer:
. means any char.
*? means as many of the before as needed but as little as possible(non-greedy match)
So .*? means any number of characters but as little as possible.
Note: The reason there is a \\. in the second regex is that a . in the match needs to be escaped by a \, which in its turn needs to be escaped as \\
The methods:
re.compile(patten:str) compiles a regex for farther use.
regex.sub(repl:str,string:str) replaces every match of regex in string with repl.
Hope it helps.

Python3 strip() get unexpect result

It's a weird problem
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120"
And two strings below:
s1="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\[Content_Types].xml"
s2="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\_rels\.rels"
When I use the command below:
s1.strip(to_be_stripped)
s2.strip(to_be_stripped)
I get these outputs:
'[Content_Types].x'
'_rels\\.'
If I use lstrip(), they will be:
'[Content_Types].xml'
'_rels\\.rels'
Which is the right outputs.
However, if we replace all Project Known with zeus_pipeline:
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120"
And:
s2="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120\\_rels\.rels"
s2.lstrip(to_be_stripped)will be '.rels'
If I use / instead of \\, nothing goes wrong. I am wondering why this problem happens.
strip isn't meant to remove full strings exactly. Rather, you give it a string, and every character in that string is removed from the start and of the string to be stripped.
In your case, the variable to_be_stripped contains the characters m and l, so those are stripped from the end of s1. However, it doesn't contain the character x, so the stripping stops there and no characters beyond that are removed.
Check out this question. The accepted answer is probably more extensive than you need - I like another user's suggestion of using replace instead of strip. This would look like:
s1.replace(to_be_stripped, "")

Regex .match() not finding matches even though it should

I'm trying to scan a .txt file in a node.js script, and scan its contents for certain pieces of data. The lines I'm interested in getting look mostly like this:
DIBH91643 5/10/2019 108,75
SIR108811 5/10/2019 187,50
SIR108845 5/10/2019 63,75
So I've been trying to match them with a regex without succes. Using a regex testing site, I've even confirmed the fact that it should find the matches I'm looking for, but it always returns null when I call data.match(regex). I'm probably missing something basic here, but I can't figure it out for the life of me. This is the code I'm using (in its entirety, since there isn't much):
var fs = require('fs');
let regex = /\w*?(\d+)\s+(\d+\/\d+\/\d+)\s+(\-{0,1}\d+\,\d+)/g;
let ihateregex = /91/g;
fs.readFile('pathToFile/fileToRead.txt',{encoding: 'utf-8'}, (err, data) => {
var result = data.match(regex);
console.log(result);
});
As shown, even an attempt with a simple pattern that is definitely inside the file still returns null. I have looked into other answers here for similar problems, and they all point to deleting bytes from the beginning of the file. I have used vim -b to delete the first 2 bytes - which did look out of place and furthermore printing the entire data with console.log() did actually show 2 weird characters in the beginning of the file, but I get the exact same error.
I can't figure out what I'm missing here.
Try the following regex:
/^[A-Z]*(\d+)\s+(\d+\/\d+\/\d+)\s+(-?\d+,\d+)/gm
Improvements compared to your regex:
^ - start from the start of line,
[A-Z]* instead of \w*? - note that \w matches also digits,
removed / in front of - and ,,
? instead of {0,1},
added m option (I assume that you want to process all rows, not the first only).
To process the matches I used the following code, using rextester.com, so
instead of e.g. console.log(...) it contains print(...):
let data = 'DIBH91643 5/10/2019 108,75\nSIR108811 5/10/2019 187,50\nSIR108845 5/10/2019 63,75'
print("Data: ")
print(data)
let re = /^[A-Z]*(\d+)\s+(\d+\/\d+\/\d+)\s+(-?\d+,\d+)/gm
print("Result: ")
while ((matches = re.exec(data)) != null) {
print(matches[1], '_', matches[2], '_', matches[3])
}
For a working example see https://rextester.com/PZU21213
So I've finally figured out what went wrong and I feel extremely stupid for taking so long to figure it out. One thing I've failed to mention even though I should have is that the file I'm reading is one created by an OCR program. An OCR program which, apparently, added an invisible char between each character in the text file, that I only saw when I switched to php (fopen(), fgets(), fclose()) and looked at the source of the page I made.
Once I copied the contents of fileToRead.txt into a newly created fileToRead2.txt (simple copy-paste), it worked perfectly.

Change Image source in Markdown text using Node JS

I have some markdown text containing image references but those are relative paths & I need to modify those to absolute paths using NodeJS script. Is there any way to achieve the same in simple way ?
Example:
Source
![](myImage.png?raw=true)
Result
![](www.example.com/myImage.png?raw=true)
I have multiple images in the markdwon content that I need to modify.
Check this working example: https://repl.it/repls/BlackJuicyCommands
First of all you need the read the file with fs.readFile() and parse the Buffer to a string.
Now you got access to the text of the file, you need the replace every image path with a new image path. One way to capture it is with a regular expression. You could for example look for ](ANY_IMAGE_PATH.ANY_IMAGE_EXTENSION. We're only interested in replacing the ANY_IMAGE_PATH part.
An example regex could be (this could be improved!) (see it live in action here: https://regex101.com/r/xRioBq/1):
const regex = /\]\((.+)(?=(\.(svg|gif|png|jpe?g)))/g
This will look for a literal ] followed by a literal ( followed by any sequence (that's the .+) until it finds a literal . followed by either svg, gif,png, or jpeg / jpg. the g after the regex is necessary to match all occurences, not only the first.
Javascript/node (< version 9) does not support lookbehind. I've used a regex that includes the ]( before the image path and then will filter it out later (or you could use a more complex regex).
The (.+) captures the image path as a group. Then you can use the replace function of the string. The replace function accepts the regex as first argument. The second argument can be a function which receives as first argument the full match (in this case ](image_path_without_extension and as the following arguments any captured group.
Change the url with the node modules path or url and return the captured group (underneath I've called that argument imagePath). Because we're also replacing the ](, that should be included in the return value.
const url = require('url');
const replacedText = data.toString().replace(regex, (fullResult, imagePath) => {
const newImagePath = url.resolve('http://www.example.org', imagePath)
return `](${newImagePath}`;
})
See the repl example to see it live in action. Note: in the repl example it is written to a different file. Use the same file name if you want to overwrite.
I ran into the same issue yesterday, and came up with this solution.
const markdownReplaced = markdown.replace(
/(?<=\]\()(.+)(?=(\)))/g,
(url) => `www.example.com/${url}`,
);
The regex finds anything starts with ]( and ends with ), without including the capturing group, which is the original URL itself.

Coldfusion not converting accented text or MS Word chars

Running Coldfusion 8, I am trying to clean text input before saving to a database that will take things like the MS equivalent of ' " - and accented letters, and converting them.
I have tried replace, REReplace, and various UDFs found on the internet. None seem to work. In fact, I tried this:
<cfscript>
function cleanString(string) {
var newString = string;
newString = replace("'", "'", ALL);
return newString;
}
</cfscript>
The single quote to be replaced above is a MS Word style single quote. Coldfusion threw an error, the error scope said invalid syntax and the single quote in the error scope was a square. If I change it to the chr() form, and replace with ', I get a blank. If I do chr() to the entity, I get a blank.
I am more than certain I have jumped this hurdle before, and not sure why nothing is working now. Is there a new setting in CF8 vs CF7 regarding character encoding that I am missing?
There is a great script for demoronizing (yes, that's a technical term) text copied from MS word and the like. It can be found at CFLib:
http://cflib.org/index.cfm?event=page.udfbyid&udfid=725
I've used it several times, and been happy with it out-of-the-box (though I have added some additions for specific applications).

Resources