Non capturing branch reset regex in NodeJS - node.js

https://regex101.com/r/UXnhTy/1
var date = /(?|(Sat)ur|(Sun))day/;
console.log(date.exec("Sunday"));
This fails with:
SyntaxError: Invalid regular expression: /(?|(Sat)ur|(Sun))day/: Invalid group
Is there a version of NodeJS that supports this? Or some library out there that supports it?
I tested this with Node.js v8.12.0.

Not really. An advanced alternative regex library for JavaScript is XRegExp, but it doesn't have the feature you're after - not even as an addon.
A simpler regex feature that is supported by XRegExp is named capture groups, so you can write:
var days = XRegExp('(?:(?<d>Sat)ur|(?<d>Sun))day', 'gi');
You can't use numbers as group names, but named groups should fit your needs: they allow backreferences (using \k<d>), replacement (${d}), capturing (match.d), and all the features of a regular numbered group.
Named capture groups are supported natively in ES2018; see ES2018 Regular Expression Updates.
According to node.green, named capture groups are supported by Node.js ≥10.3.0, or by ≥8.6.0 with the --harmony flag.
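As a rough illustration of the native route, the branch-reset pattern from the question can be approximated by giving each alternative its own named group and taking whichever one participated in the match. This is only a sketch: the group names sat and sun and the helper dayPrefix are made up for the example, and it assumes a Node.js version with native named capture groups (see the version note above).

```javascript
// Approximates (?|(Sat)ur|(Sun))day without branch reset:
// each alternative has its own (hypothetical) group name.
const dayPattern = /(?:(?<sat>Sat)ur|(?<sun>Sun))day/;

// Hypothetical helper: returns the captured prefix, or null when nothing matches.
function dayPrefix(text) {
  const match = dayPattern.exec(text);
  if (match === null) {
    return null;
  }
  // Only one alternative can match, so at most one group is defined.
  return match.groups.sat || match.groups.sun;
}

console.log(dayPrefix("Sunday"));   // Sun
console.log(dayPrefix("Saturday")); // Sat
console.log(dayPrefix("Monday"));   // null
```

The same idea works with plain numbered groups (match[1] || match[2]) on older Node.js versions; the named form just makes the coalescing step easier to read.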

Related

Regex deprecation warning confusion [duplicate]

The following fragment of code comes from my GitHub repository found here.
It opens a binary file, and extracts the text within <header> tags. These are the crucial lines:
gbxfile = open(filename,'rb')
gbx_data = gbxfile.read()
gbx_header = b'(<header)((?s).*)(</header>)'
header_intermediate = re.findall(gbx_header, gbx_data)
The script works, but it raises the following DeprecationWarning:
DeprecationWarning: Flags not at the start of the expression b'(<header)((?s).*)(</' (truncated)
header_intermediate = re.findall(gbx_header, gbx_data)
What is the correct use of the regular expression in gbx_header, so that this warning is not displayed?
You can check the Python bug tracker, Issue 39394; the warning was introduced in Python 3.6.
The point is that Python's re module no longer silently accepts inline modifiers anywhere other than the start of the pattern. In Python 2.x, you can use your pattern without any problems or warnings, because (?s) is silently applied to the whole regular expression under the hood. Since that is not always the expected behavior, the Python developers decided to emit a warning.
Note that you can now use inline modifier groups in Python re; see "restrict 1 word as case sensitive and other as case insensitive in python regex | (pipe)".
So, the solutions are:
Putting (?s) (or any other inline modifier) at the start of the pattern: (?s)(<header)(.*)(</header>)
Using the re option, re.S / re.DOTALL instead of (?s), re.I / re.IGNORECASE instead of (?i), etc.
Using workarounds (instead of ., use [\w\W]/[\d\D]/[\s\S] if you do not want to use (?s) or re.S/re.DOTALL).

How to use JOOQ Java Generator includes and excludes

The jOOQ Java code generation tool uses regular expressions defined in the includes and excludes elements to control what is generated. I can't find an explanation of the schema structure that these expressions are run against.
I want to be able to exclude specific databases on the server, as well as tables, either by prefix or by exact name.
Simple examples:
Given a SQL server with two DBs 'A' and 'B', how do I instruct jOOQ to only generate for tables in DB 'A'?
How do I instruct jOOQ to only generate for tables starting with the prefix "qtbl"?
It would be great if there were some example use cases available showing some simple common configurations.
The jOOQ manual section about includes and excludes, as well as a few other sections that explain the code generator's usage of regular expressions to match identifiers, establish that the code generator will always try to:
Match fully qualified identifiers
Match unqualified identifiers
Or, if you're using jOOQ 3.12+ and did not turn off <regexMatchesPartialQualification/>:
Match partially qualified identifiers (see #7947)
For example:
<excludes>
(?i: # Using case insensitive regex for the example
database_prefix.*?\. # Match a catalog prefix prior to the qualifying "."
.*?\. # You don't seem to care about schema names, so match them all
table_prefix.*? # Match a table prefix at the end of the identifier
)
</excludes>
In addition to the above, if you want to exclude specific databases ("catalogs") from being generated without pattern matching, you'll get even better results if you specify your <inputCatalog>A</inputCatalog>. See also the manual's section about schema mapping.
Benefits include much faster code generation, because only that catalog will be searched for objects to generate, before any of them are excluded again using regular expressions. So, your configuration could be this:
<!-- Include only database A -->
<inputCatalog>A</inputCatalog>
<!-- Include only tables with this (unqualified) prefix -->
<includes>qtbl.*</includes>

Node.js v12 changed console output

I am trying to figure out how to achieve consistent console output across Node.js versions, in a module that applies colors to the text.
Up until v12 of Node.js there was no problem, but with v12 many of my tests stopped working, and here's why...
const a = [1, 'text\nwith', 'line\nbreaks'];
console.log.apply(null, a);
This test outputs the following under any Node.js version before v12:
1 'text\nwith' 'line\nbreaks'
And after v12, it outputs the following:
1 text
with line
breaks
i.e. it breaks console output on \n.
Is there any new API that can make the output consistent across multiple versions?
Is there a known commit/PR that brought this breaking change?
UPDATE
Probably an even better question: how can I detect whether the Node.js version in use applies this new line-break behavior to console output?
Internally, it seems that Node.js makes the call into util.format:
const util = require('util');
const a = [1, 'text\nwith', 'line\nbreaks'];
console.log(util.format.apply(null, a));
The question is then, how to determine if util.format now supports a different output?
Just escape the backslash character. I don't know what version of 12 "broke" as you say, but use \\n to escape instead of \n.
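For the follow-up question about detecting the behavior, one possible approach is to probe util.format directly and, where quoted strings are needed, to inspect every argument explicitly. This is a sketch under assumptions: the names printsStringsVerbatim and formatInspected are hypothetical, and it relies on my reading of the util.format documentation history, which says that from v12.0.0 strings are no longer quoted when the first argument is not a string.

```javascript
const util = require('util');

// Probe (an assumption, not an official API): on runtimes with the newer
// behaviour, a string following a non-string first argument is left unquoted.
const printsStringsVerbatim = util.format(1, 'a') === '1 a';

// Hypothetical helper: run util.inspect on every argument so strings are
// always quoted and escaped, whatever the runtime's util.format does.
function formatInspected(...args) {
  return args.map((arg) => util.inspect(arg)).join(' ');
}

const a = [1, 'text\nwith', 'line\nbreaks'];
console.log(formatInspected(...a)); // 1 'text\nwith' 'line\nbreaks'
```

The helper only mirrors the case from the question, where the first argument is not a string; it does not handle format specifiers such as %s.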

Determine the firefox version available for update using Python

I am looking for a snippet that checks which version is available to download as an update.
I use Python 3.x, so it would be nice if anyone has a hint on how I can check the version available on the server. The output should be a variable holding the Firefox version number, for example 22.0.
I am using Linux as my operating system.
To be clear:
I don't want to know which version is already installed on my system. I want to know which version is available as an update.
So far I have the following code:
def firefox_version_remote():
    firefox_version_fresh = os.popen("curl -s -l ftp.mozilla.org/pub/mozilla.org/firefox/releases/latest/linux-i686/de/").read()
    # short name for firefox version num fresh
    fvnf = " "
    for i in firefox_version_fresh:
        if i.isalpha() != True:
            fvnf = fvnf + i
    return fvnf.strip()
This returns -22.0..2 where it should return 22.0.
Have you considered using a regular expression to match the numbers you're trying to extract? That would be a lot easier. Something like this:
import re
matches = re.findall(r'\d+(?:\.\d+)+', firefox_version_fresh)
if matches:
    fvnf = matches[0]
That's assuming the version is of the form x.y potentially followed by more sub versions (e.g. x.y.z).
\d+ is one or more digits
(?: )+ is one or more of everything in the parentheses. The ?: tells the compiler that it's a non-capturing group, i.e. you're not interested in extracting the data inside the parentheses as a separate group.
\.\d+ matches a dot followed by one or more digits
So the whole expression can be described as one or more digits followed by one or more occurrences of a dot and one or more digits.

Node.js URL-encoding for pre-RFC3986 urls (using + vs %20)

Within Node.js, I am using querystring.stringify() to encode an object into a query string for usage in a URL. Values that have spaces are encoded as %20.
I'm working with a particularly finicky web service that will only accept spaces encoded as +, as used to be commonly done prior to RFC3986.
Is there a way to set an option for querystring so that it encodes spaces as +?
Currently I am simply doing a .replace() to replace all instances of %20 with +, but this is a bit tedious if there is an option I can set ahead of time.
If anyone is still facing this issue, the "qs" npm package has a feature to encode spaces as +:
qs.stringify({ a: 'b c' }, { format : 'RFC1738' })
I can't think of any library doing that by default, and unfortunately, I'd say your implementation may be the more efficient way to do this, since any other option would probably either do what you're already doing, or will use slower non-compiled pure JavaScript code.
What about asking the web service provider to follow the RFC?
https://github.com/kvz/phpjs is a node.js package that provides all the php functions. The http_build_query implementation at the time of writing this only supports urlencode (the query string includes + instead of spaces), but hopefully soon will include the enc_type parameter / rawurlencode (%20's for spaces).
See http://php.net/http_build_query.
RFC1738 (+'s) will be the default enc_type either way, so you can use it immediately for your purposes.
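Coming back to the original question about an option on querystring itself: querystring.stringify() accepts an options object whose encodeURIComponent property replaces the default escaping function, so the %20 to + replacement can be applied per component instead of on the finished string. The sketch below assumes that this fits the web service in question; the helper names encodeFormComponent and stringifyWithPlus are made up for the example.

```javascript
const querystring = require('querystring');

// Hypothetical helper: default escaping, then spaces as "+" instead of "%20".
// A literal "+" in the input is already escaped to "%2B", so it stays unambiguous.
function encodeFormComponent(value) {
  return querystring.escape(value).replace(/%20/g, '+');
}

// Hypothetical helper: stringify with the custom encoder plugged in.
function stringifyWithPlus(params) {
  return querystring.stringify(params, '&', '=', {
    encodeURIComponent: encodeFormComponent,
  });
}

console.log(stringifyWithPlus({ a: 'b c', q: 'x+y' })); // a=b+c&q=x%2By

// For comparison: URLSearchParams serializes spaces as "+" by default.
console.log(new URLSearchParams({ a: 'b c' }).toString()); // a=b+c
```

Doing the replacement inside the encoder keeps the fix in one place, so every caller that builds a query string through stringifyWithPlus gets the same encoding.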
