Why does Node.js allow this seemingly invalid character sequence?

I was looking to see if there was a way to distinguish between a return in a file (next line) and a typed newline (\n in the file). While I was playing around in the REPL, I made a typo in a comparison, and Node.js, to my surprise, didn't care. It even gave what I believe is undefined behavior, unless I completely missed something in my years of Node.js intimacy. I also discovered a couple of other things while playing around, which I'll ask about below.
Code is at the bottom of the post.
The main question is:
Why is Node.js not complaining about the syntax at the last two comparisons (==+ and ==-)? Is that somehow valid syntax somewhere? And why does it make the comparison true when without the trailing +/- it is false? (updates in post comments)
The main side question is:
Why do the 'Buffer separate self comparison' and 'Buffer comparison' results come out as false when all the other tests are true? And why does a buffer not compare with a buffer of the same data?
Also:
How can I reliably distinguish between a return in a file and a typed newline as described above?
Here's the code:
const nl = '\n'
const newline = `
`
const NL = Buffer.from('\n')
const NEWLINE = Buffer.from(`
`)
const NEWLINE2 = Buffer.from(`
`)
console.log("Buffer separate self comparison: "+(NEWLINE2 == NEWLINE))
console.log("Buffer comparison: "+(NL == NEWLINE))
console.log("Non buffer comparison: "+(nl == newline))
console.log("Buffer self comparison 1: "+(NL == NL))
console.log("Buffer self comparison 2: "+(NEWLINE == NEWLINE))
console.log("Buffer/String comparison 1: "+(nl == NL))
console.log("Buffer/String comparison 2: "+(newline == NEWLINE))
console.log("Buffer/String cross comparison 1: "+(nl == NEWLINE))
console.log("Buffer/String cross comparison 2: "+(newline == NL))
console.log("Buffer toString comparison: "+(NL.toString() == NEWLINE.toString()))
console.log("Strange operator comparison 1: "+(NL ==+ NEWLINE))
console.log("Strange operator comparison 2: "+(NL ==- NEWLINE))

NEWLINE2 == NEWLINE (false)
NL == NEWLINE (false)
An expression comparing Objects is only true if the operands reference the same Object. src
This is not the case: they're two separate objects, even if their initial values are alike, so the result is false.
Edit: If you want to compare the values and not the identity of two Buffers, you can use Buffer.compare. Buffer.compare(NEWLINE2, NEWLINE) === 0 means both are equal.
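To make the difference between identity and value comparison concrete, here is a minimal sketch. The variable names are only for illustration; Buffer.compare and buf.equals are both part of Node's Buffer API.

```javascript
// Two separate Buffers that each hold the single byte 0x0A.
const a = Buffer.from('\n');
const b = Buffer.from(`
`);

// Identity comparison: a and b are different objects, so this is false.
const sameObject = a === b;

// Content comparison: both calls look at the bytes, not the object identity.
const sameBytesViaCompare = Buffer.compare(a, b) === 0;
const sameBytesViaEquals = a.equals(b);

console.log(sameObject, sameBytesViaCompare, sameBytesViaEquals);
// prints: false true true
```

If you only need a yes/no answer, a.equals(b) reads a little more directly than checking Buffer.compare against 0.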
nl == newline (true)
Two strings are strictly equal when they have the same sequence of characters, same length, and same characters in corresponding positions. src
The strings are equal, so true.
NL == NL (true)
NEWLINE == NEWLINE (true)
An expression comparing Objects is only true if the operands reference the same Object. src
nl == NL (true)
newline == NEWLINE (true)
nl == NEWLINE (true)
newline == NL (true)
What's happening here is that you're comparing two different types. One is a string, the other an object.
Each of these operators will coerce its operands to primitives before a comparison is made. If both end up as strings, they are compared using lexicographic order, otherwise they are cast to numbers to be compared. A comparison against NaN will always yield false. src
When == compares a string with an object, the object is converted to a primitive first. Buffer has a toString method, so that is called in order to have the same primitive type on both sides of the ==. The result of this method is a string containing \n, and '\n' == '\n' is true.
As an aside, if your comparison was NEWLINE == 0, then this would happen:
' 1 ' == 1 equals true. When casting, whitespace is discarded so ' 1 ' will be cast into a number with value 1. The resulting comparison would be 1 == 1.
A string of only whitespace characters will be coerced into 0. The Buffer is first converted to a string and then to a number, so this would happen: 0 == 0, so the result would've been true.
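The coercion steps described above can be traced by hand. This is just a sketch with illustrative variable names; it spells out the intermediate values that == works with.

```javascript
const NEWLINE = Buffer.from('\n');

// Step 1: converting the Buffer to a primitive yields its string form.
const asString = NEWLINE.toString();   // '\n'

// Step 2: a whitespace-only string converts to the number 0.
const asNumber = Number(asString);     // 0

// Loose equality performs these conversions implicitly.
const looseVsString = NEWLINE == '\n';   // true
const looseVsZero = NEWLINE == 0;        // true
const paddedOne = ' 1 ' == 1;            // true

// Strict equality performs no conversion, so an object never equals a string.
const strictVsString = NEWLINE === '\n'; // false

console.log(asNumber, looseVsString, looseVsZero, paddedOne, strictVsString);
// prints: 0 true true true false
```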
NL.toString() == NEWLINE.toString() (true)
Two strings are strictly equal when they have the same sequence of characters, same length, and same characters in corresponding positions. src
The strings are equal, so true.
NL ==+ NEWLINE (true)
NL ==- NEWLINE (true)
There is no ==+ or ==- operator. The parser reads NL ==+ NEWLINE as NL == +NEWLINE: the == operator followed by a unary + (or -) applied to NEWLINE, which explicitly casts it to a Number. What's interesting here is that, after casting, the comparisons become 0 == +0 and 0 == -0. Negative and positive zero are considered equal.
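A short sketch of how these two expressions evaluate, using Object.is to tell 0 and -0 apart. The variable names are illustrative only.

```javascript
const NL = Buffer.from('\n');
const NEWLINE = Buffer.from('\n');

// Unary + and - convert the Buffer to a number via its string form '\n'.
const plus = +NEWLINE;    // 0
const minus = -NEWLINE;   // -0

// Object.is distinguishes 0 from -0, unlike == and ===.
const plusIsZero = Object.is(plus, 0);            // true
const minusIsNegativeZero = Object.is(minus, -0); // true

// "==+" is "==" followed by unary "+", so both spellings give the same result.
const compact = NL ==+ NEWLINE;    // true
const spaced = NL == (+NEWLINE);   // true
const negated = NL ==- NEWLINE;    // true, because 0 == -0

// Without the unary operator, two distinct objects are compared by identity.
const plain = NL == NEWLINE;       // false

console.log(plusIsZero, minusIsNegativeZero, compact, spaced, negated, plain);
// prints: true true true true true false
```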
None of the behavior here is 'undefined'.
Apart from "huh, that's neat" there's really very little reason to not use the strict equality operator (===) which would not cast things into the same primitives.
As for your question:
A newline in a file (\n) is the same as a newline in a self-typed string ('\n'). They're both ASCII or Unicode character 0x0A, byte-wise.
Some documents contain both a newline character and a carriage return. A newline then consists of two characters: 0x0D 0x0A (or \r\n).
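If the goal is to see which kind of line break a file actually contains, one option is to inspect the raw bytes. The helper below is a hypothetical sketch, not part of any Node API. It also counts a literal backslash followed by the letter n (bytes 0x5C 0x6E), on the assumption that this is what "a typed \n in the file" might mean.

```javascript
// Hypothetical helper: classify the line breaks found in a Buffer.
function countLineBreaks(buf) {
  const counts = { lf: 0, crlf: 0, escaped: 0 };
  for (let i = 0; i < buf.length; i++) {
    const byte = buf[i];
    if (byte === 0x0d && buf[i + 1] === 0x0a) {
      counts.crlf++;     // carriage return + newline (\r\n)
      i++;               // skip the 0x0A that belongs to this pair
    } else if (byte === 0x0a) {
      counts.lf++;       // bare newline (\n)
    } else if (byte === 0x5c && buf[i + 1] === 0x6e) {
      counts.escaped++;  // the two literal characters "\" and "n"
      i++;
    }
  }
  return counts;
}

// One bare newline, one \r\n pair, and one literal backslash-n.
const counts = countLineBreaks(Buffer.from('a\nb\r\nc\\nd'));
console.log(counts);
// prints: { lf: 1, crlf: 1, escaped: 1 }
```

In practice the input would come from something like fs.readFileSync(path), which returns a Buffer when no encoding is given.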

Related

How to compare a char to [ in Julia?

In Julia, I have some lines in a file that start with a [ char. To get these lines, I try to compare the first char of each line to this char, but I seem to be missing some syntax. So far, I've tried this, which returns false (for the first one) or doesn't accept the char (for the second) :
if (line[1] == "[")
if (line[1] == "\[")
What would be the proper syntax to use here ?

The canonical way would be to use startswith, which works with both single characters and longer strings:
julia> line = "[hello, world]";
julia> startswith(line, '[') # single character
true
julia> startswith(line, "[") # length-1 string
true
julia> startswith(line, "[hello") # longer string
true
If you really want to get the first character of a string, it is better to use first, since indexing into strings is, in general, tricky.
julia> first(line) == '['
true
See https://docs.julialang.org/en/v1/manual/strings/#Unicode-and-UTF-8-1 for more details about string indexing.

You are comparing against a string ("[") rather than a char ('['). Use single quotes: line[1] == '['.
Hope it solves your problem.

How to compare upper and lowercase letters in a conditional in Swift

Apologies if this is a duplicate. I have a helper function called inputString() that takes user input and returns a String. I want to proceed based on whether an upper or lowercase character was entered. Here is my code:
print("What do you want to do today? Enter 'D' for Deposit or 'W' for Withdrawl.")
operation = inputString()
if operation == "D" || operation == "d" {
print("Enter the amount to deposit.")
My program quits after the first print function, but gives no compiler errors. I don't know what I'm doing wrong.

It's important to keep in mind that there is a whole slew of purely whitespace characters that show up in strings, and sometimes, those whitespace characters can lead to problems just like this.
So, whenever you are certain that two strings should be equal, it can be useful to print them with some sort of non-whitespace character on either end of them.
For example:
print("Your input was <\(operation)>")
That should print the user input with angle brackets on either side of the input.
And if you stick that line into your program, you'll see it prints something like this:
Your input was <D
>
So it turns out that your inputString() method is capturing the newline character (\n) that the user presses to submit their input. You should improve your inputString() method to go ahead and trim that newline character before returning its value.
I feel it's really important to mention here that your inputString method is really clunky and requires importing modules. But there's a way simpler pure Swift approach: readLine().
Swift's readLine() method does exactly what your inputString() method is supposed to be doing, and by default, it strips the newline character off the end for you (there's an optional parameter you can pass to prevent the method from stripping the newline).
My version of your code looks like this:
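func fetchInput(prompt: String? = nil) -> String? {
    if let prompt = prompt {
        // Print the prompt without a trailing newline so the input follows on the same line.
        print(prompt, terminator: "")
    }
    // readLine() strips the trailing newline by default.
    return readLine()
}
// Swift 3 and later require the argument label at the call site.
if let input = fetchInput(prompt: "Enter some input: ") {
    if input == "X" {
        print("it matches X")
    }
}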

The cause of the error that you experienced is explained in "Swift how to compare string which come from NSString". Essentially, we need to remove any whitespace or non-printing characters, such as newlines.
I also used .uppercaseString to simplify the comparison.
The amended code is as follows:
import Foundation
func inputString() -> String {
    // Read whatever is currently available on standard input.
    let keyboard = NSFileHandle.fileHandleWithStandardInput()
    let inputData = keyboard.availableData
    // Decode as UTF-8 and trim surrounding whitespace, including the trailing newline.
    let str: String = (NSString(data: inputData, encoding: NSUTF8StringEncoding)?.stringByTrimmingCharactersInSet(
        NSCharacterSet.whitespaceAndNewlineCharacterSet()))!
    return str
}
print("What do you want to do today? Enter 'D' for Deposit or 'W' for Withdrawl.")
let operation = inputString()
// Uppercasing the input lets a single comparison cover both "D" and "d".
if operation.uppercaseString == "D" {
    print("Enter the amount to deposit.")
}

split string by char

Scala has a standard way of splitting a string in StringOps.split.
Its behaviour somewhat surprised me, though.
To demonstrate, using the quick convenience function
def sp(str: String) = str.split('.').toList
the following expressions all evaluate to true
(sp("") == List("")) //expected
(sp(".") == List()) //I would have expected List("", "")
(sp("a.b") == List("a", "b")) //expected
(sp(".b") == List("", "b")) //expected
(sp("a.") == List("a")) //I would have expected List("a", "")
(sp("..") == List()) // I would have expected List("", "", "")
(sp(".a.") == List("", "a")) // I would have expected List("", "a", "")
So I expected that split would return an array with (the number of separator occurrences) + 1 elements, but that's clearly not the case.
The actual behaviour is almost that, with all trailing empty strings removed, but even that doesn't hold for splitting the empty string.
I'm failing to identify the pattern here. What rules does StringOps.split follow?
For bonus points, is there a good way (without too much copying/string appending) to get the split I'm expecting?

For the curious, you can find the code here: https://github.com/scala/scala/blob/v2.12.0-M1/src/library/scala/collection/immutable/StringLike.scala
See the split function with the character as an argument (line 206).
I think the general pattern here is that all trailing empty split results are ignored.
The exception is the first case, where the "if no separator char is found then just send the whole string" logic applies.
I am trying to find out whether there is any design documentation around this.
Also, if you use a string instead of a char for the separator, it falls back to Java's regex split. As mentioned by @LRLucena, if you provide the limit parameter with a value greater than the size of the result, you will get your trailing empty results. See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)

You can use split with a regular expression. I'm not sure, but I guess that the second parameter is the largest size of the resulting array.
def sp(str: String) = str.split("\\.", str.length+1).toList

Seems to be consistent with these three rules:
1) Trailing empty substrings are dropped.
2) An empty substring is considered trailing before it is considered leading, if applicable.
3) First case, with no separators is an exception.

split follows the behaviour of http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
That is split "around" the separator character, with the following exceptions:
Regardless of anything else, splitting the empty string will always give Array("")
Any trailing empty substrings are removed
Surrogate characters only match if the matched character is not part of a surrogate pair.

Match first letter of a string in Tcl

I want to compare the first letter of a string with a known character. For example, I want to check if the string "example"'s first letter matches with "e" or not. I'm sure there must be a very simple way to do it, but I could not find it.

One way is to get the first character with string index:
if {[string index $yourstring 0] eq "e"} {
...

I think it's a good idea to collect the different methods in a single answer.
Assume
set mystring example
set mychar e
The goal is to test whether the first character in $mystring is equal to $mychar.
My suggestion was (slightly edited):
if {[string match $mychar* $mystring]} {
...
This invocation does a glob-style match, comparing $mystring to the character $mychar followed by a sequence of zero or more arbitrary characters. Due to shortcuts in the algorithm, the comparison stops after the first character and is quite efficient.
Donal Fellows:
if {[string index $mystring 0] eq $mychar} {
...
This invocation specifically compares a string consisting of the first character in $mystring with the string $mychar. It uses the efficient eq operator rather than the == operator; == is the only one of the two available in older versions of Tcl.
Another way to construct a string consisting of the first character in $mystring is by invoking string range $mystring 0 0.
Mark Kadlec:
if {[string first $mychar $mystring] == 0} {
...
This invocation searches the string $mystring for the first occurrence of the character $mychar. If it finds any, it returns with the index where the character was found. This index number is then compared to 0. If they are equal the first character of $mystring was $mychar.
This solution is rather inefficient in the worst case, where $mystring is long and $mychar does not occur in it. The command will then examine the whole string even though only the first character is of interest.
One more string-based solution:
if {[string compare -length 1 $mychar $mystring] == 0} {
...
This invocation compares the first n characters of both strings (n being hardcoded to 1 here): if there is a difference the command will return -1 or 1 (depending on alphabetical order), and if they are equal 0 will be returned.
Another solution is to use a regular expression match:
if {[regexp -- ^$mychar.* $mystring]} {
...
This solution is similar to the string match solution above, but uses regular expression syntax rather than glob syntax. Don't forget the ^ anchor, otherwise the invocation will return true if $mychar occurs anywhere in $mystring.
Documentation: eq and ==, regexp, string

if { [string first e $yourString] == 0 } {
...

set mychar "e"
if { [string first $mychar $myString] == 0 } {
....

String splitting based on unknown delimeters (rather, LENGTH of delimeter)

I have a somewhat esoteric problem. My program wants to decode morse code.
The point is, I will need to handle any character. Any random characters that adhere to my system and can correspond to a letter should be accepted. Meaning, the letter "Q" is represented by "- - . -", but my program will treat any string of characters (separated by appropriate newchar signal) to be accepted as Q, for example "dj ir j kw" (long long short long).
There is a danger of falling out of sync, so I will need to implement a "new character" signal. I chose this to be "xxxx" as in 4 letters. For white, blank space symbol, I chose "xxxxxx", 6 chars.
Long story short, how can I split the string that is to be decoded into readable characters based on the length of the delimiter (4 continuous symbols), since I can't really deterministically know what letters will make up the newchar delimiter?

The question is not very clearly worded.
For instance, here you show space as a delimiter between parts of the symbol Q:
for example "dj ir j kw" (long long short long)
Later you say:
For white, blank space symbol, I chose "xxxxxx", 6 chars.
Is that the symbol for whitespace, or the delimiter you use within a symbol (such as Q, above)? Your post doesn't say.
In this case, as always, an example is worth a thousand words. You should have shown a few examples of possible input and shown how you'd like them parsed.
If what you meant was that "dj ir j kw jfkl abpzoq jfkl dj ir j kw" should be decoded as "Q Q", and you just want to know how to match tokens by their length, then... the question is easy. There are a million ways you could do that.
In Lua, I'd do it in two passes. First, convert the message into a string containing only the length of each chunk of consecutive characters:
message = 'dj ir j kw jfkl abpzoq jfkl dj ir j kw'
message = message:gsub('(%S+)%s*', function(s) return #s end)
print(message) --> 22124642212
Then split on the number 4 to get each group
for group in message:gmatch('[^4]+') do
    print(group)
end
Which gives you:
2212
6
2212
So you could convert something like this:
function translate(message)
    local lengthToLetter = {
        ['2212'] = 'Q',
        [   '6'] = ' ',
    }
    local translation = {}
    message = message:gsub('(%S+)%s*', function(s) return #s end)
    for group in message:gmatch('[^4]+') do
        table.insert(translation, lengthToLetter[group] or '?')
    end
    return table.concat(translation)
end
print(translate(message))

This will split a string by any len continuous occurrences of char, which may be a character or pattern character class (such as %s), or of any character (i.e. .) if char is not passed.
It does this by using backreferences in the pattern passed to string.find, e.g. (.)%1%1%1 to match any character repeated four times.
The rest is just a bog-standard string splitter; the only real Lua peculiarity here is the choice of pattern.
-- split str, using (char * len) as the delimiter
-- leave char blank to split on len repetitions of any character
local function splitter(str, len, char)
    -- build pattern to match len continuous occurrences of char
    -- "(x)%1%1%1%1" would match "xxxxx" etc.
    local delim = "("..(char or ".")..")" .. string.rep("%1", len-1)
    local pos, out = 1, {}
    -- loop through the string, find the pattern,
    -- and string.sub the rest of the string into a table
    while true do
        local m1, m2 = string.find(str, delim, pos)
        -- no sign of the delimiter; add the rest of the string and bail
        if not m1 then
            out[#out+1] = string.sub(str, pos)
            break
        end
        out[#out+1] = string.sub(str, pos, m1-1)
        pos = m2+1
        -- string ends with the delimiter; bail
        if m2 == #str then
            break
        end
    end
    return out
end
-- and the result? (unpack is a global in Lua 5.1; use table.unpack in Lua 5.2 and later)
print(unpack(splitter("dfdsfsdfXXXXXsfsdfXXXXXsfsdfsdfsdf", 5)))
-- dfdsfsdf, sfsdf, sfsdfsdfsdf
