How to compare a char to [ in Julia? - string

In Julia, I have some lines in a file that start with a [ char. To get these lines, I try to compare the first char of each line to this char, but I seem to be missing some syntax. So far, I've tried the following: the first returns false, and the second is rejected as an invalid escape sequence:
if (line[1] == "[")
if (line[1] == "\[")
What would be the proper syntax to use here?

The canonical way would be to use startswith, which works with both single characters and longer strings:
julia> line = "[hello, world]";
julia> startswith(line, '[') # single character
true
julia> startswith(line, "[") # length-1 string
true
julia> startswith(line, "[hello") # longer string
true
If you really want to get the first character of a string, it is better to use first, since indexing into strings is, in general, tricky (indices are byte offsets into the UTF-8 encoding, not character positions).
julia> first(line) == '['
true
See https://docs.julialang.org/en/v1/manual/strings/#Unicode-and-UTF-8-1 for more details about string indexing.

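You are comparing against a string "[" rather than a char '['. In Julia, double quotes create a String and single quotes create a Char, and the two never compare equal.
Hope this solves your problem.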

Related

How to check if a string is alphanumeric?

I can use occursin function, but its haystack argument cannot be a regular expression, which means I have to pass the entire alphanumeric string to it. Is there a neat way of doing this in Julia?
I'm not sure your assumption about occursin is correct:
julia> occursin(r"[a-zA-Z]", "ABC123")
true
julia> occursin(r"[a-zA-Z]", "123")
false
but its haystack argument cannot be a regular expression, which means I have to pass the entire alphanumeric string to it.
If you mean its needle argument, it can be a Regex, e.g.:
julia> occursin(r"^[[:alnum:]]*$", "adf24asg24y")
true
julia> occursin(r"^[[:alnum:]]*$", "adf24asg2_4y")
false
This checks that the given haystack string is alphanumeric using the Unicode-aware character class [[:alnum:]], which you can think of as equivalent to [a-zA-Z\d], extended to non-English characters too. (As always with Unicode, a "perfect" solution involves more work and complication, but this takes you most of the way.)
If you do mean you want the haystack argument to be a Regex, it's not clear why you'd want that here, and also why "I have to pass the entire alphanumeric string to it" is a bad thing.
As has been noted, you can indeed use regexes with occursin, and it works well. But you can also roll your own version, quite simply:
isalphanumeric(c::AbstractChar) = isletter(c) || ('0' <= c <= '9')
isalphanumeric(str::AbstractString) = all(isalphanumeric, str)

Why does Node.js allow this seemingly invalid character sequence?

I was looking to see if there were a way to distinguish between a return in a file (next line), and a typed newline (\n in the file). While I was playing around in the REPL, I made a typo in a comparison, and Node.js to my surprise didn't care. It even gave what I believe is undefined behavior, unless I completely missed something in my years of Node.js intimacy. And I also discovered a couple other things in my playing around, I'll ask those below.
Code is at the bottom of the post.
The main question is:
Why is Node.js not complaining about the syntax at the last two comparisons (==+ and ==-)? Is that somehow valid syntax somewhere? And why does it make the comparison true when without the trailing +/- it is false? (updates in post comments)
The main side question is:
Why do the 'Buffer separate self comparison' and 'Buffer comparison' results come out as false when all the other tests are true? And why does a buffer not compare with a buffer of the same data?
Also:
How can I reliably distinguish between a return in a file and a typed newline as described above?
Here's the code:
const nl = '\n'
const newline = `
`
const NL = Buffer.from('\n')
const NEWLINE = Buffer.from(`
`)
const NEWLINE2 = Buffer.from(`
`)
console.log("Buffer separate self comparison: "+(NEWLINE2 == NEWLINE))
console.log("Buffer comparison: "+(NL == NEWLINE))
console.log("Non buffer comparison: "+(nl == newline))
console.log("Buffer self comparison 1: "+(NL == NL))
console.log("Buffer self comparison 2: "+(NEWLINE == NEWLINE))
console.log("Buffer/String comparison 1: "+(nl == NL))
console.log("Buffer/String comparison 2: "+(newline == NEWLINE))
console.log("Buffer/String cross comparison 1: "+(nl == NEWLINE))
console.log("Buffer/String cross comparison 2: "+(newline == NL))
console.log("Buffer toString comparison: "+(NL.toString() == NEWLINE.toString()))
console.log("Strange operator comparison 1: "+(NL ==+ NEWLINE))
console.log("Strange operator comparison 2: "+(NL ==- NEWLINE))
NEWLINE2 == NEWLINE (false)
NL == NEWLINE (false)
An expression comparing Objects is only true if the operands reference the same Object. src
This is not the case: they're two separate objects, even if their initial values are alike, so the result is false.
Edit: If you want to compare the values and not the identity of two Buffers, you can use Buffer.compare. Buffer.compare(NEWLINE2, NEWLINE) === 0 means both are equal.
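As a minimal sketch of the difference between identity and value comparison (the variable names a and b are only for illustration; buf.equals is another method on Node.js Buffers that can be used instead of Buffer.compare):

```javascript
// Two distinct Buffer objects holding the same single byte (0x0A).
const a = Buffer.from('\n');
const b = Buffer.from('\n');

// == on two objects compares references, not contents.
const sameObject = (a == b);

// Value comparisons look at the bytes instead.
const sameBytesCompare = Buffer.compare(a, b) === 0;
const sameBytesEquals = a.equals(b);

console.log(sameObject, sameBytesCompare, sameBytesEquals); // false true true
```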
nl == newline (true)
Two strings are strictly equal when they have the same sequence of characters, same length, and same characters in corresponding positions. src
The strings are equal, so true.
NL == NL (true)
NEWLINE == NEWLINE (true)
An expression comparing Objects is only true if the operands reference the same Object. src
nl == NL (true)
newline == NEWLINE (true)
nl == NEWLINE (true)
newline == NL (true)
What's happening here is that you're comparing two different types. One is a string, the other an object.
Each of these operators will coerce its operands to primitives before a comparison is made. If both end up as strings, they are compared using lexicographic order, otherwise they are cast to numbers to be compared. A comparison against NaN will always yield false. src
Buffer has a toString method, so that is called in order to have the same primitive type on both sides of the ==. The result of this method is a string containing \n. '\n' == '\n' is true.
As an aside, if your comparison were NEWLINE == 0, the same kind of coercion would apply:
' 1 ' == 1 is true. When a string is converted to a number, surrounding whitespace is discarded, so ' 1 ' becomes the number 1 and the resulting comparison is 1 == 1.
A string containing only whitespace characters is coerced to 0. The Buffer is first converted to a string and then to a number, so the comparison becomes 0 == 0, and the result would be true.
NL.toString() == NEWLINE.toString() (true)
Two strings are strictly equal when they have the same sequence of characters, same length, and same characters in corresponding positions. src
The strings are equal, so true.
NL ==+ NEWLINE (true)
NL ==- NEWLINE (true)
This is the same as writing NL == +NEWLINE: the parser reads ==+ as the == operator followed by a unary +. You're using a unary + or - to explicitly cast the right-hand operand to a Number. What's interesting here is that, after casting, you're doing these comparisons: 0 == +0 and 0 == -0. Negative and positive zero are considered equal.
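The coercion steps can be checked directly. This is only a sketch that reuses the NL and NEWLINE names from the question; the intermediate variable names are made up for illustration:

```javascript
const NL = Buffer.from('\n');
const NEWLINE = Buffer.from(`
`);

// Unary + and - convert the Buffer to a number: Buffer -> "\n" -> 0.
const plus = +NEWLINE;
const minus = -NEWLINE;

// NL ==+ NEWLINE is parsed as NL == (+NEWLINE).
const loosePlus = (NL == +NEWLINE);
const looseMinus = (NL == -NEWLINE);

// Strict equality does no coercion, so an object never equals a number.
const strictPlus = (NL === +NEWLINE);

console.log(plus, Object.is(minus, -0), loosePlus, looseMinus, strictPlus);
// 0 true true true false
```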
None of the behavior here is 'undefined'.
Apart from "huh, that's neat" there's really very little reason to not use the strict equality operator (===) which would not cast things into the same primitives.
As for your question:
A newline in a file (\n) is the same as a newline in a self-typed string ('\n'). They're both ASCII or Unicode character 0x0A, byte-wise.
Some documents contain both a newline character and a carriage return. A newline then consists of two characters: 0x0D 0x0A (or \r\n).
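If "typed newline" means the two characters backslash and n stored literally in the file (an assumption about the question, not something the answer above states), the difference is visible at the byte level. A rough sketch, where describeLineBreaks is a hypothetical helper name:

```javascript
// Hypothetical helper: report which line-break-like sequences appear
// in a Buffer, by looking at its raw bytes.
function describeLineBreaks(buf) {
  return {
    crlf: buf.includes('\r\n'),             // bytes 0x0D 0x0A
    lf: buf.includes('\n'),                 // byte 0x0A (also true for CRLF)
    literalBackslashN: buf.includes('\\n'), // bytes 0x5C 0x6E
  };
}

const real = Buffer.from('a\nb');      // actual line feed
const typed = Buffer.from('a\\nb');    // backslash followed by n
const windows = Buffer.from('a\r\nb'); // carriage return + line feed

console.log(real.toString('hex'));    // 610a62
console.log(typed.toString('hex'));   // 615c6e62
console.log(windows.toString('hex')); // 610d0a62
console.log(describeLineBreaks(typed));
```

Reading the file as a Buffer (rather than comparing with ==) keeps the check independent of any string coercion.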

How to check if two strings can be made equal by using recursion?

I am trying to practice recursion, but at the moment I don't quite understand it well...
I want to write a recursive Boolean function which takes 2 strings as arguments, and returns true if the second string can be made equal to the first by replacing some letters with a certain special character.
I'll demonstrate what I mean:
Let s1 = "hello", s2 = "h%lo", where '%' is the special character.
The function will return true since '%' can replace "el", causing the two strings to be equal.
Another example:
Let s1 = "hello", s2 = "h%l".
The function will return false since an 'o' is lacking in the second string, and there is no special character that can replace the 'o' (h%l% would return true).
Now the problem isn't so much with writing the code, but with understanding how to solve the problem in general, I don't even know where to begin.
If someone could guide me in the right direction I would be very grateful, even by just using English words, I'll try to translate it to code (Java)...
Thank you.
So this is relatively easy to do in Python. The method I chose was to put the first string ("hello") into an array then iterate over the second string ("h%lo") comparing the elements to those in the array. If the element was in the array i.e. 'h', 'l', 'o' then I would pop it from the array. The resulting array is then ['e','l']. The special character can be found as it is the element which does not exist in the initial array.
One can then substitute the special character for the joined array "el" in the string and compare with the first string.
In the first case this will give "hello" == "hello" -> True
In the second case this will give "hello" == "helol" -> False
I hope this helps and makes sense.
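For a recursive formulation, one possible sketch is shown here in JavaScript rather than Java. It assumes that '%' stands for one or more characters (the question does not say whether it may stand for zero), and canMatch is a hypothetical name. The idea is to compare the two strings from the front: an ordinary character must match exactly, while a '%' consumes one character of the first string and then either stops or keeps consuming.

```javascript
// Hypothetical sketch: returns true if pattern (s2) can be turned into
// text (s1) by replacing each '%' with one or more characters.
// Assumption: '%' must stand for at least one character.
function canMatch(text, pattern) {
  // Base cases: an exhausted pattern matches only exhausted text,
  // and exhausted text cannot satisfy any remaining pattern character.
  if (pattern.length === 0) return text.length === 0;
  if (text.length === 0) return false;

  if (pattern[0] === '%') {
    // '%' consumes one character of text, then either stops (advance
    // the pattern) or keeps consuming (keep the '%' at the front).
    return canMatch(text.slice(1), pattern.slice(1)) ||
           canMatch(text.slice(1), pattern);
  }

  // Ordinary character: must match exactly, then recurse on the rest.
  return text[0] === pattern[0] &&
         canMatch(text.slice(1), pattern.slice(1));
}

console.log(canMatch('hello', 'h%lo'));  // true
console.log(canMatch('hello', 'h%l'));   // false
console.log(canMatch('hello', 'h%l%'));  // true
```

The same structure translates to Java with substring(1) in place of slice(1).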

split string by char

Scala has a standard way of splitting a string in StringOps.split.
Its behaviour somewhat surprised me, though.
To demonstrate, using the quick convenience function
def sp(str: String) = str.split('.').toList
the following expressions all evaluate to true
(sp("") == List("")) //expected
(sp(".") == List()) //I would have expected List("", "")
(sp("a.b") == List("a", "b")) //expected
(sp(".b") == List("", "b")) //expected
(sp("a.") == List("a")) //I would have expected List("a", "")
(sp("..") == List()) // I would have expected List("", "", "")
(sp(".a.") == List("", "a")) // I would have expected List("", "a", "")
So I expected that split would return an array with (the number of separator occurrences) + 1 elements, but that's clearly not the case.
It is almost that rule followed by removing all trailing empty strings, but that doesn't hold for splitting the empty string.
I'm failing to identify the pattern here. What rules does StringOps.split follow?
For bonus points, is there a good way (without too much copying/string appending) to get the split I'm expecting?
For the curious, you can find the code here: https://github.com/scala/scala/blob/v2.12.0-M1/src/library/scala/collection/immutable/StringLike.scala
See the split function that takes a character as its argument (line 206).
I think the general pattern here is that all trailing empty split results are ignored.
Except for the first one, for which "if no separator char is found then just send the whole string" logic is getting applied.
I am trying to find if there is any design documentation around these.
Also, if you use a string instead of a char as the separator, it falls back to Java's regex split. As mentioned by @LRLucena, if you provide the limit parameter with a value larger than the size, you will get your trailing empty results. See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)
You can use split with a regular expression. I'm not sure, but I guess that the second parameter is the largest size of the resulting array.
def sp(str: String) = str.split("\\.", str.length+1).toList
Seems to be consistent with these three rules:
1) Trailing empty substrings are dropped.
2) An empty substring is considered trailing before it is considered leading, if applicable.
3) First case, with no separators is an exception.
split follows the behaviour of http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
That is split "around" the separator character, with the following exceptions:
Regardless of anything else, splitting the empty string will always give Array("")
Any trailing empty substrings are removed
Surrogate characters only match if the matched character is not part of a surrogate pair.

split a string delimited by given object type

How can I split a string into a list of substrings, where the delimiter to split by is a MATLAB object type?
For example:
>> splitByType('a1b2c3',type=integer)
['a','b','c']
or:
>> splitByType('a1b2c3',type=character)
['1','2','3']
I'm not sure what you mean by MATLAB object type. For integers, you can use:
a='a1b2c'
regexp(a,'[0-9]+','split')
which outputs:
ans =
'a' 'b' 'c'
Another alternative is:
regexp(a,'\d+','split')
You're looking for regexp() by passing the corresponding regular expression of the type:
For integers: regexp('a1b2c','\d+','split') % or use '[0-9]+'
For characters: regexp('a1b2c','[a-z]+','split')
I'd go with the regexp answer, if you are comfortable with regular expressions, but you can also use strsplit with a cell array of strings containing the possible delimiters:
strsplit(a,cellstr(num2str((0:9)'))') % digits
strsplit(a,cellstr(char([65:90 97:122])')') % word characters
Also, strsplit has a regular expression mode (bizarre! why would you use this over regexp?):
strsplit(a,'\d+','delim','reg') % one or more digits
strsplit(a,'\w+','delim','reg') % one or more word characters
Which are equivalent to regexp(a,'\d+','split') and regexp(a,'\w+','split').
