Most common way to evaluate truthiness of string in vim - vim

In python to see if a string is non-empty, I'll do something like:
if STRING_VAR:
...
However, in vim a string doesn't really give a real truth-value when used directly. Is there a suggested way to evaluate the truthiness of a string? Usually I will use either:
if (STRING_VAR != "")
...
Or:
if (len(STRING_VAR) != 0)
...
What's the suggested/cleanest way to do this?

However, in vim a string doesn't really give a real truth-value when used directly.
If programming were a religion then the Pythonistas would become zealots ;-) Say, for a C programmer the real truth is a number, and for an assembly one it's totally unaddressable and the Strings do not exist at all. So how can you speak of a "real truth" then?
VimScript is partially modelled after C and AWK. In particular, :if accepts only numeric values (zero is false, non-zero is true) including those "convertible" to numbers, such as... Strings. However, String-to-Number conversion is done like with :h str2nr(). And so :if "hello world" is legal but false.
So you'll probably want :if empty(var) instead.
Note however that :h empty() accepts any Vim type. Usually this is quite good, but on some rare occasions you prefer :if len(var) or :if strlen(var).
The difference between :h len() and :h strlen() is that strlen() accepts Strings, while len() also accepts Lists and Dictionaries.
Still note that :echo strlen(0) is 1, as Numbers are auto-converted to Strings anyway. And so if you want to check simultaneously that 1) foo has type of String, and 2) foo is empty then you have to write :if foo is# ''

I would do !empty(STRING_VAR) *
I haven't looked how Vim compares strings. But remembering Vim is written in C, then if it is in similar manner to
int string_compare(const char* s1, const char* s2)
{
while(*s1 && (*s1 == *s2)) {
++s1;
++s2;
}
return *s1 == *s2;
}
then STRING_VAR != "" is definitely better than len(STRING_VAR) != 0
The time complexity of former is O(1) - the result is after comparing first characters. The latter requires O(n) - for counting the length.
On the other hand, if string in Vim is represented by structure, e.g.
struct string {
char str[1000];
int len;
};
then len(STRING_VAR) != 0 will be just simple integer comparison (var.len != 0).
Actually, len(STRING_VAR) will be enough.
* As we rather can trust Bram, he implemented empty() in most efficient way
In Vim, like in Python, if doesn't require brackets

Related

Hash function to see if one string is scrambled form/permutation of another?

I want to check if string A is just a reordered version of string B. For example, "abc" = "bca" = "cab"...
There are other solutions here: https://www.geeksforgeeks.org/check-if-two-strings-are-permutation-of-each-other/
However, I was thinking a hash function would be an easy way of doing this, but the typical hash function takes order into consideration. Are there any hash functions that do not care about character order?
Are there any hash functions that do not care about character order?
I don't know of real-world hash functions that have this property, no. Because this is not a problem they are designed to solve.
However, in this specific case, you can make your own "hash" function (a very very bad one) that will indeed ignore order: just sum ASCII codes of characters. This works due to the commutative property of addition (a + b == b + a)
def isAnagram(self,a,b):
sum_a = 0
sum_b = 0
for c in a:
sum_a += ord(c)
for c in b:
sum_b += ord(c)
return sum_a == sum_b
To reiterate, this is absolutely a hack, that only happens to work because input strings are limited in content in the judge system (only have lowercase ASCII characters and do not contain spaces). It will not work (reliably) on arbitrary strings.
For a fast check you could use a kind af hash-funkction
Candidates are:
xor all characters of a String
add all characters of a String
multiply all characters of a String (be careful might lead to overflow for large Strings)
If the hash-value is equal, it could still be a collision of two not 'equal' strings. So you still need to make a dedicated compare. (e.g. sort the characters of each string before comparing them).

inplace string zero termination length change

in nim if you set a charachter in the middle of a string to '\0' the len function doesn't update. Is this something we should just not do ever ? Is slicing the recommended way ?
Generally you should not think of Nim strings as null terminated. Even though they are, but that's just an implementation detail to allow seamless C interop.
Also Nim strings are encoding agnostic meaning that a '\0' can be a valid byte within a string. The convention is utf8 though.
To change the length of the string use setLen proc.
var s = "123456"
s.setLen(3)
assert(s == "123")

How do you make a function detect whether a string is binary safe or not

How does one detect if a string is binary safe or not in Go?
A function like:
IsBinarySafe(str) //returns true if its safe and false if its not.
Any comment after this are just things I have thought or attempted to solve this:
I assumed that there must exist a library that already does this but had a tough time finding it. If there isn't one, how do you implement this?
I was thinking of some solution but wasn't really convinced they were good solutions.
One of them was to iterate over the bytes, and have a hash map of all the illegal byte sequences.
I also thought of maybe writing a regex with all the illegal strings but wasn't sure if that was a good solution.
I also was not sure if a sequence of bytes from other languages counted as binary safe. Say the typical golang example:
世界
Would:
IsBinarySafe(世界) //true or false?
Would it return true or false? I was assuming that all binary safe string should only use 1 byte. So iterating over it in the following way:
const nihongo = "日本語abc日本語"
for i, w := 0, 0; i < len(nihongo); i += w {
runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
w = width
}
and returning false whenever the width was great than 1. These are just some ideas I had just in case there wasn't a library for something like this already but I wasn't sure.
Binary safety has nothing to do with how wide a character is, it's mainly to check for non-printable characters more or less, like null bytes and such.
From Wikipedia:
Binary-safe is a computer programming term mainly used in connection
with string manipulating functions. A binary-safe function is
essentially one that treats its input as a raw stream of data without
any specific format. It should thus work with all 256 possible values
that a character can take (assuming 8-bit characters).
I'm not sure what your goal is, almost all languages handle utf8/16 just fine now, however for your specific question there's a rather simple solution:
// checks if s is ascii and printable, aka doesn't include tab, backspace, etc.
func IsAsciiPrintable(s string) bool {
for _, r := range s {
if r > unicode.MaxASCII || !unicode.IsPrint(r) {
return false
}
}
return true
}
func main() {
fmt.Printf("len([]rune(s)) = %d, len([]byte(s)) = %d\n", len([]rune(s)), len([]byte(s)))
fmt.Println(IsAsciiPrintable(s), IsAsciiPrintable("test"))
}
playground
From unicode.IsPrint:
IsPrint reports whether the rune is defined as printable by Go. Such
characters include letters, marks, numbers, punctuation, symbols, and
the ASCII space character, from categories L, M, N, P, S and the ASCII
space character. This categorization is the same as IsGraphic except
that the only spacing character is ASCII space, U+0020.

What is the easiest way to determine if a character is in Unicode range, in Rust?

I'm looking for easiest way to determine if a character in Rust is between two Unicode values.
For example, I want to know if a character s is between [#x1-#x8] or [#x10FFFE-#x10FFFF]. Is there a function that does this already?
The simplest way for me to match a character was this
fn match_char(data: &char) -> bool {
match *data {
'\x01'...'\x08' |
'\u{10FFFE}'...'\u{10FFFF}' => true,
_ => false,
}
}
Pattern matching a character was the easiest route for me, compared to a bunch of if statements. It might not be the most performant solution, but it served me very well.
The simplest way, assuming that they are not Unicode categories (in which case you should be using std::unicode) is to use the regular comparison operators:
(s >= '\x01' && s <= '\x08') || s == '\U0010FFFE' || s == '\U0010FFFF'
(In case you weren't aware of the literal forms of these things, one gets 8-bit hexadecimal literals \xXX, 16-bit hexadecimal literals \uXXXX, and 32-bit hexadecimal literals \UXXXXXXXX. Matter of fact, casts would work fine too, e.g. 0x10FFFE as char, and would be just as efficient; just less easily readable.)

Multiline string literal in Matlab?

Is there a multiline string literal syntax in Matlab or is it necessary to concatenate multiple lines?
I found the verbatim package, but it only works in an m-file or function and not interactively within editor cells.
EDIT: I am particularly after readbility and ease of modifying the literal in the code (imagine it contains indented blocks of different levels) - it is easy to make multiline strings, but I am looking for the most convenient sytax for doing that.
So far I have
t = {...
'abc'...
'def'};
t = cellfun(#(x) [x sprintf('\n')],t,'Unif',false);
t = horzcat(t{:});
which gives size(t) = 1 8, but is obviously a bit of a mess.
EDIT 2: Basically verbatim does what I want except it doesn't work in Editor cells, but maybe my best bet is to update it so it does. I think it should be possible to get current open file and cursor position from the java interface to the Editor. The problem would be if there were multiple verbatim calls in the same cell how would you distinguish between them.
I'd go for:
multiline = sprintf([ ...
'Line 1\n'...
'Line 2\n'...
]);
Matlab is an oddball in that escape processing in strings is a function of the printf family of functions instead of the string literal syntax. And no multiline literals. Oh well.
I've ended up doing two things. First, make CR() and LF() functions that just return processed \r and \n respectively, so you can use them as pseudo-literals in your code. I prefer doing this way rather than sending entire strings through sprintf(), because there might be other backslashes in there you didn't want processed as escape sequences (e.g. if some of your strings came from function arguments or input read from elsewhere).
function out = CR()
out = char(13); % # sprintf('\r')
function out = LF()
out = char(10); % # sprintf('\n');
Second, make a join(glue, strs) function that works like Perl's join or the cellfun/horzcat code in your example, but without the final trailing separator.
function out = join(glue, strs)
strs = strs(:)';
strs(2,:) = {glue};
strs = strs(:)';
strs(end) = [];
out = cat(2, strs{:});
And then use it with cell literals like you do.
str = join(LF, {
'abc'
'defghi'
'jklm'
});
You don't need the "..." ellipses in cell literals like this; omitting them does a vertical vector construction, and it's fine if the rows have different lengths of char strings because they're each getting stuck inside a cell. That alone should save you some typing.
Bit of an old thread but I got this
multiline = join([
"Line 1"
"Line 2"
], newline)
I think if makes things pretty easy but obviously it depends on what one is looking for :)

Resources