Large arrays with default values in Minizinc data file? - modeling

How do I model a large array, e.g. "array[1..10000000] of int: A;", which mostly contains 0s? Is there a way to specify a "default" value to MiniZinc in order to reduce the data file size?

MiniZinc is a declarative language in which you cannot change the value of an assigned variable. This makes the question a bit unusual: there is no need for a default value, since the value of a variable can't be changed anyway.
I would use an array comprehension, e.g. [if i == 3 then i else 0 endif | i in 1..10000000], and switch the condition and result of the if-expression depending on what you need.
There might be another way of representing the array? An array that is mostly empty does not sound very efficient.
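If the nonzero entries are known when the data file is generated, one option is to emit the comprehension itself instead of ten million literals. A minimal Python sketch of a generator for such a data file (the nonzeros mapping and the output shape are assumptions for illustration, not part of the question):

```python
# Hypothetical sparse data: index -> nonzero value.
nonzeros = {3: 7, 42: 1, 9999999: 5}

def sparse_dzn(name, n, nonzeros):
    """Emit a MiniZinc assignment that expands to a mostly-zero array."""
    cases = " elseif ".join(
        f"i == {i} then {v}" for i, v in sorted(nonzeros.items())
    )
    return f"{name} = [if {cases} else 0 endif | i in 1..{n}];"

print(sparse_dzn("A", 10000000, nonzeros))
```

The generated line stays proportional to the number of nonzero entries rather than to the array length.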

Python: How is a ~ used to exclude data?

In the below code I know that it is returning all records that are outside of the buffer, but I'm confused as to the mechanics of how that is happening.
I see that there is a "~" (aka a bitwise not) being used. From some googling, my understanding of ~ is that it returns the inverse of each bit in the input it is passed, e.g. if a bit is 0 it returns 1. Is this correct? If not, could someone please ELI5?
Could someone please explain the actual mechanics of how the below code is returning records that are outside of the "my_union" buffer?
NOTE: hospitals and collisions are just geo dataframes.
coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10000)
my_union = coverage.geometry.unary_union
outside_range = collisions.loc[~collisions["geometry"].apply(lambda x: my_union.contains(x))]
~ does indeed perform a bitwise not in Python. But here it is used to perform a logical not on each element of a pandas Series of booleans. See this answer for an example.
Let's assume the collisions GeoDataFrame contains points, but it will work similarly for other types of geometries.
Let me further change the code a bit:
coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10000)
my_union = coverage.geometry.unary_union
within_my_union = collisions["geometry"].apply(lambda x: my_union.contains(x))
outside_range = collisions.loc[~within_my_union]
Then:
my_union is a single (Multi)Polygon.
my_union.contains(x) returns a boolean indicating whether the point x is within the my_union MultiPolygon.
collisions["geometry"] is a pandas Series containing the points.
collisions["geometry"].apply(lambda x: my_union.contains(x)) will run my_union.contains(x) on each of these points.
This will result in another pandas Series containing booleans, indicating whether each point is within my_union.
~ then negates these booleans, so that the Series now indicates whether each point is not within my_union.
collisions.loc[~within_my_union] then selects all the rows of collisions where the entry in ~within_my_union is True, i.e. all the points that don't lie within my_union.
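The elementwise negation in the last step can be seen with plain NumPy, which pandas Series are built on (a small illustrative sketch, not the GeoDataFrame code itself):

```python
import numpy as np

# e.g. "point is within my_union" for three points
inside = np.array([True, False, True])

# On a boolean array, ~ flips each element (logical not),
# rather than doing integer bitwise arithmetic:
outside = ~inside
print(outside)  # [False  True False]

# Contrast with a plain Python int, where ~ really is bitwise not:
print(~int(True))  # -2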
I'm not sure exactly what you mean by "the actual mechanics", and it's hard to know for sure without seeing input and output, but I had a go at explaining it below in case it helps:
All rows of the collisions dataframe whose geometry lies within my_union will be excluded from the newly created outside_range dataframe.

What is the lookup / access time complexity of VBA Dictionary?

What is the time complexity of VBA Dictionary lookup or access?
I couldn't find the documentation on it.
EDIT - Some comments from here suggested that it is O(1), which I think is true; however, there are no online references for it.
Set dict = New Scripting.Dictionary
dict("Apples") = 50
dict("Oranges") = 100
dict("Bananas") = 30
'lookup
dict.Exists("Apples")
'access
dict("Oranges")
According to the Microsoft documentation,
A Dictionary object is the equivalent of a PERL associative array.
And according to the Perl documentation, associative arrays are called hashes:
Hashes (Associative Arrays)
Lastly, this Perl documentation states:
If you evaluate a hash in scalar context, it returns a false value if the hash is empty. If there are any key/value pairs, it returns a true value. A more precise definition is version dependent.
Prior to Perl 5.25 the value returned was a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen.
As of Perl 5.25 the return was changed to be the count of keys in the hash. If you need access to the old behavior you can use Hash::Util::bucket_ratio() instead.
Since the Dictionary is thus a hash table, this implies amortized O(1) for lookup and access.
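For comparison, here is the same sequence of operations in Python, whose built-in dict is likewise a hash table with amortized O(1) lookup (a rough analog for illustration, not VBA):

```python
# Rough Python analog of the Scripting.Dictionary example.
d = {}
d["Apples"] = 50
d["Oranges"] = 100
d["Bananas"] = 30

# lookup (the analog of dict.Exists("Apples"))
print("Apples" in d)   # True

# access
print(d["Oranges"])    # 100
```

Both the membership test and the keyed access hash the key once and probe the table, independent of the number of entries.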

Why LinkedList as a bucket implementation for HashMap?

I understand the workings of HashMap (collisions and everything). I am trying to understand the deeper mechanics and the choice of a linked list for the entry bucket rather than, say, an array (making it a two-dimensional matrix), since searching through both is an O(n) operation. I assumed one factor that might favor a linked list is its O(1) insertion. Is that a correct assumption?
In Java, the linked list is the bucket: each slot in the table holds a linked list of entry elements. Each node knows the next node in the list, until you reach the end, where the next reference is null.
Also check:
How does Java's Hashmap work internally?
An array has a predetermined size. So if you use an array, the hash table has a predetermined size for each of its slots: every possible bucket is allocated up front, and your hash table becomes big. This can make sense if you have a huge amount of memory; if not, use a linked list and walk through it to find a match.
It's because you do not add at the end of the linked list but at the head, which makes insertion O(1).
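The chaining idea can be sketched as follows. This is a toy Python sketch of a hash map with linked-list buckets and head insertion, not the actual JDK implementation:

```python
class Node:
    """One entry in a bucket's singly linked chain."""
    def __init__(self, key, value, nxt=None):
        self.key, self.value, self.nxt = key, value, nxt

class ChainedHashMap:
    """Toy hash map: each bucket is a singly linked list."""
    def __init__(self, capacity=16):
        self.buckets = [None] * capacity

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node:                      # update in place if key exists
            if node.key == key:
                node.value = value
                return
            node = node.nxt
        # Prepend at the head of the chain: an O(1) pointer swap,
        # no shifting or resizing of a backing array.
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[self._index(key)]
        while node:
            if node.key == key:
                return node.value
            node = node.nxt
        raise KeyError(key)
```

Note that checking for an existing key walks the chain anyway; the O(1) head insertion matters for the append step itself, and chains stay short when the hash spreads keys well.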
[Answering the question below: no, I'm not looking for a solution :) I just wanted to understand why the choice was made for buckets to be an internal implementation of a linked list and not an array-based list (I didn't mean a fixed-size array). Linked lists offer a node-based implementation that certainly helps when there are a large number of inserts into the data structure, but when you are adding to the end of the list, shouldn't any growable array-based implementation for the bucket suffice?]

Check if values of two string-type items are equal in a Zabbix trigger

I am monitoring an application using Zabbix and have defined a custom item which returns a string value. Since my item's values are actually checksums, they will only contain the characters [0-9a-f]. Two mirror copies of my application are running on two servers for the sake of redundancy. I would like to create a trigger which would take the item values from both machines and fire if they are not the same.
For now, let's set aside the window during which values change (it's not an atomic operation, so the system may briefly see an inconsistent state, which is not a real error), since I could work around it by looking at several previous values.
The crux is: how to write a Zabbix trigger expression which could compare for equality the string values of two items (the same item on two mirror hosts, actually)?
Both according to the fine manual and as I confirmed in practice, the standard operators = and # only work on numeric values, so I can't just write the natural {host1:myitem[param].last(0)} # {host2:myitem[param].last(0)}. Functions such as change() or diff() can only compare values of the same item at different points in time. Functions such as regexp() can only compare the item's value with a constant string/regular expression, not with another item's value. This is very limiting.
I could move the comparison logic into the script which my custom item executes, but it's a bit messy and not elegant, so if at all possible, I would prefer to have this logic inside my Zabbix trigger.
Perhaps despite the limitations listed above, someone can come up with a workaround?
Workaround:
{host1:myitem[param].change(0)} # {host2:myitem[param].change(0)}
When only one of the servers sees a modification since the previously received value, an event is triggered.
From the Zabbix Manual,
change (float, int, str, text, log)
Returns difference between last and previous values.
For strings:
0 - values are equal
1 - values differ
I believe, and am struggling with this exact situation myself, that the correct way to do this is via calculated items.
You want to create a new ITEM, not trigger (yet!), that performs a calculated comparison on multiple item values (Strings Difference, Numbers within range, etc).
Once you have that item, have the calculation give you a value you can trigger off of. You can use any trigger functions in your calculation along with arithmetic operations.
Now to the issue (which I've submitted a feature request for because this is extremely limiting), most trigger expressions evaluate to a number or a 0/1 bool.
I think I have a solution for my problem, which is that I am tracking a version number from a webpage, e.g. v2.0.1. I believe I can use string manipulation and regex in calculated items to convert my string values into multiple numeric values, which would be a breeze to compare.
But again, this is convoluted and painful.
If you want my advice, have yourself or a dev look at the code for trigger expressions and see if you can submit a patch adding one trigger function for simple string comparison (difference, length, possible conversion to numeric values using binary and/or hex combinations, etc.).
I'm trying to work on a patch myself, but I don't have time as I have so much monitoring to implement and while zabbix is powerful, it's got several huge flaws. I still believe it's the best monitoring system out there.
Simple answer: Create a UserParameter until someone writes a patch.
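As a sketch of that UserParameter approach, the comparison logic can live in a small external script. The host names and item key below are placeholders, and fetching values via zabbix_get is one possible design, not the only one:

```python
import subprocess

def item_value(host, key):
    """Fetch an item's current value from a host's agent via zabbix_get.

    "host" and "key" below are placeholders for your real hosts/item.
    """
    out = subprocess.run(
        ["zabbix_get", "-s", host, "-k", key],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def differ(a, b):
    """Return 0 if the two checksum strings match, 1 otherwise."""
    return int(a.strip() != b.strip())

# The UserParameter would then report, e.g.:
#   differ(item_value("host1", "myitem[param]"),
#          item_value("host2", "myitem[param]"))
# and the trigger becomes a plain numeric check on this new item.
```

This sidesteps the string-comparison limitation by handing Zabbix a number it already knows how to trigger on.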
You could change your items to return numbers instead of strings. Because your items are checksums using only the characters [0-9a-f], they are numbers written in hexadecimal. So you would need to convert the checksum to a decimal number.
Because the checksum is a big number, you would need to limit the hexadecimal number to 8 characters for the Numeric (unsigned) type before conversion. Or, if you wanted higher precision, you could use float (but that would be more work):
Numeric (unsigned) - 64bit unsigned integer
Numeric (float) - floating point number
Negative values can be stored.
Allowed range (for MySQL): -999999999999.9999 to 999999999999.9999 (double(16,4)).
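The truncate-and-convert step is simple. A Python sketch (the digit limit is the one suggested above; note the truncation loses precision, so two different checksums can in principle map to the same number):

```python
def checksum_to_uint(checksum, max_hex_digits=8):
    """Truncate a hex checksum and convert it to an unsigned integer.

    8 hex digits (32 bits) keep the result well within Zabbix's
    64-bit Numeric (unsigned) range.
    """
    return int(checksum[:max_hex_digits], 16)

# Example with an MD5-style checksum string:
print(checksum_to_uint("d41d8cd98f00b204e9800998ecf8427e"))
```

The items on both hosts would apply the same conversion, after which the standard numeric operators work in the trigger.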
I wish Zabbix had a .hashedUnsigned() function that would compute the hash of a string and return it as a number. Such a function should be easy to write.

Core Data - Optional attributes and performance

Per the Core Data Programming Guide:
You can specify that an attribute is optional—that is, it is not required to have a value. In general, however, you are discouraged from doing so—especially for numeric values (typically you can get better results using a mandatory attribute with a default value—in the model—of 0). The reason for this is that SQL has special comparison behavior for NULL that is unlike Objective-C's nil. NULL in a database is not the same as 0, and searches for 0 will not match columns with NULL.
I have always made numeric values non-optional, but have not for dates and strings. It is convenient in my code to base logic on dates and/or strings being nil.
Based on the above recommendations, I am considering making everything in my database non-optional. For dates I could set the model default to a value of 0 and for strings a model default of nothing (""). Then, in my code I could test dates for [date timeIntervalSince1970] != 0 and strings for string.length != 0.
The question is, for a relatively small database, does this really matter from a Core Data performance standpoint? And what is the tradeoff if the attribute in question will never be directly queried via a predicate?
I have not seen any performance problems on small to medium sized data sets. I suspect that this is something you would deal with in the performance stage of your application.
Personally, I use the same logic: non-numeric attributes are optional when it makes sense, as that does indeed make the code easier, which in turn gives me more time to optimize later.