Numeric values in YamlDotNet.RepresentationModel - yamldotnet

How do I get numeric values from the RepresentationModel?
Say, after traversing a document, I have a YamlScalarNode. It has a string Value, which I can, of course, try to convert to a number, but I'd expect the YAML library to detect the type and present it as int or double etc. (perhaps via descendants of YamlScalarNode, whose type I could detect).
Is there an official way to do it that I'm missing?
Note that I can't use Serialization: the document structure does not directly map to a class; it can be a recursive definition of arbitrary depth, and the end values are either scalar numbers or sequences of numbers (vectors).
Also, can YamlDotNet handle numerical keys in mappings? This means that keys 1 and 01 should be considered duplicates. I believe YAML specification requires that, but I'm not certain...

The YAML schemas specify how scalars are to be interpreted. Ideally, you would look at the tag of a scalar to establish its type according to the selected schema. However, YamlDotNet does not yet implement them. For now you will have to do that yourself.
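Since the resolution is left to the caller, here is a minimal sketch of doing it yourself. It is written in Python rather than C# (Python is the language of the other code on this page) and assumes the YAML 1.2 core schema and that only plain scalars are passed in, i.e. scalars that were written unquoted and carry no explicit tag (a quoted "01" is a string). The function names are hypothetical, not part of YamlDotNet.

```python
import math
import re

# YAML 1.2 core-schema patterns for plain (unquoted, untagged) scalars.
_NULL = re.compile(r"(?:null|Null|NULL|~)?")
_TRUE = re.compile(r"true|True|TRUE")
_FALSE = re.compile(r"false|False|FALSE")
_INT10 = re.compile(r"[-+]?[0-9]+")
_INT8 = re.compile(r"0o[0-7]+")
_INT16 = re.compile(r"0x[0-9a-fA-F]+")
_FLOAT = re.compile(r"[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?")
_INF = re.compile(r"[-+]?\.(?:inf|Inf|INF)")
_NAN = re.compile(r"\.(?:nan|NaN|NAN)")


def resolve_plain_scalar(text: str):
    """Return the native value a plain scalar denotes under the core schema."""
    if _NULL.fullmatch(text):
        return None
    if _TRUE.fullmatch(text):
        return True
    if _FALSE.fullmatch(text):
        return False
    if _INT10.fullmatch(text):
        return int(text)  # "01" -> 1
    if _INT8.fullmatch(text):
        return int(text[2:], 8)
    if _INT16.fullmatch(text):
        return int(text[2:], 16)
    if _FLOAT.fullmatch(text):
        return float(text)
    if _INF.fullmatch(text):
        return -math.inf if text.startswith("-") else math.inf
    if _NAN.fullmatch(text):
        return math.nan
    return text  # anything else stays a string


def find_duplicate_keys(keys):
    """Return the key texts that resolve to an already-seen (type, value) pair."""
    seen = set()
    duplicates = []
    for key in keys:
        value = resolve_plain_scalar(key)
        # Compare by type as well as value, so "true" (bool) and "1" (int) don't collide.
        marker = (type(value), value)
        if marker in seen:
            duplicates.append(key)
        else:
            seen.add(marker)
    return duplicates
```

With this, mapping keys `1` and `01` both resolve to the int 1 and are reported as duplicates, while `1`, `true` and `1.0` stay distinct because their resolved types differ.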

Related

How to make a TypedDict with integer keys?

Is it possible to use an integer key with TypedDict (similar to dict)?
Trying a simple example:
from typing import TypedDict
class Moves(TypedDict):
    0: int=1
    1: int=2
Throws: SyntaxError: illegal target for annotation
It seems as though only Mapping[str, int] is supported, but I wanted to confirm. It wasn't specifically stated in the PEP docs.
The intent of TypedDict is explicit in the PEP's abstract (emphasis added):
This PEP proposes a type constructor typing.TypedDict to support the use case where a dictionary object has a specific set of string keys, each with a value of a specific type.
and, given that the intended use cases are all annotatable in class syntax, it implicitly applies only to dicts keyed by strings that constitute valid identifiers (things you could use as attribute or keyword argument names), not even strings in general. So, as intended, int keys aren't a thing; TypedDict just enables a class that uses dict-like syntax to access its "attributes" rather than attribute access syntax.
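For contrast, a minimal sketch of the intended use: identifier keys declared with the class syntax. The key names `first` and `second` are hypothetical stand-ins chosen for illustration.

```python
from typing import TypedDict


class Moves(TypedDict):
    # With the class syntax, every key must be a valid Python identifier.
    first: int
    second: int


# Both construction styles work when the keys are identifiers.
from_literal: Moves = {"first": 1, "second": 2}
from_keywords = Moves(first=1, second=2)  # just a plain dict at runtime
```

Note that the keyword form `Moves(first=1, second=2)` is exactly what becomes impossible once the keys are integers.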
The alternative, backwards-compatible functional syntax (kept for compatibility with pre-3.6 Python) does allow this, as well as strings that aren't valid Python identifiers, e.g.:
Moves = TypedDict('Moves', {0: int, 1: int})
you could only construct it with dict literals (e.g. Moves({0: 123, 1: 456})) because the cleaner keyword syntax like Moves(0=123, 1=456) doesn't work. And even though that technically works at runtime (it's all just dicts under the hood after all), the actual type-checkers that validate your type correctness may not support it (because the intent and documented use exclusively handles strings that constitute valid identifiers).
Point is, don't do this. For the simple case you're describing here (consecutive integer "keys" starting from zero, where each position has independent meaning and may or may not differ by type), you really just want a tuple anyway:
Moves = typing.Tuple[int, int] # Could be [int, str] if index 1 should be a string
would be used for annotations the same way, and your actual point of use in the code would just be normal tuple syntax (return 1, 2).
If you really want to be able to use the name Moves when creating instances, on 3.9+ you could use PEP 585 to do (no import required):
Moves = tuple[int, int]
allowing you to write:
return Moves((1, 2))
when you want to make an "instance" of it. No runtime checking is involved (it's roughly equivalent to running tuple((1, 2)) at runtime), but static type-checkers should understand the intent.
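Putting the tuple alternative together, a short sketch for Python 3.9+; the function name `next_moves` is hypothetical.

```python
Moves = tuple[int, int]  # PEP 585 alias, Python 3.9+


def next_moves() -> Moves:
    # Plain tuple syntax at the point of use; type checkers see a Moves.
    return 1, 2


explicit = Moves((1, 2))  # runs tuple((1, 2)) at runtime; nothing is checked
first, second = next_moves()  # positions replace the integer "keys"
```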

Storing a list of mixed types in Cassandra

In Cassandra, when specifying a table and fields, one has to give each field a type (text, int, boolean, etc.). The same applies to collections: you have to lock a collection to a specific type (set<text> and such).
I need to store a list of mixed types in Cassandra. The list may contain numbers, strings and booleans. So I would need something like list<?>.
Is this possible in Cassandra, and if not, what workaround would you suggest for storing a list of mixed-type items? I sketched a few, but none of them seem the right way to go...
Cassandra's CQL interface is strictly typed, so you will not be able to create a table with an untyped collection column.
I basically see two options:
Create a list<text> field and convert everything to text (not too nice, I agree)
Use the Thrift API and store everything as is.
As suggested at http://www.mail-archive.com/user#cassandra.apache.org/msg37103.html I decided to encode the various values into binary and store them in a list<blob>. This still allows querying the collection values (in Cassandra 2.1+); one just needs to encode the values in the query the same way.
In Python, the simplest way is probably to pickle and hexify when storing data:
pickle.dumps('Hello world').hex()
And to load it:
pickle.loads(bytes.fromhex(item))
Using pickle ties the implementation to Python, but it automatically converts back to the correct type (int, string, boolean, etc.) when loading, so it's convenient.
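If tying the stored data to Python is a concern, one option (a swapped-in technique, not what the answer above used) is a small tagged encoding: one type byte followed by a fixed binary layout that any language can read. The tag letters, the 64-bit integer layout and the function names below are hypothetical choices; because of the `>q` layout, Python ints outside the signed 64-bit range are rejected by `struct.pack`.

```python
import struct

# Hypothetical one-byte type tags; readers and writers just have to agree on them.
TAG_BOOL, TAG_INT, TAG_FLOAT, TAG_TEXT = b"b", b"i", b"f", b"s"


def encode(value) -> bytes:
    """Encode one list item as a blob: tag byte + payload."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return TAG_BOOL + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        return TAG_INT + struct.pack(">q", value)  # 8-byte big-endian signed
    if isinstance(value, float):
        return TAG_FLOAT + struct.pack(">d", value)  # IEEE 754 double
    if isinstance(value, str):
        return TAG_TEXT + value.encode("utf-8")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(blob: bytes):
    """Invert encode(), restoring the original Python type."""
    tag, payload = blob[:1], blob[1:]
    if tag == TAG_BOOL:
        return payload == b"\x01"
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_FLOAT:
        return struct.unpack(">d", payload)[0]
    if tag == TAG_TEXT:
        return payload.decode("utf-8")
    raise ValueError(f"unknown tag: {tag!r}")


mixed = [42, "Hello world", True, 3.5]
blobs = [encode(v) for v in mixed]  # what would go into the list<blob> column
```

Because the encoding is deterministic, a value used in a query can be passed through the same `encode()` so its bytes match what was stored.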

cql binary protocol and named bound variables in prepared queries

Imagine I have a simple CQL table
CREATE TABLE test (
    k int PRIMARY KEY,
    v1 text,
    v2 int,
    v3 float
)
There are many cases where one would want to make use of the schema-less nature of Cassandra, set only some of the values, and do, for example, a
INSERT into test (k, v1) VALUES (1, 'something');
When writing an application to write to such a CQL table in a Cassandra cluster, the need to do this using prepared statements immediately arises, for performance reasons.
This is handled in different ways by different drivers. The Java driver, for example, has introduced (with the help of a change to the CQL binary protocol) support for named bound variables, which is very practical: CASSANDRA-6033
What I am wondering is what is the correct way, from a binary protocol point of view, to provide values only for a subset of bound variables in a prepared query?
Values in fact are provided to a prepared query by building a values list as described in
4.1.4. QUERY
[...]
Values. In that case, a [short] <n> followed by <n> [bytes]
values are provided. Those value are used for bound variables in
the query.
Please note the definition of [bytes]
[bytes] A [int] n, followed by n bytes if n >= 0. If n < 0,
no byte should follow and the value represented is `null`.
From this description I get the following:
"Values" in QUERY offers no ways to provide a value for a specific column. It is just an ordered list of values. I guess the [short] must correspond to the exact number of bound variables in a prepared query?
All values, no matter what types they are, are represented as [bytes]. If that is true, any interpretation of the [bytes] value is left to the server (conversion to int, short, text,...)?
Assuming I got this all right, I wonder if a 'null' [bytes] value can be used to just 'skip' a bound variable and not assign a value for it.
I tried this by patching the cpp driver (which is what I am interested in). The queries execute, but when I run a SELECT from cqlsh I don't see the 'null' string representation for the empty fields, so I wonder whether this is a hack that just happens not to crash, or the intended way to do this.
I am sorry, but I really don't think I can just download the Java driver and see how named bound variables are implemented! :(
---------- EDIT - SOLVED ----------
My assumptions were right, and support for skipping a field in a prepared query has now been added to the cpp driver (see here) by using a null [bytes] value.
What I am wondering is what is the correct way, from a binary protocol point of view, to provide values only for a subset of bound variables in a prepared query?
You need to prepare a query that only inserts/updates the subset of columns that you're interested in.
"Values" in QUERY offers no ways to provide a value for a specific column. It is just an ordered list of values. I guess the [short] must correspond to the exact number of bound variables in a prepared query?
That's correct. The ordering is determined by the column metadata that Cassandra returns when you prepare a query.
All values, no matter what types they are, are represented as [bytes]. If that is true, any interpretation of the [bytes] value is left to the server (conversion to int, short, text,...)?
That's also correct. The driver will use the returned column metadata to determine how to convert native values (strings, UUIDs, ints, etc.) to a binary (bytes) format. Cassandra does the inverse of this operation server-side.
Assuming I got this all right, I wonder if a 'null' [bytes] value can be used to just 'skip' a bound variable and not assign a value for it.
A null column insertion is interpreted as a deletion.
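To make the quoted wire format concrete, a minimal Python sketch of how a client might serialize the Values list from section 4.1.4: a [short] count (2-byte unsigned), then one [bytes] per bound variable ([int] length, 4-byte signed, followed by the payload), where a length of -1 stands for null. The helper names are hypothetical; the int and text serializations (4-byte big-endian, UTF-8) are the CQL encodings for those column types, i.e. the conversion the answer says the driver performs from the prepared metadata. Protocol v4 later added a separate "unset" value (length -2) that leaves a column untouched instead of deleting it; this sketch only covers the null case quoted above.

```python
import struct
from typing import Optional, Sequence


def encode_bytes(value: Optional[bytes]) -> bytes:
    """Serialize one [bytes]: a signed 4-byte big-endian length, then the payload.
    None becomes length -1 with no payload, i.e. the protocol's null."""
    if value is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(value)) + value


def encode_values(values: Sequence[Optional[bytes]]) -> bytes:
    """Serialize the Values section: a 2-byte unsigned count, then each value in order.
    The order must follow the bound-variable metadata returned by PREPARE."""
    return struct.pack(">H", len(values)) + b"".join(encode_bytes(v) for v in values)


# Bound variables of a query prepared as: INSERT INTO test (k, v1) VALUES (?, ?)
k = struct.pack(">i", 1)          # CQL int: 4-byte big-endian signed
v1 = "something".encode("utf-8")  # CQL text: UTF-8 bytes
payload = encode_values([k, v1])
```

Preparing the statement with only the columns you intend to write, as the answer recommends, keeps the count equal to the number of markers without sending nulls (and therefore deletions) for v2 and v3.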
What I was trying to achieve has now been implemented (see here), based on the principle I described.

Data Structure to use instead of hash_map

I want to make an array containing three wide character arrays such that one of them is the key.
Something like <LPWCH, LPWCH, LPWCH>.
hash_map only lets me use a pair: wKey and the element associated with it. Is there another data structure that lets me do this?
This set will be updated by different threads almost simultaneously. That's the reason why I don't want to use a class or another struct to define the remaining two wide character arrays.
You can use LPWCH as a key and std::pair<LPWCH, LPWCH> as an element.
Using any of the LP-typedefs is not a good idea: you would only be comparing the pointers, not the strings.
LPWCH is nothing but a WCHAR*, which is ultimately just a pointer. When you compare two pointers, you are comparing where they point, not what they point to.
You either need to attach another comparer to your map/hash_map, or use an actual string data type (like std::wstring or CString).
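For comparison, a Python analogue (not C++ code) of the layout the answers suggest: a key mapped to a pair, with keys compared by content rather than by address, which in C++ means keying the map on a string type such as std::wstring instead of LPWCH. The class name `PairTable` is hypothetical, and the lock is an assumption reflecting the question's note that several threads update the set.

```python
import threading
from typing import Dict, Optional, Tuple


class PairTable:
    """Maps a key string to the two strings stored alongside it."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, str]] = {}
        self._lock = threading.Lock()  # serialize updates coming from several threads

    def put(self, key: str, first: str, second: str) -> None:
        with self._lock:
            self._data[key] = (first, second)

    def get(self, key: str) -> Optional[Tuple[str, str]]:
        with self._lock:
            return self._data.get(key)


table = PairTable()
table.put("alpha", "one", "two")
# A separately built string with the same characters finds the entry,
# because dict keys are compared by content, not by object identity.
lookup = "".join(["al", "pha"])
```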

Scalar vs. primitive data type - are they the same thing?

In various articles I have read, there are sometimes references to primitive data types and sometimes there are references to scalars.
My understanding of each is that they are data types of something simple like an int, boolean, char, etc.
Is there something I am missing that means you should use particular terminology or are the terms simply interchangeable?
The Wikipedia pages for each don't show anything obvious.
If the terms are simply interchangeable, which is the preferred one?
I don't think they're interchangeable. They are frequently similar, but differences do exist, and they seem to lie mainly in what each term is contrasted with and what is relevant in context.
Scalars are typically contrasted with compounds, such as arrays, maps, sets, structs, etc. A scalar is a "single" value - integer, boolean, perhaps a string - while a compound is made up of multiple scalars (and possibly references to other compounds). "Scalar" is used in contexts where the relevant distinction is between single/simple/atomic values and compound values.
Primitive types, however, are contrasted with e.g. reference types, and are used when the relevant distinction is "Is this directly a value, or is it a reference to something that contains the real value?", as in Java's primitive types vs. references. I see this as a somewhat lower-level distinction than scalar/compound, but not quite.
It really depends on context (and frequently what language family is being discussed). To take one, possibly pathological, example: strings. In C, a string is a compound (an array of characters), while in Perl, a string is a scalar. In Java, a string is an object (or reference type). In Python, everything is (conceptually) an object/reference type, including strings (and numbers).
There's a lot of confusion and misuse of these terms. Often one is used to mean another. Here is what those terms actually mean.
"Native" refers to types that are built into the language, as opposed to being provided by a library (even a standard library), regardless of how they're implemented. Perl strings are part of the Perl language, so they are native in Perl. C provides string semantics over pointers to chars using a library, so pointer to char is native, but strings are not.
"Atomic" refers to a type that can no longer be decomposed. It is the opposite of "composite". Composites can be decomposed into a combination of atomic values or other composites. Native integers and floating point numbers are atomic. Fractions, complex numbers, containers/collections, and strings are composite.
"Scalar" -- and this is the one that confuses most people -- refers to values that can express scale (hence the name), such as size, volume, counts, etc. Integers, floating point numbers, and fractions are scalars. Complex numbers, booleans, and strings are NOT scalars. Something that is atomic is not necessarily scalar and something that is scalar is not necessarily atomic. Scalars can be native or provided by libraries.
Some types have odd classifications. BigNumber types, usually implemented as an array of digits or integers, are scalars, but they're technically not atomic. They can appear to be atomic if the implementation is hidden and you can't access the internal components. But the components are only hidden, so the atomicity is an illusion. They're almost invariably provided in libraries, so they're not native, but they could be. In the Mathematica programming language, for example, big numbers are native and, since there's no way for a Mathematica program to decompose them into their building blocks, they're also atomic in that context, despite the fact that they're composites under the covers (where you're no longer in the world of the Mathematica language).
These definitions are independent of the language being used.
Put simply, it would appear that a 'scalar' type refers to a single item, as opposed to a composite or collection. So scalars include both primitive values as well as things like an enum value.
http://ee.hawaii.edu/~tep/EE160/Book/chap5/section2.1.3.html
Perhaps the term 'scalar' is a throwback to C:
where scalars are primitive objects which contain a single value and are not composed of other C++ objects
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1995/N0774.pdf
I'm curious about whether this refers to whether these items would have a value of 'scale'? - Such as counting numbers.
I like Scott Langeberg's answer because it is concise and backed by authoritative links. I would up-vote Scott's answer if I could.
I suppose that a "primitive" data type could be considered a primary data type, so that secondary data types are derived from primary data types. The derivation is through combining, such as a C++ struct. A struct can be used to combine data types (such as an int and a char) to get a secondary data type. The struct-defined data type is always a secondary data type. Primary data types are not derived from anything; rather, they are a given in the programming language.
A parallel to "primitive" meaning "primary" is the term "regular expression". I think "regular" can be understood as "regulating": you have an expression that regulates the search.
Scalar etymology (http://www.etymonline.com/index.php?allowed_in_frame=0&search=scalar&searchmode=none) means ladder-like. I think the way this relates to programming is that a ladder has only one dimension: How many rungs from the end of the ladder. A scalar data type has only one dimension, thus represented by a single value.
I think in usage, primitive and scalar are interchangeable. Is there any example of a primitive that is not scalar, or of a scalar that is not primitive?
Although interchangeable, primitive refers to the data-type being a basic building block of other data types, and a primitive is not composed of other data types.
Scalar refers to its having a single value. Scalar contrasts with the mathematical vector. A vector is not represented by a single value because (using one kind of vector as an example) one value is needed to represent the vector's direction and another value needed to represent the vector's magnitude.
Reference links:
http://whatis.techtarget.com/definition/primitive
http://en.wikipedia.org/wiki/Primitive_data_type
Being scalar has nothing to do with the language, whereas being primitive is all dependent on the language. The two have nothing to do with each other.
A scalar data type is something that has a finite set of possible values, following some scale, i.e. each value can be compared to any other value as either equal, greater or less. Numeric values (floating point and integer) are the obvious examples, while discrete/enumerated values can also be considered scalar. In this regard, boolean is a scalar with 2 discrete possible values, and normally it makes sense that true > false. Strings, regardless of programming language, are technically not scalars.
Now what is primitive depends on the language. Every language classifies what its "basic types" are, and these are designated as its primitives. In JavaScript, string is primitive, despite it not being a scalar in the general sense. But in some languages a string is not primitive. To be a primitive type, the language must be able to treat it as immutable, and for this reason referential types such as objects, arrays, collections, cannot be primitive in most, if not all, languages.
In C, enumeration types, characters, and the various representations of integers form a more general type class called scalar types. Hence, the operations you can perform on values of any scalar type are the same as those for integers.
The null type is the only thing that most realistically conforms to the definition of a "scalar type". Even the serialization of 'None' as 'N.' fitting into a 16-bit word, which is traditionally scalar (or even into a single bit, which has multiple possible values), isn't a "single datum".
Every primitive is scalar, but not vice versa. DateTime is scalar, but not primitive.
