How to set geometry of feature as union of multiple geometries - geospatial

I want to organise my administrative geographies based on the GeoSPARQL standard. This question is motivated by the obvious consideration that administrative units are hierarchical and nested. So, for example, the geo:Feature (see http://www.opengis.net/ont/geosparql#) Italy geo:hasGeometry IT.
Then I have 20 other features corresponding to the 20 regions (first-level administrative divisions).
Each feature has a corresponding geometry (based on the ISO_3166-2:IT standard they are named: IT-65, IT-77, IT-78, etc.)
Let's say I have three levels of administrative divisions in Italy, so ADM0 has n=1, ADM1 n=20, ADM2 n=100 and ADM3 n=10000. In total, I would have to define 10121 geometries but, in practice, I only need 10000, since the geometry of the ADM0 feature is simply the union of the 20 geometries of the ADM1 features. How do I formalise this in an RDF serialisation?
ex:Italy a geo:Feature ;
geo:hasGeometry ex:IT .
Or can I instead point to the multiple geometries?
ex:Italy a geo:Feature;
geo:hasGeometry ex:IT-65 ;
...
geo:hasGeometry ex:IT-78 .
Would it be implicit in the second declaration that ex:Italy is not defined by any individual geometry but only by the full union of all the geometries?

Actually, the best way to get the union of geometries is the union function (geof:union) defined by the OGC in GeoSPARQL.
See the documentation: GeoFunction UNION
As usual, make sure all geometries share the same CRS before taking the union.
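As a rough sketch of how this could look in practice: geof:union takes two geometry literals, so combining more than two regions means nesting the calls. The Python helper below only builds the query text. The ex: namespace, the helper names, and the assumption that each geometry exposes its literal through geo:asWKT are illustrative and not taken from the question; whether your triple store implements geof:union has to be checked separately.

```python
from functools import reduce

# Assumed prefixes. The ex: namespace is a placeholder, not from the question.
PREFIXES = """\
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex:   <http://example.org/>
"""


def nested_union(variables):
    """Fold SPARQL variables into nested binary geof:union calls."""
    return reduce(lambda acc, var: f"geof:union({acc}, {var})", variables)


def build_union_query(geometry_names):
    """Build a SELECT query that unions the WKT literals of the named geometries."""
    variables = [f"?wkt{i}" for i in range(len(geometry_names))]
    patterns = "\n".join(
        f"  ex:{name} geo:asWKT {var} ."
        for name, var in zip(geometry_names, variables)
    )
    return (
        f"{PREFIXES}"
        f"SELECT ({nested_union(variables)} AS ?italy)\n"
        f"WHERE {{\n{patterns}\n}}"
    )


# Only three of the twenty regions are listed here, to keep the output short.
query = build_union_query(["IT-65", "IT-77", "IT-78"])
print(query)
```

The resulting ?italy literal could then be stored as the geometry of ex:IT, or computed on demand each time it is needed.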

Related

Formal UML representation of reshaping a data frame

To document the restructuring of a data table from "wide" (one criterion column per score) to "long" (one score column plus one criterion column), my first reaction was to use a UML class diagram.
I am aware that by changing the structure of the data table, the class attributes have not changed.
My first question is whether the wide or the long version is the more correct representation of the data table?
My second question is whether it would make sense to relate the two representations - and if so, by which relationship?
My third question is whether something other than a UML class diagram would be more suitable for documenting the reshaping (data preprocessing before showing the distribution as a box plot in R).
You jumped a little too fast from the table to the UML. This makes your question very confusing, because what is wide as a table is represented long as a class, and vice versa.
Reformulating your problem, it appears that you are refactoring some tables. The wide table shows several values for the same student in the same row. This means that the maximum number of exercises is fixed by the table structure:
ID   Ex1  Ex2  Ex3  ...  Ex N
------------------------------
111  A    A    A    ...  A
119  A    C    -    ...  D
127  B    F    B    ...  F
The long table has fewer columns, and each row shows only 1 specific score of 1 specific student:
ID   #  Score
---------------
111  1  A
111  2  A
111  3  A
...
111  N  A
119  1  A
119  2  C
...
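To make the relationship between the two layouts concrete, here is a minimal Python sketch of the wide-to-long transformation, using the first three exercise columns of the example above. The function name is made up for illustration, and dropping the "-" placeholder instead of keeping an empty row is an assumption about how missing scores should be handled.

```python
def wide_to_long(wide_rows, missing="-"):
    """Turn (id, [score1, score2, ...]) rows into (id, exercise_number, score) rows."""
    long_rows = []
    for student_id, scores in wide_rows:
        for number, score in enumerate(scores, start=1):
            if score != missing:  # assumption: a missing score produces no row
                long_rows.append((student_id, number, score))
    return long_rows


wide = [
    (111, ["A", "A", "A"]),
    (119, ["A", "C", "-"]),
    (127, ["B", "F", "B"]),
]

for row in wide_to_long(wide):
    print(row)
```

Note that the long form has no fixed upper bound on the number of exercises: a fourth score is just one more row.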
You can model this structure in a UML class diagram. But in UML, the table layout doesn't matter: that's an issue for the object-relational mapping, and you could perfectly well have one class model (with an attribute or an association having a multiplicity of 1..N) that could be implemented using either the wide or the long version. If the multiplicity were 1..*, only the long option would work.
Now to your questions:
Both representations are correct; they just have different characteristics. The wide one is inflexible, since the maximum number of scores is fixed by the table structure. Also, adding a new score in fact requires updating an existing record rather than inserting a new one, so the two layouts do not behave the same under concurrent access. The long one is a little more complex to use if you want to show the history of a student's scores in a single row.
Yes, it makes sense to relate both, especially if you're documenting a transformation of the first into the second.
UML would not necessarily add value here. If you're really only concerned with tables and values, you could just as well use an Entity/Relationship diagram. But UML has the advantage of allowing database modelling as well, and it lets you add behavioral aspects, if not now, then later. You could consider using the non-standard «table» stereotype to make clear that you are modelling a table (so a low-level view of your design).

What is the difference between st_isvalid and st_issimple?

In PostGIS there are two very similar functions. One is st_isValid, the other one is st_isSimple. I'd like to understand the difference between the two for polygons. For st_isValid we have:
Some of the rules of polygon validity feel obvious, and others feel arbitrary (and in fact, are arbitrary).
Polygon rings must close.
Rings that define holes should be inside rings that define exterior boundaries.
Rings may not self-intersect (they may neither touch nor cross one another).
Rings may not touch other rings, except at a point.
For the st_isSimple we've got:
Returns true if this Geometry has no anomalous geometric points, such as self intersection or self tangency. For more information on the OGC's definition of geometry simplicity and validity, refer to "Ensuring OpenGIS compliancy of geometries"
Does it mean that any valid polygon is automatically simple?
Both functions check compliance with similar OGC definitions, but they are defined for geometries of different dimensions.
By OGC definition
a [Multi]LineString can (should) be simple
a [Multi]Polygon can (should) be valid
This implies that
a simple [Multi]LineString is always considered valid
a valid [Multi]Polygon is always considered simple (as in, it must have at least one simple closed LineString ring)
Thus the answer is yes.
Strictly speaking, applying the OGC-defined check to the 'wrong' geometry type is meaningless.
PostGIS, however, liberally extends the functionality of ST_IsValid to use the correct checks for all geometry types.

What is the correct graphical representation of an empty UML class?

Most UML tools represent an empty UML class with empty compartments for both attributes and operations. But looking at the UML Infrastructure and Superstructure, there are a lot of empty classes shown as one single rectangle containing only the name of the class. I cannot find a clear statement of which of the two representations is absolutely correct.
So which graphical representation is correct - both or just one of them - and where is your information coming from?
Though @JimL's answer is correct, here is the section from Superstructures 2.5 that explains the use of compartments:
The model in Figure B.6 specializes UMLDiagramElement and UMLShape into UMLCompartmentableShape and UMLCompartment, respectively, to make them concrete, add properties, and redefine inherited properties for shapes with segregated contents.
UMLCompartmentableShape is the most general class for UML elements that may have information shown in separated portions inside their shapes, usually arranged linearly and separated by solid lines (compartments). It subsets ownedElement from UMLDiagramElement to specify compartments that are to appear vertically ordered (first in order is shown at the top), where are captured with UMLCompartment. UMLCompartment subsets ownedElement from UMLDiagramElement to specify contents of compartments that are to appear vertically ordered (first in order is shown at the top). UMLCompartments have no modelElements.
Compartment titles shall be interchanged as UMLLabels with no modelElements, and as the first orderedElement of UMLCompartments.
It means that you can show from zero to N compartments.
Both are correct. Showing compartments is optional.

UML metamodel: derived, derived union and subsetting

If you have ever worked with the metamodel of UML, you probably know the concepts of unions and subsets. As far as I understand it:
Attributes and associations of an element/class marked as "derived union" cannot be used directly. In more specific sub-classes, you can possibly find subsets of them that can be used, as long as they are not marked as derived unions themselves.
"Derived" (without union) attributes and associations also have subsets in more specific classes, but unlike the above you can use them directly without having to look for subsets in more specific classes.
My questions:
Does this make sense or am I on the wrong track here?
What is the meaning of the "/" (slash) you can find in front of some attributes/associations? Does it mean that they have subsets in child classes?
E.g. /general : Classifier[*]
A union property is a property that consists of multiple other properties. You can only understand the union when you combine all subsets. A list is almost by definition a union.
Almost, because it might be uninitialized.
A derived union is a property requiring a specific collection of subsets. I would not talk about accessing them directly, but about how direct you can understand them. You need all information before you can make an interpretation.
The difference between the two is that a derived union requires a specific subset, whereas a union might have a subset and might have different subsets in different contexts. A very simple example is the fields on a form. All required fields show the definition of a derived union. All other fields are part of the complete union.
Derived unions can contain derived unions in their subsets. It directs the creation of classes and their instances, it does not make them impossible.
All derived features require other features to be known. Temperature can be read directly, but knowing whether someone has a fever requires more knowledge, such as the time of day, the place where the information was collected, etc.
The slash implies that it is being derived.
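A small Python sketch may make the idea of a derived union more tangible. It is only loosely modelled on the UML metamodel (where, for example, ownedComment subsets ownedElement); the class layout and the helper method are simplifications invented for this illustration. The derived property has no storage of its own and is read-only: its value is computed from the properties that subset it.

```python
class Element:
    def __init__(self):
        self.owned_comment = []  # ordinary property that subsets owned_element

    def _owned_element_subsets(self):
        # Hypothetical hook: every class lists the properties that
        # subset the derived union.
        return [self.owned_comment]

    @property
    def owned_element(self):
        """/ownedElement: derived, read-only, the union of all its subsets."""
        merged = []
        for subset in self._owned_element_subsets():
            merged.extend(subset)
        return merged


class Class(Element):
    def __init__(self):
        super().__init__()
        self.owned_attribute = []  # a further subset added by the subclass

    def _owned_element_subsets(self):
        return super()._owned_element_subsets() + [self.owned_attribute]


cls = Class()
cls.owned_comment.append("a note")
cls.owned_attribute.append("name : String")
print(cls.owned_element)  # ['a note', 'name : String']
```

You populate the subsets (owned_comment, owned_attribute) and read the union. Trying to assign to owned_element directly fails, which mirrors the "cannot be used directly" observation in the question.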

Converting graph to canonical string

I'm looking for a way of storing graphs as strings. The strings are to be used as keys in a map, so that two topologically identical graphs will map to the same value in the map. Does anybody know of such an algorithm?
The nodes of the tree are labeled with duplicate labels being allowed.
The program is in Java and an implementation in that would be neat, but any pointers to possible algorithms are appreciated.
If you have an algorithm that maps general graphs to strings such that two graphs map to the same string if and only if they are topologically equivalent, then you have an algorithm for GRAPH ISOMORPHISM. Graph isomorphism has no known polynomial-time algorithm. So you can't (easily :) have a polynomial-time algorithm that calculates the strings as you postulate them, because otherwise you'd have constructed a previously unknown and very efficient algorithm for graph isomorphism.
This doesn't mean that it wouldn't be possible to solve the problem for your class of graphs; it just means that for the class of all graphs it's kind of difficult.
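Since the question mentions labeled trees, here is one restricted class where a canonical string is cheap to compute. The sketch below is not from any of the answers: it uses the standard idea of encoding each subtree recursively and sorting the child encodings, and it assumes the tree is rooted, that labels are strings containing no parentheses, and that a tree is given as a (label, children) pair. Python is used for brevity; the same recursion carries over to Java.

```python
def canonical(tree):
    """Return a string that is identical for rooted labeled trees differing
    only in the order of their children. Duplicate labels are allowed."""
    label, children = tree
    encoded = sorted(canonical(child) for child in children)
    return "(" + label + "".join(encoded) + ")"


# The same tree written with two different child orders.
t1 = ("a", [("b", []), ("c", [("b", [])])])
t2 = ("a", [("c", [("b", [])]), ("b", [])])

# A structurally different tree with the same multiset of labels.
t3 = ("a", [("b", [("c", [])]), ("b", [])])

cache = {canonical(t1): "first value"}
print(canonical(t2) in cache)  # True
print(canonical(t3) in cache)  # False
```

For unrooted trees you would first have to pick a root in a representation-independent way (for example the tree's center), and very deep trees may need an iterative version to avoid hitting the recursion limit.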
You may find the following question relevant...
Using finite automata as keys to a container
Basically, an automaton can be minimised using well-known algorithms from automata-theory textbooks. Hopcroft's algorithm is an example. There is precisely one minimal automaton that is equivalent to any given automaton. However, that minimal automaton may be represented in different ways. Constructing a safe canonical form is basically a matter of renumbering nodes and ordering the adjacency table using information that is significant in defining the automaton, and not information that is specific to the representation.
The basic principle should extend to general graphs. Whether you can minimise your graphs depends on their semantics, but the basic idea of renumbering the nodes and sorting the adjacency list still applies.
Other answers here assume things about your graphs, for example that the nodes have unique labels that can be ordered and are significant for the semantics of your graphs, and that can therefore be used to identify the nodes in an adjacency matrix or list. This simply won't work if you're interested in morphisms of unlabelled graphs, for instance. Different ways of numbering the nodes (and thus ordering the adjacency list) will result in different canonical forms for equivalent graphs that just happen to be represented differently.
As for the renumbering etc, an approach is to borrow and adapt principles from automata minimisation algorithms. Basically...
Create a vector of blocks (sets of nodes). Initially, populate this with one block per class of nodes (i.e. per distinct node annotation). The modification here is that we order these by annotation details (not by representation-specific node IDs).
For each class (annotation) of edges in order, evaluate each block. If each node in the block can follow the current edge type to reach the same set of next blocks, leave it untouched. Otherwise, split it as necessary to get maximal blocks that achieve this objective. Keep these split blocks clustered together in the vector (preserve the existing ordering, just refine it a bit), and order the split blocks based on a suitable ordering of the next-block sets. For example, use bitvectors with one bit per block in the current vector, setting the bit for each block reachable by following the current edge type. To order the bitvectors, treat them as big integers.
EDIT - I forgot to mention: in the second step (splitting blocks), as soon as you split a block, you restart with the first block in the vector and the first edge annotation. Obviously, a naive implementation will be slow, so take the principle and use it to adapt Hopcroft's minimisation algorithm.
If you end up with blocks that have multiple nodes in them, those nodes are equivalent. Whether that means they can be merged or not depends on your semantics, but the relative ordering of nodes within each such block clearly doesn't matter.
If dealing with graphs that can be minimised (e.g. automaton digraphs) I suspect it's best to minimise first, though I still haven't got around to implementing this myself.
The key thing is, of course, ensuring that your renumbering is sensitive only to the significant details of the graph - its structure and annotations - and not the things that are only there so that you can construct a representation such as node IDs/addresses etc.
Once you have the blocks ordered, deriving a canonical form should be easy.
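The following Python sketch shows the refinement idea in a simplified form; it is not Hopcroft's algorithm and not exactly the procedure above. Instead of bitvectors it re-ranks nodes by a signature built from their current block and the sorted (edge annotation, target block) pairs of their outgoing edges. It assumes a directed graph given as a node-to-annotation dict plus (source, edge annotation, target) triples, with annotations that can be sorted. The resulting key depends only on annotations and structure, so isomorphic graphs get the same key. The converse is only guaranteed when every block ends up with a single node, so for general graphs treat the key as a hash bucket and confirm equality separately.

```python
from collections import Counter, defaultdict


def refine_colors(labels, edges):
    """Assign each node a block number using only annotations and structure."""
    out_edges = defaultdict(list)
    for src, edge_label, dst in edges:
        out_edges[src].append((edge_label, dst))

    # Initial blocks: one per distinct node annotation, ordered by annotation.
    ranks = {lab: i for i, lab in enumerate(sorted(set(labels.values())))}
    color = {node: ranks[lab] for node, lab in labels.items()}

    while True:
        # Signature = current block + sorted (edge annotation, target block).
        signature = {
            node: (color[node],
                   tuple(sorted((el, color[dst]) for el, dst in out_edges[node])))
            for node in labels
        }
        ranks = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        if len(ranks) == len(set(color.values())):
            return color  # no block was split, so the partition is stable
        color = {node: ranks[signature[node]] for node in labels}


def invariant_key(labels, edges):
    """String key that is independent of node IDs and of edge order."""
    color = refine_colors(labels, edges)
    blocks = sorted(Counter((color[n], labels[n]) for n in labels).items())
    recolored = sorted((color[s], el, color[d]) for s, el, d in edges)
    return repr((blocks, recolored))


# The same graph with different node IDs and a different edge order.
key_a = invariant_key({1: "a", 2: "b", 3: "b"}, [(1, "x", 2), (2, "y", 3)])
key_b = invariant_key({"p": "b", "q": "a", "r": "b"},
                      [("p", "y", "r"), ("q", "x", "p")])
print(key_a == key_b)  # True
```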
gSpan introduced the 'minimum DFS code' which encodes graphs such that if two graphs have the same code, they must be isomorphic. gSpan has implementations in C++ and Java.
A common way to do this is to use adjacency lists.
Besides adjacency lists, there are adjacency matrices. Which one you choose should depend on which you use to implement your Graph class (adjacency lists are usually the better choice, but they both have strengths and weaknesses). If you have a totally different implementation of Graph, consider using one of these, as it makes many graph algorithms very easy to implement.
One other option is, if possible, overriding hashCode() and equals() on the Graph class and using the actual graph object as the key rather than converting it to a string.
Edit: Overriding hashCode() and equals() is the route I would take if some vertices are not uniquely labeled. As noted in the comments, this can be expensive, but I think it would depend on the implementation of the Graph class.
If equals() is too expensive, then you should use an adjacency list or matrix, but don't just use the node names. You have to carefully specify exactly what it is that identifies individual graphs and vertices (and therefore what would make them equal), and then make your string representation of the adjacency list use those properties instead of the node names. I'd suggest you write this specification of your graph equals operation down.
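As a minimal sketch of the adjacency-list route (in Python for brevity), assume each vertex has an identifying property that is unique within the graph, is a string, and contains none of the separator characters ";", ":" or ",". The function name and the input layout are invented for this example. Internal node IDs are used only to look up that property and never appear in the key.

```python
def adjacency_key(identity, edges):
    """identity: internal node ID -> identifying property (assumed unique).
    edges: iterable of (source ID, target ID) pairs for a directed graph."""
    adjacency = {ident: [] for ident in identity.values()}
    for src, dst in edges:
        adjacency[identity[src]].append(identity[dst])
    # Sort both the vertices and each neighbour list so that the order in
    # which the graph happened to be built does not affect the key.
    return ";".join(
        f"{ident}:{','.join(sorted(neighbours))}"
        for ident, neighbours in sorted(adjacency.items())
    )


g1 = adjacency_key({1: "a", 2: "b", 3: "c"}, [(1, 2), (1, 3), (2, 3)])
g2 = adjacency_key({10: "c", 20: "a", 30: "b"}, [(30, 10), (20, 10), (20, 30)])
print(g1)        # a:b,c;b:c;c:
print(g1 == g2)  # True
```

If the identifying property is not unique, this key no longer distinguishes graphs reliably, which is exactly the case where a written specification of graph equality becomes necessary.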

Resources