I have a tree of elements. I harvest data from some of the elements and use that data to create a set of nodes. I want an assert to check that, given access to only the set of nodes, there is no possibility of accessing other data, such as data in other elements of the tree. That is, I want to ensure no data leakage.
Here is my tree of elements:
sig Element {
data: Data,
children: set Element
}
I harvest data from some elements. I stuff the data into nodes:
sig Node {
data: Data
}
If someone has access to just the set of nodes, then there is no data leakage because the nodes contain just the data that was deliberately harvested from elements in the tree.
However, for debugging purposes I added a field to Node to reference the Element that was the source of the Node's data:
sig Node {
data: Data,
represents: Element
}
Through accidental oversight, the represents field was not removed. Now a person with access to the set of Nodes also has access to the tree, and can therefore see more data than they should. Thus, there is the potential for data leakage.
I want to create an assert that checks the model for potential data leakage:
assert No_data_leakage { ??? }
Intuitively, I want the assert to say something like this: of the universe (univ) of values in this model, a person with access to only the set of Nodes can reach the data values in those Nodes and nothing else. How do I express that?
Below is a simplified version of my model.
open util/ordering[Element]
open util/ordering[Node]
sig Element {
data: Data,
children: set Element
}
one sig Root extends Element {}
sig Data {}
sig Node {
data: Data,
represents: Element
}
fact No_disconnected_elements {
all e: Element |
(e = Root) or (e in Root.^children)
}
fact Each_element_has_one_parent {
no disj e, e', e'': Element |
(e in e'.children) and (e in e''.children)
}
fact No_loops {
no e: Element | e in e.^children
}
fact First_Node_data_is_first_Element_data {
(Node <: first).data = (Element <: first).data
(Node <: first).represents = (Element <: first)
}
fact Last_Node_data_is_last_Element_data {
(Node <: last).data = (Element <: last).data
(Node <: last).represents = (Element <: last)
}
fact Every_element_has_different_data {
no disj e, e': Element | e.data = e'.data
}
run {} for 3 but 2 Node
assert No_data_leaks {
// How to express this?
}
See http://alloytools.org/quickguide/meta.html
There is a meta capability that allows you to 'iterate' over the fields of an Atom.
assert no_data_leaks {
all f: Node$.subfields | f.value[Node] in Data
}
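To exercise the assertion, a check command can sit alongside the run command (a minimal sketch; the scope simply mirrors the run above):
check no_data_leaks for 3 but 2 Node
With the leftover represents field in place, the Analyzer should report a counterexample; once the field is removed, the assertion holds within this scope.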
I'm trying to visualise a spec for a Payment object that moves from "queued" to "processing" to "complete". I have come up with the following:
enum State {Queued, Processing, Complete}
sig Payment {
var state: State
}
pred processPayment[p: Payment] {
p.state = Queued // guard
p.state' = Processing // action
}
pred completePayment[p: Payment] {
p.state = Processing // guard
p.state' = Complete // action
}
fact init {
Payment.state = Queued
}
fact next {
always (some p : Payment | processPayment[p] or completePayment[p])
}
run {} for 1 Payment
Unfortunately, I get no instances found for this spec. From my understanding, a Payment that is initially Queued and is Processing in the next state should be allowed by the always (some p : Payment | processPayment[p] or completePayment[p]) formula, according to the tutorial at https://haslab.github.io/formal-software-design/overview/index.html. Am I missing something?
The issue turned out to be a missing terminating predicate; adding the predicate below (or a stutter step) fixes it.
pred afterComplete[p: Payment] {
p.state = Complete // guard
p.state' = Complete // action
}
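For completeness, the new predicate then joins the disjunction in the next fact (a sketch of the amended fact):
fact next {
  always (some p : Payment | processPayment[p] or completePayment[p] or afterComplete[p])
}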
I am not an expert, but I believe the problem is that once the payment reaches Complete, neither of your predicates can be true any more, so the always formula rules out every infinite trace. I think you need a holdPayment predicate like this:
pred holdPayment[p:Payment] {
p.state = p.state'
}
fact next {
always (some p : Payment | processPayment[p] or completePayment[p] or holdPayment[p])
}
run {} for 1 Payment
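To confirm that traces still make progress rather than only stuttering, you can also ask for a trace in which the payment finishes (a sketch using Alloy 6 temporal syntax):
run { eventually Payment.state = Complete } for 1 Payment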
I'm having a hard time formulating this in a Rust-y manner, since my brain is still hardwired in Python. So I have an XML file:
<xml>
<car>
<name>First car</name>
<brand>Volvo</brand>
</car>
<plane>
<name>First plane</name>
<brand>Boeing</brand>
</plane>
<car>
<name>Second car</name>
<brand>Volvo</brand>
</car>
</xml>
In reality it's much more complex, and the XML is about 500-1000 MB. I'm reading it using quick-xml, which gives me events such as Start (tag start), Text, and End (tag end), and I'm using a state machine to keep track.
Now I want to off-load the parsing of car and plane to different modules (they need to be handled differently) but share a base-implementation/trait.
So far so good.
Now using my state machine I know when I need to offload to the car or the plane:
When I enter the main car tag I want to create a new instance of car
After that, offload everything until the corresponding </car> to it
When we reach the end I'm going to call .save() on the car implementation to store it elsewhere, and can free/destroy the instance.
But this means that in my main loop I need to create a new instance of the car and keep track of it (and the same for plane, if that's the main element).
let mut current_xml_section: I_DONT_KNOW_THE_TYPE = Some()
loop {
match reader.read_event(&mut buf) {
Ok(Event::Start(ref e)) => {
if state == State::Unknown {
match e.name() {
b"car" => {
state = State::InSection;
current_section = CurrentSection::Car;
state_at_depth = depth;
current_xml_section = CurrentSection::Car::new(e); // this won't work
},
b"plane" => {
state = State::InSection;
current_section = CurrentSection::Plane;
state_at_depth = depth;
current_xml_section = CurrentSection::Plane::new(e); // this won't work
},
_ => (),
};
}else{
current_xml_section.start_tag(e); // this won't work
}
depth += 1;
},
Ok(Event::End(ref e)) => {
depth -= 1;
if state == State::InSection && state_at_depth == depth {
state = State::Unknown;
current_section = CurrentSection::Unknown;
state_at_depth = 0;
current_xml_section.save(); // this won't work
// Free current_xml_section here
}else{
if state == State::InSection {
current_xml_section.end_tag(e) // this won't work
}
}
},
// unescape and decode the text event using the reader encoding
Ok(Event::Text(e)) => (
if state == State::InSection {
current_xml_section.text_data(e) // this won't work
}
),
Ok(Event::Eof) => break, // exits the loop when reaching end of file
Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
_ => (), // There are several other `Event`s we do not consider here
}
// if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
buf.clear();
}
}
So I basically don't know how to keep a reference in the main loop to the "current" object (I'm sorry, Python term), given that:
We may or may not have a current tag we're processing
That section might be a reference to either Car or Plane
I've also considered:
Use Serde, but it's a massive document and frankly I don't know its entire structure (I'm black-box decoding it), so it would need to be fed to Serde in chunks (and I didn't manage to do that, even though I tried)
Keeping a reference to the latest plane and the latest car (starting by creating blank objects outside the main loop), but it feels ugly
Using Generics
Any nudge in the right direction would be welcome as I try to un-Python my brain!
Event-driven parsing of XML lends itself particularly well to a scope-driven approach, where each level is parsed by a different function.
For example, your main loop could look like this:
loop {
match reader.read_event(&mut buf) {
Ok(Event::Start(ref e)) => {
match e.name() {
b"car" => handle_car(&mut reader, &mut buf)?,
b"plane" => handle_plane(&mut reader, &mut buf)?,
_ => return Err("Unexpected Tag"),
}
},
Ok(Event::Eof) => break,
_ => (),
}
}
Note that the inner match statement only has to consider the XML tags that can occur at the top level; any other tag is unexpected and should generate an error.
handle_car would look something like this:
fn handle_car(reader: &mut Reader<&[u8]>, buf: &mut Vec<u8>) -> Result<(), ErrType> {
let mut car = Car::new();
loop {
match reader.read_event(buf) {
Ok(Event::Start(ref e)) => {
match e.name() {
b"name" => {
car.name = handle_name(reader, buf)?;
},
b"brand" => {
car.brand = handle_brand(reader, buf)?;
},
_ => return Err("bad tag"),
}
},
Ok(Event::End(ref e)) => break,
Ok(Event::Eof) => return Err("Unexpected EOF"),
_ => (),
}
}
car.save();
Ok(())
}
handle_car creates its own instance of Car, which lives within the scope of that function. It has its own loop where it handles all the tags that can occur within it. If those tags contain yet more tags, you just introduce a new set of handling functions for them. The function returns a Result so that if the input structure does not match expectations the error can be passed up (as can any errors produced by quick_xml, which I have ignored but real code would handle).
This pattern has some advantages when parsing XML:
The structure of the code matches the expected structure of the XML, making it easier to read and understand.
The state is implicit in the structure of the code. No need for state variables or depth counters.
Common tags that appear in multiple places (such as <name> and <brand>) can be handled by common functions that are reused (a sketch of one such leaf handler follows this list).
If the XML format you are parsing has nested structures (e.g. if <car> could contain another <car>), this is handled by recursion.
Your original problem of not knowing how to store the Car / Plane within the main loop is completely avoided.
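For completeness, the leaf handlers referenced above could look like this sketch of handle_name (assumptions: ErrType can be built from a string literal, as in the snippets above, and a quick-xml version whose text events offer unescape_and_decode):
fn handle_name(reader: &mut Reader<&[u8]>, buf: &mut Vec<u8>) -> Result<String, ErrType> {
    let mut name = String::new();
    loop {
        match reader.read_event(buf) {
            // Collect the text content of the <name> element.
            Ok(Event::Text(ref e)) => {
                name = e.unescape_and_decode(reader).map_err(|_| "bad text")?;
            },
            // </name> closes this scope; hand the collected value back to handle_car.
            Ok(Event::End(_)) => return Ok(name),
            Ok(Event::Eof) => return Err("Unexpected EOF"),
            _ => (),
        }
    }
}
handle_brand would be identical apart from the field it fills in.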
I would like to find the difference between two maps and create a new CSV file containing the difference (with the differing values put between **), like below:
Map 1
[
[cuInfo:"T12",service:"3",startDate:"14-01-16 13:22",appId:"G12355"],
[cuInfo:"T13",service:"3",startDate:"12-02-16 13:00",appId:"G12356"],
[cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12300"],
[cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]
Map 2
[
[name:"Apple", cuInfo:"T12",service:"3",startDate:"14-02-16 10:00",appId:"G12351"],
[name:"Apple",cuInfo:"T13",service:"3",startDate:"14-01-16 13:00",appId:"G12352"],
[name:"Apple",cuInfo:"T16",service:"3",startDate:"14-01-16 13:00",appId:"G12353"],
[name:"Google",cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12301"],
[name:"Microsoft",cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"],
[name:"Microsoft",cuInfo:"T18",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]
How can I get the output CSV like the one below?
Map 1 data | Map 2 data
service 3;name Apple;
cuInfo;startDate;appId | cuInfo;startDate;appId
T12;*14-02-16 10:00*;*G12351* | T12;*14-01-16 13:22*;*G12355*
T13;*14-01-16 13:00*;*G12352* | T13;*12-02-16 13:00*;*G12356*
service 9;name Google;
T14;*10-01-16 11:20*;*G12301* | T12;*10-01-16 11:20*;*G12300*
Thanks
In the following I'm assuming that the lists of maps are sorted appropriately, so that the comparison is fair, and that both lists have the same length.
First, create an Iterator to traverse both lists simultaneously:
@groovy.transform.TupleConstructor
class DualIterator implements Iterator<List> {
Iterator iter1
Iterator iter2
boolean hasNext() {
iter1.hasNext() && iter2.hasNext()
}
List next() {
[iter1.next(), iter2.next()]
}
void remove() {
throw new UnsupportedOperationException()
}
}
Next, process the lists to get rows for the CSV file:
def rows = new DualIterator(list1.iterator(), list2.iterator())
.findAll { it[0] != it[1] } // Grab the non-matching lines.
.collect { // Mark the non-matching values.
def (m1, m2) = it
m1.keySet().each { key ->
if(m1[key] != m2[key]) {
m1[key] = "*${m1[key]}*"
m2[key] = "*${m2[key]}*"
}
}
[m1, m2]
}.collect { // Merge the map values into a List of String arrays
[it[0].values(), it[1].values()].flatten() as String[]
}
Finally, write the header and rows out in CSV format. NOTE: I'm writing proper CSV here; your example is actually invalid because the number of columns is inconsistent:
def writer = new CSVWriter(new FileWriter('blah.csv'))
writer.writeNext(['name1', 'cuInfo1', 'service1', 'startDate1', 'appId1', 'name2', 'cuInfo2', 'service2', 'startDate2', 'appId2'] as String[])
writer.writeAll(rows)
writer.close()
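CSVWriter above is assumed to come from the opencsv library; in a standalone Groovy script it could be pulled in with Grape (the coordinates and version here are an assumption, adjust as needed):
@Grab('com.opencsv:opencsv:5.7.1')
import com.opencsv.CSVWriter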
The output looks like this:
"name1","cuInfo1","service1","startDate1","appId1","name2","cuInfo2","service2","startDate2","appId2"
"Apple","T12","3","*14-02-16 10:00*","*G12351*","Apple","T12","3","*14-01-16 13:22*","*G12355*"
"Apple","T13","3","*14-01-16 13:00*","*G12352*","Apple","T13","3","*12-02-16 13:00*","*G12356*"
"Google","T14","9","10-01-16 11:20","*G12301*","Google","T14","9","10-01-16 11:20","*G12300*"
I have a graph, which is actually a tree: vertices are nodes, and edges are labeled "subnode" and directed from child to parent.
I need to write a Gremlin query that returns a recursive structure like this:
node_info = [properties: node.map(),
subnodes: [...list of node_info items...]]
This Groovy function describes more precisely what I need to get (the recursive helper is written as a closure, since Groovy does not allow defining a method inside a method):
def get_node_hierarchy(node_id) {
    // Declare the closure first so it can call itself recursively.
    def get_hierarchy
    get_hierarchy = { node ->
        def hierarchy_list = []
        for (subnode in node.in('subnode')) {
            def sub_hierarchy = get_hierarchy(subnode)
            hierarchy_list.add(sub_hierarchy)
        }
        [properties: node.map(), subnodes: hierarchy_list]
    }
    def node = g.V('node_id', node_id).next()
    get_hierarchy(node)
}
result = get_node_hierarchy(1)
Is it possible to implement this using a single Gremlin query?
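One possible direction, though not from the original thread (a hedged sketch assuming TinkerPop 3, where repeat, emit, and the tree step exist; it yields a nested Tree map keyed by each vertex's valueMap rather than the exact node_info shape above):
// Follow 'subnode' edges in reverse from the start node (parent to children),
// letting tree() nest the results level by level.
result = g.V().has('node_id', node_id).
    emit().
    repeat(__.in('subnode')).
    tree().
    by(valueMap()).
    next()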
Here's part of a (Groovy) class that stores some data in MongoDB:
long save(Object data) {
def customerReference = getNextCustomerReference()
def map = ['customerReference': customerReference, 'data': data, 'created': new Date()]
BasicDBObject basicDBObject = new BasicDBObject(map)
collection.insert(basicDBObject)
customerReference
}
private long getNextCustomerReference() {
1234
}
Even though I have explicitly said I want a primitive long, what ends up in the database is an object:
{ "_id" : ObjectId("52f3c0597d844b0fcee29013"), "customerReference" : NumberLong(1234), "data" : "original data", "created" : ISODate("2014-02-06T17:03:21.411Z") }
However, if I change the return type to def for the private method this happens:
{ "_id" : ObjectId("52f3c1477d84698725f50fe5"), "customerReference" : 1234, "data" : "data", "created" : ISODate("2014-02-06T17:07:19.055Z") }
which is the behaviour I want (a primitive stored in the db).
Can someone explain this? It's baffling. Surely if I go out of my way to define a type, Groovy should try to honour it?
Groovy almost always autoboxes primitive types to their reference-type equivalents:
long test_long() { 123l }
int test_int() { 123 }
def test_def() { 123 }
def test_def_long() { 123l }
long l = 42l
assert test_long().class == Long.class
assert test_int().class == Integer.class
assert test_def().class == Integer.class
assert test_def_long().class == Long.class
assert l.class == Long.class
If you remove the long return type, the value is autoboxed to java.lang.Integer instead, and the driver stores that as a plain number, which is why it looks like a "primitive" in the database.
Some time ago, Groovy 1.8 introduced primitive type optimization, an internal fallback that uses primitive types under the hood in certain situations. This can help performance, but it is an internal optimization that you can't directly invoke through any syntax construct.
Sometimes you can force a primitive by an explicit cast, but chances are high it will be converted back to a reference type along the way, through method calls and the like.
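If the goal is simply the plain-number representation in the database, one workaround (a sketch, not tested against a live MongoDB) is to narrow the boxed Long to an Integer before building the document, using the standard intValue() method:
long save(Object data) {
    def customerReference = getNextCustomerReference()
    // intValue() narrows the boxed Long to an Integer, which the driver
    // stores as a plain number; only safe while the value fits in an int.
    def map = ['customerReference': customerReference.intValue(), 'data': data, 'created': new Date()]
    collection.insert(new BasicDBObject(map))
    customerReference
}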