Searching through a container vs. saving indices within elements? - search

I've been working on a few programming projects lately, and I've been running into some issues with managing object lifespans and general resource fundamentals. I'm a self-taught programmer, so this may be a very basic issue that I've just overlooked. One of the issues that I've been having has been related to managing containers of objects, especially in situations where you have a reference/pointer to a single element and you need to remove it from any/all of the containers that might also be holding a reference/pointer to it.
Here's a (pseudocode) example that's I've been struggling with:
class GameBoard
{
Vector<GamePieces> activePieces; //All pieces on the board..
Vector<Match> matchList;
public void scanBoardForMatchingNeighbors();
public void removeMatchingPieces();
}
class Match //Represents a group of matching pieces as a single object.
{
Vector<GamePieces> matchingPieces;
}
If I have a GameBoard with multiple pieces on it (something like chess/checkers, or even like Dr.Mario/Tetris/etc.), each piece can be matched with another piece of the same color/type/etc.
When multiple pieces are found to be matching, a "Match" object is created and it internally references the matching pieces. Moving 3 blue pieces together would create a match object that references those 3 blue objects, and moving 6 yellow pieces together would create another new match object that references those 6 yellow pieces. Those matches would then be added into the matchList.
While pieces are matched (and therefore, referenced by match objects inside the matchList), they might be animated, change color, or have different interactions, etc., for some amount of time before they're ready to be removed from the game altogether. However, because they're still on the board in the same 'spaces' they're now being referenced by both containers (activePieces and the matches in the matchList).
However, because these pieces are now referenced by multiple containers, I'm struggling with finding a good way of removing pieces from both. Here's an example of the algorithm that I was considering using for "removeMatchingPieces()":
public void removeMatchingPieces()
{
-iterate through container of matches (matchList)..
--award points, play animations, etc. for each match in the list..
--within each match, iterate through each piece..
---remove each piece reference from the list of activePieces in the game board.. (which involves either storing indices or using search algorithms)
--Finally, remove the match object from the matchList in the game board after it's been processed..
}
The issue that I've been having is that, while I'm scanning through the list of matches and I grab a piece that is part of a match, I want to both remove it from the list of matches after it's been processed AND remove it from the game board. The problem is, I'm iterating through the match container, so when I want to delete each piece from the game board's activePieces container I can't.
I spent last night thinking about how to address this, and I came up with two ideas:
1.) Have each piece internally store their own index within the container of activePieces. I'm not sure that this is the greatest idea, as it seems error-prone and a little bit encapsulation breaking.
2.) Take each piece in the match, and then iterate through the container of activePieces in the game board while comparing references. If the references match, the objects are the same and the index is the number of iterations used in the loop. What concerns me is that this seems like it could be expensive if there are many matching pieces.
So what's the best way to 'cross-check' multiple containers of objects references to find the same object so that I can properly delete it (or, depending on the language, pass it over to the garbage collector)?

Related

Always valid domain model entails prefixing a bunch of Value Objects. Doesn't that break the ubiquitous language?

The principle of always valid domain model dictates that value object and entities should be self validating to never be in an invalid state.
This requires creating some kind of wrapper, sometimes, for primitive values. However it seem to me that this might break the ubiquitous language.
For instance say I have 2 entities: Hotel and House. Each of those entities has images associated with it which respect the following rules:
Hotels must have at least 5 images and no more than 20
Houses must have at least 1 image and no more than 10
This to me entails the following classes
class House {
HouseImages images;
// ...
}
class Hotel {
HotelImages images;
}
class HouseImages {
final List<Image> images;
HouseImages(this.images) : assert(images.length >= 1),
assert(images.length <= 10);
}
class HotelImages {
final List<Image> images;
HotelImages(this.images) : assert(images.length >= 5),
assert(images.length <= 20);
}
Doesn't that break the ubiquitous languages a bit ? It just feels a bit off to have all those classes that are essentially prefixed (HotelName vs HouseName, HotelImages vs HouseImages, and so on). In other words, my value object folder that once consisted of x, y, z, where x, y and z where also documented in a lexicon document, now has house_x, hotel_x, house_y, hotel_y, house_z, hotel_z and it doesn't look quite as english as it was when it was x, y, z.
Is this common or is there something I misunderstood here maybe ? I do like the assurance it gives though, and it actually caught some bugs too.
There is some reasoning you can apply that usually helps me when deciding to introduce a value object or not. There are two very good blog articles concerning this topic I would like to recommend:
https://enterprisecraftsmanship.com/posts/value-objects-when-to-create-one/
https://enterprisecraftsmanship.com/posts/collections-primitive-obsession/
I would like to address your concrete example based on the heuristics taken from the mentioned article:
Are there more than one primitive values that encapsulate a concept, i.e. things that always belong together?
For instance, a Coordinate value object would contain Latitude and Longitude, it would not make sense to have different places of your application knowing that these need to be instantiated and validated together as a whole. A Money value object with an amount and a currency identifier would be another example. On the other hand I would usually not have a separate value object for the amount field as the Money object would already take care of making sure it is a reasonable value (e.g. positive value).
Is there complexity and logic (like validation) that is worth being hidden behind a value object?
For instance, your HotelImages value object that defines a specific collection type caught my attention. If HotelImages would not be used in different spots and the logic is rather simple as in your sample I would not mind adding such a collection type but rather do the validation inside the Hotel entity. Otherwise you would blow up your application with custom value objects for basically everything.
On the other hand, if there was some concept like an image collection which has its meaning in the business domain and a set of business rules and if that type is used in different places, for instance, having a ImageCollection value object that is used by both Hotel and House it could make sense to have such a value object.
I would apply the same thinking concerning your question for HouseName and HotelName. If these have no special meaning and complexity outside of the Hotel and House entity but are just seen as some simple properties of those entities in my opinion having value objects for these would be an overkill. Having something like BuildingName with a set of rules what this name has to follow or if it even is consisting of several primitive values then it would make sense again to use a value object.
This relates to the third point:
Is there actual behaviour duplication that could be avoided with a value object?
Coming from the last point thinking of actual duplication (not code duplication but behaviour duplication) that can be avoided with extracting things into a custom value object can also make sense. But in this case you always have to be careful not to fall into the trap of incidental duplication, see also [here].1
Does your overall project complexity justify the additional work?
This needs to be answered from your side of course but I think it's good to always consider if the benefits outweigh the costs. If you have a simple CRUD like application that is not expected to change a lot and will not be long lived all the mentioned heuristics also have to be used with the project complexity in mind.

How to implement efficient string interning in f#?

What is to implement a custom string type in f# for interning strings. i have to read large csv files into memory. Given most of the columns are categorical, values are repeating and it makes sense to create new string first time it is encountered and only refer to it on subsequent occurrences to save memory.
In c# I do this by creating a global intern pool (concurrent dict) and before setting a value, lookup the dictionary if it already exists. if it exists, just point to the string already in the dictionary. if not, add it to the dictionary and set the value to the string just added to dictionary.
New to f# and wondering what is the best way to do this in f#. will be using the new string type in records named tuples etc and it will have to work with concurrent processes.
Edit:
String.Intern uses the Intern Pool. My understanding is, it is not very efficient for large pools and is not garbage collected i.e. any/all interned strings will remain in intern pool for lifetime of the app. Imagine a an application where you read a file, perform some operations and write data. Using Intern Pool solution will probably work. Now imagine you have to do the same 100 times and the strings in each file have little in common. If the memory is allocated on heap, after processing each file, we can force garbage collector to clear unnecessary strings.
I should have mentioned I could not really figure out how to do the C# approach in F# (other than implementing a C# type and using it in F#)
Memorisation pattern is slightly different from what I am looking for? We are not caching calculated results - we are ensuring each string object is created no more than once and all subsequent creations of same string are just references to the original. Using a dictionary to do this is a one way and using String.Intern is other.
sorry if is am missing something obvious here.
I have a few things to say, so I'll post them as an answer.
First, I guess String.Intern works just as well in F# as in C#.
let x = "abc"
let y = StringBuilder("a").Append("bc").ToString()
printfn "1 : %A" (LanguagePrimitives.PhysicalEquality x y) // false
let y2 = String.Intern y
printfn "2 : %A" (LanguagePrimitives.PhysicalEquality x y2) // true
Second, are you using a dictionary in combination with String.Intern in your C# solution? If so, why not just do s = String.Intern(s); after the string is ready following input from file?
To create a type for use in your business domain to handle string deduplication in general is a very bad idea. You don't want your business domain polluted by that kind of low level stuff.
As for rolling your own. I did that some years ago, probably to avoid that problem you mentioned with the strings not being garbage collected, but I never tested if that actually was a problem.
It might be a good idea to use a dictionary (or something) for each column (or type of column) where the same values are likely to repeat in great numbers. (This is pretty much what you said already.)
It makes sense to only keep these dictionaries live while you read the information from file, and stuff it into internal data structures. You might be thinking that you need the dictionaries for subsequent reads, but I am not so sure about that.
The important thing is to deduplicate the great majority of strings, and not necessarily every single duplicate. Because of this you can greatly simplify the solution as indicated. You most probably have nothing to gain by overcomplicating your solution to squeeze out the last fraction of memory savings.
Releasing the dictionaries after the file is read and structures filled, will have the advantage of not holding on to strings when they are no longer really needed. And of course you save memory by not holding onto the dictionaries.
I see no need to handle concurrency issues in the implementation here. String.Intern must necessarily be immune to concurrency issues. If you roll your own with the design suggested, you would not use it concurrently. Each file being read would have its own set of dictionaries for its columns.

How to delete elements by value in a map structure restricted with having one key

The main problem is that I'm working in a functional language with immutable types so thing like pointers and deletion are a bit harder. I would prefer if this was implementable primarily in Haskell.
Let's imagine we have a single dimensional field
[x,x,x,x,x,x,x,x,x]
So I have a map with keys being SIZES and values being ADDRESSES because each entry starts from a certain ADDRESS and has a certain SIZE.
[(x,x,x),x,x,(x,x,x,x)]
I want to be able to add an element by SIZE to a map and then check if the entries are touching so that I can merge them.
Since my map is by SIZEs I have to iterate through the whole map to find the ones with the bordering ADDRESSes.
Do I really have to chose between implementing a 2 key map and O(n) for merger?
Welp, in essence, this looks like computer memory. Do you want it to be efficient? Because you know, "things like pointers" exist and work in Haskell perfectly well.
Since my map is by SIZEs I have to iterate through the whole map to find the ones with the bordering ADDRESSes.
No, if you store the ranges in a separate data structure. I think for such non-overlapping subsets, there was something called a spanning tree (or as suggested by #Daniel, IntervalMap), but I'm not exactly an expert on those. Otherwise, why don't you simply hold memory blocks like that?
data Block = Block { start :: Int, data :: [Byte] }
type Memory = [Block]
You could cache the block length or use a data structure where length is O(1), to make merges O(nBlocks).
Sure, that doesn't make it obvious at the type level that they won't ever overlap, but that's an invariant you can keep for yourself.

How to define a tree-like DAG in Haskell

How do you define a directed acyclic graph (DAG) (of strings) (with one root) best in Haskell?
I especially need to apply the following two functions on this data structure as fast as possible:
Find all (direct and indirect) ancestors of one element (including the parents of the parents etc.).
Find all (direct) children of one element.
I thought of [(String,[String])] where each pair is one element of the graph consisting of its name (String) and a list of strings ([String]) containing the names of (direct) parents of this element. The problem with this implementation is that it's hard to do the second task.
You could also use [(String,[String])] again while the list of strings ([String]) contain the names of the (direct) children. But here again, it's hard to do the first task.
What can I do? What alternatives are there? Which is the most efficient way?
EDIT: One more remark: I'd also like it to be defined easily. I have to define the instance of this data type myself "by hand", so i'd like to avoid unnecessary repetitions.
Have you looked at the tree implemention in Martin Erwig's Functional Graph Library? Each node is represented as a context containing both its children and its parents. See the graph type class for how to access this. It might not be as easy as you requested, but it is already there, well-tested and easy-to-use. I have used it for more than a decade in a large project.

Why do some programming languages restrict you from editing the array you're looping through?

Pseudo-code:
for each x in someArray {
// possibly add an element to someArray
}
I forget the name of the exception this throws in some languages.
I'm curious to know why some languages prohibit this use case, whereas other languages allow it. Are the allowing languages unsafe -- open to some pitfall? Or are the prohibiting languages simply being overly cautious, or perhaps lazy (they could have implemented the language to gracefully handle this case, but simply didn't bother).
Thanks!
What would you want the behavior to be?
list = [1,2,3,4]
foreach x in list:
print x
if x == 2: list.remove(1)
possible behaviors:
list is some linked-list type iterator, where deletions don't affect your current iterator:
[1,2,3,4]
list is some array, where your iterator iterates via pointer increment:
[1,2,4]
same as before, only the system tries to cache the iteration count
[1,2,4,<segfault>]
The problem is that different collections implementing this enumerable/sequence interface that allows for foreach-looping have different behaviors.
Depending on the language (or platform, as .Net), iteration may be implemented differently.
Typically a foreach creates an Iterator or Enumerator object on the array, which internally keeps its state about the iteration details. If you modify the array (by adding or deleting an element), the iterator state would be inconsistent in regard to the new state of the array.
Platforms such as .Net allow you to define your own enumerators which may not be susceptible to adding/removing elements of the underlying array.
A generic solution to the problem of adding/removing elements while iterating is to collect the elements in a new list/collection/array, and add/remove the collected elements after the enumeration has completed.
Suppose your array has 10 elements. You get to the 7th element, and decide there that you need to add a new element earlier in the array. Uh-oh! That element doesn't get iterated on! for each has the semantics, to me at least, of operating on each and every element of the array, once and only once.
Your pseudo example code would lead to an infinite loop. For each element you look at, you add one to the collection, hence if you have at least 1 element to start with, you will have i (iterative counter) + 1 elements.
Arrays are typically fixed in the number of elements. You get flexible sized widths through wrapped objects (such as List) that allow the flexibility to occur. I suspect that the language may have issues if the mechanism they used created a whole new array to allow for the edit.
Many compiled languages implement "for" loops with the assumption that the number of iterations will be calculated once at loop startup (or better yet, compile time). This means that if you change the value of the "to" variable inside the "for i = 1 to x" loop, it won't change the number of iterations. Doing this allows a legion of loop optimizations, which are very important in speeding up number-crunching applications.
If you don't like that semantics, the idea is that you should use the language's "while" construct instead.
Note that in this view of the world, C and C++ don't have proper "for" loops, just fancy "while" loops.
To implement the lists and enumerators to handle this, would mean a lot of overhead. This overhead would always be there, and it would only be useful in a vast miniority of the cases.
Also, any implementation that were chosen would not always make sense. Take for example the simple case of inserting an item in the list while enumerating it, would the new item always be included in the enumeration, always excluded, or should that depend on where in the list the item was added? If I insert the item at the current position, would that change the value of the Current property of the enumerator, and should it skip the currently current item which is then the next item?
This only happens within foreach blocks. Use a for loop with an index value and you'll be allowed to. Just make sure to iterate backwards so that you can delete items without causing issues.
From the top of my head there could be two scenarios of implementing iteration on a collection.
the iterator iterates over the collection for which it was created
the iterator iterates over a copy of the collection for which it was created
when changes are made to the collection on the fly, the first option should either update its iteration sequence (which could be very hard or even impossible to do reliably) or just deny the possibility (throw an exception). The last of which obviously is the safe option.
In the second option changes can be made upon the original collection without bothering the iteration sequence. But any adjustments will not be seen in the iteration, this might be confusing for users (leaky abstraction).
I could imagine languages/libraries implementing any of these possibilities with equal merit.

Resources