Is an object a data structure?

I know things like arrays, linked lists, etc. are data structures... but what about objects?
Say I create an object, say an Employee, and it stores and keeps track of the employee's name, salary, phone number, and so on.

It's a question of semantics really, and I expect it could be argued either way.
When you create an object, you are telling a compiler or an interpreter to store a group of information together. Usually the interpreter/compiler will use some type of data structure to store that information (Python uses a hash table, for example).
I might call that data structure the object if I pointed it out in a hex dump, but that's just because saying 'the bytes that represent the object' is a bit inconvenient.
You could (and maybe someone has) write a compiler that stores many objects in one data structure. In that case there would be no one-to-one mapping between object and data structure. So for that reason I'm going to say no, an object is not a data structure, but it is normally stored in one.

Interestingly, before the OOP paradigm came into vogue, languages like C used structs (structures), which in C are simply contiguous blocks of virtual memory with named offsets (members), although that does get more complex with things like unions, etc.
But essentially a structure, as #James pointed out, is exactly that: some collection or grouping of related things together in a way a programmer (or mad scientist) feels is logical.
In the modern programming lexicon, with languages such as Java, C#, etc., your Objects usually represent one real-world thing, such as a Customer, Order, <insert over-used object example here>, etc., while your data structures (usually collections in the libraries of the languages I mentioned) represent the containment of multiple "Objects".
Strictly speaking, however (and just to confuse everyone), the data structures in languages like Java and C# are objects! (i.e. they are passed around by reference, can have methods called on them, etc.)
In a more classical (and perhaps more CS derived) sense, "data structures" are collections of behaviours (typically algorithms) that are used to manage the memory that stores data, while "objects" are that data.

(Beginner) Why does the temporary variable change in example 1, but not in example 2? [duplicate]

I'm trying to get my head around mutable vs immutable objects. Using mutable objects gets a lot of bad press (e.g. returning an array of strings from a method) but I'm having trouble understanding what the negative impacts are of this. What are the best practices around using mutable objects? Should you avoid them whenever possible?
Well, there are a few aspects to this.
Mutable objects without reference-identity can cause bugs at odd times. For example, consider a Person bean with a value-based equals method:
Map<Person, String> map = ...
Person p = new Person();
map.put(p, "Hey, there!");
p.setName("Daniel");
map.get(p); // => null
The Person instance gets "lost" in the map when used as a key because its hashCode and equality were based upon mutable values. Those values changed outside the map and all of the hashing became obsolete. Theorists like to harp on this point, but in practice I haven't found it to be too much of an issue.
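For context, here is a minimal, self-contained version of that scenario. The Person implementation below is an assumption (a simple name-based equals/hashCode), not taken from the original post:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Assumed value-based bean: equals and hashCode depend on a mutable field.
class Person {
    private String name;
    public void setName(String name) { this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Person && Objects.equals(name, ((Person) o).name);
    }
    @Override public int hashCode() { return Objects.hashCode(name); }
}

class LostKeyDemo {
    public static void main(String[] args) {
        Map<Person, String> map = new HashMap<>();
        Person p = new Person();
        map.put(p, "Hey, there!");      // stored under the hash of a null name
        p.setName("Daniel");            // the key's hashCode changes while it sits in the map
        System.out.println(map.get(p)); // null -- the entry can no longer be found
    }
}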
Another aspect is the logical "reasonability" of your code. This is a hard term to define, encompassing everything from readability to flow. Generically, you should be able to look at a piece of code and easily understand what it does. But more important than that, you should be able to convince yourself that it does what it does correctly. When objects can change independently across different code "domains", it sometimes becomes difficult to keep track of what is where and why ("spooky action at a distance"). This is a more difficult concept to exemplify, but it's something that is often faced in larger, more complex architectures.
Finally, mutable objects are killer in concurrent situations. Whenever you access a mutable object from separate threads, you have to deal with locking. This reduces throughput and makes your code dramatically more difficult to maintain. A sufficiently complicated system blows this problem so far out of proportion that it becomes nearly impossible to maintain (even for concurrency experts).
Immutable objects (and more particularly, immutable collections) avoid all of these problems. Once you get your mind around how they work, your code will develop into something which is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways. Immutable objects are even easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce. In short, they're good practice all around!
With that said, I'm hardly a zealot in this matter. Some problems just don't model nicely when everything is immutable. But I do think that you should try to push as much of your code in that direction as possible, assuming of course that you're using a language which makes this a tenable opinion (C/C++ makes this very difficult, as does Java). In short: the advantages depend somewhat on your problem, but I would tend to prefer immutability.
Immutable Objects vs. Immutable Collections
One of the finer points in the debate over mutable vs. immutable objects is the possibility of extending the concept of immutability to collections. An immutable object is an object that often represents a single logical structure of data (for example an immutable string). When you have a reference to an immutable object, the contents of the object will not change.
An immutable collection is a collection that never changes.
When I perform an operation on a mutable collection, then I change the collection in place, and all entities that have references to the collection will see the change.
When I perform an operation on an immutable collection, a reference is returned to a new collection reflecting the change. All entities that have references to previous versions of the collection will not see the change.
Clever implementations do not necessarily need to copy (clone) the entire collection in order to provide that immutability. The simplest example is the stack implemented as a singly linked list and the push/pop operations. You can reuse all of the nodes from the previous collection in the new collection, adding only a single node for the push, and cloning no nodes for the pop. The push_tail operation on a singly linked list, on the other hand, is not so simple or efficient.
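A minimal sketch of such a structure-sharing stack, written here in Java with invented names, just to make the node-reuse argument concrete:

import java.util.NoSuchElementException;

// Immutable stack as a singly linked list: push and pop share all existing nodes.
final class ImmutableStack<T> {
    private final T head;                 // undefined for the empty stack
    private final ImmutableStack<T> tail; // null only for the empty stack

    private ImmutableStack(T head, ImmutableStack<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    static <T> ImmutableStack<T> empty() { return new ImmutableStack<>(null, null); }

    boolean isEmpty() { return tail == null; }

    // Allocates exactly one new node; the previous stack is untouched and still usable.
    ImmutableStack<T> push(T value) { return new ImmutableStack<>(value, this); }

    T peek() {
        if (isEmpty()) throw new NoSuchElementException();
        return head;
    }

    // Clones nothing: the "popped" stack is simply the shared tail.
    ImmutableStack<T> pop() {
        if (isEmpty()) throw new NoSuchElementException();
        return tail;
    }
}

// Usage: ImmutableStack<String> s1 = ImmutableStack.<String>empty().push("a");
//        ImmutableStack<String> s2 = s1.push("b"); // s1 still contains only "a"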
Immutable vs. Mutable variables/references
Some functional languages take the concept of immutability to object references themselves, allowing only a single reference assignment.
In Erlang this is true for all "variables". I can only assign objects to a reference once. If I were to operate on a collection, I would not be able to reassign the new collection to the old reference (variable name).
Scala also builds this into the language, with all references being declared with var or val: vals are single-assignment and promote a functional style, while vars allow a more C-like or Java-like program structure.
The var/val declaration is required, whereas many traditional languages use optional modifiers such as final in Java and const in C.
Ease of Development vs. Performance
Almost always the reason to use an immutable object is to promote side effect free programming and simple reasoning about the code (especially in a highly concurrent/parallel environment). You don't have to worry about the underlying data being changed by another entity if the object is immutable.
The main drawback is performance. Here is a write-up on a simple test I did in Java comparing some immutable vs. mutable objects in a toy problem.
The performance issues are moot in many applications, but not all, which is why many large numerical packages, such as the NumPy array class in Python, allow in-place updates of large arrays. This is important for application areas that make heavy use of matrix and vector operations; such large, data-parallel and computationally intensive problems achieve a great speed-up by operating in place.
Immutable objects are a very powerful concept. They take away a lot of the burden of trying to keep objects/variables consistent for all clients.
You can use them for low level, non-polymorphic objects - like a CPoint class - that are used mostly with value semantics.
Or you can use them for high level, polymorphic interfaces - like an IFunction representing a mathematical function - that is used exclusively with object semantics.
Greatest advantage: immutability + object semantics + smart pointers make object ownership a non-issue; all clients of the object have their own private copy by default. Implicitly this also means deterministic behavior in the presence of concurrency.
Disadvantage: when used with objects containing lots of data, memory consumption can become an issue. A solution to this could be to keep operations on an object symbolic and do lazy evaluation. However, this can then lead to chains of symbolic calculations that may negatively influence performance if the interface is not designed to accommodate symbolic operations. Something to definitely avoid in this case is returning huge chunks of memory from a method. In combination with chained symbolic operations, this could lead to massive memory consumption and performance degradation.
So immutable objects are definitely my primary way of thinking about object-oriented design, but they are not a dogma.
They solve a lot of problems for clients of objects, but also create many, especially for the implementers.
Check this blog post: http://www.yegor256.com/2014/06/09/objects-should-be-immutable.html. It explains why immutable objects are better than mutable ones. In short (a small code sketch follows the list):
immutable objects are simpler to construct, test, and use
truly immutable objects are always thread-safe
they help to avoid temporal coupling
their usage is side-effect free (no defensive copies)
the identity mutability problem is avoided
they always have failure atomicity
they are much easier to cache
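As a rough Java illustration of several of these points (the Money class and its fields are invented for this sketch, not taken from the post):

// All fields final and themselves immutable; no setters; state fixed at construction.
public final class Money {
    private final String currency;
    private final long cents;

    public Money(String currency, long cents) {
        if (currency == null || cents < 0) {
            // failure atomicity: the object is either fully constructed or not at all
            throw new IllegalArgumentException("invalid money value");
        }
        this.currency = currency;
        this.cents = cents;
    }

    // "Modification" returns a new instance; no caller ever observes a half-updated state.
    public Money add(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(currency, cents + other.cents);
    }

    public String currency() { return currency; }
    public long cents() { return cents; }
}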
You should specify what language you're talking about. For low-level languages like C or C++, I prefer to use mutable objects to conserve space and reduce memory churn. In higher-level languages, immutable objects make it easier to reason about the behavior of the code (especially multi-threaded code) because there's no "spooky action at a distance".
A mutable object is simply an object that can be modified after it's created/instantiated, vs. an immutable object that cannot be modified (see the Wikipedia page on the subject). An example of this in a programming language is Python's lists and tuples. Lists can be modified (e.g., new items can be added after a list is created) whereas tuples cannot.
I don't really think there's a clearcut answer as to which one is better for all situations. They both have their places.
In short:
Mutable instance is passed by reference.
Immutable instance is passed by value.
An abstract example: let's suppose there is a file named txtfile on my HDD. Now, when you ask me to give you the txtfile, I can do it in the following two modes:
I can create a shortcut to the txtfile and pass the shortcut to you, or
I can make a full copy of the txtfile and pass the copied file to you.
In the first mode, the returned file represents a mutable file, because any change to the shortcut file will be reflected in the original one as well, and vice versa.
In the second mode, the returned file represents an immutable file, because any change to the copied file will not be reflected in the original one, and vice versa.
If a class type is mutable, a variable of that class type can have a number of different meanings. For example, suppose an object foo has a field int[] arr, and it holds a reference to an int[3] holding the numbers {5, 7, 9}. Even though the type of the field is known, there are at least four different things it can represent:
A potentially-shared reference, all of whose holders care only that it encapsulates the values 5, 7, and 9. If foo wants arr to encapsulate different values, it must replace it with a different array that contains the desired values. If one wants to make a copy of foo, one may give the copy either a reference to arr or a new array holding the values {5,7,9}, whichever is more convenient.
The only reference, anywhere in the universe, to an array which encapsulates a set of three storage locations which at the moment hold the values 5, 7, and 9; if foo wants it to encapsulate the values 5, 8, and 9, it may either change the second item in that array or create a new array holding the values 5, 8, and 9 and abandon the old one. Note that if one wanted to make a copy of foo, one would have to replace arr in the copy with a reference to a new array in order for foo.arr to remain the only reference to that array anywhere in the universe.
A reference to an array which is owned by some other object that has exposed it to foo for some reason (e.g. perhaps it wants foo to store some data there). In this scenario, arr doesn't encapsulate the contents of the array, but rather its identity. Because replacing arr with a reference to a new array would totally change its meaning, a copy of foo should hold a reference to the same array.
A reference to an array of which foo is the sole owner, but to which references are held by some other object for some reason (e.g. it wants the other object to store data there -- the flip side of the previous case). In this scenario, arr encapsulates both the identity of the array and its contents. Replacing arr with a reference to a new array would totally change its meaning, but having a clone's arr refer to foo.arr would violate the assumption that foo is the sole owner. There is thus no way to copy foo.
In theory, int[] should be a nice simple well-defined type, but it has four very different meanings. By contrast, a reference to an immutable object (e.g. String) generally only has one meaning. Much of the "power" of immutable objects stems from that fact.
Mutable collections are in general faster than their immutable counterparts when used for in-place operations. However, mutability comes at a cost: you need to be much more careful sharing them between different parts of your program. It is easy to create bugs where a shared mutable collection is updated unexpectedly, forcing you to hunt down which line in a large codebase is performing the unwanted update. A common approach is to use mutable collections locally within a function or private to a class where there is a performance bottleneck, but to use immutable collections elsewhere where speed is less of a concern. That gives you the high performance of mutable collections where it matters most, while not sacrificing the safety that immutable collections give you throughout the bulk of your application logic.
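In Java, for instance, that pattern might look roughly like this (the class and method names are made up for the sketch):

import java.util.ArrayList;
import java.util.List;

class WordFilter {
    // A mutable ArrayList is used only inside the method, where the work happens...
    static List<String> longWords(List<String> words, int minLength) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (w.length() >= minLength) {
                result.add(w);
            }
        }
        // ...but an immutable snapshot is what escapes to the rest of the program.
        return List.copyOf(result);
    }
}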
If you return a reference to an array or string, then the outside world can modify the contents of that object, which makes it a mutable (modifiable) object.
Immutable means can't be changed, and mutable means you can change.
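For example, in Java, compare leaking the internal array with returning a defensive copy (a hypothetical Scores class):

class Scores {
    private final int[] values = {90, 85, 70};

    // Leaks the internal array: callers can mutate this object's state from outside.
    int[] getValuesUnsafe() { return values; }

    // Defensive copy: callers get their own array, and this object's state stays unchanged.
    int[] getValues() { return values.clone(); }
}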
Objects are different from primitives in Java. Primitives are built-in types (boolean, int, etc.) and objects (classes) are user-created types.
Primitives and objects can be mutable or immutable when defined as member variables within the implementation of a class.
A lot of people think that primitives and object variables with a final modifier in front of them are immutable; however, this isn't exactly true, so final almost doesn't mean immutable for variables. See the example here:
http://www.siteconsortium.com/h/D0000F.php.
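As a quick Java illustration of that point, a final reference only prevents reassignment; the object behind it can still change:

import java.util.ArrayList;
import java.util.List;

class FinalIsNotImmutable {
    public static void main(String[] args) {
        final List<String> names = new ArrayList<>();
        names.add("Alice");            // allowed: final only fixes the reference...
        names.add("Bob");              // ...the object behind it is still mutable
        // names = new ArrayList<>();  // reassignment is the only thing final forbids
        System.out.println(names);     // [Alice, Bob]
    }
}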
General Mutable vs Immutable
Unmodifiable - a wrapper around a modifiable object. It guarantees that the object cannot be changed directly (but it can still change via the backing object).
Immutable - its state cannot be changed after creation. An object is immutable when all its fields are immutable. It is the next step beyond an unmodifiable object.
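In Java terms, the distinction looks roughly like this (a small sketch using the standard collection helpers):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UnmodifiableVsImmutable {
    public static void main(String[] args) {
        List<String> backing = new ArrayList<>(List.of("a"));

        // Unmodifiable: a read-only view that is still coupled to the backing list.
        List<String> unmodifiable = Collections.unmodifiableList(backing);
        backing.add("b");
        System.out.println(unmodifiable); // [a, b] -- changed through the backing object

        // Immutable: an independent snapshot; later changes to the backing list are not visible.
        List<String> immutable = List.copyOf(backing);
        backing.add("c");
        System.out.println(immutable);    // [a, b]
    }
}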
Thread safe
The main advantage of an immutable object is that it is a natural fit for a concurrent environment. The biggest problem in concurrency is a shared resource that can be changed by any thread. But if an object is immutable, it is read-only, which is a thread-safe operation. Any modification of an original immutable object returns a copy.
Source of truth, side-effect free
As a developer, you are completely sure that an immutable object's state cannot be changed from anywhere (on purpose or not). For example, a consumer can safely keep using the original immutable object.
Compiler optimisation
Improved performance
Disadvantage:
Copying an object is a heavier operation than changing a mutable object, which is why immutability has some performance footprint.
To create an immutable object you should use:
1. Language level
Each language contains tools to help you with it. For example:
Java has final and primitives
Swift has let and struct.
The language defines the type of a variable. For example:
Java has primitive and reference types,
Swift has value and reference types.
For immutable objects, primitives and value types are more convenient because they make a copy by default. Reference types are more difficult (because the object's state can be changed through other references), but possible. For example, you can use the clone pattern at the developer level to make a deep (instead of shallow) copy, as sketched below.
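For example, a copy constructor that copies the mutable internals so no state is shared (a minimal sketch with an invented Roster class):

import java.util.ArrayList;
import java.util.List;

class Roster {
    private final List<String> players;

    Roster(List<String> players) {
        // copy on the way in: the caller's list can change without affecting this object
        this.players = new ArrayList<>(players);
    }

    // copy constructor: a new Roster with its own copy of the internal list
    Roster(Roster other) {
        this.players = new ArrayList<>(other.players);
    }

    List<String> players() {
        // copy on the way out as well, so callers cannot mutate this object's state
        return new ArrayList<>(players);
    }
}

Here the elements are immutable strings, so copying the list is enough; with mutable elements a true deep copy would also have to copy each element.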
2. Developer level
As a developer, you should not provide an interface for changing state.
Swift and Java also provide immutable collections.

Why primitives are not reference type?

I was reading about value and reference types, and a question I couldn't find a clear answer to was why primitives like int/double, etc. are not reference types, like strings for example.
I know strings/arrays/other objects can be pretty big compared to ints (which I saw was the primary argument for references), so is the only reason not to make those primitives reference types that it would be overkill?
This is only the case in some programming languages, and this is typically done as an optimization (in order to avoid the need to perform memory dereferences or allocations for such simple types). However, there are languages that make basic numeric types and programmer-defined objects look and behave identically, sometimes selecting between a true object and a simple object automatically in the compiler or interpreter to maintain efficiency when the object-like capabilities are not used.
Python and Scala are examples where basic integers and regular objects are indistinguishable; Java, C++, and C are examples where builtin types are distinct from programmer-defined types.
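In Java, for example, the split shows up in the difference between the primitive int and the boxed reference type Integer (a small sketch; note that the == result relies on the default boxing cache range):

class PrimitiveVsReference {
    public static void main(String[] args) {
        int a = 1000;                    // primitive: the value itself, no heap object
        Integer b = 1000;                // reference type: a boxed object on the heap
        Integer c = 1000;
        System.out.println(a);           // 1000
        System.out.println(b.equals(c)); // true  -- same value
        System.out.println(b == c);      // false -- distinct objects (outside the small-value cache)
    }
}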

How to achieve data polymorphism for multiple external formats in Haskell?

I need to process multiple formats and versions for semantically equivalent data.
I can generate Haskell data types for each schema (XSD for example), they will be technically different, but semantically and structurally identical in many cases.
The data is complex and includes references, and service components must process the whole graph and produce a similarly shaped response (a component might just update a field, but it might need to analyze the whole graph to collect all required information, and might call other services as well).
How can I represent ns1:address and ns2:adress as one polymorphic type that has country and street elements, so the application can process them as identical but keep the serialization context for writing the response in the correct format (one representation might encode them in a single string while another might also carry superfluous complex data)?
How close can I get to writing mostly code that defines the semantic equivalence of the data and the business logic, and generating everything else? What features in the Haskell language or its libraries should I evaluate as building blocks for a potential solution?
An option is to create a data type for each schema and create a function to map them to a common data type. Process it as you wish. You don't need to create polymorphic types.
This approach is similar to Pandoc's: you get a bunch of readers to parse documents to a common document structure, then use writers to convert that common structure to a particular format.
You just need the libraries to read your complex input data (and write it back, if necessary). The rest is functions and data types.
If you are really handling graphs, you can look at the Data.Graph module.
It sounds like this is a problem that is well served by the Type Class infrastructure and the Lens library.
Use a Type Class to define a standard and consistent high-level interface to the various implementations. Make sure that you focus on the operations you wish to perform, not on the underlying implementation or process.
Use Lenses and Prisms to reach into the underlying datatypes and return answers to queries, and modify values "in-place", without resorting to full serialisation/de-serialisation.

Is it possible, in any language, to implement rules that will affect every instance of an object?

For example, could I implement a rule that would change every string that followed the pattern '1..4' into the array [1,2,3,4]? In JavaScript:
//here you create a rule that changes every string that matches /([0-9]+)\.\.([0-9]+)/
//ever created into range($1, $2) (imagine $1 and $2 are the results of the regexp)
var a = '1..4';
console.log(a);
>> output: [1,2,3,4];
Of course, I'm pretty confident that would be impossible in most languages. My question is: is there any language in which that would be possible? Or have anyone ever proposed something like that? Does this thing have a 'name' for which I can google to read more about?
Modifying the language from within itself falls under the umbrella of reflection and metaprogramming. It is referred to as behavioral reflection. It differs from structural reflection, which operates at the level of the application (e.g. classes, methods) rather than the language level. Support for behavioral reflection varies greatly across languages.
We can broadly categorize language changes into two categories:
changes that modify the semantics (i.e. the rules) of the language itself (e.g. redefine the method lookup algorithm),
changes that modify the syntax (e.g. your syntax '1..4' to create arrays).
For case 1, certain languages expose the structure of the application (structural reflection) and the inner workings of their implementation (behavioral reflection) to the application itself via special objects, called meta-objects. Meta-objects are reifications of otherwise implicit aspects, which then become explicitly manipulable: the application can modify the meta-objects to redefine part of its structure, or part of the language. When it comes to language changes, the focus is usually on modifying message sending / method invocation, since it is the core mechanism of object-oriented languages. But the same idea could be applied to expose other aspects of the language, e.g. field accesses, synchronization primitives, foreach enumeration, etc., depending on the language.
For case 2, the program must be represented in a suitable data structure to be modified. For languages of the Lisp family, the program manipulates lists, and the program can itself be represented as lists. This is called homoiconicity and is handy for metaprogramming, hence the flexibility of Lisp-like languages. For other languages, the representation is usually an AST. Transforming the representation of the program, or rewriting it, is possible with macros, preprocessors, or hooks during compilation or class loading.
The line between 1 and 2 is, however, blurry. Syntactic changes can appear to modify the semantics of the language. For instance, I can rewrite all field accesses with proper getters and setters and perform additional logic there, say to implement transactional memory. Did I perform a semantic change of what a field access is, or merely a syntactic change?
Also, there are other constructs that fall between the lines. For instance, proxies and the #doesNotUnderstand trap are popular techniques to simulate the reification of message sends to some extent.
Lisp and Smalltalk have been very influential in the field of metaprogramming, and I think the two following projects/platforms are interesting to look at as representatives of each:
Racket, a Lisp-like language focused on growing languages from within the language
Helvetia, a Smalltalk extension to embed new languages into the host language by leveraging the AST of the host environment.
I hope you enjoyed this even if I did not really address your question ;)
Your desired change requires modifying the way literals are created. This is AFAIK not usually exposed to the application. The closest work that I can think of is Virtual Values for Language Extension, which tackled JavaScript.
Yes. Common Lisp (and certain other lisps) have "reader macros" which allow the user to reprogram (incrementally) the mapping between the input stream and the actual language construct as parsed.
See http://dorophone.blogspot.com/2008/03/common-lisp-reader-macros-simple.html
If you want to operate on the level of objects, you will want to use a debugging/memory management framework that keeps track of all objects and processes the rules on each evaluation step (nasty). This seems like the kind of thing you might be able to shoehorn into Smalltalk.
CLOS (Common Lisp Object System) allows redefinition of live objects.
Ultimately you need two things to implement this:
Access to the running system's AST (Abstract Syntax Tree), and
Access to the running system's objects.
You'll want to study meta-object protocols and the languages that use them, then the implementations of both the MOPs and the environment within which these programs are executed.
Image-based systems will be the easiest to modify (e.g., Lisp, potentially Smalltalk).
(Image-based systems store a snapshot of a running system, allowing complete shutdown and restarts, redefinitions, etc. of a complete environment, including existing objects, and their definitions.)
Ruby allows you to extend classes. For instance, this example adds functionality to the String class. But you can do more than add methods to classes. You can also overwrite methods, by defining a method that's already been defined. You may want to preserve access to the original method using alias_method.
Putting all this together, you can overload a constructor in Ruby, but in your case, there's a catch: It sounds like you want the constructor to return a different type. Constructors by definition return instances of their class. If you just want it to return the string "[1,2,3,4]", that's simple enough:
class String
  alias_method :old_constructor, :initialize
  def initialize(*args)
    old_constructor(*args)
    # code that applies your transformation
  end
end
But there's no way to make it return an Array if that's what you want.

DDD: what's the use of the difference between entities and value objects?

Entities and value objects are both domain objects. What's the use of knowing the distinction between the two in DDD? E.g., does thinking about domain objects as being either an entity or a value object foster a cleaner domain model?
Yes, it is very helpful to be able to tell the difference, particularly when you are designing and implementing your types.
One of the main differences is when it comes to dealing with equality, since Entities should have quite different behavior than Value Objects. Knowing whether your object is an Entity or a Value Object tells you how you should implement equality for the type. This is helpful in itself, but it doesn't stop there.
Entities are mutable types (at least by concept). The whole idea behind an Entity is that it represents a Domain concept with a known lifetime progression (i.e. it is created, it undergoes several transformations, it is archived and perhaps eventually deleted). It represents the same particular 'thing' even if months or years pass by, and it changes state along the way.
Value Objects, on the other hand, simply represent values without any inherent identity. Although you don't have to do this, they lend themselves tremendously well to be implemented as immutable types. This is very interesting because any immutable type is by definition thread-safe. As we are moving into the multi-core age, knowing when to implement an object as an immutable type is very valuable.
It also helps a lot in unit testing when the equality semantics are well-known. In both cases, equality is well-defined. I don't know what language you use, but in many languages (C#, Java, VB.NET) equality is determined by reference by default, which in many cases isn't particularly useful.
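A rough Java sketch of the two equality styles (the Address and Customer types are invented for illustration):

import java.util.Objects;

// Value Object: equality is based purely on the values it carries.
final class Address {
    private final String street;
    private final String city;
    Address(String street, String city) { this.street = street; this.city = city; }
    @Override public boolean equals(Object o) {
        if (!(o instanceof Address)) return false;
        Address a = (Address) o;
        return street.equals(a.street) && city.equals(a.city);
    }
    @Override public int hashCode() { return Objects.hash(street, city); }
}

// Entity: equality is based on identity (here an id), even as its state changes over its lifetime.
class Customer {
    private final long id;
    private Address address; // mutable state that evolves over the entity's lifetime
    Customer(long id, Address address) { this.id = id; this.address = address; }
    void moveTo(Address newAddress) { this.address = newAddress; }
    Address address() { return address; }
    @Override public boolean equals(Object o) {
        return o instanceof Customer && id == ((Customer) o).id;
    }
    @Override public int hashCode() { return Long.hashCode(id); }
}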
