Remote Locking - multithreading

I am designing a remote threading primitive protocol. Currently we only need mutexes (i.e. Monitors) and semaphores. The main idea is that there doesn't need to be a central authority - the primitives should be orchestrated amongst the peers that are interested in them.
I have bashed a few ideas around on paper and in my head for a few weeks, but I think I should really have a look at prior literature. Is there any?
It will run over XMPP - but that is an implementation detail. I am just looking for a specification or similar for the actual protocol flow - so it doesn't really matter what protocol the literature originates from.
Thanks a million.

Distributed mutexes are tricky structures. You need to handle all sorts of weird conditions that do not exist in a single-machine implementation. In particular, you need to handle situations where agents lose comms with the group while still holding a lock on a shared resource. Beyond that, there are more complex scenarios where your group is fractured and each fragment grabs a lock on the same resource; when the fractured groups rejoin into one big group, you need some way to reconcile the locks, which is far from trivial.
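One pattern from practice that addresses the lost-comms case is to make every lock a lease with an expiry, paired with a fencing token so that a stale holder can no longer affect the resource once its lease lapses. A minimal sketch of the bookkeeping - all names and the overall design are purely illustrative, not taken from any published protocol:

```typescript
// Sketch of lease-based lock state as one peer might track it.
// All types and names here are illustrative.

interface Lease {
  resource: string;
  holder: string;
  fencingToken: number; // monotonically increasing per resource
  expiresAt: number;    // epoch millis; holder must renew before this
}

class LeaseTable {
  private leases = new Map<string, Lease>();
  private nextToken = new Map<string, number>();

  // Grant only if the resource is free or the previous lease expired.
  tryAcquire(resource: string, holder: string, ttlMs: number, now: number): Lease | null {
    const current = this.leases.get(resource);
    if (current && current.expiresAt > now) return null; // still held
    const token = (this.nextToken.get(resource) ?? 0) + 1;
    this.nextToken.set(resource, token);
    const lease = { resource, holder, fencingToken: token, expiresAt: now + ttlMs };
    this.leases.set(resource, lease);
    return lease;
  }

  // A write is valid only if it carries the latest fencing token, so a
  // peer that lost comms while "holding" the lock is rejected on return.
  isCurrent(resource: string, fencingToken: number): boolean {
    return this.leases.get(resource)?.fencingToken === fencingToken;
  }
}
```

Reconciliation after a partition heals then reduces to comparing fencing tokens: the highest token wins, and holders of older tokens are rejected the next time they touch the resource.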
I would strongly recommend looking into messaging-based middleware such as Erlang and JBoss.
I would also recommend posting separate questions on the particular distributed algorithms / data structures you need implemented. It's possible that you could get away with an out-of-the-box implementation in a middleware library that could be adjusted to meet your needs.

Related

Microservices dependency management - Governance or Domain Driven Design?

Background: an international company with a federation model is transforming to Microservices due to chronic Monolithic pain. Autonomous teams with quick deployment are highly desirable. Theory notwithstanding, services do depend on each other for higher functionality, yet remain autonomous (independently developed and deployed). Since this is a federation model with decentralized control, we cannot impose strict rules - just like the UN. Without a governance platform to manage dependencies, we foresee uncontrollable chaos due to the multiple versions in production in different countries.
Let's call a set of Microservices that needs to collaborate a "Compatibility Set". A service can be deployed yet fail to satisfy the higher functionality of its Compatibility Set. For example, MicroService A-4.3 is fully autonomous, deployed and working perfectly. However, to satisfy BusinessFunctionality 8.6 it must work together with MicroService B-5.4 and MicroService C-2.9. Together (A-4.3, B-5.4 and C-2.9) they form a "Compatibility Set".
There are two approaches to this dilemma. This is Microservices in real life, where the rubber hits the road and the learning from experience begins...
Approach 1) Governance Platform
Rationale: a federal model in an international company operating in 100+ countries. Central IT can lay down the model, but individual countries can choose their own destiny - and they frequently do. This frequently devolves into chaos, and the Central IT team is on the hook. DDD is the solution for an ideal world in which version inconsistencies do not derail functionality - for example, releasing services that do not fit into the Compatibility Set, each individually blameless, but together falling apart or producing flawed or inconsistent functionality.
There is no homogeneity; there isn't even standardization of terminology.
Developers are of mixed skill, many junior, and many still learning reactive programming and cloud-native technologies.
Bounded Context depends heavily on a Shared Vocabulary, and this can get subtle - but it is impossible to enforce and naive to assume in an international, mixed-skill, fragmented scenario with multiple versions running.
Standardization on a single Business Model is not realistic in such a heterogeneous system (however ideal).
So what is Central IT to do when they're held responsible for this chaos?
Enforce a Governance Platform
Create a Microservices governance system or framework to enforce dependency management. It verifies and enforces, at design time and at run time, the dependencies of a particular Microservice through a manifest, and performs checks and balances to verify the service implementations being offered - the "Compatibility Set".
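Purely as an illustration (the manifest format and field names below are hypothetical), such a platform could require each service to publish a manifest declaring which peer versions it works with, and refuse a deployment whose Compatibility Set cannot be satisfied:

```typescript
// Hypothetical deployment manifest and check for a "Compatibility Set".

interface ServiceManifest {
  name: string;                       // e.g. "MicroService-A"
  version: string;                    // e.g. "4.3"
  requires: Record<string, string[]>; // peer name -> versions it works with
}

// Versions currently running in one country/region.
type Deployment = Record<string, string>;

// The set is satisfied only if every declared requirement is met
// by what is actually deployed.
function compatibilitySetSatisfied(manifest: ServiceManifest, deployed: Deployment): boolean {
  return Object.entries(manifest.requires).every(([peer, versions]) => {
    const running = deployed[peer];
    return running !== undefined && versions.includes(running);
  });
}

// Using the versions from the question:
const a: ServiceManifest = {
  name: "MicroService-A",
  version: "4.3",
  requires: { "MicroService-B": ["5.4"], "MicroService-C": ["2.9"] },
};
console.log(compatibilitySetSatisfied(a, { "MicroService-B": "5.4", "MicroService-C": "2.9" })); // true
console.log(compatibilitySetSatisfied(a, { "MicroService-B": "5.3", "MicroService-C": "2.9" })); // false
```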
Approach 2) Domain Driven Design (DDD)
DDD is about modelling domains that are constantly evolving, where domain experts (typically a business stakeholder, or perhaps an analyst) work alongside developers to design the system. Within each domain, a ubiquitous language is formed, such that within that context, the same word always means the same thing. An important thing to realise is that in one part of your system, "Order" might mean one thing - for example, a list of products. In another part of your system, "Order" might mean something else - say, a financial transaction that happened. This is where the model you describe can fall down: if my service needs to get a list of orders, perhaps there is a capability out there that supplies a list of orders, but which orders are they? The list of products or the financial transaction? Trying to coordinate as many developers as you have to all use the same language here is an impossible task that is doomed to fail.
In DDD, rather than trying to manage this at a system level and force every service to use the same definition of Order, you embrace the inherent complexity of coordinating very large deployments with huge numbers of developers involved, and allow each team to work independently, coordinating with other teams as needed rather than through some centralised dependency management system. The term used in DDD is bounded contexts: in one bounded context, Order means one thing, and in another bounded context, Order can mean another thing. These contexts can function truly autonomously. You describe your services as being autonomous, but if they have to match their definition of order with the entire system by registering their supplied dependencies with a central registry, then really they are tightly coupled to the rest of the system and to what it considers an order to be. You end up with all the painful coupling of a monolith plus all the pain of building a distributed system, and you won't realise many of the benefits of microservices if you take this approach.
So a DDD-based approach never takes a heavy-handed line of enforcing dependencies or capabilities at the system level. Rather, it allows individual teams to work without central coordination: if Service A needs to interact with Service B, the team that manages Service A works with the team that manages Service B; they build an interface between their bounded contexts and come to an agreement on the language for that interface. It is up to these teams to manage their dependencies with each other; at the system level, things can remain quite opaque and unenforced.
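As a toy sketch of such an interface (all types hypothetical), the two teams keep their own definitions of Order and agree only on a translation at the boundary:

```typescript
// In the sales context, an Order is a list of products.
namespace Sales {
  export interface Order { orderId: string; productIds: string[]; }
}

// In the billing context, an Order is a financial transaction.
namespace Billing {
  export interface Order { orderId: string; amountCents: number; currency: string; }
}

// The two teams agree on a translation at the boundary (what DDD calls
// an anti-corruption layer) instead of one global definition of Order.
function toBillingOrder(order: Sales.Order, priceCents: (id: string) => number): Billing.Order {
  const amountCents = order.productIds.reduce((sum, id) => sum + priceCents(id), 0);
  return { orderId: order.orderId, amountCents, currency: "USD" };
}
```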
Too often we see people implement "Microservices" but end up with a system that is just as inflexible as a monolith, if not more so, and often more fragile - sometimes called a "Minilith" or "Monolith 2.0". Microservices require a complete rethink of architecture and software development processes, and require not just allowing services to be autonomous and independently managed, but also allowing teams to be independent, not centrally managed. Centralising the management of dependencies and capabilities in a system is likely to be an inhibitor to successfully building a microservice-based system.
Intelligent and Pragmatic comments invited...
Approach 1 (Governance) is pragmatic and tactical, and intended to solve very real challenges. The question is: will it undermine the long-term strategic DDD model of the Enterprise?
Approach 2 (DDD) is ideal and aspirational but doesn't address the very real challenges that we have to deal with right now.
Opinions? Thought? Comments?
I've seen multi-national companies try to cooperate on a project (or be controlled from a central IT team) and it's a nightmare. This response is highly subjective, based on what I've personally read and seen, so it's just my opinion - probably not everyone's. Generally, broad questions aren't encouraged on Stack Overflow as they attract highly opinionated answers.
I'd say DDD probably isn't the answer. You'd need a large number of developers to buy into the DDD idea. If you don't have that buy-in then (unless you have a team of exceptionally self-motivated people) you'll see the developers try to build the new system on top of the existing database.
I'd also argue that microservices aren't the answer. Companies that have used microservices to their advantage are essentially using them to compartmentalise their code into small stacks of individually running services/apps that each do a single job. These microservices (from the success stories I've seen) tend to be loosely coupled. I imagine that if you have a large number of services that are highly coupled, then you've still got the spaghetti aspects of a monolith - just one that's spread out over a network.
It sounds like you just need a well architected system, designed to your specific needs. I agree that using DDD would be great, but is it a realistic goal across a multi-national project?
I have also dealt with the problem described in the question, and I came up with an approach that uses API definitions, such as OpenAPI definitions, to check compatibility between two services. The API definitions must be attached as metadata to each service, which makes it possible to do the check at run time and at design time. It is important that the API definitions are part of the metadata both when an API is offered and when an API is used. With tools like Swagger-Diff or OpenAPI-Diff, the compatibility check can be automated.
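A sketch of how that check might be wired up, with the actual diff delegated to whichever tool you use (e.g. Swagger-Diff or OpenAPI-Diff); the metadata shape and the stand-in diff function below are hypothetical:

```typescript
// Hypothetical metadata attached to each service: the OpenAPI document
// it offers, plus the documents it was built and tested against.

interface ServiceMetadata {
  name: string;
  offeredApi: string;                    // OpenAPI document (JSON/YAML text)
  consumedApis: Record<string, string>;  // provider name -> spec consumed against
}

// Stand-in for Swagger-Diff / OpenAPI-Diff. Naive placeholder so the
// sketch runs: treat only identical specs as compatible.
function diffIsBackwardCompatible(consumedSpec: string, offeredSpec: string): boolean {
  return consumedSpec === offeredSpec;
}

// Runs the same way at design time (CI) and at run time (registration).
function checkCompatibility(consumer: ServiceMetadata, providers: ServiceMetadata[]): string[] {
  const problems: string[] = [];
  for (const [providerName, consumedSpec] of Object.entries(consumer.consumedApis)) {
    const provider = providers.find(p => p.name === providerName);
    if (!provider) {
      problems.push(`${providerName} is not deployed`);
    } else if (!diffIsBackwardCompatible(consumedSpec, provider.offeredApi)) {
      problems.push(`${providerName} has incompatible API changes`);
    }
  }
  return problems;
}
```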

SCORM - RTE reporting mechanism security

I am investigating SCORM compliance as an option for a software project I am involved in. If this is too esoteric for SO, I am sorry - not sure where else to turn.
I am a little confused as to how the SCO (Sharable Content Object) reports a quiz score, for example, to the LMS. From what I can gather from the official documentation, this is to be done using the LMSSetValue function in the RTE API object, which is just a bunch of JavaScript.
This seems wildly insecure to me, as it takes nothing to rewrite the values passed to the LMS this way.
My question is therefore: am I missing something? Are SCOs meant simply not to report such values to the LMS? My impression is that this is the only permitted mode of communication between SCOs and the LMS.
The JavaScript API is the way data is passed from the SCO to the LMS. Are there more secure ways to pass data? Sure. But the implementation is not brand-spanking new, remember. In addition, because of portability constraints, many of the most highly secure ways of passing data are not available to SCORM developers. Portability was the main priority of the standard, not security. There is a community of experts talking about what should replace SCORM. It's called Project Tin Can. And different ways of exchanging data, including cross-domain and server-side, are being discussed there.
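For context, the exchange in SCORM 1.2 looks roughly like the sketch below - which also shows why it is trivially tamperable: a learner can type these same calls into the browser's developer console.

```typescript
// Shape of the SCORM 1.2 RTE API object the LMS exposes to the SCO.
interface Scorm12Api {
  LMSInitialize(arg: ""): string;  // returns "true" / "false"
  LMSSetValue(element: string, value: string): string;
  LMSCommit(arg: ""): string;
  LMSFinish(arg: ""): string;
}

// The SCO discovers the API on its window (or a parent frame).
const api = (window as any).API as Scorm12Api;

// Reporting a quiz score - but nothing stops a learner from running
// these exact lines from the console with a score of "100".
api.LMSInitialize("");
api.LMSSetValue("cmi.core.score.raw", "85");
api.LMSCommit("");
api.LMSFinish("");
```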

Architectural strategies to minimize cloud lock-in risk?

I would like to understand the best way to mitigate the risk of vendor lock-in for cloud-based systems.
For example, I'd like to deploy a multitude of different systems to, say, Amazon EC2 or Windows Azure, but I'd like to minimize the cost of migrating those systems to an alternative cloud vendor if/when necessary.
At the very least, it seems that the more I rely on vendor-specific solutions (like Amazon Simple Queue Service), the more I'm inherently locked in (at least I think so), but I'd like to understand this risk better, and any risks beyond it.
Are there architectural strategies I can use to mitigate this (e.g., rely on map reduce, since my scripts will be portable to another map reduce cloud env)? Are there O/S or stacks that are better than others (Linux, LAMP?). Is using JClouds helpful?
Ideally, I'd like to design virtual systems that can be deployed on EC2, for example, but then easily migrated to Azure or App Engine (or vice versa).
I generally write in Java, but am considering selective use of Scala and Python (or Jython) and am generally still trying to stay JVM-based. I tend to do a lot of parallel processing, and rely on both SQL and non-SQL (but not necessarily NoSQL) storage and data manipulation technologies.
Thanks in advance. Hope I'm not being too unrealistic here.
In my opinion, the only architectural pattern for the problem you describe is abstraction.
Make sure to stick to resources that are offered across various vendors - storage, queues, etc. - and create an abstraction layer for each of them.
Hope this helps. I don't think it's a super simple task, given the variability of the services across cloud providers.
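A sketch of what such an abstraction layer might look like for queuing; the interface and adapter below are hypothetical, and each adapter would wrap the corresponding vendor SDK:

```typescript
// Vendor-neutral queue abstraction; application code depends only on this.
interface MessageQueue {
  send(queueName: string, body: string): Promise<void>;
  receive(queueName: string, max: number): Promise<string[]>;
}

// One adapter per vendor, each wrapping that vendor's SDK calls
// (SQS, Azure queue storage, ...). SDK calls omitted in this sketch.
class SqsQueue implements MessageQueue {
  async send(queueName: string, body: string): Promise<void> {
    /* call the AWS SDK here */
  }
  async receive(queueName: string, max: number): Promise<string[]> {
    /* call the AWS SDK here */
    return [];
  }
}

// Swapping vendors is then a matter of constructing a different adapter.
function makeQueue(vendor: "aws" | "azure"): MessageQueue {
  if (vendor === "aws") return new SqsQueue();
  throw new Error("azure adapter not shown in this sketch");
}
```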
I agree with IgoreK - if you're doing this in code, it'll take a lot of abstraction; that's about it.
Another option is to take an IaaS cloud approach - design your application based on Virtual machine roles only. Most Cloud providers offer some form of Virtual machine role - Amazon, Azure, Rackspace etc. Migration then means far less code changes, but a bit more admin on your side.
Microsoft's Customer Advisory Team has an excellent sample on how to do that (I think I downloaded the project from here). There's a whole lot of code in it, and some really good abstractions to make things "free". Obviously, as with any abstraction, you also introduce a new layer of complexity, so make sure you really understand all of it before applying it.
In most cases, less is more. And even though a lock-in is not something you want, it's probably not that hard to "fix" if the need arises. But ask yourself if it's important for that need to be satisfied now, or should you finish the project, and refactor later.
Honestly, your question is based on a bit of a false premise. You're looking to avoid lock-in rather than trying to take full advantage of the platform you've chosen to use.
The better way of approaching the issue is not to try to make your infrastructure hot-swappable (i.e., to avoid vendor lock-in), but to actually make a decision about the IaaS provider you want to use and leverage it as best as you possibly can.

What came before web services and SOA?

I'm interested in the history of distributed, collaborative, cross-organisational programming paradigms - web services and SOA are the de facto standard now, but what came before? What models have been superseded by SOA?
Thanks
Well, I suppose there was RPC - which is really what SOAP is, only the earlier RPC systems didn't piggy-back the data payload on top of a standard protocol (HTTP, in SOAP's case). So CORBA and DCE-RPC and ONC RPC all did the same thing, but only over internal networks, not over the internet.
There was also EDI as a 'standard' for exchanging data between disparate entities. This was effectively a way of defining what the data payload would look like (similar to the XML part of SOAP).
But these are still not really SOAs; they provide the same functionality, but the big difference was how people thought of using them. Once you could write a machine-to-machine 'website' and have different machines talk to each other through it, the idea took off. You could do it before using CORBA, say, but it wasn't as easy or as widely known about. You can tell this has happened by the fact that we have several terms for effectively the same thing - SOA, SaaS, Web Services... all the same thing (but lots of money to be made 'consulting' on the difference ;) )
Maybe Silos?
...where services are just not shared across an enterprise, at least in a standard way. This is why products like BizTalk are used: to get silos to talk to each other via standard interfaces.
I don't really think you'll find anything that's been superseded by SOA. You will find that there's been progress in organizing computer programs to take advantage of SOA-type principles. As for programming models that have been in reasonably common use, well, let's see... CORBA, RPC, more generic client-server applications. Of course, computer-to-computer communications were preceded by process-to-process communication using a wide variety of conventions.
SOA as a philosophy of breaking large problems into smaller ones and then composing the results has been known and applied since humans started making bricks instead of building complete walls. Of course, that was mostly implicit. Explicit statements of SOA really began with CORBA; and while SOA is independent of Web Services, the advent of HTTP and XML, and then SOAP, made the development of non-specialized "services" easier, more worthwhile and thus more common.
This pdf A Note on Distributed Computing should be an interesting read. It is pre-SOA and would give an idea of the history up to that point (1994).
I would say distributed object technology. And before it remote procedure calls.
RPC is one of the earlier approaches and gained popularity from the Sun implementation; one of its famous uses is NFS (Network File System).
As object-oriented programming became more popular, distributed objects followed. Most important were Microsoft DCOM (and later COM+) and, more industry-wide, CORBA.
SOA is a divide and conquer approach that is critically dependent on the concept of services. Which is different from objects as used by CORBA et al, as well as being different from resources as in REST.
Objects are created and their lifetime is typically controlled by the client. On the other hand, services are assumed to be always there provided by the server. This is one reason why SOA is not equivalent to distributed objects.
Services are also stateless, which means that when the server considers the response to a service request, it need not look at the history of its interaction with the client. This was not a consideration when the RPC concept was originally devised, as scalability wasn't such an important issue then. Interestingly, large-scale users of RPC did notice the relationship between scalability and statelessness: the NFS RFC explicitly mentions stateless servers, though with reliability as the main concern. In any case, statelessness is one of the main differences between services and plain old RPC.
In short, no. I don't believe in the revisionist history of SOA existing since the dawn of time - any more than the universe being written in Lisp (or Perl, for that matter). Nor is it equivalent to divide and conquer or the division of labour.
SOA started as a concept at some point in the nineties, overlapping with the development of CORBA. It is much harder to pinpoint an actual date or event, and there are more than a few claims to its conceptualisation.

What are the best programmatic security controls and design patterns?

There's a lot of security advice out there to tell programmers what not to do. What in your opinion are the best practices that should be followed when coding for good security?
Please add your suggested security control / design pattern below. Suggested format is a bold headline summarising the idea, followed by a description and examples e.g.:
Deny by default
Deny everything that is not explicitly permitted...
Please vote up or comment with improvements rather than duplicating an existing answer. Please also put different patterns and controls in their own answer rather than adding an answer with your 3 or 4 preferred controls.
edit: I am making this a community wiki to encourage voting.
Principle of Least Privilege -- a process should only hold those privileges it actually needs, and should only hold those privileges for the shortest time necessary. So, for example, it's better to use sudo make install than to su to open a shell and then work as superuser.
All these ideas that people are listing (isolation, least privilege, white-listing) are tools.
But you first have to know what "security" means for your application. Often it means something like
Availability: The program will not fail to serve one client because another client submitted bad data.
Privacy: The program will not leak one user's data to another user
Isolation: The program will not interact with data the user did not intend it to.
Reviewability: The program obviously functions correctly -- a desirable property of a vote counter.
Trusted Path: The user knows which entity they are interacting with.
Once you know what security means for your application, then you can start designing around that.
One design practice that doesn't get mentioned as often as it should is Object Capabilities.
Many secure systems need to make authorizing decisions -- should this piece of code be able to access this file or open a socket to that machine.
Access Control Lists are one way to do that - specify the files that can be accessed. Such systems, though, require a lot of maintenance overhead. They work for security agencies where people have clearances, and they work for databases where the company deploying the database hires a DB admin. But they work poorly for secure end-user software, since the user often has neither the skills nor the inclination to keep lists up to date.
Object Capabilities solve this problem by piggy-backing access decisions on object references -- by using all the work that programmers already do in well-designed object-oriented systems to minimize the amount of authority any individual piece of code has. See CapDesk for an example of how this works in practice.
DARPA ran a secure systems design experiment called the DARPA Browser project which found that a system designed this way -- although it had the same rate of bugs as other Object Oriented systems -- had a far lower rate of exploitable vulnerabilities. Since the designers followed POLA using object capabilities, it was much harder for attackers to find a way to use a bug to compromise the system.
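A toy illustration of the object-capability idea (all names hypothetical): instead of ambient authority to open any file, code receives a reference that reaches only what it was explicitly given:

```typescript
// Ambient authority would look like: renderPreview(path) reading any
// path it names. Object-capability style passes a capability instead.

interface ReadCap {
  read(): string;
}

function fileReadCap(contents: string): ReadCap {
  // In a real system this would wrap one specific opened file handle.
  return { read: () => contents };
}

// The previewer holds no authority beyond the capability it was given;
// it cannot name or reach any other file.
function renderPreview(doc: ReadCap): string {
  return doc.read().slice(0, 80);
}

const cap = fileReadCap("only this document's contents are reachable here");
console.log(renderPreview(cap));
```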
White listing
Opt in what you know you accept
(Yeah, I know, it's very similar to "deny by default", but I like to use positive thinking.)
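A small example of opting in rather than filtering out; the allowed set is purely illustrative:

```typescript
// Whitelist: accept only values we explicitly know, reject the rest.
const ALLOWED_SORT_FIELDS = new Set(["name", "date", "size"]);

function parseSortField(input: string): string {
  if (!ALLOWED_SORT_FIELDS.has(input)) {
    throw new Error(`unsupported sort field: ${JSON.stringify(input)}`);
  }
  return input; // now safe to use in a query, ORDER BY clause, etc.
}
```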
Model threats before making security design decisions - think about what possible threats there might be, and how likely they are. For example, someone stealing your computer is more likely with a laptop than with a desktop. Then worry about the more probable threats first.
Limit the "attack surface". Expose your system to the fewest attacks possible, via firewalls, limited access, etc.
Remember physical security. If someone can take your hard drive, that may be the most effective attack of all.
(I recall an intrusion red team exercise in which we showed up with a clipboard and an official-looking form, and walked away with the entire "secure" system.)
Encryption ≠ security.
Hire security professionals
Security is a specialized skill. Don't try to do it yourself. If you can't afford to contract out your security, then at least hire a professional to test your implementation.
Reuse proven code
Use proven encryption algorithms, cryptographic random number generators, hash functions, authentication schemes, access control systems, rather than rolling your own.
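For instance, in Node.js the gap between home-grown and proven randomness is a single import (a minimal sketch using the standard crypto module):

```typescript
import { randomBytes } from "crypto";

// Don't: Math.random() is predictable and unsuitable for tokens/keys.
const weakToken = Math.random().toString(36).slice(2);

// Do: use the platform's vetted CSPRNG rather than rolling your own.
const strongToken = randomBytes(32).toString("hex");
```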
Design security in from the start
It's a lot easier to get security wrong when you're adding it to an existing system.
Isolation. Code should have strong isolation between, e.g., processes, so that failures in one component can't easily compromise others.
Express risk and hazard in terms of cost. Money. It concentrates the mind wonderfully.
A good understanding of the underlying assumptions of crypto building blocks can be important. E.g., stream ciphers such as RC4 are very useful, but can easily be used to build an insecure system (i.e., WEP and the like).
If you encrypt your data for security, the highest risk data in your enterprise becomes your keys. Lose the keys, and data is lost; compromise the keys and all your data is compromised.
Use risk to make security decisions. Once you determine the probability of different threats, consider the harm that each could do. Risk is, by definition
R = Pe × H
where Pe is the probability of the undesired event, and H is the hazard, or the amount of harm that could come from the undesired event.
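A worked example with purely illustrative numbers: if a laptop has a 5% chance per year of being stolen (Pe = 0.05) and the resulting harm is costed at $10,000 (H = $10,000), then R = 0.05 × $10,000 = $500 per year - a ceiling on what it is worth spending annually to mitigate that threat.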
Separate concerns. Architect your system and design your code so that security-critical components can be kept together.
KISS (Keep It Simple, Stupid)
If you need to make a very convoluted and difficult to follow argument as to why your system is secure, then it probably isn't secure.
Formal security designs sometimes refer to a thing called the TCB (Trusted Computing Base). But even an informal design has something like this - the security-enforcing part of your code, the part you can't avoid relying on. This needs to be well encapsulated and as simple and small as possible.