Sorry for the noob question... I'm trying to figure out a way to have shared resources between my tf scripts, but I can't find anything, so probably I'm looking for the wrong keywords...
Let's say I have 3 scripts:
base/base.tf
one/one.tf
two/two.tf
base creates an aws vpc and a network load balancer
one and two are two ecs fargate services. they create the task definition and add the mapping to the network load balancer.
My goal is to have something to keep track of the mapped port in the load balancer and read it and update from one and two.
Something like
base sets last_port to 14000
one reads last_port, increases by 1 and updates the value
two reads last_port, increases by 1 and updates the value
Is it possible at all?
thanks
The general solution to this problem in Terraform is Data Sources, which are special resources that retrieve data from elsewhere rather than creating and managing objects themselves.
In order to use data sources, the data you wish to share must be published somewhere. For sharing between Terraform configurations, you need a data storage location that can be both written to and read from by Terraform.
Since you mentioned ECS Fargate I will assume you're using AWS, in which case a reasonable choice is to store your data in AWS SSM Parameter Store and then have other configurations read it out.
The configuration that creates the data would use the aws_ssm_parameter resource type to create a new parameter:
resource "aws_ssm_parameter" "foo" {
name = "port_number"
type = "String"
value = aws_lb_listener.example.port
}
The configurations that will make use of this data can then read it using the corresponding data source:
data "aws_ssm_parameter" "foo" {
name = "port_number"
}
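Other resources in the consuming configuration can then refer to the parameter through the data source's value attribute, for example (a minimal sketch; the local name is illustrative):
# Convert the stored string back to a number before using it as a port argument.
locals {
  mapped_port = tonumber(data.aws_ssm_parameter.foo.value)
}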
However, your question talks about the possibility of one configuration reading the value, incrementing it, and writing the new value back into the same place. That is not possible with Terraform because Terraform is a declarative system that works with descriptions of a static desired state. Only one configuration can manage each object, though many configurations can read it.
Instead of dynamically allocating port numbers, then, you will need one of two solutions:
Use some other system to manage the port allocations persistently such that once a port number is allocated for a particular caller it will always get the same port number. I don't know of any existing system that is built for this, so this may not be a tenable option in this case, but if such a system did exist then we'd model it in Terraform with a resource type representing a particular port allocation, which Terraform can eventually destroy along with all of the other infrastructure when the port is no longer needed.
Decide on a systematic way to assign consistent port numbers to each system such that each configuration can just know (either by hard-coding or by some deterministic calculation) which port number it should use.
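For example, here is a minimal sketch of the second option, assuming the base configuration publishes its starting port in SSM Parameter Store as shown earlier and each service configuration hard-codes its own offset (the parameter and local names are illustrative):
# "one" hard-codes offset 1; "two" would hard-code offset 2.
data "aws_ssm_parameter" "base_port" {
  name = "base_port"
}

locals {
  port_offset  = 1
  service_port = tonumber(data.aws_ssm_parameter.base_port.value) + local.port_offset
}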
Scenario:
I have a service running that is keeping a global search or query index up to date for all containers in my “system”. This service is notified any time a container is opened by a client and opens its own reference to that container to listen for changes to content in that container so it can update the “global container index” storage. The container is potentially large and partitioned into many individual DDS entities, and I would like to avoid loading every DDS in the container in order to listen for changes in each of those DDSes.
Ideally I would be able to listen for any “operations / changes” at the container level and dynamically load the impacted DDS to be able to transcribe the information that was updated into this external index storage.
I originally left this as a comment to SamBroner's response but it got too long.
The ContainerRuntime raises an "op" event for every op, so you can listen to that to implement something similar to #1. This is missing in the docs currently so it's not obvious.
I think interpreting ops without loading the DDS code itself might be possible for DDSes with simpler merge logic, like SharedMap, but very challenging for SharedSequence, for example.
I guess it depends on the granularity of information you're trying to glean from the ops with general purpose code. Knowing just that a DDS was edited may be feasible, but knowing its resulting state... more difficult.
There are actually two questions here: 1. How do I listen to container-level operations? 2. How do I load just one DDS?
How do I listen to operations?
This is certainly possible, but it is not included as a capability in the reference implementation of the service. There are multiple ways of architecting a solution to this.
Use the EventEmitter on the Container object itself. The sequencedDocumentMessage will have a type property. When type === "op", the message contents will include metadata about a change to data. You can use the below to get a feel for this.
const container = await getTinyliciousContainer(documentId, DiceRollerContainerRuntimeFactory, createNew);

// Every sequenced message is surfaced through the container's event emitter.
container.on("op", (sequencedDocumentMessage) => {
    if (sequencedDocumentMessage.type === "op") {
        console.log(sequencedDocumentMessage.contents);
    }
});
If you're looking for all of the message types and message interfaces, the enum and the generic interface for ISequencedDocumentMessage are both here.
Listen directly to the Total Order Broadcast with a bespoke lambda
If you're running the reference implementation of Fluid, you can just add a new lambda that is directly listening to Kafka (default Total Order Broadcast) and doing this job. The lambdas that are already running are located here: server/routerlicious/packages/lambdas. Deli is actually doing a fairly similar job already by listening into and labeling operations from Kafka by their Container.
Use the Foreman "lambda" in R11S specifically for spawning jobs
I'd prefer an architecture where the lambda is actually just a "job runner". This would give you something along the lines of "Fluid Lambdas", where these lambdas can just react to operations coming off the Kafka stream. Functionality like this is included, but poorly documented and tested, in the Foreman lambda.
Critically, listening to just the ops is not a very good way to know the current state of a Distributed Data Structure. The Distributed Data Structures manage the merging of new operations into the state. Therefore the easiest way to get the current state of a DDS is to load the DDS.
How do I load just one DDS?
This is actually fairly straightforward, if not well documented. You'll need to provide a requestHandler that can fetch just the DDS from the container. Ultimately, the Container does have the ability to virtualize everything that isn't requested. You'll need to load the container, but just request the specific DDS.
In pseudocode...
const container = await loader.resolve("fluid://URIToContainer");
const dds = await container.request("/ddspaths/uuid");
dds.getIndexData();
In a base.tf file I have:
data "consul_keys" "project_emails"{
datacenter = "mine1"
key {
name = "notification_list"
path = "project/notification_email_list"
}
}
I would like to use these consul variables in my python code.
The way I'm thinking about this is by outputting this to a file (so not just another Terraform file using "${project_emails.notification_list.construct}", with either version 11 or 12).
How would I save all these keys to a file to access the keys?
The general mechanism for exporting data from a Terraform configuration is Output Values.
You can define an output value that passes out the value read from Consul like this:
output "project_emails" {
value = data.consul_keys.project_emails.var.notification_list
}
After you've run terraform apply to perform the operations in your configuration, you can use the terraform output command to retrieve the output values from the root module. Because you want to read it from another program, you'll probably want to retrieve the outputs in JSON format:
terraform output -json
You can either arrange for your program to run that command itself, or redirect the output from that command to a static file on disk first and then have your program read that file.
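Since the consumer in your case is Python, a minimal sketch of that second approach might look like this (assuming terraform apply has already run in the current directory and the output is named project_emails):
import json
import subprocess

# Run `terraform output -json` and parse the result into a dict keyed by output name.
result = subprocess.run(
    ["terraform", "output", "-json"],
    capture_output=True, text=True, check=True,
)
outputs = json.loads(result.stdout)

# Each entry is an object whose "value" key holds the actual output value.
notification_list = outputs["project_emails"]["value"]
print(notification_list)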
The above assumes that the Python code you mention will run as part of your provisioning process on the same machine where you run Terraform. If instead you are asking about access to those settings from software running on a virtual machine provisioned by Terraform, you could use the mechanism provided by your cloud platform to pass user data to your instance. The details of that vary by provider.
For long-lived applications that consume data from Consul, a more common solution is to run consul-template on your virtual server and have it access Consul directly. An advantage of this approach is that if the data changes in Consul then consul-template can recognize that and update the template file immediately, restarting your program if necessary. Terraform can only read from Consul at the time you run terraform plan or terraform apply, so it cannot react automatically to changes like consul-template can.
I am just getting started writing some dynamic endpoint discovery for my Service Fabric application and was looking for examples as to how to resolve service endpoints. I found the following code example on stackoverflow:
https://stackoverflow.com/a/38562986/4787510
I did some minor variations to this, so here is my code:
private readonly FabricClient m_fabricClient;

public async Task RefreshEndpointList()
{
    var appList = await m_fabricClient.QueryManager.GetApplicationListAsync();
    var app = appList.Single(x => x.ApplicationName.ToString().Contains("<MyFabricDeploymentName>"));

    // Go through all running services
    foreach (var service in await m_fabricClient.QueryManager.GetServiceListAsync(app.ApplicationName))
    {
        var partitions = await m_fabricClient.QueryManager.GetPartitionListAsync(service.ServiceName);

        // Go through all partitions
        foreach (var partition in partitions)
        {
            // Check what kind of service we have - depending on that the resolver figures out the endpoints.
            // E.g. Singleton is easy as it is just one endpoint, otherwise we need some load balancing later on
            ServicePartitionKey key;
            switch (partition.PartitionInformation.Kind)
            {
                case ServicePartitionKind.Singleton:
                    key = ServicePartitionKey.Singleton;
                    break;
                case ServicePartitionKind.Int64Range:
                    var longKey = (Int64RangePartitionInformation)partition.PartitionInformation;
                    key = new ServicePartitionKey(longKey.LowKey);
                    break;
                case ServicePartitionKind.Named:
                    var namedKey = (NamedPartitionInformation)partition.PartitionInformation;
                    key = new ServicePartitionKey(namedKey.Name);
                    break;
                default:
                    throw new ArgumentOutOfRangeException($"Can't resolve partition kind for partition with id {partition.PartitionInformation.Id}");
            }

            var resolvedServicePartition = await ServicePartitionResolver.GetDefault().ResolveAsync(service.ServiceName, key, CancellationToken.None);
            m_endpointCache.PutItem(service.ServiceTypeName, new ServiceDetail(service.ServiceTypeName, service.ServiceKind, ServicePartitionKind.Int64Range, resolvedServicePartition.Endpoints));
        }
    }
}
I'm quite happy I found this snippet, but while working through it, I found one thing where I am getting a little bit confused.
So, after reading through the SF docs, this seems to be the architecture it follows from top to bottom as far as I understood it:
Service Fabric Cluster -> Service Fabric application (E.g. myApp_Fabric) -> Services (E.g, frontend service, profile picture microservice, backend services)
From the services we can drill down to partitions, while a partition basically resembles a "container" on a node in my cluster on which multiple instances (replicas) can reside, instances being actual deployments of a service.
I'm not quite sure if I got the node / partition / replica difference right, though.
However, back to my confusion and actual question:
Why is the information regarding the partition strategy (singleton, int range, named) attached to the PartitionInformation, rather than to the service itself? As far as I understood it, a partition is basically the product of how I configured my service to be distributed across the Service Fabric nodes.
So, why is a partition strategy not directly tied to a service?
Regarding the services in Service Fabric, there are two types: stateful services and stateless services.
Stateless services do not deal with state using the reliable collections. If they need to maintain state they have to rely on external persistence solutions like databases. Since they do not deal with state provided by reliable collections, they get assigned a Singleton partition type.
Stateful services have the ability to store state in reliable collections. In order to be able to scale those services, the data in those collections should be divided over partitions. Each service instance is assigned a specific partition. The number of partitions is specified per service, as in the example below:
<Service Name="Processing">
  <StatefulService ServiceTypeName="ProcessingType" TargetReplicaSetSize="3" MinReplicaSetSize="3">
    <UniformInt64Partition PartitionCount="26" LowKey="0" HighKey="25" />
  </StatefulService>
</Service>
So, given the example above, I do not understand your last remark about the partition strategy not being directly tied to a service.
Given the situation above, there will be 26 instances of that service running, one for each partition, multiplied by the number of replicas.
In the case of a stateless service, there will be just one partition (the singleton partition), so the number of actual instances is 1 * 3 (the replica count) = 3. (3 replicas is just an example. Most times the instance count of a stateless service is set to -1, meaning one instance for every node in the cluster.)
One other thing: in your code you have a comment line in the piece of code iterating the partitions:
// E.g. Singleton is easy as it is just one endpoint, otherwise we need some load balancing later on
This comment is wrong in stating that partitioning has to do with load balancing. It does not; it has to do with how data is partitioned over the service instances, and you need to get the address of the instance that deals with a specific partition. Say I have a service with 26 partitions and I want to get data that is stored in, let's say, the 5th partition. I then need to get the endpoint for the instance that serves that partition.
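For example, a minimal sketch of that lookup, reusing the resolver from your snippet (the service name is illustrative):
// Sketch: resolve the replica set that owns key 5 of an Int64-range partitioned service.
private static async Task<string> ResolvePartitionFiveAsync()
{
    var resolver = ServicePartitionResolver.GetDefault();
    var resolved = await resolver.ResolveAsync(
        new Uri("fabric:/MyApp/Processing"),      // illustrative service name
        new ServicePartitionKey(5),               // the key falls inside one partition's Int64 range
        CancellationToken.None);

    // For a stateful service, GetEndpoint() returns the primary replica's address.
    return resolved.GetEndpoint().Address;
}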
You probably already read the docs. If not, I suggest reading them as well.
Addressing your comments:
I was just wondering, is it not possible that multiple services run on the same partition?
Reliable collections are coupled to the service using them, and so are the underlying partitions. Hence, no more than one service can run on the same partition.
But, service instances can. If a service has a replica size of, let's say, 3, there will be 3 instances serving that partition. But only 1 is the primary instance, reading and writing the data that gets replicated to the secondary instances.
Imagine your service as a pizza: when you order a pizza, you ask for the flavor (the type of service); you generally don't specify how you want it sliced (e.g. into 8 pieces). The pizzeria usually handles that for you, and some pizzas come sliced into 4, 8 or more pieces depending on their size.
When you create an instance of the service, you can see it in a similar way: you need a service, this service will hold your data, and you shouldn't care how the data is stored.
As a consumer, needing to understand the partitioning of your service is like calling the pizzeria and asking them to cut the pizza into 4 slices instead of 8: you still get the same pizza, but now your concern is how many pieces it is sliced into. The main problem with service partitioning is that many application designs leak this partitioning to the client, and the client needs to be aware of how many partitions there are, or where they are placed, before consuming the service.
You shouldn't care about service partitioning as a consumer, but you should as a provider (the pizzeria). Let's say you order a big pizza and the pizzeria runs out of boxes (nodes) to put the pizza in; they can split the pizza across two smaller boxes. In the end the consumer receives the same pizza, but in separate boxes, and will have to open them to find the slices.
With this analogy, we can see the comparison as:
Flavor = Service Type
Pizza = Service
Size and How is sliced = Partition Scheme
Slice = Partition
Box = Node
Number of Pizzas = Replicas
In Service Fabric, the reason to have this decoupled is that the consumer can ask for a service while the provider decides how it wants to partition it. In most cases the partitions are statically defined at application creation, but they can be dynamic: as seen with UniformInt64Partition, you can define how many partitions you need for a specific service instance, and you can have multiple instances of the same service with different partitions or different schemes without changing a line of code. How you expose these partitions to the client is an implementation detail.
I am writing an application to serve Facebook APIs: share, like, etc. I am keeping all those shared objects from my application in a database, and I do not want to share the same object if it has already been shared.
Considering I will deploy the application on different servers, there could be a case where both instances try to insert the same object into the table.
How can I manage this concurrency problem without blocking the applications fully? I mean, two threads that try to insert the same object must sync, but they should not block a third thread that is inserting a totally different object.
If there's a way to derive the primary key of a data entry from the data itself, the database will resolve such concurrency issues by itself: the second insert will fail with a primary key constraint violation. Perhaps the data supplied by the Facebook API already has some unique ID?
Or you can consider a distributed lock solution, for example based on Hazelcast or a similar data grid. This would allow record state to be shared by different JVMs, making it possible to avoid unneeded INSERTs.
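As a sketch of the first option, assuming the Facebook object ID is used as the primary key (table, column and class names are illustrative; many JDBC drivers surface the duplicate as SQLIntegrityConstraintViolationException):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

class SharedObjectStore {
    // Returns true if this call actually inserted the object, false if some
    // other instance already shared it (primary-key violation).
    static boolean shareOnce(Connection connection, String fbObjectId, String payload) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(
                "INSERT INTO shared_objects (fb_object_id, payload) VALUES (?, ?)")) {
            stmt.setString(1, fbObjectId);
            stmt.setString(2, payload);
            stmt.executeUpdate();
            return true;
        } catch (SQLIntegrityConstraintViolationException e) {
            return false;
        }
    }
}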
How would one use ExpressJS as an orchestration layer?
I have five NodeJS / ExpressJS "API applications" for different business functions (security, human resources, asset management, fleet management, etc.). Each provides raw object / document APIs and has its own database, app server, routes, etc. I would like to build ANOTHER ExpressJS application to sit IN FRONT OF those five "stacks" and provide higher-level business operations (i.e., TerminateEmployee, etc.) by funneling multiple calls into the other five stacks via REST.
Am I insane? Is this common? Maybe I don't know what to search for, but I'm not finding any examples of doing this.
BTW: I'm also thinking of building highly-reusable "widgets" (basically, individual AngularJS services and UI elements) to call into that sixth front-end stack.
Whoa, an old question left unanswered.
SOA, microservices, or whatever the name, these are only abstract ideas about system architecture; they need to be applied. Now let us define the problem.
Problem: we need orchestration for middleware.
Parameter: we need to define which middleware needs to run (maybe a String for the service name and a Number for the service port), and this value needs to be passed as CLI args or via process.env.
First we need to store our parameter in a JS-readable format; it could be an Array or an Object. In my case I need to run multiple services on one port and the app needs to be exposed on multiple ports, e.g.:
from String
graphql:80 auth:5000 usercrud:5000 trx:6969 docs:4000
to JSON
{
  "80": ["graphql"],
  "5000": ["auth", "usercrud"],
  "6969": ["trx"],
  "7070": ["ssr"],
  "4000": ["docs"]
}
We can make the CLI accept args by using the popular yargs, but we also need to be able to read a string passed from a .env file. Parsing the string is quite easy since we control its format: with the String.split() function we can build an Array. I prefer using a space to separate each context (in this case a middleware/service), followed by : and the port of that particular service, like auth:5000. From this Array we can then map the values into an Object (note that "5000":["auth", "usercrud"] shares the same port, so we need to accommodate these two services).
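A rough sketch of that parsing step (SERVICES is an assumed environment variable name):
// Parse "graphql:80 auth:5000 usercrud:5000 trx:6969 docs:4000" into
// { "80": ["graphql"], "5000": ["auth", "usercrud"], "6969": ["trx"], "4000": ["docs"] }
const parseServices = (spec) =>
  spec.split(" ").reduce((acc, entry) => {
    const [name, port] = entry.split(":");
    (acc[port] = acc[port] || []).push(name);
    return acc;
  }, {});

const service = parseServices(process.env.SERVICES || "graphql:80 auth:5000 usercrud:5000 trx:6969 docs:4000");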
From this config we can iterate the keys, which are the ports: Object.keys(service) returns an Array that we can map or forEach over, per modern ECMAScript. For each port we mount the services mapped to it and start a server:
const express = require("express");
const http = require("http");

Object.keys(service).forEach((port) => {
  const app = express();
  service[port].forEach((name) => app.use(require(`./service/${name}`)));
  http.createServer(app).listen(port, () =>
    console.log(`http://127.0.0.1:${port} >>> ${service[port]}`)
  );
});
Usually we call http.createServer(app) once, but now we do it as many times as there are ports to expose; this increases atomicity and decreases dependencies between services.
Advantages of this approach:
can share library / helper
single codebase & consistency
single container development / staging / integration test
controlled services based on resource consumption (not only per task: if one of the services is idle or consuming very little resource, we can co-host it with another service)
up & running source: https://github.com/nsnull0/eService/blob/master/packages/%40nodejs-express/index.js