Difference between Data source and Output block in Terraform

I am not able to figure out the difference between a data source block and an output block in terms of functionality, because both are used to get information about a resource, such as its id, public_ip, etc. Can anyone please help me understand this? I couldn't find a suitable resource for it.
I have tried to search online for this difference but couldn't find an actual answer.

data essentially represents a dependency on an object that isn't managed by the current Terraform configuration but that the current configuration still needs to make use of. Mechanically, that typically means making a Get or Read request to a specific API endpoint and then exporting the data from that API response in the resulting attributes.
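For example, a minimal sketch (the provider, data source, and VPC ID here are illustrative, not from the question):

data "aws_vpc" "shared" {
  # Read an existing VPC that this configuration does not manage
  id = "vpc-0abc123"
}

# Attributes exported from the API response can then be referenced
# elsewhere in the configuration, e.g. data.aws_vpc.shared.cidr_block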
output is one of the two ways that data can flow from one module into another. variable blocks represent data moving from the parent module into the child module, and output blocks represent data moving from the child module out to the parent.
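A sketch of that flow, with illustrative module and resource names:

# In the child module (./modules/network):
variable "cidr_block" {
  # data flowing from the parent module into the child
  type = string
}

output "vpc_id" {
  # data flowing from the child module out to the parent
  # (assumes the child declares an aws_vpc resource named "main")
  value = aws_vpc.main.id
}

# In the parent module:
module "network" {
  source     = "./modules/network"
  cidr_block = "10.0.0.0/16"
}

# The parent can then refer to module.network.vpc_id elsewhere.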
There is no strong relationship between these two concepts, but they do sometimes connect: for example, if you use the tfe_outputs data source belonging to the hashicorp/tfe provider, or the terraform_remote_state data source from the terraform.io/builtin/terraform provider. Both of those data sources treat the output values from the root module of some other Terraform configuration as the external object to fetch. You can therefore use them as one way to consume the results of one configuration from another, as long as the second configuration runs in a context that has access to the state of the first.
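A hedged sketch of the second option, assuming the first configuration stores its state in an S3 backend (the bucket, key, and output name are hypothetical):

data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "example-tf-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# The other configuration's root module outputs appear under .outputs:
#   data.terraform_remote_state.network.outputs.vpc_id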

Related

Can Terraform mention multiple module instances in one resource?

In Terraform is there some way to refer to the set of instances of a module?
I have several workloads that each need very similar (but still separate) infrastructure, and I also want to configure another item of infrastructure to be used in common between them.
For example, say each needs several pieces of infrastructure (an AWS S3 bucket, an SQS queue, an IAM role, ...) but with mostly equivalent attributes. I want to achieve this without code duplication (e.g., by writing a Terraform module to be reused for each instance, with input variables for name prefixes and specific IAM policies).
Is there a Terraform syntax for then making a reference to all of those instances in a single resource, and with minimal boilerplate? (Maybe something analogous to a classmethod, to define a common resource to only be created once, even if multiple instances of that module get created?) For example, to make a shared kubernetes config-map that contains an index of the generated addresses (bucket names or SQS URLs), as a mechanism for passing those addresses to the containerised workloads that will use them? Another example might be setting up a single load balancer or DNS server with rules referring individually to every service from this group.
Or does the problem have to be approached in the other direction, by starting with a list of parameter sets, and somehow looping over that list to create the infrastructure? (Requiring every instance of this kind to be specified together in the same file?)
The Terraform terminology for this kind of modularity is a child module that is called from multiple places in the configuration. The call uses a module block (where parameter values are passed). Since modules are directories, the child module is defined in its own directory, separate from the root module's own .tf files (commonly a subdirectory of the configuration, or some other source location). The calling module can access output values exported from the child module.
You can use a single module call to create multiple instances, by putting a for_each argument in the module block, and passing a map (or set) through this meta-argument. The other argument expressions in the block can use the each object to refer to the particular for_each element corresponding to the current instance, as Terraform iterates through them. From outside of the module block (elsewhere in the calling module), the output values can themselves be accessed like a map.
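A minimal sketch, assuming a child module at ./modules/workload that declares a name_prefix input variable and a bucket_name output:

module "workload" {
  source = "./modules/workload"

  # One instance per element; the keys become the instance addresses
  for_each = {
    ingest  = { name_prefix = "ingest" }
    reports = { name_prefix = "reports" }
  }

  name_prefix = each.value.name_prefix
}

# The module's outputs are then addressable like a map, e.g.:
#   module.workload["ingest"].bucket_name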
You can use [for ... in ... : ... if ...] expressions to filter and transform maps (see the sketch at the end of this answer). (Some transformations can also be performed more concisely by splat expressions.) Some resources (such as kubernetes_config_map) have arguments that take maps directly. (Also, some data sources or resources accept a sequence of nested blocks, which can be generated from a map using the dynamic block syntax.)
Note: do not use the older count feature as an alternative to for_each. It has a documented tendency to produce unintended destruction and recreation of other infrastructure if one element of the list has to be decommissioned, because the remaining elements shift index. Similarly, passing a list-derived set, instead of a map, to for_each can make indexing more cumbersome.
Thus, for the OP's example, the approach would be to first create a parameterised child module defining the nearly-duplicated parts of the infrastructure (e.g., bucket, queue, role). Make sure the child module has inputs or outputs for any aspect that needs customisation (for example, output a handle for the created bucket resource, or at least its auto-generated globally-unique name). Then have a module in your configuration that creates the whole collection of instances, via a single module block that sources the child module and uses for_each. The customisation of individual instances (e.g., some having additional policies or other specific infrastructure) can be achieved by a combination of the parameter sets initially passed to the call, and supplementary resource blocks that each refer to the outputs of an individual instance. The outputs can also be referred to collectively, for example transforming the module call's outputs into a map of bucket names (or addresses) and passing that to another resource (e.g., a k8s config map), as in the sketch below. Again, this must be done from the calling module; the child module does not have access to a list of instances of itself.
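Continuing the sketch above, the calling module can collect an output from every instance into one shared resource (the config map name is illustrative):

resource "kubernetes_config_map" "addresses" {
  metadata {
    name = "workload-addresses"
  }

  # Build the config map data from the module call's outputs,
  # one entry per module instance
  data = {
    for key, instance in module.workload :
    key => instance.bucket_name
  }
}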

How should a Terraform provider handle a resource error when the resource consists of multiple entities?

NOTE: I'm using the v2 SDK.
In my provider my 'resource' isn't a single API call.
My resource is actually multiple 'things'.
For example...
resource "my_resource" "example" {
foo {
...
}
bar {
...
}
baz {
...
}
}
The resource and each of the nested blocks are all separate 'things' that each have their own API calls.
So when 'creating' this resource I actually need to make multiple API calls: one call to create the resource itself, then a call to create a 'foo', then another for 'bar', 'baz', etc. Finally, once those nested things are created, I need to call my API one last time to activate the main resource.
The problem I've found is that if there's an error in the creation of one of the nested blocks, I'm finding the state is getting messed up and reflecting the 'planned' diff even though I return an error from the API call as part of the Create step.
I'm interested to know how other people handle errors in a provider that has a structure like this.
I've tried using Partial(). I've also tried triggering another Read of each 'thing'; although the final state data then looks correct (when printed as part of a debug run with trace logs), because my 'Create' function has to return an error, the state data that was read is dropped and the original planned diff is persisted. (I've even stopped returning an error altogether and tried returning just the result of the Read, which succeeds, and STILL the state reflects the planned diff rather than the state as modified by the Read.)
Since you mentioned Partial I'm assuming for this answer that you are using the older SDKv2 rather than the modern Terraform provider framework.
The programming model for SDKv2 is for the action functions like Create to receive a mutable value representing the planned values, encapsulated in a schema.ResourceData object, and then the provider will modify that value through that wrapping object to make it describe the object that was actually created (or updated).
Terraform Core itself expects a provider to respond to the "apply" request by returning the closest possible representation of what was actually created in the remote system. If the value is returned without an error then Terraform will require that the object conform to the plan, and will raise an error saying that there's a bug in the provider if it doesn't. If the provider returns a value and an error then Terraform Core will propagate that error to the user and save whatever value was returned, as long as it matches the schema of the resource type.
Unfortunately this mismatch in models between Terraform Core and the SDK makes the situation you've described quite tricky: if you don't call d.Set at all in your Create function then by default the SDK will just return whatever values were in the plan, even if parts of it weren't actually created yet. To make your provider behave in the way that Terraform is expecting you'd need to do something like this:
1. At the beginning of Create, decode all of the nested block data into local variables of data types that are useful for making the API calls you intend to make. For example, you might at this step decode the data from the ResourceData object into whatever struct types the underlying SDK expects.
2. Before you take any other actions, use d.Set to remove all of the blocks of the types that will each require separate requests. This means you'll need to pass an empty value of whatever type is appropriate for that block's value.
3. In the loop where you gradually create the separate objects that each block represents, append each result to a growing set of objects representing the blocks you've already successfully created. Each time you add a new item to that set, call d.Set again to reset the attribute representing the appropriate block type so that it now includes the object you just created.
4. If you get to the end without any errors then your attributes should once again describe all of the objects requested in the configuration, and you can return without an error. If you encounter an error partway through then you can return that error, and the SDK will automatically also return the partially-updated value encapsulated inside the ResourceData object.
If you return an accurate description of which of the blocks were created and exclude the ones that weren't then on the next plan the SDK logic should notice that some of the blocks declared in the configuration aren't present in the prior state and so it should propose to update the object to include those additional blocks. Your provider's Update function can then follow a similar principle as above to gradually append only the nested objects it successfully created, so that it'll once again return a complete set if successful or a partial set in case of any errors.
SDKv2 was optimized for the more common case where a single resource block represents a single remote API call, and so its default behavior deals with either fully-successful or fully-failed responses. Dealing with partial failure requires more subtlety that is difficult to represent in that SDK's API.
The newer Terraform Plugin Framework has a different design for these operations which separates the request data from the response data, thereby making it less confusing to return only a partial result. The Resource interface has a Create method which has a request object containing the config and the plan and a response object containing a representation of the final state.
It pre-populates the response state with the planned values similarly to SDKv2 to still handle that common case of entirely-failing vs. entirely-succeeding, but it does also allow totally overwriting that default with a locally-constructed object representing a partial result, to better support situations like yours where one resource in Terraform is representing a number of different fallible calls to the underlying API.

Azure durable entity or static variables?

Question: is it thread-safe to use static variables (as shared storage between orchestrations), or is it better to save/retrieve data to a durable entity?
There are a couple of Azure Functions in the same namespace: a hub trigger, a durable entity, two orchestrations (the main process and one that monitors the whole process), and an activity.
They all need some shared variables. In my case I need to know the number of main orchestration instances (to start a new one or hold on). This is done in another orchestration (the monitor).
I've tried both options, and I ask because I see different results.
Static variables: in my case there is a generic List<SomeMyType>, where SomeMyType holds the id of the task, its state, the number of attempts, the records it processed, and other info.
When I need to start a new orchestration I call List.Add(), and when I need to retrieve and modify an item I use a simple List.First(id_of_the_task). With First() I know for sure the needed task is there.
With static variables I sometimes see that tasks become duplicated for some reason: I retrieve the task with List.First(id_of_the_task), change something on the result variable, and that is it. Not a lot of code.
Durable entity: the major difference is that I keep the List on a durable entity, and each time I need to retrieve or save it I call .CallEntityAsync("getTask") and .CallEntityAsync("saveTask"), which might slow down the app.
With this approach more code and more calls are required; however, it looks more stable, and I don't see any duplicates.
Please advise.
I can't answer why you would see duplicates with the static variables approach without seeing the code; maybe because List is not thread-safe and you'd need a ConcurrentBag, but I'm not sure. One issue with static variables is if the function app is not Always On, or if it can have multiple instances: when the function unloads (or crashes), the state is lost, and static variables are not shared across instances either, so under high load it won't work (if there can be many instances).
Durable entities seem better here. Yes, they can be shared across many concurrent function instances, and each entity can only execute one operation at a time, so they are for sure the better option. The performance cost is a bit higher, but they should not be slower than orchestrators, since both perform a lot of the same common operations: writing to Table Storage, checking for events, etc.
I can't say if it's right for you, but instead of List.First(id_of_the_task) you should just be able to access the orchestrator's properties through the client, which can hold custom data. Another idea, depending on the usage, is that you may be able to query the Table Storage directly with the CloudTable class for information about the running orchestrators.
Although not entirely related, you can also look at the settings for parallelism for durable functions: Azure (Durable) Functions - Managing parallelism.
Please ask any questions if I should clarify anything or if I misunderstood your question.

What possible states does an Azure PSContainerGroup have?

I am running a container on Azure Container Instances. In my code I use the PowerShell command Get-AzContainerGroup to find my running container. As a result I get a PSContainerGroup object, which has a property called State.
The type of the property is string, but to me it seems more like an enum with a certain set of possible values. I want to handle the container programmatically, so a state property without a defined set of values is useless to me. What possible values for the state are there?
It should be: Running, Terminated, Waiting, or Unknown. (Disclaimer: purely based on testing; this may not be an exhaustive list.) Reference: https://learn.microsoft.com/en-us/azure/container-instances/container-instances-get-logs
I agree with you that it would be better if the PSContainerGroup State property were an enum. You can raise feedback on User Voice: https://feedback.azure.com/forums/602224-azure-container-instances

How to deal with a command which depends on existing records in an application using CQRS and Event Sourcing

We are using CQRS with Event Sourcing.
In our application we can add resources (a business term for a single item) from the UI, and we send a command accordingly to add a resource.
So we have some number of resources present in the application which were added previously.
Now, we have one special type of resource (I am calling it SpecialResource).
When we add this SpecialResource, its id needs to be linked with all existing resources in the application.
Linked means this SpecialResource should have a list of the ids (GUIDs) of the existing resources (a List<Guid>).
The solution we tried: get all resource ids in the application before adding the special resource (i.e., before firing the AddSpecialResource command), assign this list to the SpecialResource, then send the AddSpecialResource command.
But we are not supposed to do so, because per CQRS a command should not query; i.e., a command can't depend on a query, as the query can return stale records.
How can we achieve this business scenario without querying the existing records in the application?
But we are not supposed to do so, because per CQRS a command should not query; i.e., a command can't depend on a query, as the query can return stale records.
This isn't quite right.
"Commands" run queries all the time. If you are using event sourcing, in most cases your commands are queries -- "if this command were permitted, what events would be generated?"
The difference between this, and the situation you described, is the aggregate boundary, which in an event sourced domain is a fancy name for the event stream. An aggregate is allowed to run a query against its own event stream (which is to say, its own state) when processing a command. It's the other aggregates (event streams) that are out of bounds.
In practical terms, this means that if SpecialResource really does need to be transactionally consistent with the other resource ids, then all of that data needs to be part of the same aggregate, and therefore part of the same event stream, and everything from that point is pretty straightforward.
So if you have been modeling the resources with separate streams up to this point, and now you need SpecialResource to work as you have described, then you have a fairly significant change to your domain model to do.
The good news: that's probably not your real requirement. Consider what you have described so far - if resourceId:99652 is created one millisecond before SpecialResource, then it should be included in the state of SpecialResource, but if it is created one millisecond after, then it shouldn't. So what's the cost to the business if the resource created one millisecond before the SpecialResource is missed?
Because, a priori, that doesn't sound like something that should be too expensive.
More commonly, the real requirement looks something more like "SpecialResource needs to include all of the resource ids created prior to close of business", but you don't actually need SpecialResource until 5 minutes after close of business. In other words, you've got an SLA here, and you can use that SLA to better inform your command.
How can we achieve this business scenario without querying the existing records in the application?
Turn it around: run the query, copy the results of the query (the resource ids) into the command that creates SpecialResource, then dispatch the command to your domain model. The CreateSpecialResource command includes within it the correct list of resource ids, so the aggregate doesn't need to worry about how to discover that information.
It is hard to tell what your database is capable of, but the most consistent way of adding a "snapshot" is at the database layer, because there is no other common place for it in pure CQRS. (There are some articles on doing CQRS+ES snapshots, if that is what you are actually trying to achieve with SpecialResource.)
One way may be to materialize the list of ids using some kind of stored procedure upon the arrival of the AddSpecialResource command (at the database).
Another way is to capture "all existing resources (up to this moment)" with some marker (a timestamp), never delete old resources, and add a "SpecialResource" condition to the queries which will use the SpecialResource data.
OK, one more option (depending on your case at hand) is to always have the list of ids handy via the same query which served the UI. This way the definition of "all resources" changes to "all resources as seen by the user (at some moment)".
I do not think any computer system is ever going to be 100% consistent, simply because life does not, and cannot, work like this. Apparently we are all also living in the past, since it takes time for your brain to process input.
The point is that you do the best you can with the information at hand, but ensure that your system is able to smooth out any edges. So if you need to associate one or two resources with your SpecialResource then you should be able to do so.
Even if you could associate your SpecialResource with all existing entries in your data store, what's to say there isn't another resource, not yet entered into the system, that also needs to be associated?
It all, as usual, will depend on your specific use-case. This is why process managers, along with their state, enable one to massage that state until the process can complete.
I hope I didn't misinterpret your question :)
You can do two things in order to solve that problem:
make a distinction between the write model and the read model. You know what a read model is, right? A "write model" of the data, in contrast, is a combination of data structures and behaviors that is just enough to enforce all invariants and generate consistent event(s) as a result of every executed command.
don't take the rule "the Event Store is a single source of truth" too literally. Consider the following interpretation: the ES is a single source of ALL truth for your application; however, for each specific command you can create "write models" which provide just enough "truth" to make that command consistent.
