Where do old, new := d.GetChange() come from in CustomizeDiff and DiffSuppressFunc?

There are two methods:
CustomizeDiff
DiffSuppressFunc
The corresponding objects (schema.ResourceDiff and schema.ResourceData) support old, new := d.GetChange("foo"), but I'm confused about where these values come from.
I've been thinking that
DiffSuppressFunc: func(k, old, new string, d *schema.ResourceData) bool
takes old from TF state and new from the result of running readResource(). What if there's no diff and then the user changes main.tf -- is that the old or the new value?
and for CustomizeDiff:
old, new := d.GetChange("foo")
it seems like new is from TF state / main.tf but old is from readResource().
Where can I read more about this? I always thought that the TF state was old and the response was new -- that's how it looks in the output when Terraform reports drift.

The DiffSuppressFunc abstraction in this old Terraform SDK is unfortunately one of the parts that still retains some outdated assumptions from older versions of Terraform, since it was those older versions that this SDK was originally designed to serve.
Specifically, in Terraform v0.11 and earlier the model of resource state and plan data was a flat map from strings to strings, and the SDK internally translated between that and the hierarchical structures described in the schema. Under this model, a list in the provider schema serializes as a number of separate entries in the flat map, like example.# giving the number of elements, example.0 giving the first element, etc.
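For example, a two-element list attribute named example would appear in that flat map as entries like these (the values here are purely illustrative):

example.# = "2"
example.0 = "first"
example.1 = "second"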
DiffSuppressFunc is one place where that internal implementation detail leaked up into the API, because "diff suppressing" is an operation done against the already-flattened data structure that's describing the changes, and so the schema type information has all been lost.
You shouldn't typically need to worry about exactly what old and new mean because the purpose of DiffSuppressFunc is only to determine whether the two values are functionally equivalent. The function only needs to compare the two and return true if they represent alternative serializations of the same information.
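For example, here's a minimal sketch of a DiffSuppressFunc that treats values differing only in letter case as equivalent (the attribute name "foo" and the case-insensitivity rule are just illustrative assumptions, and the standard library strings package is assumed to be imported):

"foo": {
    Type:     schema.TypeString,
    Optional: true,
    DiffSuppressFunc: func(k, old, new string, d *schema.ResourceData) bool {
        // Returning true tells the SDK the two serializations are
        // equivalent, so the diff for this attribute is suppressed.
        return strings.EqualFold(old, new)
    },
},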
However, if you're curious about the implementation details then you can review the part of the SDK which calls this function.
CustomizeDiff's behavior is more specialized than DiffSuppressFunc, because it's used for one purpose and one purpose only: adding special behaviors to run during Terraform's "plan" step.
In this case the old value is always the most recent known value for a particular argument, and the new value starts off as the value from the current configuration, but you can override it using the SetNew or SetNewComputed methods of ResourceDiff.
To emulate in CustomizeDiff what DiffSuppressFunc would normally do, you'd write logic something like this:
old, new := d.GetChange("foo")
if functionallyEquivalent(old, new) {
    d.SetNew("foo", old)
}
The definition of functionallyEquivalent is for you to write based on your knowledge of the system which you are wrapping with this provider. If foo is a string attribute then you can use type assertions like old.(string) and new.(string) to get the actual string values to compare.
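A minimal sketch of such a helper for a string attribute (the whitespace-trimming rule is just a hypothetical equivalence rule; the strings package is assumed to be imported):

func functionallyEquivalent(old, new interface{}) bool {
    // Hypothetical rule: values are equivalent if they differ only
    // in leading/trailing whitespace.
    return strings.TrimSpace(old.(string)) == strings.TrimSpace(new.(string))
}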
SDKv2 is essentially a legacy system at this point, designed around the behaviors of an obsolete version of Terraform. It's still available primarily to support existing providers which were themselves originally written for those obsolete versions of Terraform.
The new Terraform Plugin Framework is built for modern Terraform and so has fewer "gotchas" resulting from inconsistencies between how the SDK works and how Terraform itself works.
The modern equivalent of CustomizeDiff in the plugin framework is plan modification, and a plan modifier for a string attribute would be an implementation of planmodifier.String.
The new API makes it a bit more explicit where all of these values are coming from: the StringRequest type differentiates between the value from the configuration, the value from the prior state, and the value from the proposed new state, which is the framework's initial attempt to construct a plan prior to any custom modifications in the provider.
Therefore a plan modifier for normalizing a string attribute in a similar manner to DiffSuppressFunc in the old SDK would be:
func (m ExampleStringModifier) PlanModifyString(ctx context.Context, req planmodifier.StringRequest, resp *planmodifier.StringResponse) {
    if functionallyEquivalent(req.StateValue, req.PlanValue) {
        // Preserve the value from the prior state if the
        // new value is equivalent to it.
        resp.PlanValue = req.StateValue
    }
}
Again you'll need to define and implement the exact rule for what functionallyEquivalent means for this particular attribute.
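You would then attach the modifier to the attribute in the resource schema, something like this (a sketch assuming the framework's resource/schema and planmodifier packages are imported; the attribute name is hypothetical):

"foo": schema.StringAttribute{
    Optional: true,
    PlanModifiers: []planmodifier.String{
        // Run the custom normalization rule during planning.
        ExampleStringModifier{},
    },
},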

Related

How can I implement it using DiffSuppressFunc()?

Context: I'm developing a TF Provider.
There's an attribute foo of type string in one of my resources. Different representations of a foo value can map to the same normalized version, but only the backend can return the normalized version of a value.
When implementing the resource, I was thinking I could store any user value for foo (i.e., not necessarily normalized) and then leverage DiffSuppressFunc to detect any potential differences. For example, main.tf stores the user's input (by definition), and the TF state could store either the normalized version returned from the backend or the user's input version (it doesn't matter much). The biggest challenge, then, is to differentiate between a structural update (which requires an update) and a syntactic update (which doesn't require an update, since both versions convert to the same normalized form).
In order to implement this I could use
"foo": {
...
DiffSuppressFunc: func(k, old, new string, d *schema.ResourceData) bool {
// Option #1
normalizedOld := network.GetNormalized(old)
normalizedNew := network.GetNormalized(new)
return normalizedOld == normalizedNew
// Option #2
// Backend also supports a check whether such a value exists already
// and returns such a object
if obj, ok := network.Exists(new); ok { return obj.Id == d.GetObjId(); }
}
}
However, it seems like I can't send network requests in DiffSuppressFunc since it doesn't receive the meta interface{} argument that other functions receive, as in:
func resourceCreate(ctx context.Context, d *schema.ResourceData, meta interface{})
So I can't access my specific HTTP client (even though I could send some generic network request).
Is there a smart way around this limitation so that meta interface{} can be passed to DiffSuppressFunc:
// The interface{} parameter is the result of the Provider type
// ConfigureFunc field execution. If the Provider does not define
// a ConfigureFunc, this will be nil. This parameter is conventionally
// used to store API clients and other provider instance specific data.
//
// The diagnostics return parameter, if not nil, can contain any
// combination and multiple of warning and/or error diagnostics.
ReadContext ReadContextFunc
The intention for DiffSuppressFunc is that it perform only syntactic normalization that doesn't rely on information from outside of the provider. A DiffSuppressFunc should not typically interact with anything outside of the provider, because the SDK can call it at various steps and expects it to return a consistent result each time, rather than varying based on the state of the remote system.
If you need to rely on information from the remote system then you'll need to implement the logic you're discussing in the CustomizeDiff function instead. That function is the lowest level of abstraction for diff customization in the SDK but in return for the low level of abstraction it also allows more flexibility than the higher-level built-in behaviors in the SDK.
In the CustomizeDiff function you will have access to meta and so you can make API requests if you need to.
Inside your CustomizeDiff function you can use d.GetChange to obtain both the previous value and the new value from the configuration to use in the same way as the old and new arguments to DiffSuppressFunc.
You can then use d.SetNew to change the planned value for a particular attribute based on what you learned. To approximate what DiffSuppressFunc would do you would call d.SetNew with the value from the prior state -- the "old" value.
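A minimal sketch of that approach, assuming a hypothetical API client type stored in meta that exposes a GetNormalized method:

CustomizeDiff: func(ctx context.Context, d *schema.ResourceDiff, meta interface{}) error {
    client := meta.(*Client) // hypothetical client configured by the provider
    old, new := d.GetChange("foo")
    // Ask the backend for the normalized forms and compare them.
    if client.GetNormalized(old.(string)) == client.GetNormalized(new.(string)) {
        // The new value is functionally equivalent, so keep the value
        // from the prior state as the planned value.
        return d.SetNew("foo", old)
    }
    return nil
},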
When implementing CustomizeDiff you must respect the consistency rules that apply to all Terraform providers, which include:
When planning initial creation of an object, if the module author provided a specific value in the configuration then you must preserve exactly that value, without normalization.
When planning an update to an existing object, if the module author has provided a specific value in the configuration then you must return either the exact value they wrote without normalization or return exactly the value from the prior state to indicate that the new configuration value is functionally equivalent to the previous value.
When implementing Read there is also a similar consistency rule:
If the value you read from the remote system is not equal to what was in the prior state but the new value is functionally equivalent to the prior state then you must return the value from the prior state to preserve the way the author originally wrote it, rather than the way the remote system normalized it.
All of these rules exist to help ensure that a particular Terraform configuration can converge, which is to say that after running terraform apply it should be possible to immediately run terraform plan and see it report "No changes". If you don't stick to these rules then Terraform may return an explicit error (for problems it's able to detect) or it may just behave strangely due to the provider producing confusing information that doesn't match the assumptions of the protocol.

Shall I use a block or an attribute when designing a Terraform resource?

Context: I'm developing a terraform provider.
I can see that some providers (like AWS) use an attribute (e.g., connection_id) when referencing an ID:
resource "aws_dx_connection_confirmation" "confirmation" {
connection_id = "dxcon-ffabc123"
}
whereas others use blocks:
resource "aws_dx_connection_confirmation" "confirmation" {
connection {
id = "dxcon-ffabc123"
}
}
Is there a specific pattern around it? From what I can see,
Use a block if there are multiple enum-like values (bar, bar_2) and only one of them can be specified:
resource "aws_foo" "temp" {
bar {
id = "dxcon-ffabc123"
}
// bar_2 {
// id = "abcde"
//}
}
Use a block to group multiple related attributes:
resource "aws_devicefarm_test_grid_project" "example" {
name = "example"
vpc_config {
vpc_id = aws_vpc.example.id
subnet_ids = aws_subnet.example.*.id
security_group_ids = aws_security_group.example.*.id
}
}
Use a block when there's a plan to add more attributes to the object that the block represents:
resource "aws_dx_connection_confirmation" "confirmation" {
connection {
id = "dxcon-ffabc123"
// TODO: later on, `name` will be added as a second input option that could be used to identify connection instead of `id`
}
}
I found Attributes as Blocks doc but it's a bit confusing.
In general, the direct comparison here is between an argument (attribute) with a Terraform map-type value (as distinguished from a Golang map, which can also be used to specify a Terraform block value) and a Terraform block. These are essentially equivalent in that they both allow passing key-value pairs as values, but there are some differences. Here is a bit of a binary decision tree for which to use:
Is the ideal Terraform value type a map or an object (i.e., can the keys be named almost anything, or should they follow a fixed naming schema)?
map: attribute
object: block
If the value is changed, does that force a Delete and Create, or is it possible to instead Update?
DC: usually block
U: usually attribute
Is there another resource in the provider that could replace the usage in the current resource (different API usage)? E.g., in your example above, would there be another resource exclusively devoted to assigning the connection?
yes: usually block
no: usually attribute
Is the value multi-level (multiple levels of key-value pairs) or single-level?
single level: attribute leads to better code because it's simpler and cleaner
multi-level: block leads to better code because of nested blocks
There may be other deciding factors I cannot recall, but these will hopefully guide you in the right direction.
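To make the comparison concrete, here is a sketch of how the two shapes are declared in SDKv2 (the attribute names are hypothetical):

// Map-typed attribute: free-form keys, primitive values, assigned with "=".
"tags": {
    Type:     schema.TypeMap,
    Optional: true,
    Elem:     &schema.Schema{Type: schema.TypeString},
},

// Nested block: a fixed object schema, written with block syntax.
"vpc_config": {
    Type:     schema.TypeList,
    Optional: true,
    MaxItems: 1,
    Elem: &schema.Resource{
        Schema: map[string]*schema.Schema{
            "vpc_id": {Type: schema.TypeString, Required: true},
        },
    },
},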
If you are developing an entirely greenfield provider, and so you don't need to remain compatible with any existing usage, I would suggest considering using the Terraform Plugin Framework (instead of "SDKv2") which is being designed around the type system and behaviors of modern Terraform, whereas the older SDK was designed for much older Terraform versions which had a much more restrictive configuration language.
In particular, the new framework encourages using the attribute-style syntax exclusively, by allowing you to declare certain attributes as having nested attributes, which then support most of the same internal structures that blocks would allow but using the syntax of assigning a value to a name, rather than the nested block syntax.
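For example, with the plugin framework a nested object can be declared as a nested attribute rather than a block (a sketch using the framework's resource/schema package; the names are hypothetical):

"connection": schema.SingleNestedAttribute{
    Optional: true,
    Attributes: map[string]schema.Attribute{
        "id": schema.StringAttribute{Required: true},
    },
},

which users then assign with attribute syntax, e.g. connection = { id = "dxcon-ffabc123" }, rather than with a nested block.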
The original intent of nested blocks was to represent the sense of declaring a separate object that happened to "belong to" the containing object, rather than declaring an argument of that top-level object. That distinction was murky in practice, since underlying APIs often represent these nested objects as JSON arrays or maps inside the top-level object anyway, and so the additional abstraction of showing them as separate objects ends up hurting rather than helping, because it obscures the nature of the underlying data structure. Particularly if the physical representation of the concept in the API is as a nested data structure inside the containing object, I think it's most helpful to use a comparable data structure in Terraform.
Many uses of nested block types in existing providers are either concessions to backward compatibility or constraints caused by those providers still being written against SDKv2, and thus not having the ability to declare a structured attribute type -- such a concept did not exist in Terraform v0.11 and earlier, which is what SDKv2 was designed for.
The plugin framework does still support declaring blocks, and there are some situations where the nested item really is a separate object that just happens to be conceptually contained within another where the block syntax could still be a reasonable choice. However, my recommendation would be to default to using the attribute syntax in most cases.
At the time I'm writing this, the Plugin Framework is still relatively new and its design not entirely settled. Therefore, when considering whether to use it, I suggest consulting Which SDK Should I Use? in order to make an informed decision.

How can a nested block have a label in a custom terraform provider?

I am writing a provider for terraform and I'm trying to work out how to get labels on nested blocks.
https://www.terraform.io/docs/language/syntax/configuration.html#blocks states "A particular block type may have any number of required labels, or it may require none as with the nested network_interface block type."
If I have something like this in my config
resource "myprovider_thing" "test" {
x = 1
settings "set1" {
y = 2
}
}
where settings in the schema has a Type of schema.TypeSet with an Elem of type &schema.Resource. When I try a plan, I'm told there is an extraneous label and that no labels are expected for the block.
I can't find anything explaining how to set that an element requires a label or how to access it.
Is it possible to have a nested block with a label or am I misunderstanding what is written on the configuration page?
The syntax documentation you referred to is making a general statement about the Terraform language grammar, but the mechanism of labeled blocks is generally reserved for constructs built in to the Terraform language, like resource blocks as we can see in your example.
The usual approach within resource-type-specific arguments is to create map-typed arguments and have the user assign map values to them, rather than using the block syntax. That approach also makes it easier for users to dynamically construct the map, for situations where statically-defined labels are not sufficient, because they can use arbitrary Terraform language expressions to generate the value.
The current Terraform SDK is built around the capabilities of all historical versions of Terraform and so it only supports maps of primitive types as a result of design constraints in Terraform v0.11 and earlier. That means there isn't a way to specify a map of objects, which would be the closest analog to a nested block type with a label.
At the time I'm writing this answer there is a new library under active development called Terraform Plugin Framework, which ends support for Terraform versions prior to v0.12 but then in return gets to make use of Terraform features introduced in that release, including the possibility of declaring maps of objects in your provider schema. It remains experimental at the time I'm writing this because the team is still iterating on the best way to represent all of the different capabilities, but for a green-field provider it could be a reasonable foundation if you're willing to respond to any potential breaking changes that might occur on the way to it reaching its first stable release.
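For example, with a map-of-objects attribute the configuration from your question could be written with attribute syntax instead of a labeled block (a sketch, assuming the provider declares settings as a map of objects):

resource "myprovider_thing" "test" {
  x = 1
  settings = {
    set1 = {
      y = 2
    }
  }
}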

Difference between local values and null_data_source for intermediate values on Terraform

I have a situation where I need to store some intermediate values so I can reuse them in other parts of the root module. I know about local values and I know about null_data_source, but I do not know which one is the recommended option for holding reusable values. Both descriptions look somewhat similar to me:
local values (https://www.terraform.io/docs/configuration/locals.html)
Local values can be helpful to avoid repeating the same values or expressions multiple times in a configuration, but if overused they can also make a configuration hard to read by future maintainers by hiding the actual values used.
and null_data_source (https://www.terraform.io/docs/providers/null/data_source.html)
The primary use-case for the null data source is to gather together collections of intermediate values to re-use elsewhere in configuration:
So both appear to be a valid choice for this scenario.
Here is my example code
locals {
my_string_A = "This is string A"
}
data "null_data_source" "my_string_B" {
inputs = {
my_string_B = "This is string B"
}
}
output "my_output_a" {
value = "${local.my_string_A}"
}
output "my_output_b" {
value = "${data.null_data_source.my_string_B.outputs["my_string_B"]}"
}
Could you suggest when to use one over the other for holding intermediate values, and what are the pros/cons of each approach?
Thank you
The null_data_source data source was introduced prior to the local values mechanism as an interim solution to meet that use-case before that capability became first-class in the language. It continues to be supported only for backward-compatibility with existing configurations using it.
All new configurations should use the Local Values mechanism instead. It's fully integrated into the Terraform language, supports values of any type (while null_data_source can support only strings), and has a much more concise/readable syntax.
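For example, a local value can hold a structured value that null_data_source's string-only inputs could not (a sketch):

locals {
  instance_settings = {
    type  = "t3.micro"
    zones = ["us-east-1a", "us-east-1b"]
  }
}

output "zones" {
  value = local.instance_settings.zones
}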

V8 JavaScript Object vs Binary Tree

Is there a faster way to search data in JavaScript (specifically on V8 via node.js, and without C/C++ modules) than using a JavaScript object?
This may be outdated, but it suggests a new class is dynamically generated for every single property, which made me wonder whether a binary tree implementation might be faster. However, that does not appear to be the case.
The binary tree implementation isn't well balanced, so it might do better with balancing (only the first 26 values are roughly balanced by hand).
Does anyone have an idea on why or how it might be improved? On another note: does the dynamic class notion mean there are actually ~260,000 properties (in the jsperf benchmark test of the second link) and subsequently chains of dynamic class definitions held in memory?
V8 uses the concept of 'maps', which describe the layout of the data in an object.
These maps can be "fast maps", which specify a fixed offset from the start of the object at which a particular property can be found, or they can be "dictionary maps", which use a hashtable to provide a lookup mechanism.
Each object has a pointer to the map that describes it.
Generally, objects start off with a fast map. When a property is added to an object with a fast map, the map is transitioned to a new one which describes the location of the new property within the object. The object is re-allocated with enough space for the new data item if necessary, and the object's map pointer is set to the new map.
The old map keeps a record of the transitions from it, including a pointer to the new map and a description of the property whose addition caused the map transition.
If another object which has the old map gets the same property added (which is very common, since objects of the same type tend to get used in the same way), that object will just use the new map - V8 doesn't create a new map in this case.
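To illustrate the behavior described above (the transitions themselves aren't directly observable from plain JavaScript, so the comments just narrate what V8 does internally):

// Objects that gain the same properties in the same order share
// the same sequence of map transitions.
function makePoint(x, y) {
  const p = {};
  p.x = x; // transition: empty map -> map {x}
  p.y = y; // transition: map {x} -> map {x, y}
  return p;
}

const a = makePoint(1, 2);
const b = makePoint(3, 4); // follows the already-recorded transitions; no new maps are created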
However, once the number of properties goes over a certain threshold (in fact, the current metric relates to the storage space used, not the actual number of properties), the object is changed to use a dictionary map. At this point the object is re-written using a hashtable. In general, it won't undergo any more map transitions -- any further properties that are added will just go in the hashtable.
Fast maps allow V8 to generate optimized code (using Crankshaft) where the offset of a property within an object is hard-coded into the machine code. This makes it very fast for cases where it can do this - it avoids the need for doing any lookup.
Obviously, the generated machine code is then dependent on the map - if the object's data layout changes, the code has to be discarded and re-optimized when necessary. V8 has a type profiling mechanism which collects information about what the types of various objects are during execution of unoptimized code. It doesn't trigger optimization of the code until certain stability constraints are met - one of these is that the maps of objects used in the function aren't changing frequently.
Here's a more detailed description of how this stuff works.
Here's a video where one of the lead developers of V8 describes stuff like map transitions and lots more.
For your particular test case, I would think that it goes through a few hundred map transitions while properties are being added in the preparation loop, then it will eventually transition to a dictionary based object. It certainly won't go through 260,000 of them.
Regarding your question about binary trees: a properly sized hashtable (with a sensible hash function and a significant number of objects in it) will always outperform a binary tree for a use-case where you're just searching, as your test code seems to do (all of the insertion is done in the setup phase).
