Terraform: dynamically create list of resources

Due to some prior decisions, there is a script that creates ALBs and a completely separate one that sets up alarms for each ALB created (odd, but I can't change this).
I could hard-code a list of all the ALBs and iterate through them, i.e.:
albs = ["a", "b"]
I know how to iterate through a list with for_each.
What I need is to build the list dynamically so I don't have to manually maintain the list. I know I can get a list of ALBs using:
terraform state list [options] ## https://www.terraform.io/docs/commands/state/list.html
but that doesn't really help (sure, I can pipe that to a file and iterate through the lines in the file and pass them as parameters - but that is ugly as sin)
How do I dynamically build the list with all my ALBs? Something like:
albs = state_list([options])
Thanks! Using AWS.
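If a small wrapper script turns out to be acceptable after all, reading the state as JSON is less brittle than parsing terraform state list line by line. This Python sketch assumes it runs against the ALB configuration's working directory, that the ALBs are declared as aws_lb resources (or the older alias aws_alb), and that the alarms need each load balancer's arn_suffix, which is the value CloudWatch uses for the LoadBalancer dimension. The function names are hypothetical. (If the ALB configuration exported these values as outputs, a terraform_remote_state data source in the alarm configuration could read them without any script.)

```python
import json
import subprocess

ALB_TYPES = {"aws_lb", "aws_alb"}  # aws_alb is the older alias of aws_lb


def _resources(module: dict):
    """Yield every resource in a module and, recursively, in its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from _resources(child)


def alb_suffixes(state: dict) -> dict:
    """Map resource address -> arn_suffix for each managed application load balancer."""
    root = state.get("values", {}).get("root_module", {})
    return {
        res["address"]: res["values"]["arn_suffix"]
        for res in _resources(root)
        if res.get("mode") == "managed"  # skip data sources
        and res.get("type") in ALB_TYPES
        and res["values"].get("load_balancer_type", "application") == "application"  # skip NLBs
    }


def read_state(workdir: str) -> dict:
    """Return `terraform show -json` for the configuration in workdir (needs terraform on PATH)."""
    result = subprocess.run(
        ["terraform", "show", "-json"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

For example, writing {"albs": alb_suffixes(read_state(...))} with json.dump to a file named albs.auto.tfvars.json in the alarm configuration's directory gives Terraform a variables file it loads automatically; the alarm code could then declare variable "albs" { type = map(string) } and iterate with for_each = var.albs.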


Snakemake: Parameter as wildcard used in parallel script runs

I'm fairly new to Snakemake and inherited a rather huge workflow that consists of a sequence of 17 rules that run serially.
Each rule takes outputs from the previous rules and uses them to run a Python script. Everything has worked great so far, except that now I'm trying to improve the workflow, since some of the rules could run in parallel.
Below is a rough example of what I'm trying to achieve; my understanding is that wildcards should allow me to solve this.
grid = [ 10 , 20 ]
rule all:
expand("path/to/C/{grid}/file_C" ,grid = grid)
rule process_A:
path_A = "path/to/A/file_A"
path_B = "path/to/B/{grid}/file_B" # A rule further in the worflow could need a file from a previous rule saved with this structure
grid = lambda wc: wc.get(grid)
path_C = "path/to/C/{grid}/file_C"
And inside the script I retrieve the grid size parameter:
grid = snakemake.params.grid
In the end, rule process_A should run once with grid = 10 and once with grid = 20, saving each result to a folder whose path also depends on grid.
I know there are several things wrong with this, but I can't figure out where to start. The error I'm getting now is:
name 'params' is not defined
Any help on where to start?
It would be useful to post the full stack trace of the name 'params' is not defined error, to know exactly what is causing it. For now...
And inside the script I retrieve the grid size parameter:
grid = snakemake.params.grid
I suspect you are mixing the script directive with the shell directive. Probably you want something like:
rule process_A:
    input: ...
    output: ...
    params: ...
    script: "script_A.py"
Inside script_A.py, Snakemake provides a snakemake object, so snakemake.params.grid holds the actual param value.
Alternatively, write a standalone Python script that parses command-line arguments and execute it like any other program using the shell directive. (I tend to prefer this solution, as it makes things more explicit and easier to debug, but it also means writing more boilerplate code for a standalone script.)
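A minimal sketch of that standalone-script route, assuming the script is named script_A.py and takes the two inputs, the output path and the grid size as flags (the flag names are hypothetical). The commented rule shows how shell: would pass {wildcards.grid}, so each grid value becomes its own job.

```python
import argparse

# Hypothetical Snakemake rule calling this script through `shell:` instead of `script:`.
# Snakemake substitutes {wildcards.grid}, {input.a}, {input.b} and {output} per job:
#
# rule process_A:
#     input:
#         a="path/to/A/file_A",
#         b="path/to/B/{grid}/file_B",
#     output:
#         "path/to/C/{grid}/file_C"
#     shell:
#         "python script_A.py --grid {wildcards.grid} "
#         "--input-a {input.a} --input-b {input.b} --output {output}"


def parse_args(argv=None):
    """Parse the command line; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Run step A for one grid size.")
    parser.add_argument("--grid", type=int, required=True, help="grid size, e.g. 10 or 20")
    parser.add_argument("--input-a", required=True, help="path/to/A/file_A")
    parser.add_argument("--input-b", required=True, help="path/to/B/<grid>/file_B")
    parser.add_argument("--output", required=True, help="path/to/C/<grid>/file_C")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Placeholder for the real work of step A; args.grid is already an int.
    print(f"step A, grid={args.grid}: {args.input_a} + {args.input_b} -> {args.output}")
```

Saved as script_A.py with if __name__ == "__main__": main() at the bottom, a run such as snakemake --cores 2 can execute the grid = 10 and grid = 20 jobs at the same time, because rule all requests both outputs through expand.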

fastest way to match data from two massive lists with differing data types?

I have data regarding a directory structure of unknown (and massive) size and data regarding the same structure from Perforce. Using Python, I need to be able to match the local data with the Perforce data and generate a list of files that reflects all of the data in the user's workspace (local directory), including all of the files missing from Perforce, as well as all the data in the depot that is missing from the workspace.
Local Directory Structure Data:
I have full control over how I mine out that data (currently using os.walk)
Perforce Data:
Not much control over how the data is returned
Currently comes as a list of dictionaries
Data returns very fast regardless of size.
#this list is hundreds of thousands of entries.
p4data_example = [{'depotFile': '//Path/To/Data/file.extension', 'clientFile': 'X:\\Path\\To\\Data\\file.extension', 'isMapped': '', 'headAction': 'add', 'headType': 'text', 'headTime': '00000', 'headRev': '1', 'headChange': '0000', 'headModTime': '00000', 'haveRev': '', 'otherOpen': ['stuff'], 'otherAction': ['move/delete'], 'otherChange': ['00000'], 'otherOpens': '1'}]
I need to operate on the local directory files whether or not they have matching p4 data.
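import os
from P4 import P4  # P4Python
p4 = P4()
p4.connect()
path_to_data = r"X:\Path\To\Data"
p4data = p4.run('fstat', path_to_data + r"\...")
for root, dirs, files in os.walk(path_to_data, topdown=False):
    for file_name in files:
        local_path = os.path.join(root, file_name)
        matchingp4 = None
        for p4item in p4data:  # scans every p4 record for every local file
            if p4item['clientFile'] == local_path:
                matchingp4 = p4item
                break
        do_stuff_with_data(local_path, matchingp4)  # placeholder for the real work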
I am confident this is not the most efficient way to handle this.
The extended time seems to come from:
Getting all of the local data
Needing to loop over the data so many times to find matches.
I need this to run as fast as possible. Ideally it would run in just a couple of seconds, but I understand that, since I don't know how large the data set can get, this will vary by an unknown amount.
Using Python, I need to be able to match the local data with the perforce data and find all of the local files missing from perforce and all of the perforce data that differs from the local data.
I am confident this is not the most efficient way to handle this.
Correct. Just run p4 reconcile and Perforce will do all of this automatically. :)
reconcile does essentially what you're trying to do, but much more efficiently -- the client walks the local tree, sends a list of files to the server, and then instead of doing an NxN comparison the server uses the mapping information to directly request additional client checks (i.e. checksumming to detect differences) as appropriate for individual files.
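If reconcile doesn't fit (for example because you need the individual fstat records), the NxN loop in the question can still be replaced by a dictionary keyed on the normalized client path, so each lookup is a hash lookup instead of a scan over hundreds of thousands of records. This Python sketch assumes Windows client paths, so it uses ntpath explicitly to get Windows-style, case-insensitive comparison wherever it runs; match_files and walk_files are hypothetical names.

```python
import ntpath
import os


def normalize(path: str) -> str:
    """Windows-style normalization: unify separators and case, collapse '.'/'..'."""
    return ntpath.normcase(ntpath.normpath(path))


def walk_files(top: str):
    """Yield the full path of every file under top (os.walk yields nothing if top is missing)."""
    for root, _dirs, files in os.walk(top):
        for name in files:
            yield os.path.join(root, name)


def match_files(local_paths, p4data):
    """Pair local files with fstat records in O(N + M) instead of O(N * M)."""
    by_client = {normalize(item["clientFile"]): item for item in p4data}  # built once
    matched, local_only, seen = [], [], set()
    for path in local_paths:
        key = normalize(path)
        record = by_client.get(key)
        if record is None:
            local_only.append(path)  # on disk, unknown to Perforce
        else:
            matched.append((path, record))
            seen.add(key)
    # Depot files with no local counterpart.
    depot_only = [item for key, item in by_client.items() if key not in seen]
    return matched, local_only, depot_only
```

Usage would be match_files(walk_files(path_to_data), p4data), which walks the tree once and the fstat results once. Records whose headAction is a delete will land in depot_only, so you may want to filter those out first.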

How to provide a list of arguments to cookiecutter?

I am trying to build a cookiecutter template for a terraform repository.
The repo is used to create buckets. I would like to add a "buckets" argument, where buckets is a list.
I tried something like this in cookiecutter.json:
"buckets": ["bucket_1", "bucket_2", "bucket_3"],
"arg2": "value"
but it appears that doing this just asks whether you want to pick bucket_1, bucket_2 or bucket_3. I would actually like to allow creating one bucket or several, not just pick an option from that list. Is this feasible with cookiecutter?
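Cookiecutter treats a list value in cookiecutter.json as a choice variable, which is the prompt described above. One workaround, sketched here as an assumption rather than the only option, is to make buckets a plain string such as "bucket_1,bucket_2,bucket_3" and split it after generation. Cookiecutter renders hook scripts with Jinja, so a hooks/post_gen_project.py can receive the answer as "{{ cookiecutter.buckets }}". The helper names and the terraform.tfvars output are hypothetical.

```python
import json


def parse_buckets(raw: str) -> list[str]:
    """Turn 'bucket_1, bucket_2,bucket_3' into ['bucket_1', 'bucket_2', 'bucket_3']."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def render_tfvars(buckets: list[str]) -> str:
    """Render a Terraform list assignment; json.dumps gives HCL-compatible string literals."""
    return "buckets = " + json.dumps(buckets) + "\n"
```

In the hook, pathlib.Path("terraform.tfvars").write_text(render_tfvars(parse_buckets("{{ cookiecutter.buckets }}"))) would write the file into the generated project, since post-generation hooks run from that directory. Inside a template file, Jinja can also split the string directly, e.g. {% for b in cookiecutter.buckets.split(',') %}.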

Puppet 6: how to split or truncate domain names

I have an array of domains like this:
'us1.domain.com', 'us2.domain.com', 'us3.domain.com', 'anotherdomain.com', 'yet.third.com'
I would like to split or truncate these domain names to:
Could anybody point me in the right direction, please?
This new array will be used for certificate file names.
Thanks in advance
You can solve that problem like this:
$array = [
  'us1.domain.com', 'us2.domain.com', 'us3.domain.com',
  'anotherdomain.com', 'yet.third.com'
]
notice($array.map |$x| { $y = $x.split(/\./); [$y[-2], $y[-1]].join('.') }.unique)
▶ puppet apply test.pp
Notice: Scope(Class[main]): [domain.com, anotherdomain.com, third.com]
Notice: Compiled catalog for 192-168-1-103.tpgi.com.au in environment production in 0.05 seconds
Notice: Applied catalog in 0.01 seconds
Key insights there:
You can split each element on a period using the split function.
You can take the last and second-to-last elements of an array using $arr[-1] and $arr[-2].
You can join it all back together again using the join function.
You can transform the list into a new list using the map function.
You can remove the duplicates using the unique function.
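The same split, take-last-two, join and unique pipeline, written in Python purely to illustrate the logic (this is not Puppet code; base_domains is a hypothetical name). As in the Puppet output above, it keeps the first occurrence of each value in order.

```python
def base_domains(fqdns):
    """Shorten each name to its last two labels, then drop duplicates, keeping first-seen order."""
    # Heuristic, same as the Puppet version: "www.example.co.uk" would become "co.uk".
    shortened = [".".join(name.split(".")[-2:]) for name in fqdns]
    return list(dict.fromkeys(shortened))  # dicts preserve insertion order (Python 3.7+)


domains = ['us1.domain.com', 'us2.domain.com', 'us3.domain.com', 'anotherdomain.com', 'yet.third.com']
print(base_domains(domains))  # ['domain.com', 'anotherdomain.com', 'third.com']
```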

Create automated report from web data

I have a set of multiple APIs I need to source data from, and I need four different data categories. This data is then used for reporting purposes in Excel.
I initially created web queries in Excel, but my laptop just crashes because there are too many queries that have to be refreshed. Do you know a smart workaround?
This is an example of the API I will source data from (40 different ones in total)
The data points I need are:
EstimatedMonthlyVisits, TopOrganicKeywords, OrganicSearchShare, TrafficSources
Any ideas how I can create an automated report which queries the above data on request?
Thanks so much.
If Excel is crashing under that load, which doesn't surprise me, you should consider using Python or R for this task.
To start, we set our working directory and parse the XML file, as a matter of practice, so we're sure that R can access the data within the file. This basically reads the file into R. Then, just to confirm that R knows our file is XML, we check the class. Indeed, R is aware that it's XML.
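setwd("C:/Users/Tobi/Documents/R/InformIT") #you will need to change the filepath on your machine
library(XML) #provides xmlParse, xmlRoot, xmlName, xmlSize, xmlSApply, xmlToList
xmlfile=xmlParse("pubmed_sample.xml") #parse the file into R
class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"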
Now we can begin to explore our XML. Perhaps we want to confirm that our HTTP query on Entrez pulled the correct results, just as when we query PubMed's website. We start by looking at the contents of the first node or root, PubmedArticleSet. We can also find out how many child nodes the root has and their names. This process corresponds to checking how many entries are in the XML file. The root's child nodes are all named PubmedArticle.
xmltop = xmlRoot(xmlfile) #gives content of root
class(xmltop)#"XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
xmlName(xmltop) #give name of node, PubmedArticleSet
xmlSize(xmltop) #how many children in node, 19
xmlName(xmltop[[1]]) #name of root's children
To see the first two entries, we can do the following.
# have a look at the content of the first child entry
xmltop[[1]]
# have a look at the content of the 2nd child entry
xmltop[[2]]
Our exploration continues by looking at subnodes of the root. As with the root node, we can list the name and size of the subnodes as well as their attributes. In this case, the subnodes are MedlineCitation and PubmedData.
#Root Node's children
xmlSize(xmltop[[1]]) #number of nodes in each child
xmlSApply(xmltop[[1]], xmlName) #name(s)
xmlSApply(xmltop[[1]], xmlAttrs) #attribute(s)
xmlSApply(xmltop[[1]], xmlSize) #size
We can also separate each of the 19 entries by these subnodes. Here we do so for the first and second entries:
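#take a look at the MedlineCitation subnode of 1st child
xmltop[[1]][['MedlineCitation']]
#take a look at the PubmedData subnode of 1st child
xmltop[[1]][['PubmedData']]
#subnodes of 2nd child
xmltop[[2]][['MedlineCitation']]; xmltop[[2]][['PubmedData']]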
The separation of entries is really just us, indexing into the tree structure of the XML. We can continue to do this until we exhaust a path—or, in XML terminology, reach the end of the branch. We can do this via the numbers of the child nodes or their actual names:
#we can keep going till we reach the end of a branch
xmltop[[1]][[1]][[5]][[2]] #title of first article
xmltop[['PubmedArticle']][['MedlineCitation']][['Article']][['ArticleTitle']] #same command, but more readable
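Since the answer also suggests Python, here is the same kind of exploration with the standard library's xml.etree.ElementTree, run on a tiny hypothetical stand-in for pubmed_sample.xml (the real file has many more nodes, so its positions differ). Note that Python indexes from 0 where R's [[ ]] indexes from 1.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed, made-up sample with the same top-level shape as a PubMed export.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation><Article><ArticleTitle>First title</ArticleTitle></Article></MedlineCitation>
    <PubmedData/>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation><Article><ArticleTitle>Second title</ArticleTitle></Article></MedlineCitation>
    <PubmedData/>
  </PubmedArticle>
</PubmedArticleSet>"""

root = ET.fromstring(SAMPLE)                  # like xmlRoot(xmlfile)
print(root.tag)                               # PubmedArticleSet  (like xmlName(xmltop))
print(len(root))                              # 2  (like xmlSize(xmltop))
print([child.tag for child in root[0]])       # ['MedlineCitation', 'PubmedData']
print(root[0][0][0][0].text)                  # First title  (positional path)
# Named path, the readable equivalent; findall/iterfind return every match.
titles = [el.text for el in root.iterfind("PubmedArticle/MedlineCitation/Article/ArticleTitle")]
```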
Finally, we can transform the XML into a more familiar structure—a dataframe. Our command completes with errors due to non-uniform formatting of data and nodes. So we must check that all the data from the XML is properly inputted into our dataframe. Indeed, there are duplicate rows, due to the creation of separate rows for tag attributes. For instance, the ELocationID node has two attributes, ValidYN and EIDType. Take the time to note how the duplicates arise from this separation.
#Turning XML into a dataframe
library(plyr) #provides ldply
Madhu2012=ldply(xmlToList("pubmed_sample.xml"), data.frame) #completes with errors: "row names were found from a short variable and have been discarded"
View(Madhu2012) #for easy checking that the data is properly formatted
Madhu2012.Clean=Madhu2012[Madhu2012[25]=='Y',] #gets rid of duplicated rows
Here is a link that should help you get started.
If you have never used R before, it will take a little getting used to, but it's worth it. I've been using it for a few years now, and compared to Excel I have seen R perform anywhere from a couple hundred percent faster to many thousands of percent faster. Good luck.
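For the Python route mentioned at the start of the answer, here is a minimal sketch of a script that queries each API and writes a single CSV for Excel to open. The API link from the question isn't included, so the endpoint URLs, the assumption that each response is JSON with the four fields as top-level keys, and the function names are all hypothetical; real APIs of this kind usually also require an API key.

```python
import csv
import io
import json
import urllib.request

# The four data points from the question. Whether each API returns them as
# top-level JSON keys is an assumption to check against a real response.
FIELDS = ["EstimatedMonthlyVisits", "TopOrganicKeywords", "OrganicSearchShare", "TrafficSources"]


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download one API response and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def extract_row(source: str, payload: dict) -> dict:
    """Flatten one response into a CSV row; lists/dicts are kept as JSON text."""
    row = {"source": source}
    for field in FIELDS:
        value = payload.get(field)  # None (an empty cell) if the API omits it
        row[field] = json.dumps(value) if isinstance(value, (dict, list)) else value
    return row


def render_csv(rows) -> str:
    """Render rows as CSV text, which Excel opens directly."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["source"] + FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_report(urls: dict, path: str) -> None:
    """Query every API once, sequentially, and write a single report file."""
    rows = [extract_row(label, fetch_json(url)) for label, url in urls.items()]
    with open(path, "w", newline="", encoding="utf-8") as out:
        out.write(render_csv(rows))
```

Calling build_report with a dict of labels to your 40 endpoint URLs, on request or from a scheduled task, replaces the live Excel web queries with one file that Excel only has to open.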
