Intake Cache specify filename/location - intake

I'm trying to use intake and the intake-xarray to open and store remote files. I have a minimized catalog file here:
/isibhv/projects/paleo_pool/boundary_conditions/ice_sheet_reconstructions/ice_sheet_reconstructions.yaml
It looks like this:
metadata:
version: 1
sources:
glac1d:
description: The GLAC-1D Reconstruction
driver: netcdf
args:
urlpath: "https://sharebox.lsce.ipsl.fr/index.php/s/yfuUw91ruuJXroC/download?path=%2F&files=TOPicemsk.GLACD26kN9894GE90227A6005GGrBgic.nc"
cache_dir: "{{ CATALOG_DIR }}/glac1d"
cache:
- argkey: urlpath
type: file
I can open the files in Python:
import intake
cat = intake.open_catalog("ice_sheet_reconstructions.yaml")
ds = cat.glac1d.read()
This all works wonderfully; and I get the file as I would expect it. However, the cache doesn't show up where I would expect. I would have guessed a new folder is made under:
/isibhv/projects/paleo_pool/boundary_conditions/ice_sheet_reconstructions/glac1d
Instead, I get something in my home directory.
Did I specify the cache directory incorrectly?
As a second question: is it possible to directly specify how the cached files should be called when they are saved?
Thanks!
Paul

The location of the cache is specified by the config, which is a YAML file typically in ~/.intake/conf.yaml (key "cache_dir"), but can be elsewhere according to the INTAKE_CONF(_FILE) environment variable OR the metadata of the source, key "catalog_dir" (<- this may be incorrect?). The special value "catdir" means "in the directory where the catalog is".
However
With the appearance of caching in fsspec, the following will be possible:
sources:
glac1d:
description: The GLAC-1D Reconstruction
driver: netcdf
args:
urlpath: "filecache://sharebox.lsce.ipsl.fr/index.php/s/yfuUw91ruuJXroC/download?path=%2F&files=TOPicemsk.GLACD26kN9894GE90227A6005GGrBgic.nc"
storage_options:
target_protocol: https
cache_storage: "{{ CATALOG_DIR }}/glac1d"
unfortunately, the required change is not yet in intake-xarray.

Related

how to avoid mixture of \ and / in file paths when joining paths in Docker containerized Python code

As far as I'm aware I'm using best practices to define paths (using raw strings) and how I go about joining them (using os.path.join()), e.g.
import os
fdir = r'C:\Code\...\samples'
fpath = os.path.join(fdir, 'fname.ext')
and doing so has not caused me any problems when running my code within a Python or command shell. If I print fpath to the console I get consistent use of \s in the path:
C:\Code...\samples\fname.ext
But when I run a Docker containerized version of the code and run the image I get the error:
FileNotFoundError: [Errno 2] No such file or directory:
'C:\Code\...\samples/fname.ext'
I don't understand why os.path.join() has used a / to join fdir and fname.ext when the rest of the path included \\. It doesn't do this when I run the code outside of the container.
I have tried using os.path.normpath():
fpath = os.path.join(fdir, 'fname.ext')
fpath = os.path.normpath(fpath)
as discussed here, and os.sep.join():
fpath = os.sep.join([fdir, 'fname.ext'])
as covered here, and Path().joinpath():
from pathlib import Path
fpath = Path(fdir).joinpath('fname.ext')
as well as Path() / 'path_to_add':
fpath = Path(fdir) / 'fname.ext'
as discussed here, but in every case I end up with the same result using os.path.join().
Can someone please help me to understand what is going on and how to create consistent paths that will work whether I run the code in Python in a Windows environment, or in a Docker container?
Update Nov. 16:
In trying to keep my question brief I think I've left out details that are crucial. Apologies to those who have kindly taken the time to offer suggestions based on my incomplete description of the problem.
My code needs to import/export files from/to directories that are defined within a user-specified configuration file.
So the configuration file has a section of code where the user defines variables and paths, e.g.
samplesDir = r"path-to-samples-directory"
The variables are stored in a dictionary of dictionaris and stored as a .json.
At the start of the code the user defines the key that selects the dictionary of interest so that at various parts in my code when a file needs to be imported/exported, the paths are at hand.
So back to my example, samplesDir is stored in the configuration dictionary, cfgDict, so all I need to do is append the file name:
sampleFpath = os.path.join(sampleDir, sampleFname)
and sampleFname is determined based on other variables.
Because of the dynamic nature of the variables (including directory paths and file paths), I think it rules out the use of static path defined in a .yml with Docker Compose.
Update Nov. 18:
It may help to include a few more details and some screenshots.
The above screenshot shows the file and folder structure of the src directory containing the source code, the main app.py script for command-line use, the Dockerfile, etc.
The configs folder contains JSON files that includes variables, paths to directories and files. The user can create configuration files either by copying an existing one and modifying the entries, or configuration files can be generated by calling config.py.
Within config.py I have pre-set variables and paths, so that the directory path to the configuration files (configs), sample files (sample_DROs) and others (e.g. fiducials) are all within src.
I don't anticipate any reason why the user would want to store the config files anywhere else, nor do I expect them to want to use different sample files (or move them elsewhere). However, they will undoubtedly create their own fiducials and may decide not to store them in the fiducials directory (i.e. somewhere not within the src directory).
Likewise I have pre-set the download directory (based on the parameters stored within the configuration files, files are fetched from a server and downloaded) to be the default Downloads directory:
rootDownloadDir = os.path.join(Path.home(), "Downloads", "xnat_downloads")
Those files are later imported, processed, and the outputs are (by default) exported into sub-directories within rootDownloadDir.
Within Dockerfile I set the working directory of the container to be that of the source code and copy all of the contents of src (with the exception of some directories defined in .dockerignore):
WORKDIR C:/Code/WP1.3_multiple_modalities/src
...
COPY . .
so that the structure of the container mimics that of WORKDIR:
Hence I have allowed for flexibility in import/export directories, and they are by default a combination of paths within and outside of the src directory. And so, the code executed within the container will need to access files both within and outside of src.
That said, I don't know what rootDownloadDir will look like when os.path.join(Path.home(), "Downloads", "xnat_downloads") is run within the container.
This has got me thinking - Is it bad practice to set the download directory outside of src?
Returning to the original error:
the sample file is in the container:
From the actual behavior I can suppose that the container is based on Unix-like image. Path separator is / in such systems.
To build an environment-independent path which works inside and outside of the container you need the following steps:
Mounting of host folder to container directory.
Environment variable inside and outside the container.
I can show an example of how this is achievable via docker-compose tool and its configuration file docker-compose.yml:
# docker-compose.yml file
version: '3'
services:
<service_name>: # your service name here
image: <image_name> # name of image your container is built on
environment:
- SAMPLES_PATH=/samples
volumes:
- C:\Code\somepath\samples:/samples
In your python code you can use the following structure:
import os
fdir = os.getenv('SAMPLES_PATH', r'C:\Code\...\samples')
fpath = os.path.join(fdir, 'fname.ext')

cookie cutter: what's the easiest way to specify variables for the prompts

Is there anything that offers replay-type functionality, by pointing at a predefined prompt-answer file?
What works and what I'd like to achieve.
Let's take an example, using a cookiecutter to prep a Python package for pypi
cookiecutter https://github.com/audreyr/cookiecutter-pypackage.git
You've downloaded /Users/jluc/.cookiecutters/cookiecutter-pypackage before. Is it okay to delete and re-download it? [yes]:
full_name [Audrey Roy Greenfeld]: Spartacus 👈 constant for me/my organization
email [audreyr#example.com]: spartacus#example.com 👈 constant for me/my organization
...
project_name [Python Boilerplate]: GladiatorRevolt 👈 this will vary.
project_slug [q]: gladiator-revolt 👈 this too
...
OK, done.
Now, I can easily redo this, for this project, via:
cookiecutter https://github.com/audreyr/cookiecutter-pypackage.git --replay
This is great!
What I want:
Say I create another project, UnleashHell.
I want to prep a file somehow that has my developer-info and project level info for Unleash. And I want to be able to run it multiple times against this template, without having to deal with prompts. This particular pypi template gets regular updates, for example python 2.7 support has been dropped.
The problem:
A --replay will just inject the last run for this cookiecutter template. If it was run against a different pypi project, too bad.
I'm good with my developer-level info, but I need to vary all the project level info.
I tried copying the replay file via:
cp ~/.cookiecutter_replay/cookiecutter-pypackage.json unleash.json
Edit unleash.json to reflect necessary changes.
Then specify it via --config-file flag
cookiecutter https://github.com/audreyr/cookiecutter-pypackage.git --config-file unleash.json
I get an ugly error, it wants YAML, apparently.
cookiecutter.exceptions.InvalidConfiguration: Unable to parse YAML file .../000.packaging/unleash.json. Error: None of the known patterns match for {
"cookiecutter": {
"full_name": "Spartacus",
No problem, json2yaml to the rescue.
That doesn't work either.
cookiecutter.exceptions.InvalidConfiguration: Unable to parse YAML file ./cookie.yaml. Error: Unable to determine type for "
full_name: "Spartacus"
I also tried a < stdin redirect:
cookiecutter.prompts.txt:
yes
Spartacus
...
It doesn't seem to use it and just aborts.
cookiecutter https://github.com/audreyr/cookiecutter-pypackage.git < ./cookiecutter.prompts.txt
You've downloaded ~/.cookiecutters/cookiecutter-pypackage before. Is it okay to delete and re-download it? [yes]
: full_name [Audrey Roy Greenfeld]
: email [audreyr#example.com]
: Aborted
I suspect I am missing something obvious, not sure what. To start with, what is the intent and format expected for the --config file?
Debrief - how I got it working from accepted answer.
Took accepted answer, but adjusted it for ~/.cookiecutterrc usage. It works but the format is not super clear. Especially not on the rc which has to be yaml, though that's not always/often the case with rc files.
This ended up working:
file ~/.cookiecutterrc:
without nesting under default_context... tons of unhelpful yaml parse errors (on a valid yaml doc).
default_context:
#... cut out for privacy
add_pyup_badge: y
command_line_interface: "Click"
create_author_file: "y"
open_source_license: "MIT license"
# the names to use here are:
# full_name:
# email:
# github_username:
# project_name:
# project_slug:
# project_short_description:
# pypi_username:
# version:
# use_pytest:
# use_pypi_deployment_with_travis:
# add_pyup_badge:
# command_line_interface:
# create_author_file:
# open_source_license:
I still could not get a combination of ~/.cookiecutterrc and a project-specific config.yaml to work. Too bad that expected configuration format is so lightly documented.
So I will use the .rc but enter the project name, slug and description each time. Oh well, good enough for now.
You are near.
Try this cookiecutter https://github.com/audreyr/cookiecutter-pypackage.git --no-input --config-file config.yaml
The --no-input parameter will suppress the terminal user input, it is optional of course.
The config.yaml file could look like this:
default_context:
full_name: "Audrey Roy"
email: "audreyr#example.com"
github_username: "audreyr"
cookiecutters_dir: "/home/audreyr/my-custom-cookiecutters-dir/"
replay_dir: "/home/audreyr/my-custom-replay-dir/"
abbreviations:
pp: https://github.com/audreyr/cookiecutter-pypackage.git
gh: https://github.com/{0}.git
bb: https://bitbucket.org/{0}
Reference to this example file: https://cookiecutter.readthedocs.io/en/1.7.0/advanced/user_config.html
You probably just need the default_context block since that is where the user input goes.

Not able to look up class parameter in hiera

I have look at other questions like Using hiera to set class parameters? and others which discusses hiera 3. I am using hiera 5.
Here is my hiera.yaml
[root#e64a2e5c7c79 fisherman]# cat /fisherman/fisherman/hiera/hiera.yaml
---
version: 5
defaults: # Used for any hierarchy level that omits these keys.
datadir: data # This path is relative to hiera.yaml's directory.
data_hash: yaml_data # Use the built-in YAML backend.
hierarchy:
- name: "Apps" # Uses custom facts.
path: "apps/%{facts.appname}.yaml"
I also have this hiera data file:
[root#e64a2e5c7c79 fisherman]# cat /fisherman/fisherman/hiera/apps/HelloWorld.yaml
---
fisherman::create_new_component::component_name: 'HelloWord'
But when I run my puppet agent like so ...
export FACTER_appname=HelloWorld
hiera_config=/fisherman/fisherman/hiera/hiera.yaml
modulepath=/fisherman/fisherman/modules
puppet apply --modulepath=$modulepath --hiera_config=$hiera_config -e 'include fisherman'
... I get this error ...
Error: Evaluation Error: Error while evaluating a Function Call, Class[Fisherman::Create_new_component]: expects a value for parameter $component_name (file: /fisherman/fisherman/modules/fish
erman/manifests/init.pp, line: 12, column: 9) on node e64a2e5c7c79
I tried debugging hiera with puppet lookup like so:
[root#e64a2e5c7c79 /]# export FACTER_appname=HelloWorld
[root#e64a2e5c7c79 /]# hiera_config=/fisherman/fisherman/hiera/hiera.yaml
[root#e64a2e5c7c79 /]# modulepath=/fisherman/fisherman/modules
[root#e64a2e5c7c79 /]# puppet lookup --modulepath=$modulepath --hiera_config=$hiera_config --node agent.local --explain fisherman::create_new_component::component_name
Searching for "lookup_options"
Global Data Provider (hiera configuration version 5)
Using configuration "/fisherman/fisherman/hiera/hiera.yaml"
Hierarchy entry "Apps"
Path "/fisherman/fisherman/hiera/data/apps/.yaml"
Original path: "apps/%{facts.appname}.yaml"
Path not found
Environment Data Provider (hiera configuration version 5)
Using configuration "/etc/puppetlabs/code/environments/production/hiera.yaml"
Merge strategy hash
Hierarchy entry "Per-node data (yaml version)"
Path "/etc/puppetlabs/code/environments/production/data/nodes/.yaml"
Original path: "nodes/%{::trusted.certname}.yaml"
Path not found
Hierarchy entry "Other YAML hierarchy levels"
Path "/etc/puppetlabs/code/environments/production/data/common.yaml"
Original path: "common.yaml"
Path not found
Module data provider for module "fisherman" not found
Searching for "fisherman::create_new_component::component_name"
Global Data Provider (hiera configuration version 5)
Using configuration "/fisherman/fisherman/hiera/hiera.yaml"
Hierarchy entry "Apps"
Path "/fisherman/fisherman/hiera/data/apps/.yaml"
Original path: "apps/%{facts.appname}.yaml"
Path not found
Environment Data Provider (hiera configuration version 5)
Using configuration "/etc/puppetlabs/code/environments/production/hiera.yaml"
Hierarchy entry "Per-node data (yaml version)"
Path "/etc/puppetlabs/code/environments/production/data/nodes/.yaml"
Original path: "nodes/%{::trusted.certname}.yaml"
Path not found
Hierarchy entry "Other YAML hierarchy levels"
Path "/etc/puppetlabs/code/environments/production/data/common.yaml"
Original path: "common.yaml"
Path not found
Module data provider for module "fisherman" not found
Function lookup() did not find a value for the name 'fisherman::create_new_component::component_name'
I noticed this in the above output:
Hierarchy entry "Apps"
Path "/fisherman/fisherman/hiera/data/apps/.yaml"
Original path: "apps/%{facts.appname}.yaml"
Path not found
It looks like facts.appname is empty and not HelloWorld as I had expected.
What am I doing wrong here?
Thanks
Based on the information in the question I can't reproduce this. Here is my setup if it helps:
# init.pp
class test (
String $component_name,
) {
notify { $facts['appname']:
message => "Component name: $component_name for fact appname of ${facts['appname']}"
}
}
# hiera.yaml
---
version: 5
defaults:
datadir: data
data_hash: yaml_data
hierarchy:
- name: "Apps" # Uses custom facts.
path: "apps/%{facts.appname}.yaml"
# data/apps/HelloWorld.yaml
---
test::component_name: 'MyComponentName'
# spec/classes/test_spec.rb
require 'spec_helper'
describe 'test' do
let(:hiera_config) { 'spec/fixtures/hiera/hiera.yaml' }
let(:facts) {{ 'appname' => 'HelloWorld' }}
it {
is_expected.to contain_notify("HelloWorld")
.with({
'message' => "Component name: MyComponentName for fact appname of HelloWorld"
})
}
end
Tested on Puppet version:
â–¶ bundle exec puppet -V
6.6.0
Output:
â–¶ bundle exec rake spec
I, [2019-07-07T16:42:51.219559 #22140] INFO -- : Creating symlink from spec/fixtures/modules/test to /Users/alexharvey/git/home/puppet-test
/Users/alexharvey/.rvm/rubies/ruby-2.4.1/bin/ruby -I/Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-core-3.8.2/lib:/Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-support-3.8.2/lib /Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-core-3.8.2/exe/rspec --pattern spec/\{aliases,classes,defines,functions,hosts,integration,plans,tasks,type_aliases,types,unit\}/\*\*/\*_spec.rb
test
should contain Notify[HelloWorld] with message => "Component name: MyComponentName for fact appname of HelloWorld"
Finished in 0.1444 seconds (files took 0.9699 seconds to load)
1 example, 0 failures
You also can query the Hiera hierarchy directly using puppet lookup like this:
â–¶ FACTER_appname=HelloWorld bundle exec puppet lookup \
--hiera_config=spec/fixtures/hiera/hiera.yaml test::component_name
--- MyComponentName

Find and copy files with ansible

I want to make a script with ansible, to search on folder "/software" all the files, that were edited in the last day, then move them to "/tmp"
This my code so far (redhat 7.4):
- name: recurse path
find:
path: /software
recurse: yes
file_type: file
age: 1d
register: files
- name: copy files to tmp
copy:
src: ~/{{files}}
dest: /tmp
I get the error:
an exeption occurred during task execution ... could not find or access '~{u'files': []}, u'changed': False, 'failed': False, u'examined': 4....
The folders have full access, so I dont think its permissions.
What am I doing wrong?
As documented, the returned value of the find module is an object containing the list of found files under the files keyword.
You cannot use it directly: copy module takes only 1 file or folder as src, so you have to loop over all the files:
- name: check the content and structure of the variable
debug:
var: files
- name: copy files to tmp
copy:
src: "{{item.path}}"
dest: /tmp
with_items: "{{files.files}}"
It's always a good practice to debug a var after register while developing to check it's structure (so you can find how to use it) and it's content (in your case, looks like the file list is empty).
BTW: you need to know that the find module search on the remote host but, by default, the copy module copies from the ansible executor machine, so in your case, the src may not exists! So if you are copying from local to remote, simply add delegate_to: localhost and run_once: yes to your find task, otherwise you need to use the remote_src: yes parameter on the copy task.

puppet hiera fails to read yaml file

I am pretty new to puppet. I configured a hiera file, whose path is /etc/puppetlabs/puppet/hiera.yaml, as so
version: 5
hierarchy: []
backends:
- yaml
yaml:
- datadir: /etc/puppetlabs/puppet/some_dir
and I get this error
Warning: The function 'hiera' is deprecated in favor of using 'lookup'. See https://docs.puppet.com/puppet/5.3/reference/deprecated_language.html
(file & line not available)
Error: Evaluation Error: Error while evaluating a Function Call, Lookup of key 'user_dir' failed: The Lookup Configuration at '/etc/puppetlabs/puppet/hiera.yaml' has wrong type, unrecognized key 'backends'
The Lookup Configuration at '/etc/puppetlabs/puppet/hiera.yaml' has wrong type, unrecognized key 'yaml' at /etc/puppetlabs/code/environments/production/manifests/site.pp:30:17 on node puppet,some_cluster_DNS.internal
Initially, I had those kinds of format for the keys :backends: :yaml: but it seemed it is not regular one for the 5 version, so that I deleted the : sign
Someone has an idea ?
First, in terms of that warning, you should definitely switch over to the Puppet lookup function from the Hiera hiera functions if you are using Hiera >= 4: https://puppet.com/docs/puppet/4.10/hiera_use_function.html
Second, in terms of that error, I would consult the documentation on how to setup a Hiera 5 config file: https://puppet.com/docs/puppet/4.10/hiera_config_yaml_5.html
Using the proper format, your config file would look like:
# /etc/puppetlabs/puppet/hiera.yaml
version: 5
defaults:
- data_hash: yaml_data
- datadir: /etc/puppetlabs/puppet/some_dir
hierarchy: []
What you are trying to do on the last line (specify a specific datadir for the yaml_data backend) is not allowed in Hiera 5. If you want to specify a datadir for a specific backend, then you need to specify a level of the hierarchy for just that backend (or just that backend's datadir; you can customize in several depths of matrices with it) and specify a datadir there. For example:
hierarchy:
- name: yaml data
data_hash: yaml_data
datadir: /etc/puppetlabs/puppet/some_dir
paths:
- "%{trusted.certname}.yaml"
- common.yaml

Resources