How to install spark-xml library using dbx - databricks

I am trying to install the spark-xml_2.12-0.15.0 library using dbx.
The documentation I found says to include it in the conf/deployment.yml file like this:
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "10.4.x-cpu-ml-scala2.12"
  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 2
build:
  commands:
    - "mvn clean package"
environments:
  default:
    workflows:
      - name: "charming-aurora-sample-jvm"
        libraries:
          - jar: "{{ 'file://' + dbx.get_last_modified_file('target/scala-2.12', 'jar') }}"
        tasks:
          - task_key: "main"
            <<: *basic-static-cluster
            deployment_config:
              no_package: true
            spark_jar_task:
              main_class_name: "org.some.main.ClassName"
You may see documentation page here: https://dbx.readthedocs.io/en/latest/guides/jvm/jvm_devops/?h=maven
I have installed the library on the cluster using Maven file (https://mvnrepository.com/artifact/com.databricks/spark-xml_2.13/0.15.0):
<!-- https://mvnrepository.com/artifact/com.databricks/spark-xml -->
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-xml_2.13</artifactId>
    <version>0.15.0</version>
</dependency>
I can use it on a notebook level but not from a job deployed using dbx.
Edit
I am using PySpark.
So I included it like this in conf/deployment.yml:
libraries:
  - maven: "com.databricks:spark-xml_2.12:0.15.0"
In the file conf/deployment.yml:
- name: "my-job"
  libraries:
    - maven:
        - coordinates:"com.databricks:spark-xml_2.12:0.15.0"
  tasks:
    - task_key: "first_task"
      <<: *basic-static-cluster
      python_wheel_task:
        package_name: "project_name"
        entry_point: "jl" # take a look at the setup.py entry_points section for details on how to define an entrypoint
        parameters: ["--conf-file", "file:fuse://conf/tasks/my_job_config.yml"]
Then I run
dbx deploy my-job
This throws the following error:
HTTPError: 400 Client Error: Bad Request for url: https://adb-xxxx.azuredatabricks.net/api/2.0/jobs/reset
Response from server:
{ 'error_code': 'MALFORMED_REQUEST',
'message': "Could not parse request object: Expected 'START_OBJECT' not "
"'START_ARRAY'\n"
' at [Source: (ByteArrayInputStream); line: 1, column: 91]\n'
' at [Source: java.io.ByteArrayInputStream@37fda06f; line: 1, '
'column: 91]'}

You were pretty close, and the error you've run into doesn't really say much.
We plan to introduce structure verification to make such checks more understandable.
The correct deployment file structure should look as follows:
- name: "my-job"
  tasks:
    - task_key: "first_task"
      <<: *basic-static-cluster
      # please note that libraries section is on the task level
      libraries:
        - maven:
            coordinates: "com.databricks:spark-xml_2.12:0.15.0"
      python_wheel_task:
        package_name: "project_name"
        entry_point: "jl" # take a look at the setup.py entry_points section for details on how to define an entrypoint
        parameters: ["--conf-file", "file:fuse://conf/tasks/my_job_config.yml"]
Two important points here:
- The libraries section is on the task level.
- The maven section expects an object, not a list, therefore this will not work:
# THIS IS INCORRECT, DON'T DO THIS
libraries:
  - maven:
      - coordinates: "com.databricks:spark-xml_2.12:0.15.0"
But this will:
# correct structure
libraries:
  - maven:
      coordinates: "com.databricks:spark-xml_2.12:0.15.0"
I've summarized these details in this new documentation section.
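As a rough illustration of the object-versus-list point, here is a minimal Python sketch that checks a libraries section before deployment. It is not part of dbx; the function name find_library_errors is hypothetical, and the section is assumed to be already parsed into plain Python lists and dicts.

```python
def find_library_errors(libraries):
    """Return a list of problems found in a parsed `libraries` section.

    Hypothetical helper, not a dbx API: it only checks that every
    `maven` entry is an object with a string `coordinates` field.
    """
    errors = []
    for index, entry in enumerate(libraries):
        maven = entry.get("maven")
        if maven is None:
            continue  # not a Maven library (for example jar or pypi)
        if not isinstance(maven, dict):
            errors.append(
                f"libraries[{index}].maven must be an object, "
                f"got {type(maven).__name__}"
            )
        elif not isinstance(maven.get("coordinates"), str):
            errors.append(f"libraries[{index}].maven.coordinates must be a string")
    return errors


# The list form from the question and the object form from the answer.
incorrect = [{"maven": [{"coordinates": "com.databricks:spark-xml_2.12:0.15.0"}]}]
correct = [{"maven": {"coordinates": "com.databricks:spark-xml_2.12:0.15.0"}}]

print(find_library_errors(incorrect))
print(find_library_errors(correct))
```

Running a check like this locally would catch the list form before the Jobs API rejects it with the START_ARRAY message.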

The documentation says the following:
The workflows section of the deployment file fully follows the Databricks Jobs API structures.
If you look into the API documentation, you will see that you need to use maven instead of file, and provide the Maven coordinates as a string. Something like this (please note that you need to use Scala 2.12, not 2.13):
libraries:
  - maven:
      coordinates: "com.databricks:spark-xml_2.12:0.15.0"
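The Scala 2.12 versus 2.13 point can be checked mechanically. The sketch below is an assumption-laden illustration, not a Databricks or dbx API: the helper names are hypothetical, and it assumes the artifact name ends in _<scala version> and the runtime string contains scala<version>, as in the examples from this thread.

```python
import re


def scala_suffix_of_artifact(coordinates):
    """Extract the Scala suffix (e.g. '2.12') from 'group:artifact:version'."""
    _group, artifact, _version = coordinates.split(":")
    match = re.search(r"_(\d+\.\d+)$", artifact)
    return match.group(1) if match else None


def scala_version_of_runtime(spark_version):
    """Extract the Scala version from a runtime string such as '10.4.x-cpu-ml-scala2.12'."""
    match = re.search(r"scala(\d+\.\d+)", spark_version)
    return match.group(1) if match else None


def is_compatible(coordinates, spark_version):
    """True when the artifact's Scala suffix matches the runtime's Scala version."""
    artifact_scala = scala_suffix_of_artifact(coordinates)
    return artifact_scala is not None and artifact_scala == scala_version_of_runtime(spark_version)


runtime = "10.4.x-cpu-ml-scala2.12"
print(is_compatible("com.databricks:spark-xml_2.12:0.15.0", runtime))
print(is_compatible("com.databricks:spark-xml_2.13:0.15.0", runtime))
```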


Serverless Error: No file matches include / exclude patterns

I know there are a lot of similar questions out there, but none of them has a proper answer. I am trying to deploy my code using a GitLab CI/CD pipeline. During the deployment stage, the pipeline failed with the error shown below.
My serverless.yml has this code related to exclusions:
package:
  patterns:
    - '!nltk'
    - '!node_modules/**'
    - '!package-lock.json'
    - '!package.json'
    - '!__pycache__/**'
    - '!.gitlab-ci.yml'
    - '!tests/**'
    - '!README.md'
The error I am getting is
Serverless Error ----------------------------------------
No file matches include / exclude patterns
I forgot to mention, I have a nltk layer which I am deploying in the same serverless.yml as my lambda function and other resources.
I am not sure what has to be done exactly to get rid of the error. Any help would be appreciated. Thank you.
Your directives do not define any inclusive patterns. Perhaps you want to list the files and directories you need packaged. Patterns are applied in order, so each directive builds on the ones before it.
Something like:
package:
  patterns:
    - "**/**"
    - '!nltk'
    - '!node_modules/**'
    - '!package-lock.json'
    - '!package.json'
    - '!__pycache__/**'
    - '!.gitlab-ci.yml'
    - '!tests/**'
    - '!README.md'
See https://www.serverless.com/framework/docs/providers/aws/guide/packaging/#patterns
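To make the ordering idea concrete, here is a toy Python model of ordered include and exclude patterns. It is only a sketch of the reasoning in this answer, not how the Serverless Framework is implemented: it assumes the selection starts empty, handles only the *, ** and **/ wildcards, and the file names are made up.

```python
import re


def glob_to_regex(pattern):
    """Translate a small glob subset ('**/', '**', '*') into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def select_files(files, patterns):
    """Apply patterns in order: '!' removes matches, anything else adds them."""
    selected = set()
    for pattern in patterns:
        if pattern.startswith("!"):
            regex = glob_to_regex(pattern[1:])
            selected = {f for f in selected if not regex.match(f)}
        else:
            regex = glob_to_regex(pattern)
            selected |= {f for f in files if regex.match(f)}
    return sorted(selected)


# Hypothetical project contents, for illustration only.
files = ["handler.py", "README.md", "tests/test_handler.py", "node_modules/lib/index.js"]
excludes = ["!node_modules/**", "!tests/**", "!README.md"]

print(select_files(files, excludes))
print(select_files(files, ["**/**"] + excludes))
```

In this model, exclusions alone leave nothing to package, while a leading inclusive pattern gives the exclusions something to subtract from.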

Azure DevOps AssemblyInfo task errors with The specified version string does not conform to the required format - major[.minor[.build[.revision]]]

I'm trying to use the following Azure DevOps task in my build.
https://marketplace.visualstudio.com/items?itemName=bleddynrichards.Assembly-Info-Task
My task looks like this:
- task: Assembly-Info-NetCore@3
  inputs:
    Path: '$(Build.SourcesDirectory)'
    FileNames: '**/*.csproj'
    InsertAttributes: true
    FileEncoding: 'auto'
    WriteBOM: false
    VersionNumber: '1.0.$(Build.BuildNumber)'
    FileVersionNumber: '1.0.$(Build.BuildNumber)'
    PackageVersion: '1.0.$(Build.BuildNumber)'
    LogLevel: 'verbose'
    FailOnWarning: false
    DisableTelemetry: false
When the task runs, I see it works:
Assembly neutral language:
Assembly version: 1.0.20220223.2
Assembly file version: 1.0.20220223.2
Informational version:
When I try to build the app, I get the following error:
AssemblyInfo.cs(15,59): warning CS7035: The specified version string does not conform to the recommended format - major.minor.build.revision...specific csproj
Each component of the version number for AssemblyVersion (and possibly AssemblyFileVersion too) has a hard upper limit, which the documentation gives as UInt16.MaxValue - 1 (65534). Anything above that causes a compile error, and the build component here, 20220223, is far above it.
This is detailed in the Remarks section here: https://learn.microsoft.com/en-us/dotnet/api/system.reflection.assemblyversionattribute?view=net-6.0
I had a similar problem and had to rework the build number format to get it to work.
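A quick way to see which component breaks the limit is to check the version string before the build runs. This is a small Python sketch, not part of the Assembly-Info task; the helper names are hypothetical, and the limit of 65534 (UInt16.MaxValue - 1) is taken from the Remarks section of the documentation linked above.

```python
# Maximum value allowed for each version component (UInt16.MaxValue - 1).
MAX_COMPONENT = 65534


def invalid_components(version):
    """Return the components that are not integers in the range 0..65534."""
    return [
        part
        for part in version.split(".")
        if not part.isdigit() or int(part) > MAX_COMPONENT
    ]


def is_valid_assembly_version(version):
    """True for 1 to 4 dot-separated components, each within the limit."""
    parts = version.split(".")
    return 1 <= len(parts) <= 4 and not invalid_components(version)


print(invalid_components("1.0.20220223.2"))
print(is_valid_assembly_version("1.0.20220223.2"))
print(is_valid_assembly_version("1.0.2202.2"))
```

For the version from the question, the date-based build component is the one that gets flagged, which is why the build number format had to change.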

Not able to look up class parameter in hiera

I have looked at other questions like "Using hiera to set class parameters?" and others that discuss Hiera 3. I am using Hiera 5.
Here is my hiera.yaml
[root@e64a2e5c7c79 fisherman]# cat /fisherman/fisherman/hiera/hiera.yaml
---
version: 5
defaults: # Used for any hierarchy level that omits these keys.
  datadir: data # This path is relative to hiera.yaml's directory.
  data_hash: yaml_data # Use the built-in YAML backend.
hierarchy:
  - name: "Apps" # Uses custom facts.
    path: "apps/%{facts.appname}.yaml"
I also have this hiera data file:
[root@e64a2e5c7c79 fisherman]# cat /fisherman/fisherman/hiera/apps/HelloWorld.yaml
---
fisherman::create_new_component::component_name: 'HelloWord'
But when I run my puppet agent like so ...
export FACTER_appname=HelloWorld
hiera_config=/fisherman/fisherman/hiera/hiera.yaml
modulepath=/fisherman/fisherman/modules
puppet apply --modulepath=$modulepath --hiera_config=$hiera_config -e 'include fisherman'
... I get this error ...
Error: Evaluation Error: Error while evaluating a Function Call, Class[Fisherman::Create_new_component]: expects a value for parameter $component_name (file: /fisherman/fisherman/modules/fisherman/manifests/init.pp, line: 12, column: 9) on node e64a2e5c7c79
I tried debugging hiera with puppet lookup like so:
[root@e64a2e5c7c79 /]# export FACTER_appname=HelloWorld
[root@e64a2e5c7c79 /]# hiera_config=/fisherman/fisherman/hiera/hiera.yaml
[root@e64a2e5c7c79 /]# modulepath=/fisherman/fisherman/modules
[root@e64a2e5c7c79 /]# puppet lookup --modulepath=$modulepath --hiera_config=$hiera_config --node agent.local --explain fisherman::create_new_component::component_name
Searching for "lookup_options"
Global Data Provider (hiera configuration version 5)
Using configuration "/fisherman/fisherman/hiera/hiera.yaml"
Hierarchy entry "Apps"
Path "/fisherman/fisherman/hiera/data/apps/.yaml"
Original path: "apps/%{facts.appname}.yaml"
Path not found
Environment Data Provider (hiera configuration version 5)
Using configuration "/etc/puppetlabs/code/environments/production/hiera.yaml"
Merge strategy hash
Hierarchy entry "Per-node data (yaml version)"
Path "/etc/puppetlabs/code/environments/production/data/nodes/.yaml"
Original path: "nodes/%{::trusted.certname}.yaml"
Path not found
Hierarchy entry "Other YAML hierarchy levels"
Path "/etc/puppetlabs/code/environments/production/data/common.yaml"
Original path: "common.yaml"
Path not found
Module data provider for module "fisherman" not found
Searching for "fisherman::create_new_component::component_name"
Global Data Provider (hiera configuration version 5)
Using configuration "/fisherman/fisherman/hiera/hiera.yaml"
Hierarchy entry "Apps"
Path "/fisherman/fisherman/hiera/data/apps/.yaml"
Original path: "apps/%{facts.appname}.yaml"
Path not found
Environment Data Provider (hiera configuration version 5)
Using configuration "/etc/puppetlabs/code/environments/production/hiera.yaml"
Hierarchy entry "Per-node data (yaml version)"
Path "/etc/puppetlabs/code/environments/production/data/nodes/.yaml"
Original path: "nodes/%{::trusted.certname}.yaml"
Path not found
Hierarchy entry "Other YAML hierarchy levels"
Path "/etc/puppetlabs/code/environments/production/data/common.yaml"
Original path: "common.yaml"
Path not found
Module data provider for module "fisherman" not found
Function lookup() did not find a value for the name 'fisherman::create_new_component::component_name'
I noticed this in the above output:
Hierarchy entry "Apps"
Path "/fisherman/fisherman/hiera/data/apps/.yaml"
Original path: "apps/%{facts.appname}.yaml"
Path not found
It looks like facts.appname is empty and not HelloWorld as I had expected.
What am I doing wrong here?
Thanks
Based on the information in the question I can't reproduce this. Here is my setup if it helps:
# init.pp
class test (
  String $component_name,
) {
  notify { $facts['appname']:
    message => "Component name: $component_name for fact appname of ${facts['appname']}"
  }
}
# hiera.yaml
---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: "Apps" # Uses custom facts.
    path: "apps/%{facts.appname}.yaml"
# data/apps/HelloWorld.yaml
---
test::component_name: 'MyComponentName'
# spec/classes/test_spec.rb
require 'spec_helper'
describe 'test' do
  let(:hiera_config) { 'spec/fixtures/hiera/hiera.yaml' }
  let(:facts) {{ 'appname' => 'HelloWorld' }}
  it {
    is_expected.to contain_notify("HelloWorld")
      .with({
        'message' => "Component name: MyComponentName for fact appname of HelloWorld"
      })
  }
end
Tested on Puppet version:
▶ bundle exec puppet -V
6.6.0
Output:
▶ bundle exec rake spec
I, [2019-07-07T16:42:51.219559 #22140] INFO -- : Creating symlink from spec/fixtures/modules/test to /Users/alexharvey/git/home/puppet-test
/Users/alexharvey/.rvm/rubies/ruby-2.4.1/bin/ruby -I/Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-core-3.8.2/lib:/Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-support-3.8.2/lib /Users/alexharvey/.rvm/gems/ruby-2.4.1/gems/rspec-core-3.8.2/exe/rspec --pattern spec/\{aliases,classes,defines,functions,hosts,integration,plans,tasks,type_aliases,types,unit\}/\*\*/\*_spec.rb
test
should contain Notify[HelloWorld] with message => "Component name: MyComponentName for fact appname of HelloWorld"
Finished in 0.1444 seconds (files took 0.9699 seconds to load)
1 example, 0 failures
You can also query the Hiera hierarchy directly using puppet lookup like this:
▶ FACTER_appname=HelloWorld bundle exec puppet lookup \
    --hiera_config=spec/fixtures/hiera/hiera.yaml test::component_name
--- MyComponentName
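The apps/.yaml path in the question's --explain output is what the path template produces when the appname fact has no value during the lookup. The Python sketch below imitates that interpolation for illustration only. It is not Puppet's implementation, and the interpolate function and the scope dictionaries are hypothetical.

```python
import re


def interpolate(path_template, scope):
    """Replace %{a.b} tokens with values from a nested dict.

    Missing keys become an empty string, mirroring the behaviour seen
    in the question's lookup output.
    """
    def resolve(match):
        value = scope
        for key in match.group(1).split("."):
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                return ""
        return str(value)

    return re.sub(r"%\{([^}]+)\}", resolve, path_template)


template = "apps/%{facts.appname}.yaml"
print(interpolate(template, {"facts": {"appname": "HelloWorld"}}))
print(interpolate(template, {"facts": {}}))
```

When the fact is present the template resolves to the expected data file, and when it is absent the result is the empty-named file that Hiera reported as not found.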

Puppet - Multiple Roles in Hiera

I'm trying (and struggling) to get a (multiple) role model implemented in Hiera.
For the last 2 years I've worked with exactly the same model as a user, and now I want to rebuild the same structure on my own. For example, my node.yaml should contain only the roles I want to apply to the host:
/etc/puppetlabs/code/environments/production/nodes/my.host.de.yaml
classes:
  - ydixken_baseinstall
  - additional_modules
[...]
For me it's far more intuitive to place a YAML file in the roles/ directory, named after the role, and avoid dealing with profiles:
/etc/puppetlabs/code/environments/production/roles/ydixken_baseinstall.yaml
classes:
  - apt
  - unattended_upgrades
  - [...]
apt::update:
  frequency: 'daily'
  loglevel: 'debug'
[...]
Placing the role definitions in a node fact is not practicable for me. It would also be nice to be able to override the already defined values inside the node configuration, if needed.
Right now my directory, hiera.yaml and file structure look like this:
/etc/puppetlabs/puppet/hiera.yaml
version: 5
defaults:
  datadir: /etc/puppetlabs/code/environments/production
  data_hash: yaml_data
hierarchy:
  - name: "Per-node data (yaml version)"
    paths:
      - "nodes/%{fqdn}.yaml"
      - "roles/%{role}.yaml"
      - common
/etc/puppetlabs/code/environments/production/hiera.yaml
version: 5
defaults:
hierarchy:
  - name: "FQDN"
    path: "nodes/%{fqdn}.yaml"
  - name: "Roles"
    path: "roles/%{role}.yaml"
  - name: "Common Data"
    path: "common.yaml"
/etc/puppetlabs/code/environments/production/manifests/site.pp
hiera_include('classes')
How can I achieve this?
My current error:
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::ydixken_baseinstall for my.host.de (file: /etc/puppetlabs/code/environments/production/manifests/site.pp, line: 1, column: 1) on node my.host.de
I've found exactly what I was looking for: r10k

puppet hiera fails to read yaml file

I am pretty new to Puppet. I configured a Hiera file, whose path is /etc/puppetlabs/puppet/hiera.yaml, like so:
version: 5
hierarchy: []
backends:
  - yaml
yaml:
  - datadir: /etc/puppetlabs/puppet/some_dir
and I get this error
Warning: The function 'hiera' is deprecated in favor of using 'lookup'. See https://docs.puppet.com/puppet/5.3/reference/deprecated_language.html
(file & line not available)
Error: Evaluation Error: Error while evaluating a Function Call, Lookup of key 'user_dir' failed: The Lookup Configuration at '/etc/puppetlabs/puppet/hiera.yaml' has wrong type, unrecognized key 'backends'
The Lookup Configuration at '/etc/puppetlabs/puppet/hiera.yaml' has wrong type, unrecognized key 'yaml' at /etc/puppetlabs/code/environments/production/manifests/site.pp:30:17 on node puppet,some_cluster_DNS.internal
Initially, I wrote the keys in the form :backends: and :yaml:, but that did not seem to be the usual format for version 5, so I deleted the leading : sign.
Does anyone have an idea?
First, in terms of that warning, you should definitely switch over to the Puppet lookup function from the Hiera hiera functions if you are using Hiera >= 4: https://puppet.com/docs/puppet/4.10/hiera_use_function.html
Second, in terms of that error, I would consult the documentation on how to setup a Hiera 5 config file: https://puppet.com/docs/puppet/4.10/hiera_config_yaml_5.html
Using the proper format, your config file would look like:
# /etc/puppetlabs/puppet/hiera.yaml
version: 5
defaults:
  data_hash: yaml_data
  datadir: /etc/puppetlabs/puppet/some_dir
hierarchy: []
What you are trying to do on the last line (specify a datadir under a top-level key for the yaml backend) is not allowed in Hiera 5. If you want a particular datadir for a particular backend, define a hierarchy level that uses that backend and set the datadir on that level; each level can have its own backend and datadir. For example:
hierarchy:
  - name: yaml data
    data_hash: yaml_data
    datadir: /etc/puppetlabs/puppet/some_dir
    paths:
      - "%{trusted.certname}.yaml"
      - common.yaml
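The "unrecognized key" errors in the question come from top-level keys that a version 5 hiera.yaml does not accept. As a rough illustration, the Python sketch below flags such keys in a config that has already been parsed into a dict. It is not Puppet's validator; the function name is hypothetical, and the set of allowed keys is an assumption based on my reading of the Hiera 5 documentation.

```python
# Assumed set of top-level keys for a version 5 hiera.yaml.
ALLOWED_TOP_LEVEL_KEYS = {"version", "defaults", "hierarchy", "default_hierarchy"}


def unrecognized_keys(config):
    """Return the top-level keys that are not in the assumed allowed set."""
    return sorted(key for key in config if key not in ALLOWED_TOP_LEVEL_KEYS)


# The config from the question, written as the dict a YAML parser would return.
question_config = {
    "version": 5,
    "hierarchy": [],
    "backends": ["yaml"],
    "yaml": [{"datadir": "/etc/puppetlabs/puppet/some_dir"}],
}

# The corrected config from this answer.
fixed_config = {
    "version": 5,
    "defaults": {
        "data_hash": "yaml_data",
        "datadir": "/etc/puppetlabs/puppet/some_dir",
    },
    "hierarchy": [],
}

print(unrecognized_keys(question_config))
print(unrecognized_keys(fixed_config))
```

The two keys flagged for the question's config are the same two that Puppet names in its error message.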
