Airflow triggering Spark application results in error "Too large frame" - apache-spark

I have a docker-compose pipeline with containers for Airflow and for Spark. I want to schedule a SparkSubmitOperator job, but it fails with the error java.lang.IllegalArgumentException: Too large frame: 5211883372140375593.
The Spark application consists only of creating a Spark session (I have already commented out everything else). When I run the Spark app manually (by opening a shell in the Spark container and executing spark-submit), everything works fine! It also works when I create only a SparkContext instead of a Spark session!
Here is my docker-compose.yml:
version: '3'
x-airflow-common:
  &airflow-common
  build: ./airflow/
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.0.2}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'false'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    AIRFLOW__CORE__DEFAULT_TIMEZONE: 'Europe/Berlin'
  volumes:
    - ./airflow/dags:/opt/airflow/dags
    - ./airflow/logs:/opt/airflow/logs
    - ./airflow/plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  networks:
    - app-tier
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy
services:
  postgres:
    container_name: airflowPostgres
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always
    networks:
      - app-tier
  redis:
    container_name: airflowRedis
    image: redis:latest
    ports:
      - 6380:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always
    networks:
      - app-tier
  airflow-webserver:
    <<: *airflow-common
    container_name: airflowWebserver
    command: webserver
    ports:
      - 8081:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
  airflow-scheduler:
    <<: *airflow-common
    container_name: airflowScheduler
    command: scheduler
    restart: always
  airflow-worker:
    <<: *airflow-common
    container_name: airflowWorker
    command: celery worker
    restart: always
  airflow-init:
    <<: *airflow-common
    container_name: airflowInit
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
  spark:
    image: docker.io/bitnami/spark:3
    user: root
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
    volumes:
      - ./:/app
    networks:
      - app-tier
  spark-worker-1:
    image: docker.io/bitnami/spark:3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    networks:
      - app-tier
  spark-worker-2:
    image: docker.io/bitnami/spark:3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    networks:
      - app-tier
volumes:
  postgres-db-volume:
networks:
  app-tier:
    driver: bridge
    name: app-tier
My Airflow DAG:
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator
from functions import send_to_kafka, send_to_mongo
# * AIRFLOW ################################
# default arguments
default_args = {
    'owner': 'daniel',
    'start_date': datetime(2021, 5, 9),
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    "retries": 3,
    "retry_delay": timedelta(minutes = 1)
}
# * spark DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
dag_spark = DAG('spark',
                description = '', catchup = False, schedule_interval = "@once", default_args = default_args)
s1 = SparkSubmitOperator(
    task_id = "spark-job",
    application = "/opt/airflow/dags/application.py",
    conn_id = "spark_default",  # defined under Admin/Connections in Airflow webserver
    packages = "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,postgresql:postgresql:9.1-901-1.jdbc4",
    dag = dag_spark
)
My application (application.py) which does NOT work:
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("myApp") \
    .getOrCreate()
The application which DOES work:
from pyspark import SparkContext
sc = SparkContext("local", "First App")
The connection is defined under Admin/Connections in the Airflow webserver (screenshot not shown).
And here is the log created by the DAG: https://pastebin.com/FMW3kJ9g
Any ideas why this fails?

The problem was solved by adding .master("local") to the SparkSession builder.
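For reference, a minimal sketch of the fixed application.py, assuming the rest of the file is unchanged; only the .master() call is new, everything else is taken from the question:
from pyspark.sql import SparkSession

# Setting the master explicitly on the builder is what resolved the error here.
spark = SparkSession \
    .builder \
    .master("local") \
    .appName("myApp") \
    .getOrCreate()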

Related

could not translate host name to address (Data lineage- tokern)

version: '3.6'
services:
  tokern-demo-catalog:
    image: tokern/demo-catalog:latest
    container_name: tokern-demo-catalog
    restart: unless-stopped
    networks:
      - tokern-internal
    volumes:
      - tokern_demo_catalog_data:/var/lib/postgresql/data
    environment:
      POSTGRES_PASSWORD: xxx
      POSTGRES_USER: xxx
      POSTGRES_DB: table1
  tokern-api:
    image: tokern/data-lineage:latest
    container_name: tokern-data-lineage
    restart: unless-stopped
    networks:
      - tokern-internal
    environment:
      CATALOG_PASSWORD: xxx
      CATALOG_USER: xxx
      CATALOG_DB: table1
      CATALOG_HOST: "xxxxxxxx.amazon.com"
      GUNICORN_CMD_ARGS: "--bind 0.0.0.0:4142"
  toker-viz:
    image: tokern/data-lineage-viz:latest
    container_name: tokern-data-lineage-visualizer
    restart: unless-stopped
    networks:
      - tokern-internal
      - tokern-net
    ports:
      - "39284:80"
networks:
  tokern-net: # Exposed by your host.
    # external: true
    name: "tokern-net"
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.10.0.0/24
  tokern-internal:
    name: "tokern-internal"
    driver: bridge
    internal: true
    ipam:
      driver: default
      config:
        - subnet: 10.11.0.0/24
volumes:
  tokern_demo_catalog_data:
I am trying to implement data lineage for my database. I followed this documentation: https://pypi.org/project/data-lineage/ and https://tokern.io/docs/data-lineage/installation/, but I am not able to solve this error:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "xxx.amazonaws.com" to address: Temporary failure in name resolution

Taigaio when i create a project the page doesn't refresh automaticaly

Hello, I am trying to configure a Linux server to run Taiga. When I create, delete, or save a modification, the page does not refresh automatically (it just goes blank), but when I refresh manually (Ctrl + Shift + R or F5) I can see the data. Did I forget to configure something? I have installed RabbitMQ and configured it as described in the documentation.
This is my docker-compose.yml:
version: "3.5"
x-environment:
&default-back-environment
# Database settings
POSTGRES_DB: taiga
POSTGRES_USER: taiga
POSTGRES_PASSWORD: taiga
POSTGRES_HOST: taiga-db
# Taiga settings
TAIGA_SECRET_KEY: "taiga-back-secret-key"
TAIGA_SITES_DOMAIN: "192.168.60.25:9000"
TAIGA_SITES_SCHEME: "http"
# Email settings. Uncomment following lines and configure your SMTP server
# EMAIL_BACKEND: "django.core.mail.backends.smtp.EmailBackend"
# DEFAULT_FROM_EMAIL: "no-reply#example.com"
# EMAIL_USE_TLS: "False"
# EMAIL_USE_SSL: "False"
# EMAIL_HOST: "smtp.host.example.com"
# EMAIL_PORT: 587
# EMAIL_HOST_USER: "user"
# EMAIL_HOST_PASSWORD: "password"
# Rabbitmq settings
# Should be the same as in taiga-async-rabbitmq and taiga-events-rabbitmq
RABBITMQ_USER: taiga
RABBITMQ_PASS: taiga
# Telemetry settings
ENABLE_TELEMETRY: "True"
x-volumes:
&default-back-volumes
- taiga-static-data:/taiga-back/static
- taiga-media-data:/taiga-back/media
# - ./config.py:/taiga-back/settings/config.py
services:
taiga-db:
image: postgres:12.3
environment:
POSTGRES_DB: taiga
POSTGRES_USER: taiga
POSTGRES_PASSWORD: taiga
volumes:
- taiga-db-data:/var/lib/postgresql/data
networks:
- taiga
taiga-back:
image: taigaio/taiga-back:latest
environment: *default-back-environment
volumes: *default-back-volumes
networks:
- taiga
depends_on:
- taiga-db
- taiga-events-rabbitmq
- taiga-async-rabbitmq
taiga-async:
image: taigaio/taiga-back:latest
entrypoint: ["/taiga-back/docker/async_entrypoint.sh"]
environment: *default-back-environment
volumes: *default-back-volumes
networks:
- taiga
depends_on:
- taiga-db
- taiga-back
- taiga-async-rabbitmq
taiga-async-rabbitmq:
image: rabbitmq:3-management-alpine
environment:
RABBITMQ_ERLANG_COOKIE: secret-erlang-cookie
RABBITMQ_DEFAULT_USER: taiga
RABBITMQ_DEFAULT_PASS: taiga
RABBITMQ_DEFAULT_VHOST: taiga
volumes:
- taiga-async-rabbitmq-data:/var/lib/rabbitmq
networks:
- taiga
taiga-front:
image: taigaio/taiga-front:latest
environment:
TAIGA_URL: "http://192.168.60.25:9000"
TAIGA_WEBSOCKETS_URL: "ws://192.168.60.25:9000"
networks:
- taiga
# volumes:
# - ./conf.json:/usr/share/nginx/html/conf.json
taiga-events:
image: taigaio/taiga-events:latest
environment:
RABBITMQ_USER: taiga
RABBITMQ_PASS: taiga
TAIGA_SECRET_KEY: "taiga-back-secret-key"
networks:
- taiga
depends_on:
- taiga-events-rabbitmq
taiga-events-rabbitmq:
image: rabbitmq:3-management-alpine
environment:
RABBITMQ_ERLANG_COOKIE: secret-erlang-cookie
RABBITMQ_DEFAULT_USER: taiga
RABBITMQ_DEFAULT_PASS: taiga
RABBITMQ_DEFAULT_VHOST: taiga
volumes:
- taiga-events-rabbitmq-data:/var/lib/rabbitmq
networks:
- taiga
taiga-protected:
image: taigaio/taiga-protected:latest
environment:
MAX_AGE: 360
SECRET_KEY: "taiga-back-secret-key"
networks:
- taiga
taiga-gateway:
image: nginx:1.19-alpine
ports:
- "9000:80"
volumes:
- ./taiga-gateway/taiga.conf:/etc/nginx/conf.d/default.conf
- taiga-static-data:/taiga/static
- taiga-media-data:/taiga/media
networks:
- taiga
depends_on:
- taiga-front
- taiga-back
- taiga-events
volumes:
taiga-static-data:
taiga-media-data:
taiga-db-data:
taiga-async-rabbitmq-data:
taiga-events-rabbitmq-data:
networks:
taiga:

Grafana Loki does not trigger or push alert on alertmanager

I have configured PLG (Promtail, Loki & Grafana) on an AWS EC2 instance for log management. Loki uses the BoltDB shipper with an AWS S3 store.
Grafana - 7.4.5,
Loki - 2.2,
Promtail - 2.2,
AlertManager - 0.21
The issue I am facing is that Loki does not trigger or push alerts to Alertmanager. I cannot see any alert on the Alertmanager dashboard, although I can run a LogQL query in Grafana which shows that the condition for triggering an alert was met.
The following is a screenshot of my query in Grafana (LogQL query screenshot not shown).
The following are my configs.
Docker Compose
$ cat docker-compose.yml
version: "3.4"
services:
alertmanager:
image: prom/alertmanager:v0.21.0
container_name: alertmanager
command:
- '--config.file=/etc/alertmanager/config.yml'
- '--storage.path=/alertmanager'
volumes:
- ./config/alertmanager/alertmanager.yml:/etc/alertmanager/config.yml
ports:
- 9093:9093
restart: unless-stopped
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
tag: "{{.Name}}"
networks:
- loki-br
loki:
image: grafana/loki:2.2.0-amd64
container_name: loki
volumes:
- ./config/loki/loki.yml:/etc/config/loki.yml:ro
- ./config/loki/rules/rules.yml:/etc/loki/rules/rules.yml
entrypoint:
- /usr/bin/loki
- -config.file=/etc/config/loki.yml
ports:
- "3100:3100"
depends_on:
- alertmanager
restart: unless-stopped
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
tag: "{{.Name}}"
networks:
- loki-br
grafana:
image: grafana/grafana:7.4.5
container_name: grafana
volumes:
- ./config/grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
- ./config/grafana/defaults.ini:/usr/share/grafana/conf/defaults.ini
- grafana:/var/lib/grafana
ports:
- "3000:3000"
depends_on:
- loki
restart: unless-stopped
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
tag: "{{.Name}}"
networks:
- loki-br
promtail:
image: grafana/promtail:2.2.0-amd64
container_name: promtail
volumes:
- /var/lib/docker/containers:/var/lib/docker/containers
- /var/log:/var/log
- ./config/promtail/promtail.yml:/etc/promtail/promtail.yml:ro
command: -config.file=/etc/promtail/promtail.yml
restart: unless-stopped
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
tag: "{{.Name}}"
networks:
- loki-br
nginx:
image: nginx:latest
container_name: nginx
volumes:
- ./config/nginx/nginx.conf:/etc/nginx/nginx.conf
- ./config/nginx/default.conf:/etc/nginx/conf.d/default.conf
- ./config/nginx/loki.conf:/etc/nginx/conf.d/loki.conf
- ./config/nginx/ssl:/etc/ssl
ports:
- "80:80"
- "443:443"
logging:
driver: "json-file"
options:
max-file: "5"
max-size: "10m"
loki-url: http://localhost:3100/loki/api/v1/push
loki-external-labels: job=containerlogs
tag: "{{.Name}}"
depends_on:
- grafana
networks:
- loki-br
networks:
loki-br:
driver: bridge
ipam:
config:
- subnet: 192.168.0.0/24
volumes:
grafana: {}
Loki Config
$ cat config/loki/loki.yml
auth_enabled: false
server:
  http_listen_port: 3100
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled
schema_config:
  configs:
    - from: 2020-11-20
      store: boltdb-shipper
      #object_store: filesystem
      object_store: s3 # Config for AWS S3 storage.
      schema: v11
      index:
        prefix: index_loki_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    cache_ttl: 24h     # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: s3   # Config for AWS S3 storage.
  #filesystem:
  #  directory: /tmp/loki/chunks
  # Config for AWS S3 storage.
  aws:
    s3: s3://eu-west-1/loki # Uses AWS IAM roles on AWS EC2 instance.
    region: eu-west-1
compactor:
  working_directory: /tmp/loki/boltdb-shipper-compactor
  shared_store: aws
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules
  rule_path: /tmp/loki/rules-temp
  evaluation_interval: 1m
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  enable_alertmanager_v2: true
Loki Rules
$ cat config/loki/rules/rules.yml
groups:
  - name: rate-alerting
    rules:
      - alert: HighLogRate
        expr: |
          sum by (job, compose_service)
            (rate({job="containerlogs"}[1m]))
            > 60
        for: 1m
        labels:
          severity: warning
          team: devops
          category: logs
        annotations:
          title: "High LogRate Alert"
          description: "something is logging a lot"
          impact: "impact"
          action: "action"
          dashboard: "https://grafana.com/service-dashboard"
          runbook: "https://wiki.com"
          logurl: "https://grafana.com/log-explorer"
AlertManager config
$ cat config/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'severity', 'instance']
  group_wait: 45s
  group_interval: 10m
  repeat_interval: 12h
  receiver: 'email-notifications'
receivers:
  - name: email-notifications
    email_configs:
      - to: me@example.com
        from: 'alerts@example.com'
        smarthost: smtp.gmail.com:587
        auth_username: alerts@example.com
        auth_identity: alerts@example.com
        auth_password: PassW0rD
        send_resolved: true
Let me know if I am missing something. I followed Ruan Bekker's blog to set things up.
If Loki is running in single tenant mode, the required ID is fake (yes we know this might seem alarming but it’s totally fine, no it can’t be changed).
mkdir /etc/loki/rules/fake
mkdir /tmp/loki/rules-temp/fake
Copy your rule files into /etc/loki/rules/fake.
So you have to add a fake sub-directory to the rules directory in single-tenant mode; after that, everything worked perfectly.
https://grafana.com/docs/loki/latest/alerting/#interacting-with-the-ruler
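A minimal sketch of how this could be wired into the docker-compose file above; the adjusted volume mapping is my assumption rather than part of the original answer, and it simply mounts the rule file under the fake tenant sub-directory so no manual mkdir/copy inside the container is needed:
  loki:
    image: grafana/loki:2.2.0-amd64
    volumes:
      - ./config/loki/loki.yml:/etc/config/loki.yml:ro
      # rules land in the "fake" tenant directory expected in single-tenant mode
      - ./config/loki/rules/rules.yml:/etc/loki/rules/fake/rules.yml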

Fix DNS on a docker-compose selenium grid so the selenium node connects to a docker-compose hostname

I have a selenium grid running under docker-compose on a Jenkins machine. My docker-compose includes a simple web server that serves up a single page application, and a test-runner container that orchestrates tests.
version: "3"
services:
hub:
image: selenium/hub
networks:
- selenium
privileged: true
restart: unless-stopped
container_name: hub
ports:
- "4444:4444"
environment:
- SE_OPTS=-browserTimeout 10 -timeout 20
chrome:
image: selenium/node-chrome-debug
networks:
- selenium
privileged: true
restart: unless-stopped
volumes:
- /dev/shm:/dev/shm
depends_on:
- hub
environment:
- HUB_HOST=hub
- HUB_PORT=4444
- SE_OPTS=-browserTimeout 10 -timeout 20
ports:
- "5900:5900"
firefox:
image: selenium/node-firefox-debug
networks:
- selenium
privileged: true
restart: unless-stopped
volumes:
- /dev/shm:/dev/shm
depends_on:
- hub
environment:
- HUB_HOST=hub
- HUB_PORT=4444
- SE_OPTS=-browserTimeout 10 -timeout 20
ports:
- "5901:5900"
runner:
build:
context: ./
dockerfile: ./python.dockerfile
security_opt:
- seccomp=unconfined
cap_add:
- SYS_PTRACE
command: sleep infinity
networks:
- selenium
volumes:
- ./:/app
depends_on:
- hub
- app
- chrome
- firefox
environment:
HUB_CONNECTION_STRING: http://hub:4444/wd/hub
TEST_DOMAIN: "app"
app:
image: nginx:alpine
networks:
- selenium
volumes:
- ../dist:/usr/share/nginx/html
ports:
- "8081:80"
networks:
selenium:
When my tests run (in the runner container above) I can load the home page as long as I use an IP address -
def test_home_page_loads(self):
    host = socket.gethostbyname(self.test_domain)  # this is the TEST_DOMAIN env var above
    self.driver.get(f"http://{host}")
    header = WebDriverWait(self.driver, 40).until(
        EC.presence_of_element_located((By.ID, 'welcome-message')))
    assert(self.driver.title == "My Page Title")
    assert(header.text == "My Header")
But I can't use the host name app. The following times out -
def test_home_page_with_hostname(self):
    self.driver.get("http://app/")
    email = WebDriverWait(self.driver, 10).until(
        EC.presence_of_element_located((By.ID, 'email')))
The problem I'm facing is that I can't do all this using IP addresses because the web app is connecting to an external IP and I need to configure the API for CORS requests.
I'd assumed the problem was that the chrome container couldn't reach the app container, but the issue was that the web server on the app container wasn't serving pages for the hostname I was using. Updating the Nginx config to include the correct server_name fixed the issue.
I can now add the hostname to the access-control-allow-origin settings on the api's that the webpage is using.
I'm attaching a basic working config here for anyone else looking to do something similar.
docker-compose.yml
version: "3"
services:
hub:
image: selenium/hub
networks:
- selenium
privileged: true
restart: unless-stopped
container_name: hub
ports:
- "4444:4444"
environment:
- SE_OPTS=-browserTimeout 10 -timeout 20
chrome:
image: selenium/node-chrome-debug
networks:
- selenium
privileged: true
restart: unless-stopped
volumes:
- /dev/shm:/dev/shm
depends_on:
- hub
environment:
- HUB_HOST=hub
- HUB_PORT=4444
- SE_OPTS=-browserTimeout 10 -timeout 20
ports:
- "5900:5900"
firefox:
image: selenium/node-firefox-debug
networks:
- selenium
privileged: true
restart: unless-stopped
volumes:
- /dev/shm:/dev/shm
depends_on:
- hub
environment:
- HUB_HOST=hub
- HUB_PORT=4444
- SE_OPTS=-browserTimeout 10 -timeout 20
ports:
- "5901:5900"
runner:
build:
context: ./
dockerfile: ./python.dockerfile
security_opt:
- seccomp=unconfined
cap_add:
- SYS_PTRACE
command: sleep infinity
networks:
- selenium
volumes:
- ./:/app
depends_on:
- hub
- webserver
- chrome
- firefox
environment:
HUB_CONNECTION_STRING: http://hub:4444/wd/hub
TEST_DOMAIN: "webserver"
webserver:
image: nginx:alpine
networks:
- selenium
volumes:
- ../dist:/usr/share/nginx/html
- ./nginx_conf:/etc/nginx/conf.d
ports:
- "8081:80"
networks:
selenium:
default.conf
server {
    listen 80;
    server_name webserver;
    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root /usr/share/nginx/html;
    }
}
The 'runner' container is based on the python:3 Docker image and includes pytest. A simple working test looks like this -
test.py
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import os
import pytest
import socket

# Fixture for Chrome
@pytest.fixture(scope="class")
def chrome_driver_init(request):
    hub_connection_string = os.getenv('HUB_CONNECTION_STRING')
    test_domain = os.getenv('TEST_DOMAIN')
    chrome_driver = webdriver.Remote(
        command_executor=hub_connection_string,
        desired_capabilities={
            'browserName': 'chrome',
            'version': '',
            "chrome.switches": ["disable-web-security"],
            'platform': 'ANY'})
    request.cls.driver = chrome_driver
    request.cls.test_domain = test_domain
    yield
    chrome_driver.close()

@pytest.mark.usefixtures("chrome_driver_init")
class Basic_Chrome_Test:
    driver = None
    test_domain = None
    pass

class Test_Atlas(Basic_Chrome_Test):
    def test_home_page_loads(self):
        self.driver.get(f"http://{self.test_domain}")
        header = WebDriverWait(self.driver, 40).until(
            EC.presence_of_element_located((By.ID, 'welcome-message')))
        assert(self.driver.title == "My Page Title")
        assert(header.text == "My Header")
This can be run with something like docker exec -it $(docker-compose ps -q runner) pytest test.py (exec into the runner container and run the tests using pytest).
This framework can then be added to a Jenkins step -
Jenkinsfile
stage('Run Functional Tests') {
    steps {
        echo 'Running Selenium Grid'
        dir("${env.WORKSPACE}/functional_testing") {
            sh "/usr/local/bin/docker-compose -f ${env.WORKSPACE}/functional_testing/docker-compose.yml -p ${currentBuild.displayName} run runner ./wait-for-webserver.sh pytest tests/atlas_test.py"
        }
    }
}
wait-for-webserver.sh
#!/bin/bash
# wait-for-webserver.sh
set -e
cmd="$@"
while ! curl -sSL "http://hub:4444/wd/hub/status" 2>&1 \
    | jq -r '.value.ready' 2>&1 | grep "true" >/dev/null; do
  echo 'Waiting for the Grid'
  sleep 1
done
while [[ "$(curl -s -o /dev/null -w ''%{http_code}'' http://webserver)" != "200" ]]; do
  echo 'Waiting for Webserver'
  sleep 1;
done
>&2 echo "Grid & Webserver are ready - executing tests"
exec $cmd
Hope this is useful for someone.

how to create keyspace for cassandra using docker compose v3

I am trying to create a keyspace using docker-compose v3, but it is not working. My docker-compose.yaml looks like the following:
version: '3'
services:
  cassandra:
    image: cassandra:latest
    networks:
      - default
    ports:
      - "9042:9042"
    volumes:
      - ../compi${COMPI}/data/cassandra:/var/lib/cassandra
      - ../../sql:/compi/sql
      - ../compi${COMPI}/docker-entrypoint-initdb.d:/compi/docker-entrypoint-initdb.d:ro
    healthcheck:
      test: ["CMD-SHELL", "[ $$(nodetool statusgossip) = running ]"]
      interval: 30s
      timeout: 10s
      retries: 5
  compi:
    environment:
      - DOCKER=true
    depends_on:
      - cassandra
    links:
      - cassandra
    build:
      context: ../..
      dockerfile: ./docker.local/compi/Dockerfile
    volumes:
      - ../config:/compi/config
      - ../compi${COMPI}/log:/compi/log
      - ../compi${COMPI}/data:/compi/data
    ports:
      - "717${compi}:717${compi}"
volumes:
  data:
  config:
My docker-entrypoint-initdb.d/init.cql looks like the following:
CREATE KEYSPACE IF NOT EXISTS sample WITH REPLICATION = {
    'class' : 'SimpleStrategy', 'replication_factor' : 1 } AND DURABLE_WRITES = true;
