Spark home access denied to Airflow - apache-spark

I am really stuck on this one. I am trying to run a SparkSubmitOperator in Airflow in local mode. Although the required folder grants read and execute rights to every user, I get the following error. Any help would be appreciated.
[2020-12-08 11:00:54,456] {base_hook.py:89} INFO - Using connection to: id: spark_local. Host: local[*], Port: None, Schema: None, Login: None, Password: None, extra: XXXXXXXX
[2020-12-08 11:00:54,458] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: /opt/spark/bin/ --master local[*] --name airflow-spark --queue root.default --deploy-mode client /home/ubuntu/market_risk/utils/etl.py
[2020-12-08 11:00:54,463] {taskinstance.py:1150} ERROR - [Errno 13] Permission denied: '/opt/spark/bin/'
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
result = task_copy.execute(context=context)
File "/home/ubuntu/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute
self._hook.submit(self._application)
File "/home/ubuntu/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 395, in submit
**kwargs)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/opt/spark/bin/'
My settings for the Spark connection in Airflow are as follows:
In the Airflow UI, I have not set any environment variables.
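One reading of the log above, offered as an assumption rather than a confirmed diagnosis: the Spark-Submit command starts with /opt/spark/bin/, which is a directory, not the spark-submit script inside it. On Linux, trying to execute a directory fails with Errno 13 even when every user can read and traverse it. The sketch below uses only the standard library; diagnose_executable is a hypothetical helper name, not part of Airflow.

```python
import os


def diagnose_executable(path):
    """Describe whether `path` could be launched as a program."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        # Executing a directory raises PermissionError (Errno 13) on Linux.
        return "directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"
```

Running diagnose_executable("/opt/spark/bin/") on the Airflow host would show whether the path in the log is a directory. If it is, the connection settings probably need to resolve to the spark-submit file itself.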

Related

Error 13 Permission Denied pysftp library

I get Error 13 (Permission denied) when uploading to a folder.
I am trying to write code that uploads a document to an SFTP host using Python.
I get the following error: [Errno 13] Permission denied. Can you assist, please? I am using Windows 10.
import pysftp
hostname = "host.com"
username = "username"
password = "password"
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None
with pysftp.Connection(host=hostname, username=username, password=password, cnopts=cnopts) as sftp:
    remoteFilePath = r'\Home\FolderName\file.txt'
    localFilePath = r'C:\Users\user\Desktop\file.txt'
    sftp.put(localFilePath, remoteFilePath)
Here's the error that I am getting:
File "<pyshell#23>", line 5, in <module>
sftp.put(localFilePath,remoteFilePath)
File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\pysftp\__init__.py", line 363, in put
sftpattrs = self._sftp.put(localpath, remotepath, callback=callback,
File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\paramiko\sftp_client.py", line 759, in put
return self.putfo(fl, remotepath, file_size, callback, confirm)
File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\paramiko\sftp_client.py", line 714, in putfo
with self.file(remotepath, "wb") as fr:
File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\paramiko\sftp_client.py", line 372, in open
t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\paramiko\sftp_client.py", line 813, in _request
return self._read_response(num)
File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\paramiko\sftp_client.py", line 865, in _read_response
self._convert_status(msg)
File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\paramiko\sftp_client.py", line 896, in _convert_status
raise IOError(errno.EACCES, text)
PermissionError: [Errno 13] Permission denied
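One thing worth checking, as an assumption since the thread has no confirmed answer: SFTP paths use forward slashes, while the remote path in the question is written with Windows backslashes. The sketch below converts the path with plain string handling; to_sftp_path is a hypothetical helper name. A missing write permission on the remote folder is another possible cause that this does not address.

```python
def to_sftp_path(path):
    """Convert a backslash-separated path to the forward-slash form used by SFTP."""
    return path.replace("\\", "/")


remote_file_path = to_sftp_path(r"\Home\FolderName\file.txt")
print(remote_file_path)  # /Home/FolderName/file.txt
```

The converted value would then be passed as the second argument to sftp.put in place of the backslash path.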

Get indices from elastic search server

I installed Elasticsearch through docker-compose on a machine. The Elasticsearch version in the YAML file is 1.11.0, and pyelasticsearch is 7.6.0. I am trying to get all the indices from the Elasticsearch server. The script prints "Connection to ES Server successful", so the connection appears to succeed, but I cannot get the indices: the line es.indices.get_alias("*") fails.
try:
    es = Elasticsearch("https://admin:admin@localhost:9200",
                       use_ssl=False,
                       ca_certs=False,
                       verify_certs=False)
    print("Connection to ES Server successful")
except:
    print("Unable to connect to server")
    exit(1)
# get all wazuh indexes
lsindex = []
for i in es.indices.get_alias("*"):
    if i.startswith(ALERTS_PREFIX):
        lsindex.append(i)
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/usr/local/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 246, in perform_request
method, url, body, retries=Retry(False), headers=request_headers, **kw
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 344, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f6255479110>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./mcode.py", line 185, in <module>
aut_for_tool()
File "./mcode.py", line 78, in aut_for_tool
for i in es.indices.get("*"):
File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 152, in _wrapped
return func(*args, params=params, headers=headers, **kwargs)
File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/indices.py", line 193, in get
"GET", _make_path(index), params=params, headers=headers
File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 390, in perform_request
raise e
File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 365, in perform_request
timeout=timeout,
File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 258, in perform_request
raise ConnectionError("N/A", str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6255479110>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6255479110>: Failed to establish a new connection: [Errno 111] Connection refused)
This is my code example:
#!/usr/bin/env python
import sys
try:
    from elasticsearch import Elasticsearch
except Exception as e:
    print("No module 'elasticsearch' found.")
    sys.exit()
try:
    es = Elasticsearch("https://admin:admin@localhost:9200",
                       use_ssl=True,
                       ca_certs=True,
                       verify_certs=False)
    print("Connection to ES Server successful")
except:
    print("Unable to connect to server")
    exit(1)
# get all wazuh indexes
lsindex = []
for i in es.indices.get("*"):
    print(str(i))
And it produces the following output:
# python3 test.py
.opendistro_security
security-auditlog-2021.02.19
wazuh-alerts-4.x-2021.02.19
May I ask why you are fetching all Wazuh indices in your script? Understanding your use case could help me point you to a better solution. If it is related to reporting, Open Distro for Elasticsearch has a module for it.
Regards,
Alberto R

Trouble deploying django app to AWS lambda using zappa

I am trying to deploy my django app to AWS Lambda. I have viewed many tutorials on how to do so. However, whenever I run zappa deploy, I get the following error at the end:
Deploying API Gateway.. Error: Warning! Status check on the deployed lambda failed. A GET request to '/' yielded a 502 response code.
Running zappa tail gives me the following:
[1604786373569] [DEBUG] 2020-11-07T21:59:33.569Z 1a7e8cd4-1bf2-4af4-b11b-2baa6c6691dc Read environment variables from: /var/task/my_cool_project/.env
[1604786373570] [DEBUG] 2020-11-07T21:59:33.569Z 1a7e8cd4-1bf2-4af4-b11b-2baa6c6691dc get 'SECRET_KEY' casted as 'None' with default '<NoValue>'
[1604786373582] [DEBUG] 2020-11-07T21:59:33.582Z 1a7e8cd4-1bf2-4af4-b11b-2baa6c6691dc get 'DB_ENGINE' casted as 'None' with default '<NoValue>'
[1604786373582] [DEBUG] 2020-11-07T21:59:33.582Z 1a7e8cd4-1bf2-4af4-b11b-2baa6c6691dc get 'DB_NAME' casted as 'None' with default '<NoValue>'
[1604786373582] [DEBUG] 2020-11-07T21:59:33.582Z 1a7e8cd4-1bf2-4af4-b11b-2baa6c6691dc get 'DB_USER' casted as 'None' with default '<NoValue>'
[1604786373582] [DEBUG] 2020-11-07T21:59:33.582Z 1a7e8cd4-1bf2-4af4-b11b-2baa6c6691dc get 'DB_PASSWORD' casted as 'None' with default '<NoValue>'
[1604786373582] [DEBUG] 2020-11-07T21:59:33.582Z 1a7e8cd4-1bf2-4af4-b11b-2baa6c6691dc get 'DB_HOST' casted as 'None' with default '<NoValue>'
[1604786373582] [DEBUG] 2020-11-07T21:59:33.582Z 1a7e8cd4-1bf2-4af4-b11b-2baa6c6691dc get 'DB_PORT' casted as 'None' with default '<NoValue>'
[1604786374005] [ERROR] NameError: name '_mysql' is not defined
Traceback (most recent call last):
File "/var/task/handler.py", line 609, in lambda_handler
return LambdaHandler.lambda_handler(event, context)
File "/var/task/handler.py", line 240, in lambda_handler
handler = cls()
File "/var/task/handler.py", line 146, in __init__
wsgi_app_function = get_django_wsgi(self.settings.DJANGO_SETTINGS)
File "/var/task/zappa/ext/django_zappa.py", line 20, in get_django_wsgi
return get_wsgi_application()
File "/var/task/django/core/wsgi.py", line 12, in get_wsgi_application
django.setup(set_prefix=False)
File "/var/task/django/__init__.py", line 24, in setup
apps.populate(settings.INSTALLED_APPS)
File "/var/task/django/apps/registry.py", line 114, in populate
app_config.import_models()
File "/var/task/django/apps/config.py", line 211, in import_models
self.models_module = import_module(models_module_name)
File "/var/lang/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/var/task/django/contrib/auth/models.py", line 2, in <module>
from django.contrib.auth.base_user import AbstractBaseUser, BaseUserManager
File "/var/task/django/contrib/auth/base_user.py", line 48, in <module>
class AbstractBaseUser(models.Model):
File "/var/task/django/db/models/base.py", line 122, in __new__
new_class.add_to_class('_meta', Options(meta, app_label))
File "/var/task/django/db/models/base.py", line 326, in add_to_class
value.contribute_to_class(cls, name)
File "/var/task/django/db/models/options.py", line 206, in contribute_to_class
self.db_table = truncate_name(self.db_table, connection.ops.max_name_length())
File "/var/task/django/db/__init__.py", line 28, in __getattr__
return getattr(connections[DEFAULT_DB_ALIAS], item)
File "/var/task/django/db/utils.py", line 214, in __getitem__
backend = load_backend(db['ENGINE'])
File "/var/task/django/db/utils.py", line 111, in load_backend
return import_module('%s.base' % backend_name)
File "/var/lang/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/var/task/django/db/backends/mysql/base.py", line 15, in <module>
import MySQLdb as Database
File "/var/task/MySQLdb/__init__.py", line 24, in <module>
version_info, _mysql.version_info, _mysql.__file__
What am I doing wrong? I have a settings.py with all my environment variables defined. It seems that those variables aren't being read properly. Thanks to all in advance.
Not a true fix, but a workaround: I was using AWS RDS MySQL and switched everything over to AWS RDS PostgreSQL, and I have had no issues since. This leads me to believe that the error has something to do with connecting to MySQL.
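A possible explanation, stated as an assumption: the NameError is raised from MySQLdb/__init__.py while it is reporting a problem with its compiled _mysql extension, which suggests the extension itself failed to import in the Lambda environment and the original error was hidden. The DEBUG lines look like django-environ output, where "casted as 'None'" names the cast rather than the value, so the variables may in fact be read correctly. The sketch below is a generic standard-library probe; probe_import is a hypothetical helper name.

```python
import importlib
import traceback


def probe_import(module_name):
    """Try to import `module_name`; return None on success or the error text."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # Keep only the final "ErrorType: message" line(s) of the failure.
        return "".join(traceback.format_exception_only(type(exc), exc)).strip()
    return None
```

Calling probe_import("MySQLdb._mysql") inside the deployed environment would surface the underlying import error, if there is one, instead of the NameError.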

How to hit a Flask app running on an Azure Batch node using a task?

I was able to configure the setup, and I can add a task to a job that runs the testapi.py file, which takes a simple string as input and sends it to the Flask app running inside the Docker container.
However, the task execution throws an error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 603, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 355, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/usr/local/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 183, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 169, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fe6ff71e630>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 641, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=6789): Max retries exceeded with url: /upload (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe6ff71e630>: Failed to establish a new connection: [Errno 111] Connection refused',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "preload_testapi.py", line 37, in <module>
r = requests.post(url, json=got_json)
File "/usr/local/lib/python3.6/site-packages/requests/api.py", line 116, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6789): Max retries exceeded with url: /upload (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe6ff71e630>: Failed to establish a new connection: [Errno 111] Connection refused',))
Things I have done:
I made sure that the start task (which runs the Flask app) is executed by a sudo user that I created (not the _azbatch user); tasks inside the jobs are also executed by the same user.
I tested the same command that the task executes by SSHing into the same node as the user I created, and the command runs fine.
When I run the command as _azbatch, it throws the same error.
It is not a Flask port issue.
@fpark My start task executes properly. During node creation I download an image from Azure Container Registry and download two files from Azure Blob Storage as resource files. The start task runs one of these files (a shell script), which creates a container from that image and runs a Flask app inside it. The added tasks hit the running Flask app using the second file (testapi.py), with different parameters for each task.
You can hit the Flask API using a task, as follows:
Submit a job with two tasks to your compute node, where task 1 runs the Flask API and task 2 hits the API. Also enable parallel task execution on your nodes before trying this.
Hope this helps (~ ̄▽ ̄)~
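With two tasks running in parallel, the task that calls the API may start before the Flask app is listening. The sketch below shows one way the calling task could wait for the port first. It uses only the standard library; wait_for_port and its defaults are hypothetical. It also assumes localhost is the right host from where the task runs, which may not hold if the task and the Flask app run in different containers.

```python
import socket
import time


def wait_for_port(host, port, attempts=10, delay=1.0, timeout=2.0):
    """Return True once a TCP connection to (host, port) succeeds, else False."""
    for _ in range(attempts):
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            time.sleep(delay)
    return False
```

In testapi.py, a call such as wait_for_port("localhost", 6789) before requests.post would distinguish "the app is not up yet" from "the app is unreachable from this task".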

Add proxy credential to google-calendar connection

I'm trying to get events from Google Calendar with Python, using the Google quickstart example. The browser opens to confirm the credentials, but on my PC I get the following error:
Traceback (most recent call last):
File "C:\Programs\Python\Python35-32\lib\site-packages\httplib2\__init__.py", line 987, in _conn_request
conn.connect()
File "C:\Programs\Python\Python35-32\lib\http\client.py", line 1252, in connect
super().connect()
File "C:\Programs\Python\Python35-32\lib\http\client.py", line 849, in connect
(self.host,self.port), self.timeout, self.source_address)
File "C:\Programs\Python\Python35-32\lib\socket.py", line 693, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "C:\Programs\Python\Python35-32\lib\socket.py", line 732, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11002] getaddrinfo failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\myUser\Documents\Python_projects\quickstart.py", line 79, in <module>
main()
File "C:\Users\myUser\Documents\Python_projects\quickstart.py", line 60, in main
credentials = get_credentials()
File "C:\Users\myUser\Documents\Python_projects\quickstart.py", line 48, in get_credentials
credentials = tools.run_flow(flow, store, flags)
File "C:\Programs\Python\Python35-32\lib\site-packages\oauth2client\util.py", line 137, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Programs\Python\Python35-32\lib\site-packages\oauth2client\tools.py", line 243, in run_flow
credential = flow.step2_exchange(code, http=http)
File "C:\Programs\Python\Python35-32\lib\site-packages\oauth2client\util.py", line 137, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Programs\Python\Python35-32\lib\site-packages\oauth2client\client.py", line 2027, in step2_exchange
headers=headers)
File "C:\Programs\Python\Python35-32\lib\site-packages\httplib2\__init__.py", line 1314, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "C:\Programs\Python\Python35-32\lib\site-packages\httplib2\__init__.py", line 1064, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "C:\Programs\Python\Python35-32\lib\site-packages\httplib2\__init__.py", line 994, in _conn_request
raise ServerNotFoundError("Unable to find the server at %s" % conn.host)
httplib2.ServerNotFoundError: Unable to find the server at accounts.google.com
I suppose the problem is due to my proxy, but where can I set the proxy settings?
Thanks in advance
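One direction to explore, offered as a sketch and not a confirmed fix: httplib2 provides a ProxyInfo class that takes the proxy host, port, user and password, and tools.run_flow accepts an http argument, so an httplib2.Http object built with proxy_info could be passed in there. The example below covers only the first step, splitting a proxy URL into those parts with the standard library. parse_proxy_url and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def parse_proxy_url(proxy_url):
    """Split a proxy URL into the host, port, user and password it contains."""
    parts = urlsplit(proxy_url)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": parts.password,
    }


# Hypothetical proxy address, for illustration only.
settings = parse_proxy_url("http://user:secret@proxy.example.com:3128")
print(settings["host"], settings["port"])  # proxy.example.com 3128
```

The resulting values would map onto the proxy_host, proxy_port, proxy_user and proxy_pass arguments of httplib2.ProxyInfo.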

Resources