I am very new to external scripts and Python, and was trying a very simple example: printing the data from a CSV file.
execute sp_execute_external_script
    @language = N'Python',
    @script = N'
import pandas as pd
import csv
data = open("C:/Users/xxxxxx/Desktop/xxxxxx/Python/Pandas/olympics - Copy.csv")
data = csv.reader(data)
print(data)'
But I get the error below:
"FileNotFoundError: [Errno 2] No such file or directory: "
When I run the same code in a Jupyter notebook, it runs fine:
import pandas as pd
oo=pd.read_csv('C:/Users/xxxxxx/Desktop/xxxxxx/Python/Pandas/olympics - Copy.csv')
oo.head()
What am I missing in SQL? Can anyone please help me with the syntax?
Also, are there any good resources where I can learn more about using Python in SQL Server 2017?
The SQL Server you are calling when executing sp_execute_external_script (SPEES): where is that installed? On your machine, or somewhere else?
Don't forget that when you execute SPEES, it runs on the SQL Server box, so unless that box is your machine, it won't work. Even if it is on your machine, it may not have permission to read the directory your file is in.
If SQL Server is installed on your box, I suggest you create a new directory to which you give EVERYONE access, and try with that directory.
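To tell those two failure modes apart (wrong machine vs. missing permissions), here is a small diagnostic sketch you could paste into the `@script` body; `check_readable` is a hypothetical helper written for illustration, not part of any library:

```python
import os

def check_readable(path):
    """Classify why a file read might fail: the path may not exist on the
    machine evaluating it, or the current account may lack read access."""
    if not os.path.exists(path):
        return "missing"             # wrong box, or the directory is not visible
    if not os.access(path, os.R_OK):
        return "no-read-permission"  # the service account cannot read the file
    return "ok"

# A path that only exists on your workstation will come back "missing"
# when this check runs on the SQL Server box.
print(check_readable("C:/Users/xxxxxx/Desktop/nonexistent.csv"))
```

Printing the result from inside the external script tells you what the SQL Server service account actually sees, rather than what your own login sees.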
I am trying to fetch a file over FTP (hosted on Hostinger) using PySpark in Databricks Community Edition.
Everything works fine until I try to read that file using spark.read.csv('MyFile.csv').
Following are the code and the error.
PySpark Code:
res=spark.sparkContext.addFile('ftp://USERNAME:PASSWORD@URL/<folder_name>/MyFile.csv')
myfile=SparkFiles.get("MyFile.csv")
spark.read.csv(path=myfile) # Errors out here
print(myfile, sc.getRootDirectory()) # Outputs almost same path (except dbfs://)
Error:
AnalysisException: Path does not exist: dbfs:/local_disk0/spark-ce4b313e-00cf-4f52-80d8-dff98fc3eea5/userFiles-90cd99b4-00df-4d59-8cc2-2e18050d395/MyFile.csv
Because spark.sparkContext.addFile downloads the file to the driver's local disk, while Databricks uses DBFS as the default filesystem, the bare path is resolved against dbfs: and not found. Please try the code below to see if it fixes your issue:
res=spark.sparkContext.addFile('ftp://USERNAME:PASSWORD@URL/<folder_name>/MyFile.csv')
myfile=SparkFiles.get("MyFile.csv")
spark.read.csv(path='file:'+myfile)
print(myfile, sc.getRootDirectory())
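The `'file:' + myfile` trick above works because an explicit `file:` scheme forces Spark to read from the driver's local disk instead of the default filesystem. A minimal sketch of that prefixing logic in plain Python (`to_local_uri` is a hypothetical helper name, and the path below is illustrative):

```python
import os

def to_local_uri(path):
    """Prefix a driver-local path with the file: scheme so Spark resolves it
    on local disk rather than against the default filesystem (dbfs: on
    Databricks)."""
    if ":" in path.split("/")[0]:  # already has a scheme, e.g. dbfs: or file:
        return path
    return "file:" + os.path.abspath(path)

print(to_local_uri("/local_disk0/spark-xyz/userFiles-abc/MyFile.csv"))
```

Paths that already carry a scheme are passed through unchanged, so the helper is safe to apply to any path you hand to `spark.read.csv`.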
I am trying to import Airflow variables from a JSON file locally, i.e. from the command line, but suddenly I see a "missing variable file" error.
The same command has always worked for importing variables; I don't have a clue why it is not able to import now.
FYI, I am using Apache Airflow 1.10.15.
I am using a Windows machine and have created a container for Airflow.
I am able to read data on the local filesystem through a DAG, but I am unable to write data to a file. I have also tried giving the full path, and tried different operators (Python and Bash), but it still doesn't work.
The DAG succeeds; there aren't any failures to show.
Note: /opt/airflow is the $AIRFLOW_HOME path.
What may be the reason?
A snippet of code:
from airflow import DAG
from datetime import datetime
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator

def pre_process():
    f = open("/opt/airflow/write.txt", "w")
    f.write("world")
    f.close()

with DAG(dag_id="test_data", start_date=datetime(2021, 11, 24), schedule_interval='@daily') as dag:
    check_file = BashOperator(
        task_id="check_file",
        bash_command="echo Hi > /opt/airflow/hi.txt"
    )
    pre_processing = PythonOperator(
        task_id="pre_process",
        python_callable=pre_process
    )
    check_file >> pre_processing
It likely is written, but inside the container that is running Airflow.
You need to understand how containers work. They provide isolation, which means that unless you set up some data sharing, whatever you create in the container stays in the container, and you do not see it outside of it (that's what container isolation is all about).
You can usually enter the container via the docker exec command (https://docs.docker.com/engine/reference/commandline/exec/), or you can, for example, mount a folder from your host into your container and write your files there (as far as I know, on Windows some folders are mounted for you by default, but you need to check the Docker documentation for that).
In your pre_process code, add os.chdir('your/path') before writing your data to a file.
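The reason os.chdir can matter here is that a bare filename is resolved against the process's current working directory, which for an Airflow task is not necessarily $AIRFLOW_HOME. A minimal sketch of the effect, with a temp directory standing in for the folder you actually want (the paths are illustrative):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()       # stand-in for your target folder
os.chdir(workdir)                  # relative paths now resolve here

with open("write.txt", "w") as f:  # no directory given: lands in workdir
    f.write("world")

print(os.path.exists(os.path.join(workdir, "write.txt")))
```

Giving an absolute path, as the DAG above already does, sidesteps the working-directory question entirely; the container-isolation answer is the more likely explanation for a "missing" file.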
So, I have a Python script that I can run in cmd using python [path to script].
I have it set up in Task Scheduler, but it doesn't run through the scheduler: it hits an error, and cmd closes before I can read it. I created a batch file to launch the script, and it shows an error that a package doesn't exist [lxml]. But the package does exist, since the script runs when executed manually.
Any thoughts?
The script scrapes data from a website, creates a dataframe, posts the dataframe to a Google Sheet, then pulls the full Google Sheet it posts to, turns that into a dataframe with all of the data, creates a Plotly graph, turns that into an HTML file, and sends the HTML file to an SFTP server.
Figured it out...
import sys
import platform
import imp
print("Python EXE : " + sys.executable)
print("Architecture : " + platform.architecture()[0])
#print("Path to arcpy : " + imp.find_module("arcpy")[1])
#raw_input("\n\nPress ENTER to quit")
Run this to get the proper path to your python.exe, and place that path in Program/Script in Task Scheduler.
Then verify that EVERY path in your script is a full path, starting at C:/ and working through the complete directory tree.
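The underlying issue is that Task Scheduler may launch a different interpreter than your shell does, and each interpreter has its own site-packages. One way to check which interpreter resolves a module, using only the standard library (`where_is` is a name I made up for this sketch):

```python
import importlib.util
import sys

def where_is(module_name):
    """Return the file a module would load from under the current
    interpreter (sys.executable), or None if it is not importable."""
    spec = importlib.util.find_spec(module_name)
    return getattr(spec, "origin", None)

print(sys.executable)
print(where_is("json"))  # stdlib module: resolves for every interpreter
print(where_is("lxml"))  # None here means THIS interpreter lacks lxml
```

Running this once from cmd and once from the scheduled task makes the mismatch visible immediately: if sys.executable differs between the two runs, point Task Scheduler at the interpreter that has lxml installed.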
I needed to move files from an SFTP server to my AWS account with an AWS Lambda, and I found this article:
https://aws.amazon.com/blogs/compute/scheduling-ssh-jobs-using-aws-lambda/
It mentions paramiko as an SSH client candidate for moving files over SSH.
So I wrote this class wrapper in Python, to be used from my Serverless handler file:
import paramiko
import sys

class FTPClient(object):
    def __init__(self, hostname, username, password):
        """
        Creates an SFTP connection.
        Args:
            hostname (string): endpoint of the SFTP server
            username (string): username for logging in on the SFTP server
            password (string): password for logging in on the SFTP server
        """
        try:
            self._host = hostname
            self._port = 22
            # Lets you save results of the download into a log file:
            # paramiko.util.log_to_file("path/to/log/file.txt")
            self._sftpTransport = paramiko.Transport((self._host, self._port))
            self._sftpTransport.connect(username=username, password=password)
            self._sftp = paramiko.SFTPClient.from_transport(self._sftpTransport)
        except:
            print("Unexpected error", sys.exc_info())
            raise

    def get(self, sftpPath):
        """
        Downloads a file from the SFTP server and returns its contents.
        Args:
            sftpPath = "path/to/file/on/sftp/to/be/downloaded"
        """
        localPath = "/tmp/temp-download.txt"
        self._sftp.get(sftpPath, localPath)
        self._sftp.close()
        tmpfile = open(localPath, 'r')
        return tmpfile.read()

    def close(self):
        self._sftpTransport.close()
On my local machine it works as expected (test.py):
import ftp_client

sftp = ftp_client.FTPClient(
    "host",
    "myuser",
    "password")
file = sftp.get('/testFile.txt')
print(file)
But when I deploy it with Serverless and run the handler.py function (same as the test.py above), I get back the error:
Unable to import module 'handler': No module named 'paramiko'
It looks like the deployment is unable to import paramiko (from the article above, it seems like it should be available in the Python 3 Lambda runtime on AWS), shouldn't it?
If not, what's the best practice for this case? Should I include the library in my local project and package/deploy it to AWS?
A comprehensive guide/tutorial exists at:
https://serverless.com/blog/serverless-python-packaging/
It uses the serverless-python-requirements package as a Serverless (Node) plugin.
A virtualenv and a running Docker daemon are required to package up your Serverless project before deploying it to AWS Lambda.
In case you use

custom:
  pythonRequirements:
    zip: true

in your serverless.yml, you have to use this code snippet at the start of your handler:

try:
    import unzip_requirements
except ImportError:
    pass
All details can be found in the Serverless Python Requirements documentation.
You have to create a virtualenv, install your dependencies, and then zip all files under site-packages/:

sudo pip install virtualenv
virtualenv -p python3 myvirtualenv
source myvirtualenv/bin/activate
pip install paramiko
cp handler.py myvirtualenv/lib/python3.6/site-packages/
cd myvirtualenv/lib/python3.6/site-packages/
zip -r package.zip .

Then upload package.zip to Lambda.
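Before uploading, you can sanity-check that the handler ended up at the root of the archive, which is where the Lambda runtime looks for it. A small sketch using the standard-library zipfile module (`lambda_zip_ok` is a hypothetical helper name):

```python
import zipfile

def lambda_zip_ok(zip_path, handler_file="handler.py"):
    """Return True if the deployment package has the handler at the archive
    root; Lambda imports the 'handler' module from the top level of the zip."""
    with zipfile.ZipFile(zip_path) as z:
        return handler_file in z.namelist()
```

If this returns False, the most common cause is zipping the parent directory instead of its contents, which nests everything one folder too deep and produces exactly the "Unable to import module 'handler'" error above.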
You have to provide all dependencies that are not installed in AWS's Python runtime.
Take a look at Step 7 in the tutorial. It looks like he is adding the dependencies from the virtual environment to the zip file. So I'd expect your ZIP file to contain the following:
your worker_function.py at top level
a folder paramiko with the files installed in the virtual env
Please let me know if this helps.
I tried various blogs and guides like:
web scraping with lambda
AWS Layers for Pandas
I spent hours trying things out, facing SIZE issues like that, being unable to import modules, etc.
And I nearly reached the end (that is, invoking my handler function LOCALLY), but then, even though my function was fully deployed correctly and even invoked LOCALLY with no problems, it was impossible to invoke it on AWS.
The most comprehensive and by far the best guide, and the one that is ACTUALLY working, is the one mentioned above by @koalaok! Thanks buddy!
actual link