I've been playing with Docker for a while. Recently, I encountered a "bug" whose cause I cannot identify.
I'm currently on Windows 8.1 and have Docker Toolbox installed, which includes Docker 1.8.2, docker-machine 0.4.1, and VirtualBox 5.0.4 (presumably the important ones). I used to use plain boot2docker.
I'm not really sure what is going on, so the description may be vague and unhelpful; please ask me for clarification if you need any. Here we go:
When I write to files located in the shared folders, the VM only picks up the file length update, but not the new content.
Let's use my app.py as an example (I've been playing with Flask).
app.py:
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from werkzeug.contrib.fixers import LighttpdCGIRootFix
import os

app = Flask(__name__)
app.config.from_object(os.getenv('APP_SETTINGS'))
app.wsgi_app = LighttpdCGIRootFix(app.wsgi_app)
db = SQLAlchemy(app)

@app.route('/')
def hello():
    return "My bio!"

if __name__ == '__main__':
    app.run(host='0.0.0.0')
and when I cat it in the VM:
Now, let's update it to the following; notice the extra exclamation marks:
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from werkzeug.contrib.fixers import LighttpdCGIRootFix
import os

app = Flask(__name__)
app.config.from_object(os.getenv('APP_SETTINGS'))
app.wsgi_app = LighttpdCGIRootFix(app.wsgi_app)
db = SQLAlchemy(app)

@app.route('/')
def hello():
    return "My bio!!!!!!!"

if __name__ == '__main__':
    app.run(host='0.0.0.0')
And when I cat it again:
Notice 2 things:
the extra exclamation marks are not there
the EOF sign moved; the number of spaces that appear in front of it is exactly the number of exclamation marks I added.
I suspect that the OS somehow picked up the change in file size but failed to pick up the new content. When I delete characters from the file, the EOF sign also moves, and the cat output is truncated by exactly the number of characters I deleted.
It's not only cat that fails to pick up the change; every program in the VM does. As a result I cannot develop anything when this happens, because the changes I make simply don't affect anything. I have to kill the VM and spin it up again for any change to take effect, which is not very efficient.
Any help will be greatly appreciated! Thank you for reading the long question!
Looks like this is a known issue.
https://github.com/gliderlabs/pagebuilder/issues/2
which links to
https://forums.virtualbox.org/viewtopic.php?f=3&t=33201
Thanks to Matt Aitchison for replying to my GitHub issue at gliderlabs/docker-alpine.
sync; echo 3 > /proc/sys/vm/drop_caches is the temporary fix.
A permanent fix doesn't seem to be coming any time soon...
I assume that you mounted app.py as a file, using something like
-v /host/path/to/app.py:/container/path/to/app.py
Sadly, the container will not recognize changes to a file mounted that way.
Try putting the file in a folder and mounting the folder instead. Then changes to that file will be visible in the container.
Assuming app.py is located in $(pwd)/work, try running the container with
-v $(pwd)/work:/work
and adjust the command you run so that it refers to your code as /work/app.py.
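As an aside, if you ever start containers from Python, the same folder-style mount can be expressed with the Docker SDK for Python; this is only a sketch, and the image name and host path are assumptions:

import docker

# a minimal sketch using the Docker SDK for Python (the `docker` package);
# the image name and host path below are placeholders
client = docker.from_env()
client.containers.run(
    "python:3",
    "python /work/app.py",
    volumes={"/host/path/to/work": {"bind": "/work", "mode": "rw"}},  # mount the folder, not the single file
    detach=True,
)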
Related
The following code takes in the path to an image, removes the background, and saves the result to the desktop under a name the user chooses:
from rembg import remove
from PIL import Image
input_path = input("Please Drag & Drop an image:") # Takes image path
output_name = input("Give it a name:")
output_path = rf'C:\Users\user\Desktop\{output_name}.png'
input_image = Image.open(input_path)
output = remove(input_image)
output.save(output_path)
It works fine when I run it in PyCharm: I copy the image file path, the code executes flawlessly, and the output appears on the desktop. But when I run the code through the CMD terminal, it asks for the image, I drag the image into the terminal (which pastes its path), and everything seems fine, but when I press Enter to begin I get this error and I cannot figure out what the issue is:
Does anyone know what the issue is?
You may encounter "LoadLibrary failed with error 126" when the problematic application does not have the privileges to access a protected system resource. In this case, launching the application as an administrator may solve the problem. I would recommend running the CMD terminal as an administrator.
I am using a Windows machine and have created a container for Airflow.
I am able to read data from the local filesystem through a DAG, but I am unable to write data to a file. I have also tried giving the full path, and tried different operators (Python and Bash), but it still doesn't work.
The DAG succeeds; there aren't any failures to show.
Note: /opt/airflow is the $AIRFLOW_HOME path.
What may be the reason?
A snippet of code:
from airflow import DAG
from datetime import datetime
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator

def pre_process():
    f = open("/opt/airflow/write.txt", "w")
    f.write("world")
    f.close()

with DAG(dag_id="test_data", start_date=datetime(2021, 11, 24), schedule_interval='@daily') as dag:
    check_file = BashOperator(
        task_id="check_file",
        bash_command="echo Hi > /opt/airflow/hi.txt "
    )
    pre_processing = PythonOperator(
        task_id="pre_process",
        python_callable=pre_process
    )
    check_file >> pre_processing
It likely is written, but inside the container that is running Airflow.
You need to understand how containers work. They provide isolation, but this also means that unless you set up some data sharing, whatever you create in the container stays in the container and you do not see it outside of it (that's what container isolation is all about).
You can usually enter the container via the docker exec command (https://docs.docker.com/engine/reference/commandline/exec/), or you can, for example, mount a folder from your host into your container and write your files there (as far as I know, some folders are mounted for you by default on Windows, but you need to check the Docker documentation for that).
In your pre_process code, add os.chdir('your/path') before writing your data to a file.
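For illustration, here is a minimal sketch of pre_process that combines both suggestions; the directory /opt/airflow/data and the assumption that it is mounted from the host are placeholders you would need to adapt:

import os

def pre_process():
    # assumption: /opt/airflow/data is a folder mounted from the host,
    # so files written here are also visible outside the container
    os.chdir("/opt/airflow/data")
    with open("write.txt", "w") as f:
        f.write("world")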
I have written a service using FastAPI and uvicorn. I have a main in my service that starts uvicorn (see below). In that main, the first thing I do is load the configuration settings. I have some INFO outputs that print the settings when I load the configuration. I notice that when I start my service, the configuration-loading method seems to run twice.
# INITIALIZE
if __name__ == "__main__":
    # Load the config once at bootstrap time. This outputs the string "Loading configuration settings..."
    config = CdfAuthConfig()
    print("Loaded Configuration")

    # Create FastAPI object
    app = FastAPI()

    # Start uvicorn
    uvicorn.run(app, host="127.0.0.1", port=5050)
The output when I run the service looks like:
Loading configuration settings...
Loading configuration settings...
Loaded Configuration
Why is the "CdfAuthConfig()" class being instantiated twice? It obviously has something to do with the "uvicorn.run" command.
I had a similar setup, and this behavior made me curious, so I did some tests and now I probably see why.
Your if __name__ == "__main__": is being reached only once; this is a fact.
How can you test this?
Add the following line before your if:
print(__name__)
If you run your code as is, with the line I mentioned added, it will print:
__main__ # in the first run
Then uvicorn will call your program again and will print something like:
__mp_main__ # after uvicorn starts your code again
And right after it will also print:
app # since this is the argument you gave to uvicorn
If you want to avoid that, you should call uvicorn from the command line, like:
uvicorn main:app --reload --host 0.0.0.0 --port 5000 # assuming main.py is your file name
uvicorn will reload your code since you are calling it from inside the code. A workaround might be to have the uvicorn call in a separate file, or, as I said, just use the command line.
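For example, here is a minimal sketch of a main.py meant to be started from the command line; CdfAuthConfig is stubbed in as a stand-in for the real class from the question:

from fastapi import FastAPI

class CdfAuthConfig:  # stand-in for the real configuration class
    def __init__(self):
        print("Loading configuration settings...")

config = CdfAuthConfig()  # runs once, when uvicorn imports this module
app = FastAPI()

# start it from the command line instead of calling uvicorn.run(app, ...):
#   uvicorn main:app --host 127.0.0.1 --port 5050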
If you don't want to write the command with all the arguments every time, you can write a small script (app_start.sh).
I hope this helps you understand a little bit better.
I needed to move files from an SFTP server to my AWS account using an AWS Lambda function,
and then I found this article:
https://aws.amazon.com/blogs/compute/scheduling-ssh-jobs-using-aws-lambda/
which talks about paramiko as an SSH client candidate for moving files over SSH.
I then wrote this class wrapper in Python to be used from my serverless handler file:
import paramiko
import sys

class FTPClient(object):
    def __init__(self, hostname, username, password):
        """
        creates ftp connection
        Args:
            hostname (string): endpoint of the ftp server
            username (string): username for logging in on the ftp server
            password (string): password for logging in on the ftp server
        """
        try:
            self._host = hostname
            self._port = 22
            # lets you save results of the download into a log file.
            # paramiko.util.log_to_file("path/to/log/file.txt")
            self._sftpTransport = paramiko.Transport((self._host, self._port))
            self._sftpTransport.connect(username=username, password=password)
            self._sftp = paramiko.SFTPClient.from_transport(self._sftpTransport)
        except:
            print("Unexpected error", sys.exc_info())
            raise

    def get(self, sftpPath):
        """
        downloads a file from the sftp server
        Args:
            sftpPath = "path/to/file/on/sftp/to/be/downloaded"
        """
        localPath = "/tmp/temp-download.txt"
        self._sftp.get(sftpPath, localPath)
        self._sftp.close()
        tmpfile = open(localPath, 'r')
        return tmpfile.read()

    def close(self):
        self._sftpTransport.close()
On my local machine it works as expected (test.py):
import ftp_client

sftp = ftp_client.FTPClient(
    "host",
    "myuser",
    "password")
file = sftp.get('/testFile.txt')
print(file)
But when I deploy it with Serverless and run the handler.py function (same as test.py above), I get back the error:
Unable to import module 'handler': No module named 'paramiko'
It looks like the deployment is unable to import paramiko (from the article above it seems like it should be available for Lambda Python 3 on AWS), shouldn't it be?
If not, what's the best practice for this case? Should I include the library in my local project and package/deploy it to AWS?
A comprehensive guide/tutorial exists at:
https://serverless.com/blog/serverless-python-packaging/
It uses the serverless-python-requirements package as a Serverless (Node) plugin.
Creating a virtualenv and having the Docker daemon running will be required to package up your serverless project before deploying it to AWS Lambda.
If you use
custom:
  pythonRequirements:
    zip: true
in your serverless.yml, you have to add this code snippet at the start of your handler:
try:
    import unzip_requirements
except ImportError:
    pass
All the details can be found in the Serverless Python Requirements documentation.
You have to create a virtualenv, install your dependencies, and then zip all the files under site-packages/:
sudo pip install virtualenv
virtualenv -p python3 myvirtualenv
source myvirtualenv/bin/activate
pip install paramiko
cp handler.py myvirtualenv/lib/python3.6/site-packages/
cd myvirtualenv/lib/python3.6/site-packages
zip -r9 package.zip .
then upload package.zip to lambda
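To sanity-check the package, you could use a minimal handler sketch like the following; the function name and the returned payload are just assumptions for illustration:

import paramiko

def handler(event, context):
    # if package.zip was built correctly, this import succeeds inside Lambda
    return {"paramiko_version": paramiko.__version__}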
You have to provide all dependencies that are not installed in AWS' Python runtime.
Take a look at Step 7 in the tutorial. It looks like he is adding the dependencies from the virtual environment to the zip file. So I'd expect your ZIP file to contain the following:
your worker_function.py on the top level
a folder paramiko with the files installed in the virtual env
Please let me know if this helps.
I tried various blogs and guides like:
web scraping with lambda
AWS Layers for Pandas
I spent hours trying things out, facing size issues like that one, being unable to import modules, etc.
... and I nearly reached the end (that is, invoking my handler function LOCALLY), but then, even though my function was deployed correctly and could even be invoked LOCALLY with no problems, it was impossible to invoke it on AWS.
The most comprehensive guide or example that is ACTUALLY working is, by far, the one mentioned above by @koalaok! Thanks buddy!
actual link
I have a problem pretty much exactly like this:
How to preserve a SQLite database from being reverted after deploying to OpenShift?
I don't fully understand his answer, and clearly not well enough to apply it to my own app, and since I can't comment on his answer (not enough rep) I figured I had to ask my own question.
The problem is that when I push my local files (not including the database file), my database on OpenShift becomes the one I have locally (all changes made through the server are reverted).
I've googled a lot and pretty much understand that the problem is that the database should be located somewhere else, but I can't fully grasp where to place it and how to deploy it if it's outside the repo.
EDIT: Quick solution: if you have this problem, try connecting to your OpenShift app with rhc ssh appname
and then cp app-root/repo/database.db app-root/data/database.db
if you have the OpenShift data dir referenced in SQLALCHEMY_DATABASE_URI. I recommend the accepted answer below, though!
I've attached my filestructure and here's some related code:
config.py
import os
basedir = os.path.abspath(os.path.dirname(__file__))
SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(basedir, 'database.db')
SQLALCHEMY_MIGRATE_REPO = os.path.join(basedir, 'db_repository')
app/__init__.py:
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
app = Flask(__name__)
#so that flask doesn't swallow error messages
app.config['PROPAGATE_EXCEPTIONS'] = True
app.config.from_object('config')
db = SQLAlchemy(app)
from app import rest_api, models
wsgi.py:
#!/usr/bin/env python
import os
virtenv = os.path.join(os.environ.get('OPENSHIFT_PYTHON_DIR', '.'), 'virtenv')
#
# IMPORTANT: Put any additional includes below this line. If placed above this
# line, it's possible required libraries won't be in your searchable path
#
from app import app as application
## runs server locally
if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    httpd = make_server('localhost', 4599, application)
    httpd.serve_forever()
filestructure: http://sv.tinypic.com/r/121xseh/8 (can't attach image..)
Via the note at the top of the OpenShift Cartridge Guide:
"Cartridges and Persistent Storage: Every time you push, everything in your remote repo directory is recreated. Store long term items (like an sqlite database) in the OpenShift data directory, which will persist between pushes of your repo. The OpenShift data directory can be found via the environment variable $OPENSHIFT_DATA_DIR."
You can keep your existing project structure as-is and just use a deploy hook to move your database to persistent storage.
Create a deploy action hook (an executable file) at .openshift/action_hooks/deploy:
#!/bin/bash
# This deploy hook gets executed after dependencies are resolved and the
# build hook has been run but before the application has been started back
# up again.
# if this is the initial install, copy the DB from the repo to the persistent storage directory
if [ ! -f ${OPENSHIFT_DATA_DIR}database.db ]; then
    cp -rf ${OPENSHIFT_REPO_DIR}database.db ${OPENSHIFT_DATA_DIR}database.db 2>/dev/null
fi

# remove the database from the repo during all deploys
if [ -f ${OPENSHIFT_REPO_DIR}database.db ]; then
    rm -rf ${OPENSHIFT_REPO_DIR}database.db
fi

# create a symlink from the repo directory to the new database location in persistent storage
ln -sf ${OPENSHIFT_DATA_DIR}database.db ${OPENSHIFT_REPO_DIR}database.db
As another person pointed out, also make sure you are actually committing/pushing your database (make sure your database isn't included in your .gitignore).
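If you prefer to avoid the symlink, a variant of the asker's quick fix is to point config.py at the persistent data directory directly; this is only a sketch, and the fallback to the repo directory for local runs is an assumption:

import os

basedir = os.path.abspath(os.path.dirname(__file__))
# use the persistent OpenShift data directory when available, otherwise the local project directory
data_dir = os.environ.get('OPENSHIFT_DATA_DIR', basedir)
SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(data_dir, 'database.db')
SQLALCHEMY_MIGRATE_REPO = os.path.join(basedir, 'db_repository')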