script runs fine, but can't get pytest to work - python-3.x

Trying to learn pytest, the following script runs fine, but with pytest it fails as it can't find the csv file.
import csv

def load_data(file):
    mast_list = []
    with open(file) as csvfile:
        data = csvfile.read()
        phone_masts = data.split('\n')
        columns = csv.reader(phone_masts, delimiter=',')
        for row in columns:
            if len(row) > 0:
                mast_list.append(row)
    return mast_list
I am just trying to get something working, so for now I'm testing that the function returns a list, but it says no csv file was found. I'm sure there are plenty of other issues, but I'm trying to do one bit at a time. Here is the test:
import pytest
import mobile_phone_data

def test_column_count():
    file = 'Mobile Phone Masts.csv'
    assert load_list() == type(list)
Why does the script work on its own but the test fail because it can't find the csv file?

It is actually a bit surprising that you get a file-not-found error: you define a function under one name and test a different function, which is an error to start with.
I called your first listing foo.py and modified your test script as follows:
test_foo.py
from foo import load_data

def test_column_count():
    file = 'spam.csv'
    assert isinstance(load_data(file), list)
There is also a file called spam.csv; all three files are in the same folder. pytest runs this test and it passes.
Other issues in your code:
your csv handling does unnecessary work: you do not have to split on newlines by hand, the reader can consume the lines of the file directly (see the sketch below)
isinstance should be used for type checking
you might consider creating a temporary file for the unit test and destroying it afterwards
the initial function load_data() can be split in two, one part that reads the file and one that parses its content, which should eventually make it easier to test.
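Putting those last points together, here is a rough sketch (my own illustration, not from the original post; the helper names read_lines/parse_rows and the use of pytest's tmp_path fixture are choices made for the example):

import csv

def read_lines(file):
    # reading is isolated here so the parsing can be tested separately
    with open(file, newline='') as csvfile:
        return list(csvfile)

def parse_rows(lines):
    # csv.reader accepts any iterable of lines, so no manual '\n' splitting is needed
    return [row for row in csv.reader(lines, delimiter=',') if len(row) > 0]

def load_data(file):
    return parse_rows(read_lines(file))

def test_load_data_returns_list(tmp_path):
    # tmp_path is pytest's built-in temporary-directory fixture,
    # so the test no longer depends on a real csv file being present
    csv_file = tmp_path / "masts.csv"
    csv_file.write_text("name,height\nmast1,15\n")
    assert isinstance(load_data(csv_file), list)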

Related

How to write batch of data to Django's sqlite db from a custom written file?

For a pet project I am working on I need to import a list of people into a sqlite db. I have a 'Staff' model, as well as a users.csv file with a list of users. Here is how I am doing it:
import csv
from staff.models import Staff

with open('users.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        firstname = row['firstname']
        lastname = row['lastname']
        email = row['email']
        staff = Staff(firstname=firstname, lastname=lastname, email=email)
        staff.save()
csv_file.close()
However, I am getting the below error message:
raise ImproperlyConfigured(
django.core.exceptions.ImproperlyConfigured: Requested setting INSTALLED_APPS, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
Is what I am doing correct? If yes, what am I missing here?
Django needs some environment variables when it is being bootstrapped to run. DJANGO_SETTINGS_MODULE is one of these; it tells Django which settings module to configure itself from. Typically developers don't even notice, because if you stay in Django-land it isn't a big deal. Take a look at manage.py and you'll notice it sets DJANGO_SETTINGS_MODULE in that file.
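If you really do want a standalone script, you can replicate what manage.py does by hand before touching any models; a minimal sketch (the settings path myproject.settings is a placeholder for your project's own):

import os
import django

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')  # placeholder path
django.setup()

# model imports must come after setup(), otherwise you hit the same ImproperlyConfigured error
from staff.models import Staff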
That said, the simplest thing is to stay in Django-land and run your script within its framework. I recommend creating a management command. Perhaps a more proper way is to create a data migration and put the data in a storage place like S3, if this is something many people need to do for local databases... but it seems like a management command is the way to go for you. Another option (and the simplest, if this is really just a one-time thing) is to run this from the Django shell. I'll put that at the bottom.
It's very simple and you can drop in your code almost as you have it. Here are the docs :) https://docs.djangoproject.com/en/3.2/howto/custom-management-commands/
For you it might look something like this:
/app/management/commands/load_people.py <-- the file name here is what manage.py will use to run the command later.
from django.core.management.base import BaseCommand, CommandError
import csv
from staff.models import Staff

class Command(BaseCommand):
    help = 'load people from csv'

    def handle(self, *args, **options):
        with open('users.csv') as csv_file:
            csv_reader = csv.DictReader(csv_file, delimiter=',')
            line_count = 0
            for row in csv_reader:
                firstname = row['firstname']
                lastname = row['lastname']
                email = row['email']
                staff = Staff(firstname=firstname, lastname=lastname, email=email)
                staff.save()
            # csv_file.close() # you don't need this since you used `with`
which you would call like this:
python manage.py load_people
Finally, the simplest solution is to just run the code in the Django shell.
python manage.py shell
will open up an interactive shell with everything loaded properly. You can execute your code there and it should work.
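For example, if you save your snippet as load_users.py (a placeholder name), on a Unix-like shell you can pipe it straight in:

python manage.py shell < load_users.py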

requests.get(url).headers.get('content-disposition') returning None in Python

Well, I need to automate a process at my job (actually I'm an intern), and I wondered if I could use Python for it. I'm still working out my ideas of how to do this, and currently I'm trying to understand how to download a file from a web URL using Python 3. I found a guide on another website, but there's no active help there. I was told to use the requests module to download the actual file, and the re module to get the real file name.
The code was working fine, but then I tried to add some features like a GUI, and it just stopped working. I took the GUI code out, and it still didn't work. Now I have no idea what to do to get the code working. Please, someone help me, thanks :)
code:
import os
import re
import requests

# i have no idea how this function works, but it gets the real file name
def getFilename(cd):
    if not cd:
        print("check 1")
        return None
    fname = re.findall('filename=(.+)', cd)
    if len(fname) == 0:
        print("check 2")
        return None
    return fname[0]

def download(url):
    # get request
    response = requests.get(url)
    # get the real file name from the content-disposition response header
    filename = getFilename(response.headers.get('content-disposition'))
    print(filename)
    # open in binary mode and write to file
    #open(filename, "wb").write(response.content)

download("https://pixabay.com/get/57e9d14b4957a414f6da8c7dda353678153fd9e75b50704b_1280.png?attachment=")
os.system("pause")

how to create a dynamic list to read and update for each script run

I have a script which runs some checks for all DBs. Now I want a list of all the DBs already checked, so that the next time the script runs it reads this list and runs the checks only for DBs not in it.
What is the best way to implement this? If I initialize an empty list (db_checked) and append each DB name while the checks run, the issue is that each time the script starts the list would again be empty.
Please suggest. Thanks.
At the end of the script I will call the below function to write to disk:
from pathlib import Path

def writeDBList(db_checked):
    with open(Path(__file__).parent / "db_names.txt", "w") as fp:
        for s in db_checked:
            fp.write(str(s) + "\n")
    return
When the script starts I will call the below to read the file from disk:
def readDBList():
    db_list = []
    with open(Path(__file__).parent / "db_names.txt", "r") as fp:
        for line in fp:
            db_list.append(line.strip())
    return db_list
But how do I convert the file contents to a list so that I can easily check with the below:
checked_list = readDBList()
if db not in checked_list:
    ....
    ....
You need to write this list to disk after the script finishes the checks, and read it again at the beginning of the next script run.
# Read DB CheckList
DB_List = readDBList()
# Your normal script functionality for only DBs not in the list
# Store DB CheckList
writeDBList(DB_List)
Check this in case you are not familiar with I/O file handling in Python.
Now, regarding your second question about how to read the list: I would suggest using pickle, which allows you to read/write Python structures without worrying about stringifying or parsing.
import pickle

def writeDBList(db_list):
    with open('DBListFile', 'wb') as fp:
        pickle.dump(db_list, fp)

def readDBList():
    with open('DBListFile', 'rb') as fp:
        return pickle.load(fp)
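A possible end-to-end use of those two helpers, sketched with placeholder names (all_dbs and run_checks stand in for your script's own objects):

import os

checked = readDBList() if os.path.exists('DBListFile') else []
for db in all_dbs:            # your full list of DBs
    if db not in checked:
        run_checks(db)        # your existing per-DB checks
        checked.append(db)
writeDBList(checked)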

Output data from subprocess command line by line

I am trying to read a large data file (millions of rows, in a very specific format) using a pre-built (in C) routine. I then want to yield the results of this, line by line, via a generator function.
I can read the file OK, but whereas just running:
<command> <filename>
directly in Linux will print the results line by line as it finds them, I've had no luck trying to replicate this within my generator function. It seems to output the entire lot as a single string that I need to split on newlines, and of course then everything has to be read before I can yield line 1.
This code will read the file, no problem:
import subprocess
import config

file_cmd = '<command> <filename>'

def records():  # the enclosing generator function (name illustrative)
    for rec in subprocess.check_output([file_cmd], shell=True).decode(config.ENCODING).split('\n'):
        yield rec
(ENCODING is set in config.py to iso-8859-1 - it's a Swedish site)
The code I have works, in that it gives me the data, but in doing so, it tries to hold the whole lot in memory. I have larger files than this to process which are likely to blow the available memory, so this isn't an option.
I've played around with bufsize on Popen, but not had any success (and also, I can't decode or split after the Popen, though I guess the fact I need to split right now is actually my problem!).
I think I have this working now, so I will answer my own question in case somebody else is looking for this later ...
import shlex

def records():  # again, inside the generator function
    proc = subprocess.Popen(shlex.split(file_cmd), stdout=subprocess.PIPE)
    while True:
        output = proc.stdout.readline()
        if output == b'' and proc.poll() is not None:
            break
        if output:
            yield output.decode(config.ENCODING).strip()
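For what it's worth, the same thing can be written a little more compactly by iterating over the pipe directly; a sketch, assuming Python 3.6+ so that Popen accepts an encoding argument:

import shlex
import subprocess

def records(file_cmd, encoding):
    proc = subprocess.Popen(shlex.split(file_cmd), stdout=subprocess.PIPE,
                            encoding=encoding)
    with proc.stdout:
        # iterating a pipe yields lines as they arrive, without buffering the whole output
        for line in proc.stdout:
            yield line.strip()
    proc.wait()  # reap the child process once the stream is exhausted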

Python: Dynamically imported module fails when first created, then succeeds

I have a template engine named Contemplate which has implementations for php, node/js and python.
All work fine, except lately the python implementation gives me some issues. Specifically, the problem appears when first parsing a template and creating the template python code, which is then dynamically imported as a module. When the template has already been created everything works fine, but when the template needs to be parsed, saved to disk and THEN imported, it raises an error, e.g.
ModuleNotFoundError: No module named 'blah blah'
(note this error appears to be random; it is not always raised. Many times it works even if the template was created just before importing, other times it fails, and then if run again with the template already created it succeeds)
Is there any way I can bypass this issue, maybe add a delay between saving a parsed template and then importing it as a module, or something else?
The code to import the module (the parsed template which is now a python class) is below:
def import_tpl( filename, classname, cacheDir, doReload=False ):
    # http://www.php2python.com/wiki/function.import_tpl/
    # http://docs.python.org/dev/3.0/whatsnew/3.0.html
    # http://stackoverflow.com/questions/4821104/python-dynamic-instantiation-from-string-name-of-a-class-in-dynamically-imported
    #_locals_ = {'Contemplate': Contemplate}
    #_globals_ = {'Contemplate': Contemplate}
    #if 'execfile' in globals():
    #    # Python 2.x
    #    execfile(filename, _globals_, _locals_)
    #    return _locals_[classname]
    #else:
    #    # Python 3.x
    #    exec(read_file(filename), _globals_, _locals_)
    #    return _locals_[classname]
    # http://docs.python.org/2/library/imp.html
    # http://docs.python.org/2/library/functions.html#__import__
    # http://docs.python.org/3/library/functions.html#__import__
    # http://stackoverflow.com/questions/301134/dynamic-module-import-in-python
    # http://stackoverflow.com/questions/11108628/python-dynamic-from-import
    # also: http://code.activestate.com/recipes/473888-lazy-module-imports/
    # using import instead of execfile, usually takes advantage of Python cached compiled code
    global _G
    getTplClass = None
    # add the dynamic import path to sys
    basename = os.path.basename(filename)
    directory = os.path.dirname(filename)
    os.sys.path.append(cacheDir)
    os.sys.path.append(directory)
    currentcwd = os.getcwd()
    os.chdir(directory) # change working directory so we know import will work
    if os.path.exists(filename):
        modname = basename[:-3] # remove .py extension
        mod = __import__(modname)
        if doReload: reload(mod) # Might be out of date
        # a trick in-order to pass the Contemplate super-class in a cross-module way
        getTplClass = getattr( mod, '__getTplClass__' )
    # restore current dir
    os.chdir(currentcwd)
    # remove the dynamic import path from sys
    del os.sys.path[-1]
    del os.sys.path[-1]
    # return the tplClass if found
    if getTplClass: return getTplClass(Contemplate)
    return None
Note the engine creates an __init__.py file in cacheDir if it is not there already.
If needed I can change the import_tpl function to something else, I don't mind.
Python tested is python 3.6 on windows, but I don't think this is a platform-specific issue.
To test the issue you can download the github repository (linked above) and run the /tests/test.py test after clearing all cached templates from the /tests/_tplcache/ folder.
UPDATE:
I am thinking of adding a while loop with some counter in import_tpl that catches the error if raised and retries a specified number of times until the module imports successfully. But I am also wondering if this is a good solution or if there is something else I am missing here..
UPDATE (20/02/2019):
Added a loop to retry a specified number of times, plus a small delay of 1 sec if the import initially failed (see online repository code), but it still raises the same error sometimes when templates are first created before being imported. Any solutions?
Right, using a "while" loop to handle the exception would be one way.
while True:
    try:
        # attempt the module import here
        break
    except ModuleNotFoundError:
        print("NOPE! Module not found")
If it works for some "module" files and not others, the likely suspect is the template files themselves.
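One more thing that may be worth ruling out (my suggestion, untested against Contemplate): the import system caches directory listings, so a module file written to disk after those caches were built can transiently appear missing, which would match the random failures described. importlib.invalidate_caches() exists for exactly this case; a minimal sketch:

import importlib

def import_fresh(modname):
    # flush the finders' cached directory listings so a just-written
    # module file becomes visible to the import machinery
    importlib.invalidate_caches()
    return importlib.import_module(modname)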
