save a dictionary to a .db file - python-3.x

For the past couple of hours I've been trying to find a solution to this issue. Any shared knowledge would be very helpful.
The objective is to save the dictionary created by the program. I am getting an error on line 3.
def save_dict(dictionary_to_be_saved):
    with shelve.open('OperationNamesDictionary.db', 'c') as s:  # create OperationNamesDictionary.db
        s = dictionary_to_be_saved
What am I trying to achieve? I want to pass the dictionary to this function and have it create a (****.db) file, which I can use later.
Thanks in advance
Code used to save a dictionary:
def save_dict(dictionary_to_be_saved):
    with shelve.open('OperationNamesDictionary.db', 'c') as s:  # "c" flag used to create dictionary
        s = dictionary_to_be_saved
Code used to retrieve a dictionary from the created file:
def load_dict():
    try:
        with shelve.open('TOperationNamesDictionary.db', 'r') as s:
            operation_names = s
            return(operation_names)
    except dbm.error:
        print("Could not find a saved dictionary, using a pre-coded one.")
operation_names = load_dict()
print(operation_names)
Output: <shelve.DbfilenameShelf object at 0x0000021970AD89A0>
Expected output: the data inside operation_names (a dictionary)

I think this is what you are after, more or less. This is untested code so I hope it works! What I have added to your attempts in the question is to provide a key value for the items you are shelving (I used the uninteresting identifier "mydictionary" as my key). So we put items in by that name, and take them out again by that same name.
import dbm
import shelve
def save_dict(dictionary_to_be_saved):
    with shelve.open('OperationNamesDictionary.db', 'c') as s:
        s['mydictionary'] = dictionary_to_be_saved
def load_dict():
    try:
        # open the same file name that save_dict wrote to
        with shelve.open('OperationNamesDictionary.db', 'r') as s:
            return s['mydictionary']
    except (KeyError, *dbm.error):  # missing key, or missing/unreadable file
        print("Could not find a saved dictionary")
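To try the keyed save/load round trip without touching real files, a minimal sketch along these lines may help. It assumes a temporary directory, and it adds a path parameter and a default argument that are not part of the answer above; the key name 'mydictionary' is kept from the answer.

```python
import dbm
import os
import shelve
import tempfile

DICT_KEY = "mydictionary"  # same key as in the answer; any string would do


def save_dict(path, dictionary_to_be_saved):
    # 'c' opens the shelf for reading and writing, creating it if needed.
    with shelve.open(path, "c") as shelf:
        shelf[DICT_KEY] = dictionary_to_be_saved


def load_dict(path, default=None):
    # Return the stored dictionary, or a fallback if nothing was saved yet.
    try:
        with shelve.open(path, "r") as shelf:
            return shelf[DICT_KEY]
    except (KeyError, *dbm.error):
        return {} if default is None else default


with tempfile.TemporaryDirectory() as tmp:
    shelf_path = os.path.join(tmp, "OperationNamesDictionary")
    missing = load_dict(shelf_path, default={"fallback": True})
    save_dict(shelf_path, {"Asd-013we": "No Influence", "rdfhyZ1": "TRM"})
    restored = load_dict(shelf_path)

print(missing)
print(restored)
```

The value that comes back is an ordinary dict, not the shelf object, because it is read out by key while the shelf is still open.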

For my specific case, creating a (.npy) file worked for me.
Again, the objective of my code is to use a dictionary that is available (the type of file that stores this dictionary doesn't matter) and at the end of the program save the updated dictionary to the same file.
import numpy as np
try:
    # allow_pickle expects a boolean; the dictionary is stored as a pickled 0-d object array
    operation_names = np.load("Operation_Dictionary.npy", allow_pickle=True).item()
    print("Using the Dictionary which is found in Path")
except FileNotFoundError:
    print("Using Pre-coded Dictionary from the code")
    operation_names = {"Asd-013we": "No Influence", "rdfhyZ1": "TRM"}  # default dictionary
.....
.....
# the program will update the dictionary
np.save("Operation_Dictionary.npy", operation_names)  # the updated dictionary is saved to the same file

Related

Python3: Is it possible to use a variable as part of a function call

I am iterating through a list and want to use the list item to call a function. This is what I have tried:
def get_list1():
    pass  # ....do something
def get_list2():
    pass  # ....do something
def get_list3():
    pass  # ....do something
data = ['list1', 'list2', 'list3']
for list_item in data:
    function_call = 'get_' + list_item
    function_call()
But I am receiving the error "TypeError: 'str' object is not callable"
There are a couple of other ways that I could attack this, but this would be helpful to know for the future as well. Thanks!
Hopefully that TypeError is not surprising, because when you write...
function_call = 'get_' + list_item
...you're creating a string. If you want to look up a function by that name, you can use the vars() function, like this:
def get_list1():
    print('list1')
def get_list2():
    print('list2')
def get_list3():
    print('list3')
data = ['list1', 'list2', 'list3']
for list_item in data:
    function_call = 'get_' + list_item
    fn = vars()[function_call]
    fn()
Running the above produces:
list1
list2
list3
But as @pynchia notes in a comment on another answer, this isn't a great way to structure your code: you're better off building an explicit dictionary mapping names to functions if you really need this sort of functionality. Without seeing your actual code it's hard to tell what the most appropriate solution would look like.
Just to give an example of using dictionaries (as they have been mentioned here in other answers) in case you find it useful.
def get_list1():
    print('get_list1 executes')
def get_list2():
    print('get_list2 executes')
# Create a dictionary with a reference to your functions as values
# (note no parentheses, as that would execute the function here instead)
fns = {
    'example_key1': get_list1,
    'example_key2': get_list2,
}
print(type(fns['example_key1']))  # prints <class 'function'>
# If you still want a list
lst = list(fns)  # Create a list containing the keys of the fns dictionary
for fn in lst:
    # Iterate through the list (of keys) and execute the function
    # found in the value.
    fns[fn]()
# Or you can now just simply iterate through the dictionary instead, if you wish:
for fn in fns.values():
    fn()
This code produces:
<class 'function'>
get_list1 executes
get_list2 executes
get_list1 executes
get_list2 executes
fn = vars()['get_' + list_item]
fn()
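To tie the dictionary approach back to the names used in the question, here is a minimal sketch. The return values and the GETTERS name are made up for illustration; the point is only that the string from the list is used as a dictionary key rather than being turned into a function name.

```python
def get_list1():
    return ["a"]


def get_list2():
    return ["b"]


def get_list3():
    return ["c"]


# Explicit mapping from the strings in `data` to the functions to call.
GETTERS = {
    "list1": get_list1,
    "list2": get_list2,
    "list3": get_list3,
}

data = ["list1", "list2", "list3", "list4"]  # 'list4' has no getter on purpose
results = {}
for list_item in data:
    getter = GETTERS.get(list_item)
    if getter is None:
        print(f"No getter registered for {list_item!r}")
        continue
    results[list_item] = getter()

print(results)
```

Compared with vars(), an unknown name is handled in one obvious place, and only the functions listed in the mapping can ever be called.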

Why my function that creates a pandas dataframe changes the dtype to none when called

I'm working on processing CSV files. I originally wrote my code without functions and it worked, apart from some problems when trying to fillna with a string, before I added a try/except.
For some reason it didn't work before I created the while loop.
My question: why does a DataFrame created inside a function, by reading the CSV file whose name I passed in the call, come back as an empty object? I thought that once the DataFrame was in memory it wouldn't be destroyed. What am I missing?
My code:
import pandas as pd
grossmargin = 1.2
def read_wholesalefile(name):
    mac = name
    apple = pd.read_csv(mac)
    apple['price'] = apple['Wholesale'] * grossmargin
    while True:
        try:
            apple.fillna('N/A', inplace=True)
            break
        except ValueError:
            print('Not Valid')
read_wholesalefile('Wholesalelist5182021.csv')
Well, sorry guys, I figured it out by myself:
I was missing the scope; sorry again for the newbie stuff. I just started coding in Python a few months ago (last December) and I'm learning as I go.
What worked for me was to declare the variable as global inside the function. I seriously didn't know that a DataFrame assigned inside a function is a local variable like any other.
# My modified code that works
import pandas as pd
grossmargin = 1.2
def read_wholesalefile(name):
    global apple
    mac = name
    apple = pd.read_csv(mac)
    apple['price'] = apple['Wholesale'] * grossmargin
    while True:
        try:
            apple.fillna('N/A', inplace=True)
            break
        except ValueError:
            print('Not Valid')
read_wholesalefile('Wholesalelist5182021.csv')
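An alternative to global that is often easier to follow is to return the DataFrame from the function and assign it at the call site. The sketch below assumes a CSV with an Item column next to Wholesale, and reads from an in-memory string instead of 'Wholesalelist5182021.csv' so that it runs on its own; the gross_margin parameter is an addition for illustration.

```python
import io

import pandas as pd

GROSS_MARGIN = 1.2


def read_wholesalefile(source, gross_margin=GROSS_MARGIN):
    """Read the CSV and hand the resulting DataFrame back to the caller."""
    frame = pd.read_csv(source)
    frame["price"] = frame["Wholesale"] * gross_margin
    return frame


# Stand-in for the real file; the column layout is an assumption.
sample_csv = io.StringIO("Item,Wholesale\nWidget,10\nGadget,20\n")

# The name `apple` is bound here, in the calling scope, not inside the function.
apple = read_wholesalefile(sample_csv)
print(apple)
```

Without the return statement the function returns None, which matches the empty result described in the question.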

How can I keep/save the user input in a dictionary?

import pandas as pd
from pandas import DataFrame
words = {}
def add_word():
    ask = input("Do You Want To Add a New Word?(y/n): ")
    if ask == 'y':
        new_word = input("type the word you want to add: ")
        word_meaning = input("type the word meaning: ")
        words[new_word] = [word_meaning]
    elif ask == 'n':
        pass
add_word()
table = pd.DataFrame(data=words)
table_transposed = table.transpose()
print(table_transposed)
As you can see, I want to make a dictionary, but I don't know how to save the user's input.
I want to take the user input and save it in the dictionary, so that the next time they use the program they can see everything they added before.
When you make and populate (fill) a dictionary in a running Python program, that dictionary only exists as long as the program is running. When you close the program - that memory is wiped and any modifications that are made are not stored.
As Tomerikoo pointed out, the linked solution (shelving dictionaries) will allow you to preserve your dictionary after the program is closed.
I have copied the code from the link (jabaldonedo's solution) and annotated it for clarity.
import shelve # this is a Python library that allows you to store dictionaries after the program is closed
data = {'foo':'foo value'} # this is a mock-up dictionary. "words" in your case
d = shelve.open('myfile.db') # this creates a storage container in the program folder that can store multiple dictionaries. You can see this file appear when this code runs.
d['data'] = data # you make a section in that storage container, give it a name, e.g. "data" in this case, and store your dictionary in that section. You will store your "words" here.
d.close() # close the storage container if you do not intend to put anything else inside.
When you close and open up the program, the dictionary will not automatically pop into your running memory - you need to write code to access it. It can be made as an option in your game menu, e.g. "Load existing dictionary of words".
Back to jabaldonedo's solution:
import shelve # no need to import again, if you are writing in the same python program, this is for demonstration
d = shelve.open('myfile.db') # open the storage container where you put the dictionary
data = d['data'] # pull out the section, where you stored the dictionary and save it into a dictionary variable in the running program. You can now use it normally.
d.close() # close the storage container if you do not intend to use it for now.
EDIT: Here is how this could be used in the specific context provided in your answer. Note that I imported an additional library and changed the flags in your shelve access commands.
As I mentioned in my comment, you should first attempt to load the dictionary before writing new things into it:
import shelve
import dbm  # this import is necessary to handle the custom exception when shelve tries to load a missing file as "read"
def save_dict(dict_to_be_saved):  # your original function, parameter renamed to not shadow outer scope
    with shelve.open('shelve2.db', 'c') as s:  # Don't think you needed WriteBack, "c" flag used to create dictionary
        s['Dict'] = dict_to_be_saved  # just as you had it
def load_dict():  # loading dictionary
    try:  # file might not exist when we try to open it
        with shelve.open('shelve2.db', 'r') as s:  # the "r" flag used to only read the dictionary
            my_saved_dict = s['Dict']  # load and assign to a variable
            return my_saved_dict  # give the contents of the dictionary back to the program
    except dbm.error:  # if the file is not there to load, this error happens, so we suppress it...
        print("Could not find a saved dictionary, returning a blank one.")
        return {}  # ... and return an empty dictionary instead
words = load_dict()  # first we attempt to load previous dictionary, or make a blank one
ask = input('Do you want to add a new word?(y/n): ')
if ask == 'y':
    new_word = input('what is the new word?: ')
    word_meaning = input('what does the word mean?: ')
    words[new_word] = word_meaning
    save_dict(words)
elif ask == 'n':
    print(words)  # You can see that the dictionary is preserved between runs
    print("Oh well, nothing else to do here then.")
import shelve
words = {}
def save_dict(words):
    s = shelve.open('shelve2.db', writeback=True)
    s['Dict'] = words
    s.sync()
    s.close()
def load_dict():
    s = shelve.open('shelve2.db', writeback=True)
    dict = s['Dict']
    print(dict)
    s.close()
ask = input('Do you want to add a new word?(y/n): ')
if ask == 'y':
    new_word = input('what is the new word?: ')
    word_meaning = input('what does the word mean?: ')
    words[new_word] = word_meaning
    save_dict(words)
elif ask == 'n':
    load_dict()
So this is my code after adding the save_dict and load_dict functions. It works fine, but when I run the program and enter a new_word and word_meaning, it overwrites the previous data. I believe I am missing something in the save_dict function; if you can point out the problem I would be very grateful.
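The overwrite follows from each run starting with an empty words dictionary and then storing that whole dictionary under 'Dict', which replaces whatever was stored before. A minimal non-interactive sketch of the load, update, save order is below; it assumes a temporary directory instead of 'shelve2.db', and the helper names load_words and add_word are made up for illustration.

```python
import dbm
import os
import shelve
import tempfile


def load_words(path):
    # Return the previously saved dictionary, or an empty one on first use.
    try:
        with shelve.open(path, "r") as shelf:
            return shelf.get("Dict", {})
    except dbm.error:
        return {}


def add_word(path, new_word, word_meaning):
    words = load_words(path)        # start from what is already stored
    words[new_word] = word_meaning  # then add the new entry
    with shelve.open(path, "c") as shelf:
        shelf["Dict"] = words       # write the merged dictionary back
    return words


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shelve2")
    add_word(path, "foo", "first meaning")   # simulates the first run
    add_word(path, "bar", "second meaning")  # simulates a later run
    stored = load_words(path)

print(stored)
```

Both entries survive because the second call loads the stored dictionary before adding to it.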

pdf form filled with PyPDF2 does not show in print

I need to fill PDF forms in batch, so I tried to write Python code to do it for me from a CSV file. I used the second answer in this question and it fills the forms fine; however, when I open the filled forms the answers do not show unless the corresponding field is selected. The answers also do not show when the form is printed. I looked into the PyPDF2 documentation to see if I could flatten the generated forms, but this feature has not been implemented yet, even though it was requested about a year ago. I would prefer not to use pdftk, so that I can package the script without an extra dependency. When using the original code from the mentioned question, some fields show in print and some don't, which leaves me confused about how they work. Any help is appreciated.
Here's the code.
# -*- coding: utf-8 -*-
from collections import OrderedDict
from PyPDF2 import PdfFileWriter, PdfFileReader
def _getFields(obj, tree=None, retval=None, fileobj=None):
    """
    Extracts field data if this PDF contains interactive form fields.
    The *tree* and *retval* parameters are for recursive use.
    :param fileobj: A file object (usually a text file) to write
        a report to on all interactive form fields found.
    :return: A dictionary where each key is a field name, and each
        value is a :class:`Field<PyPDF2.generic.Field>` object. By
        default, the mapping name is used for keys.
    :rtype: dict, or ``None`` if form data could not be located.
    """
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name', '/TU': 'Alternate Field Name',
                       '/TM': 'Mapping Name', '/Ff': 'Field Flags', '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = {}  # OrderedDict()
        catalog = obj.trailer["/Root"]
        # get the AcroForm tree
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval
    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            # Tree is a field
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break
    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)
    return retval
def get_form_fields(infile):
    infile = PdfFileReader(open(infile, 'rb'))
    fields = _getFields(infile)
    return {k: v.get('/V', '') for k, v in fields.items()}
def update_form_values(infile, outfile, newvals=None):
    pdf = PdfFileReader(open(infile, 'rb'))
    writer = PdfFileWriter()
    for i in range(pdf.getNumPages()):
        page = pdf.getPage(i)
        try:
            if newvals:
                writer.updatePageFormFieldValues(page, newvals)
            else:
                writer.updatePageFormFieldValues(page,
                                                 {k: f'#{i} {k}={v}'
                                                  for i, (k, v) in
                                                  enumerate(get_form_fields(infile).items())
                                                  })
            writer.addPage(page)
        except Exception as e:
            print(repr(e))
            writer.addPage(page)
    with open(outfile, 'wb') as out:
        writer.write(out)
if __name__ == '__main__':
    import csv
    import os
    from glob import glob
    cwd=os.getcwd()
    outdir=os.path.join(cwd,'output')
    csv_file_name=os.path.join(cwd,'formData.csv')
    pdf_file_name=glob(os.path.join(cwd,'*.pdf'))[0]
    if not pdf_file_name:
        print('No pdf file found')
    if not os.path.isdir(outdir):
        os.mkdir(outdir)
    if not os.path.isfile(csv_file_name):
        fields=get_form_fields(pdf_file_name)
        with open(csv_file_name,'w',newline='') as csv_file:
            csvwriter=csv.writer(csv_file,delimiter=',')
            csvwriter.writerow(['user label'])
            csvwriter.writerow(['fields']+list(fields.keys()))
            csvwriter.writerow(['Mr. X']+list(fields.values()))
    else:
        with open(csv_file_name,'r',newline='') as csv_file:
            csvreader=csv.reader(csv_file,delimiter=',')
            csvdata=list(csvreader)
        fields=csvdata[1][1:]
        for frmi in csvdata[2:]:
            frmdict=dict(zip(fields,frmi[1:]))
            outfile=os.path.join(outdir,frmi[0]+'.pdf')
            update_form_values(pdf_file_name, outfile,frmdict)
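The CSV layout that the script reads back (a label row, then a row of field names, then one row per output PDF) can be illustrated without PyPDF2. In the sketch below the field names and values are invented, and the jobs dictionary is only a stand-in for the calls to update_form_values.

```python
import csv
import io

# Hypothetical contents of formData.csv; the field names are made up.
sample = io.StringIO(
    "user label\n"
    "fields,Name,City\n"
    "Mr. X,Xavier,Oslo\n"
    "Ms. Y,Yara,Lima\n"
)

csvdata = list(csv.reader(sample, delimiter=","))
field_names = csvdata[1][1:]  # second row, minus the leading 'fields' marker

jobs = {}
for row in csvdata[2:]:  # every remaining row describes one filled form
    # First cell names the output file; the rest line up with field_names.
    jobs[row[0] + ".pdf"] = dict(zip(field_names, row[1:]))

print(jobs)
```

Each value in jobs has the same shape as the frmdict that the script passes as newvals.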
I had the same issue, and apparently adding the "/NeedAppearances" attribute to the AcroForm of the PdfFileWriter object fixed the problem (see https://github.com/mstamy2/PyPDF2/issues/355). With much help from ademidun (https://github.com/ademidun), I was able to populate a PDF form and have the values of the fields show properly. The following is an example:
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
def set_need_appearances_writer(writer):
    # See 12.7.2 and 7.7.2 for more information:
    # http://www.adobe.com/content/dam/acom/en/devnet/acrobat/
    # pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree and add the "/NeedAppearances" attribute
        if "/AcroForm" not in catalog:
            writer._root_object.update(
                {
                    NameObject("/AcroForm"): IndirectObject(
                        len(writer._objects), 0, writer
                    )
                }
            )
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer
    except Exception as e:
        print("set_need_appearances_writer() catch : ", repr(e))
        return writer
reader = PdfFileReader("myInputPdf.pdf", strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )
writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )
field_dictionary = {"Field1": "Value1", "Field2": "Value2"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)
with open("myOutputPdf.pdf", "wb") as fp:
    writer.write(fp)
The underlying reason form fields do not show up after being filled in is that the values are not being added to the field's appearance stream. Adding "NeedAppearances" tells the PDF reader that it needs to update the appearance (in this case, to create a stream for each field value), but not all PDF readers honor that, and the fields may still look blank or show the default values.
The most reliable way to make sure the fields are updated in any reader is to create a stream for each field and add it to the field's XObject.
Here is an example solution for single-line text fields. It also encodes the stream, updates the default value, and sets the fields to read-only, all of which are optional.
# Example data.
data = {
    "field_name": "some value"
}
# Get template.
template = PdfReader("template-form.pdf", strict=False)
# Initialize writer.
writer = PdfWriter()
# Add the template page.
writer.add_page(template.pages[0])
# Get page annotations.
page_annotations = writer.pages[0][PageAttributes.ANNOTS]
# Loop through page annotations (fields).
for index in range(len(page_annotations)):  # type: ignore
    # Get annotation object.
    annotation = page_annotations[index].get_object()  # type: ignore
    # Get existing values needed to create the new stream and update the field.
    field = annotation.get(NameObject("/T"))
    new_value = data.get(field, 'N/A')
    ap = annotation.get(AnnotationDictionaryAttributes.AP)
    x_object = ap.get(NameObject("/N")).get_object()
    font = annotation.get(InteractiveFormDictEntries.DA)
    rect = annotation.get(AnnotationDictionaryAttributes.Rect)
    # Calculate the text position.
    font_size = float(font.split(" ")[1])
    w = round(float(rect[2] - rect[0] - 2), 2)
    h = round(float(rect[3] - rect[1] - 2), 2)
    text_position_h = h / 2 - font_size / 3  # approximation
    # Create a new XObject stream.
    new_stream = f'''
        /Tx BMC
        q
        1 1 {w} {h} re W n
        BT
        {font}
        2 {text_position_h} Td
        ({new_value}) Tj
        ET
        Q
        EMC
    '''
    # Add Filter type to XObject.
    x_object.update(
        {
            NameObject(StreamAttributes.FILTER): NameObject(FilterTypes.FLATE_DECODE)
        }
    )
    # Update and encode XObject stream.
    x_object._data = FlateDecode.encode(encode_pdfdocencoding(new_stream))
    # Update annotation dictionary.
    annotation.update(
        {
            # Update Value.
            NameObject(FieldDictionaryAttributes.V): TextStringObject(
                new_value
            ),
            # Update Default Value.
            NameObject(FieldDictionaryAttributes.DV): TextStringObject(
                new_value
            ),
            # Set Read Only flag.
            NameObject(FieldDictionaryAttributes.Ff): NumberObject(
                FieldFlag(1)
            )
        }
    )
# Clone document root & metadata from template.
# This is required so that the document doesn't try to save before closing.
writer.clone_reader_document_root(template)
# write "output".
with open(f"output.pdf", "wb") as output_stream:
    writer.write(output_stream)  # type: ignore
Thanks to fidoriel and others from the discussion here: https://github.com/py-pdf/PyPDF2/issues/355.
This is what works for me on Python 3.8 and PyPDF4 (but I think it will work as well with PyPDF2):
#!/usr/bin/env python3
from PyPDF4.generic import NameObject
from PyPDF4.generic import TextStringObject
from PyPDF4.pdf import PdfFileReader
from PyPDF4.pdf import PdfFileWriter
import random
import sys
reader = PdfFileReader(sys.argv[1])
writer = PdfFileWriter()
# Try to "clone" the original one (note the library has cloneDocumentFromReader)
# but the render pdf is blank.
writer.appendPagesFromReader(reader)
writer._info = reader.trailer["/Info"]
reader_trailer = reader.trailer["/Root"]
writer._root_object.update(
    {
        key: reader_trailer[key]
        for key in reader_trailer
        if key in ("/AcroForm", "/Lang", "/MarkInfo")
    }
)
page = writer.getPage(0)
params = {"Foo": "Bar"}
# Inspired by updatePageFormFieldValues but also handles checkboxes.
for annot in page["/Annots"]:
    writer_annot = annot.getObject()
    field = writer_annot["/T"]
    if writer_annot["/FT"] == "/Btn":
        value = params.get(field, random.getrandbits(1))
        if value:
            writer_annot.update(
                {
                    NameObject("/AS"): NameObject("/On"),
                    NameObject("/V"): NameObject("/On"),
                }
            )
    elif writer_annot["/FT"] == "/Tx":
        value = params.get(field, field)
        writer_annot.update(
            {
                NameObject("/V"): TextStringObject(value),
            }
        )
with open(sys.argv[2], "wb") as f:
    writer.write(f)
This updates text fields and checkboxes.
I believe the key part is copying some parts from the original file:
reader_trailer = reader.trailer["/Root"]
writer._root_object.update(
    {
        key: reader_trailer[key]
        for key in reader_trailer
        if key in ("/AcroForm", "/Lang", "/MarkInfo")
    }
)
Note: Please feel free to share this solution in other places. I consulted a lot of SO questions related to this topic.
What worked for me was to reopen the file with pdfrw.
The following has worked for me for Adobe Reader, Acrobat, Skim, and Mac OS Preview:
pip install pdfrw
import pdfrw
pdf = pdfrw.PdfReader("<input_name>")
for page in pdf.pages:
    annotations = page.get("/Annots")
    if annotations:
        for annotation in annotations:
            annotation.update(pdfrw.PdfDict(AP=""))
pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
pdfrw.PdfWriter().write("<output_name>", pdf)
alepisa's answer was the closest to working for me (thank you, alepisa), but I just had to change one small section
    elif writer_annot["/FT"] == "/Tx":
        value = params.get(field, field)
        writer_annot.update(
This was producing an output where my PDF had the desired fields updated based off the dictionary with field names and values I passed it, but every fillable field, whether I wanted them filled or not, was populated with the name of that fillable field. I changed the elif statement to the one below and everything worked like a charm!
    elif writer_annot["/FT"] == "/Tx":
        field_value = field_values.get(field_name, "")
        writer_annot.update({NameObject("/V"): TextStringObject(field_value),
                             # This line below is just for formatting
                             NameObject("/DA"): TextStringObject("/Helv 0 Tf 0 g")})
This nested back into the rest of alepisa's script should work for anybody having issues with getting the output in Acrobat to show the values without clicking on the cell!
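The difference between the two defaults can be seen with a plain dictionary, no PDF library needed. In this small sketch the field names are invented; it only shows what dict.get returns for a field that is missing from the values you passed in.

```python
params = {"Foo": "Bar"}       # values supplied by the caller
field_names = ["Foo", "Baz"]  # 'Baz' exists in the form but has no value

# Default of `name`: an unfilled field ends up containing its own name.
echo_default = {name: params.get(name, name) for name in field_names}

# Default of "": an unfilled field stays blank.
blank_default = {name: params.get(name, "") for name in field_names}

print(echo_default)
print(blank_default)
```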

Store scrape results and search in results with Python and Pandas?

As part of my Ph.D. research, I am scraping numerous webpages and searching for keywords within the scrape results.
This is how I do it thus far:
import pandas as pd
import requests
# load the data as a pandas data frame with a column df.url
df = pd.read_excel('sample.xls', header=0)
# define keyword search function
def contains_keywords(link, keywords):
    try:
        output = requests.get(link).text
        return int(any(x in output for x in keywords))
    except:
        return "Wrong/Missing URL"
# define the relevant keywords
mykeywords = ('for', 'bar')
# store search results in new column 'results'
df['results'] = df.url.apply(lambda l: contains_keywords(l, mykeywords))
This works just fine. I only have one problem: the list of relevant keywords, mykeywords, changes frequently, whilst the webpages stay the same. Running the code takes a long time, since I send the same requests over and over.
I have two questions:
(1) Is there a way to store the results of requests.get(link).text?
(2) And if so, how do I search within the saved file(s), producing the same result as with the current script?
As always, thank you for your time and help! /R
You can download the content of the URLs and save each one in a separate file in a directory (e.g. 'links'):
import os
import requests
def get_link(url):
    file_name = os.path.join('/path/to/links', url.replace('/', '_').replace(':', '_'))
    try:
        r = requests.get(url)
    except Exception as e:
        print("Failed to get " + url)
    else:
        with open(file_name, 'w') as f:
            f.write(r.text)
Then modify the contains_keywords function to read the local files, so you won't have to use requests every time you run the script.
def contains_keywords(link, keywords):
    file_name = os.path.join('/path/to/links', link.replace('/', '_').replace(':', '_'))
    try:
        with open(file_name) as f:
            output = f.read()
        return int(any(x in output for x in keywords))
    except Exception as e:
        print("Can't access file: {}\n{}".format(file_name, e))
        return "Wrong/Missing URL"
Edit: I just added a try-except block in get_link and used an absolute path for file_name.
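The download-once, search-many-times idea can also be folded into a single helper that fetches a page only when no cached copy exists. The sketch below is a variation, not the code above: it names cache files by a hash of the URL rather than by character replacement, and it takes the fetch function as a parameter (a hypothetical fake_fetch here, so the example runs without network access). In real use you might pass something like lambda u: requests.get(u).text.

```python
import hashlib
import os
import tempfile


def cache_path(cache_dir, url):
    # Hash the URL so that any URL maps to a short, filesystem-safe name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".html")


def get_text(url, cache_dir, fetch):
    # Download only if there is no cached copy yet, then read from disk.
    path = cache_path(cache_dir, url)
    if not os.path.exists(path):
        text = fetch(url)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    with open(path, encoding="utf-8") as f:
        return f.read()


def contains_keywords(url, keywords, cache_dir, fetch):
    try:
        output = get_text(url, cache_dir, fetch)
    except Exception:
        return "Wrong/Missing URL"
    return int(any(k in output for k in keywords))


calls = []  # records how often the (fake) network is hit


def fake_fetch(url):
    calls.append(url)
    return "<html>foo and more text</html>"


with tempfile.TemporaryDirectory() as cache_dir:
    url = "http://example.com/a"
    first = contains_keywords(url, ("foo", "bar"), cache_dir, fake_fetch)
    second = contains_keywords(url, ("baz",), cache_dir, fake_fetch)

print(first, second, len(calls))
```

Changing the keyword list then only re-reads local files; the second call above does not invoke the fetch function again.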
