I am attempting to use pickle to save a dictionary that includes class instances, since json, which I usually use, cannot save class instances. I plan to load them and then modify the loaded data. However, whenever I run a function from the class instances that would normally change their data, it doesn't. The instances are readable and can still run their functions, but the data just doesn't change. Below is the code for loading the data, with players being a list of these class instances, followed by a snippet of code from inside one of the class functions. How do I fix this?
species = pickle.load(open("species.obj","rb"))
vLanguage = species["lang"]
players = species["instances"]
speciesName = species["species"]
# In separate file imported before pickle data is loaded
def turn(self, env):
    if (randint(0, 100) / 100) < env.wound:
        self.wound = True
        if log:
            print(f"{self.id} was wounded searching for {action}")
# If this came out as true, and self.wound was False, it would stay as False
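For reference, pickle itself does round-trip instance state, and changes made to loaded instances do take effect in memory (they just need to be dumped again to persist). A minimal sketch demonstrating this, with a hypothetical Player class standing in for the real one, and pickle.dumps/loads standing in for the file round trip:

```python
import pickle

class Player:
    def __init__(self):
        self.wound = False

    def turn(self):
        self.wound = True  # mutate instance state

# round-trip through pickle, as pickle.dump/pickle.load would do with a file
blob = pickle.dumps({"instances": [Player()]})
loaded = pickle.loads(blob)["instances"]

loaded[0].turn()
print(loaded[0].wound)  # True: the loaded instance does mutate in memory
```

If this sketch behaves but the real code does not, the problem likely lies elsewhere, e.g. mutating a different copy of the instances than the one being read.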
I am trying to display JSON metadata in PyQt6/PySide6 QTreeView. I want to generalize for the case where multiple persistent windows (QtWidgets) pop up if my JSON metadata list has a length greater than 1.
for example:
def openTreeWidget(app, jmd):
    view = QTreeView()
    model = JsonModel()
    view.setModel(model)
    model.load(jmd)
    app.w = view  # app = `self` of a QMainWindow instance
    app.w.show()

for md in jsonMetadataList:
    openTreeWidget(self, md)
where TreeItem and JsonModel are based on: https://doc.qt.io/qtforpython/tutorials/basictutorial/treewidget.html
I stole the app.w idea from: https://www.pythonguis.com/tutorials/pyqt6-creating-multiple-windows/
In the current case, all pop ups (except one) close after momentarily opening. Only the last item in jsonMetadataList remains displayed in a persistent window. I believe that somehow I am not keeping the reference to previous windows and reopening/rewriting data on a single widget. How can I keep the reference?
Also, I am very new to PyQt/PySide, so I'm just doing things no matter how ugly they look at the moment. This will, of course, get better with time :)
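That diagnosis is consistent with how CPython handles unreferenced objects: once the only reference to an object goes away, the object is collected, which for a Qt widget means the window closes. A minimal illustration without Qt (Widget here is a plain stand-in class, not a Qt class):

```python
import weakref

class Widget:
    pass

def make_widget():
    w = Widget()
    return weakref.ref(w)  # keep only a weak reference

ref = make_widget()
# with no strong reference left after make_widget() returns,
# CPython collects the object immediately
print(ref())  # None
```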
I managed to bodge it up by not destroying the reference. Here's how I did it.
def openTreeWidget(app, jmd):
    """
    app is the parent QWidget (here, a QMainWindow)
    jmd is JSON metadata stored in a string
    """
    view = QTreeView()
    model = JsonModel()
    view.setModel(model)
    model.load(jmd)
    return view

# `self` of a QMainWindow instance
self.temp = [None] * len(jsonMetadataList)  # a list storing individual handles for all JSON metadata entries
for ii, md in enumerate(jsonMetadataList):
    self.temp[ii] = openTreeWidget(self, md)  # get the reference to the QTreeView and store it in temp[ii]
    self.temp[ii].show()  # show the ii-th metadata in a QTreeView
Better ideas are still welcome :)
I am struggling to get this working.
I tried to transpose a C++ post into Python with no joy:
QMessageBox with a "Do not show this again" checkbox
my rough code goes like:
from PyQt5 import QtWidgets as qtw
...
mb = qtw.QMessageBox
cb = qtw.QCheckBox
# following 3 lines to get over runtime errors
# trying to pass the types it was asking for
# and surely messing up
mb.setCheckBox(mb(), cb())
cb.setText(cb(), "Don't show this message again")
cb.show(cb())
ret = mb.question(self,
                  'Close application',
                  'Do you really want to quit?',
                  mb.Yes | mb.No)
if ret == mb.No:
    return
self.close()
the above executes with no errors but the checkbox ain't showing (the message box does).
consider that I am genetically stupid... and slow, very slow.
so please go easy on my learning curve
When trying to "port" code, it's important to know the basis of the source language and have a deeper knowledge of the target.
For instance, taking the first lines of your code and the referenced question:
QCheckBox *cb = new QCheckBox("Okay I understand");
The line above in C++ means that a new object (cb) of type QCheckBox is being created, and it's assigned the result of QCheckBox(...), which returns an instance of that class. To clarify how objects are declared, here's how a simple integer variable is created:
int mynumber = 10;
This is because C++, like many languages, requires the object type for its declaration.
In Python, which is a dynamically typed language, this is not required (though type annotations have been possible since Python 3.6), but you still need to create the instance, and this is achieved by using the parentheses on the class (which results in calling it, which in turn calls both __new__ and then __init__). The first two lines of your code then should be:
mb = qtw.QMessageBox()
cb = qtw.QCheckBox()
Then, the problem is that you're calling the other methods with new instances of the above classes every time.
An instance method (such as setCheckBox) is implicitly called with the instance as first argument, commonly known as self.
checkboxInstance = QCheckBox()
checkboxInstance.setText('My checkbox')
# is actually the result of:
QCheckBox.setText(checkboxInstance, 'My checkbox')
The last line means, more or less: call the setText function of the class QCheckBox, using the instance and the text as its arguments.
In fact, if QCheckBox was an actual python class, setText() would look like this:
class QCheckBox:
    def setText(self, text):
        self.text = text
When you did cb = qtw.QCheckBox you only created another reference to the class, and every time you do cb() you create a new instance; the same happens for mb, since you created another reference to the message box class.
The following line:
mb.setCheckBox(mb(), cb())
is the same as:
QMessageBox.setCheckBox(QMessageBox(), QCheckBox())
Since you're creating new instances every time, the result is absolutely nothing: there's no reference to the new instances, and they are immediately discarded ("garbage collected", i.e. deleted) once that line is processed.
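The same trap can be reproduced with a plain Python class, no Qt required:

```python
class Box:
    def set_text(self, text):
        self.text = text

Box().set_text("lost")       # this instance has no name and is discarded immediately

box = Box                    # a reference to the *class*, like `cb = qtw.QCheckBox`
box().set_text("also lost")  # box() builds a brand-new instance on every call

b = Box()                    # an actual instance we keep a name for
b.set_text("kept")
print(b.text)  # kept
```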
This is how the above should actually be done:
mb = qtw.QMessageBox()
cb = qtw.QCheckBox()
mb.setCheckBox(cb)
cb.setText("Don't show this message again")
Now, there's a fundamental flaw in your code: question() is a static method (actually, for Python, it's more of a class method). Static and class methods are functions that don't act on an instance, but only on/for a class. Static methods of QMessageBox like question or warning create a new instance of QMessageBox using the provided arguments, so everything you've done before on the instance you created is completely ignored.
These methods are convenience functions that allow simple creation of message boxes without the need to write too much code. Since those methods only allow customization based on their arguments (which don't include adding a check box), you obviously cannot use them, and you must code what they do "under the hood" explicitly.
Here is how the final code should look:
# create the dialog with a parent, which will make it *modal*
mb = qtw.QMessageBox(self)
mb.setWindowTitle('Close application')
mb.setText('Do you really want to quit?')
# you can set the text on a checkbox directly from its constructor
cb = qtw.QCheckBox("Don't show this message again")
mb.setCheckBox(cb)
mb.setStandardButtons(mb.Yes | mb.No)
ret = mb.exec_()
# call some function that stores the checkbox state
self.storeCloseWarning(cb.isChecked())
if ret == mb.No:
    return
self.close()
I am trying to create a spaCy pipeline component to return Spans of meaningful text (my corpus comprises PDF documents that have a lot of garbage I am not interested in: tables, headers, etc.)
More specifically I am trying to create a function that:
takes a doc object as an argument
iterates over the doc tokens
When certain rules are met, yield a Span object
Note I would also be happy with returning a list([span_obj1, span_obj2])
What is the best way to do something like this? I am a bit confused on the difference between a pipeline component and an extension attribute.
So far I have tried:
nlp = English()
Doc.set_extension('chunks', method=iQ_chunker)
####
raw_text = get_test_doc()
doc = nlp(raw_text)
print(type(doc._.chunks))
>>> <class 'functools.partial'>
iQ_chunker is a method that does what I explained above, and it returns a list of Span objects.
This is not the result I expect, since the function I pass in as method returns a list.
I imagine you're getting a functools partial back because you are accessing chunks as an attribute, despite having passed it in as an argument for method. If you want spaCy to intervene and call the method for you when you access something as an attribute, it needs to be
Doc.set_extension('chunks', getter=iQ_chunker)
Please see the Doc documentation for more details.
However, if you are planning to compute this attribute for every single document, I think you should make it part of your pipeline instead. Here is some simple sample code that does it both ways.
import spacy
from spacy.tokens import Doc
def chunk_getter(doc):
    # the getter is called when we access _.extension_1,
    # so the computation is done at access time
    # also, because this is a getter,
    # we need to return the actual result of the computation
    first_half = doc[0:len(doc)//2]
    second_half = doc[len(doc)//2:len(doc)]
    return [first_half, second_half]

def write_chunks(doc):
    # this pipeline component is called as part of the spaCy pipeline,
    # so the computation is done at parse time
    # because this is a pipeline component,
    # we need to set our attribute value on the doc (which must be registered)
    # and then return the doc itself
    first_half = doc[0:len(doc)//2]
    second_half = doc[len(doc)//2:len(doc)]
    doc._.extension_2 = [first_half, second_half]
    return doc
nlp = spacy.load("en_core_web_sm", disable=["tagger", "parser", "ner"])
Doc.set_extension("extension_1", getter=chunk_getter)
Doc.set_extension("extension_2", default=[])
nlp.add_pipe(write_chunks)
test_doc = nlp('I love spaCy')
print(test_doc._.extension_1)
print(test_doc._.extension_2)
This just prints [I, love spaCy] twice because it's two methods of doing the same thing, but I think making it part of your pipeline with nlp.add_pipe is the better way to do it if you expect to need this output on every document you parse.
I have a design change I have been trying to implement with little success, as I can't seem to find my question anywhere.
Currently I have a python class that creates a database connection, stores the index name (table), and other attributes (specifically its an Elasticsearch database connection but that shouldn't matter for this question).
class Create:
    # Functions to manipulate Index Objects
    def __init__(self, index, type, host, shards=3, replicas=1):
        # Create Index Object (OcrBook or OcrPage)
        self.index = index
        self.type = type
        self.shards = shards
        self.replicas = replicas
        self.es_connection = Elasticsearch([{'host': host, 'port': 9200}])
Associated with this class are functions to manipulate the index objects, for example creating that index (table) on the database (cluster) or modifying that table in some way.
def create_index(self):
    # Creates/Executes Index
    try:
        self.es_connection.indices.create(
            index=self.index,
            body={
                'settings': {
                    'number_of_shards': self.shards,
                    'number_of_replicas': self.replicas,
                }
            })
    except Exception:
        CreateLog.write_log(Exception, 'Create Index Exception')
These being in the same class make sense to me, as the connection to the table/database and creating or modifying that table/database are connected to each other.
I also have a group of other functions that search that particular table. These I believe should be in a separate class: rather than creating or modifying the table/database, they simply search it, and could ideally take any table/database initialized by the Create class. Currently I tried breaking them up by doing the following:
class Search(Create):
    def find_book(self, bookkey):
        """ Finds a Book """
        try:
            results = self.es_connection.search(self.index, self.type, body={
                "query": {
                    "match": {
                        "BookKey": bookkey
                    }
                }
            })
            return results['hits']['hits']
        except Exception:
            CreateLog.write_log(Exception, 'Could Not Find Book')
This works on Windows, but is not portable to Linux, as the class 'has not been initialized' when I try to use the Search functionality. I know there is a design problem here, and I could combine both classes into one to fix it, but I would like to keep them separate. Is there a better way to 'inherit' (I don't believe that's the right word in this case) the object created by the Create class in the Search class? Does anyone have a better way to separate these logically, or a better way to extend the Create class with the search functionality? All input is helpful! Thank you.
You seem to be on an OOP path, but why exactly does Search have to be a class? You have a perfect task for a stand-alone function find_book(index_object, bookkey). It does not store anything internally, so I do not see why this has to be a class rather than a function.
Class naming can also hint at your design decisions (or problems). A class name is usually a noun, while a function name tends to be a verb. Create is not a perfect class name to me.
In your setting I'd go with a class IndexObjects (that is, Create renamed) and a function find_book(index_object, bookkey). You can switch to more OOP once this design is up and running.
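A sketch of that function-based variant, assuming the index object exposes the same es_connection, index, and type attributes as the Create class in the question (the search call signature is copied from the question's code; error logging is omitted for brevity):

```python
def find_book(index_object, bookkey):
    """Find a book on any index object created elsewhere."""
    results = index_object.es_connection.search(
        index_object.index,
        index_object.type,
        body={"query": {"match": {"BookKey": bookkey}}},
    )
    return results['hits']['hits']
```

Because the function only depends on the index object's public attributes, it works with any object that provides them, which also makes it easy to test against a fake connection.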
Another split of responsibilities that comes to mind is below. Here you inject, rather than inherit, which allows you to keep the parts more independent.
class IndexObject:
    # ...
    def query(self, query_dict):
        return self.es_connection.search(self.index, self.type, body=query_dict)

class BookSearcher:
    def __init__(self, index_object):
        self.index_object = index_object

    def find(self, book_key):
        """ Finds a Book """
        query_dict = {
            "query": {
                "match": {
                    "BookKey": book_key
                }
            }
        }
        try:
            results = self.index_object.query(query_dict)
            return results['hits']['hits']
        # FIXME: looks like a bare exception, not great
        except Exception:
            CreateLog.write_log(Exception, 'Could Not Find Book')
I have a list of zipcodes that I want to pull business listings for using the Yelp Fusion API. Each zipcode will require at least one API call (often many more), so I want to be able to keep track of my API usage, as the daily limit is 25,000. I have defined each zipcode as an instance of a user-defined Locale class. This Locale class has a class variable Locale.pulls, which acts as a global counter for the number of pulls.
I want to multithread this using the multiprocessing module, but I am not sure if I need to use locks and, if so, how I would do so. The concern is race conditions, as I need to be sure each thread sees the current number of pulls stored in the Locale.pulls class variable in the pseudocode below.
import multiprocessing.dummy as mt

class Locale():
    pulls = 0
    MAX_PULLS = 20000

    def __init__(self, x, y):
        # initialize the instance with arguments needed to complete the API call
        ...

    def pull(self):
        if Locale.pulls > Locale.MAX_PULLS:
            return None
        else:
            # make the request, store the returned data and increment the counter
            self.data = self.call_yelp()
            Locale.pulls += 1

def main():
    # zipcodes below is a list of arguments needed to initialize each zipcode as a Locale class object
    pool = mt.Pool(len(zipcodes) // 100)  # let each thread work on 100 zipcodes
    data = pool.map(Locale, zipcodes)
A simple solution would be to check that len(zipcodes) < MAX_PULLS before running the map().
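If the cap does need to be enforced mid-run, a lock around the shared counter is the usual approach. multiprocessing.dummy uses threads, so a plain threading.Lock works; the check-and-increment must happen as one atomic step. A minimal sketch with the API call stubbed out (the "response" string stands in for real Yelp data):

```python
import multiprocessing.dummy as mt
import threading

class Locale:
    pulls = 0
    MAX_PULLS = 20000
    _lock = threading.Lock()

    def pull(self):
        # check-and-increment must be atomic, hence the lock;
        # without it, two threads could both pass the check at the cap
        with Locale._lock:
            if Locale.pulls >= Locale.MAX_PULLS:
                return None
            Locale.pulls += 1
        # the (stubbed) API call happens outside the lock,
        # so slow network requests don't serialize the whole pool
        self.data = "response"
        return self.data

locales = [Locale() for _ in range(50)]
with mt.Pool(4) as pool:
    results = pool.map(lambda loc: loc.pull(), locales)

print(Locale.pulls)  # 50: every pull counted exactly once
```

Keeping the network call outside the locked section means the lock only guards the counter, so threads still download concurrently.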