Automatically creating separate instances of one class (Python - Excel/CSV) - excel

My goal is, given an Excel or CSV file, to automatically create instances of one object. After some research, I saw that similar questions have been asked. But in most of the cases, the author only wanted to put instances into a list to print information on them (like: Python creating class instances in a loop).
What I need, is not only to create separate instances of a class, but also to be able to call on these distinct instances later in the code. Also, the main point of this, is that my file is dynamic. The one I put just below is just a toy example, my goal being to be able to automatically process bigger and more complex "models".
Let's have an example. Given the following file:
I would like to create different instances of the following object to store the information given in the file:
class Element
name = ""
Property1 = []
Property2 = []
def add_name(self, name):
self.name = name
def add_pos_reg(self, p1):
self.Property1.append(p1)
def add_neg_reg(self, p2):
self.Property2.append(p2)
I thought of using the classic way of instancing an object in a loop, and then stocking the instances in a list:
ListeElement = []
for i in range(2, max_row):
e=Element("get the property from the file") ## I already have a custom function to get the properties from the file into the instance. ##
ListeElement.append(e)
But then, I think that this way does not create distinct instances, and also I am not even sure that I will be able to call on specific instances stocked in the list later in my code.
I am sorry if this is a redundant question, I usually find what I want to do using the search function on this website, but I am getting stuck there.

Related

Unable to access text from a class using selenium on python

I am willing to parse https://2gis.kz , and I encountered the problem that I am getting error while using .text or any methods used to extract text from a class
I am typing the search query such as "fitness"
My window variable is
all_cards = driver.find_elements(By.CLASS_NAME,"_1hf7139")
for card_ in all_cards:
card_.click()
window = driver.find_element(By.CLASS_NAME, "_18lzknl")
This is a quite simplified version of how I open a mini-window with all of the essential information inside it. Below I am attaching the piece of code where I am trying to extract text from a phone number holder.
texts = window.find_elements(By.CLASS_NAME,'_b0ke8')
print(texts) # this prints out something from where I am concluding that this thing is accessible
try:
print(texts.text)
except:
print(".text")
try:
print(texts.text())
except:
print(".text()")
try:
print(texts.get_attribute("innerHTML"))
except:
print('getAttribute("innerHTML")')
try:
print(texts.get_attribute("textContent"))
except:
print('getAttribute("textContent")')
try:
print(texts.get_attribute("outerHTML"))
except:
print('getAttribute("outerHTML")')
Hi, guys, I solved an issue. The .text was not working for some reason. I guess developers somehow managed to protect information from using this method. I used a
get_attribute("innerHTML") # afaik this allows us to get a html code of a particular class
and now it works like a charm.
texts = window.find_elements(By.TAG_NAME, "bdo")
with io.open("t.txt", "a", encoding="utf-8") as f:
for text in texts:
nums = re.sub("[^0-9]", "",
text.get_attribute("innerHTML"))
f.write(nums+'\n')
f.close()
So the problem was that:
I was trying to print a list of items just by using print(texts)
Even when I tried to print each element of texts variable in a for loop, I was getting an error due to the fact that it was decoded in utf-8.
I hope someone will find it useful and will not spend a plethora of time trying to fix such a simple bug.
find_elements method returns a list of web elements. So this
texts = window.find_elements(By.CLASS_NAME,'_b0ke8')
gives you texts a list of web elements.
You can not apply .text method directly on list.
In order to get each element text you will have to iterate over elements in the list and extract that element text, like this:
text_elements = window.find_elements(By.CLASS_NAME,'_b0ke8')
for element in text_elements:
print(element.text)
Also, I'm not sure about locators you are using.
_1hf7139, _18lzknl and _b0ke8 class names are seem to be dynamic class names i.e they may change each browsing session.

Creating SequenceTaggingDataset from list, not file

I would like to create a SequenceTaggingDataset from two lists that I have created dynamically inside my code - train_sentences and train_tags. I would want to write something like this:
train_data = SequenceTaggingDataset(examples=(zip(train_sentences, train_tags)))
However, the constructor must receive a path. And not only that - it looks from the code as though, even if I were to provide the examples, it will override those, and initialize examples to be an empty list.
For various reasons, I do not want to save the lists I created in a file from which the SequenceTaggingDataset could read. Is there any way around this, save defining my own custom class?
You will need to modify source code for it (https://pytorch.org/text/_modules/torchtext/datasets/sequence_tagging.html#SequenceTaggingDataset). You can make a local copy and import as your module.
path is used in __init__. The important part is that it takes lines from file and splits it using given separator into list named columns. Then this columns list is being fed into another class method together with fields to construct examples list. Please read provided example here to understand fields (Note that UDPOS is called there to create SequenceTaggingDataset).
What you need is columns, which you don't need to read from file as you have all components already. You will feed it directly by simplifying class __init__:
def __init__(self, columns, fields, encoding="utf-8", separator="\t", **kwargs):
examples = []
examples.append(data.Example.fromlist(columns, fields))
super(SequenceTaggingDataset, self).__init__(examples, fields,
**kwargs)
columns is nested list of lists: [[word], [UD_TAG], [PTB_TAG]]. It means that you need to feed following into modified class:
train = SequenceTaggingDataset([train_sentences, train_tags], fields=...)

Call multiple strings inside variable

Python scrub here.
Im still learning python so sorry.
Im trying to create a Dict(i think) that then behaves as a variable called fileshare and then want to call each entry inside the variable called fileshareARN. So basically inside the AWS ARN I want each share to be called. for example I want share-A, share-B, etc to be called each time. Im guessing I need to setup a function or a IF statement but im not sure.
import boto3
client = boto3.client('storagegateway')
fileshare = [share-A, share-B, share-C, share-D]
response = client.refresh_cache(
FileShareARN='arn:aws:storagegateway:us-west-1:AWS-ID:share/{Fileshare-variable, share-ID should go here}.format',
FolderList=['/'],
Recursive=True
)
You're very close! A few notes to preface the answer to assist you on your Python journey:
Python does not allow hyphenated variable names, as hyphens are a reserved operator for subtraction. You only had them listed as placeholders, but figured it would be helpful to know.
lists, arrays, and dictionaries are all different data structures in Python. You can read about them more here: https://docs.python.org/3/tutorial/datastructures.html , but for your particular use case, if you're simply trying to store a collection of variables and iterate over them, a list or array work fine (although a dictionary is usable as well).
In Python, lists and arrays are iterables, which means that they have built-in functions that can naturally be iterated over to sequentially access their constituent values.
Let's go over an example using the following array:
fruits = ['apples','bananas','oranges'],
In other languages, you're probably used to having to define your own loop with the following syntax:
for (int i = 0; i < sizeOf(fruits); i++)
{
print(fruits[i]);
}
Python enables this same functionality much more easily.
for item in fruits:
print(item)
Here, the scope of the term item within the loop is equal to the value that exists at that current index in the array (fruits).
Now, to perform this same functionality for your example, we can use this same technique to loop over your list of ARNs:
import boto3
client = boto3.client('storagegateway')
fileshare = [shareA, shareB, shareC, shareD]
for path in fileshare:
response = client.refresh_cache(
FileShareARN='arn:aws:storagegateway:us-west-1:AWS-ID:share/'+path,
FolderList=['/'],
Recursive=True
)
After changing the placeholder variables you had in fileshare, I wrapped the existing response variable execution with a for loop, and made a slight change to the string appending at the end of your FileShareARN variable.
Hope this helps, and welcome to Python!
Did some more research, found f.string formatting which seems to make python life easy. Also since I am deploying this in AWS Lambda I added a handler.
#!/usr/bin/env python3
import boto3
def default_handler( event, context ):
print(boto3.client('sts').get_caller_identity())
client = boto3.client('storagegateway')
fileshare = ['share-A', 'share-B', 'share-C', 'share-D']
for path in fileshare:
response = client.refresh_cache(
FileShareARN = f"arn:aws:storagegateway:us-west-1:ARN-ID:share/{path}",
FolderList=['/'],
Recursive=True
)
print(response)
default_handler( None, None )

Inheritance and Pandas

I am trying to create a file writer based on Pandas' ExcelWriter. I proceeded as I usually do with classes in Python (3) with inheritance:
import pandas as pd
class Writer(pd.ExcelWriter):
def __init__(self, fname, engine='openpyxl'):
pd.ExcelWriter.__init__(self, fname, engine=engine)
self.newvar = 0
However, when I try to use it, I cannot access newvar:
test = Writer('test.xlsx')
test.newvar
returns:
AttributeError: '_XlsxWriter' object has no attribute 'nmax'
And when I check the type of test, it returns:
pandas.io.excel._XlsxWriter
I don't understand what I am missing since I used this kind of inheritance in many other cases. Any idea would be appreciated!
This is because pandas.ExcelWriter.__new__ returns a different class than itself (actually it is an abc.ABCMeta). The class is chosen based on the extension of the file path and the engine which is used - you could observe that when you checked the type of the newly created instance. That means the __init__ method of whatever class is returned gets called. You can think of ExcelWriter as some kind of proxy for the specific writers for each format and engine (though it also defines the API which such a writer must provide).
In order to make your writer available (for the given engine), you need to register it.
But before you can do that you need to make your class compatible by following the instructions which you'll find via help(pandas.ExcelWriter). For the sake of completeness I cite them here:
# Defining an ExcelWriter implementation (see abstract methods for more...)
# - Mandatory
# - ``write_cells(self, cells, sheet_name=None, startrow=0, startcol=0)``
# --> called to write additional DataFrames to disk
# - ``supported_extensions`` (tuple of supported extensions), used to
# check that engine supports the given extension.
# - ``engine`` - string that gives the engine name. Necessary to
# instantiate class directly and bypass ``ExcelWriterMeta`` engine
# lookup.
# - ``save(self)`` --> called to save file to disk
# - Mostly mandatory (i.e. should at least exist)
# - book, cur_sheet, path
# - Optional:
# - ``__init__(self, path, engine=None, **kwargs)`` --> always called
# with path as first argument.
So with that in mind we can extend your class:
class Writer(pd.ExcelWriter):
engine = 'openpyxl'
supported_extensions = ('xlsx',)
def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
# Implement something useful here.
pass
def save(self):
# Implement something useful here.
pass
def __init__(self, fname, engine='openpyxl', **kwargs):
super().__init__(self, fname, engine=engine, **kwargs)
Now you can use pd.io.excel.register_writer(Writer) to register the writer. But you need to make sure the engine which you've specified matches your version of openpyxl. You can check the process of how a specific writer is chosen here; the writers which are currently registered for each version can be checked via print(pd.io.excel._writers).
As a side note: You can also subclass one of the already available specific writers and reuse their write_cells and save methods for example (however you'll need to register your writer also in that case):
_Openpyxl1Writer
_Openpyxl20Writer
_Openpyxl22Writer
_XlwtWriter
_XlsxWriter

how structure a python3/tkinter project

I'm developing a small application using tkinter and PAGE 4.7 for design UI.
I had designed my interface and generated python source code. I got two files:
gm_ui_support.py: here declare tk variables
gm_ui.py : here declare widget for UI
I'm wondering how this files are supposed to be use, one of my goals is to be able to change the UI as many times as I need recreating this files, so if I put my code inside any of this files will be overwritten each time.
So, my question is:
Where I have to put my own code? I have to extend gm_ui_support? I have to create a 3th class? I do directly at gm_ui_support?
Due the lack of answer I'm going to explain my solution:
It seems that is not possible to keep both files unmodified, so I edit gm_ui_support.py (declaration of tk variables and events callback). Each time I make a change that implies gm_ui_support.py I copy changes manually.
To minimize changes on gm_ui_support I create a new file called gm_control.py where I keep a status dict with all variables (logical and visual) and have all available actions.
Changes on gm_ui_support.py:
I create a generic function (sync_control) that fills my tk variables using a dict
At initialize time it creates my class and invoke sync_control (to get default values defined in control)
On each callback I get extract parameter from event and invoke logical action on control class (that changes state dict), after call to sync_control to show changes.
Sample:
gm_ui_support.py
def sync_control():
for k in current_gm_control.state:
gv = 'var_'+k
if gv in globals():
#print ('********** found '+gv)
if type(current_gm_control.state[k]) is list:
full="("
for v in current_gm_state.state[k]:
if len(full)>1: full=full+','
full=full+"'"+v+"'"
full=full+")"
eval("%s.set(%s)" % (gv, full))
else:
eval("%s.set('%s')" % (gv, current_gm_state.state[k]))
else:
pass
def set_Tk_var():
global current_gm_state
current_gm_control=gm_control.GM_Control()
global var_username
var_username = StringVar()
...
sync_control()
...
def on_select_project(event):
w = event.widget
index = int(w.curselection()[0])
value = w.get(index)
current_gm_control.select_project(value)
sync_state()
...

Resources