XML parsing with a Class call - python-3.x

I parsed an XML file with Python's xml.etree module. It's working well, but now I am trying to call this code as a module/class from a main program. I would like to pass the XML tree and a filename for the CSV output to the class.
My dummy file to call the file with the Class:
import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='phonebook.xml')
root = tree.getroot()
from xml2csv import Vcard_xml2csv
my_export = Vcard_xml2csv(tree, 'phonebook.csv')
my_export.write_csv()
here is the class:
class Vcard_xml2csv:
    """The XML phone book export to a csv file"""
    def __init__(self, tree, csvfilename):
        root = tree.getroot()
        self.csvfilename = csvfilename
    def write_content(contact, csvfilename):
        with open(csvfilename, mode='a+') as phonebook_file:
            contact_writer = csv.writer(phonebook_file, delimiter=',', quotechar=" ", quoting=csv.QUOTE_MINIMAL)
            contact_writer.writerow([contact]) # delete [] if you will see only text separated by a comma
    def write_csv(tree):
        for contacts in tree.iter(tag='phonebook'):
            for contact in contacts.findall("./contact"):
                row=[]
                for category in contact.findall("./category"):
                    print('Category: ',category.text)
                    category=category.text
                    row.append(category)
                for person in contact.findall("./person/realName"):
                    print('realName: ',person.text)
                    realName=person.text
                    row.append(realName)
                for mod_time in contact.findall("./mod_time"):
                    print ('mod_time: ', mod_time.text)
                    mod_time=mod_time.text
                    row.append(mod_time)
                for uniqueid in contact.findall("./uniqueid"):
                    print ('uniqueid: ', uniqueid.text)
                    uniqueid_=uniqueid.text
                    row.append(uniqueid_)
                numberlist=[]
                for number in contact.findall("./telephony/number"):
                    print('id',number.attrib['id'],'type:',number.attrib['type'], 'prio:',number.attrib['prio'], 'number: ',number.text)
                    id_=number.attrib['id']
                    numberlist.append(id_)
                    type_=number.attrib['type']
                    numberlist.append(type_)
                    prio_=number.attrib['prio']
                    numberlist.append(prio_)
                    number_=number.text
                    numberlist.append(number_)
                    contact = row + numberlist
                    write_content(contact, csvfilename)
                    numberlist=[]
I am getting the error below:
for contacts in tree.iter(tag='phonebook'):
AttributeError: 'Vcard_xml2csv' object has no attribute 'iter'
Thanks for your help!

When we define a method in a class, like write_csv() in the example, the first parameter is always the class instance. Think of it as a way to access class attributes and methods. Conventionally, for readability, the first parameter is called self.
You call my_export.write_csv(), so Python passes my_export as that first parameter. Because write_csv declares only tree, tree is bound to the class instance, and that is the reason you see the error. The resolution to this would be to define the method like the following:
def write_csv(self, tree):
    ....
    ....
and the call to the method would be:
my_export.write_csv(tree)
I hope this helps.
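An alternative that keeps the original call my_export.write_csv() unchanged is to store the tree on the instance in __init__ and read it back through self. The following is only a minimal sketch of that idea: the class name VcardXml2Csv, the rows() helper, the reduced set of columns and the inline SAMPLE data are assumptions made for illustration, not part of the original code.

```python
import csv
import xml.etree.ElementTree as ET


class VcardXml2Csv:
    """Export contacts from an XML phone book to a CSV file."""

    def __init__(self, tree, csvfilename):
        # Keep both arguments on the instance so every method reaches them via self.
        self.tree = tree
        self.csvfilename = csvfilename

    def rows(self):
        """Yield one row per telephone number of each contact."""
        for contact in self.tree.iter("contact"):
            base = [
                contact.findtext("category"),
                contact.findtext("person/realName"),
            ]
            for number in contact.findall("telephony/number"):
                yield base + [number.get("type"), number.text]

    def write_csv(self):
        # newline="" lets the csv module control line endings itself.
        with open(self.csvfilename, "w", newline="") as handle:
            csv.writer(handle).writerows(self.rows())


# Hypothetical sample data standing in for phonebook.xml.
SAMPLE = """<phonebooks><phonebook><contact>
<category>0</category><person><realName>Alice</realName></person>
<telephony><number type="home" prio="1" id="0">12345</number></telephony>
</contact></phonebook></phonebooks>"""

tree = ET.ElementTree(ET.fromstring(SAMPLE))
exporter = VcardXml2Csv(tree, "phonebook.csv")
rows = list(exporter.rows())
```

Calling exporter.write_csv() would then write those rows to phonebook.csv without any extra argument.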

Related

Unsuccessful in trying to convert a column of strings to integers in Python (hoping to sort)

I am attempting to sort a dataframe by a column called 'GameId', whose values are currently strings, and when I sort, the result is unexpected. I have tried the following, but the type is still reported as a string.
TEST['GameId'] = TEST['GameId'].astype(int)
type('GameId')
One way to make the data life easier is using dataclasses!
from dataclasses import dataclass
# here we call the dataclass decorator, which uses the type hints to build the class
@dataclass
class Columns:
    channel_id : int
    frequency_hz : int
    power_dBmV : float
    name : str
# this class uses the dataclass to organise the data as data.frequency_hz, data.power_dBmV etc
class RadioChannel:
    radio_values = ['channel_id', 'frequency', 'power_dBmV']
    def __init__(self, data): # self is Python's 'this': it refers to the current instance
        # store the parsed record on the instance so other methods can reach it
        self.data = Columns(channel_id=data[0], frequency_hz=data[1], power_dBmV=data[4], name=data[3])
    def present_data(self):
        # this is an optional method btw
        from rich.console import Console
        from rich.table import Table
        console = Console()
        table = Table(title="My Radio Channels")
        for item in self.radio_values:
            table.add_column(item)
        # rich table cells must be renderable, so convert the values to strings
        table.add_row(str(self.data.channel_id), str(self.data.frequency_hz), str(self.data.power_dBmV))
        console.print(table)
        # ignore this if its confusing
# now inside your functional part of your script
if __name__ == '__main__':
    myData = []
    # calling an imaginary file here to read
    with open("my_radio_data_file", 'r') as myfile:
        mylines = myfile.readlines()
        for line in mylines:
            myData.append(line)
    # the with block closes the file, so no explicit close() is needed
    #my data would look like a string ["value", value, 00, 0.0, "hello joe, from: world"]
    ch1 = RadioChannel(data=myData[0])
    ch1.present_data()
This way you can just construct the class on each line of a data file and print it to see if it lines up. Once you get the hang of it, it starts to get fun.
I used rich console here, but it works well with pandas and normal dataframes!
Dataclasses use the type hints to generate the class structure (the hints themselves are not enforced at runtime).
Good luck and have fun!
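Coming back to the sorting question itself: type('GameId') inspects the string literal 'GameId', not the column, so it reports str no matter what astype(int) did. In pandas the column's type can be checked with TEST['GameId'].dtype instead. The sketch below uses a plain Python list as a stand-in for the column (the sample values are made up) to show both the misleading check and why string sorting looks wrong.

```python
# Hypothetical values standing in for the 'GameId' column.
game_ids = ["10", "9", "2"]

# This checks the type of the literal 'GameId', which is always str.
literal_type = type("GameId")

# Strings sort character by character, so "10" comes before "2".
as_strings = sorted(game_ids)

# Converting to int first gives the expected numeric order.
as_integers = sorted(int(game_id) for game_id in game_ids)

print(literal_type, as_strings, as_integers)
```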

python: multiple functions or abstract classes when dealing with data flow requirement

I have more of a design question, and I am not sure how to handle it. I have a script preprocessing.py where I read a .csv file with a text column that I would like to preprocess by removing punctuation, certain characters, etc.
What I have done now is that I have written a class with several functions as follows:
import pandas as pd
class Preprocessing(object):
    def __init__(self, file):
        self.my_data = pd.read_csv(file)
    def remove_punctuation(self):
        self.my_data['text'] = self.my_data['text'].str.replace('#','')
    def remove_hyphen(self):
        self.my_data['text'] = self.my_data['text'].str.replace('-','')
    def remove_words(self):
        self.my_data['text'] = self.my_data['text'].str.replace('reference','')
    def save_data(self):
        self.my_data.to_csv('my_data.csv')
def preprocessing(file_my):
    f = Preprocessing(file_my)
    f.remove_punctuation()
    f.remove_hyphen()
    f.remove_words()
    f.save_data()
    return f
if __name__ == '__main__':
    preprocessing('/path/to/file.csv')
Although it works fine, I would like to be able to expand the code easily and have smaller classes instead of one large class. So I decided to use an abstract class:
import pandas as pd
from abc import ABC, abstractmethod
my_data = pd.read_csv('/Users/kgz/Desktop/german_web_scraping/file.csv')
class Preprocessing(ABC):
    @abstractmethod
    def processor(self):
        pass
class RemovePunctuation(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('#', '')
class RemoveHyphen(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('-', '')
class Removewords(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('reference', '')
final_result = [cls().processor() for cls in Preprocessing.__subclasses__()]
print(final_result)
So now each class is responsible for one task, but there are a few issues I do not know how to handle since I am new to abstract classes. First, I am reading the file outside the classes, and I am not sure whether that is good practice. If not, should I pass the data as an argument to the processor function, or have another class that is responsible for reading the data?
Second, having one class with several functions allowed for a flow, so every transformation happened in order (i.e., first punctuation is removed, then hyphens are removed, etc.), but I do not know how to handle this order and dependency with abstract classes.
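One possible way to address both points, sketched here under assumptions rather than offered as the definitive design: each step receives the data as an argument and returns the transformed result, and the order is made explicit by listing the step instances instead of relying on __subclasses__(). To stay self-contained the sketch works on a plain string; the names Step, process and run_pipeline are hypothetical, and with pandas the same shape would pass a Series or DataFrame through each step.

```python
from abc import ABC, abstractmethod


class Step(ABC):
    """One preprocessing step: takes text in, returns transformed text."""

    @abstractmethod
    def process(self, text: str) -> str:
        ...


class RemovePunctuation(Step):
    def process(self, text: str) -> str:
        return text.replace("#", "")


class RemoveHyphen(Step):
    def process(self, text: str) -> str:
        return text.replace("-", "")


class RemoveWords(Step):
    def process(self, text: str) -> str:
        return text.replace("reference", "")


def run_pipeline(text: str, steps: list) -> str:
    # Each step receives the output of the previous one, so the list order
    # is the processing order.
    for step in steps:
        text = step.process(text)
    return text


result = run_pipeline(
    "#self-reference check",
    [RemovePunctuation(), RemoveHyphen(), RemoveWords()],
)
print(result)
```

Reading the file then stays in one place (the caller), and adding a step means writing one small class and adding it to the list.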

Loading a class of unknown name in a dynamic location

Currently I am extracting files to the temp directory of the operating system. One of the files is a Python file containing a class which I need to get a handle on. The Python file's name is known, but the name of the class inside the file is unknown. It is safe to assume that there is only one class, and that it is a subclass of another class.
I tried to work with importlib, but I am not able to get a handle of the class.
So far I tried:
# Assume
# module_name contains the name of the class -> "MyClass"
# path_module contains the path to the python file -> "../Module.py"
spec = spec_from_file_location(module_name, path_module)
module = module_from_spec(spec)
for pair in inspect.getmembers(module):
    print(f"{pair[1]} is class: {inspect.isclass(pair[1])}")
When I iterate over the members of the module, none of them get printed as a class.
My class in this case is called BasicModel and the Output on the console looks like this:
BasicModel is class: False
What is the correct approach to this?
Edit:
As the content of the file was requested, here you go:
class BasicModel(Sequential):
    def __init__(self, class_count: int, input_shape: tuple):
        Sequential.__init__(self)
        self.add(Input(shape=input_shape))
        self.add(Flatten())
        self.add(Dense(128, activation=nn.relu))
        self.add(Dense(128, activation=nn.relu))
        self.add(Dense(class_count, activation=nn.softmax))
Use dir() to get the attributes of the module and inspect to check whether an attribute is a class. If so, you can create an object.
Assuming that your file's path is /tmp/mysterious.py you can do this:
import importlib
import inspect
from pathlib import Path
import sys
path_pyfile = Path('/tmp/mysterious.py')
sys.path.append(str(path_pyfile.parent))
mysterious = importlib.import_module(path_pyfile.stem)
for name_local in dir(mysterious):
    if inspect.isclass(getattr(mysterious, name_local)):
        print(f'{name_local} is a class')
        MysteriousClass = getattr(mysterious, name_local)
        mysterious_object = MysteriousClass()
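A variation that stays with importlib.util, as in the question, is sketched below. Two details matter: the module body only runs once spec.loader.exec_module(module) is called, and classes that the file merely imports (such as a base class) can be filtered out by comparing their __module__ with the module name. The helper name load_single_class, the default module name dynamic_module and the temporary Module.py written here are assumptions for the sake of a runnable demonstration.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def load_single_class(path, module_name="dynamic_module"):
    """Return the one class defined in the Python file at `path`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    # Without this call the module body never runs and no class exists yet.
    spec.loader.exec_module(module)
    classes = [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        # Skip classes the file only imported, e.g. its base class.
        if obj.__module__ == module_name
    ]
    if len(classes) != 1:
        raise LookupError(f"expected exactly one class, found {len(classes)}")
    return classes[0]


# Hypothetical file standing in for the extracted Module.py.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "Module.py"
    source.write_text(
        "from collections import OrderedDict\n\n"
        "class BasicModel(OrderedDict):\n"
        "    pass\n"
    )
    found = load_single_class(source)

class_name = found.__name__
print(class_name)
```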

How to store in variable function returning value (kivy properties)

class Data(object):
    def get_key_nicks(self):
        '''
        It returns key and nicks object
        '''
        file = open(self.key_address, 'rb')
        key = pickle.load(file)
        file.close()
        file = open(self.nicks_address, 'rb')
        nicks = pickle.load(file)
        file.close()
        return (key, nicks)
Above is the data API and the function which I want to use in Kivy:
class MainScreen(FloatLayout):
    data = ObjectProperty(Data())
    key, nicks = ListProperty(data.get_key_nicks())
It gives an error like: AttributeError: 'kivy.properties.ObjectProperty' object has no attribute 'get_key_nicks'
Properties are descriptors, which basically means they look like normal attributes when accessed from instances of the class, but at class level they are objects on their own. That's the nature of the problem here - at class level data is an ObjectProperty, even though if you access it from an instance of the class you'll get your Data() object that you passed in as the default value.
That said, I don't know what your code is actually trying to do, do you want key and nicks to be separate ListProperties?
Could you expand a bit more on what you're trying to do?
I think all you actually need to do is:
class MainScreen(FloatLayout):
    data = ObjectProperty(Data())
    def get_key_nicks(self):
        return self.data.get_key_nicks()
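The descriptor behaviour described above can be reproduced without Kivy. The sketch below uses a hand-written descriptor, DefaultProperty, as a hypothetical stand-in for ObjectProperty (it is not Kivy's implementation), and a simplified Data class that returns fixed values instead of unpickling files.

```python
class DefaultProperty:
    """Minimal descriptor, a hypothetical stand-in for a Kivy property."""

    def __init__(self, default):
        self.default = default

    def __set_name__(self, owner, name):
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: return the descriptor object itself.
            return self
        return getattr(instance, self.storage_name, self.default)

    def __set__(self, instance, value):
        setattr(instance, self.storage_name, value)


class Data:
    def get_key_nicks(self):
        # Fixed values replace the pickled files of the original example.
        return ("key", ["nick1", "nick2"])


class MainScreen:
    data = DefaultProperty(Data())

    def get_key_nicks(self):
        # Through self, the descriptor hands back the Data instance.
        return self.data.get_key_nicks()


class_level = MainScreen.data
key, nicks = MainScreen().get_key_nicks()
print(type(class_level).__name__, key, nicks)
```

At class level the name data is the descriptor, which has no get_key_nicks method; only access through an instance yields the Data object.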

Why do I get an error for incorrect number of arguments?

I have:
import datetime
class Animal(object):
    def __init__(self, dob, carnivore):
        self.__dob = dob
        self.__carnivore = carnivore
    @property
    def dob(self):
        return self.__dob
    @dob.setter
    def dob(self, dob):
        self.__dob = dob
    @property
    def carnivore(self):
        return self.__carnivore
    @carnivore.setter
    def carnivore(self, carnivore):
        self.__carnivore = carnivore
    def __str__(self):
        return "DOB: " + str(self.__dob) + "\nCarnivore: " + str(self.__carnivore)
My second class:
import Species.Animal as Animal
import datetime as date
class Amphibian(Animal):
    def __init__(self, dob=date.datetime.now(), carnivore=False, *characteristics):
        super(Animal, self).__init__(dob, carnivore)
        self.__characteristics = []
        for characteristic in characteristics:
            self.__characteristics.append(characteristic)
    @property
    def characteristics(self):
        return self.__characteristics
    @characteristics.setter
    def characteristics(self, characteristic):
        self.__characteristics.append(characteristic)
    def __str__(self):
        characteristics = ""
        for characteristic in self.__characteristics:
            characteristics += str(characteristic)
        return characteristics
Using:
amphibian = Amphibian(date.date(1979, 1, 12), True, "BackBone", "Cold Blooded")
print(amphibian)
I get the error:
Traceback (most recent call last):
  File "C:/Users/Daniel/PycharmProjects/ObjectOrientedSpecies/Species/Amphibian.py", line 7, in <module>
    class Amphibian(Animal):
TypeError: module.__init__() takes at most 2 arguments (3 given)
I am new to Python so I'm not sure what good OO Practices are.
There are two issues I can see. Firstly, super should be given the name of the class you are calling it from, so that line should be super(Amphibian, self).__init__(dob, carnivore) and not super(Animal, self).__init__(dob, carnivore). Python will find the base class Animal itself.
However, the main problem is that the class Animal is (almost certainly) in a file called "Animal.py". Python creates a module called Animal automatically when it imports a file called "Animal.py" (and does the same for all other ".py" files). So your class Animal is, in fact, inside a module called Animal.
Therefore, when you do import Species.Animal as Animal you are importing the module, not the class inside it. So when you do class Amphibian(Animal): your Amphibian class is trying to inherit from the module, not from the class Animal. You need to change your import to the following to get the class Animal instead: from Species.Animal import Animal.
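The error can be reproduced with any module in the base-class position. In this small sketch the standard datetime module stands in for Species.Animal, which is an assumption made only so the snippet runs on its own.

```python
import datetime  # stands in for the module imported as Animal

caught = None
try:
    # The base here is a module object, not a class, so class creation fails.
    class Amphibian(datetime):
        pass
except TypeError as error:
    caught = error

print(type(caught).__name__, caught)
```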
In terms of your style, I don't really see the point of having lots of decorated getters and setters if there is nothing going on inside them except simple getting and setting. Just get rid of the underscores in front of the attribute names and use them directly. You only need getters and setters if something else has to happen when a value is read or assigned.
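Putting the points above together, the two classes could look roughly like the sketch below. Both classes are kept in one file so it runs as-is; in the original layout the second file would start with from Species.Animal import Animal. Two choices here go beyond the answer and are assumptions of this sketch: the Python 3 form super().__init__(...), and a dob default of None resolved at call time, since a default of datetime.now() in the signature is evaluated only once, when the function is defined.

```python
import datetime


class Animal:
    def __init__(self, dob, carnivore):
        # Plain attributes; no getters or setters are needed for simple access.
        self.dob = dob
        self.carnivore = carnivore

    def __str__(self):
        return f"DOB: {self.dob}\nCarnivore: {self.carnivore}"


class Amphibian(Animal):
    def __init__(self, dob=None, carnivore=False, *characteristics):
        if dob is None:
            # Resolved on every call instead of once at definition time.
            dob = datetime.datetime.now()
        super().__init__(dob, carnivore)
        self.characteristics = list(characteristics)

    def __str__(self):
        listed = ", ".join(self.characteristics)
        return f"{super().__str__()}\nCharacteristics: {listed}"


amphibian = Amphibian(datetime.date(1979, 1, 12), True, "BackBone", "Cold Blooded")
description = str(amphibian)
print(description)
```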
