Read data from csv into list of class objects - Python - python-3.x

I'm having trouble figuring this out. Basically I have a .csv file that has 7 employees with their first and last names, employee ID, dept #, and job title. My goal is for def readFile(employees) to accept an empty List (called employees), open the file for reading, and load all the employees from the file into a List of employee objects (employees). I already have my class built as:
class Employee:
    def __init__(self, fname, lname, eid, dept, title):
        self.__firstName = fname
        self.__lastName = lname
        self.__employeeID = int(eid)
        self.__department = int(dept)
        self.__title = title
I have a couple other class methods, but basically I don't quite understand how to properly load the file into a list of objects.

I was able to figure this out. I opened the file and then read a line from it, stripping the \n and splitting my data. I used a while loop to keep reading lines, as long as it wasn't an empty line, and appended it to my empty list. I also had to split the first indexed item as it was first and last name together in the same string and I needed them separate.
def readFile(employees):
    with open("employees.csv", "r") as f:
        line = f.readline().strip().split(",")
        while line != ['']:
            line = line[0].split(" ") + line[1:]
            employees.append(Employee(line[0], line[1], line[2], line[3], line[4]))
            line = f.readline().strip().split(",")
It could most likely be written better and more Pythonically, but it does what I need it to do.
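For what it's worth, here is a sketch of how the same loading could look with the csv module (assuming the same employees.csv layout, with first and last name sharing the first field):
import csv

def readFile(employees):
    """Load Employee objects from employees.csv into the given list."""
    with open("employees.csv", newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            # the first field holds "First Last"; split it into two values
            first, last = row[0].split(" ", 1)
            employees.append(Employee(first, last, row[1], row[2], row[3]))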

Why not use pandas? You could define a pandas DataFrame of employees, use its index to select each employee, and use the column names to select a specific employee attribute.
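For example, a minimal sketch of that idea (the column names here are assumptions, since the file described above has no header row):
import pandas as pd

# Column names are assumed; adjust to match the actual file.
df = pd.read_csv("employees.csv", header=None,
                 names=["name", "employee_id", "department", "title"])

print(df.loc[0, "name"])  # first employee's name
print(df["title"])        # the title column for all employees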


Add column and values to CSV or Dataframe

Brand new to Python and programming. I have a function that extracts a file-creation date from .csv files (the date is included in the file naming convention):
def get_filename_dates(self):
    """Extract date from filename and place it into a list"""
    for filename in self.file_list:
        try:
            date = re.search("([0-9]{2}[0-9]{2}[0-9]{2})",
                             filename).group(0)
            self.file_dates.append(date)
            self.file_dates.sort()
        except AttributeError:
            print("The following files have naming issues that prevented "
                  "date extraction:")
            print(f"\t{filename}")
    return self.file_dates
The data within these files are brought into a DataFrame:
def create_df(self):
    """Create DataFrame from list of files"""
    for i in range(0, len(self.file_dates)):
        self.agg_data = pd.read_csv(self.file_list[i])
        self.agg_data.insert(9, 'trade_date', self.file_dates[i],
                             allow_duplicates=False)
    return self.agg_data
As each file in file_list is worked with, I need to insert its corresponding date into a new column (trade_date).
As written, the value of the last index in the list returned by get_filename_dates() is duplicated into every row of the trade_date column, presumably because self.agg_data is reassigned on each pass, so only the last file (and its date) survives the loop.
My questions:
Is there an advantage to inserting data into the csv file using with open() vs. trying to match each file and corresponding date while iterating through files to create the DataFrame?
If there is no advantage to with open(), is there a different Pandas method that would allow me to manipulate the data as the DataFrame is created? In addition to the data insertion, there's other clean-up that I need to do. As it stands, I wrote a separate function for the clean-up; it's not complex and would be great to run everything in this one function, if possible.
Hope this makes sense -- thank you
You could grab each csv as an intermediate dataframe, do whatever cleaning you need to do, and use pd.concat() to concatenate them all together as you go. Something like this:
def create_df(self):
    """Create DataFrame from list of files"""
    self.agg_data = pd.DataFrame()
    for i, date in enumerate(self.file_dates):
        df_part = pd.read_csv(self.file_list[i])
        df_part['trade_date'] = date
        # --- Any other individual file-level cleanup here ---
        self.agg_data = pd.concat([self.agg_data, df_part], axis=0)
    # --- Any aggregate-level cleanup here ---
    return self.agg_data
It makes sense to do as much of the preprocessing/cleanup as possible at the aggregate level.
I also took the liberty of converting the for-loop to use the more Pythonic enumerate.
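One caveat worth noting (my addition, not part of the answer above): calling pd.concat() inside the loop copies the accumulated frame on every pass. Collecting the parts in a list and concatenating once avoids that, assuming pandas is imported as pd as in the question:
def create_df(self):
    """Create DataFrame from list of files"""
    parts = []
    for i, date in enumerate(self.file_dates):
        df_part = pd.read_csv(self.file_list[i])
        df_part['trade_date'] = date
        # --- per-file cleanup here ---
        parts.append(df_part)
    # concatenate all parts in a single pass
    self.agg_data = pd.concat(parts, axis=0, ignore_index=True)
    return self.agg_data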

Splitting a list entry in Python

I am importing a CSV file into a list in Python. When I split it into list elements and then print an index, the entry is printed like this:
2000-01-03,3.745536,4.017857,3.631696,3.997768,2.695920,133949200
How would I split this list further if I wanted to print just a single element, like this?
2000-01-03
Here is my code so far.
def main():
    list = []
    filename = "AAPL.csv"
    with open(filename) as x:
        for line in x.readlines():
            val = line.strip('\n').split(',')
            list.append(val)
    print(list[2])
Your current code builds a list of lists: precisely, a list (of rows) of lists (of fields).
To extract one single element, say the first field of the third row, you could do:
...
print(list[2][0])
But except for trivial tasks, you should use the csv module when processing CSV files, because it is robust to corner cases like newlines or field separators contained inside fields. Your code could become:
import csv

def main():
    list = []
    filename = "AAPL.csv"
    with open(filename) as x:
        rd = csv.reader(x)
        for val in rd:  # the reader is an iterator of lists of fields
            list.append(val)
    print(list[2][0])

Can't perform reverse web search from a csv file

I've written some code to scrape "Address" and "Phone" for some shop names, and it works fine. However, it takes two parameters that have to be filled in for it to do its job. I expect to do the same from a csv file where "Name" is in the first column and "Lid" is in the second, with the harvested results placed in the third and fourth columns accordingly. At this point, I can't figure out how to perform the search from a csv file. Any suggestion will be vastly appreciated.
import requests
from lxml import html

Names = ["Literati Cafe", "Standard Insurance Co", "Suehiro Cafe"]
Lids = ["3221083", "497670909", "12183177"]
for Name in Names and Lids:
    Page_link = "https://www.yellowpages.com/los-angeles-ca/mip/" + Name.replace(" ", "-") + "-" + Name
    response = requests.get(Page_link)
    tree = html.fromstring(response.text)
    titles = tree.xpath('//article[contains(@class,"business-card")]')
    for title in titles:
        Address = title.xpath('.//p[@class="address"]/span/text()')[0]
        Contact = title.xpath('.//p[@class="phone"]/text()')[0]
        print(Address, Contact)
You can get your Names and Lids lists from CSV like:
import csv

Names, Lids = [], []
with open("file_name.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        Names.append(line["Name"])
        Lids.append(line["Lid"])
(nevermind PEP violations for now ;)). Then you can use these in the rest of your code. I'm not sure what you are trying to achieve with your for Name in Names and Lids: loop, but it's not giving you what you think it is: since Names is non-empty (truthy), Names and Lids evaluates to Lids, so it will not loop through the Names list but only through the Lids list.
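A quick demonstration of that pitfall: and returns its second operand whenever the first is truthy, so the loop variable only ever sees the Lids values:
>>> ["Literati Cafe"] and ["3221083"]
['3221083']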
Also, the first order of optimization should be to replace your loop with a loop over the CSV itself, like:
with open("file_name.csv", "r") as f:
reader = csv.DictReader(f)
for entry in reader:
page_link = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}".format(entry["Name"].replace(" ","-"), entry["Lid"])
# rest of your scraping code...
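And to place the harvested Address and Phone into the third and fourth columns, one option is to write a new output file as you scrape. A sketch (results.csv is a hypothetical output name; writing a fresh file is simpler than editing the input in place):
import csv
import requests
from lxml import html

with open("file_name.csv") as f, open("results.csv", "w", newline="") as out:
    reader = csv.DictReader(f)
    writer = csv.writer(out)
    writer.writerow(["Name", "Lid", "Address", "Phone"])  # header row
    for entry in reader:
        page_link = ("https://www.yellowpages.com/los-angeles-ca/mip/{}-{}"
                     .format(entry["Name"].replace(" ", "-"), entry["Lid"]))
        tree = html.fromstring(requests.get(page_link).text)
        for title in tree.xpath('//article[contains(@class,"business-card")]'):
            address = title.xpath('.//p[@class="address"]/span/text()')[0]
            phone = title.xpath('.//p[@class="phone"]/text()')[0]
            writer.writerow([entry["Name"], entry["Lid"], address, phone])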

convert data string to list

I'm having some trouble processing some input.
I am reading data from a log file and storing the different values according to the name.
So my input string consists of an ip, a name, a time and a data value.
A log line looks like this, and it has \t spacing:
134.51.239.54 Steven 2015-01-01 06:09:01 5423
I'm reading in the values using this code:
loglines = file.splitlines()
data_fields = loglines[0]  # IP NAME DATE DATA
for loglines in loglines[1:]:
    items = loglines.split("\t")
    ip = items[0]
    name = items[1]
    date = items[2]
    data = items[3]
This works quite well, but I need to extract all the names into a list, and I haven't found a working solution.
When I use print(name) I get:
Steven
Max
Paul
I do need a list of the names like this:
['Steven', 'Max', 'Paul',...]
There is probably a simple solution and I haven't figured it out yet, but can anybody help?
Thanks
Just create an empty list and add the names as you loop through the file.
Also note that if that file is very large, file.splitlines() is probably not the best idea, as it reads the entire file into memory -- and then you basically copy all of that by doing loglines[1:]. Better to use the file object itself as an iterator. And don't use file as a variable name: it shadowed a built-in type in Python 2 and is still confusing.
with open("some_file.log") as the_file:
data_fields = next(the_file) # consumes first line
all_the_names = [] # this will hold the names
for line in the_file: # loops over the rest
items = line.split("\t")
ip, name, date, data = items # you can put all this in one line
all_the_names.append(name) # add the name to the list of names
Alternatively, you could use zip and map to put it all into one expression (using that loglines data), but you rather shouldn't do that... list(zip(*map(lambda s: s.split('\t'), loglines[1:])))[1] (note the list() call: in Python 3, zip returns an iterator, which can't be indexed directly).
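Another option (not in the original answer, just a sketch) is to let the csv module do the tab-splitting for you:
import csv

with open("some_file.log") as the_file:
    reader = csv.reader(the_file, delimiter="\t")
    next(reader)  # skip the header row
    all_the_names = [row[1] for row in reader]  # NAME is the second field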

text file reading and writing, ValueError: need more than 1 value to unpack

I need to make a program, in a single def, that opens a text file 'grades' where first name, last name and grade are separated by commas, with each line a separate student. It then displays the students and grades as well as the class average, then adds another student and grade and saves everything to the text file, including the old students.
I guess I just don't understand the way Python goes through the text file. If I comment out 'lines', I see it print old_names, but it's as if everything is gone afterwards. When lines is not commented out, 'old_names' is not printed, which makes me think the file is closed? Or empty? However, everything is still in the txt file as it should be.
Currently I get this error, which I'm pretty sure is telling me there's no information in 'line':
File "D:\Dropbox\Dropbox\1Python\Batch Processinga\grades.py", line 45, in main
first_name[i], last_name[i], grades[i] = line.split(',')
ValueError: need more than 1 value to unpack
The end goal is for it to give me the current student names and grades, plus the average; then add one student, save that student and grade to the file, and then be able to pull the file back up with all the students, including the new one, and do it all over again.
I apologize for being a nub.
def main():
    #Declare variables
    #List of strings: first_name, last_name
    first_name = []
    last_name = []
    #List of floats: grades
    grades = []
    #Float grade_avg, new_grade
    grade_avg = new_grade = 0.0
    #string new_student
    new_student = ''
    #Intro
    print("Program displays information from a text file to")
    print("display student first name, last name, grade and")
    print("class average then allows user to enter another")
    print("student.\t")
    #Open file "grades.txt" for reading
    infile = open("grades.txt", "r")
    lines = infile.readlines()
    old_names = infile.read()
    print(old_names)
    #Write for loop for each line creating a list
    for i in len(lines):
        #read in line
        line = infile.readline()
        #Split data
        first_name[i], last_name[i], grades[i] = line.split(',')
        #convert grades to floats
        grades[i] = float(grades[i])
        print(first_name, last_name, grades)
    #close the file
    infile.close()
    #perform calculations for average
    grade_avg = float(sum(grades)/len(grades))
    #display results
    print("Name\t\t Grade")
    print("----------------------")
    for n in range(5):
        print(first_name[n], last_name[n], "\t", grades[n])
    print('')
    print('Average Grade:\t% 0.1f' % grade_avg)
    #Prompt user for input of new student and grade
    new_student = input('Please enter the First and Last name of new student:\n').title()
    new_grade = eval(input("Please enter {}'s grade:".format(new_student)))
    #Write new student and grade to grades.txt in same format as other records
    new_student = new_student.split()
    new_student = str(new_student[1] + ',' + new_student[0] + ',' + str(new_grade))
    outfile = open("grades.txt", "w")
    print(old_names, new_student, file=outfile)
    outfile.close()
File objects in Python have a "file pointer", which keeps track of what data you've already read from the file. It uses this to know where to start looking when you call read or readline or readlines. Calling readlines moves the file pointer all the way to the end of the file; subsequent read calls will return an empty string. This explains why you're getting a ValueError on the line.split(',') line. line is an empty string, so line.split(",") returns a list of length 0, but you need a list of length 3 to do the triple assignment you're attempting.
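You can watch the pointer in action with a quick illustrative snippet (assuming grades.txt exists):
with open("grades.txt") as infile:
    lines = infile.readlines()  # pointer is now at end-of-file
    print(infile.read())        # prints '' -- nothing left to read
    infile.seek(0)              # rewind the pointer to the start
    print(infile.readline())    # the first line again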
Once you get the lines list, you don't need to interact with the infile object any more. You already have all the lines; you may as well simply iterate through them directly.
#Write for loop for each line creating a list
for line in lines:
    columns = line.split(",")
    first_name.append(columns[0])
    last_name.append(columns[1])
    grades.append(float(columns[2]))
Note that I'm using append instead of listName[i] = whatever. This is necessary because Python lists will not automatically resize themselves when you try to assign to an index that doesn't exist yet; you'll just get an IndexError. append, on the other hand, will resize the list as desired.
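A quick illustration of the difference:
names = []
# names[0] = "Ada"     # IndexError: list assignment index out of range
names.append("Ada")    # grows the list to length 1
names.append("Grace")  # grows it to length 2
print(names)           # ['Ada', 'Grace']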
