Reading from CSV, updating a value and then re-writing - python-3.x

I am trying to read from a CSV file, update a field based on the user's selection, and write the contents (including the amendment) to a new CSV file. I have managed everything except that my solution only writes the amended line, not the rest of the file's contents.
The csv file looks like:
1001, item1, 0.5, 10
1002, item2, 1.5, 20
Here is an example of my attempt:
run="yes"
while run=="yes":
id=input("Enter the id of the product you want to order: ")
amount=input("Enter the quantity: ")
reader = csv.reader(open('items.csv', 'r'))
writer = csv.writer(open('updatedstock.csv','w'))
for row in reader:
if id==row[0]:
name=row[1]
price=row[2]
stock=row[3]
newstock=int(stock)-int(amount)
writer.writerow([id, name, price, newstock])
run=input("Do you want to order another item? yes/no ")

You are currently writing a row to the new file only when the id matches, because of the condition:
if id==row[0]:
Change it to always write the row:
run="yes"
while run=="yes":
id=input("Enter the id of the product you want to order: ")
amount=input("Enter the quantity: ")
reader = csv.reader(open('items.csv', 'r'))
writer = csv.writer(open('updatedstock.csv','w'))
for row in reader:
if id==row[0]:
name=row[1]
price=row[2]
stock=row[3]
newstock=int(stock)-int(amount)
writer.writerow([id, name, price, newstock])
else:
writer.writerow(row)
run=input("Do you want to order another item? yes/no ")
If you plan to replace many values, however, this can be very inefficient, as the CSV will be re-read for each change. It would be better to collect all the values you want to change first and then make a single pass over the CSV, modifying the desired entries (see the sketch below).
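For instance, a minimal sketch of that batch approach (assuming the same items.csv layout as above; the orders dict is just an illustration) might look like:
import csv

# Collect every requested order first: product id -> total quantity
orders = {}
run = "yes"
while run == "yes":
    id = input("Enter the id of the product you want to order: ")
    amount = int(input("Enter the quantity: "))
    orders[id] = orders.get(id, 0) + amount
    run = input("Do you want to order another item? yes/no ")

# Then make a single pass over the file, applying all the changes at once
with open('items.csv', 'r', newline='') as infile, \
     open('updatedstock.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        if row[0] in orders:
            row[3] = int(row[3]) - orders[row[0]]  # adjust the stock column
        writer.writerow(row)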
Even better would be to use a database such as SQLite, which performs much better for lookups and writes. Of course, the data would no longer be in a human-readable format as it sits in the filesystem, but you can easily export an SQLite database to a .csv if needed: Export from sqlite to csv using shell script.
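As a rough illustration only (the database file, table and column names here are made up for the example), the same update in SQLite avoids rewriting the whole file:
import sqlite3

conn = sqlite3.connect('stock.db')  # hypothetical database file
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS items "
            "(id TEXT PRIMARY KEY, name TEXT, price REAL, stock INTEGER)")
cur.execute("INSERT OR IGNORE INTO items VALUES ('1001', 'item1', 0.5, 10)")

# Update the stock for one product in place
cur.execute("UPDATE items SET stock = stock - ? WHERE id = ?", (5, '1001'))
conn.commit()
conn.close()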

Related

dynamically change header of a CSV file

I'm trying to write a command-line program to track the updated value of investments. I want to save their values in a CSV file using the csv library and keep a history of those values over time. There are some constant investments like ['Gold','Stock','Bit-coin'], and I have added them as a default header in my program like this:
+------------+-------+------+----------+
| Date       | Stock | Gold | Bit-coin |
+------------+-------+------+----------+
But I want my program to let the user add other categories, named however they like, and to edit the header whenever they want.
Is there any way to dynamically edit header names as new column data is added to the CSV file?
import csv

with open('C:/test/test.csv', 'r') as csvinput:
    with open('C:/test/output.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)

        all_rows = []
        row = next(reader)       # header row
        row.append('Berry')      # add the new column name
        all_rows.append(row)

        for row in reader:
            row.append(row[0])   # fill the new column (here: a copy of the first)
            all_rows.append(row)

        writer.writerows(all_rows)
Here you can append a new column to your CSV file.
new_column = input("Enter a new column name:")
With this line you can take the new column name from user input.
You should be able to complete the code on your own; a sketch of one way to put the pieces together follows. By the way, when posting a question, please include the code you're working with so we can build on it when suggesting a solution.
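For what it's worth, a minimal sketch combining the two pieces (the file paths come from your example; writing an empty string into existing rows is just one choice):
import csv

new_column = input("Enter a new column name:")

with open('C:/test/test.csv', 'r', newline='') as csvinput, \
     open('C:/test/output.csv', 'w', newline='') as csvoutput:
    reader = csv.reader(csvinput)
    writer = csv.writer(csvoutput)

    header = next(reader)
    header.append(new_column)  # extend the header with the user's choice
    writer.writerow(header)

    for row in reader:
        row.append('')         # leave the new column empty for existing rows
        writer.writerow(row)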

Add column and values to CSV or Dataframe

Brand new to Python and programming. I have a function that extracts a file creation date from .csv files (the date is included in the file naming convention):
def get_filename_dates(self):
    """Extract date from filename and place it into a list"""
    for filename in self.file_list:
        try:
            date = re.search("([0-9]{2}[0-9]{2}[0-9]{2})",
                             filename).group(0)
            self.file_dates.append(date)
            self.file_dates.sort()
        except AttributeError:
            print("The following files have naming issues that prevented "
                  "date extraction:")
            print(f"\t{filename}")
    return self.file_dates
The data within these files are brought into a DataFrame:
def create_df(self):
    """Create DataFrame from list of files"""
    for i in range(0, len(self.file_dates)):
        self.agg_data = pd.read_csv(self.file_list[i])
        self.agg_data.insert(9, 'trade_date', self.file_dates[i],
                             allow_duplicates=False)
    return self.agg_data
As each file in file_list is worked with, I need to insert its corresponding date into a new column (trade_date).
As written here, the value of the last index in the list returned by get_filename_dates() is duplicated into every row of the trade_date column -- presumably because read_csv() opens and closes each file before the next line.
My questions:
Is there an advantage to inserting data into the csv file using with open() vs. trying to match each file and corresponding date while iterating through files to create the DataFrame?
If there is no advantage to with open(), is there a different Pandas method that would allow me to manipulate the data as the DataFrame is created? In addition to the data insertion, there's other clean-up that I need to do. As it stands, I wrote a separate function for the clean-up; it's not complex and would be great to run everything in this one function, if possible.
Hope this makes sense -- thank you
You could grab each csv as an intermediate dataframe, do whatever cleaning you need to do, and use pd.concat() to concatenate them all together as you go. Something like this:
def create_df(self):
    """Create DataFrame from list of files"""
    self.agg_data = pd.DataFrame()
    for i, date in enumerate(self.file_dates):
        df_part = pd.read_csv(self.file_list[i])
        df_part['trade_date'] = date
        # --- Any other individual file-level cleanup here ---
        self.agg_data = pd.concat([self.agg_data, df_part], axis=0)
    # --- Any aggregate-level cleanup here ---
    return self.agg_data
It makes sense to do as much of the preprocessing/cleanup as possible at the aggregate level.
I also took the liberty of converting the for-loop to use the more Pythonic enumerate.
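If the file list is long, one variant worth considering (same idea, just deferring the concatenation) is to collect the parts in a list and call pd.concat() once at the end, since concatenating inside the loop re-copies the accumulated frame on every iteration:
import pandas as pd

def create_df(self):
    """Create DataFrame from list of files"""
    parts = []
    for i, date in enumerate(self.file_dates):
        df_part = pd.read_csv(self.file_list[i])
        df_part['trade_date'] = date
        # --- Any other individual file-level cleanup here ---
        parts.append(df_part)
    self.agg_data = pd.concat(parts, axis=0, ignore_index=True)
    # --- Any aggregate-level cleanup here ---
    return self.agg_data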

Deleting a particular column/row from a CSV file using python

I want to delete a particular row when a given user input matches a value in a column.
Let's say I get an employee ID and delete all of its corresponding values in that row.
I'm not sure how to approach this problem; other sources suggest copying all values to a temporary CSV file and iterating again.
Since these are very primitive requirements, I would just do it manually.
Read the file line by line: if you want to delete the current line, just don't write it back.
If you want to delete a column, parse each line as CSV (using the csv module -- do not use .split(',')!) and discard the correct column; a sketch for this case follows below.
The upside of these solutions is that they are very light on memory and about as fast as it gets at runtime.
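For the column case, a minimal sketch with the csv module (the file names and column index are just examples):
import csv

drop_index = 2  # position of the column to remove, for example

with open('input.csv', newline='') as infile, \
     open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        del row[drop_index]  # discard the unwanted column
        writer.writerow(row)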
That's pretty much the way to do it.
Something like:
import shutil

file_path = "test.csv"

# Creates a test file
data = ["Employee ID,Data1,Data2",
        "111,Something,Something",
        "222,Something,Something",
        "333,Something,Something"]
with open(file_path, 'w') as write_file:
    for item in data:
        write_file.write(item + "\n")
# /Creates a test file

input("Look at the test.csv file if you like, close it, then press enter.")

employee_ID = "222"
with open(file_path) as read_file:
    with open("temp_file.csv", 'w') as temp_file:
        for line in read_file:
            if employee_ID in line:
                continue  # skip the matching line instead of writing it
            temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically, as in the sketch below.
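A rough sketch of that stricter check, assuming the employee ID is the first column as in the test data above:
import csv
import shutil

employee_ID = "222"
with open("test.csv", newline='') as read_file, \
     open("temp_file.csv", 'w', newline='') as temp_file:
    reader = csv.reader(read_file)
    writer = csv.writer(temp_file)
    for row in reader:
        if row and row[0] == employee_ID:  # compare the ID column exactly
            continue                       # skip = delete this row
        writer.writerow(row)
shutil.move("temp_file.csv", "test.csv")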

How do I even start sorting/getting data from a .txt file in Python?

So my task involves finding and printing the name of the player with the most games, shots, etc. from a .txt file that looks like this:
Rk|Player|Age|Games|Minutes Played|Field Goals|Field Goal Attempts|3P Field Goals|3P Field Goal Attempts
1|Quincy Acy|24|60|1110|126|278|12|47
2|Jordan Adams|20|24|173|22|51|7|16
...
484|Cody Zeller|22|62|1487|172|373|1|1
485|Tyler Zeller|25|74|1560|300|550|0|0
I was thinking about making empty lists and filling them with, for example, the "Games" values and then pulling out the max value, but I don't understand how to pull out the number of games in the first place.
You can use Python's csv module.
Your code might look like this:
import csv

with open('games.csv', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter='|')
    for row in csv_reader:
        print(f"Player name: {row['Player']}, Rank: {row['Rk']}")
Refer to the documentation for more details.
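To then find, say, the player with the most games, one approach (assuming the column names shown in the sample data) is to let max() scan the reader with the Games field as the key:
import csv

with open('games.csv', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter='|')
    top = max(csv_reader, key=lambda row: int(row['Games']))

print(f"Most games: {top['Player']} ({top['Games']})")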

convert data string to list

I'm having some trouble processing some input.
I am reading data from a log file and storing the different values according to the name.
So my input string consists of an IP, a name, a time and a data value.
A log line looks like this, with tab-separated fields:
134.51.239.54 Steven 2015-01-01 06:09:01 5423
I'm reading in the values using this code:
loglines = file.splitlines()
data_fields = loglines[0]  # IP NAME DATE DATA
for logline in loglines[1:]:
    items = logline.split("\t")
    ip = items[0]
    name = items[1]
    date = items[2]
    data = items[3]
This works quite well, but I need to extract all the names into a list, and I haven't found a working solution.
When I use print(name) I get:
Steven
Max
Paul
I need a list of the names like this:
['Steven', 'Max', 'Paul', ...]
There is probably a simple solution I just haven't figured out yet; can anybody help?
Thanks
Just create an empty list and add the names as you loop through the file.
Also note that if the file is very large, file.splitlines() is probably not the best idea, as it reads the entire file into memory -- and then you basically copy all of that by doing loglines[1:]. Better to use the file object itself as an iterator. And don't use file as a variable name, as it shadows the built-in type.
with open("some_file.log") as the_file:
data_fields = next(the_file) # consumes first line
all_the_names = [] # this will hold the names
for line in the_file: # loops over the rest
items = line.split("\t")
ip, name, date, data = items # you can put all this in one line
all_the_names.append(name) # add the name to the list of names
Alternatively, you could use zip and map to put it all into one expression (using that loglines data), but you rather shouldn't do that: list(zip(*map(lambda s: s.split('\t'), loglines[1:])))[1]
