How to get specific column value from .csv Python3? - python-3.x

I have a .csv file with Bitcoin price and market data, and I want to get the 5th and 7th columns from the last row in the file. I have worked out how to get the last row, but I'm not sure how to extract columns (values) 5 and 7 from it. Code:
with open('BTCAUD_data.csv', mode='r') as BTCAUD_data:
writer = csv.reader(BTCAUD_data, delimiter=',')
data = list(BTCAUD_data)[-1]
print(data)
Edit: How would I also add column names, and would adding them help me? (I have already manually put the names into individual columns in the first line of the file itself)
Edit #2: Forget about the column names, they are unimportant. I still don't have a working solution. I have a vague idea that I'm not actually reading the file as a list, but rather as a string. (This means when I subscript the data variable, I get a single digit, rather than an item in a list) Any hints to how I read the line as a list?
Edit #3: I have got everything working to expectations now, thanks for everyone's help :)

Your code never uses the csv-reader. You can do so like this:
import csv
# This creates a file with demo data
with open('BTCAUD_data.csv', 'w') as f:
f.write(','.join( f"header{u}" for u in range(10))+"\n")
for l in range(20):
f.write(','.join( f"line{l}_{c}" for c in range(10))+"\n")
# this reads and processes the demo data
with open('BTCAUD_data.csv', 'r', newline="") as BTCAUD_data:
reader = csv.reader(BTCAUD_data, delimiter=',')
# 1st line is header
header = next(reader)
# skip through the file, row will be the last line read
for row in reader:
pass
print(header)
print(row)
# each row is a list and you can index into it
print(header[4], header[7])
print(row[4], row[7])
Output:
['header0', 'header1', 'header2', 'header3', 'header4', 'header5', 'header6', 'header7', 'header8', 'header9']
['line19', 'line19', 'line19', 'line19', 'line19', 'line19', 'line19', 'line19', 'line19', 'line19']
header4 header7
line19_4 line19_7

Better use pandas for handling CSV file.
import pandas as pd
df=pd.read_csv('filename')
df.column_name will give the corresponding column
If you read this csv file into df and try df.Year will give you the Year column.

Related

Python problems writing rows in CSV

I have this script that reads a CSV and saves the second column to a list, I'm trying to get it to write the contents of the list to a new CSV. The problem is every entry should have its own row but the new file sets everything into the same row.
I've tried moving the second with open code to within the first with open and I've tried adding a for loop to the second with open but no matter what I try I don't get the right results.
Here is the code:
import csv
col_store=[]
with open('test-data.csv', 'r') as rf:
reader = csv.reader(rf)
for row in reader:
col_store.append(row[1])
with open('meow.csv', 'wt') as f:
csv_writer = csv.writer(f)
csv_writer.writerows([col_store])
In your case if you have a column of single letters/numbers then Y.R answer will work.
To have a code that works in all cases, use this.
with open('meow.csv', 'wt') as f:
csv_writer = csv.writer(f)
csv_writer.writerows(([_] for _ in col_store))
From here it is mentioned that writerows expect an an iterable of row objects. Every row object should be an iterable of strings or numbers for Writer objects
The problem is that you are using 'writerows' treating 'col_store' as a list with one item.
The simplest approach to fixing this is calling
csv_writer.writerows(col_store)
# instead of
csv_writer.writerows([col_store])
However, this will lead to a probably unwanted result - having blank lines between the lines.
To solve this, use:
with open('meow.csv', 'wt', newline='') as f:
csv_writer = csv.writer(f)
csv_writer.writerows(col_store)
For more about this, see CSV file written with Python has blank lines between each row
Note: writerows expects 'an iterable of row objects' and 'row objects must be an interable of strings or numbers'.
(https://docs.python.org/3/library/csv.html)
Therefore, in the generic case (trying to write integers for examlpe), you should use Sam's solution.

Deleting a particular column/row from a CSV file using python

I want to delete a particular row from a given user input that matches with a column.
Let's say I get an employee ID and delete all it's corresponding values in the row.
Not sure how to approach this problem and other sources suggest using a temporary csv file to copy all values and re-iterate.
Since these are very primitive requirements, I would just do it manually.
Read it line by line - if you want to delete the current line, just don't write it back.
If you want to delete a column, for each line, parse it as csv (using the module csv - do not use .split(',')!) and discard the correct column.
The upside of these solutions is that it's very light on the memory and as fast as it can be runtime-wise.
That's pretty much the way to do it.
Something like:
import shutil
file_path = "test.csv"
# Creates a test file
data = ["Employee ID,Data1,Data2",
"111,Something,Something",
"222,Something,Something",
"333,Something,Something"]
with open(file_path, 'w') as write_file:
for item in data:
write_file.write(item + "\n")
# /Creates a test file
input("Look at the test.csv file if you like, close it, then press enter.")
employee_ID = "222"
with open(file_path) as read_file:
with open("temp_file.csv", 'w') as temp_file:
for line in read_file:
if employee_ID in line:
next(read_file)
temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically.

Python 3.7 Start reading from a specific point within a csv

Hey I could really use help here. I've tried for 1 hour to find a solution for python but was unable to find it.
I am using Python 3.7
My input is a file provided by a customer - I cannot change it. It is structured in the following way:
It starts with random text not in CSV format and from line 3 on the rest of the file is in csv format.
text line
text line
text line or nothing
Enter
[Start of csv file] "column Namee 1","column Namee 2" .. until 6
"value1","value2" ... until 6 - continuing for many lines.
I wanted to extract the first 3 lines to create a pure CSV file but was unable to find code to only do it for a specific line range. It also seems the wrong solution as I think starting to read from a certain point should be possible.
Then I thought split () is the solution but it did not work for this format. The values are sometimes numbers, dates or strings. You cannot use the seek() method as they start differently.
Right now my dictreader takes the first line as an index and consequently the rest is rendered in chaos.
import csv
import pandas as pd
from prettytable import PrettyTable
with open(r'C:\Users\Hans\Downloads\file.csv') as csvfile:
csv_reader = csv.DictReader (r'C:\Users\Hans\Downloads\file.csv', delimiter=',')
for lines in csvfile:
print (lines)
If some answer for python has been found please link it, I was not able to find it.
Thank you so much for your help. I really appreciate it.
I will insist with the pandas option, given that the documentation clearly states that the skiprows parameter allows to skip n number of lines. I tried it with the example provided by #Chris Doyle (saving it to a file named line_file.csv) and it works as expected.
import pandas as pd
f = pd.read_csv('line_file.csv', skiprows=3)
Output
name num symbol
0 chris 4 $
1 adam 7 &
2 david 5 %
If you know the number of lines you want to skip then just open the file and read that many lines then pass the filehandle to Dictreader and it will read the remaining lines.
import csv
skip_n_lines = 3
with open('test.dat') as my_file:
for _ in range(skip_n_lines):
print("skiping line:", my_file.readline(), end='')
print("###CSV DATA###")
csv_reader = csv.DictReader(my_file)
for row in csv_reader:
print(row)
FILE
this is junk
this is more junk
last junk
name,num,symbol
chris,4,$
adam,7,&
david,5,%
OUTPUT
skiping line: this is junk
skiping line: this is more junk
skiping line: last junk
###CSV DATA###
OrderedDict([('name', 'chris'), ('num', '4'), ('symbol', '$')])
OrderedDict([('name', 'adam'), ('num', '7'), ('symbol', '&')])
OrderedDict([('name', 'david'), ('num', '5'), ('symbol', '%')])

compiling a regex query with a row in python

I have two csv files. I want to take each row in turn of csv1, find its entry in csv 2 and then pull out other information for that row from csv2.
I'm struggling with the items I am looking for are company names. In both csv1 and csv2 they may or may not have the suffix 'LTD, LTD, Limited or LIMITED' within the name so I would like the query to not include these substrings in the query. My code is still only finding exact matches, rather than ignoring 'Ltd' etc. I'm guessing it's the way I'm combining 'row[0]' with the regex query but can't figure it out.
Code
import re, csv
with open (r'c:\temp\noregcompanies.csv', 'rb') as q:
readerM=csv.reader(q)
for row in readerM:
companySourcename = row[0]+"".join(r'.*(?!Ltd|Limited|LTD|LIMITED).*')
IBcompanies = re.compile(companySourcename)
IBcompaniesString = str(companySourcename)
with open (r'c:\temp\chdata.csv', 'rb') as f:
readerS = csv.reader(f)
for row in readerS:
companyCHname = row[0]+"".join(r'.*(?!Ltd|Limited|LTD|LIMITED)*')
CHcompanies = re.compile(companyCHname)
if CHcompanies.match(IBcompaniesString):
print ('Match is: ',row [0], row[1])
with open (r'c:\temp\outputfile.csv', 'ab') as o:
writer = csv.writer(o, delimiter=',')
writer.writerow(row) t

Original order of columns in csv not retained in unicodecsv.DictReader

I am trying read a CSV file into python 3 using unicodecsv library. Code follows :
with open('filename.csv', 'rb') as f:
reader = unicodecsv.DictReader(f)
Student_Data = list(reader)
But the order of the columns in the CSV file is not retained when I output any element from the Student_Data. The output contains any random order of the columns. Is there anything wrong with the code? How do I fix this?
As stated in csv.DictReader documentation, the DictReader object behaves like a dict - so it is not ordered.
You can obtain the list of the fieldnames with:
reader.fieldnames
But if you only want to obtain a list of the field values, in original order, you can just use a normal reader:
with open('filename.csv', 'rb') as f:
reader = unicodecsv.reader(f)
for row in reader:
Student_Data = row

Resources