Python list() vs append() - python-3.x

I'm trying to create a list of lists from a csv file.
Row 1 of CSV is a line describing the data source
Row 2 of CSV is the header
Row 3 of CSV is where the data starts
There are two ways I can go about it, but I don't know why they behave differently.
The first uses list(), and for some reason the result ignores row 1 and row 2 of the CSV.
data = []
with open(datafile, 'rb') as f:
    for line in f:
        data = list(csv.reader(f, delimiter=','))
return (name, data)
Whereas if I use .append(), I'd have to use next() to skip row 2:
data = []
with open(datafile, 'rb') as f:
    file = csv.reader(f, delimiter=',')
    next(file)
    for line in file:
        data.append(line)
return (name, data)
Why does list() ignore the header row, whereas append() doesn't?

Actually, this is not related to Python's list() or append(); it is down to the logic used in the first snippet.
The program is not skipping the header, it is replacing it.
On every iteration of the loop you assign a new value to data, so each pass builds a new list and overwrites everything that was there previously.
Corrected code:
data = []
with open(datafile, 'r') as f:
    next(f)  # skip the first row
    for line in f:
        data.extend(line.strip().split(","))
return (name, data)
extend() just grows the existing list with the items of the list passed as an argument, and there is no problem with the 2nd snippet.
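For reference, here is a minimal sketch (not from the original answer) of a list-of-lists version that matches the layout described in the question, skipping both the description line and the header. datafile and name are the question's own placeholders, wrapped in a function so the return makes sense:

import csv

def read_rows(datafile, name):
    data = []
    # newline='' is the documented way to open files for the csv module in Python 3
    with open(datafile, 'r', newline='') as f:
        reader = csv.reader(f, delimiter=',')
        next(reader)  # row 1: data-source description
        next(reader)  # row 2: header
        for line in reader:
            data.append(line)  # each line is already a list of field strings
    return (name, data)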

Related

Python problems writing rows in CSV

I have this script that reads a CSV and saves the second column to a list, and I'm trying to get it to write the contents of that list to a new CSV. The problem is that every entry should have its own row, but the new file puts everything into the same row.
I've tried moving the second with open block inside the first one, and I've tried adding a for loop to the second with open, but no matter what I try I don't get the right results.
Here is the code:
import csv

col_store = []
with open('test-data.csv', 'r') as rf:
    reader = csv.reader(rf)
    for row in reader:
        col_store.append(row[1])

with open('meow.csv', 'wt') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows([col_store])
In your case, if you have a column of single letters/numbers, then Y.R's answer will work.
To have code that works in all cases, use this:
with open('meow.csv', 'wt') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(([_] for _ in col_store))
As the csv documentation puts it, writerows expects an iterable of row objects, and every row object should be an iterable of strings or numbers for writer objects.
The problem is that you are passing [col_store] to writerows, which treats the whole of col_store as a single row.
The simplest approach to fixing this is calling
csv_writer.writerows(col_store)
# instead of
csv_writer.writerows([col_store])
However, this will probably lead to an unwanted result: blank lines between the rows.
To solve this, use:
with open('meow.csv', 'wt', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(col_store)
For more about this, see CSV file written with Python has blank lines between each row
Note: writerows expects 'an iterable of row objects', and 'a row object must be an iterable of strings or numbers'.
(https://docs.python.org/3/library/csv.html)
Therefore, in the generic case (trying to write integers, for example), you should use Sam's solution.
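Putting the two fixes together, a minimal sketch of the writing half, assuming col_store already holds the second-column values collected above:

import csv

# newline='' suppresses the extra blank lines on Windows, and wrapping each
# value in a one-item list makes it a row of its own.
with open('meow.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows([value] for value in col_store)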

Using DictReader to read a csv file that contains a variable number of fields that have the same fieldname

Using DictReader and given a file that contains data like so:
First ,Last,fruit,fruit,fruit,fruit,fruit,fruit
Carl,Yung,apple,watermelon,,,,
Louis,Pasteur,banana,grape,mango,,,
Marie,Curie,watermelon,apple,banana,,,
How do I assign any non-empty "fruit" fields to a list so that when the following code executes, row['fruit'] contains that list?
with open(csv_file) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['First'], row['Last'], row['fruit'], sep='--->')
If fieldnames is omitted, the values in the first row will be used as the fieldnames. But you may specify it explicitly. If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by restkey (which defaults to None).
import csv

with open("myfile.csv") as f:
    reader = csv.DictReader(f, fieldnames=("First", "Last"), restkey="fruit")
    for row in reader:
        print(row)
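Note that when fieldnames is given explicitly, the file's own header line comes back as an ordinary data row, and the trailing empty fields show up as empty strings under restkey. A sketch of the filtering the question asks for, under the same myfile.csv assumption, might look like this:

import csv

with open("myfile.csv", newline='') as f:
    reader = csv.DictReader(f, fieldnames=("First", "Last"), restkey="fruit")
    next(reader)  # discard the file's header row, since fieldnames is supplied
    for row in reader:
        # keep only the non-empty "fruit" values
        row["fruit"] = [v for v in (row.get("fruit") or []) if v]
        print(row["First"], row["Last"], row["fruit"], sep='--->')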

I need a faster way with logging function to parse this special type of data in CSV file

I have a file in the data format below:
<aqr>a=769 b="United States" c=02/04/2019 d=01:03:23
<aqr>a=798 b="India" c=02/04/2019 d=01:03:23 e="Non existent"
So basically all the lines have multiple columns, but the number of columns is not fixed and there is no header, so the column headers need to be created from the data itself. For the example above, a, b, c, d and e would be the column headers.
I have written code that does the job, but I am looking for a faster way, and I would like a logging facility.
So far my logic is to strip the unwanted tag at the beginning of each line, collect the data into a dictionary, and turn that into a dataframe.
import shlex
from collections import defaultdict

import pandas as pd

result = defaultdict(list)
with open('testfiles/test.csv', 'r') as file:
    for line in file.read().splitlines():
        rule0 = line.replace("<aqr>", "", 1)  # drop the leading "<aqr>" tag
        rule0 = '~'.join(shlex.split(rule0))  # keep quoted values together
        y = rule0.split('~')
        for word in y:
            x = word.split('=')
            result[x[0]].append(x[1])

data = pd.DataFrame.from_dict(result, orient='index')
data = data.T
The result is fine. I just need a faster solution to this.
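One possible direction (a sketch, not taken from the original thread): since every field already has the key=value form, a single regular expression can pull the pairs out, one dict per line, and the logging module can record lines that fail to parse. The file path is the one from the question:

import logging
import re

import pandas as pd

logging.basicConfig(level=logging.WARNING)

# key=value, where the value is either a quoted string or a bare token
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')

rows = []
with open('testfiles/test.csv', 'r') as fh:
    for lineno, line in enumerate(fh, start=1):
        pairs = PAIR_RE.findall(line)
        if not pairs:
            logging.warning("line %d could not be parsed: %r", lineno, line)
            continue
        rows.append({key: value.strip('"') for key, value in pairs})

# DataFrame fills missing keys (like 'e') with NaN automatically
data = pd.DataFrame(rows)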

python read/write data IndexError: list index out of range

I'm trying to write some simple code to extract specific data columns from my measurement results (.txt files) and then save them into a new text file. Unfortunately, I'm stuck even before the writing part. The code below results in the following error: IndexError: list index out of range
How do I solve this? It seems to be related to the size of the data, i.e. the same code worked for a much smaller data file.
f = open('data.txt', 'r')
header1 = f.readline()
header2 = f.readline()
header3 = f.readline()
for line in f:
    line = line.strip()
    columns = line.split()
    name = columns[2]
    j = columns[3]
    print(name, j)
Before indexing, you should check the length of the split() result, or check the line's pattern with a regex.
Example of a length check to add right after columns = line.split():
if len(columns) < 4:
    continue
That way, if a line does not match the expected data format, the code won't crash.
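A minimal sketch of how that guard might sit inside the question's loop (same column indexes as the original; the threshold of 4 just reflects that columns[3] is the highest index used):

with open('data.txt', 'r') as f:
    for _ in range(3):
        f.readline()  # skip the three header lines
    for line in f:
        columns = line.strip().split()
        if len(columns) < 4:
            continue  # short or malformed line: skip it instead of raising IndexError
        name = columns[2]
        j = columns[3]
        print(name, j)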

Sort excel worksheet using python

I have an excel sheet like this:
I would like to output the data into an excel file like this:
Basically, for rows with common values in columns 2, 3 and 4, I want to concatenate the values in the 5th column.
Please suggest how I could do this.
The easiest way to approach an issue like this is exporting the spreadsheet to CSV first, in order to help ensure that it imports correctly into Python.
Using a defaultdict, you can build a dictionary keyed on the first four columns, iterating through the lines and appending the final column's value to a list.
Finally, you can write it back out in CSV format:
from collections import defaultdict

results = defaultdict(list)

with open("in_file.csv") as f:
    header = f.readline()
    for line in f:
        cols = line.strip().split(",")
        key = ",".join(cols[0:4])
        results[key].append(cols[4])

with open("out_file.csv", "w") as f:
    f.write(header)
    for k, v in results.items():
        line = '{},"{}",\n'.format(k, ", ".join(v))
        f.write(line)
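As a variant (an alternative sketch, not part of the original answer), the csv module can handle both the reading and the writing, which avoids the manual splitting, quoting and newline handling; it assumes the same in_file.csv / out_file.csv names:

import csv
from collections import defaultdict

results = defaultdict(list)

with open("in_file.csv", newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    for cols in reader:
        # group on the first four columns, collect the fifth
        results[tuple(cols[0:4])].append(cols[4])

with open("out_file.csv", "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for key, values in results.items():
        writer.writerow(list(key) + [", ".join(values)])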
