Sort excel worksheet using python

I have an excel sheet like this:
I would like to output the data into an excel file like this:
Basically, for the common elements in columns 2, 3, and 4, I want to concatenate the values in the 5th column.
Please suggest how I could do this.

The easiest way to approach an issue like this is to export the spreadsheet to CSV first, so that the data imports cleanly into Python.
Using a defaultdict, you can build a dictionary keyed on the first four columns and then iterate through the lines, appending each line's final-column value to the list for its key.
Finally, you can write it back out in CSV format:
from collections import defaultdict

results = defaultdict(list)

with open("in_file.csv") as f:
    header = f.readline()
    for line in f:
        cols = line.rstrip("\n").split(",")
        key = ",".join(cols[0:4])
        results[key].append(cols[4])

with open("out_file.csv", "w") as f:
    f.write(header)
    for k, v in results.items():  # use .iteritems() on Python 2
        line = '{},"{}"\n'.format(k, ", ".join(v))
        f.write(line)
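One caveat with splitting lines on ',' by hand is that it breaks if any cell contains a quoted comma. A variation of the same idea using the csv module (a sketch only, reusing the placeholder filenames above and still keying on the first four columns) would be:
import csv
from collections import defaultdict

results = defaultdict(list)

with open("in_file.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    for cols in reader:
        # Key on the first four columns, collect the fifth.
        results[tuple(cols[0:4])].append(cols[4])

with open("out_file.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for key, values in results.items():
        writer.writerow(list(key) + [", ".join(values)])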

Related

Python problems writing rows in CSV

I have this script that reads a CSV and saves the second column to a list, and I'm trying to get it to write the contents of the list to a new CSV. The problem is that every entry should have its own row, but the new file puts everything into the same row.
I've tried moving the second with open block inside the first, and I've tried adding a for loop to the second with open, but no matter what I try I don't get the right results.
Here is the code:
import csv

col_store = []

with open('test-data.csv', 'r') as rf:
    reader = csv.reader(rf)
    for row in reader:
        col_store.append(row[1])

with open('meow.csv', 'wt') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows([col_store])
In your case, if you have a column of single letters/numbers, then Y.R's answer will work.
To have code that works in all cases, use this:
with open('meow.csv', 'wt') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows([_] for _ in col_store)
From here it is mentioned that writerows expects an iterable of row objects, and that every row object should itself be an iterable of strings or numbers.
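As a quick illustration of the difference (a small sketch, assuming col_store holds multi-character strings rather than single letters):
import csv

col_store = ["foo", "bar"]

with open('meow.csv', 'wt', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(col_store)               # each string is iterated, producing rows f,o,o and b,a,r
    csv_writer.writerows([_] for _ in col_store)  # one value per row: foo and bar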
The problem is that you are calling 'writerows' on '[col_store]', a list with a single item, so the whole of 'col_store' is written as one row.
The simplest approach to fixing this is calling
csv_writer.writerows(col_store)
# instead of
csv_writer.writerows([col_store])
However, on Windows this will lead to a probably unwanted result: blank lines between the rows.
To solve this, use:
with open('meow.csv', 'wt', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(col_store)
For more about this, see CSV file written with Python has blank lines between each row
Note: writerows expects 'an iterable of row objects' and 'row objects must be an iterable of strings or numbers'.
(https://docs.python.org/3/library/csv.html)
Therefore, in the generic case (trying to write integers, for example), you should use Sam's solution.
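For example, with integers the direct call fails outright, while wrapping each value in its own row works (a small sketch, reusing the meow.csv name from above):
import csv

col_store = [1, 22, 333]

with open('meow.csv', 'wt', newline='') as f:
    csv_writer = csv.writer(f)
    # csv_writer.writerows(col_store)             # raises _csv.Error: iterable expected, not int
    csv_writer.writerows([_] for _ in col_store)  # writes 1, 22 and 333, each on its own row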

How to loop through a list of dictionaries and write the values as individual columns in a CSV

I have a list of dictionaries
d = [{'value':'foo_1', 'word_list':['blah1', 'blah2']}, ...., {'value': 'foo_n', 'word_list':['meh1', 'meh2']}]
I want to write this to a CSV file with all the 'value' entries in one column, and then each individual word from that value's word_list as its own column. So I would have the first row as
foo_1 blah1 blah2
and so on.
I don't know how many dictionaries I have, or how many words I have in "word_list".
How would I go about doing this in Python 3?
Thanks!
I figured out a solution, but it's kind of messy (wow, I can't write a bit of code without it being in the "proper format"...how annoying):
with open('filename', 'w') as f:
    for item in d:
        f.write("%s," % (item['value']))
        for word in item['word_list']:
            f.write("%s," % (word))
        f.write("\n")
You can loop through the dictionaries one at a time, construct a list for each row, and then use the csv module to write the data, as I have shown here:
import csv

d = [{'value': 'foo_1', 'word_list': ['blah1', 'blah2']}, {'value': 'foo_n', 'word_list': ['meh1', 'meh2']}]

with open('test_file.csv', 'w', newline='') as file:  # newline='' avoids blank lines on Windows
    writer = csv.writer(file)
    for val_dict in d:
        csv_row = [val_dict['value']] + val_dict['word_list']
        writer.writerow(csv_row)
It should work for word lists of arbitrary length and as many dictionaries as you want.
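With the two dictionaries in d above, test_file.csv ends up containing:
foo_1,blah1,blah2
foo_n,meh1,meh2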
It would probably be easiest to flatten each row into a normal list before writing it to the file. Something like this:
import csv

with open(filename, 'w', newline='') as file:
    writer = csv.writer(file)
    for row in data:
        out_row = [row['value']]
        for word in row['word_list']:
            out_row.append(word)
        writer.writerow(out_row)

# Shorter alternative to the two loops:
# writer.writerows([row['value'], *row['word_list']] for row in data)

How to assign number to each value in python

I am comparatively new to python and data science and I was working with a CSV file which looks something like:
value1, value2
value3
value4...
The thing is, I want to assign a unique number to each of these values in the CSV file, such that the unique number acts as the key and the item in the CSV acts as the value, like in a dictionary.
I tried using pandas, but if possible I wanted to know how I can solve this without using any libraries.
The desired output should be something like this:
{
"value1": 1,
"value2": 2,
"value3": 3,
.
.
.
and so on..
}
Was just about to talk about pandas before I saw that you wanted to do it in vanilla Python. I'd do it with pandas personally, but here you go:
You can read in lines from a file, split them by delimiter (','), and then get your word tokens.
master_dict = {}
counter = 1

with open("your_csv.csv", "r") as f:
    for line in f:
        words = line.split(',')  # you may or may not want to add a call to .strip() as well
        for word in words:
            master_dict[counter] = word
            counter += 1
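Note that the desired output shown in the question maps each value to its number rather than the number to the value; if that is what you want, swap the key and the value in the assignment (a minimal variation of the loop above, with the same placeholder filename):
master_dict = {}
counter = 1

with open("your_csv.csv", "r") as f:
    for line in f:
        for word in line.split(','):
            master_dict[word.strip()] = counter  # the CSV item is the key, the number is the value
            counter += 1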

Merge line in csv file python

I have this in a CSV file:
Titre,a,b,c,d,e
01,jean,paul,,
01,,,jack,
02,jeanne,jack,,
02,,,jean
and I want:
Titre,a,b,c,d,e
01,jean,paul,jack,
02,jeanne,jack,,jean
Can you help me?
In general, a good approach is to read the CSV file and iterate through the rows using Python's csv module.
The csv module will create an iterator that lets you loop through your file like this:
import csv

with open('your filename.csv', 'r') as infile:
    reader = csv.reader(infile)
    for line in reader:
        for value in line:
            # Do your thing
            pass
You're going to need to construct a new data set that has different properties. The requirements you described:
Ignore any empty cells
Any time you encounter a row that has a new index number, add a new row to your new data set
Any time you encounter a row that has an index number you've seen before, add it to the row that you already created (except for that index number value itself)
I'm not writing that part of the code for you because you need to learn and grow. It's a good task for a beginner.
Once you've constructed that data set, it will look like this:
example_processed_data = [["Titre","a","b","c","d","e"],
                          ["01","jean","paul","jack"],
                          ["02","jeanne","jack","","jean"]]
You can then create a CSV writer, and create your outfile by iterating over that data, similarly to how you iterated over the infile:
with open('outfile.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in example_processed_data:
        writer.writerow(line)

print("Done! Wrote", len(example_processed_data), "lines to outfile.csv.")

Original order of columns in csv not retained in unicodecsv.DictReader

I am trying to read a CSV file into Python 3 using the unicodecsv library. Code follows:
import unicodecsv

with open('filename.csv', 'rb') as f:
    reader = unicodecsv.DictReader(f)
    Student_Data = list(reader)
But the order of the columns in the CSV file is not retained when I output any element from Student_Data; the columns come out in a seemingly random order. Is there anything wrong with the code? How do I fix this?
As stated in the csv.DictReader documentation, DictReader returns each row as a dictionary, and on older Python versions dictionaries do not preserve insertion order, so the keys can come back in an arbitrary order.
You can obtain the list of field names, in their original file order, with:
reader.fieldnames
But if you only want a list of the field values, in the original column order, you can just use a normal reader:
with open('filename.csv', 'rb') as f:
    reader = unicodecsv.reader(f)
    Student_Data = list(reader)  # each row is a plain list of values, in the original column order
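If you can use Python 3.6+ with the standard library csv module instead of unicodecsv, the column-order problem goes away on its own: csv.DictReader returns rows as OrderedDicts in 3.6/3.7 and as ordinary (insertion-ordered) dicts from 3.8 on. A minimal sketch, keeping the same filename and variable names:
import csv

# Standard library csv works on text-mode files, not 'rb'.
with open('filename.csv', 'r', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    Student_Data = list(reader)
    print(reader.fieldnames)  # field names in the order they appear in the file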
