Wrong format CSV output in Python - python-3.x

I have the following code that is working 99%; it's taken me various attempts to get it right:
w = csv.writer(filename, lineterminator="\n")
sC = []
for i in sOut:
    # print("save", i[1:])
    sC.append(i[1:])  # slice away the first part
sP = self.ids(sC)
w.writerow(sP)
filename.close()
print("You saved", filename)  # to show on the CLI

def ids(self, numbering):
    tally = 1
    for i in range(len(numbering)):
        id = str(tally)
        numbering[i].insert(0, id)
        tally = tally + 1
    return numbering
The output it should produce inside the CSV file should look like this, i.e. in separate columns:
1 -4.885276794 55.72986221
2 -4.885276794 55.72958374
3 -4.883611202 55.72958374
Instead it returns everything in one row, with square brackets, commas and apostrophes, none of which I want:
['1', -4.88527679443359, 55.7298622131348] ['2', -4.7475008964538, 55.9473609924319] ['3', -4.79416608810425, 56.02791595459]
I know I am making some basic mistake somewhere, but I just don't know where. All help will be greatly appreciated.
Thanks Jemma

As per juanpa's comment, it should be w.writerows(sP) instead of w.writerow(sP).
All works!
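For context, writerow writes a single row, while writerows expects an iterable of rows. A minimal, self-contained sketch of the difference, using an in-memory buffer and made-up coordinates rather than the question's data:

```python
import csv
import io

rows = [["1", -4.885276794, 55.72986221],
        ["2", -4.885276794, 55.72958374]]

buf = io.StringIO()
w = csv.writer(buf, lineterminator="\n")

# writerows treats the argument as a sequence of rows,
# writing each inner list as its own CSV line
w.writerows(rows)

print(buf.getvalue())
# 1,-4.885276794,55.72986221
# 2,-4.885276794,55.72958374
```

Calling writerow(rows) instead would treat the whole list of lists as a single row, which is why each inner list ends up printed with brackets and quotes.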

Related

Converting a csv file containing pixel values to its equivalent images

This is my first time working with such a dataset.
I have a .csv file containing pixel values (48x48 = 2304 columns) of images, with their labels in the first column and the pixels in the subsequent ones, as below:
[Image: a glimpse of the dataset]
I want to convert these pixels into their images, and store them into different directories corresponding to their respective labels. Now I have tried the solution posted here but it doesn't seem to work for me.
Here's what I've tried to do:
import csv
import numpy as np
from PIL import Image

labels = ['Fear', 'Happy', 'Sad']
with open('dataset.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    fear = 0
    happy = 0
    sad = 0
    # skip headers
    next(csv_reader)
    for row in csv_reader:
        pixels = row[1:]  # without label
        pixels = np.array(pixels, dtype='uint8')
        pixels = pixels.reshape((48, 48))
        image = Image.fromarray(pixels)
        if csv_file['emotion'][row] == 'Fear':
            image.save('C:\\Users\\name\\data\\fear\\im' + str(fear) + '.jpg')
            fear += 1
        elif csv_file['emotion'][row] == 'Happy':
            image.save('C:\\Users\\name\\data\\happy\\im' + str(happy) + '.jpg')
            happy += 1
        elif csv_file['emotion'][row] == 'Sad':
            image.save('C:\\Users\\name\\data\\sad\\im' + str(sad) + '.jpg')
            sad += 1
However, upon running the above block of code, the following is the error message I get:
Traceback (most recent call last):
  File "<ipython-input-11-aa928099f061>", line 18, in <module>
    if csv_file['emotion'][row] == 'Fear':
TypeError: '_io.TextIOWrapper' object is not subscriptable
I referred to a bunch of posts that solved the above error (like this one), but found that people there were tackling a somewhat different problem than mine, and the others I couldn't understand.
This may well be a very trivial question, but as I mentioned earlier, this is my first time working with such a dataset. Kindly tell me what I am doing wrong and how I can fix my code.
Try:
if str(row[0]) == 'Fear':
And in a similar way for the other conditions:
elif str(row[0]) == 'Happy':
elif str(row[0]) == 'Sad':
(a good practice is to save the first value of the row as a variable)
The first problem was that the first row contained just the column names. To take care of this, I used the skiprows parameter like so:
raw = pd.read_csv('dataset.csv', skiprows = 1)
Secondly, I moved the labels column to the end, since it was in the first column. For my own convenience.
Thirdly, after all the preparations were done, iterating didn't go over the whole row; it only took in the value of the first row and first column, which caused an issue in reshaping. So I used df.itertuples() instead, like so:
for row in data.itertuples(index = False, name = 'Pandas'):
Lastly, thanks to @HadarM's suggestions, I was able to get it to work.
Modified version of the problematic code block above:
fear = 0  # counters (initialized earlier in the original code)
happy = 0
sad = 0
for row in data.itertuples(index=False, name='Pandas'):
    pixels = row[:-1]  # without label
    pixels = np.array(pixels, dtype='uint8')
    pixels = pixels.reshape((48, 48))
    image = Image.fromarray(pixels)
    if str(row[-1]) == 'Fear':
        image.save('C:\\Users\\name\\data\\fear\\im' + str(fear) + '.jpg')
        fear += 1
    elif str(row[-1]) == 'Happy':
        image.save('C:\\Users\\name\\data\\happy\\im' + str(happy) + '.jpg')
        happy += 1
    elif str(row[-1]) == 'Sad':
        image.save('C:\\Users\\name\\data\\sad\\im' + str(sad) + '.jpg')
        sad += 1
print('done')

Produce the most unique elements with least lists

I am new here. I hope I can explain my problem clearly with the example below.
example1: What is your name?
example1: Where are you from?
example1: How are you doing?
example2: What is your name?
example2: Where are you from?
example2: How are you doing?
example2: When did you move here?
example9: What is your name?
example3: Where are you from?
example23: Who gave you this book?
In the above example, I would like to print the unique questions, taking into account how many questions each example provides. So I am trying for something like this expected output:
example2: What is your name?
example2: Where are you from?
example2: How are you doing?
example2: When did you move here?
example23: Who gave you this book?
Here, I am searching for the unique questions in a file while using as few examples as possible.
I played around a bit; my attempt is below.
import collections

s = collections.defaultdict(list)
u_s = set()
with open('file.txt', 'r') as s1:
    for line in s1:
        data = line.split(':', maxsplit=1)
        start = data[0]
        end = data[-1]
        if end not in u_s:
            u_s.add(end)
            s[start] += [end]
for start, ends in s.items():
    print(start, ends[0])
    for end in ends[1:]:
        print(start, end)
Result that I am getting:
example1 What is your name?
example1 Where are you from?
example1 How are you doing?
example2 When did you move here?
example23 Who gave you this book?
Here, instead of printing example1, I want it to use example2, because example2 gives more questions.
I tried sorting the lines based on how often they repeat, but couldn't get past that. I appreciate your help. Thanks
Your code prints all the unique questions, but it cannot compare example sets or print them as whole sets.
Rather than sorting, I would formulate the problem as comparing combinations of example sets and selecting the combination that contains the most unique questions with the fewest sets, so to me your question is really about the algorithm.
import collections

def calculate_contrib(values, added):
    '''Calculate the contribution to the number of unique questions.
    values: the list of questions in the candidate example set.
    added: the set of already-added questions.'''
    contrib = 0
    for value in values:
        if value not in added:
            contrib += 1
    return contrib

def print_result(x):
    '''Print the result, x, a dictionary, without repetition.'''
    u_s = set()
    for key, values in x.items():
        for value in values:
            if value not in u_s:
                print(key, value)
                u_s.add(value)

s = collections.defaultdict(list)
# collect all questions per example
with open('file.txt', 'r') as s1:
    for line in s1:
        data = line.split(':', maxsplit=1)
        start = data[0]
        end = data[-1]
        s[start] += [end]

# get the initial contribution to the unique-question count for each example set
contrib = dict()
u_s = set()
result = dict()
for key, values in s.items():
    contrib.update({key: calculate_contrib(s[key], u_s)})

# run while there are still unique questions to add to u_s
while not all(x == 0 for x in contrib.values()):
    # add the example set with the maximum contribution
    max_contrib = 0
    max_key = ""
    for key, value in contrib.items():
        if max_contrib < value:
            max_key = key
            max_contrib = value
    result.update({max_key: s[max_key]})
    u_s.update(s[max_key])
    del s[max_key]
    del contrib[max_key]
    for key, values in s.items():
        contrib[key] = calculate_contrib(values, u_s)

# print the result
print_result(result)
Above is a straightforward greedy implementation: it keeps adding the example set that most increases the number of unique questions, until no unique question remains.
Further improvements are possible. Hope it gives you some insight.
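The same greedy idea can be sanity-checked on the sample data from the question, inlined here instead of read from file.txt (variable names in this sketch are my own):

```python
import collections

lines = [
    "example1: What is your name?",
    "example1: Where are you from?",
    "example1: How are you doing?",
    "example2: What is your name?",
    "example2: Where are you from?",
    "example2: How are you doing?",
    "example2: When did you move here?",
    "example9: What is your name?",
    "example3: Where are you from?",
    "example23: Who gave you this book?",
]

# group questions by example name
groups = collections.defaultdict(set)
for line in lines:
    start, end = line.split(':', maxsplit=1)
    groups[start].add(end.strip())

# greedily pick the example that adds the most not-yet-covered questions
covered = set()
picked = []
while any(qs - covered for qs in groups.values()):
    best = max(groups, key=lambda k: len(groups[k] - covered))
    picked.append(best)
    covered |= groups[best]

print(picked)  # ['example2', 'example23']
```

example2 covers four questions (including all of example1's), so it is picked first; example23 then covers the one remaining unique question.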

How to save tuple output from a for loop to a DataFrame in Python

I have some data: 33k rows x 57 columns.
In some columns there is data which I want to translate with a dictionary.
I have done the translation, but now I want to write the translated data back into my data set.
I have a problem saving the tuple output from the for loop.
I am using tuples to build a correct translation; .join and .append are not working in my case. I have tried many approaches without any success.
Looking for any advice.
data = pd.read_csv(filepath, engine="python", sep=";", keep_default_na=False)
for index, row in data.iterrows():
    row["translated"] = tuple(slownik.get(znak) for znak in row["1st_service"])
I just want print(data["1st_service"]) to show the translated data, not the data from before the for loop.
First of all, if your csv doesn't already have a 'translated' column, you'll have to add it:
import numpy as np
data['translated'] = np.nan
The problem is that the row object you're writing to is only a copy of the dataframe's row, not the dataframe itself, so the assignment is lost. Plus you're missing square brackets for your list comprehension, if I'm understanding what you're doing. So change your last line to:
data.loc[index, "translated"] = tuple([slownik.get(znak) for znak in row["1st_service"]])
and you'll get a tuple written into that one cell.
In future, posting the exact error message you're getting is very helpful!
I have managed it; working code below:
data = pd.read_csv(filepath, engine="python", sep=";", keep_default_na=False)
data.columns = []
slownik = dict([ ])
trans = ' '
for index, row in data.iterrows():
    trans += str(tuple([slownik.get(znak) for znak in row["1st_service"]]))
data['1st_service'] = trans.split(')(')
data.to_csv("out.csv", index=False)
Can you tell me if it is well done?
Maybe there is a faster way to do it?
I am doing it for 12 columns in one for loop, as shown above.
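A faster, vectorized alternative is to map each column once with Series.apply instead of iterrows. This is a sketch; slownik and the column name come from the question, while the dictionary contents and sample data here are made up:

```python
import pandas as pd

# hypothetical translation dictionary and sample data
slownik = {"a": "x", "b": "y"}
data = pd.DataFrame({"1st_service": ["ab", "ba"]})

# translate every character of each cell through the dictionary
# in one pass over the column, without iterrows
data["translated"] = data["1st_service"].apply(
    lambda s: tuple(slownik.get(znak) for znak in s)
)

print(data["translated"].tolist())
# [('x', 'y'), ('y', 'x')]
```

For 12 columns, the same apply call can be run in a small loop over the column names.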

How to read the starting N words from each row in python3

I am reading an Excel file which has free text in a column. After reading the file with pandas, I want to restrict that column to just the first N words of each row. I tried everything but was not able to make it work.
data["text"] = "I am going to school and I bought something from market."
But I just want to read the starting 5 words, so that it looks like below:
data["text"] = "I am going to school"
And I want this same operation done on each row of the data["text"] column.
Your help will be highly appreciated.
def first_k(s: str, k=5) -> str:
    s = str(s)  # just in case something like NaN tries to sneak in there
    first_words = s.split()[:k]
    return ' '.join(first_words)
Then, apply the function:
data['text'] = data['text'].apply(first_k)
data["text"] = [' '.join(s.split(' ')[:5]) for s in data["text"].values]
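As a quick check, the list-comprehension version can be tried on a small made-up frame (the sample sentences are illustrative):

```python
import pandas as pd

data = pd.DataFrame({"text": [
    "I am going to school and I bought something from market.",
    "Short line",
]})

# keep only the first five space-separated words of each row
data["text"] = [' '.join(s.split(' ')[:5]) for s in data["text"].values]

print(data["text"].tolist())
# ['I am going to school', 'Short line']
```

Rows shorter than five words are left unchanged, since the slice simply takes what is there.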

Indexing a list in Python

record = ['MAT', '90', '62', 'ENG', '92', '88']
course = 'MAT'
Suppose I want to get the marks for MAT or ENG; what do I do? I only know how to find the index of the course, which is new[4:10].index(course). I don't know how to get the marks.
Try this:
i = record.index('MAT')
grades = record[i+1:i+3]
In this case i is the index/position of 'MAT' (or whichever course), and grades is the slice containing the two items right after the course name.
You could also put it in a function:
def get_grades(course):
    i = record.index(course)
    return record[i+1:i+3]
Then you can just pass in the course name and get back the grades.
>>> get_grades('ENG')
['92', '88']
>>> get_grades('MAT')
['90', '62']
>>>
Edit
If you want to get a string of the two grades together instead of a list with the individual values you can modify the function as follows:
def get_grades(course):
    i = record.index(course)
    return ' '.join("'{}'".format(g) for g in record[i+1:i+3])
You can use the index function (see https://stackoverflow.com/a/176921/) and then read the following indexes, but I think you should use a dictionary.
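A sketch of that dictionary idea, using the record from the question; the assumption that each course name is followed by exactly two grades comes from the example data:

```python
record = ['MAT', '90', '62', 'ENG', '92', '88']

# build {course: [grade1, grade2]} by stepping through
# the flat list three items at a time
grades = {record[i]: record[i + 1:i + 3] for i in range(0, len(record), 3)}

print(grades['MAT'])  # ['90', '62']
print(grades['ENG'])  # ['92', '88']
```

Once the dictionary is built, lookups no longer depend on the position of the course in the list.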
