How to calculate with values in an array imported from csv.reader? - python-3.x

I have this csv file
Germany,1,5,10,20
UK,0,2,4,10
Hungary,6,11,22,44
France,8,22,33,55
and this script,
I would like to do some arithmetic on the values in the 2D array (data).
For example, print the value data[1][3] increased by 10.
It seems that I need some conversion to integer, right?
What is the best solution, please?
import csv
datafile = open('sample.csv', 'r')
datareader = csv.reader(datafile, delimiter=',')
data = []
for row in datareader:
    data.append(row)
print ((data[1][3])+10)
I got this error
/python$ python3 read6.py
Traceback (most recent call last):
File "read6.py", line 8, in <module>
print ((data[1][3])+10)
TypeError: must be str, not int

You'll have to manually convert to integers as you suspected:
import csv
datafile = open('sample.csv', 'r')
datareader = csv.reader(datafile, delimiter=',')
data = []
for row in datareader:
    data.append([row[0]] + list(map(int, row[1:])))
print ((data[1][3])+10)
Specifically this modification on line 7 of your code:
data.append([row[0]] + list(map(int, row[1:])))
The csv package docs mention that
No automatic data type conversion is performed unless the QUOTE_NONNUMERIC format option is specified (in which case unquoted fields are transformed into floats).
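Since the strings in your CSV are not quoted (i.e. the file contains Germany rather than "Germany"), the reader would also try to turn the country names into floats and fail. That option isn't useful for your case, so converting manually is the way to go.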
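A self-contained sketch of the manual conversion described above. It reads the sample rows from an in-memory string instead of sample.csv so that it runs anywhere; the helper name parse_rows is made up for illustration and is not part of the csv module.

```python
import csv
import io

# The sample data from the question, held in memory instead of sample.csv.
SAMPLE = """Germany,1,5,10,20
UK,0,2,4,10
Hungary,6,11,22,44
France,8,22,33,55
"""


def parse_rows(text):
    """Return rows with the first field kept as str and the rest converted to int."""
    reader = csv.reader(io.StringIO(text), delimiter=',')
    # "if row" skips blank lines, which csv.reader yields as empty lists.
    return [[row[0]] + [int(value) for value in row[1:]] for row in reader if row]


data = parse_rows(SAMPLE)
# data[1] is the UK row, so data[1][3] is the integer 4.
print(data[1][3] + 10)
```

If some fields might not be valid integers, int() raises ValueError, so real input may need a try/except around the conversion.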

Related

Changes csv row value

This is my code:
import pandas as pd
import re
# reading the csv file
patients = pd.read_csv("partial.csv")
# updating the column value/data
for patient in patients.iterrows():
    cip=patient['VALOR_ID']
    new_cip = re.sub('^(\w+|)',r'FIXED_REPLACED_STRING',cip)
    patient['VALOR_ID'] = new_cip
# writing into the file
df.to_csv("partial-writer.csv", index=False)
print(df)
I'm getting this message:
Traceback (most recent call last):
File "/home/jeusdi/projects/workarea/salut/load-testing/load.py", line 28, in <module>
cip=patient['VALOR_ID']
TypeError: tuple indices must be integers or slices, not str
EDIT
From the code above you might think I need to set the same fixed value on all rows.
Actually, I need to loop over the rows, generate a random string, and set a different one on each row.
The code above would then be:
for patient in patients.iterrows():
    new_cip = generate_cip()
    patient['VALOR_ID'] = new_cip
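Use Series.str.replace instead of looping over the rows. I'm not sure the | in the regex is needed; it can probably be removed:
df = pd.read_csv("partial.csv")
# regex=True is needed for pattern matching in pandas 2.0 and later, where the default is a literal replace
df['VALOR_ID'] = df['VALOR_ID'].str.replace(r'^(\w+|)', 'FIXED_REPLACED_STRING', regex=True)
# for a different value per row: apply calls generate_cip with each old value,
# so the function must accept one argument and return a scalar
df['VALOR_ID'] = df['VALOR_ID'].apply(generate_cip)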
df.to_csv("partial-writer.csv", index=False)

How to read multiple text files as strings from two folders at the same time using readline() in python?

I currently have a version of the following script that uses two simple readline() snippets to read a single-line .txt file from each of two different folders. It runs under Ubuntu 18.04 and Python 3.67, without glob.
I now get a NameError when trying to read multiple text files from the same folders using sorted(glob.glob(...)).
readlines() causes an error because the input from the .txt files must be strings, not lists.
I'm new to Python. I have tried online Python formatters, reindent.py, etc., but with no success.
I'm hoping it's a simple indentation issue, so that it won't be a problem in future scripts.
Current error from code below:
Traceback (most recent call last):
File "v1-ReadFiles.py", line 21, in <module>
context_input = GenerationInput(P1=P1, P3=P3,
NameError: name 'P1' is not defined
Current modified script:
import glob
import os
from src.model_use import TextGeneration
from src.utils import DEFAULT_DECODING_STRATEGY, LARGE
from src.flexible_models.flexible_GPT2 import FlexibleGPT2
from src.torch_loader import GenerationInput
from transformers import GPT2LMHeadModel, GPT2Tokenizer
for name in sorted(glob.glob('P1_files/*.txt')):
    with open(name) as f:
        P1 = f.readline()
for name in sorted(glob.glob('P3_files/*.txt')):
    with open(name) as f:
        P3 = f.readline()
if __name__ == "__main__":
    context_input = GenerationInput(P1=P1, P3=P3,
                                    genre=["mystery"],
                                    persons=["Steve"],
                                    size=LARGE,
                                    summary="detective")
    print("PREDICTION WITH CONTEXT WITH SPECIAL TOKENS")
    model = GPT2LMHeadModel.from_pretrained('models/custom')
    tokenizer = GPT2Tokenizer.from_pretrained('models/custom')
    tokenizer.add_special_tokens(
        {'eos_token': '[EOS]',
         'pad_token': '[PAD]',
         'additional_special_tokens': ['[P1]', '[P2]', '[P3]', '[S]', '[M]', '[L]', '[T]', '[Sum]', '[Ent]']}
    )
    model.resize_token_embeddings(len(tokenizer))
    GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)
    text_generator_with_context = TextGeneration(GPT2_model, use_context=True)
    predictions = text_generator_with_context(context_input, nb_samples=1)
    for i, prediction in enumerate(predictions):
        print('prediction n°', i, ': ', prediction)
Solved, thanks to afghanimah's answer here:
Problem with range() function when used with readline() or counter - reads and processes only last line in files
I dropped glob and moved all the model-loading code (model = ..., etc.) before the 'with open ...' block:
with open("data/test-P1-Multi.txt","r") as f1, open("data/test-P3-Multi.txt","r") as f3:
    for i in range(5):
        P1 = f1.readline()
        P3 = f3.readline()
        context_input = GenerationInput(P1=P1, P3=P3, size=LARGE)
        # etc.
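In the glob version of the script, P1 is only assigned inside the loop body, so a NameError there suggests that glob matched no files (an assumption, since the folder contents are not shown). The sketch below shows the same lockstep-reading idea without a hard-coded range(5): it walks two files in parallel with zip and strips the trailing newline that readline() would keep. The helper name read_pairs and the file contents are invented for illustration.

```python
import os
import tempfile


def read_pairs(path_a, path_b):
    """Yield (line_a, line_b) pairs, one per line, with trailing newlines removed."""
    with open(path_a) as file_a, open(path_b) as file_b:
        # zip stops as soon as the shorter file runs out of lines.
        for line_a, line_b in zip(file_a, file_b):
            yield line_a.rstrip('\n'), line_b.rstrip('\n')


# Demo with throwaway files; their contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    p1_path = os.path.join(tmp, 'p1.txt')
    p3_path = os.path.join(tmp, 'p3.txt')
    with open(p1_path, 'w') as f:
        f.write('first opening\nsecond opening\n')
    with open(p3_path, 'w') as f:
        f.write('first ending\nsecond ending\n')
    pairs = list(read_pairs(p1_path, p3_path))

for P1, P3 in pairs:
    # In the real script this is where GenerationInput(P1=P1, P3=P3, ...) would be built.
    print(P1, '|', P3)
```

Because the model-dependent call sits inside the loop, every pair of lines is processed rather than only the last one read.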

How to plot multiple graphs in one using 3 different files and another file to sort them?

I have 2 CSV files. One of them has the sorted data and the other has unsorted data. Example data is shown below.
What I am trying to do is take the unsorted data and sort it according to the index numbers from the sorted data. For example, in the sorted data, index number "1" corresponds to "name001.a.a". Since its index number is "1", I want "name001.a.a,0001" to be first in the list built from the unsorted file. The number after the comma in the unsorted file is a 4-digit number which plays no role in sorting but is attached to the names.
One more example: index "2" is for "name002.a.a", so after sorting, the new file would have "name002.a.a,0002" as the second item in the list.
unsorted.csv:
name002.a.a,0002
name001.a.a,0001
name005.a.a,0025
hostnum.csv (sorted):
"1 name001.a.a"
"2 name002.a.a"
"3 name005.a.a"
I need help figuring out where my code goes wrong and, if possible, help completing it.
EDIT- CODE:
After changing the name csv_list to csv_file, I am receiving the following error
from matplotlib import pyplot as plt
import numpy as np
import csv
csv_file = []
with open('hostnum.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        csv_file.append(line)
us_csv_file = []
with open('unsorted.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)
us_csv_file.sort(key=lambda x: csv_file.index(x[1]))
plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csvfile)), [item[0] for item in csvfile])
plt.show()
ERROR:
Traceback (most recent call last):
File "C:/..../TEST_ALL.py", line 16, in <module>
us_csv_file.sort(key=lambda x: csv_file.index(x[1]))
File "C:/..../TEST_ALL.py", line 16, in <lambda>
us_csv_file.sort(key=lambda x: csv_file.index(x[1]))
ValueError: '0002' is not in list
Well, you haven't defined csv_list in your code. Looking quickly through your code, I'd guess changing us_csv_file.sort(key=lambda x: csv_list.index(x[1])) to us_csv_file.sort(key=lambda x: csv_file.index(x[1])) (i.e. using the correct variable name, which is csv_file and not csv_list), might just solve the problem.
Here's a new attempt. This one extracts the numbers from the second column of hostnum.csv and puts them into a separate list, which it then uses to sort the items. When I run this code, I get ValueError: '025' is not in list, but I assume that's because you haven't given us the entire files: there is indeed no line containing name025.a.a in the snippet of hostnum.csv you gave us. I also added a [1:] to the sorting statement.
If this doesn't work, try removing that [1:] and changing csv_file_numbers.append(csv_file[-1][1][4:].split('.')[0]) to csv_file_numbers.append(csv_file[-1][1][4:].split('.')[0].zfill(4)). str.zfill(4) pads a string with leading zeros until its length is at least 4.
Because your sorted file contains one more zero than the unsorted file, I also changed
from matplotlib import pyplot as plt
import numpy as np
import csv
csv_file = []
csv_file_numbers = []
##with open('hostnum.csv', 'r') as f:
##    csvreader = csv.reader(f, dialect="excel-tab")
##    for line in csvreader:
##        csv_file.append(line)
##        csv_file_numbers.append(line[-1][4:].split('.')[0])
with open('hostnum.csv', 'r') as f:
    sorted_raw = f.read()
for line in sorted_raw.splitlines():
    csv_file.append(line.split('\t'))
    csv_file_numbers.append(csv_file[-1][1][4:].split('.')[0])
us_csv_file = []
with open('unsorted.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)
us_csv_file.sort(key=lambda x: csv_file_numbers.index(x[1][1:]))
plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csv_file)), [item[0] for item in csv_file])
plt.show()
This one worked on my computer:
from matplotlib import pyplot as plt
import numpy as np
import csv
csv_file = []
csv_file_dict = {}
##with open('hostnum.csv', 'r') as f:
##    csvreader = csv.reader(f, dialect="excel-tab")
##    for line in csvreader:
##        csv_file.append(line)
##        csv_file_numbers.append(line[-1][4:].split('.')[0])
with open('hostnum.csv', 'r') as f:
    sorted_raw = f.read()
for line in sorted_raw.splitlines():
    csv_file.append(line.split('\t'))
    csv_file_dict[csv_file[-1][-1][:-1]] = int(csv_file[-1][0][1:])
us_csv_file = []
with open('unsorted.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)
us_csv_file.sort(key=lambda x: csv_file_dict[x[0]])
plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csv_file)), [item[0] for item in csv_file])
plt.show()
So now I created a dict which stores the names found in both files as keys and their index values as values. I also removed the quotation marks manually because, for some reason, csv.reader didn't seem to handle them correctly; at least it didn't handle the tabs in the desired way. As I wrote in one of my comments, I don't know why for sure, but I'd guess it's because the quotation marks are not closed within the cells in the file. Anyway, I decided to split each line manually with str.split('\t').
Also, you had missed the underscore in the variable name csv_file in a couple of places at the end, so I added it.
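The dict-based ordering can be tried without any files or plotting. The sketch below assumes, as the answer does, that each hostnum.csv line is wrapped in quotes and tab-separated; the in-memory lists stand in for the two files, and the helper name build_order is invented for illustration.

```python
# Stand-ins for the two files (assumption: hostnum.csv lines are quoted and tab-separated).
hostnum_lines = ['"1\tname001.a.a"', '"2\tname002.a.a"', '"3\tname005.a.a"']
unsorted_rows = [
    ['name002.a.a', '0002'],
    ['name001.a.a', '0001'],
    ['name005.a.a', '0025'],
]


def build_order(lines):
    """Map each host name to its integer index from the sorted file."""
    order = {}
    for line in lines:
        # Drop the surrounding quotes, then split the index from the name.
        index, name = line.strip('"').split('\t')
        order[name] = int(index)
    return order


order = build_order(hostnum_lines)
# A dict lookup is O(1) per row, whereas list.index() rescans the list each time.
sorted_rows = sorted(unsorted_rows, key=lambda row: order[row[0]])
print(sorted_rows)
```

A name that appears only in the unsorted data raises KeyError here; order.get(row[0], float('inf')) would push such rows to the end instead.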

I am trying to read a .csv file which contains data in the order of timestamp, number plate, vehicle type and exit/entry

I need to store the timestamps in a list for further operations and have written the following code:
import csv
from datetime import datetime
from collections import defaultdict
t = []
columns = defaultdict(list)
fmt = '%Y-%m-%d %H:%M:%S.%f'
with open('log.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        #t = row[1]
        for i in range(len(row)):
            columns[i].append(row[i])
        if (row):
            t=list(datetime.strptime(row[0],fmt))
columns = dict(columns)
print (columns)
for i in range(len(row)-1):
    print (t)
But I am getting the error :
Traceback (most recent call last):
File "parking.py", line 17, in <module>
t = list(datetime.strptime(row[0],fmt))
TypeError: 'datetime.datetime' object is not iterable
What can I do to store each timestamp in the column in a list?
Edit 1:
Here is the sample log file
2011-06-26 21:27:41.867801,KA03JI908,Bike,Entry
2011-06-26 21:27:42.863209,KA02JK1029,Car,Exit
2011-06-26 21:28:43.165316,KA05K987,Bike,Entry
If you have a CSV file, then why not use pandas to get what you want? The code for your problem may look something like this:
import pandas as pd
# header=None because the sample log has no header row; the columns are then numbered 0, 1, 2, ...
df = pd.read_csv('log.csv', header=None)
timestamp = df[0]
If the first column of the CSV holds the timestamps, then timestamp now contains all the entries of that first column.
After this, you can convert the entries into datetime objects using datetime.datetime.strptime().
Hope this is helpful.
I can't comment for clarifications yet.
Would this code get you the timestamps in a list? If yes, give me a few lines of data from the csv file.
from datetime import datetime
timestamps = []
with open(csv_path, 'r') as readf_obj:
    for line in readf_obj:
        timestamps.append(line.split(',')[0])
fmt = '%Y-%m-%d %H:%M:%S.%f'
datetimes_timestamps = [datetime.strptime(timestamp_, fmt) for timestamp_ in timestamps]
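The error in the question comes from wrapping a single datetime in list(): strptime returns one object, so it should be appended to a list instead. A minimal sketch of that fix, using the three sample log lines from the question held in an in-memory string rather than log.csv:

```python
import csv
import io
from datetime import datetime

# The sample log lines from the question, in memory instead of log.csv.
LOG = """2011-06-26 21:27:41.867801,KA03JI908,Bike,Entry
2011-06-26 21:27:42.863209,KA02JK1029,Car,Exit
2011-06-26 21:28:43.165316,KA05K987,Bike,Entry
"""
FMT = '%Y-%m-%d %H:%M:%S.%f'

timestamps = []
for row in csv.reader(io.StringIO(LOG)):
    if row:  # skip blank lines
        # strptime returns one datetime object; append it rather than calling list() on it.
        timestamps.append(datetime.strptime(row[0], FMT))

print(len(timestamps))
# datetime objects support arithmetic; subtracting two gives a timedelta.
print(timestamps[-1] - timestamps[0])
```

With a real file, replacing io.StringIO(LOG) with an open file object (opened with newline='') should behave the same way.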

Python - Storing float values in CSV file

I am trying to compute the positive and negative scores of the statements in a text file and store the scores in a CSV file. I have implemented the code given below:
import openpyxl
from nltk.tokenize import sent_tokenize
import csv
from senti_classifier import senti_classifier
from nltk.corpus import wordnet
file_content = open('amazon_kindle.txt')
for lines in file_content:
    sentence = sent_tokenize(lines)
    pos_score,neg_score = senti_classifier.polarity_scores(sentence)
    with open('target.csv','w') as f:
        writer = csv.writer(f,lineterminator='\n',delimiter=',')
    for val in range(pos_score):
        writer.writerow(float(s) for s in val[0])
    f.close()
But the code gives me the following error in the for loop:
Traceback (most recent call last):
File "C:\Users\pc\AppData\Local\Programs\Python\Python36-32\classifier.py", line 21, in <module>
for val in pos_score:
TypeError: 'float' object is not iterable
Your code has several problems:
Your code and your error do not correspond to each other.
for val in pos_score: # traceback
for val in range(pos_score): # code
pos_score is a float, so both lines are errors: range() takes an int, and for val in needs an iterable. Where do you expect to get your list of values from?
From the usage, it looks like you are expecting a list of lists of values, because you are also using a generator expression in your writerow:
writer.writerow(float(s) for s in val[0])
Perhaps you are only expecting a list of values, in which case you can get rid of the for loop and just use:
writer.writerow(float(val) for val in <list_of_values>)
Using:
with open('target.csv','w') as f:
means you no longer need to call f.close(), because with closes the file at the end of the with block. This also means the writerow() call needs to be inside the with block:
with open('target.csv','w') as f:
    writer = csv.writer(f,lineterminator='\n',delimiter=',')
    writer.writerow(float(val) for val in <list_of_values>)
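One way the pieces above might fit together is to collect one (positive, negative) pair per line first and write them all afterwards. In the sketch below the scores list is made-up data standing in for whatever senti_classifier returns, and the rows go to an in-memory buffer rather than target.csv so the result can be inspected directly.

```python
import csv
import io

# Hypothetical (pos_score, neg_score) pairs, one per input line; not real classifier output.
scores = [(0.75, 0.25), (0.5, 1.5)]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n', delimiter=',')
writer.writerow(['pos_score', 'neg_score'])  # header row
for pos_score, neg_score in scores:
    # writerow takes one iterable per row; floats are written using their repr.
    writer.writerow([pos_score, neg_score])

output = buffer.getvalue()
print(output)
```

To write to disk instead, the same writer calls can go inside a with open('target.csv', 'w', newline='') as f: block, opened once rather than once per input line so earlier rows are not overwritten.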
