I am trying to read a .csv file which contains data in the order of timestamp, number plate, vehicle type and exit/entry - python-3.x

I need to store the timestamps in a list for further operations and have written the following code:
import csv
from datetime import datetime
from collections import defaultdict
t = []
columns = defaultdict(list)
fmt = '%Y-%m-%d %H:%M:%S.%f'
with open('log.csv', 'r') as f:
reader = csv.reader(f, delimiter=',')
for row in reader:
#t = row[1]
for i in range(len(row)):
columns[i].append(row[i])
if (row):
t=list(datetime.strptime(row[0],fmt))
columns = dict(columns)
print (columns)
for i in range(len(row)-1):
print (t)
But I am getting the error :
Traceback (most recent call last):
File "parking.py", line 17, in <module>
t = list(datetime.strptime(row[0],fmt))
TypeError: 'datetime.datetime' object is not iterable
What can I do to store each timestamp in the column in a list?
Edit 1:
Here is the sample log file
2011-06-26 21:27:41.867801,KA03JI908,Bike,Entry
2011-06-26 21:27:42.863209,KA02JK1029,Car,Exit
2011-06-26 21:28:43.165316,KA05K987,Bike,Entry

If you have a csv file than why not use pandas to get what you want.The code for your problem may be something like this.
import Pandas as pd
df=pd.read_csv('log.csv')
timestamp=df[0]
if the first column of csv is of Timestamp than you have an array with having all the entries in the first column in the list known as timestamp.
After this you can convert all the entries of this list into timestamp objects using datetime.datetime.strptime().
Hope this is helpful.

I can't comment for clarifications yet.
Would this code get you the timestamps in a list? If yes, give me a few lines of data from the csv file.
from datetime import datetime
timestamps = []
with open(csv_path, 'r') as readf_obj:
for line in readf_obj:
timestamps.append(line.split(',')[0])
fmt = '%Y-%m-%d %H:%M:%S.%f'
datetimes_timestamps = [datetime.strptime(timestamp_, fmt) for timestamp_ in timestamps]

Related

Changes csv row value

This is my code:
import pandas as pd
import re
# reading the csv file
patients = pd.read_csv("partial.csv")
# updating the column value/data
for patient in patients.iterrows():
cip=patient['VALOR_ID']
new_cip = re.sub('^(\w+|)',r'FIXED_REPLACED_STRING',cip)
patient['VALOR_ID'] = new_cip
# writing into the file
df.to_csv("partial-writer.csv", index=False)
print(df)
I'm getting this message:
Traceback (most recent call last):
File "/home/jeusdi/projects/workarea/salut/load-testing/load.py", line 28, in
cip=patient['VALOR_ID']
TypeError: tuple indices must be integers or slices, not str
EDIT
Form code above you can think I need to set a same fixed value to all rows.
I need to loop over "rows" and generate a random string and set it on each different "row".
Code above would be:
for patient in patients.iterrows():
new_cip = generate_cip()
patient['VALOR_ID'] = new_cip
Use Series.str.replace, but not sure about | in regex. Maybe should be removed it:
df = pd.read_csv("partial.csv")
df['VALOR_ID'] = df['VALOR_ID'].str.replace('^(\w+|)',r'FIXED_REPLACED_STRING')
#if function return scalars
df['VALOR_ID'] = df['VALOR_ID'].apply(generate_cip)
df.to_csv("partial-writer.csv", index=False)

how calc values in array imported from csv.reader?

I have this csv file
Germany,1,5,10,20
UK,0,2,4,10
Hungary,6,11,22,44
France,8,22,33,55
and this script,
I would like to make some aritmetic operations with values in 2D array(data)
For example print value (data[1][3]) increased of 10,
Seems that I need some conversion to integer, right ?
What is best solution please ?
import csv
datafile = open('sample.csv', 'r')
datareader = csv.reader(datafile, delimiter=',')
data = []
for row in datareader:
data.append(row)
print ((data[1][3])+10)
I got this error
/python$ python3 read6.py
Traceback (most recent call last):
File "read6.py", line 8, in <module>
print ((data[1][3])+10)
TypeError: must be str, not int
You'll have to manually convert to integers as you suspected:
import csv
datafile = open('sample.csv', 'r')
datareader = csv.reader(datafile, delimiter=',')
data = []
for row in datareader:
data.append([row[0]] + list(map(int, row[1:])))
print ((data[1][3])+10)
Specifically this modification on line 7 of your code:
data.append([row[0]] + list(map(int, row[1:])))
The csv package docs mention that
No automatic data type conversion is performed unless the QUOTE_NONNUMERIC format option is specified (in which case unquoted fields are transformed into floats).
Since the strings in your CSV are not quoted (i.e. "Germany" instead of Germany), this isn't useful for your case, so converting manually is the way to go.

AttributeError: 'DataFrame' object has no attribute 'NET_NAME'

python 3.7
A task. Add a new column in the received date frame based on two conditions:
if the value in the NET_NAME column is equal to one of the list and the value in the ECELL_TYPE column is LTE, then assign the value to the SHARING column from the ENODEB_NAME column.
import csv
import os
import pandas as pd
import datetime
import numpy as np
from time import gmtime, strftime
WCOUNT=strftime("%V", gmtime())
WCOUNT = int(WCOUNT)
WCOUNT_last = int(WCOUNT)-1
os.environ['NLS_LANG'] = 'Russian.AL32UTF8'
cell_file_list=pd.read_excel('cdt_config.xlsx',sheet_name ='cdt_config',index_col='para_name')
filial_name_list=pd.read_excel('FILIAL_NAME.xlsx')
gcell_file_name1=cell_file_list.para_value.loc['ucell_file_name']
ecell_file_name=cell_file_list.para_value.loc['ecell_file_name']
cols_simple=['RECDATE','REGION_PHOENIX_NAME','NET_NAME','CELL_NAME_IN_BSC','ENODEB_NAME','ECELL_TYPE','NRI_ADDRESS', 'NRI_BS_NUMBER','NRI_SITEID','STOPTIME', ]
cols_export=['GSM', 'UMTS', 'LTE', 'TOTAL', 'NWEEK', 'SHARING' ]
ecell_df=df = pd.read_csv(ecell_file_name, sep=",",encoding='cp1251',
dtype={'NRI_SITEID': str})
ecell_df=ecell_df.rename(columns={"RECDATE.DATE": "RECDATE"})
ecell_df=ecell_df.rename(columns={"ECELL_MNEMONIC": "CELL_NAME_IN_BSC"})
#replace ","
ecell_df.STOPTIME=pd.to_numeric(ecell_df.STOPTIME.replace(',', '', regex=True), errors='coerce')/3600
ecell_df=ecell_df[cols_simple]
#pivot ecell table
ecell_sum_df=pd.pivot_table(ecell_df,values='STOPTIME',index=['RECDATE','NRI_SITEID','REGION_PHOENIX_NAME','NET_NAME','ENODEB_NAME','ECELL_TYPE'],aggfunc='sum')
ecell_sum_df=ecell_sum_df.fillna(0)
#create a empty column with the same index as the pivot table.
ecell_export_df= pd.DataFrame(index=ecell_sum_df.index.copy())
ecell_export_df=ecell_export_df.assign(LTE=0)
ecell_export_df.LTE=ecell_sum_df.STOPTIME
ecell_export_df['SHARING'] = 0
ecell_export_df.SHARING.replace(ecell_export_df.NET_NAME in filial_name_list, ENODEB_NAME,inplace=True)
print(ecell_export_df)
#print (ecell_export_df)
del ecell_df
del ecell_sum_df
export_df=pd.concat([ecell_export_df],join='outer',axis=1)
export_df=export_df.fillna(0)
export_df['TOTAL'] = export_df.sum(axis=1)
export_df['NWEEK'] = WCOUNT_last
del ecell_export_df
#################################################
Below is the error message:
Traceback (most recent call last):
File "C:/Users/PycharmProjects/ReportCDT/CDT 4G_power pivot.py", line 43, in <module>
ecell_export_df.SHARING.replace(ecell_sum_df.NET_NAME in filial_name_list, ENODEB_NAME,inplace=True)
File "C:\Users\vavrumyantsev\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 5067, in __getattr__
eturn object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'NET_NAME'
Your traceback contains: DataFrame object has no attribute NET_NAME,
meaning actually that this DataFrame has no column of this name.
This message pertains to ecell_sum_df.NET_NAME (also contained in
the traceback), so let's look how you created this DataFrame (slightly
reformatted for readablity):
ecell_sum_df=pd.pivot_table(ecell_df, values='STOPTIME',\
index=['RECDATE', 'NRI_SITEID', 'REGION_PHOENIX_NAME', 'NET_NAME',
'ENODEB_NAME', 'ECELL_TYPE'], aggfunc='sum')
Note that NET_NAME is a part of the index list, so in the DataFrame
created it is a part of the MultiIndex, not an "ordinary" column.
So Python is right displaying this message.
Maybe you should move this level of the MultiIndex to a "normal" column?

Creating a big pandas Dataframe [duplicate]

This question already has answers here:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
(5 answers)
Closed 5 years ago.
My code is retrieving historical data of 365 days back from today of 50 different stocks.
I want to store all those data in one dataframe to make it easier to analyse, here I want to filter all those data, date wise and calculate number of advancing/declining stocks at a given date.
My code:
import datetime
from datetime import date, timedelta
import pandas as pd
import nsepy as ns
#setting default dates
end_date = date.today()
start_date = end_date - timedelta(365)
#Deriving the names of 50 stocks in Nifty 50 Index
nifty_50 = pd.read_html('https://en.wikipedia.org/wiki/NIFTY_50')
nifty50_symbols = nifty_50[1][1]
for x in nifty50_symbols:
data = ns.get_history(symbol = x, start=start_date, end=end_date)
big_df = pd.concat(data)
Output:
Traceback (most recent call last):
File "F:\My\Getting data from NSE\advances.py", line 27, in <module>
big_df = pd.concat(data)
File "C:\Users\Abinash\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\reshape\concat.py", line 212, in concat
copy=copy)
File "C:\Users\Abinash\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\reshape\concat.py", line 227, in __init__
'"{name}"'.format(name=type(objs).__name__))
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
I am very new to python, I went through the tutorial of pandas and saw that pandas.concat was used to merge multiple dataframes into one. I might have understood that wrong.
Data for concatenation has to be iterable for example list.
results = []
for x in nifty50_symbols:
data = ns.get_history(symbol = x, start=start_date, end=end_date)
results.append(data)
big_df = pd.concat(results)

Python - Storing float values in CSV file

I am trying to store the positive and negative score of statements in a text file. I want to store the score in a csv file. I have implemented the below given code:
import openpyxl
from nltk.tokenize import sent_tokenize
import csv
from senti_classifier import senti_classifier
from nltk.corpus import wordnet
file_content = open('amazon_kindle.txt')
for lines in file_content:
sentence = sent_tokenize(lines)
pos_score,neg_score = senti_classifier.polarity_scores(sentence)
with open('target.csv','w') as f:
writer = csv.writer(f,lineterminator='\n',delimiter=',')
for val in range(pos_score):
writer.writerow(float(s) for s in val[0])
f.close()
But the code displays me the following error in for loop.
Traceback (most recent call last):
File "C:\Users\pc\AppData\Local\Programs\Python\Python36-32\classifier.py",
line 21, in for val in pos_score: TypeError: 'float' object is not iterable
You have several errors with your code:
Your code and error do not correspond with each other.
for val in pos_score: # traceback
for val in range(pos_score): #code
pos_score is a float so both are errors range() takes an int and for val takes an iterable. Where do you expect to get your list of values from?
And from usage it looks like you are expecting a list of list of values because you are also using a generator expression in your writerow
writer.writerow(float(s) for s in val[0])
Perhaps you are only expecting a list of values so you can get rid of the for loop and just use:
writer.writerow(float(val) for val in <list_of_values>)
Using:
with open('target.csv','w') as f:
means you no longer need to call f.close() and with closes the file at the end of the with block. This also means the writerow() needs to be in the with block:
with open('target.csv','w') as f:
writer = csv.writer(f,lineterminator='\n',delimiter=',')
writer.writerow(float(val) for val in <list_of_values>)

Resources