'datetime.time' object has no attribute 'where' Python - python-3.x

I need to check whether an employee checked out during the break.
To do so, I need to see whether any time at which Door Name is RDC_OUT-1 falls in the interval [12:15:00 ; 14:15:00].
import pandas as pd
df_by_date= pd.DataFrame({'Time':['01/02/2019 07:02:07', '01/02/2019 10:16:55', '01/02/2019 12:27:20', '01/02/2019 14:08:58','01/02/2019 15:32:28','01/02/2019 17:38:54'],
'Door Name':['RDC_OUT-1', 'RDC_IN-1','RDC_OUT-1','RDC_IN-1','RDC_OUT-1','RDC_IN-1']})
df_by_date['Time'] = pd.to_datetime(df_by_date['Time'])
df_by_date['hours']=pd.to_datetime(df_by_date['Time'], format='%H:%M:%S').apply(lambda x: x.time())
print('hours \n',df_by_date['hours'])
out = '12:15:00'
inn = '14:15:00'
pause=0
for i in range (len(df_by_date)):
    if (out < str((df_by_date['hours'].iloc[i]).where(df_by_date['Door Name'].iloc[i]=='RDC_IN-1')) < inn) :
        pause+=1
        print('Break outside ')
    else:
        print('Break inside')
When running the code above, I got this error:
if (out < ((df_by_date['hours'].iloc[i]).where(df_by_date['Door Name'].iloc[i]=='RDC_OUT-1')) < inn) :
AttributeError: 'datetime.time' object has no attribute 'where'

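When you iterate over the DataFrame/Series like this, you select one cell at a time, and that cell is a plain datetime.time object, which has no where method.
where is a method of the whole DataFrame/Series, so apply it to the complete column instead of calling it inside a loop, like this:
sub_df = df_by_date['hours'].where(condition)
Note that where keeps every row and replaces the non-matching ones with NaN, so len(sub_df) returns the total number of rows. To count only the matches, use sub_df.count(), which skips NaN values, or select the rows with boolean indexing (df_by_date.loc[condition]) and take len() of that.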
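A minimal vectorized sketch of the break check, assuming the sample dates are month-first and treating the break as the closed interval from 12:15 to 14:15 (both bounds included). It compares the time-of-day part of each timestamp with datetime.time bounds instead of strings, and counts the RDC_OUT-1 events inside the window both with a boolean mask and with where(...).count():

```python
import datetime

import pandas as pd

df_by_date = pd.DataFrame({
    'Time': ['01/02/2019 07:02:07', '01/02/2019 10:16:55', '01/02/2019 12:27:20',
             '01/02/2019 14:08:58', '01/02/2019 15:32:28', '01/02/2019 17:38:54'],
    'Door Name': ['RDC_OUT-1', 'RDC_IN-1', 'RDC_OUT-1', 'RDC_IN-1', 'RDC_OUT-1', 'RDC_IN-1'],
})
# Assumption: month-first dates; only the time of day matters for this check
df_by_date['Time'] = pd.to_datetime(df_by_date['Time'], format='%m/%d/%Y %H:%M:%S')

break_start = datetime.time(12, 15)
break_end = datetime.time(14, 15)

hours = df_by_date['Time'].dt.time  # Series of datetime.time objects
is_out = df_by_date['Door Name'] == 'RDC_OUT-1'
in_break = (hours >= break_start) & (hours <= break_end)
condition = is_out & in_break

# Boolean mask: each True counts as 1 when summed
pause = int(condition.sum())

# Same count with where(): non-matching rows become NaT, and count() skips them
pause_via_where = int(df_by_date['Time'].where(condition).count())

print('Break outside' if pause else 'Break inside')
```

With the sample data only the 12:27:20 RDC_OUT-1 event falls inside the window, so both counts agree.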

Related

AttributeError: 'tuple' object has no attribute 'value' PYTHON TO EXCEL

I am currently trying to write data to an excel spreadsheet using Python3 and openpyxl. I have figured out how to do it when assigning one single value, but for some reason when I introduce a For loop it is giving me an error. This program will eventually be used to filter through a python dictionary and print out the keys and values from the python dictionary. For now, I am just trying to create a loop that will input a random integer in the spreadsheet for every key listed in the dictionary (not including nested keys). If anyone can help me determine why this error is coming up it would be much appreciated. Thanks in advance!
# Writing Dictionary to excel spreadsheet
import openpyxl
import random
wb = openpyxl.load_workbook("ExampleSheet.xlsx")
sheet = wb.get_sheet_by_name("Sheet1")
sheet["B1"].value = "Price" #This works and assigns the B1 value to "price" in the spreadsheet
my_dict = {'key': {'key2' : 'value1', 'key3' : 'value2'} 'key4' : {'key5' : 'value3', 'key6' : 'value4'}} #an example dictionary
n = len(my_dict)
for i in range(0,n):
    sheet['A'+ str(i)].value = random.randint(1,10) #This does not work and gives an error
wb.save('ExampleSheet.xlsx')
OUTPUT >>> AttributeError: 'tuple' object has no attribute 'value'
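openpyxl row indices are one-based: there is no row 0, so 'A0' on the first iteration is not a valid single-cell reference. If you loop over range(1, n + 1) instead, your issue should be resolved (range(1, n) would write one row fewer than there are keys).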
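Using 'A{}'.format(i) instead of 'A' + str(i) is tidier, although the row numbers still have to start at 1.
BTW, your my_dict literal is missing a comma between its two top-level entries, so it raises a SyntaxError.
e.g.:
for i in range(1, 11):
    sheet['A{}'.format(i)].value = 'xx'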
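A corrected sketch of the loop, assuming the goal is one random value per top-level key in column A, starting at row 1. It uses a new in-memory openpyxl.Workbook() instead of loading ExampleSheet.xlsx, wb.active instead of the deprecated get_sheet_by_name, and enumerate(..., start=1) so the row numbers are one-based:

```python
import random

import openpyxl

# Comma added between the two top-level entries
my_dict = {'key': {'key2': 'value1', 'key3': 'value2'},
           'key4': {'key5': 'value3', 'key6': 'value4'}}

wb = openpyxl.Workbook()  # in-memory stand-in for load_workbook("ExampleSheet.xlsx")
sheet = wb.active
sheet['B1'] = 'Price'

# Iterating a dict yields only its top-level keys; rows start at 1
for row, key in enumerate(my_dict, start=1):
    sheet[f'A{row}'] = random.randint(1, 10)

# wb.save('ExampleSheet.xlsx')  # uncomment to write the file
```

Nested keys are not visited because iterating a dict only walks its own keys.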

How to concatenate data frames from two different dictionaries into a new data frame in python?

This is my sample code
dataset_current=dataset_seq['Motor_Current_Average']
dataset_consistency=dataset_seq['Consistency_Average']
#technique with non-overlapping the values(for current)
dataset_slide=dataset_current.tolist()
from window_slider import Slider
import numpy
list = numpy.array(dataset_slide)
bucket_size = 336
overlap_count = 0
slider = Slider(bucket_size,overlap_count)
slider.fit(list)
empty_dictionary = {}
count = 0
while True:
    count += 1
    window_data = slider.slide()
    empty_dictionary['df_current%s'%count] = window_data
    empty_dictionary['df_current%s'%count] =pd.DataFrame(empty_dictionary['df_current%s'%count])
    empty_dictionary['df_current%s'%count]= empty_dictionary['df_current%s'%count].rename(columns={0: 'Motor_Current_Average'})
    if slider.reached_end_of_list(): break
locals().update(empty_dictionary)
#technique with non-overlapping the values(for consistency)
dataset_slide_consistency=dataset_consistency.tolist()
list = numpy.array(dataset_slide_consistency)
slider_consistency = Slider(bucket_size,overlap_count)
slider_consistency.fit(list)
empty_dictionary_consistency = {}
count_consistency = 0
while True:
    count_consistency += 1
    window_data_consistency = slider_consistency.slide()
    empty_dictionary_consistency['df_consistency%s'%count_consistency] = window_data_consistency
    empty_dictionary_consistency['df_consistency%s'%count_consistency] =pd.DataFrame(empty_dictionary_consistency['df_consistency%s'%count_consistency])
    empty_dictionary_consistency['df_consistency%s'%count_consistency]= empty_dictionary_consistency['df_consistency%s'%count_consistency].rename(columns={0: 'Consistency_Average'})
    if slider_consistency.reached_end_of_list(): break
locals().update(empty_dictionary_consistency)
import pandas as pd
output_current ={}
increment = 0
while True:
    increment +=1
    output_current['dataframe%s'%increment] = pd.concat([empty_dictionary_consistency['df_consistency%s'%count_consistency],empty_dictionary['df_current%s'%count]],axis=1)
My question is: I have two dictionaries, "empty_dictionary_consistency" and "empty_dictionary", each containing 79 data frames. I want to create a new data frame for each pair, concatenating df1 from empty_dictionary_consistency with df1 from empty_dictionary, and so on up to df79 from empty_dictionary_consistency with df79 from empty_dictionary. I tried using a while loop to increment the index, but it does not show any output.
output_current ={}
increment = 0
while True:
    increment +=1
    output_current['dataframe%s'%increment] = pd.concat([empty_dictionary_consistency['df_consistency%s'%count_consistency],empty_dictionary['df_current%s'%count]],axis=1)
Can anyone help me with this? How can I do this?
I am not near my computer now, so I cannot test the code, but the problem seems to be in the indices. In the last loop you increment a variable called increment on every iteration, but you still use the indices left over from the previous loops (count_consistency and count) to look up the dictionaries you want to concatenate. Try using increment as the index for all the dictionaries instead.
One more thing: I can't see where this loop is supposed to stop, since while True never hits a break.
UPD
I mean this:
output_current = {}
length = len(empty_dictionary_consistency)
increment = 0
while increment < length:
    increment +=1
    output_current['dataframe%s'%increment] = pd.concat([empty_dictionary_consistency['df_consistency%s'%increment],empty_dictionary['df_current%s'%increment]],axis=1)
While iterating over your dictionaries, use the one variable you increment as the index into all three dictionaries. And since this loop no longer calls slider.reached_end_of_list(), you have to stop it yourself once you reach the end of the first dictionary.
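A shorter sketch of the same idea with a for loop, which stops by itself after the last index. The two small dictionaries are hypothetical stand-ins for the 79 windows, built with the same key names (df_consistency1, df_current1, ...) and column names as in the question:

```python
import pandas as pd

# Hypothetical stand-ins: 3 windows per dictionary instead of 79
empty_dictionary = {
    f'df_current{i}': pd.DataFrame({'Motor_Current_Average': [i, i + 0.5]})
    for i in range(1, 4)
}
empty_dictionary_consistency = {
    f'df_consistency{i}': pd.DataFrame({'Consistency_Average': [10 * i, 10 * i + 1]})
    for i in range(1, 4)
}

output_current = {}
# One shared index for all three dictionaries; range() ends the loop automatically
for i in range(1, len(empty_dictionary_consistency) + 1):
    output_current[f'dataframe{i}'] = pd.concat(
        [empty_dictionary_consistency[f'df_consistency{i}'], empty_dictionary[f'df_current{i}']],
        axis=1,
    )

print(output_current['dataframe2'])
```

axis=1 places the two windows side by side, aligned on their row index, which is 0..bucket_size-1 in every window.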

Function reads iteration target as local argument string instead of iteration value in for loop

Trying to run this simple for loop with the pandas crosstab function. The iteration target is an argument to the crosstab function. It's supposed to read through a list of columns and produce a cross-tab for each column combination. But instead it's interpreting my loop variable i as the literal title of the column instead of the value it holds in that iteration.
I get the error: 'DataFrame' object has no attribute 'i' because it's reading 'i' as the literal name of an attribute instead of the value that should be stored in i from the loop.
import pandas
DF = pandas.read_excel('example.xlsx')
Categories = list(DF.columns.values)
for i in Categories:
    pandas.crosstab(DF.Q, DF.i, normalize = 'index', margins=True)
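IIUC (if I understand correctly), you want to loop through every column and create a cross tab against column Q. DF.i looks for a column literally named i; to select the column whose name is stored in i, use DF[i]. Also, your current loop discards each result, so it won't produce anything.
Use the following to store the results in a Python dict that you can access with the column names as keys: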
import pandas
DF = pandas.read_excel('example.xlsx')
Categories = list(DF.columns.values)
cross_tabs = {}
for i in Categories:
    cross_tabs[i] = pandas.crosstab(DF.Q, DF[i], normalize = 'index', margins=True)
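A self-contained sketch of the same loop, assuming a small made-up DataFrame in place of example.xlsx. It skips Q itself (crossing Q with Q is rarely useful) and uses DF[col] so the column name stored in the loop variable is looked up:

```python
import pandas as pd

# Made-up stand-in for pandas.read_excel('example.xlsx')
DF = pd.DataFrame({
    'Q': ['yes', 'no', 'yes', 'no'],
    'A': ['x', 'x', 'y', 'y'],
    'B': ['p', 'q', 'p', 'p'],
})

cross_tabs = {}
for col in DF.columns:
    if col == 'Q':
        continue  # no point crossing Q with itself
    # DF[col] selects the column named by the value of col, not a column called "col"
    cross_tabs[col] = pd.crosstab(DF['Q'], DF[col], normalize='index', margins=True)

print(cross_tabs['B'])
```

With normalize='index', each row of a table gives the share of each category of that column within one value of Q.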

Pandas Dataframe: New Column Reference Week Beginning Monday

I wrote this function to reference an existing date column and create a new column called wbm (short for week beginning Monday).
def wbmFunc(df, col):
    if df[col].weekday() == 0:
        return df[col]
    else:
        return df[col] + timedelta(days=(0 - df[col].weekday()))
df['wbm'] = wbmFunc(df, 'date')
Why does it return the below error?
AttributeError: 'Series' object has no attribute 'weekday'
Since you want to access a datetime-like property of a Series, you have to go through the .dt accessor:
series.dt.weekday
Also note that since it is a property, you don't call it with parentheses.
You can refer to the pandas documentation on this topic.
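It looks like you want to construct a new column holding the Monday that begins the week of each date. Even if you fix the property bug, there is still a problem: if df[col].dt.weekday == 0 compares the whole Series, and an if statement cannot reduce a Series of booleans to a single True/False (pandas raises "ValueError: The truth value of a Series is ambiguous"). Why not use pd.offsets? You can try the following code for the same purpose: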
import pandas as pd

def wbmFunc(df, col):
    w_mon = pd.offsets.Week(weekday=0)
    return df[col].apply(w_mon.rollback)
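For comparison, a vectorized sketch that keeps the original idea of subtracting the weekday number (Monday is 0), using .dt.weekday and pd.to_timedelta on the whole column instead of an if statement, and checks it against the offsets version. The three sample dates (a Wednesday, a Monday and a Sunday) are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2019-01-02', '2019-01-07', '2019-01-13'])})


def wbm_vectorized(df, col):
    # Subtract weekday days from every date at once; Mondays (0) stay unchanged
    return df[col] - pd.to_timedelta(df[col].dt.weekday, unit='D')


def wbm_offsets(df, col):
    # Offsets approach from the answer above: roll each date back to a Monday
    w_mon = pd.offsets.Week(weekday=0)
    return df[col].apply(w_mon.rollback)


df['wbm'] = wbm_vectorized(df, 'date')
print(df)
```

The vectorized version avoids a Python-level call per row, which matters on large frames; both keep any time-of-day component, so add .dt.normalize() if you want midnight.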

Openpyxl: Manipulation of cell values

I'm trying to pull cell values from an Excel sheet, do math with them, and write the output to a new sheet. I keep getting a TypeError. I've run the code successfully before, but I just added this part, so I've distilled the code down to the following:
import openpyxl
#set up ws from file, and ws_out write to new file
def get_data():
    first = 0
    second = 0
    for x in range (1, 1000):
        if ws.cell(row=x, column=1).value == 'string':
            for y in range (1, 10): #Only need next ten rows after 'string'
                ws_out.cell(row=y, column=1).value = ws.cell(row=x+y, column=1).value
                second = first #displaces first -> second
                first = ws.cell(row=x+y, column=1).value/100 #new value for first
                difference = first - second
                ws_out.cell(row=x+y+1, column=1).value = difference #add to output
            break
It throws a TypeError:
first = ws.cell(row=x+y, column=1).value/100
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
I assume this is referring to the ws.cell value and 100, respectively, so I've also tried:
first = int(ws.cell(row=x, column=1))/100 #also tried with float
Which raises:
TypeError: int() argument must be a string or a number
I've confirmed that every cell in the column is made up of numbers only. Additionally, openpyxl's cell.data_type returns 'n' (presumably for number as far as I can tell by the documentation).
I've also tested simpler math, and get the same error.
All of my searching seems to point to openpyxl normally behaving like this. Am I doing something wrong, or is this simply a limitation of the module? If so, are there any programmatic workarounds?
As a bonus, advice on writing code more succinctly would be much appreciated. I'm just beginning, and feel there must be a cleaner way to write an idea like this.
Python 3.3, openpyxl-1.6.2, Windows 7
Summary
cfi's answer helped me figure it out, although I used a slightly different workaround. On inspection of the originating file, there was one empty cell (which I had missed earlier). Since I will be re-using this code later on columns with more sporadic empty cells, I used:
if ws.cell(row=x+y, column=1).data_type == 'n':
    second = first #displaces first -> second
    first = ws.cell(row=x+y, column=1).value/100 #new value for first
    difference = first - second
    ws_out.cell(row=x+y+1, column=1).value = difference #add to output
Thus, if a specified cell was empty, it was ignored and skipped.
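One caveat, hedged because behaviour may differ in very old releases such as 1.6.2: in current openpyxl an empty cell also reports data_type 'n' (the null and numeric type codes are both 'n'), so the data_type check may not skip it. A small sketch, using a hypothetical helper numeric_value, that tests the value itself instead:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws['A1'] = 250
ws['A3'] = 400  # A2 is left empty on purpose

# Both a number and an empty cell report data_type 'n' in current openpyxl
print(ws['A1'].data_type, ws['A2'].data_type)


def numeric_value(cell):
    """Return the cell's value if it is a number, else None (hypothetical helper)."""
    value = cell.value
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    return None


values = [numeric_value(ws.cell(row=r, column=1)) for r in range(1, 4)]
# Keep only the numeric cells, divided by 100 as in the question
scaled = [v / 100 for v in values if v is not None]
```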
Are you 100% sure (i.e. have you verified) that all the cells you are accessing actually hold a value? (Edit: add a print("dbg> cell value of {}, {} is {}".format(row, 1, ws.cell(row=row, column=1).value)) inside the loop to check the content.)
Instead of going through a fixed range(1,1000), I'd recommend using openpyxl's introspection methods to iterate over the rows that actually exist. E.g.:
from openpyxl import load_workbook
wb=load_workbook(inputfile)
for ws in wb.worksheets:
    for row in ws.rows:
        for cell in row:
            value = cell.value
When getting the values do not forget to extract the .value attribute:
first = ws.cell(row=x+y, column=1).value/100 #new value for first
As a general note: x and y are useful variable names for 2D coordinates. Don't use them both for rows; that will mislead others who have to read the code. Instead of x you could use start_row or row_offset or something similar. Instead of y you could just use row, and let it start at start_row+1.
Some example code (untested):
def get_data():
    first = 0
    second = 0
    for start_row in range(1, ws.max_row + 1):
        if ws.cell(row=start_row, column=1).value == 'string':
            for row in range (start_row+1, start_row+10):
                ws_out.cell(row=start_row, column=1).value = ws.cell(row=row, column=1).value
                second = first
                first = ws.cell(row=row, column=1).value/100
                difference = first - second
                ws_out.cell(row=row+1, column=1).value = difference
            break
With this code I still don't understand what you are trying to achieve. Is the break indented correctly? If so, the outer loop is exited by the break the first time you match 'string'. Then what is the point of the variables first and second?
Edit: Also make sure that you read from and write into cell().value, not just cell().
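A more compact sketch of the same task, under assumptions since the exact goal is unclear: find the first cell in column A equal to 'string', take up to ten numeric values below it, divide each by 100, and write the differences between consecutive values to column A of a second sheet. Two in-memory worksheets stand in for ws and ws_out, the sample numbers are made up, and iter_rows(..., values_only=True) needs a recent openpyxl (2.6 or later):

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active                      # input sheet (stand-in for the loaded file)
ws_out = wb.create_sheet('output')  # output sheet

# Made-up sample data: a marker row followed by numbers, with one empty cell
for r, v in enumerate(['header', 'string', 100, 250, None, 400], start=1):
    ws.cell(row=r, column=1, value=v)

# Read column A once as plain values instead of probing a fixed range(1, 1000)
column_a = [row[0] for row in ws.iter_rows(min_col=1, max_col=1, values_only=True)]

start = column_a.index('string')          # raises ValueError if the marker is missing
window = column_a[start + 1:start + 11]   # next ten cells (fewer if the sheet ends)
numbers = [v / 100 for v in window if isinstance(v, (int, float))]  # skip empty/text cells

# Differences between consecutive values, written from row 1 of the output sheet
differences = [b - a for a, b in zip(numbers, numbers[1:])]
for out_row, diff in enumerate(differences, start=1):
    ws_out.cell(row=out_row, column=1, value=diff)
```

Reading the column into a list first separates the spreadsheet I/O from the arithmetic, which makes empty cells easy to filter and the logic easy to test.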

Resources