Analysis of unique values from one column, naming files from another - python-3.x

I have the following code that loops through the unique values in my data set, and it works great, but I would like to change the export name to a more appropriate unique value that I can use in a dashboard. In the code below, x in the path is taken from the unique team names. However, for this part only, I'd like the name to be assigned from a list outside of the original dataframe.
team = df['RSA'].unique()
for x in team:
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % x
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
A couple of ways I've tried to solve this:
1) Create a list and substitute the list into the path instead of x.
mylist = ['HR_DASH_0034', 'HR_DASH_0035', 'HR_DASH_0036', 'HR_DASH_0037', 'HR_DASH_0038',
          'HR_DASH_0039', 'HR_DASH_0040', 'HR_DASH_0041', 'HR_DASH_0042', 'HR_DASH_0043',
          'HR_DASH_0044', 'HR_DASH_0045', 'empty']
for x in team:
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % mylist
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
That doesn't work because %s is filled with the entire list at once, producing a single invalid filename instead of one name per iteration, and it doesn't align the new values with the original dataframe values. Cross that off the list.
2) I thought maybe a loop inside the loop would do the trick:
team = df['RSA'].unique()
mylist = ['HR_DASH_0034', 'HR_DASH_0035', 'HR_DASH_0036', 'HR_DASH_0037', 'HR_DASH_0038',
          'HR_DASH_0039', 'HR_DASH_0040', 'HR_DASH_0041', 'HR_DASH_0042', 'HR_DASH_0043',
          'HR_DASH_0044', 'HR_DASH_0045', 'empty']
for x in team:
    for name in mylist:
        path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % name
        r = HROs['RSA'] == x
        Completed = HROs['Current Team Simple'].isin(['Completed'])
        table = HROs[Completed & r]
        top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
        top20.to_csv(path2, index=True, header=True)
That didn't work either. It just gave me the last value in mylist, and it doesn't align with the unique values in the team list appropriately (the inner loop runs to completion for every team, so the pairing is wrong).
3) Next I created a dataframe with the unique values from team and the new list.
team = df['RSA'].unique()
mylist = ['HR_DASH_0034', 'HR_DASH_0035', 'HR_DASH_0036', 'HR_DASH_0037', 'HR_DASH_0038',
          'HR_DASH_0039', 'HR_DASH_0040', 'HR_DASH_0041', 'HR_DASH_0042', 'HR_DASH_0043',
          'HR_DASH_0044', 'HR_DASH_0045', 'empty']
mapping = {'RSA': team, 'DASH_ID': mylist}  # renamed from 'dict' to avoid shadowing the builtin
newdf = pd.DataFrame(mapping)
print(newdf)
RSA DASH_ID
0 Intermountain Region, R4 HR_DASH_0034
1 Pacific Southwest Region, R5 HR_DASH_0035
2 Alaska Region, R10 HR_DASH_0036
3 Pacific Northwest Region, R6 HR_DASH_0037
4 Northern Region, R1 HR_DASH_0038
5 Eastern Region, R9 HR_DASH_0039
6 Albuquerque Service Center(ASC) HR_DASH_0040
7 Rocky Mountain Region, R2 HR_DASH_0041
8 Research & Development(RES) HR_DASH_0042
9 Washington Office(WO) HR_DASH_0043
10 Southwestern Region, R3 HR_DASH_0044
11 Southern Region, R8 HR_DASH_0045
12 L2 Desc Not Available empty
However, I still don't know how to get the DASH_ID column element names into the export path mentioned above.
So in the end, the name HR_DASH_0034 should align with Intermountain Region, R4 when the file is written out.
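A minimal sketch of the mapping I have in mind, using newdf as a lookup table (the dash_by_rsa name is illustrative):
# Map each RSA value to its DASH_ID, then take each export name from the map.
dash_by_rsa = dict(zip(newdf['RSA'], newdf['DASH_ID']))
for x in team:
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % dash_by_rsa[x]
    # ... filter, group, and to_csv as in the original loop ...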
Any help appreciated!

Inside your first approach, just use:
mylist = [...]  # your list definition
ml_iter = iter(mylist)
and inside the loop, build the path from the iterator instead of mylist:
path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % next(ml_iter)
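Put together, a minimal sketch of the iterator approach (assuming team, HROs, and mylist as defined in the question):
ml_iter = iter(mylist)
for x in team:
    # next() consumes one filename per team, in list order
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % next(ml_iter)
    r = HROs['RSA'] == x
    completed = HROs['Current Team Simple'].isin(['Completed'])
    top20 = (HROs[completed & r]
             .groupby(['To Position Title'])
             .RequestNumber.count().sort_values().nlargest(20))
    top20.to_csv(path2, index=True, header=True)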
More info: https://www.programiz.com/python-programming/methods/built-in/iter
Lemme know if this helps!
UPDATE: Second Solution
for x, m in zip(team, mylist):
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % m
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
Let me know if this works!
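One caveat with zip: it pairs the two sequences positionally and stops at the shorter one, so mylist must be in the same order as the names returned by df['RSA'].unique(), and at least as long as team.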

Related

How to merge cells without losing data in libreoffice-calc?

Launch libreoffice-calc:
soffice --calc --accept="socket,host=localhost,port=2002;urp;StarOffice.ServiceManager"
Launch a Python shell to write data into the Calc document:
import uno
localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
context = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
svcmgr = context.ServiceManager
desktop = svcmgr.createInstanceWithContext("com.sun.star.frame.Desktop", context)
oDoc = desktop.loadComponentFromURL( "private:factory/scalc","_blank", 0, () )
oSheet = oDoc.getSheets().getByIndex(0)
oRange = oSheet.getCellRangeByName("A1:C3")
Write data into oRange.
oRange.setDataArray((('a1','a2','a3'),('b1','b2','b3'),('c1','c2','c3'),))
The sheet now shows the 3×3 grid of values. I want to merge all the data in oRange and format it with vertical and horizontal alignment, so that every value is kept in a single merged, centered cell. This is what I tried:
oRange.merge(True)
oCell = oSheet.getCellByPosition(0, 0)
oCell.HoriJustify = 2  # 2 = CENTER (com.sun.star.table.CellHoriJustify)
oCell.VertJustify = 2  # 2 = CENTER (com.sun.star.table.CellVertJustify)
The real effect: the range is merged with the vertical and horizontal alignment applied, but the data previously in cells B1:C1, A2:C2, and A3:C3 is lost; only A1's value remains.
How to fix my code to get the desired effect?
import uno
localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
context = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
svcmgr = context.ServiceManager
desktop = svcmgr.createInstanceWithContext("com.sun.star.frame.Desktop", context)
oDoc = desktop.loadComponentFromURL( "private:factory/scalc","_blank", 0, () )
oSheet = oDoc.getSheets().getByIndex(0)
oRange = oSheet.getCellRangeByName("A1:C3")
tup = (('a1','a2','a3'),('b1','b2','b3'),('c1','c2','c3'),)
oRange.setDataArray(tup)
target = ''
for item in tup:
    tmp = ' '.join(item)
    target = target + tmp + ' '
target = target.strip()
oRange.merge(True)
oCell = oSheet.getCellByPosition(0, 0)
oCell.String = target
oCell.HoriJustify = 2
oCell.VertJustify = 2
I'm not sure, but I think UNO has no way of knowing that you want to rearrange the data into the merged cell that way. What UNO does is "copy" the data from the "main" cell (the top left one) and "paste" its data into the merged cell. Therefore, what you could do is change the data of the main cell before merging. Check the example below.
# get range
oRange = oSheet.getCellRangeByName("A1:C3")
# build string
flat_list = [str(item) for sublist in oRange.getDataArray() for item in sublist]
string = ' '.join(flat_list)
# put string into main cell
main_cell = oRange.getCellByPosition(0, 0)
main_cell.setString(string)
# merge (passing True merges the range; False would unmerge it)
oRange.merge(True)

How to calculate active hours of an employee using face_recognition for attendance tracking

I am working on a face recognition system for my academic project. I want to record the first time an employee is recognized as his first active time, record each subsequent recognition as his last active time, and then calculate the total active hours from those two timestamps.
I tried the following code, but I'm getting only the current system time as the start time. Can someone help me with what I am doing wrong?
Code:
import pickle
import time
from datetime import datetime

import cv2
import face_recognition
import imutils
from imutils.video import VideoStream

data = pickle.loads(open(args["encodings"], "rb").read())  # args assumed to come from argparse
vs = VideoStream(src=0).start()
writers = None
time.sleep(2.0)
while True:
    frame = vs.read()
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    rgb = imutils.resize(frame, width=750)  # note: this overwrites the RGB conversion above
    r = frame.shape[1] / float(rgb.shape[1])
    boxes = face_recognition.face_locations(rgb)
    encodings = face_recognition.face_encodings(rgb, boxes)
    names = []
    face_names = []
    for encoding in encodings:
        matches = face_recognition.compare_faces(data["encodings"], encoding)
        name = "Unknown"
        if True in matches:
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1
            name = max(counts, key=counts.get)
        names.append(name)
    if names != []:
        for i in names:
            first_active_time = datetime.now().strftime('%H:%M')
            last_active_time = datetime.now().strftime('%H:%M')
            difference = datetime.strptime(first_active_time, '%H:%M') - datetime.strptime(last_active_time, '%H:%M')
            difference = difference.total_seconds()
            total_hours = time.strftime("%H:%M", time.gmtime(difference))
            face_names.append([i, first_active_time, last_active_time, total_hours])
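Both timestamps are assigned datetime.now() in the same pass, so their difference is always zero and the recorded start time is simply the current system time. A minimal sketch of one way to track sightings instead (the attendance dict and helper functions are illustrative, not part of the original code):
from datetime import datetime

attendance = {}  # name -> {'first': datetime, 'last': datetime}

def record_sighting(name):
    # Store the first time a name is seen; refresh the last time on every detection.
    now = datetime.now()
    if name not in attendance:
        attendance[name] = {'first': now, 'last': now}
    else:
        attendance[name]['last'] = now

def active_hours(name):
    # Return 'HH:MM' elapsed between the first and last sighting.
    t = attendance[name]
    total_minutes = int((t['last'] - t['first']).total_seconds()) // 60
    return '%02d:%02d' % (total_minutes // 60, total_minutes % 60)

# Inside the recognition loop, replace the re-assignment of both timestamps with:
# for i in names:
#     record_sighting(i)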

Updating multiple line plots dynamically in callback in bokeh

I have a use case with multiple line plots (with legends), and I need to update the line plots based on a column condition. Below is an example with two data sets; based on the country, the column data source changes. The issue I am facing is that the number of columns in the data source is not fixed, and even the types can vary. So when I update the data source in a callback after a new country is selected, I get this error:
Error: attempted to retrieve property array for nonexistent field 'pay_conv_7d.content'.
I am guessing this is because the pay_conv_7d.content column doesn't exist in the new data source, but the corresponding lines were already in my plot. I have been trying to fix this by various means (making common columns for all country selections, adding the missing columns to the data source in the callback), but I still get issues.
Is there a clean way to have multiple line plots update in a callback without a lot of hackish workarounds? Any insights or help would be really appreciated. Thanks much in advance! :)
def setup_multiline_plots(x_axis, y_axis, title_text, data_source, plot):
    num_categories = len(data_source.data['categories'])
    legends_list = list(data_source.data['categories'])
    colors_list = Spectral11[0:num_categories]
    # xs = [data_source.data['%s.' % x_axis].values] * num_categories
    # ys = [data_source.data['%s.%s' % (y_axis, column)] for column in data_source.data['categories']]
    # data_source.data['x_series'] = xs
    # data_source.data['y_series'] = ys
    # plot.multi_line('x_series', 'y_series', line_color=colors_list, legend='categories', line_width=3, source=data_source)
    plot_list = []
    for (colr, leg, column) in zip(colors_list, legends_list, data_source.data['categories']):
        xs, ys = '%s.' % x_axis, '%s.%s' % (y_axis, column)
        plot.line(xs, ys, source=data_source, color=colr, legend=leg, line_width=3, name=ys)
        plot_list.append(ys)
    data_source.data['plot_names'] = data_source.data.get('plot_names', []) + plot_list
    plot.title.text = title_text
def update_plot(country, timeseries_df, timeseries_source,
                aggregate_df, aggregate_source, category,
                plot_pay_7d, plot_r_pay_90d):
    aggregate_metrics = aggregate_df.loc[aggregate_df.country == country]
    aggregate_metrics = aggregate_metrics.nlargest(10, 'cost')
    category_types = list(aggregate_metrics[category].unique())
    timeseries_df = timeseries_df[timeseries_df[category].isin(category_types)]
    timeseries_multi_line_metrics = get_multiline_column_datasource(timeseries_df, category, country)
    # len_series = len(timeseries_multi_line_metrics.data['time.'])
    # previous_legends = timeseries_source.data['plot_names']
    # current_legends = timeseries_multi_line_metrics.data.keys()
    # common_legends = list(set(previous_legends) & set(current_legends))
    # additional_legends_list = list(set(previous_legends) - set(current_legends))
    # for legend in additional_legends_list:
    #     zeros = pd.Series(np.array([0] * len_series), name=legend)
    #     timeseries_multi_line_metrics.add(zeros, legend)
    # timeseries_multi_line_metrics.data['plot_names'] = previous_legends
    timeseries_source.data = timeseries_multi_line_metrics.data
    aggregate_source.data = aggregate_source.from_df(aggregate_metrics)
def get_multiline_column_datasource(df, category, country):
    df_country = df[df.country == country]
    df_pivoted = pd.DataFrame(df_country.pivot_table(index='time', columns=category, aggfunc=np.sum).reset_index())
    df_pivoted.columns = df_pivoted.columns.to_series().str.join('.')
    categories = list(set([column.split('.')[1] for column in list(df_pivoted.columns)]))[1:]
    data_source = ColumnDataSource(df_pivoted)
    data_source.data['categories'] = categories
    return data_source  # return added: the function is used for its result above
Recently I had to update data on a Multiline glyph. Check my question if you want to take a look at my algorithm.
I think you can update a ColumnDataSource in three ways at least:
You can create a dataframe to instantiate a new CDS
cds = ColumnDataSource(df_pivoted)
data_source.data = cds.data
You can create a dictionary and assign it to the data attribute directly
d = {
    'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
    'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
    'xs1': [[17.0, 166.0], [17.0, 116.0], [17.0, 126.0]],
    'ys1': [[179.0, 169.0], [179.0, 1169.0], [1729.0, 169.0]],
    'xs2': [[27.0, 276.0], [27.0, 216.0], [27.0, 226.0]],
    'ys2': [[279.0, 269.0], [279.0, 2619.0], [2579.0, 2569.0]]
}
data_source.data = d
Here, if you need columns of different sizes or empty columns, you can fill the gaps with NaN values to keep the column sizes consistent. I think this is the solution to your question:
import numpy as np
d = {
    'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
    'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
    'xs1': [[17.0, 166.0], [np.nan], [np.nan]],
    'ys1': [[179.0, 169.0], [np.nan], [np.nan]],
    'xs2': [[np.nan], [np.nan], [np.nan]],
    'ys2': [[np.nan], [np.nan], [np.nan]]
}
data_source.data = d
Or if you only need to modify a few values then you can use the method patch. Check the documentation here.
The following example shows how to patch slices and individual elements of columns:
source = ColumnDataSource(data=dict(foo=[10, 20, 30], bar=[100, 200, 300]))
patches = {
    'foo': [ (slice(2), [11, 12]) ],
    'bar': [ (0, 101), (2, 301) ],
}
source.patch(patches)
After this operation, the value of source.data will be:
dict(foo=[11, 12, 30], bar=[101, 200, 301])
NOTE: It is important to make the update in one go to avoid performance issues.
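As a small illustrative sketch of "in one go" (the column values here are made up): assemble the complete dict first and assign it to .data once, rather than mutating data_source.data key by key, so only a single update event is sent to the browser.
# Build the full replacement dict first ...
new_data = dict(data_source.data)  # copy the current columns
new_data['ys1'] = [[179.0, 169.0], [np.nan], [np.nan]]
new_data['ys2'] = [[np.nan], [np.nan], [np.nan]]
# ... then assign once: a single update event is emitted.
data_source.data = new_data

# Avoid per-key mutation; each line below would trigger a separate update:
# data_source.data['ys1'] = ...
# data_source.data['ys2'] = ...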

Merging data frames by date in python

I have 8 dataframes which contain a date and a stock return. They are of different lengths (some 5 years, some 8, etc.). Some of them are randomly undefined (it depends on what the user does beforehand, i.e. which countries she chooses). What I want to do is merge by date only those dataframes which have data and then find the correlation between the columns. Please advise me on this issue. I use the command below to merge the dataframes, but because some of them may not exist (this is random), I cannot merge them.
merged = pd.concat([germany_return, france_return, usa_return, hongkong_return, india_return, japan_return, england_return, china_return], axis=1)
Below I present the last part of the code:
# Define stock price columns
stock_germany = data_germany['Settle']
stock_france = data_france['Settle']
stock_usa = data_usa['Value']
stock_hongkong = data_hongkong['Last Traded']
stock_india = data_india['Close']
stock_japan = data_india['Close']  # note: this likely should be data_japan
stock_england = data_england['Settle']
stock_china = data_china['Value']
# Calculate monthly returns (21 trading days)
germany_return = stock_germany.pct_change(21)
france_return = stock_france.pct_change(21)
usa_return = stock_usa.pct_change(21)
hongkong_return = stock_hongkong.pct_change(21)
india_return = stock_india.pct_change(21)
japan_return = stock_japan.pct_change(21)
england_return = stock_england.pct_change(21)
china_return = stock_china.pct_change(21)
merged = pd.concat([germany_return, france_return, usa_return, hongkong_return, india_return, japan_return, england_return, china_return], axis=1)
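One way to handle the randomly undefined frames, as a minimal sketch (it assumes each *_return series keeps its date index; globals() is used only because the series are created as module-level names):
import pandas as pd

names = ['germany_return', 'france_return', 'usa_return', 'hongkong_return',
         'india_return', 'japan_return', 'england_return', 'china_return']

# Keep only the series that were actually created for this run.
available = {name: globals()[name] for name in names if name in globals()}

# concat aligns on the shared date index; the outer join keeps all dates.
merged = pd.concat(available, axis=1, join='outer')
correlations = merged.corr()  # pairwise correlation, NaNs dropped per pair
print(correlations)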

Python pyodbc fetchmany(): how to select output to use in an update query

I have code using fetchmany() that outputs, e.g., 10 records.
I added an incrementing index (0, 1, 2, ...) to the print statement. Now I want the user to enter 0 or 1 and have that input select the corresponding row, so I can update the SQL record for that row.
cur.execute("select events.SERIALNUM, emp.LASTNAME, emp.SSNO,
events.EVENT_TIME_UTC from AccessControl.dbo.emp,
AccessControl.dbo.events where emp.id = events.empid and emp.SSNO=?
order by EVENT_TIME_UTC desc ", empid)
rows = cur.fetchmany(att_date)
n = 0
for row in rows :
event_date = row.EVENT_TIME_UTC
utc = event_date.replace(tzinfo=from_zone)
utc_to_local = utc.astimezone(to_zone)
local_time = utc_to_local.strftime('%H:%M:%S')
att_date = utc_to_local.strftime('%d:%m:%y')
print (n, row.SERIALNUM, row.LASTNAME, row.SSNO, att_date, local_time)
n = n + 1
seri_al = input("Copy And Past the serial number u want to modifiy: ")
This will output the following data:
0 1500448188 FIRST NAME 03249 2017-07-19 17:01:17
1 1500448187 FIRST NAME 03249 2017-07-19 17:01:15
For example, with
seri_al = input("Copy and paste the serial number you want to modify: ")
instead of copying and pasting '1500448188', I want the user to enter only '0', map that index to the corresponding serial number, and use it in the WHERE clause of an UPDATE query.
It appears that you already know how to use input to prompt for the user's choice. The only piece you are missing is to add items to a dictionary as you loop through the rows. Here is a slightly abstracted example:
rows = [('1500448188',), ('1500448187',)]  # test data
selections = dict()
n = 0
for row in rows:
    selections[n] = row[0]
    print(n, repr(row[0]))
    n += 1
select = input("Enter the index (0, 1, ...) you want to select: ")
selected_key = selections[int(select)]
print("You selected " + repr(selected_key))
which prints
0 '1500448188'
1 '1500448187'
Enter the index (0, 1, ...) you want to select: 1
You selected '1500448187'
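To close the loop back to SQL, here is a hedged sketch of the follow-up UPDATE (SOME_COLUMN is an illustrative placeholder; substitute the column you actually want to change). Keeping the query parameterized means the serial number is never pasted into the SQL string:
new_value = input("Enter the new value for this record: ")
cur.execute(
    "update AccessControl.dbo.events set SOME_COLUMN = ? where SERIALNUM = ?",
    (new_value, selected_key),
)
cur.connection.commit()  # persist the change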
