I have been trying to count and group, per row, the number of unique values. Perhaps it will be easier to explain with a table. Should I transpose first before counting and grouping?
Box1   | Box2   | Box3   | Count Result 1 | Count Result 2 | Count Result 3
-------|--------|--------|----------------|----------------|---------------
Data A | Data A | Data B | Data A = 2     | Data B = 1     |
Data C | Data D | Data B | Data C = 1     | Data D = 1     | Data B = 1
In Google Sheets, try:
=ARRAYFORMULA(TRIM(SPLIT(FLATTEN(QUERY(QUERY(
QUERY(SPLIT(FLATTEN(A2:C3&" = ×"&ROW(A2:C3)), "×"),
"select max(Col1) group by Col1 pivot Col2")&
QUERY(SPLIT(FLATTEN(A2:C3&" = ×"&ROW(A2:C3)), "×"),
"select count(Col1) group by Col1 pivot Col2")&"",
"offset 1", ),,9^9)), "")))
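For comparison outside of Sheets, the same per-row tally can be sketched in pandas (column names and data taken from the table above; this is an illustration, not part of the Sheets answer):

```python
import pandas as pd

# Sample grid matching the table above
df = pd.DataFrame({
    "Box1": ["Data A", "Data C"],
    "Box2": ["Data A", "Data D"],
    "Box3": ["Data B", "Data B"],
})

# Tally each distinct value within a row; value_counts() on a row Series
# does the per-row grouping, and apply(axis=1) runs it row by row
counts = df.apply(lambda row: row.value_counts(), axis=1).fillna(0).astype(int)
print(counts)
```

Row 0 then shows Data A = 2 and Data B = 1, matching the first Count Result row of the table.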
I have two SQLite tables, Table 1 and Table 2.
Table1 has ID, Name and Code columns. Table2 has ID, Values and Con columns.
I want to create an Excel file with ID, Name, Code and Values columns. ID, Name and Code come from Table1, and Values comes from Table2 as the sum of its Values column, under two conditions: the ID columns must match and the Con column must equal 'Done'.
The image below is for reference:
I would approach this problem in steps.
First, extract the SQL tables into pandas dataframes. I am no expert on that aspect of the problem, but assume you have two dataframes like the following:
df1 = ID Name Code
0 1 a 1a
1 2 b 2b
2 3 a 3c
and
df2 = ID Values Con
0 1 5 Done
1 2 9 No
2 1 7 Done
3 2 4 No
4 1 8 No
5 3 1 Done
def sumByIndex(dx, row):
    # Return the summed value for this row's ID, or 0 if the ID doesn't exist
    idx = row['ID']
    if idx in list(dx['ID']):
        return dx[dx['ID'] == idx]['Values'].values[0]
    else:
        return 0

def combineFrames(d1, d2):
    # Return an updated version of d1 with a "Values" column added.
    # Select the 'Values' column before summing so the string column 'Con'
    # doesn't break the aggregation.
    d3 = d2[d2['Con'] == 'Done'].groupby('ID', as_index=False)['Values'].sum()
    d1['Values'] = d1.apply(lambda row: sumByIndex(d3, row), axis=1)
    return d1
then print(combineFrames(df1, df2)) yields:
ID Name Code Values
0 1 a 1a 12
1 2 b 2b 0
2 3 a 3c 1
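The same result can also be reached without the helper function; a minimal sketch of the same idea (filter, aggregate, then map by ID), using the example data above:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["a", "b", "a"], "Code": ["1a", "2b", "3c"]})
df2 = pd.DataFrame({"ID": [1, 2, 1, 2, 1, 3],
                    "Values": [5, 9, 7, 4, 8, 1],
                    "Con": ["Done", "No", "Done", "No", "No", "Done"]})

# Sum Values per ID where Con == 'Done', then look the totals up by ID;
# IDs with no 'Done' rows get 0
sums = df2.loc[df2["Con"] == "Done"].groupby("ID")["Values"].sum()
out = df1.assign(Values=df1["ID"].map(sums).fillna(0).astype(int))
print(out)
```

This yields the same 12 / 0 / 1 column as the step-by-step version.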
My program obtains the data from sqlite table 1 and sqlite table 2 as lists (of tuples and lists) holding the corresponding values of ID, Name, Code and ID, Values, Con, by querying the database like this: 'SELECT * FROM sqlite table 1'
# sqlite table 1
table1 = [[5674, 'a', '1a'], [3385, 'b', '2b'], [5548, 'a', '3c']]
# sqlite table 2
table2 = [(5674, 5, 'Done'), (3385, 9, 'No'), (5674, 7, 'Done'), (3385, 4, 'No'), (5674, 8, 'No'), (5548, 1, 'Done')]
To begin, I add up all the Values entries in a dictionary that matches them to the corresponding ID:
map_values = {table2[i][0]: 0 for i in range(len(table2))}
for i in range(len(table2)):
    if table2[i][2] == 'Done':
        map_values[table2[i][0]] += table2[i][1]
Then I define the pandas.DataFrame() instance from sqlite table 1 this way:
df = pd.DataFrame(table1, index=[i for i in range(1, len(table1)+1)], columns=["ID", "Name", "Code"])
The "Values" totals are keyed by ID, and dictionary insertion order only happens to match table1 here, so it is safer to map them onto the frame by ID when adding the new Values column:
df["Values"] = df["ID"].map(map_values)
output:
ID Name Code Values
1 5674 a 1a 12
2 3385 b 2b 0
3 5548 a 3c 1
excel:
df.to_excel(r'./excel_file.xlsx', index=False)
def WA(A, B):
    c = A * B
    return c.sum() / A.sum()

sample = pd.pivot_table(intime, index=['year'], columns='DV', values=['A', 'B'],
                        aggfunc={'A': [len, np.sum], 'B': [WA]}, fill_value=0)
I'm grouping the dataframe by year and want to find the weighted average of column B.
I'm supposed to multiply column A by B, sum the result, and divide it by the sum of A (the function WA() does that).
I really have no idea how to call the function while passing it both columns.
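Since aggfunc only ever receives one column at a time, one workaround is to compute the weighted average in a groupby().apply() that can see both columns. A sketch with made-up data standing in for intime:

```python
import pandas as pd

# Hypothetical frame standing in for `intime`
intime = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021],
    "DV":   ["x", "x", "x", "x"],
    "A":    [1.0, 3.0, 2.0, 2.0],
    "B":    [10.0, 20.0, 5.0, 15.0],
})

# apply() receives each group as a whole frame, so WA's formula
# sum(A*B)/sum(A) can use both columns at once
wa = (intime.groupby(["year", "DV"])[["A", "B"]]
            .apply(lambda g: (g["A"] * g["B"]).sum() / g["A"].sum())
            .unstack("DV"))
print(wa)
```

For 2020 this gives (1*10 + 3*20) / (1 + 3) = 17.5. The len and np.sum aggregations of A can be computed with a separate pivot_table call and joined afterwards.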
I have a table that stores column definition as listed below:
Col Name : store_name
Definition : name
Col Name : store_location
Definition : location
Table structure:
store_name,store_location
name,location
I am trying to have these values displayed in an excel spreadsheet using the below loop:
cursor = ...  # queries the table that stores the above info
title_def = [i[0] for i in cursor.description]
row = 5
col = 2
for data in title_def:
    worksheet1.write(row, col, data, header_format)
    row += 1
The above loop only prints out the labels. I am not sure how to modify title_def; I believe I am only extracting the header, and that is what gets written to the sheet via xlsxwriter. Could anyone advise how to display both the column name and the definition in the same spreadsheet?
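One way (a sketch with stand-in values for the cursor results, since the actual query isn't shown): cursor.description only carries the header names, so the definitions have to come from the fetched rows. Write the headers across one row, then loop over the data rows beneath them:

```python
import xlsxwriter

# Stand-ins for the cursor results: description gives the headers,
# fetchall() gives the data rows (here, the definitions)
title_def = ["store_name", "store_location"]   # from cursor.description
rows = [("name", "location")]                  # from cursor.fetchall()

workbook = xlsxwriter.Workbook("columns.xlsx")
worksheet1 = workbook.add_worksheet()

row, col = 5, 2
# Write the header row once, across columns
for offset, header in enumerate(title_def):
    worksheet1.write(row, col + offset, header)
# Then write each data row on the lines below it
for r, record in enumerate(rows, start=row + 1):
    for offset, value in enumerate(record):
        worksheet1.write(r, col + offset, value)
workbook.close()
```

The key change from the original loop is advancing the column offset for headers and the row index for data, instead of writing everything down a single column.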
# Loop through cells in Excel and print values
from openpyxl import load_workbook

workbook = load_workbook('C:\\your_path\\ExcelFile.xlsx')
sheet = workbook.active
row_count = sheet.max_row
for i in range(1, row_count + 1):   # openpyxl rows are 1-indexed
    print(sheet.cell(row=i, column=1).value)
# And if you want to do the same with a CSV file
import csv

with open('C:\\your_path\\CSVFile.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
I am attempting to drop '_Adj' from a column name in a 'df_merged' data frame if the column name contains 'eTIV' or 'eTIV1'.
for col in df_merged.columns:
    if 'eTIV1' in col or 'eTIV' in col:
        df_merged.columns.str.replace('_Adj', '')
This code seems to be producing the following error:
KeyError: '[] not found in axis'
The loop never assigns its result: df_merged.columns.str.replace() returns a new Index rather than modifying the frame, and it operates on every column, not just the matching one. Here are two options:
Option 1
df_merged.columns = [col.replace('_Adj','') if 'eTIV' in col else col for col in list(df_merged.columns)]
Option 2
df_merged = df_merged.rename(columns={col: col.replace('_Adj','') if 'eTIV' in col else col for col in df_merged.columns})
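Applied to some hypothetical column names, Option 1 behaves like this (note that the 'eTIV' test already matches 'eTIV1', so a single check covers both):

```python
import pandas as pd

# Hypothetical columns illustrating the rename
df_merged = pd.DataFrame(columns=["eTIV_Adj", "eTIV1_Adj", "Other_Adj"])

# Rebuild the column list, stripping '_Adj' only from eTIV columns
df_merged.columns = [c.replace("_Adj", "") if "eTIV" in c else c
                     for c in df_merged.columns]
print(list(df_merged.columns))  # → ['eTIV', 'eTIV1', 'Other_Adj']
```

Because the whole list is assigned back to df_merged.columns in one step, there is no unassigned intermediate Index, which is what tripped up the original loop.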