How to sum data and join columns using Pandas - python-3.x

I want to sum values in data
from this:
to this:
but the problem is, the data doesn't add up but they like 11, 1111
here is my code:
df_data.insert(loc=2, column='Jumlah', value='1')
df_data.pivot_table(index='Kecamatan', columns='Status', aggfunc='sum', fill_value=0)
And how I can make the columns only KECAMATAN NEGATIF ODP ODPSELESAI OTG PDP
Thank you guys.

Note that the value of the inserted column (Jumlah) is a string.
In the next instruction you attempt to generate a pivot_table summing
this column.
But if you attempt to sum text values, it actually means concatenation.
To put things right, change the first instruction to:
df_data.insert(loc=2, column='Jumlah', value=1)
i.e. remove apostrophes surrounding 1.
Then this column will be of int type and will be summed as you wish.

Related

EXCEL - Dual VLOOKUP and Interpolation

I have a table on Excel with data as the following:
Meaning, I have different JPH based on the %SMALL unit and the number of active stations.
I need to create a matrix like the following (with %SMALL on horizontal and STATIONS on vertical axes):
And the formula for each cell should:
Take the input of Stations (column "B")
Check, for that specific Stations number, the amount of data on the other table (like make a filter on STATIONS for the specific number)
Perform an VLOOKUP for checking the JPH based on the %SMALL value on row 2
Interpolate for the exact JPH value, if not found on table
For now, I was able to create the last part (the VLOOKUP and the interpolation), with the following:
=IFERROR(VLOOKUP(C2;'EARLY-STATIONS'!$F:$H;3;FALSE);AVERAGE(OFFSET(INDEX('EARLY-STATIONS'!$H:$H;MATCH(C2;'EARLY-STATIONS'!$F:$F;1));0;0;2;1)))
The problem I'm facing is than with this, the calculation is not checking the number of stations, so the Iteration is not accurate.
Unfortunately I cannot use VBA macros to solve this.
Any clue?
This is an attempt because more clarity is needed in terms of all possible scenarios to consider, based on different input data and how to understand the "extrapolation" process. This approach understands as extrapolation the average of two values (lower and greater), but the idea can be customized to any other way to calculate it. Per tags listed in the question I assume there is no Excel version constraint. This is O365 solution:
=LET(sm, A2:A10, st, B2:B10, jph, C2:C10, smx, F1:J1, sty, E2:E4, NULL, "",
GETLk, LAMBDA(x,y,mode, FILTER(jph, (st=y)
* (sm = INDEX(sm, XMATCH(x, sm, mode))), NULL)),
GET, LAMBDA(x,y, LET(f, FILTER(jph, (jph=GETLk(x,y, 1))
+ (jph=GETLk(x,y, -1)), NULL), IF(#f=NULL, NULL, AVERAGE(f)))),
HREDUCE, LAMBDA(yi, DROP(REDUCE("", smx, LAMBDA(ac,x,
HSTACK(ac, GET(x, yi)))),,1)),
DROP(REDUCE("", sty, LAMBDA(ac,y, VSTACK(ac, HREDUCE(y)))),1))
The above formula spills the entire result, I don't think for this case you can use a LOOKUP-like function.
Here is the output:
The highlighted cells where the average is calculated.
Explanation
The main idea is to use DROP/REDUCE/HSTACK/VSTACK pattern to generate the grid. Check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length on how to apply it.
We use two user LAMBDA functions to abstract some calculations:
GETLk(x,y,mode), filters jph name based on %SMALL and Stations columns values, based on input values x (x-axis value from the grid), y (y-axis value form the grid) respectively. The third input argument mode, is for doing the approximate search in XMATCH (1-next largest, -1 next smallest). In case the value exist in the input table, XMATCH returns the same value in both cases.
GET(x,y) has the logic to find the value or if the value doesn't exist to calculate the average. It uses the previous LAMBDA function GETLk. We filter for jph values that match the input values (x,y), but we use an OR condition in the FILTER (+), to select both lower or greater values. If the value exist, returns just one value otherwise two values are returned by FILTER (f). Finally if f is not empty we return the average, otherwise the value we setup as NULL.
HREDUCE: Concatenate the result by columns for a given row of the grid. Check the referred question for more information about it.

Comparing two columns and their values and outputting the greater value

I'm trying to compare two columns ("Shows") from different tables and showing which one has the greater number ("Rating") associated with it in another table.
Ignore the operation column above as part of the solution that I'm trying to get, it's just to illustrate for you what I'm trying to compare.
Important note: If the names are duplicated. Compare the matching pair in their corresponding order. (1st with 1st, 2nd with 2nd, 3rd with 3rd etc..) illustrated in the table below:
Thanks
You can try the following in cell F3 for an array solution that spills the entire result at once:
=LET(sA, A3:A6, rA, B3:B6, sB, C3:C6, rB, D3:D6, CNTS, LAMBDA(x,
LET(seq, SEQUENCE(ROWS(x)), MAP(seq, LAMBDA(s,ROWS(FILTER(x,(x=INDEX(x,s))
*(seq<=s))))))), cntsA, CNTS(sA), cntsB, CNTS(sB), eval, MAP(sA, rA, cntsA,
LAMBDA(s,r,c,IF(r > FILTER(rB, (sB=s) * (cntsB=c)), "Table 1", "Table 2"))),
HSTACK(sA, eval))
Here is the output:
Explanation
The main idea is to count repeated show values. We use a user LAMBDA function CNTS, to avoid repetition of the same formula twice. Once we have the counts (cntsA, contsB), we use MAP to iterate over Table 1 elements with the counts and look for specific show and counts to compare with Table 2 columns. The FILTER function will return always a single value (based on sample data). Finally, we prepare the output as expected using HSTACK.
Try-
=IF(INDEX(FILTER($B$3:$B$6,$A$3:$A$6=G3),COUNTIFS($G$3:$G3,G3))>INDEX(FILTER($E$3:$E$6,$D$3:$D$6=G3),COUNTIFS($G$3:$G3,G3)),"Table-1","Table-2")

How to recover second element separate by character with index equiv

I got a table in Excel like this:
I used index with double equiv to have only the price for column A, the price for column B, the price for column C, I did this :
=INDEX($J$1:$L$4;EQUIV($F6;$J$1:$J$4;0);EQUIV(Z$24;$J$1:$L$1;0))
But I would like to have only the value at the right of ";" but I don't know how to combine with my index and equiv to have only the value 111,1456,44455.
I have this:
EQUIV() is the french name for MATCH() am I right?
If so just use a wildcard-match:
=MATCH("*;"&$F6,$J$1:$J$4,0)
Or the french equivalent:
=EQUIV("*;"&F6;$J$1:$J$4;0)
Your question is not quite clear, I am assuming you have a multiple values separated by semicolon ";" in column Price and now you want a portion of it, in this case only Right, if that is so, here is your solution:
Price
112233;50.99
223344;15.50
3344;150.5
to get the left side, use
=LEFT(C2,LEN(C2)-FIND(";",C2)-1)
here you have to subtract -1 because we don't want to include the semicolon at the end
to get the right side, use
=RIGHT(C2,LEN(C2)-FIND(";",C2))
Result:

Index rows and columns unexpected results

I'm trying to understand the following behaviour:
If I have the following data:
A
B
a
1
b
2
c
3
If I use =INDEX($A$1:$B$3,,)
It will correctly show the whole range.
If I use =INDEX($A$1:$B$3,1,)
It will correctly show the data for the first row for both columns.
If I use =INDEX($A$1:$B$3,SEQUENCE(2),)
I expect it to show the data for the first two rows for both columns. Instead it shows the data of the first two rows, not showing data for the second column.
How come INDEX loses the column reference here?
INDEX reads its parameters as a pair of lists.
For example, using array constants, you can type:
=INDEX(A1:B3,{1,3},{1,2})
which gives:
a 3
because Excel reads this as {1,1}, {3,2}.
With SEQUENCE, an array constant is returned, and so SEQUENCE(2) returns {1;2}. When used twice, Excel processes {1,1};{2,2}.
You can use SEQUENCE to return a vertical array constant, such as
SEQUENCE(1,2)
which returns {1,2}.
Now it works:
=INDEX(A1:B3,SEQUENCE(2),SEQUENCE(1,2))
Or, using a mix of horizontal and vertical array constants
=INDEX(A1:B3,{1;2},{1,2})
Ref:
https://support.microsoft.com/en-us/office/guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4686-9372-ecfd5caa57c7
Create one and two-dimensional array constants

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")

Resources