built in method like pandas.sub() that does row wise subtraction

built in method like pandas.sub() that does row wise subtraction - python-3.x

I am looking for a simple method for doing row wise subtraction on a pandas df. The closest that I can find is df.shift(1) which only works on datetime. So if I have a dataframe
df['col'] = [1,2,3,4,5]
Is there a built in method that will allow me do element wise subtraction so that it returns the following, by subtracting from every element the one on its left. The first element will stay as is.
sd['col'] = [1, 1, 1, 1, 1]
Is there an already existing method that does this or do I have to code it myself ?

In your comment you write
First value is correct. The subsequent values are to be subtracted.
So, say you want to place the differences in a column called diff. Then you could do
df['diff'] = df['col'].diff()
(using pd.DataFrame.diff), which would place the difference in every entry except the first. You can easily mend this with
df['diff'].values[0] = df['col'].values[0]

Related

Comparing two columns and their values and outputting the greater value

I'm trying to compare two columns ("Shows") from different tables and showing which one has the greater number ("Rating") associated with it in another table.
Ignore the operation column above as part of the solution that I'm trying to get, it's just to illustrate for you what I'm trying to compare.
Important note: If the names are duplicated. Compare the matching pair in their corresponding order. (1st with 1st, 2nd with 2nd, 3rd with 3rd etc..) illustrated in the table below:
Thanks

You can try the following in cell F3 for an array solution that spills the entire result at once:
=LET(sA, A3:A6, rA, B3:B6, sB, C3:C6, rB, D3:D6, CNTS, LAMBDA(x,
LET(seq, SEQUENCE(ROWS(x)), MAP(seq, LAMBDA(s,ROWS(FILTER(x,(x=INDEX(x,s))
*(seq<=s))))))), cntsA, CNTS(sA), cntsB, CNTS(sB), eval, MAP(sA, rA, cntsA,
LAMBDA(s,r,c,IF(r > FILTER(rB, (sB=s) * (cntsB=c)), "Table 1", "Table 2"))),
HSTACK(sA, eval))
Here is the output:
Explanation
The main idea is to count repeated show values. We use a user LAMBDA function CNTS, to avoid repetition of the same formula twice. Once we have the counts (cntsA, contsB), we use MAP to iterate over Table 1 elements with the counts and look for specific show and counts to compare with Table 2 columns. The FILTER function will return always a single value (based on sample data). Finally, we prepare the output as expected using HSTACK.

Try-
=IF(INDEX(FILTER($B$3:$B$6,$A$3:$A$6=G3),COUNTIFS($G$3:$G3,G3))>INDEX(FILTER($E$3:$E$6,$D$3:$D$6=G3),COUNTIFS($G$3:$G3,G3)),"Table-1","Table-2")

how to transform a table in Excel from vertical to horizontal but with different length

i would like to get table 2 from Table 1 in a quicker way. can people help? thanks
so far i have done pivot tables and manually copy and paste transpose, but this is really time consuming/

Here, a solution that uses DROP/REDUCE/VSTACK pattern to generate each row. Check for example #JvdV's answer from this question: How to split texts from dynamic range? and a similar idea DROP/REDUCE/HSTACK pattern to generate the columns for a given row. In cell E2 put the following formula:
=LET(set, A2:B13, IDs, INDEX(set,,1), dates, INDEX(set,,2),
HREDUCE, LAMBDA(id, arr, REDUCE(id, arr, LAMBDA(acc, x, HSTACK(acc, x)))),
output, DROP(REDUCE("", UNIQUE(IDs), LAMBDA(ac, id, VSTACK(ac, LET(
idDates, FILTER(dates, ISNUMBER(XMATCH(IDs, id))), HREDUCE(id, idDates)
)))),1), IFERROR(VSTACK(HSTACK("ID", "Dates"), output), "")
)
and here is the output:
Update
As #JdvD pointed out in the comments section there is a shorted way:
=LET(set, A3:B13, title, A1:B1, IDs, INDEX(set,,1), dates, INDEX(set,,2),
IFERROR(REDUCE(title, UNIQUE(IDs),LAMBDA(ac, id,
VSTACK(ac,HSTACK(id,TOROW(FILTER(dates,IDs=id)))))),"")
)
The main idea is to use the title as a way to initialize the VSTACK accumulator (no need to use DROP), and have all the dates for a given id all at once via the FILTER function. As a side note, it can be expressed in terms of the pattern we explained in the Explanation section (see below), as follow:
=LET(set, A3:B13, title, A1:B1, IDs, INDEX(set,,1), dates, INDEX(set,,2),
HREDUCE, LAMBDA(id, HSTACK(id, TOROW(FILTER(dates,IDs=id)))),
IFERROR(REDUCE(title, UNIQUE(IDs),LAMBDA(ac,id, VSTACK(ac, HREDUCE(id)))),"")
)
Note: Keeping the same name of the user LAMBDA function (HREDUCE) for sake of consistency with the Explanation section, but there is no need to use REDUCE. A more appropriate name would be PIVOT_DATES.
Explanation
HREDUCE is a user LAMBDA function that implements the DROP/REDUCE/HSTACK pattern. In order to generate all the columns for a given row, this is the pattern to follow:
DROP(REDUCE("", arr, LAMBDA(acc, x, HSTACK(acc, func))),,1)
It iterates over all elements of arr (x) and uses HSTACK to concatenate column by column on each iteration. DROP function is used to remove the first column, if we don't have a valid value to initialize the first column (the accumulator, acc). The name func is just a symbolic representation of the calculation required to obtain the value to put on a given column. Usually, some variables are required to be defined, so quite often the LET function is used for that.
In our case we have a valid value to initialize the iteration process (no need to use DROP function), so this pattern can be implemented as follow via our user LAMBDA function HREDUCE:
LAMBDA(id, arr, REDUCE(id, arr, LAMBDA(acc, x, HSTACK(acc, x))))
In our case the initialization value will be each unique id value. The func will be just each element of arr, because we don't need to do any additional calculation to obtain the column value.
The previous process can be applied for a given row, but we need to create iteratively each row. In order to do that we use a DROP/REDUCE/VSTACK pattern, which is a similar idea:
DROP(REDUCE("", arr, LAMBDA(acc, x, VSTACK(acc, func))),1)
Now we append rows via VSTACK. For this case we don't know how to initialize properly the accumulator (acc), so we need to use DROP to remove the first row. Now fun will be: HREDUCE(id, idDates), i.e. the LAMBDA function we created before to generate all the dates columns for a given id. Now we use a LET function to name the selected dates for a given id (idDates).
At the beginning of each row (first column), we are going to have the unique IDs (UNIQUE(IDs)). To find the corresponding dates for each unique ID (id) we use the following:
FILTER(dates, ISNUMBER(XMATCH(IDs, id)))
and name the result idDates.
Finally, we build the output including the header. We pad non existing values with the empty string to avoid having #NA values. This is the default behavior of V/HSTACK functions. We use IFERROR function for that.
IFERROR(VSTACK(HSTACK("ID", "Dates"), output), "")
Note: Both patterns are very useful to avoid Nested Array Error (#CALC!) usually produced by some of the new Excel array functions, such as BYROW, BYCOL, MAP when using TEXTSPLIT for example. This is one of the effective ways to overcome it.

Why am I getting a change in list elements but not the subtracted value of that element, when using for loop and print(my_list[i-1])

I have recently started learning python and am currently on fundamentals so please accept my excuse if this question sounds silly. I am a little confused with the indexing behavior of the list while I was learning the bubble sort algorithm.
For example:
code
my_list = [8,10,6,2,4]
for i in range(len(my_list)):
print(my_list[i])
for i in range(len(my_list)):
print(i)
Result:
8
10
6
2
4
0
1
2
3
4
The former for loop gave elements of the list (using indexing) while the latter provided its position, which is understandable. But when I'm experimenting with adding (-1) i.e. print (my_list[i-1]) and print(i-1) in both the for loops, I expect -1 to behave like a simple negative number and subtract a value from the indexed element in the first for loop i.e. 8-1=7
Rather, it's acting like a positional indicator of the list of elements and giving the last index value 4.
I was expecting this result from the 2nd loop. Can someone please explain to me why the print(my_list[i-1]) is actually changing the list elements selection but not actually subtracting value 1 from the list elements itself i.e. [8(-1), 10(-1), 6(-1)...
Thank you in advance.

The list index in the expression my_list[i-1] is the part between the brackets, i.e. i-1. So by subtracting in there, you are indeed modifying the index. If instead you want to modify the value in the list, that is, what the index is pointing at, you would use my_list[i] - 1. Now, the subtraction comes after the retrieval of the list value.

Here when you are trying to run the first for loop -
my_list = [8,10,6,2,4]
for i in range(len(my_list)):
print(my_list[i-1])
Here in the for loop you are subtracting the index not the Integer at that index number. So for doing that do the subtraction like -
for i in range(len(my_list)):
print(my_list[i]-1)
and you were getting the last index of the list because the loop starts with 0 and you subtracted 1 from it and made it -1 and list[-1] always returns the last index of the list.
Note: Here it is not good practice to iterate a list through for loop like you did above. You can do this by simply by -
for i in my_list:
print(i-1)
The result will remain the same with some conciseness in the code

Pyspark conditionally replace value in column with value from another column

I am working with some weather data that is missing some values (indicated via value code). For example, if SLP data is missing, it is assigned code 99999. I was able to use a window function to calculate a 7 day average and save it as a new column. A significantly reduced example of a single row is shown below:
SLP_ORIGIN
SLP_ORIGIN_7DAY_AVG
99999
11945.823516044207
I'm trying to write code such that when SLP_ORIGIN has the missing code it gets replaced using the SLP_ORIGIN_7DAY_AVG value. However, most code explains how to replace a column value based on a conditional with a constant value, not the column value. I tried using the following:
train_impute = train.withColumn("SLP_ORIGIN", \
when(train["SLP_ORIGIN"] == 99999, train["SLP_ORIGIN_7DAY_AVG"]).otherwise(train["SLP_ORIGIN"]))
where the dataframe is called train.
When I perform a count on the SLP_ORIGIN column using train.where("SLP_ORIGIN = 99999").count() I get the same count from before I attempted replacing the value in that column. I have already checked and my SLP_ORIGIN_7DAY_AVG does not have any values that match the missing code.
So how do I actually replace the 99999 values in the SLP_ORIGIN column with the associated SLP_ORIGIN_7DAY_AVG value?
EVEN BETTER, is there a way to do this replacement and window calculation without making a 7 day average column (I have other variables I need to do the same thing with so I'm hoping there is a more efficient way to do this).

Make sure to double check with dataframe you are verifying on.
I was using train.where("SLP_ORIGIN = 99999").count() when I should have been using train_impute.where("SLP_ORIGIN = 99999").count()
Additionally, instead of making a whole new column to store the imputed 7 day average, one can only calculate the average when the missing value code is present:
train = train.withColumn("SLP_ORIGIN", when(train["SLP_ORIGIN"] == 99999, f.avg('SLP_ORIGIN').over(w)).otherwise(train["SLP_ORIGIN"]))\

How to sum data and join columns using Pandas

I want to sum values in data
from this:
to this:
but the problem is, the data doesn't add up but they like 11, 1111
here is my code:
df_data.insert(loc=2, column='Jumlah', value='1')
df_data.pivot_table(index='Kecamatan', columns='Status', aggfunc='sum', fill_value=0)
And how I can make the columns only KECAMATAN NEGATIF ODP ODPSELESAI OTG PDP
Thank you guys.

Note that the value of the inserted column (Jumlah) is a string.
In the next instruction you attempt to generate a pivot_table summing
this column.
But if you attempt to sum text values, it actually means concatenation.
To put things right, change the first instruction to:
df_data.insert(loc=2, column='Jumlah', value=1)
i.e. remove apostrophes surrounding 1.
Then this column will be of int type and will be summed as you wish.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

built in method like pandas.sub() that does row wise subtraction - python-3.x

Related

Comparing two columns and their values and outputting the greater value

how to transform a table in Excel from vertical to horizontal but with different length

Why am I getting a change in list elements but not the subtracted value of that element, when using for loop and print(my_list[i-1])

Pyspark conditionally replace value in column with value from another column

How to sum data and join columns using Pandas

Categories

Resources