I have a dataset in which all the columns are numeric, and there are some NaNs I want to fill. The rows must be treated as a time series, so I want to fill each NaN with the average of the previous and next values. Is there any way to do this in PySpark?
Thanks!!
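As a minimal sketch of the fill rule described above, here it is in plain Python on a single column rather than in PySpark. It assumes that "previous" and "next" mean the nearest non-missing original values on each side, and that a gap at either end of the series takes the one neighbour that exists; both choices are assumptions, not part of the question. The function name is hypothetical.

```python
import math


def fill_with_neighbor_mean(values):
    """Replace each NaN with the mean of the nearest non-NaN values around it."""

    def is_missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    filled = list(values)
    for i, value in enumerate(values):
        if not is_missing(value):
            continue
        # Nearest non-missing ORIGINAL value before and after position i.
        prev = next((values[j] for j in range(i - 1, -1, -1)
                     if not is_missing(values[j])), None)
        nxt = next((values[j] for j in range(i + 1, len(values))
                    if not is_missing(values[j])), None)
        neighbors = [x for x in (prev, nxt) if x is not None]
        if neighbors:  # leave the gap untouched if the whole column is missing
            filled[i] = sum(neighbors) / len(neighbors)
    return filled


series = [1.0, float("nan"), 3.0, float("nan"), float("nan"), 6.0, float("nan")]
result = fill_with_neighbor_mean(series)
print(result)
```

In PySpark the same two lookups could presumably be expressed with window functions over an ordered window (after turning NaN into null), but that is not shown here.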
Related
How can I compare 10 columns in the same row for identical values, while ignoring empty cells?
This formula, =IF(OR(A2=Z2,B2=Y2,C2=X2),"ok","not ok"), gives me false positives for the empty cells. I have to make sure that the items in table 1 have the same quantities in table 2.
Edited to simplify the question: I have only 3 columns in the example, but in my actual problem I have over 100 columns in each table.
I'm trying to apply conditional formatting to my data, where I need to color rows based on certain columns. If the current and previous rows have the same data in 4 particular columns, they belong to the same group and should share a color. I also need the color to alternate from one group to the next.
So the result I need is like the format in the image below:
As in the sample image above, the first two rows have the same values in columns Name1, Name2, Type_Name and Type_Code, so they are colored. The next row is skipped. The row after that is colored, even though it has no matching row above or below. Then the rows with Rita in Name1 are skipped.
So far I can identify the rows with the same values in the 4 columns, and I can color alternate rows, but only as two separate rules; I can't combine them properly. Below are the rules I have applied so far.
This one highlights rows that have the same values in the 4 required columns as a neighbouring row, using the formula
=OR($H2&$I2&$J2&$K2 = $H1&$I1&$J1&$K1, $H2&$I2&$J2&$K2 = $H3&$I3&$J3&$K3)
And alternate rows colored with the formula
=MOD(ROW( ),2)=0
I would first add a helper column which separates the groups.
This is done by checking whether the relevant columns of the row are the same as in the row above. If they are, we simply take the max value of the helper column so far; if they are different, we increment that max value by 1. We can then apply the conditional formatting wherever this helper column holds an odd value.
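The helper-column logic can be sketched in Python to check that it produces the pattern asked for. This is only an illustration of the idea, not the spreadsheet formula itself; the sample rows, the function name and the values are made up, while the column names follow the question.

```python
def group_ids(rows, key_columns):
    """Number consecutive runs of rows that share the same key columns (1, 2, 3, ...)."""
    ids = []
    previous_key = None
    current = 0
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key != previous_key:
            current += 1  # key changed: start a new group
        ids.append(current)  # same key as the row above: reuse the current max
        previous_key = key
    return ids


keys = ["Name1", "Name2", "Type_Name", "Type_Code"]
# Hypothetical sample data, not taken from the original image.
rows = [
    {"Name1": "Anna", "Name2": "Lee", "Type_Name": "A", "Type_Code": 1},
    {"Name1": "Anna", "Name2": "Lee", "Type_Name": "A", "Type_Code": 1},
    {"Name1": "Bob", "Name2": "Ray", "Type_Name": "B", "Type_Code": 2},
    {"Name1": "Cara", "Name2": "Fox", "Type_Name": "A", "Type_Code": 1},
    {"Name1": "Rita", "Name2": "Orr", "Type_Name": "C", "Type_Code": 3},
    {"Name1": "Rita", "Name2": "Orr", "Type_Name": "C", "Type_Code": 3},
]

ids = group_ids(rows, keys)
shaded = [group % 2 == 1 for group in ids]  # color the rows in odd-numbered groups
print(ids)
print(shaded)
```

With this sample the first two rows are shaded, the third is skipped, the fourth is shaded on its own, and the two Rita rows are skipped, which matches the behaviour described in the question.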
I am using an Excel spreadsheet to keep track of some data.
I calculate median and quartile values for each column at the bottom of those columns (rows 15, 16, 17). However, I want to be able to keep adding new values into columns without changing the range of my median/quartile formula. I know I can select the whole column in my formula if I had the formula in a different cell than the column I am making calculations on.
I am wondering if there is a way to exclude the rows containing my median and quartile formulas from the calculations. As I enter more data, those rows will also move further down, and I couldn't figure out how to handle that.
If we know that there are two blank rows between the data and the median formula, then
=MEDIAN(INDIRECT(CONCATENATE("B2:B",TEXT(ROW()-3,"#"))))
works: it builds the column B range passed to MEDIAN as B2 through B plus the row number 3 above the row where the formula sits. For the IQR, the 3 presumably has to be increased.
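To make the row arithmetic concrete, here is a small Python sketch of the address that the formula builds. The function name and the default first row are hypothetical; the row numbers 15 and 16 come from the question, and the offsets for rows other than the median row are an assumption.

```python
def range_above(column, formula_row, offset, first_row=2):
    """Build the address of the data range ending `offset` rows above the formula."""
    last_row = formula_row - offset
    if last_row < first_row:
        raise ValueError("the formula sits too close to the top of the column")
    return f"{column}{first_row}:{column}{last_row}"


# Median formula in row 15 with two blank rows above it.
print(range_above("B", 15, 3))
# A formula one row lower needs an offset of 4 to cover the same data.
print(range_above("B", 16, 4))
```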
I have a matrix with a totals row and a totals column, as follows
matrix
I need to take these totals and calculate a value for each column within each row, as follows
desired matrix
I have used the following formula
Row Total*Column Total/Overall Total
which assumes a non-zero value for each column and each row.
Any tips for this?
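For reference, the formula above can be written out as a short Python sketch. The totals used here are invented for illustration, and the function name is hypothetical; the only hard requirement the code checks is that the overall total is not zero.

```python
def values_from_totals(row_totals, col_totals):
    """Compute row_total * column_total / overall_total for every cell."""
    overall = sum(row_totals)
    if overall == 0:
        raise ValueError("overall total is zero, so the formula is undefined")
    return [[r * c / overall for c in col_totals] for r in row_totals]


# Hypothetical totals: two rows and two columns, each summing to 100.
cells = values_from_totals([30, 70], [40, 60])
print(cells)
```

By construction, each row of the result sums back to its row total and each column to its column total, provided the row totals and column totals add up to the same overall total.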
thank you!
Chiara
I have found that using the SUM() function to add a column of calculated values gives a different result than using SUM() on raw values. The following screenshots show a 1-cent discrepancy between the sums of two columns that appear identical, except that one is a calculated column and the other is raw data entry:
Note sum on column c is 1 cent different than column d
See the sum formula in row 22 is a simple sum
This is because the calculated values are not rounded to the second decimal place; they are only displayed that way. Some of the numbers may show 9.98 but really be 9.978.
If you want to see what I mean, increase the number of decimals shown in those cells to four or more places.
To avoid this, you can wrap the formulas in the range with =ROUND(...,2), where the ... is your formula. This will round the results to two places, and then the two sums will match.
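The effect is easy to reproduce outside Excel. The sketch below uses Python's decimal module with three made-up values of 9.978, the example figure from above; the helper name is hypothetical, and half-up rounding is chosen to mimic what ROUND does for positive numbers.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(value):
    """Round a Decimal to two places, half up."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical calculated values that all display as 9.98.
calculated = [Decimal("9.978"), Decimal("9.978"), Decimal("9.978")]

sum_of_unrounded = to_cents(sum(calculated))               # sum first, then display
sum_of_rounded = sum(to_cents(v) for v in calculated)      # round each value, then sum

print(sum_of_unrounded)
print(sum_of_rounded)
print(sum_of_rounded - sum_of_unrounded)
```

Summing the unrounded values and then displaying two decimals loses the small excess in each value, while rounding each value first keeps it, which is where the one-cent gap comes from.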