Replace consecutive identical values from excel column using python dataframes

In the first table, fact values are listed beneath the units. Consecutive identical numeric values should be replaced with blanks, while text and boolean values in the column are preserved.
The required output should look something like the following:

Try using mask and shift:
print(df.mask(df.shift() == df).fillna(''))
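To make the suggestion concrete, here is a minimal sketch on a made-up frame (the column names and values are hypothetical, not the asker's data). Note that df.mask(df.shift() == df) blanks any cell equal to the one above it, text and booleans included. The second half shows one possible way to restrict the blanking to numeric values, which is an assumption about what the question is asking for; the helper names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; the real sheet would be loaded with pd.read_excel(...).
df = pd.DataFrame({
    "fact": [10, 10, "kg", "kg", True, True, 7],
    "other": [1, 1, 2, 3, 3, 3, 4],
})

# The answer's approach: blank every cell equal to the cell directly above it.
blanked_all = df.mask(df.shift() == df).fillna("")

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def blank_repeated_numbers(column):
    # Blank a cell only when it repeats the cell above and both are numeric.
    numeric = column.map(is_number)
    repeated = column.eq(column.shift()) & numeric & numeric.shift(fill_value=False)
    return column.astype(object).mask(repeated, "")

blanked_numbers = df.apply(blank_repeated_numbers)
print(blanked_numbers)
```

With this variant the repeated "kg" and True entries stay in place, while a number that repeats the number above it becomes an empty string.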

Related

how to sum columns using column headers and bridge tables in excel

I have two sets of data in Excel: set 1 is the raw data, and set 2 is a bridge table. The desired output is also included. How should I write the formula for this?
set 1:
set 2:
output expected:

Here is a solution that assumes a variable number of headers and no specific pattern in the column names. I assumed no Excel version constraints, based on the tags listed in the question. In cell H1, enter the following formula, which spills the entire result at once:
=LET(in, A1:F5, lk, A8:B12, header, DROP(TAKE(in,1),,1), A, TAKE(lk,,1),
B, DROP(lk,,1), data, DROP(in,1,1), REDUCE(TAKE(in,,1), UNIQUE(B),
LAMBDA(ac,bb, LET(f, FILTER(A, B=bb),values, CHOOSECOLS(data,XMATCH(f, header)),
sum, MMULT(values, SEQUENCE(ROWS(f),,1,0)), HSTACK(ac, VSTACK(bb, sum))))))
Here is the output:
We use the LET function with only two input ranges, in and lk, so all the other defined names depend on those two. This makes the formula easy to maintain and to adapt to your real scenario.
Using DROP and TAKE we extract each portion of the input ranges: header, data, A, B (the columns of the second table). We use the REDUCE/HSTACK pattern to append one column of the result on each iteration. Check my answer from the question: how to transform a table in Excel from vertical to horizontal but with different length for more information.
We iterate over the unique values of B, and for each value (bb) we select the matching column A values (f). We use XMATCH to find the corresponding column indexes in header (which doesn't include the date column). We use CHOOSECOLS to select those columns from data (values). Now we need to sum across the selected columns for each row, and we use MMULT for that. The result is stored in the name sum. Finally, we use HSTACK to append the summed column on each iteration, with the unique value from B as its header.
Note: Instead of the MMULT function, you can use the following array function; it is a matter of personal preference:
BYROW(values, LAMBDA(x, sum(x)))
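
For readers who find the logic easier to follow outside Excel, the same steps can be sketched in Python. The sample data and all names below are hypothetical stand-ins for the two ranges; this is not a translation of the asker's actual sheet.

```python
from collections import defaultdict

# Hypothetical stand-ins: `header`/`rows` play the role of set 1 (first column
# is the row label) and `bridge` plays the role of set 2 (source -> target).
header = ["date", "a1", "a2", "b1", "b2", "b3"]
rows = [
    ["d1", 1, 2, 3, 4, 5],
    ["d2", 10, 20, 30, 40, 50],
]
bridge = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("b2", "B"), ("b3", "B")]

# Group the source column positions by target name in first-seen order
# (the role of UNIQUE(B), FILTER and XMATCH in the formula).
positions = defaultdict(list)
for source, target in bridge:
    positions[target].append(header.index(source))

# One output column per target: the row-wise sum of its source columns.
result = [[header[0], *positions]]
for row in rows:
    sums = [sum(row[i] for i in cols) for cols in positions.values()]
    result.append([row[0], *sums])

for line in result:
    print(line)
```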

You could try SUMIFS with the wildcard character for each row. For example, for the first output column, enter the following formula and drag it down.
=SUMIFS($B2:$F2,$B$1:$F$1,"=A*")
Then do the same thing for the other columns, e.g. for column B:

Five random items from a list into a single cell separated by a comma

I have n unique values in n cells in Column A. (For ex: EDN12, EDN122, EDN991, ....)
I want to return any five unique values, without repetition and in random order, from Column A into a single cell, separated by commas, and do this n times. For example; (EDN12, EDN112, EDN991, EDN881, EDN12)
How do I achieve this?
I have tried the formula provided here (Return a random order of a list into a single cell)
=TEXTJOIN(",",,INDEX($A$1:$A$5,UNIQUE(RANDARRAY(1000,1,1,5,TRUE))))
But it only draws from the first five cells in column A, and the rest are omitted.

Assuming values in column A are unique on their own, try:
=LET(x,TOCOL(A:A,3),TEXTJOIN(", ",,TAKE(SORTBY(x,RANDARRAY(COUNTA(x))),5)))
Otherwise just nest 'x' in UNIQUE():
=LET(x,UNIQUE(TOCOL(A:A,3)),TEXTJOIN(", ",,TAKE(SORTBY(x,RANDARRAY(COUNTA(x))),5)))
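
The shuffle-then-take idea behind these formulas can be sketched in Python as follows. The sample values and the function name are hypothetical; random.sample draws without replacement, which plays the role of SORTBY on a random key followed by TAKE.

```python
import random

# Hypothetical stand-in for the values in column A.
values = ["EDN12", "EDN122", "EDN991", "EDN881", "EDN45", "EDN7", "EDN300"]

def pick_five(items, rng=random):
    # dict.fromkeys drops duplicates while keeping order (like UNIQUE);
    # sample() then draws without replacement, so no item can repeat.
    unique_items = list(dict.fromkeys(items))
    return ", ".join(rng.sample(unique_items, 5))

cell = pick_five(values)
print(cell)
```

random.sample raises ValueError when fewer than five distinct items are available, so a very short list needs separate handling.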

This is an alternative formula that gets the required rows without using LET, although I prefer the solution using the LET function.
=INDEX(A3:A22,INDEX(UNIQUE(RANDARRAY(COUNTA(A3:A22),1,1,COUNTA(A3:A22),TRUE)),SEQUENCE(5)))
Breaking it down:
Get an array of random numbers based on the number of data rows.
=RANDARRAY(COUNTA(A3:A22),1,1,COUNTA(A3:A22),TRUE)
Extract the unique values from the array of random numbers.
=UNIQUE(C3#)
Extract the first five unique values
=INDEX(D3#,SEQUENCE(5))
Use the extracted values to extract matching rows from the source data.
=INDEX(A3:A22,E3#)
Finally join the values into a single cell.
=TEXTJOIN(", ",TRUE,F3#)
If your list of data is very short, then it can return non-unique values.
Although your example appears to have at least 1000 data rows, so it will not be a problem.

Loop through a column and vstack to a new single column using formula

I have a column of strings, as in the example below. Each string is a delimited combination of texts, and each row has a different number of texts. I want to create a single column with one text per row based on this column.
FROM:
a;b
x;y;z
p;q;r;s;t
TO:
a
b
x
y
z
p
q
r
s
t
How do I achieve this using a single formula?
I tried TRANSPOSE(TEXTSPLIT(TEXTJOIN(";",TRUE, data),";"))
However, this fails because the TEXTJOIN part results in more than 32767 characters.
I also tried building a 2D array of mxn where m=no. of rows in the original data and n=no. of texts. However, the MAKEARRAY still results in a single column. Had it worked, I would have used TOCOL or something similar to convert to a single column.
=MAKEARRAY(ROWS(data),COLUMNS(MAX(num_of_texts_in_each_row)), LAMBDA(r,c, LET(
drow, INDEX(data,r,1),
splits, TEXTSPLIT(drow,";"),
INDEX(splits,,c)
)))

Another approach would be by using REDUCE:
=DROP(REDUCE(0,A1:A3,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,,";")))),1)
REDUCE iterates over the cells much like BYROW does, and on each step VSTACK stacks that row's split values beneath the values accumulated so far.
Because the accumulator starts at 0, we use DROP to remove that first value and get the desired result.
We could also avoid DROP, at the cost of a longer and more complicated formula; for reference:
=REDUCE(TEXTSPLIT(A1,,";"),A2:A3,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,,";"))))
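
As a rough Python analogy of the REDUCE/VSTACK pattern, using the sample strings from the question, a fold over the rows looks like this. Starting from an empty list means there is no dummy first element to drop afterwards.

```python
from functools import reduce

data = ["a;b", "x;y;z", "p;q;r;s;t"]

# Fold over the rows, appending each row's pieces to the accumulator,
# which is the job REDUCE and VSTACK share in the formula.
stacked = reduce(lambda acc, cell: acc + cell.split(";"), data, [])

print(stacked)
```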

Use:
=LET(
rng,A1:A3,
clm,MAX(BYROW(rng,LAMBDA(a,COUNTA(TEXTSPLIT(a,";"))))),
TOCOL(MAKEARRAY(ROWS(rng),clm,LAMBDA(a,b,INDEX(TEXTSPLIT(INDEX(rng,a),";"),b))),3))

how to specify cell value according to its content in sum formula

I have a column that contains some texts as follows:
one
two
three
four
I want to sum the cells in this column according to their content, so I need to check the content of each cell and return a value, as in if(cell = one) then 1
so the sum result should be 1+2+3+4 = 10
I tried to write a formula like =SUM(IF(A1=apartment,1),...) but it's completely wrong.
How can I write this formula?

You can also do:
=SUMPRODUCT((A1:A10={"One","two","three","four"})*{1,2,3,4})
This builds up a 2d array where the rows correspond to your data and the columns correspond to the strings "one","two","three" and "four". The elements are set 'true' only where the data matches one of the four strings. Then this array is multiplied by the row of numbers 1,2,3 and 4. 'TRUE' counts as 1 in the multiplication and 'FALSE' counts as 0.
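
The 2D-array reasoning above can be sketched in Python like this. The cell contents are hypothetical, and the comparison is made case-insensitive on the assumption that it should behave like Excel's text comparison.

```python
cells = ["one", "Two", "three", "four", "other"]  # hypothetical column A
words = ["one", "two", "three", "four"]
weights = [1, 2, 3, 4]

# Rows correspond to the data cells, columns to the candidate words.
matches = [[cell.lower() == word for word in words] for cell in cells]

# True counts as 1 and False as 0 when multiplied by a weight.
total = sum(hit * weight for row in matches for hit, weight in zip(row, weights))

print(total)
```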

Count the words and multiply by the associated values:
=COUNTIF(A:A,"one")+2*COUNTIF(A:A,"two")+3*COUNTIF(A:A,"three")+4*COUNTIF(A:A,"four")+5*COUNTIF(A:A,"five")
You can extend this formula by adding more terms if necessary, or use a VLOOKUP() table.
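
The lookup-table variant can be sketched in Python, where a dictionary stands in for the VLOOKUP() table and Counter does the counting that COUNTIF does. The cell contents below are hypothetical.

```python
from collections import Counter

cells = ["one", "two", "three", "four", "two"]  # hypothetical column A
value_of = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}  # lookup table

# Count how often each word occurs, then weight each count by its value.
# Counter returns 0 for words that never appear, so "five" adds nothing here.
counts = Counter(cells)
total = sum(value_of[word] * counts[word] for word in value_of)

print(total)
```

Extending the mapping only requires adding entries to the dictionary, which mirrors adding rows to a lookup table instead of terms to the formula.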

Return Data Set When Rows In Column Does Not Match

I have two data sets which I need to compare. There is a column that is the common identifier between the two, but the 2nd data set, which is updated, has more rows than the 1st data set.
Here is how I extracted the data sets that I need:
What I'm trying to do is use columns D/I as the key, then see if columns C/H match. If they do not match, I want that data returned or just highlighted.
I'm not very familiar with Excel, but I see an additional issue: since the 2nd data set has more rows, those extra rows will be returned as highlighted, which is not needed.
Any help would be great!

If I understood your problem correctly, you may try
=C2=INDEX(H:H,MATCH(D2,I:I,0))
and drag this formula down to check more values in column D.
The formula gives results like this:
It looks up each value of column D in column I, then compares the corresponding C and H values, returning TRUE when they match and FALSE otherwise.
In other words, the formula checks whether a pair Cx-Dx exactly matches a pair Hy-Iy, where x and y are not necessarily equal.
E.g. (refer to the screenshot above)
C2-D2 matches with H2-I2
C3-D3 matches with H4-I4
C4-D4 matches with H3-I3
and C5-D5 matches with no pair in H:I range.
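
The INDEX/MATCH logic can be sketched in Python with a dictionary lookup. The rows below are hypothetical and only mimic the shape of the example (pairs of value and key); returning None for a missing key is an illustrative choice for the case where the formula would show #N/A.

```python
# Hypothetical (value, key) pairs standing in for columns C/D and H/I.
first = [("x", "k1"), ("y", "k2"), ("z", "k3"), ("w", "k9"), ("m", "k4")]
second = [("x", "k1"), ("z", "k3"), ("y", "k2"), ("q", "k4")]

# Build key -> value for the second set, keeping the first occurrence of each
# key, since MATCH(..., 0) also stops at the first exact match.
lookup = {}
for value, key in second:
    lookup.setdefault(key, value)

def compare(value, key):
    if key not in lookup:
        return None  # the formula would show #N/A here
    return value == lookup[key]

results = [compare(value, key) for value, key in first]
print(results)
```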

You can also use COUNTIFS, either in a separate column or in conditional formatting:
=COUNTIFS($I:$I,$D2,$H:$H,"<>"&$C2)
to highlight the first two columns and
=COUNTIFS($D:$D,$I2,$C:$C,"<>"&$H2)
to highlight the second two columns.
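
A small Python sketch of the same counting rule, again on hypothetical (value, key) rows: a row is flagged only when the other set contains the same key with a different value. Keys that are missing from the other set produce a count of zero, so extra rows are not flagged.

```python
# Hypothetical (value, key) pairs standing in for columns C/D and H/I.
first = [("x", "k1"), ("y", "k2"), ("z", "k3"), ("w", "k9"), ("m", "k4")]
second = [("x", "k1"), ("z", "k3"), ("y", "k2"), ("q", "k4")]

def count_conflicts(value, key, other_rows):
    # Number of rows in the other set with the same key but a different value.
    return sum(
        1
        for other_value, other_key in other_rows
        if other_key == key and other_value != value
    )

# A count above zero corresponds to a highlighted row.
flags_first = [count_conflicts(v, k, second) > 0 for v, k in first]
flags_second = [count_conflicts(v, k, first) > 0 for v, k in second]

print(flags_first)
print(flags_second)
```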
