How to efficiently disaggregate data from? - python-3.x

I have Google Analytics data which I am trying to disaggregate.
Below is a simplified version of the dataframe I am dealing with:
date | users | goal_completions
20150101| 2 | 1
20150102| 3 | 2
I would like to disaggregate the data such that each "user" has its own row. In addition, the third column, "goal_completions" will also be disaggregated with the assumption that each user can only have 1 "goal_completion".
The output I am seeking will be something like this:
date | users | goal_completions
20150101| 1 | 1
20150101| 1 | 0
20150102| 1 | 1
20150102| 1 | 1
20150102| 1 | 0
I was able to duplicate each row based on the number of users on a given date, however I can't seem to find a way to disaggregate the "goal_completion" column. Here is what I currently have after duplicating the "users" column:
date | users | goal_completions
20150101| 1 | 1
20150101| 1 | 1
20150102| 1 | 2
20150102| 1 | 2
20150102| 1 | 2
Any help will be appreciated - thanks!

IIUC, you can use repeat to duplicate each row once per user, then adjust the two columns using cumcount with np.where:
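import numpy as np
# Repeat each row once per user, then set users to 1 on every row
df = df.reindex(df.index.repeat(df.users))
df = df.assign(users=1)
# Within each original row, flag the first goal_completions copies with 1 and the rest with 0
df.goal_completions = np.where(df.groupby(level=0).cumcount() < df.goal_completions, 1, 0)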
df
Out[609]:
date users goal_completions
0 20150101 1 1
0 20150101 1 0
1 20150102 1 1
1 20150102 1 1
1 20150102 1 0
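To sanity-check the approach above, here is a minimal self-contained sketch that rebuilds the sample frame from the question, disaggregates it, and then re-aggregates by date to confirm the totals match the input. The variable names `out`, `pos`, and `check` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["20150101", "20150102"],
    "users": [2, 3],
    "goal_completions": [1, 2],
})

# One row per user: repeat each index label `users` times
out = df.loc[df.index.repeat(df["users"])].copy()

# Position of each copy within its original row (0, 1, 2, ...)
pos = out.groupby(level=0).cumcount()

# The first `goal_completions` copies get 1, the remaining copies get 0
out["goal_completions"] = (pos < out["goal_completions"]).astype(int)
out["users"] = 1

# Round trip: summing per date should give back the original counts
check = out.groupby("date", as_index=False)[["users", "goal_completions"]].sum()
print(out)
print(check)
```

Which users receive the completions is arbitrary here (the first ones in each group), since the aggregated data does not say who completed a goal.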

Related

How can I get the count of sequential events pairs from a Pandas dataframe?

I have a dataframe that looks like this:
ID EVENT DATE
1 1 142
1 5 167
1 3 245
2 1 54
2 5 87
3 3 165
3 2 178
And I would like to generate something like this:
EVENT_1 EVENT_2 COUNT
1 5 2
5 3 1
3 2 1
The idea is how many items (ID) go from one event to the next one. Don't care about previous states, I just want to consider the next state from the current state (e.g.: for ID 1, I don't want to count a transition from 1 to 3 because first, it goes to event 5 and then to 3).
The date format is the number of days from a specific date (sort of like SAS format).
Is there a clean way to achieve this?
Let's try this:
(df.groupby([df['EVENT'].rename('EVENT_1'),
             df.groupby('ID')['EVENT'].shift(-1).rename('EVENT_2')])['ID']
   .count()).rename('COUNT').reset_index().astype(int)
Output:
| | EVENT_1 | EVENT_2 | COUNT |
|---:|----------:|----------:|--------:|
| 0 | 1 | 5 | 2 |
| 1 | 3 | 2 | 1 |
| 2 | 5 | 3 | 1 |
Details: Groupby on 'EVENT' and shifted 'EVENT' within each ID, then count.
You could use groupby and shift. We'll also use rename_axis and reset_index to tidy up the final output:
(pd.concat([f.groupby([f['EVENT'], f['EVENT'].shift(-1).astype('Int64')]).size()
            for _, f in df.groupby('ID')])
   .groupby(level=[0, 1]).sum()
   .rename_axis(['EVENT_1', 'EVENT_2']).reset_index(name='COUNT'))
[out]
EVENT_1 EVENT_2 COUNT
0 1 5 2
1 3 2 1
2 5 3 1
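For anyone who wants to run this end to end, here is a small self-contained sketch of the same groupby-and-shift idea on the sample data from the question. It assumes that DATE defines the order of events within each ID, so it sorts first; the names `pairs` and `counts` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 2, 3, 3],
    "EVENT": [1, 5, 3, 1, 5, 3, 2],
    "DATE":  [142, 167, 245, 54, 87, 165, 178],
})

# Assumption: events are ordered by DATE within each ID
df = df.sort_values(["ID", "DATE"])

# Pair each event with the next event of the same ID;
# the last event of every ID has no successor and is dropped
pairs = (
    df.assign(EVENT_2=df.groupby("ID")["EVENT"].shift(-1))
      .dropna(subset=["EVENT_2"])
      .astype({"EVENT_2": int})
)

# Count how many times each (current, next) pair occurs
counts = (
    pairs.groupby(["EVENT", "EVENT_2"]).size()
         .rename("COUNT").reset_index()
         .rename(columns={"EVENT": "EVENT_1"})
)
print(counts)
```

Shifting inside `groupby("ID")` is what prevents a transition from being counted across two different IDs.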

Creating A new column based on other columns' values with specific requirement in Python Dataframe

I want to create a new column in Python dataframe with specific requirements from other columns. For example, my python dataframe df:
A | B
-----------
5 | 0
5 | 1
15 | 1
10 | 1
10 | 1
20 | 2
15 | 2
10 | 2
5 | 3
15 | 3
10 | 4
20 | 0
I want to create new column C, with below requirements:
When the value of B = 0, then C = 0
The same value in B will have the same value in C. The same values in B will be classified as start, middle, and end. So for values 1, it has 1 start, 2 middle, and 1 end, for values 3, it has 1 start, 0 middle, and 1 end. And the calculation for each section:
I specify a threshold = 10.
Let's look at values B = 1 :
Start :
C.loc[2] = min(threshold, A.loc[1]) + A.loc[2]
Middle :
C.loc[3] = A.loc[3]
C.loc[4] = A.loc[4]
End:
C.loc[5] = min(threshold, A.loc[6])
However, the output value of C will be the sum of the above calculations.
When the value of B is unique and not 0. For example when B = 4
C[10] = min(threshold, A.loc[9]) + min(threshold, A.loc[11])
I can solve point 0 and 3. But I'm struggling to solve point 2.
So, the final output will be:
A | B | c
--------------------
5 | 0 | 0
5 | 1 | 45
15 | 1 | 45
10 | 1 | 45
10 | 1 | 45
20 | 2 | 50
15 | 2 | 50
10 | 2 | 50
5 | 3 | 25
10 | 3 | 25
10 | 4 | 20
20 | 0 | 0
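One possible reading of the rules above, written as a hedged pandas sketch: for each run of consecutive equal non-zero B values, C is the capped A of the row just before the run, plus A of every row in the run except the last, plus the capped A of the row just after the run. This interpretation is an assumption inferred from the formulas and the expected table, not something the question states in these words. The sketch also assumes that equal B values are contiguous and that a missing neighbour at the edge of the frame contributes 0.

```python
import pandas as pd

THRESHOLD = 10

# Input data copied from the question
df = pd.DataFrame({
    "A": [5, 5, 15, 10, 10, 20, 15, 10, 5, 15, 10, 20],
    "B": [0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 0],
})

# Label runs of consecutive equal B values (assumption: equal values are contiguous)
run = (df["B"] != df["B"].shift()).cumsum()
is_first = run != run.shift()      # first row of each run
is_last = run != run.shift(-1)     # last row of each run

# Neighbouring A values capped at the threshold; 0 at the frame edges (assumption)
prev_a = df["A"].shift(1, fill_value=0).clip(upper=THRESHOLD)
next_a = df["A"].shift(-1, fill_value=0).clip(upper=THRESHOLD)

# Per-row contribution under the interpretation described above
contrib = (
    prev_a.where(is_first, 0)       # capped previous A, counted on the first row
    + df["A"].where(~is_last, 0)    # own A on every row except the last
    + next_a.where(is_last, 0)      # capped next A, counted on the last row
)

# Every row of a run gets the run total; rows with B == 0 get 0
df["C"] = contrib.groupby(run).transform("sum").where(df["B"] != 0, 0)
print(df)
```

On this input the sketch gives 45, 50, 25 and 20 for B = 1, 2, 3 and 4, which matches the C column of the expected output. Other data may expose cases the question does not define.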

Looping to create a new column based on other column values in Python Dataframe [duplicate]

This question already has answers here:
How do I create a new column from the output of pandas groupby().sum()?
I want to create a new column in python dataframe based on other column values in multiple rows.
For example, my python dataframe df:
A | B
------------
10 | 1
20 | 1
30 | 1
10 | 1
10 | 2
15 | 3
10 | 3
I want to create variable C that is based on the value of variable A, with a condition on variable B across multiple rows. When variable B has the same value in rows i, i+1, ..., the value of C in each of those rows is the sum of variable A over those rows. In this case, my output data frame will be:
A | B | C
--------------------
10 | 1 | 70
20 | 1 | 70
30 | 1 | 70
10 | 1 | 70
10 | 2 | 10
15 | 3 | 25
10 | 3 | 25
I have no idea what the best way to achieve this is. Can anyone help?
Thanks in advance
Recreate the data:
import pandas as pd
A = [10,20,30,10,10,15,10]
B = [1,1,1,1,2,3,3]
df = pd.DataFrame({'A':A, 'B':B})
df
A B
0 10 1
1 20 1
2 30 1
3 10 1
4 10 2
5 15 3
6 10 3
Then I'll create a lookup Series from the df:
lookup = df.groupby('B')['A'].sum()
lookup
A
B
1 70
2 10
3 25
Then I'll use that lookup on the df using apply:
df.loc[:,'C'] = df.apply(lambda row: lookup[lookup.index == row['B']].values[0], axis=1)
df
A B C
0 10 1 70
1 20 1 70
2 30 1 70
3 10 1 70
4 10 2 10
5 15 3 25
6 10 3 25
Use the groupby() method to group the rows on B, then transform with 'sum' on A so that every row receives its group total:
df['C'] = df.groupby('B')['A'].transform('sum')
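As a runnable illustration of the transform approach, the sketch below rebuilds the sample frame and also shows that mapping B through the grouped sums gives the same column. The extra column name `C_via_map` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": [10, 20, 30, 10, 10, 15, 10],
    "B": [1, 1, 1, 1, 2, 3, 3],
})

# transform returns a result aligned to the original rows,
# so each row receives the sum of A for its own B group
df["C"] = df.groupby("B")["A"].transform("sum")

# Equivalent: build the per-group sums once, then look them up by B
df["C_via_map"] = df["B"].map(df.groupby("B")["A"].sum())
print(df)
```

Both versions avoid the row-by-row apply, which matters once the frame gets large.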

Add non-existent numbers with a zero value in Excel

I have a lot of sequential data in Excel.
Any number missing from the sequence in column A should be added with a zero value in column B.
What formula could I use for this?
Data in excel sheet as below:
A | B
0 | 4
1 | 6
4 | 4
6 | 7
Expected output:
A | B
0 | 4
1 | 6
2 | 0
3 | 0
4 | 4
5 | 0
6 | 7
In B1 of your second table, enter
=IFERROR(VLOOKUP(A1,original_table,2,0),0)
and drag down. Replace original_table with a reference to your first table. This assumes column A of the second table already lists the full sequence (0, 1, 2, ...).

Complete Truth Tables Based On Binary

I am trying to figure out names for every combination with truth tables.
In the first table, I have each truth table for a two input and one output system. The inputs are read by row. The outputs are in a binary counted format. Each Output is read by column and is labeled with a hex number 0 to F. The input by row is related to the outputs within the specified output column.
In the second table, I have listed by row how each output column on the first chart works. In each row I have listed the binary logic gate name, if statement in javascript, and a description for how each would work. I have a hyphen for spaces that are not complete.
Are there names for the blank spaces in the gate names in the second table?
Complete Truth Tables
Inputs | Outputs
1 2 | 0 1 2 3 4 5 6 7 8 9 A B C D E F
-----------------------------------------
0 0 | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 | 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 | 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 | 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Num | Gate | Javascript | Return True If
--- | ----- | ---------- | --------------
0 | - | 0 | FALSE
1 | AND | I1&&I2 | I1 AND I2
2 | - | I1&&!I2 | I1 AND NOT I2
3 | - | I1 | I1
4 | - | !I1&&I2 | I2 AND NOT I1
5 | - | I2 | I2
6 | XOR | I1!==I2 | I1 NOT EQUALS I2
7 | OR | I1||I2 | I1 OR I2
8 | NOR | !I1&&!I2 | NOT I1 AND NOT I2
9 | XNOR | I1==I2 | I1 EQUALS I2
A | - | !I2 | NOT I2
B | - | !(!I1&&I2) | NOT ( I2 AND NOT I1 )
C | - | !I1 | NOT I1
D | - | !(I1&&!I2) | NOT ( I1 AND NOT I2 )
E | NAND | !I1||!I2 | NOT I1 OR NOT I2
F | - | 1 | TRUE
Some of the other combinations have gate names, but not all do.
The A and C cases are each an example of a NOT gate, and the 3 and 5 cases are each an example of a BUFFER.
The D case is known as an IMPLY gate, but this is not as commonly known as the others.
For the rest, there are no commonly used gate names, because implementing their boolean function requires either no gates at all (as with TRUE and FALSE) or a combination of two or more of the conventional gates that you have already identified. Specific tools or systems may have created names for these "quasi-gates", but they are not in common use.
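A quick way to check which output column corresponds to which named gate is to enumerate the inputs and compare. The sketch below is in Python rather than JavaScript. It assumes the column numbering from the question's first table (the 0 0 input row holds the most significant bit) and reads IMPLY as "I1 implies I2". The names `outputs`, `GATES`, and `columns` are only illustrative.

```python
from itertools import product

def outputs(num):
    """Output column for function `num` (0..15), in input order 00, 01, 10, 11."""
    # The table counts in binary, with the (0, 0) row as the most significant bit
    return [(num >> (3 - row)) & 1 for row in range(4)]

# Named gates from the question and the answer, keyed by hex column number
GATES = {
    0x1: ("AND",   lambda a, b: a and b),
    0x6: ("XOR",   lambda a, b: a != b),
    0x7: ("OR",    lambda a, b: a or b),
    0x8: ("NOR",   lambda a, b: not (a or b)),
    0x9: ("XNOR",  lambda a, b: a == b),
    0xD: ("IMPLY", lambda a, b: (not a) or b),  # assumption: I1 implies I2
    0xE: ("NAND",  lambda a, b: not (a and b)),
}

# Evaluate each gate on all four input pairs and compare with the table column
columns = {}
for num, (name, fn) in GATES.items():
    columns[num] = [int(bool(fn(a, b))) for a, b in product((0, 1), repeat=2)]
    status = "matches" if columns[num] == outputs(num) else "differs from"
    print(f"{name:5s} {status} column {num:X}: {columns[num]}")
```

For example, column 8 is true only for the 0 0 input, which is what NOR produces, and column E is false only for the 1 1 input, which is NAND.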
See Also
Logic Gate (Wikipedia)
Imply Gate (Wikipedia)
