How to generate a DataFrame whose length equals the product of all column lengths? - python-3.x

I am looking for a quick way to generate a long dataframe. For example, the input is:
Column "color": [1,2,3] (length: 3)
Column "weekday": [0,1] (length: 2)
The expected output is:
color weekday
1 0
2 0
3 0
1 1
2 1
3 1
The output dataframe has length 3*2 = 6.
Is there a quick way to generate such a dataframe from series given as input? There may be many columns. Thanks.
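One possible approach, sketched under the assumption that the input arrives as a dict of column name to values (the helper name cross_frame and its argument are illustrative, not from the question): itertools.product enumerates every combination, and feeding it the columns in reverse order makes the first column vary fastest, as in the expected output above.

```python
import itertools

import pandas as pd


def cross_frame(columns):
    """Build one row per combination of the given column values.

    `columns` is a hypothetical dict mapping column name -> list of values.
    """
    names = list(columns)
    # product() varies its last argument fastest, so pass the columns in
    # reverse order to make the first column cycle fastest.
    reversed_names = names[::-1]
    rows = itertools.product(*(columns[name] for name in reversed_names))
    frame = pd.DataFrame(list(rows), columns=reversed_names)
    # Restore the original column order.
    return frame[names]


df = cross_frame({"color": [1, 2, 3], "weekday": [0, 1]})
print(df)
```

The number of rows is the product of the input lengths, so it grows quickly as columns are added.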

Related

Create a new column by extracting the smallest tuple from a data frame column

I have a dataframe with a column that contains tuples. I would like to create a new column that extracts the smallest tuple from the tuple column.
What I have tried so far
mydataframe['min_values'] = mydataframe['tuple_column'].apply(lambda x: min(x))
This approach seems to work when a tuple has at least 2 elements, but it fails when there is only one value, e.g. (5) in the example below. Could you please suggest a better way to accomplish this?
Example and desired result
Tuple Column    New Column
(1,2,3,5)       1
(10,11)         10
(5)             5
Thanks
(5) is not a tuple; it is the integer 5. Use numpy.min, which also accepts scalar input:
import numpy as np
df['New Column'] = df['Tuple Column'].apply(np.min)
Output:
Tuple Column New Column
0 (1, 2, 3, 5) 1
1 (10, 11) 10
2 5 5
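A small check of why the original attempt fails, as a sketch (the variable names are illustrative): parentheses alone do not create a tuple, so the built-in min() receives a bare integer and raises TypeError, while numpy.min accepts it.

```python
import numpy as np

single = (5)       # just the integer 5; the parentheses only group
one_tuple = (5,)   # the trailing comma is what makes a tuple

print(type(single).__name__, type(one_tuple).__name__)

try:
    min(single)
    builtin_failed = False
except TypeError:
    # The built-in min() needs an iterable, and an int is not one.
    builtin_failed = True

print(builtin_failed, np.min(single), np.min((10, 11)))
```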
Here is a way using map() that calls min() only on actual tuples:
df['Tuple Column'].map(lambda x: min(x) if isinstance(x, tuple) else x)
Output:
0 1
1 10
2 5
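Another option, assuming df1 holds the tuples as strings, is to parse each cell and take its minimum:
df1.applymap(lambda x: pd.Series(eval(x)).min())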
Output:
0 1
1 3
2 5

How to get value of column 2 when column 1 is greater than 3 and check which bin this value belongs to

I have a dataframe with two columns, A and B. First I need to make empty bins with step 1 from 1 to 11: (1,2), (2,3), ..., (10,11). Then, wherever the column B value in the original dataframe is greater than 3, I need the value of column 'A' from 2 rows before that row.
Here is example dataframe :
df=pd.DataFrame({'A':[1,8.5,5.2,7,8,9,0,4,5,6],'B':[1,2,2,2,3.1,3.2,3,2,1,2]})
Required output 1:
df_out1=pd.DataFrame({'Value_A':[8.5,5.2]})
Required output 2:
df_output2:
Bins count
(1,2) 0
(2,3) 0
(3,4) 0
(4,5) 0
(5,6) 1
(6,7) 0
(7,8) 0
(8,9) 1
(9,10) 0
(10,11) 0
You can index a shifted series to get the value of 'A' a fixed number of rows before 'B' satisfies some condition. A shift of 3 reproduces the required output above (8.5 and 5.2):
out1 = df['A'].shift(3)[df['B'] > 3]
What you want to do with the bins is known as a histogram. You can compute it with numpy like this:
count, bin_edges = np.histogram(out1, bins=[i for i in range(1, 12)])
out2 = pd.DataFrame({'bin_lo': bin_edges[:-1], 'bin_hi': bin_edges[1:], 'count': count})
Here 'bin_lo' and 'bin_hi' are the lower and upper bounds of the bins.
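Putting the two steps together as a runnable sketch on the example data from the question. Note that np.histogram treats each bin as half-open, [lo, hi), except the last one, which also includes its upper edge.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 8.5, 5.2, 7, 8, 9, 0, 4, 5, 6],
    "B": [1, 2, 2, 2, 3.1, 3.2, 3, 2, 1, 2],
})

# Value of A three rows earlier, kept only where B exceeds 3.
out1 = df["A"].shift(3)[df["B"] > 3]

# Bin edges 1, 2, ..., 11 give the ten bins (1,2) through (10,11).
count, bin_edges = np.histogram(out1, bins=np.arange(1, 12))
out2 = pd.DataFrame({
    "bin_lo": bin_edges[:-1],
    "bin_hi": bin_edges[1:],
    "count": count,
})
print(out1.tolist())
print(out2)
```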

Is there a way to convert my column of incrementing integers separated by zero to the number of intervals encountered so far in a pandas DataFrame?

I'm working in pandas and I have a column in my dataframe filled with 0s and incrementing integers starting at one. I would like to add another column of integers that counts how many intervals separated by zero we have encountered up to that point. For example, my data would look like
Index
1
2
3
0
1
2
0
1
and I would like it to look like
Index IntervalCount
1 1
2 1
3 1
0 1
1 2
2 2
0 2
1 3
Is it possible to do this with a vectorized operation, or do I have to do it iteratively? Note that it need not be a new column; overwriting the old one is fine too.
You can use the cumsum function. Each interval starts at 1, so a running count of the 1s gives the interval number:
df["IntervalCount"] = (df["Index"] == 1).cumsum()
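A runnable sketch of this on the sample data, assuming every interval restarts at exactly 1 and no rows precede the first 1. Each separating 0 keeps the count of the interval before it.

```python
import pandas as pd

df = pd.DataFrame({"Index": [1, 2, 3, 0, 1, 2, 0, 1]})

# Each interval begins with a 1, so a running count of the 1s
# labels every row with the number of intervals started so far.
df["IntervalCount"] = (df["Index"] == 1).cumsum()
print(df)
```

On this data the final row gets 3, because a third interval starts there.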

Pandas groupby value and return observation count to dataset

I have a dataset like the following:
id value
a 0
a 0
a 0
a 0
a 1
a 2
a 2
a 2
b 0
b 0
b 1
b 2
b 2
I want to groupby the "id" column and grab the number of observations in the "value" column, and return a new column in the original dataset that counts the number of times the "value" observation occurs within each id.
An example of the output I'm looking for is represented in column "output":
id value output
a 0 4
a 0 4
a 0 4
a 0 4
a 1 1
a 2 3
a 2 3
a 2 3
b 0 2
b 0 2
b 1 1
b 2 2
b 2 2
When grouping on id "a", there are 4 observations of 0, which is provided in the column "output" for each row that contains id of "a" and value of 0.
I have tried applications of groupby and apply, to no avail. Any suggestions would be very helpful. Thank you.
Update: I figured out a solution for anyone who also faces this problem, and it works well.
grouped = df.groupby(['id','value'])
df['output'] = grouped['value'].transform('count')
This computes the number of observations in each (id, value) group and assigns that count to every row in the group, as shown in the "output" column above.
Group by id and value, then count the rows in each group:
data.groupby(['id' , 'value'])['id'].transform('count')
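A self-contained sketch of this on the dataset from the question. The point to notice is that transform returns one value per original row, so the result can be assigned straight back as a column.

```python
import pandas as pd

df = pd.DataFrame({
    "id": list("aaaaaaaa") + list("bbbbb"),
    "value": [0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 1, 2, 2],
})

# transform('count') broadcasts each group's count back to its rows,
# so the result lines up with df's index.
df["output"] = df.groupby(["id", "value"])["value"].transform("count")
print(df)
```

Note that 'count' skips missing values in the selected column; if rows with missing values should be counted too, transform('size') may be the better fit.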

How to generate all arrangements of two values in 5 columns using Excel?

I have two values: 1 and 0. And I have 5 columns. I need to generate all possible arrangements in Excel.
For example, I have 2 columns and 2 values: 0 , 1. There are only 4 possible arrangements (with repetitions):
1 | 0
0 | 1
0 | 0
1 | 1
I need to generate all possible arrangements of 1 and 0 for 5 columns. The number of possible arrangements with repetition is given by the formula n^k.
So, for 5 columns and 2 values it is 2^5 = 32 arrangements.
Is it possible to automate it without typing ones and zeros manually?
You basically want to count from 0 to 31 in binary and then split the binary result out over the columns. You can do it like this:
Column A - just the numbers, i.e. 0, 1, 2, 3, 4, ... up to 31
Column B - =DEC2BIN(A2,5)
Columns C to G - =MID($B2,C$1,1) and then drag down and across
For the formula to pick the binary digit for the correct column, cells C1 to G1 must contain the positions 1 to 5, because C$1 is the start position passed to MID.
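For readers who want to check the result outside Excel, the same counting idea can be sketched in Python (the variable names are illustrative and not part of the spreadsheet): count from 0 to 31, zero-pad each number to 5 binary digits, and split the digits into columns.

```python
n_columns = 5

# Count from 0 to 2**5 - 1 and split each zero-padded binary string
# into one digit per column, mirroring DEC2BIN followed by MID.
rows = [
    [int(digit) for digit in format(number, f"0{n_columns}b")]
    for number in range(2 ** n_columns)
]

for row in rows[:4]:
    print(row)
```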
