I have a dataframe such as:
Name Position Value
A 1 10
A 2 11
A 3 10
A 4 8
A 5 6
A 6 12
A 7 10
A 8 9
A 9 9
A 10 9
A 11 9
A 12 9
and I would like to calculate the mean of Value for each interval of 3 positions,
and create a new df with the start and end coordinates of each interval (each of length 3) and a Mean_value column.
Name Start End Mean_value
A 1 3 10.33 <---- here this is (10+11+10)/3 = 10.33
A 4 6 8.7
A 7 9 9.3
A 10 12 9
Does anyone have an idea how to do this with pandas, please?
To aggregate every 3 rows (or fewer, if a group runs out) within each Name group, first build a block counter with GroupBy.cumcount and integer division, then pass it to groupby together with named aggregations:
# block counter: 0,0,0,1,1,1,... restarting for each Name
g = df.groupby('Name').cumcount() // 3
df = (df.groupby(['Name', g])
        .agg(Start=('Position', 'first'),
             End=('Position', 'last'),
             Value=('Value', 'mean'))
        .droplevel(1)
        .reset_index())
print(df)
Name Start End Value
0 A 1 3 10.333333
1 A 4 6 8.666667
2 A 7 9 9.333333
3 A 10 12 9.000000
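For reference, here is a self-contained sketch of the same approach that rebuilds the sample data from the question so it can be run as-is. The names sample, block and out are my own, and the result column is called Mean_value to match the question's expected output rather than Value; treat it as an illustration, not the only way to do it.

```python
import pandas as pd

# Sample data from the question: 12 positions for a single Name
sample = pd.DataFrame({
    "Name": ["A"] * 12,
    "Position": list(range(1, 13)),
    "Value": [10, 11, 10, 8, 6, 12, 10, 9, 9, 9, 9, 9],
})

# Block counter within each Name: 0,0,0,1,1,1,...
block = sample.groupby("Name").cumcount() // 3

out = (
    sample.groupby(["Name", block])
    .agg(
        Start=("Position", "first"),
        End=("Position", "last"),
        Mean_value=("Value", "mean"),
    )
    .droplevel(1)
    .reset_index()
)
print(out.round(2))
```

If a Name has a row count that is not a multiple of 3, the last block is simply shorter and is still aggregated.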
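import pandas as pd

# df has the limits in 'LL' and 'UL'; the readings start at column position 5
x = []    # 'Pass' / 'Fail' per row
y1 = []   # number of in-range readings per row
r1 = len(df)
L1 = len(df.columns)
for i in range(r1):
    ll = df.loc[i, 'LL']
    ul = df.loc[i, 'UL']
    count1 = 0
    for j in range(5, L1):
        # treat string (non-numeric) readings as 0; j is a position, so use iloc
        if isinstance(df.iloc[i, j], str):
            df.iloc[i, j] = 0
        if ll <= df.iloc[i, j] <= ul:
            count1 = count1 + 1
    # a row passes only if every reading is within the limits
    if count1 == (L1 - 5):
        x.append('Pass')
    else:
        x.append('Fail')
    y1.append(count1)
se = pd.Series(x)
se1 = pd.Series(y1)

# row statistics over the reading columns, computed before any summary column is added
min1 = df.iloc[:, 5:].min(axis=1)
mean1 = df.iloc[:, 5:].astype(float).mean(axis=1, skipna=True)
median1 = df.iloc[:, 5:].astype(float).median(axis=1, skipna=True)
max1 = df.iloc[:, 5:].max(axis=1)
count1 = df.iloc[:, 5:].count(axis=1)

# pass count as a percentage
yield1 = []
for i in range(len(se1)):
    yd1 = (se1[i] / (L1 - 3)) * 100
    yield1.append(yd1)
se2 = pd.Series(yield1)

df['Min'] = min1.values
df['Mean'] = mean1.values
df['Median'] = median1.values
df['Max'] = max1.values
df['Pass Count'] = se1.values
df['Result'] = se.values
df['Yield'] = se2.values
df1 = df.loc[:, ['PARAMETER', 'Min', 'Mean', 'Median', 'Max', 'Result', 'Pass Count', 'Yield']]
df1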
Below is my data set: daily sensor data. Each day's reading should be within the Lower Limit (LL) and Upper Limit (UL). For each sensor, I want to count the number of days on which the reading is within LL and UL.
I have not been able to calculate this with pandas. How can I count the number of days on which the sensor data is within LL and UL?
A few key ideas:
Build a list, daycols, of the columns that go into the calculation.
Transpose those columns so that the comparison with LL and UL lines up sensor by sensor; each test gives a boolean array.
Sum the combined boolean array and you have the desired count.
import io
import pandas as pd

df = pd.read_csv(io.StringIO("""sensor location,LL,UL,day1,day2,day3,day4,day5,day6,day7,number of days sensor data within LL and UL
A,1,10,12,6,9,4,9,7,15,5
B,1,12,4,15,7,1,11,1,7,6
C,1,15,13,13,13,10,7,13,13,7
D,1,10,12,1,14,12,15,4,4,3
E,1,20,11,15,8,14,1,14,14,7"""))
# day columns only; skip the expected-answer column, whose name also contains "day"
daycols = [c for c in df.columns if "day" in c and "number" not in c]
df = df.assign(
    # True counts as 1, so summing the boolean frame gives the number of days in range
    daysBetween=lambda dfa: ((dfa.loc[:, daycols].T >= dfa["LL"]) &
                             (dfa.loc[:, daycols].T <= dfa["UL"])).sum()
)
print(df.to_string(index=False))
output
sensor location LL UL day1 day2 day3 day4 day5 day6 day7 number of days sensor data within LL and UL daysBetween
A 1 10 12 6 9 4 9 7 15 5 5
B 1 12 4 15 7 1 11 1 7 6 6
C 1 15 13 13 13 10 7 13 13 7 7
D 1 10 12 1 14 12 15 4 4 3 3
E 1 20 11 15 8 14 1 14 14 7 7
speed up
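If you have many columns, you can pick them out with a slice and convert them to integer positions so that iloc can be used. The transpose is also unnecessary if the comparison is done row-wise with ge/le and axis=0, followed by a sum across each row.
# positions of the day columns: everything after UL up to, but excluding, the last column
# (assumes df is as loaded from the CSV, before daysBetween has been added)
dayi = [df.columns.get_loc(c) for c in df.columns[3:-1]]
df = df.assign(
    # True counts as 1, so summing each row of the boolean frame gives the answer
    daysBetween=lambda dfa: (dfa.iloc[:, dayi].ge(dfa["LL"], axis=0) &
                             dfa.iloc[:, dayi].le(dfa["UL"], axis=0)).sum(axis=1)
)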
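A self-contained sketch of the row-wise comparison without a transpose, in case it helps to check the idea in isolation. It reloads the sample data without the expected-answer column and selects the day columns by their "day" prefix, which is an assumption about the naming scheme; the names csv_text, sensors, day_cols and in_range are mine.

```python
import io
import pandas as pd

csv_text = """sensor location,LL,UL,day1,day2,day3,day4,day5,day6,day7
A,1,10,12,6,9,4,9,7,15
B,1,12,4,15,7,1,11,1,7
C,1,15,13,13,13,10,7,13,13
D,1,10,12,1,14,12,15,4,4
E,1,20,11,15,8,14,1,14,14"""
sensors = pd.read_csv(io.StringIO(csv_text))

# Day columns selected by name prefix (an assumption about the naming scheme)
day_cols = [c for c in sensors.columns if c.startswith("day")]
days = sensors[day_cols]

# Compare each row against its own limits; axis=0 aligns LL/UL with the row index
in_range = days.ge(sensors["LL"], axis=0) & days.le(sensors["UL"], axis=0)

# True counts as 1, so the row sum is the number of days within the limits
sensors["daysBetween"] = in_range.sum(axis=1)
print(sensors[["sensor location", "LL", "UL", "daysBetween"]])
```

The counts agree with the expected column in the question's data (5, 6, 7, 3, 7).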
I am working on a graph and need the data in the format below. I have data in COL A, and I need to calculate the COL B values as in the picture below.
What is the formula for obtaining this in Excel?
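You can do this with shift and cumsum:
import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame({'COL A': np.arange(11)})
# shift down one row (filling the first with 0), then take the running total
df['COL B'] = df['COL A'].shift(fill_value=0).cumsum()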
Output:
COL A COL B
0 0 0
1 1 0
2 2 1
3 3 3
4 4 6
5 5 10
6 6 15
7 7 21
8 8 28
9 9 36
10 10 45
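Use a simple formula.
You can use the formula (A3*A2)/2 for COL B. This relies on COL A holding consecutive integers, as in the 0 to 10 sample: the sum of the values before n is then n(n-1)/2.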
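As a quick check, the sketch below compares the closed form n(n-1)/2 with the shifted cumulative sum on the 0 to 10 sample. It assumes COL A holds consecutive integers starting at 0; for arbitrary values only the cumulative sum applies. The name closed_form is mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"COL A": np.arange(11)})

# Running total of all previous rows
df["COL B"] = df["COL A"].shift(fill_value=0).cumsum()

# Closed form for consecutive integers 0, 1, 2, ...: n * (n - 1) / 2
closed_form = df["COL A"] * (df["COL A"] - 1) // 2

# True when both columns agree on every row
print((df["COL B"] == closed_form).all())
```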
I have multiple dataframes, each with a different number of rows and columns.
example:
df1:
a b c d
0 1 5 6
8 9 8 7
and df2:
g h
9 8
4 5
6 7
I have to append both dataframes without changing their dimensions.
The desired output is a single dataframe, Result_df:
a b c d
0 1 5 6
8 9 8 7
g h
9 8
4 5
6 7
Can anyone please help me append the dataframes without changing their structure?
Thank you
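One possible way to get that layout, sketched under the assumption that it is acceptable for the column names to become ordinary rows (so the result has default integer column labels, object dtype, and NaN padding where a frame is narrower): turn each frame into a header row plus its data rows, then concatenate. The helper name with_header_row is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 8], "b": [1, 9], "c": [5, 8], "d": [6, 7]})
df2 = pd.DataFrame({"g": [9, 4, 6], "h": [8, 5, 7]})


def with_header_row(frame):
    """Return a frame whose first row holds the original column names."""
    rows = [list(frame.columns)] + frame.to_numpy().tolist()
    return pd.DataFrame(rows)


# Stack the pieces; narrower frames are padded with NaN on the right
result_df = pd.concat(
    [with_header_row(df1), with_header_row(df2)],
    ignore_index=True,
)
print(result_df)
```

If the goal is only a file with this layout, writing each frame in turn with to_csv in append mode may be simpler than building one combined frame.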
I have two data frames. Examples:
df1:
A B C
5 7 6
8 1 1
1 0 7
3 4 9
5 7 4
9 2 0
df2:
A B C
3 2 1
6 5 7
9 7 9
1 1 2
6 4 5
0 8 6
Both data frames have the same index.
Wherever df1's value is less than 5,
I want to set df2's value to 0; otherwise df2 should keep its value.
I tried the following code:
df2[df1<5]=0
but when I print df2, it shows the same values as the original df2.
I know I am missing something really simple.
Please help me.
Thank you.
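For what it is worth, with frames that really do share the same index and column labels, the boolean-mask assignment works. A minimal sketch with the sample values is below, alongside DataFrame.mask, which returns a new frame instead of modifying df2. If df2 still looks unchanged in practice, one thing worth checking (an assumption on my part, since the real data is not shown) is whether the labels and dtypes of the two frames actually match. The names masked and in_place are mine.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [5, 8, 1, 3, 5, 9],
                    "B": [7, 1, 0, 4, 7, 2],
                    "C": [6, 1, 7, 9, 4, 0]})
df2 = pd.DataFrame({"A": [3, 6, 9, 1, 6, 0],
                    "B": [2, 5, 7, 1, 4, 8],
                    "C": [1, 7, 9, 2, 5, 6]})

# Non-mutating variant: put 0 wherever df1 < 5, keep df2's value elsewhere
masked = df2.mask(df1 < 5, 0)

# In-place variant from the question, applied to a copy so df2 stays intact
in_place = df2.copy()
in_place[df1 < 5] = 0

print(masked)
```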