Spotfire calculate the difference of values in a single column based on date for each group - calculated-columns

I have a requirement of calculating, in Spotfire, the difference between values in a single column based on their order of occurrence, for each group.
The data looks like:
ID  Value  Date
1   7      07/01/2021
1   8      09/01/2021
1   10     10/01/2021
1   15     11/01/2021
1   6      12/01/2021
1   3      15/01/2021
2   10     07/01/2021
2   11     08/01/2021
2   12     09/01/2021
The expected output is:
ID  Value  Date        Flag
1   7      07/01/2021  True
1   8      09/01/2021
1   10     10/01/2021
1   15     11/01/2021
1   6      12/01/2021
1   3      15/01/2021
2   10     07/01/2021  False
2   11     08/01/2021
2   12     09/01/2021
The logic is: we need to find the flag by comparing the first received value and the latest received value for each ID.
For ID 1 the first received value is 7.
For ID 1 the latest received value is 3.
Is 7 > 3? True.
For easy understanding I have sorted the ID column.
Thanks in advance.

Hi, I think you can do this in two steps (assuming the rows are ordered by Date within each ID, as in your example):
First step:
Concatenate(String([Value])) OVER ([ID])
Let's say you name this column test; for ID 1 it will give you 7, 8, 10, 15, 6, 3.
Second step:
Integer(Left([test], 1)) > Integer(Right([test], 1))
Note that Left/Right with a length of 1 compare only the first and last characters of test, so this is only reliable while every value is a single digit; with multi-digit values you would need to extract the whole first and last entries instead.
ID  Value  Date        test                flag
1   7      07/01/2021  7, 8, 10, 15, 6, 3  True
1   8      09/01/2021  7, 8, 10, 15, 6, 3  True
1   10     10/01/2021  7, 8, 10, 15, 6, 3  True
1   15     11/01/2021  7, 8, 10, 15, 6, 3  True
1   6      12/01/2021  7, 8, 10, 15, 6, 3  True
1   3      15/01/2021  7, 8, 10, 15, 6, 3  True
2   10     07/01/2021  10, 11, 12          False
2   11     08/01/2021  10, 11, 12          False
2   12     09/01/2021  10, 11, 12          False
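For reference, the same first-versus-latest comparison expressed in pandas rather than as a Spotfire expression (a sketch built on the sample data above, not part of the Spotfire answer):

import pandas as pd

df = pd.DataFrame({
    'ID':    [1, 1, 1, 1, 1, 1, 2, 2, 2],
    'Value': [7, 8, 10, 15, 6, 3, 10, 11, 12],
    'Date':  pd.to_datetime(
        ['07/01/2021', '09/01/2021', '10/01/2021', '11/01/2021', '12/01/2021',
         '15/01/2021', '07/01/2021', '08/01/2021', '09/01/2021'], dayfirst=True),
})

# Flag = first received Value > latest received Value, per ID (ordered by Date)
df = df.sort_values(['ID', 'Date'])
first_vs_last = df.groupby('ID')['Value'].agg(lambda s: s.iloc[0] > s.iloc[-1])
df['Flag'] = df['ID'].map(first_vs_last)
print(df)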

Related

how to change rows to column in python

I want to convert my dataframe rows to columns and take the last value of the last column.
Here is my dataframe:
df=pd.DataFrame({'flag_1':[1,2,3,1,2,500],'dd':[1,1,1,7,7,8],'x':[1,1,1,7,7,8]})
print(df)
flag_1 dd x
0 1 1 1
1 2 1 1
2 3 1 1
3 1 7 7
4 2 7 7
5 500 8 8
df_out:
1 2 3 1 2 500 1 1 1 7 7 8 8
Assuming you want a list as output, you can mask the initial values of the last column and stack:
import numpy as np

out = (df
       .assign(**{df.columns[-1]: np.r_[[pd.NA] * (len(df) - 1), [df.iloc[-1, -1]]]})
       .T.stack().to_list()
       )
Output:
[1, 2, 3, 1, 2, 500, 1, 1, 1, 7, 7, 8, 8]
For a wide dataframe with a single row, use .to_frame().T in place of to_list() (here with a MultiIndex):
flag_1 dd x
0 1 2 3 4 5 0 1 2 3 4 5 5
0 1 2 3 1 2 500 1 1 1 7 7 8 8
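For completeness, that single-row variant might look like this (a sketch: the same masking step, with to_frame().T swapped in for to_list()):

import numpy as np
import pandas as pd

df = pd.DataFrame({'flag_1': [1, 2, 3, 1, 2, 500],
                   'dd': [1, 1, 1, 7, 7, 8],
                   'x': [1, 1, 1, 7, 7, 8]})

wide = (df
        # keep only the last value of the last column, masking the rest with NA
        .assign(**{df.columns[-1]: np.r_[[pd.NA] * (len(df) - 1), [df.iloc[-1, -1]]]})
        .T.stack()       # stacking drops the masked NA values
        .to_frame().T    # one row with a (column, position) MultiIndex on the columns
        )
print(wide)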

How to aggregate n previous rows as list in Pandas DataFrame?

As the title says:
a = pd.DataFrame([1,2,3,4,5,6,7,8,9,10])
Given a dataframe with 10 values, we want to aggregate, say, the last 5 rows and put them as a list into a new column:
>>> a
    0       new_col
0   1
1   2
2   3
3   4
4   5   [1,2,3,4,5]
5   6   [2,3,4,5,6]
6   7   [3,4,5,6,7]
7   8   [4,5,6,7,8]
8   9   [5,6,7,8,9]
9  10   [6,7,8,9,10]
How?
Due to how rolling windows are implemented, you won't be able to aggregate the results as you expect, but we can still reach your desired result by iterating each window and storing its values as a list (here df is your a with its single column renamed to column):
>>> new_col_values = [
...     window.to_list() if len(window) == 5 else None
...     for window in df["column"].rolling(5)
... ]
>>> df["new_col"] = new_col_values
>>> df
column new_col
0 1 None
1 2 None
2 3 None
3 4 None
4 5 [1, 2, 3, 4, 5]
5 6 [2, 3, 4, 5, 6]
6 7 [3, 4, 5, 6, 7]
7 8 [4, 5, 6, 7, 8]
8 9 [5, 6, 7, 8, 9]
9 10 [6, 7, 8, 9, 10]
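An alternative sketch, assuming NumPy >= 1.20 (where sliding_window_view is available), that builds all the 5-wide windows in one shot:

import numpy as np
import pandas as pd

a = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], columns=["column"])

# each row of `windows` is one complete 5-element window over the values
windows = np.lib.stride_tricks.sliding_window_view(a["column"].to_numpy(), 5)

# the first 4 rows have no complete window, so pad them with None
a["new_col"] = [None] * 4 + [w.tolist() for w in windows]
print(a)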

Check the missing integers from a range in Python

Given a building info dataframe as follows:
id floor type
0 1 13 office
1 2 12 office
2 3 9 office
3 4 9 office
4 5 7 office
5 6 6 office
6 7 9 office
7 8 5 office
8 9 5 office
9 10 5 office
10 11 4 retail
11 12 3 retail
12 13 2 retail
13 14 1 retail
14 15 -1 parking
15 16 -2 parking
16 17 13 office
I want to check whether there are missing floors in the floor column (except floor 0, which by default does not exist).
Code:
set(df['floor'])
Out:
{-2, -1, 1, 2, 3, 4, 5, 6, 7, 9, 12, 13}
For example, for the dataset above (-2, -1, 1, 2, ..., 13), I want to return an indication that floors 8, 10 and 11 are missing from the dataset. Otherwise, just return that there is no missing floor in the dataset.
How could I do that in Pandas or NumPy? Thanks a lot for your help in advance.
Use np.setdiff1d for the difference with a range created by np.arange, omitting 0:
import numpy as np

arr = np.arange(df['floor'].min(), df['floor'].max() + 1)
arr = arr[arr != 0]
out = np.setdiff1d(arr, df['floor'])

out = ('no missing floor in your dataset'
       if len(out) == 0
       else f'floor(s) {", ".join(out.astype(str))} are missing in your dataset')
print(out)
floor(s) 8, 10, 11 are missing in your dataset
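A plain-Python equivalent using set difference (a sketch of the same range-minus-zero logic):

floors = set(df['floor'])
expected = set(range(df['floor'].min(), df['floor'].max() + 1)) - {0}
missing = sorted(expected - floors)

print('no missing floor in your dataset' if not missing
      else f"floor(s) {', '.join(map(str, missing))} are missing in your dataset")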

pd.Series(pred).value_counts() how to get the first column in dataframe?

I apply pd.Series(pred).value_counts() and get this output:
0 2084
-1 15
1 13
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2
dtype: int64
When I create a list I get only the second column:
c_list = list(pd.Series(pred).value_counts())
Out:
[2084, 15, 13, 10, 7, 4, 3, 3, 3, 2, 2, 2, 2]
How do I ultimately get a dataframe that looks like the one below, including a new column for size as a % of the total size?
df =
class  size  relative_size
 0     2084  x%
-1       15  y%
 1       13  etc.
 3       10
 4        7
 6        4
11        3
 8        3
 2        3
 9        2
 7        2
 5        2
10        2
You are very nearly there. Typing this blind as you didn't provide a sample input:
df = pd.Series(pred).value_counts().to_frame().reset_index()
df.columns = ['class', 'size']
df['relative_size'] = df['size'] / df['size'].sum()
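If relative_size should be an actual percentage string (as the x% / y% placeholders suggest), one possible follow-up (a sketch, not part of the original answer):

# express each class's share of the total as a percentage string, e.g. "96.93%" for class 0
df['relative_size'] = (100 * df['size'] / df['size'].sum()).round(2).astype(str) + '%'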

Create a dataset from two different dataset

I have two different data sets:
1. state VDM MDM OM
AP 1 2 5
GOA 1 2 1
GU 1 2 4
KA 1 5 1
2. Attribute:Value Support Item
VDM:1 4 1
VDM:2 0 2
VDM:3 0 3
VDM:4 0 4
VDM:5 0 5
MDM:1 0 6
MDM:2 3 7
MDM:3 0 8
MDM:4 0 9
MDM:5 1 10
OM:1 2 11
OM:2 0 12
OM:3 0 13
OM:4 1 14
OM:5 1 15
The first dataset only contains values 1-5.
The second dataset holds the Attribute:Value pairs, their occurrences (Support) and a sequence number (Item).
I want a dataset which looks like the one below:
state Item Number
AP 1, 7, 15
GOA 1, 7, 11
GU 1, 7, 14
KA 1, 10, 11
None of these are really appealing to me. But sometimes you just have to thrash about to get your data munged.
Attempt #0
a = dict(zip(df2['Attribute:Value'], df2['Item']))
cols = ['VDM', 'MDM', 'OM']
b = {
    'Item Number': [
        ', '.join([str(a[f'{c}:{t._asdict()[c]}']) for c in cols])
        for t in df1.itertuples()
    ]
}
df1[['state']].assign(**b)
state Item Number
0 AP 1, 7, 15
1 GOA 1, 7, 11
2 GU 1, 7, 14
3 KA 1, 10, 11
Attempt #1
a = dict(zip(df2['Attribute:Value'], df2['Item'].astype(str)))
d1 = df1.set_index('state').astype(str)
r1 = (d1.columns + ':' + d1).replace(a) # Thanks #anky_91
# r1 = (d1.columns + ':' + d1).applymap(a.get)
r1
VDM MDM OM
state
AP 1 7 15
GOA 1 7 11
GU 1 7 14
KA 1 10 11
Then
pd.DataFrame({'state': r1.index, 'Item Number': [*map(', '.join, zip(*map(r1.get, r1)))]})
state Item Number
0 AP 1, 7, 15
1 GOA 1, 7, 11
2 GU 1, 7, 14
3 KA 1, 10, 11
Attempt #2
a = dict(zip(df2['Attribute:Value'], df2['Item'].astype(str)))
cols = ['VDM', 'MDM', 'OM']
b = {
    'Item Number':
        [*map(', '.join, zip(*[[a[f'{c}:{i}'] for i in df1[c]] for c in cols]))]
}
df1[['state']].assign(**b)
state Item Number
0 AP 1, 7, 15
1 GOA 1, 7, 11
2 GU 1, 7, 14
3 KA 1, 10, 11
Attempt #3
from itertools import cycle
import numpy as np

a = dict(zip(zip(*df2['Attribute:Value'].str.split(':').str), df2['Item'].astype(str)))
d = df1.set_index('state')
b = {
    'Item Number':
        [*map(', '.join, zip(*[map(a.get, zip(cycle(d), np.ravel(d).astype(str)))] * 3))]
}
df1[['state']].assign(**b)
state Item Number
0 AP 1, 7, 15
1 GOA 1, 7, 11
2 GU 1, 7, 14
3 KA 1, 10, 11
Attempt #4
a = pd.Series(dict(zip(
    zip(*df2['Attribute:Value'].str.split(':').str),
    df2.Item.astype(str)
)))

df1.set_index('state').stack().astype(str).groupby(level=0).apply(
    lambda s: ', '.join(map(a.get, s.xs(s.name).items()))
).reset_index(name='Item Number')
state Item Number
0 AP 1, 7, 15
1 GOA 1, 7, 11
2 GU 1, 7, 14
3 KA 1, 10, 11
Here is another approach using stack, map and unstack:
s = df1.set_index('state').stack()
s_map = df2.set_index(['Attribute:Value'])['Item']
s.loc[:] = (s.index.get_level_values(1) + ':' + s.astype(str)).map(s_map)
s.unstack().astype(str).apply(', '.join, axis=1).reset_index(name='Item Number')
[out]
state Item Number
0 AP 1, 7, 15
1 GOA 1, 7, 11
2 GU 1, 7, 14
3 KA 1, 10, 11
I feel like this is a merge and pivot problem:
s = df2['Attribute:Value'].str.split(':', expand=True).assign(Item=df2.Item)
s[1] = s[1].astype(int)
s1 = df1.melt('state')
s1.merge(s, right_on=[0, 1], left_on=['variable', 'value']).pivot(index='state', columns='variable', values='Item')
Out[113]:
variable MDM OM VDM
state
AP 7 15 1
GOA 7 11 1
GU 7 14 1
KA 10 11 1
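The pivot above still has one column per attribute; to reach the requested comma-separated Item Number string per state, a possible final step (a sketch continuing from s and s1 above):

res = (s1.merge(s, right_on=[0, 1], left_on=['variable', 'value'])
         .pivot(index='state', columns='variable', values='Item')
         [['VDM', 'MDM', 'OM']]           # restore the original attribute order
         .astype(str)
         .apply(', '.join, axis=1)
         .reset_index(name='Item Number'))
print(res)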
