How to make a data function property dynamically change based on filters in Spotfire?

In Spotfire, I created a new column using R to calculate the rolling sum of events in my data. Ideally, the column's values would change based on the filters being applied.
In the data function parameters, I see that I can limit the input data based on "active filters" and markings, which is exactly what I want. However, I've noticed that when the Date filter is changed, the rolling sum calculation uses the first date of the filtered data as its starting point. Is there any way to avoid this?
This is my R code in Spotfire:
library(dplyr, exclude = c("filter", "lag"))
library(zoo)

df %>%
  mutate(ym = as.yearmon(Date, "%m-%d-%Y")) %>%
  arrange(ym, Date, Category) %>%
  group_by(Category, ym, Date) %>%
  summarize(rolling = n_distinct(Events),
            Num_Months = first(Num_Months), .groups = "drop") %>%
  group_by(Category) %>%
  mutate(rolling = rollapplyr(rolling,
                              1:n() - findInterval(ym - (Num_Months + 1) / 12, ym),
                              sum)) %>%
  ungroup() %>%
  arrange(ym, Category) %>%
  select(Date, Category, rolling)

Related

How to find a certain value, similar to VLOOKUP in Excel? And how to create a dataframe with a for loop?

I have two dataframes, each with 2 columns (2 variables).
I was thinking that I could use something similar to VLOOKUP in Excel?
Maybe I could create a new dataset using a for loop and put the quotients in it, but I don't know exactly how I could do that.
(I also need to put the values in a dataset, so the suggested post does not answer my question completely.)
example:
dataframe1
number amount
1 2
2 3
3 4
dataframe2
number amount
1 5
2 6
4 2
Assuming that you imported dataframe1 as Dataframe1 and dataframe2 as Dataframe2, and both are data.frames:
library(tidyverse)

Dataframe1 %>%
  inner_join(Dataframe2 %>% rename(amount2 = amount),
             by = "number") -> Dataframe
At this point you can perform your operation:
Dataframe %>%
  mutate(result = amount / amount2) -> Dataframe
and check if the column result is what you were looking for.
To find the highest ratio:
Dataframe$result %>% max(na.rm = TRUE)
But there are many other ways to record this value; this is the most straightforward.
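For instance, if you also want to see which row attains that maximum, a dplyr alternative is slice_max() (available in dplyr >= 1.0.0; a sketch against the same Dataframe built above):

# keep the row(s) with the largest ratio; rows with a missing ratio are dropped
Dataframe %>%
  slice_max(result, n = 1, na_rm = TRUE)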

filter nearest row by date in pyspark

I want to filter by date on my pyspark dataframe.
I have a dataframe like this:
+------+---------+------+-------------------+-------------------+----------+
|amount|cost_type|place2| min_ts| max_ts| ds|
+------+---------+------+-------------------+-------------------+----------+
|100000| reorder| 1.0|2020-10-16 10:16:31|2020-11-21 18:50:27|2021-05-29|
|250000|overusage| 1.0|2020-11-21 18:48:02|2021-02-09 20:07:28|2021-05-29|
|100000|overusage| 1.0|2019-05-12 16:00:40|2020-11-21 18:44:04|2021-05-29|
|200000| reorder| 1.0|2020-11-21 19:00:09|2021-05-29 23:56:25|2021-05-29|
+------+---------+------+-------------------+-------------------+----------+
And I want to keep just one row for every possible cost_type: the one with the time range nearest to ds.
For example, for ds = '2021-05-29' my filter should select the second and fourth rows, but for ds = '2020-05-01' it should select the first and third rows of my dataframe. If my ds falls within the range of min_ts and max_ts, my filter should select that row for every cost type.
A possible way is to assign row numbers based on some conditions:
Whether ds is between min_ts and max_ts.
If not, the smaller of the absolute date difference between ds and min_ts, or between ds and max_ts.
from pyspark.sql import functions as F, Window

w = Window.partitionBy('cost_type').orderBy(
    F.col('ds').cast('timestamp').between(F.col('min_ts'), F.col('max_ts')).desc(),
    F.least(F.abs(F.datediff('ds', 'max_ts')), F.abs(F.datediff('ds', 'min_ts')))
)

df2 = df.withColumn('rn', F.row_number().over(w)).filter('rn = 1').drop('rn')
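As a quick check, here is a minimal reproduction of the sample data above (a sketch assuming an active SparkSession); for ds = '2021-05-29' it keeps the expected second and fourth rows:

from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()

# the four sample rows from the question
df = spark.createDataFrame(
    [(100000, 'reorder',   1.0, '2020-10-16 10:16:31', '2020-11-21 18:50:27', '2021-05-29'),
     (250000, 'overusage', 1.0, '2020-11-21 18:48:02', '2021-02-09 20:07:28', '2021-05-29'),
     (100000, 'overusage', 1.0, '2019-05-12 16:00:40', '2020-11-21 18:44:04', '2021-05-29'),
     (200000, 'reorder',   1.0, '2020-11-21 19:00:09', '2021-05-29 23:56:25', '2021-05-29')],
    ['amount', 'cost_type', 'place2', 'min_ts', 'max_ts', 'ds'],
).withColumn('min_ts', F.col('min_ts').cast('timestamp')) \
 .withColumn('max_ts', F.col('max_ts').cast('timestamp'))

w = Window.partitionBy('cost_type').orderBy(
    F.col('ds').cast('timestamp').between(F.col('min_ts'), F.col('max_ts')).desc(),
    F.least(F.abs(F.datediff('ds', 'max_ts')), F.abs(F.datediff('ds', 'min_ts')))
)

# keeps the 250000 overusage row and the 200000 reorder row
df.withColumn('rn', F.row_number().over(w)).filter('rn = 1').drop('rn').show()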

How to replace the null values of a column with values stored in a list at the corresponding indexes in a CSV, using Python and pandas?

Check if the value of the cell in some specific row of CLOSE DATE is blank; if it is, add 3 days to the SOLVED DATE and update the value of the cell.
I'm using the pandas library and Jupyter Notebook as my editor.
d is the DataFrame read from my CSV file:
for index, row in d.iterrows():
    startdate = row["SOLVED DATE"]
    print(index, startdate)
    enddate = pd.to_datetime(startdate) + pd.DateOffset(days=3)
    row["CLOSE DATE"] = enddate
    # d.iloc[index, 10] = enddate
    l1.append(enddate)
l1 is the list that contains the values in datetime format, and I need to replace the values of the column named "CLOSE DATE" with the values of l1 and update my CSV file accordingly.
Welcome to the Stack Overflow community!
iterrows() is usually a slow method and should be avoided in most cases. There are a few ways to do your task:
1. Make two dataframes, a null DF and a not-null DF, impute values in the null DF, then concatenate the two.
2. Impute values in the null rows of the dataframe itself.
As a supplement, the logic for building the updated date column is as follows:
1. Take the "SOLVED DATE" values and store them in a new series; call it "new_date".
2. Modify "new_date" by adding 3 days.
3. Once done, set "new_date" as the value of the column you want updated.
In terms of code:
# 1st method
import pandas as pd

null = d.loc[d['CLOSE DATE'].isna()].copy()   # .copy() avoids SettingWithCopyWarning
not_null = d.loc[d['CLOSE DATE'].notna()]
new_date = null['SOLVED DATE']
new_date = pd.to_datetime(new_date) + pd.DateOffset(days=3)
null['CLOSE DATE'] = new_date
d = pd.concat([null, not_null], axis=0)
d = d.reset_index(drop=True)

# 2nd method
import pandas as pd

new_date = d.loc[d['CLOSE DATE'].isna(), 'SOLVED DATE']
new_date = pd.to_datetime(new_date) + pd.DateOffset(days=3)
d['CLOSE DATE'] = d['CLOSE DATE'].fillna(new_date)
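A minimal end-to-end sketch of the 2nd method on toy data (the column names come from the question; the CSV path is a placeholder):

import pandas as pd

# toy frame standing in for the real CSV
d = pd.DataFrame({
    'SOLVED DATE': ['2021-01-04', '2021-01-10', '2021-01-12'],
    'CLOSE DATE':  ['2021-01-06', None, None],
})

mask = d['CLOSE DATE'].isna()
d.loc[mask, 'CLOSE DATE'] = pd.to_datetime(d.loc[mask, 'SOLVED DATE']) + pd.DateOffset(days=3)

# write the updated frame back out; 'tickets.csv' is a hypothetical path
d.to_csv('tickets.csv', index=False)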

Assign values to a datetime column in Pandas / Rename a datetime column to a date column

I have created the following dataframe 'user_char' in Pandas with:
## Create a new workbook User Char with empty datetime columns to import data from the ledger
user_char = all_users[['createdAt', 'uuid','gasType','role']]
## filter on consumers in the user_char table
user_char = user_char[user_char.role == 'CONSUMER']
user_char.set_index('uuid', inplace = True)
## creates datetime columns that need to be added to the existing df
user_char_rng = pd.date_range('3/1/2016', periods = 25, dtype = 'period[M]', freq = 'MS')
## converts date time index to a list
user_char_rng = list(user_char_rng)
## adds empty cols
user_char = user_char.reindex(columns = user_char.columns.tolist() + user_char_rng)
user_char
and I am trying to assign a value to the first datetime column ('2016-03-01') using the following command:
user_char['2016-03-01 00:00:00'] = 1
but this keeps creating a new column rather than editing the existing one. How do I assign the value 1 to all the indices without adding a new column?
Also, how do I rename the datetime columns so that the labels exclude the timestamp and leave only the date?
Try
user_char.loc[:, '2016-03-01'] = 1
Because your column index is a DatetimeIndex, pandas is smart enough to translate the string '2016-03-01' into datetime format. Using loc[c] seems to hint to pandas to first look for c in the index, rather than create a new column named c.
Side note: the DatetimeIndex of time-series data is conventionally used as the (row) index of a DataFrame, not in the columns. (There's no technical reason why you can't use time in the columns, of course!) In my experience, most of the PyData stack is built to expect "tidy data", where each variable (like time) forms a column, and each observation (timestamp value) forms a row. The way you're doing it, you'll need to transpose your DataFrame before calling plot() on it, for example.
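For the renaming part of the question, one hedged approach is to rebuild the column labels, converting each Timestamp label to its date part (a sketch on a toy frame mimicking user_char):

import pandas as pd

# toy frame whose column labels mix strings and Timestamps
cols = ['gasType', 'role'] + list(pd.date_range('3/1/2016', periods=3, freq='MS'))
df = pd.DataFrame(columns=cols)

# replace every Timestamp label with its date part; leave other labels alone
df.columns = [c.date() if isinstance(c, pd.Timestamp) else c for c in df.columns]
print(df.columns.tolist())  # ['gasType', 'role', datetime.date(2016, 3, 1), ...]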

Calculating Variance and Variance% in DAX

In my PBIX file, I have measures that calculate Revenue, COGS, Gross Margin, etc.
Revenue = SUM(Amt)
There are more measures that calculate last year's values: Revenue_LY, COGS_LY and GM_LY.
Revenue_LY =
CALCULATE (
    [Revenue],
    FILTER (
        ALL ( 'Date' ),
        'Date'[FinYear] = MAX ( 'Date'[FinYear] ) - 1
            && 'Date'[FinPeriod] = MAX ( 'Date'[FinPeriod] )
    )
)
Now I need variance and variance% measures for each, comparing data against last year and budget, and the number of measures is just getting too large.
Revenue_CY-LY =
CALCULATE ( [Revenue], KEEPFILTERS ( Versions[VersionCode] = "Act" ) )
    - CALCULATE ( [Revenue_LY], KEEPFILTERS ( Versions[VersionCode] = "Act" ) )

Revenue_CY-LY% =
IF ( [Revenue_CY-LY] < 0, -1, 1 )
    * IF (
        ABS ( DIVIDE ( [Revenue_CY-LY], [Revenue] ) ) > 99.9,
        "n/a",
        ABS ( DIVIDE ( [Revenue_CY-LY], [Revenue] ) * 100 )
    )
Is there a way to consolidate the measures used? I don't want to create individual measures for each variance.
Yes. You can create a dynamic measure.
First create Revenue, COGS, Gross Margin, etc. measures.
Revenue = SUM([Amt])
COGS = SUM([Cost])
Gross Margin = [Revenue] - [COGS]
...
Then you create a table with one row for each of your measures:
My Measures = DATATABLE("My Measure", STRING, {{"Revenue"}, {"COGS"}, {"Gross Margin"}})
The names don't need to align with your actual measures, but they will be displayed so make them presentable.
Then you create a measure on that table which will dynamically be the same as the selected row in the table:
Selected Measure =
SWITCH (
    SELECTEDVALUE ( 'My Measures'[My Measure], BLANK () ),
    "Revenue", [Revenue],
    "COGS", [COGS],
    "Gross Margin", [Gross Margin],
    BLANK ()
)
Next you go and create all the complicated time-intelligence measures using the [Selected Measure] as the base:
Dynamic_LY =
CALCULATE (
    [Selected Measure],
    FILTER (
        ALL ( 'Date' ),
        'Date'[FinYear] = MAX ( 'Date'[FinYear] ) - 1
            && 'Date'[FinPeriod] = MAX ( 'Date'[FinPeriod] )
    )
)
And then you can create [Dynamic_CY-LY] and [Dynamic_CY-LY %] in a similar manner to the ones in your question, replacing references to the [Revenue] measure with references to the dynamic measures, as in the sketch below.
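A sketch of the first one (the same logic as the Revenue_CY-LY measure above, with only the base measures swapped):

Dynamic_CY-LY =
CALCULATE ( [Selected Measure], KEEPFILTERS ( Versions[VersionCode] = "Act" ) )
    - CALCULATE ( [Dynamic_LY], KEEPFILTERS ( Versions[VersionCode] = "Act" ) )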
Now you can either use a slicer on the 'My Measures'[My Measure] column to dynamically change every instance of [Dynamic_CY-LY] and the other dynamic measures, or you can add a filter on each visualisation to filter 'My Measures'[My Measure].
It might be that you'd also like to have a default value for [Selected Measure] instead of defaulting to BLANK(); just put that measure in the last position of the SWITCH() function, as sketched below.
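For instance, to fall back to [Revenue] when nothing is selected (the choice of [Revenue] here is only an assumption; use whichever measure suits):

Selected Measure =
SWITCH (
    SELECTEDVALUE ( 'My Measures'[My Measure] ),
    "Revenue", [Revenue],
    "COGS", [COGS],
    "Gross Margin", [Gross Margin],
    [Revenue]  -- default when no row is selected
)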
