SSRS Filter expression date OR date - visual-studio-2012

I have a dataset in SSRS 2010. It returns a value for Start_Date and End_Date. I need to filter the dataset with the following:
Start_Date >= 05/04/2012 or Start_Date < 04/04/2013 or End_Date < 04/04/2013
I have tried:
Expression: =(start_date >= '05/04/2012') or (start_date < '04/04/2013') or (end_date < '05/04/2013')
Operator: =
Value: =True
but unfortunately it does not apply the filter selection.
Can anyone help please?

I am assuming you want records with Start_Date between 05/04/2012 and 04/04/2013, or End_Date < 04/04/2013. If what you want is the records whose Start_Date falls between 05/04/2012 and 04/04/2013, then OR is the wrong operator. You should use And or Between so that you get the data between those two dates. More info on logical operators here.
UPDATE
If you want to use OR in the filter, then use it as below:
Expression: =(Fields!Start_Date.Value >= CDate("05/04/2012")) OR (Fields!Start_Date.Value < CDate("04/04/2013")) OR (Fields!End_Date.Value < CDate("04/04/2013"))
Operator: =
Value: =TRUE

Related

How can I create a column that flags when another datetime column has changed date?

How can I create a column 'Marker' that flags (0 or 1) when another datetime column 'DT' has changed date?
df = pd.DataFrame()
df['Obs']=float_array
df['DT'] = pd.to_datetime(datetime_array)
df['Marker'] = 0
print(type(df['DT'].dt))
<class 'pandas.core.indexes.accessors.DatetimeProperties'>
df['Marker'] = df.where(datetime.date(df.DT.dt) == datetime.date(df.DT.shift(1).dt),1)
TypeError: an integer is required (got type DatetimeProperties)
Use Series.dt.date to convert to dates; to convert True/False to 1/0, use Series.view:
df['Marker'] = (df.DT.dt.date != df.DT.dt.date.shift()).view('i1')
Or numpy.where:
df['Marker'] = np.where(df.DT.dt.date == df.DT.dt.date.shift(), 0, 1)
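For illustration, here is a minimal, self-contained sketch of the numpy.where approach on made-up data (the sample values below are not from the question):
import numpy as np
import pandas as pd
# Made-up sample: three timestamps on one date, then one on the next date
df = pd.DataFrame({'Obs': [1.0, 2.0, 3.0, 4.0],
                   'DT': pd.to_datetime(['2021-01-01 10:00', '2021-01-01 11:00',
                                         '2021-01-01 12:00', '2021-01-02 09:00'])})
# Flag rows whose date differs from the previous row's date
# (the first row has no previous date, so it is flagged as well)
df['Marker'] = np.where(df.DT.dt.date == df.DT.dt.date.shift(), 0, 1)
print(df['Marker'].tolist())  # [1, 0, 0, 1]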

Set days since first occurrence based on multiple columns

I have a pandas dataset with this structure:
Date datetime64[ns]
Events int64
Location object
Day float64
I've used the following code to get the date of the first occurrence for location "A":
start_date = df[df['Location'] == 'A'][df.Events != 0].iat[0,0]
I now want to update all of the records after the start_date with the number of days since the start_date, where Day = df.Date - start_date.
I tried this code:
df.loc[df.Location == country, 'Day'] = (df.Date - start_date).days
However, that code returns an error:
AttributeError: 'Series' object has no attribute 'days'
The problem seems to be that the code recognizes df.Date as an object instead of a datetime. Anyone have any ideas on what is causing this problem?
You need to add the .dt accessor:
df.loc[df.Location == country, 'Day'] = (df.Date - start_date).dt.days
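As a quick sketch of why this works (the sample values and start-date selection below are invented to mirror the question's column names): subtracting a single Timestamp from a datetime column gives a timedelta64 Series, and .dt.days extracts whole days from it.
import pandas as pd
# Hypothetical sample mirroring the question's columns
df = pd.DataFrame({'Date': pd.to_datetime(['2020-03-01', '2020-03-05', '2020-03-10']),
                   'Events': [0, 2, 3],
                   'Location': ['A', 'A', 'A'],
                   'Day': [None, None, None]})
# First date for location 'A' with a non-zero event count
start_date = df.loc[(df.Location == 'A') & (df.Events != 0), 'Date'].iat[0]
# df.Date - start_date is a timedelta64 Series, so .dt.days works
df.loc[df.Location == 'A', 'Day'] = (df.Date - start_date).dt.days
print(df.Day.tolist())  # [-4, 0, 5]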

Athena query results show null values despite is not null condition in query

I have the following query, which I run in Athena. I would like to receive all the results that contain a tag in 'resource_tags_aws_cloudformation_stack_name'. However, when I run the query, my results show rows where 'resource_tags_aws_cloudformation_stack_name' is empty, and I don't know what I am doing wrong.
SELECT
cm.line_item_usage_account_id,
pr.line_of_business,
cm.resource_tags_aws_cloudformation_stack_name,
SUM(CASE WHEN cm.line_item_product_code = 'AmazonEC2'
THEN line_item_unblended_cost * 0.97
ELSE cm.line_item_unblended_cost END) AS discounted_cost,
CAST(cm.line_item_usage_start_date AS DATE) AS start_day
FROM cost_management cm
JOIN prod_cur_metadata pr ON cm.line_item_usage_account_id = pr.line_item_usage_account_id
WHERE cm.line_item_usage_account_id IN ('1234504482')
AND cm.resource_tags_aws_cloudformation_stack_name IS NOT NULL
AND cm.line_item_usage_start_date
BETWEEN date '2020-01-01'
AND date '2020-01-30'
GROUP BY cm.line_item_usage_account_id,pr.line_of_business, cm.resource_tags_aws_cloudformation_stack_name, CAST(cm.line_item_usage_start_date AS DATE), pr.line_of_business
HAVING sum(cm.line_item_blended_cost) > 0
ORDER BY cm.line_item_usage_account_id
I modified my query to exclude ' ' and that seems to work:
SELECT
cm.line_item_usage_account_id,
pr.line_of_business,
cm.resource_tags_aws_cloudformation_stack_name,
SUM(CASE WHEN cm.line_item_product_code = 'AmazonEC2'
THEN line_item_unblended_cost * 0.97
ELSE cm.line_item_unblended_cost END) AS discounted_cost,
CAST(cm.line_item_usage_start_date AS DATE) AS start_day
FROM cost_management cm
JOIN prod_cur_metadata pr ON cm.line_item_usage_account_id = pr.line_item_usage_account_id
WHERE cm.line_item_usage_account_id IN ('1234504482')
AND NOT cm.resource_tags_aws_cloudformation_stack_name = ' '
AND cm.line_item_usage_start_date
BETWEEN date '2020-01-01'
AND date '2020-01-30'
GROUP BY cm.line_item_usage_account_id,pr.line_of_business, cm.resource_tags_aws_cloudformation_stack_name, CAST(cm.line_item_usage_start_date AS DATE), pr.line_of_business
HAVING sum(cm.line_item_blended_cost) > 0
ORDER BY cm.line_item_usage_account_id
You can try handling the space case as below:
AND Coalesce(cm.resource_tags_aws_cloudformation_stack_name,' ') !=' '
Or, if there may be multiple spaces, try the condition below (not suitable if spaces are required in the actual data):
AND Regexp_replace(cm.resource_tags_aws_cloudformation_stack_name, ' ') <> ''
Adding to this, you may also have special characters like CR or LF in the data, although that is a rare scenario.

pyspark sql: how to count rows with multiple conditions

I have a dataframe like this after some operations:
df_new_1 = df_old.filter(df_old["col1"] >= df_old["col2"])
df_new_2 = df_old.filter(df_old["col1"] < df_old["col2"])
print(df_new_1.count(), df_new_2.count())
>> 10, 15
I can find the number of rows individually, as above, by calling count(). But how can I do this with a single pyspark sql aggregation over the rows, so that I get both counts at once? I want to see the result like this:
Row(check1=10, check2=15)
Since you tagged pyspark-sql, you can do the following:
df_old.createOrReplaceTempView("df_table")
spark.sql("""
SELECT sum(int(col1 >= col2)) as check1
, sum(int(col1 < col2)) as check2
FROM df_table
""").collect()
Or use the API functions:
from pyspark.sql.functions import expr
df_old.agg(
    expr("sum(int(col1 >= col2)) as check1"),
    expr("sum(int(col1 < col2)) as check2")
).collect()
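As a runnable sketch of the second approach (the toy data and local SparkSession below are assumptions, and the .cast("int") form is equivalent to int(...) in the expressions above):
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()
# Toy data standing in for df_old
df_old = spark.createDataFrame([(3, 1), (2, 2), (1, 5), (4, 0)], ["col1", "col2"])
# Sum the boolean conditions cast to int, in a single pass over the data
result = df_old.agg(
    F.sum((F.col("col1") >= F.col("col2")).cast("int")).alias("check1"),
    F.sum((F.col("col1") < F.col("col2")).cast("int")).alias("check2")
).collect()
print(result)  # [Row(check1=3, check2=1)]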

Filter on month and date irrespective of year in python

I have a dataset with several columns, one of them being a date, and I am expected to drop the rows that fall on leap dates. The data covers a range of years, so I was hoping to drop any row that matches the 02-29 filter.
The approach I used is to add extra columns, extract the month and day separately, and then filter on those, as shown below. It serves the purpose but is obviously not good from an efficiency perspective.
df['Yr'], df['Mth-Dte'] = zip(*df['Date'].apply(lambda x: (x[:4], x[5:])))
df = df[df['Mth-Dte'] != '02-29']
Is there a better way to implement this by directly applying the filter on the column in the dataframe?
Adding the data
ID Date
22398 IDM00096087 1/1/2005
22586 IDM00096087 1/1/2005
21790 IDM00096087 1/2/2005
21791 IDM00096087 1/2/2005
14727 IDM00096087 1/3/2005
Thanks in advance
Convert to datetime and use a boolean mask.
import pandas as pd
data = {'Date': {14727: '1/3/2005',
21790: '1/2/2005',
21791: '1/2/2005',
22398: '1/1/2005',
22586: '29/2/2008'},
'ID': {14727: 'IDM00096087',
21790: 'IDM00096087',
21791: 'IDM00096087',
22398: 'IDM00096087',
22586: 'IDM00096087'}}
df = pd.DataFrame(data)
Option 1, convert + dt:
df.Date = pd.to_datetime(df.Date)
# Filter away february 29
df[~((df.Date.dt.month == 2) & (df.Date.dt.day == 29))]  # ~ negates the mask, keeping rows that are not Feb 29
Option 2, convert + strftime:
df.Date = pd.to_datetime(df.Date)
# Filter away february 29
df[df.Date.dt.strftime('%m%d') != '0229']
Option 3, without conversion:
mask = pd.to_datetime(df.Date).dt.strftime('%m%d') != '0229'
df[mask]
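As a quick check on the sample frame built above (dayfirst=True is an assumption here, since dates like '29/2/2008' look day-first):
dates = pd.to_datetime(df['Date'], dayfirst=True)
filtered = df[~((dates.dt.month == 2) & (dates.dt.day == 29))]
print(len(df), len(filtered))  # 5 4, only the 29/2/2008 row is dropped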
