Spotfire - advanced row level security

I'm working on row level security in a Spotfire (6.5) report.
It should be implemented on 3 levels; let's call them L1, L2 and L3. There is an additional mapping table that contains user logins and, for each level, the values the user has access to. Additionally, if a user is not in the mapping table he is a kind of root user, so he has access to everything.
On the DB side it looks like this:
CREATE TABLE SECURITY
(
USER_ID VARCHAR2(100 BYTE)
, L1 VARCHAR2(100 BYTE)
, L2 VARCHAR2(100 BYTE)
, L3 VARCHAR2(100 BYTE)
--, L1L2L3 VARCHAR2(100 BYTE) -- option there could be one column that contains lowest possible level
);
INSERT INTO SECURITY (USER_ID, L1) VALUES ('UNAME1','A');
INSERT INTO SECURITY (USER_ID, L2) VALUES ('UNAME2','BB');
INSERT INTO SECURITY (USER_ID, L3) VALUES ('UNAME3','CCC');
CREATE TABLE SECURED_DATA
(
L1 VARCHAR2(100 BYTE)
, L2 VARCHAR2(100 BYTE)
, L3 VARCHAR2(100 BYTE)
, V1 NUMBER
);
INSERT INTO SECURED_DATA (L1, V1) VALUES ('A',1);
INSERT INTO SECURED_DATA (L1, L2, V1) VALUES ('B','BB',2);
INSERT INTO SECURED_DATA (L1, L2, L3, V1) VALUES ('C','CC','CCC',3);
Finally, I've made an Information Link and then changed its SQL code to something like this:
SELECT
M.*
FROM
SECURITY S
INNER JOIN SECURED_DATA M
ON
(
M.L1 = S.L1
AND S.USER_ID = (%CURRENT_USER%)
)
UNION ALL
SELECT
M.*
FROM
SECURITY S
INNER JOIN SECURED_DATA M
ON
(
M.L2 = S.L2
AND S.USER_ID = (%CURRENT_USER%)
)
UNION ALL
SELECT
M.*
FROM
SECURITY S
INNER JOIN SECURED_DATA M
ON
(
M.L3 = S.L3
AND S.USER_ID = (%CURRENT_USER%)
)
UNION ALL
SELECT
M.*
FROM
SECURED_DATA M
WHERE
(
SELECT
COUNT(1)
FROM
SECURITY S
WHERE S.USER_ID = (%CURRENT_USER%)
)
=0
It works fine, but I'm wondering if there is a smarter and more "Spotfire" way to do it?
Many thanks and regards,
Maciej

My guess on "more smart and more Spotfire way" is that you want to be able to cache a single data set and use it for multiple users, limiting it in the analytic rather than in the data pull. There is some danger to this, if we're doing it for security's sake, because the data will technically be in the analytic, and if they have permission to edit and add visualizations, you no longer control what they can and cannot see. If there's any authoring allowed in Web Player for the specific analytic, I recommend all securities be done DataBase-side.
If you want to do it in Spotfire anyways, here is my recommendation:
Have an Information Link (for this example, named IL_SecurityCheck) which is SELECT * FROM SECURITY WHERE USER_ID = (%CURRENT_USER%).
If users move from a cover page to the page with the data in it, you can put the code in the script that changes pages; if not, you can use a method I explained here: Spotfire Current Date in input field with calendar popup, to fire off a script on open.
Button Script required:
from Spotfire.Dxp.Data import *

# Data table backed by the IL_SecurityCheck Information Link (the current user's SECURITY rows)
crossSource = Document.Data.Tables["IL_SecurityCheck"]
rowCount = crossSource.RowCount
rowIndexSet = IndexSet(rowCount, True)
print rowCount
#rowCount = Document.Data.Tables["Managed Care UpDownStream"].RowCount
colCurs = DataValueCursor.CreateFormatted(crossSource.Columns["L1"])
colCurs2 = DataValueCursor.CreateFormatted(crossSource.Columns["L2"])
colCurs3 = DataValueCursor.CreateFormatted(crossSource.Columns["L3"])

x = ""
if rowIndexSet.IsEmpty != True:
    # Build a limiting expression from every non-empty L1/L2/L3 value the user may see
    for row in crossSource.GetRows(rowIndexSet, colCurs):
        if colCurs.CurrentValue is not None:
            x += "[L1] = '" + colCurs.CurrentValue + "' and "
    for row in crossSource.GetRows(rowIndexSet, colCurs2):
        if colCurs2.CurrentValue is not None:
            x += "[L2] = '" + colCurs2.CurrentValue + "' and "
    for row in crossSource.GetRows(rowIndexSet, colCurs3):
        if colCurs3.CurrentValue is not None:
            x += "[L3] = '" + colCurs3.CurrentValue + "' and "
    x = x[:len(x) - 4]
else:
    # User not in the SECURITY table: no restriction
    x = "1=1"
Document.Properties["SecurityLimits"] = x
Visualization Data Limited by Expression: ${SecurityLimits}

Related

MssqlRow to json string without knowing structure and data type on compile time

Using PostgreSQL I can have multiple rows of json objects.
select (select ROW_TO_JSON(_) from (select c.name, c.age) as _) as jsonresult from employee as c
This gives me this result:
{"age":65,"name":"NAME"}
{"age":21,"name":"SURNAME"}
But in SQL Server, when I use the FOR JSON AUTO clause, it gives me an array of JSON objects instead of multiple rows.
select c.name, c.age from customer c FOR JSON AUTO
[{"age":65,"name":"NAME"},{"age":21,"name":"SURNAME"}]
How do I get the same result format in SQL Server?
By constructing separate JSON in each individual row:
SELECT (SELECT [age], [name] FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
FROM customer
There is an alternative form that doesn't require you to know the table structure (but likely has worse performance because it may generate a large intermediate JSON):
SELECT [value] FROM OPENJSON(
(SELECT * FROM customer FOR JSON PATH)
)
Without needing to know the structure, and with better performance:
SELECT c.id, jdata.*
FROM customer c
cross apply
(SELECT * FROM customer jc where jc.id = c.id FOR JSON PATH , WITHOUT_ARRAY_WRAPPER) jdata (jdata)
Same as Barak Yellin but more lazy:
1 - Create this proc:
CREATE PROC PRC_SELECT_JSON(@TBL VARCHAR(100), @COLS VARCHAR(1000)='D.*') AS BEGIN
EXEC('
SELECT X.O FROM ' + @TBL + ' D
CROSS APPLY (
SELECT ' + @COLS + '
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) X (O)
')
END
2 - You can use either all columns or specific columns:
CREATE TABLE #TEST ( X INT, Y VARCHAR(10), Z DATE )
INSERT #TEST VALUES (123, 'TEST1', GETDATE())
INSERT #TEST VALUES (124, 'TEST2', GETDATE())
EXEC PRC_SELECT_JSON #TEST
EXEC PRC_SELECT_JSON #TEST, 'X, Y'
If you're using PHP, add SET NOCOUNT ON; as the first statement in the proc (otherwise the extra "rows affected" messages can interfere with reading the result set).

ATHENA/PRESTO complex query with multiple unnested tables

I would like to create a join over several tables.
table login: I would like to retrieve all the data from login
table logging: calculating the Nb_of_sessions for each db and for a specific event type, by user
table meeting: calculating the Nb_of_meetings for each db and for each user
table live: calculating the Nb_of_live for each db and for each user
I have these queries, which give the right results:
SELECT db.id,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by db.id,userid;
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid;
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id;
But when I try to put it all together, it seems I retrieve bad data (only one db is retrieved) and it doesn't seem efficient.
select a1.db.id,a._id as userid,a.firstname,a.lastname,count(rl._id) as nb_chat
FROM
"logins"."login" a,
"loggings"."logging" b,
"meetings"."meeting" c,
"lives"."live" d,
UNNEST(dbs) AS a1 (db),
UNNEST(users) AS r1 (user)
where a._id = b.userid AND a._id = c.userid AND a._id = r1.user._id
group by 1,2,3,4
Do you have an idea ?
Regards.
The easiest way is to use WITH to structure the subqueries and then reference them.
From the WITH clause reference:
You can use WITH to flatten nested queries, or to simplify subqueries.
The WITH clause precedes the SELECT list in a query and defines one or
more subqueries for use within the SELECT query.
Each subquery defines a temporary table, similar to a view definition,
which you can reference in the FROM clause. The tables are used only
when the query runs.
Since you already have working sub queries, the following should work:
with logins as
(
SELECT db.id AS dbid, _id AS userid, firstname, lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
)
,visits as
(
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by db.id,userid
)
,meetings as
(
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid
)
,chats as
(
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id
)
select *
from logins l
left join visits v
on l.dbid = v.dbid
and l.userid = v.userid
left join meetings m
on l.dbid = m.dbid
and l.userid = m.userid
left join chats c
on l.dbid = c.dbid
and l.userid = c.userid;

Auto increment id in delta table while inserting

I have a problem regarding merging CSV files using PySpark SQL with a Delta table. I managed to create an upsert function that updates if matched and inserts if not matched.
I want to add a column ID to the final Delta table and increment it each time we insert data. This column identifies each row in our Delta table. Is there any way to put that in place?
def Merge(dict1, dict2):
    res = {**dict1, **dict2}
    return res

def create_default_values_dict(correspondance_df, marketplace):
    dict_output = {}
    for field in get_nan_keys_values(get_mapping_dict(correspondance_df, marketplace)):
        dict_output[field] = 'null'
        # We want to increment the id row each time we perform an insertion (TODO TODO TODO)
        # if field == 'id':
        #     dict_output['id'] = col('id')+1
        # else:
    return dict_output

def create_matched_update_dict(mapping, products_table, updates_table):
    output = {}
    for k, v in mapping.items():
        if k == 'source_name':
            output['products.source_name'] = lit(v)
        else:
            output[products_table + '.' + k] = F.when(col(updates_table + '.' + v).isNull(), col(products_table + '.' + k)).when(col(updates_table + '.' + v).isNotNull(), col(updates_table + '.' + v))
    return output
insert_dict = create_not_matched_insert_dict(mapping, 'products', 'updates')
default_dict = create_default_values_dict(correspondance_df_products, 'Cdiscount')
insert_values = Merge(insert_dict, default_dict)
update_values = create_matched_update_dict(mapping, 'products', 'updates')
delta_table_products.alias('products').merge(
updates_df_table.limit(20).alias('updates'),
"products.barcode_ean == updates.ean") \
.whenMatchedUpdate(set = update_values) \
.whenNotMatchedInsert(values = insert_values)\
.execute()
I tried to increment the column id in the function create_default_values_dict but it doesn't seem to work well; it doesn't auto-increment by 1. Is there another way to solve this problem? Thanks in advance :)
Databricks has IDENTITY columns for hosted Spark
https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html#parameters
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
[ ( [ START WITH start ] [ INCREMENT BY step ] ) ]
This works on Delta tables.
Example:
create table gen1 (
id long GENERATED ALWAYS AS IDENTITY
, t string
)
Requires Runtime version 10.4 or above.
Delta does not support auto-increment column types.
In general, Spark doesn't use auto-increment IDs, instead favoring monotonically increasing IDs. See functions.monotonically_increasing_id().
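As a minimal illustration (the DataFrame and column names below are made up for the example, not taken from the question), note that the generated IDs are unique and increasing but not consecutive:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical frame of rows to be inserted (stands in for the CSV updates).
updates_df = spark.createDataFrame([("ean1", 10.0), ("ean2", 12.5)], ["ean", "price"])

# Adds unique, monotonically increasing (but NOT consecutive) 64-bit IDs.
with_ids = updates_df.withColumn("row_uid", F.monotonically_increasing_id())
with_ids.show()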
If you want to achieve auto-increment behavior you will have to use multiple Delta operations, e.g., query the max value + add it to a row_number() column computed via a window function + then write (a rough sketch follows the two points below). This is problematic for two reasons:
Unless you introduce an external locking mechanism or some other way to ensure that no updates to the table happen in-between finding the max value and writing, you can end up with invalid data.
Using row_number() will reduce parallelism to 1, forcing all the data through a single core, which will be very slow with large data.
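For completeness, here is a rough sketch of that max-value + row_number() approach; the Delta path /mnt/delta/products, the id column, and the barcode_ean ordering key are assumptions made up for the example, and the caveats above still apply:

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Assumed existing Delta table (must already contain an id column) and new rows to insert.
existing = spark.read.format("delta").load("/mnt/delta/products")
new_rows = spark.createDataFrame([("ean1", 10.0), ("ean2", 12.5)], ["barcode_ean", "price"])

# 1) Find the current maximum id (0 if the table is empty).
max_id = existing.agg(F.coalesce(F.max("id"), F.lit(0)).alias("max_id")).collect()[0]["max_id"]

# 2) Number the new rows with a window function; no partitioning, so all rows go through one task.
w = Window.orderBy("barcode_ean")
to_insert = new_rows.withColumn("id", F.row_number().over(w) + F.lit(max_id))

# 3) Append; without external locking, a concurrent writer could produce duplicate ids.
to_insert.write.format("delta").mode("append").save("/mnt/delta/products")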
Bottom line, you really do not want to use auto-increment columns with Spark.
Hope this helps.

DocumentDB multiple filter query on array

Using the DocumentDB query playground, I am working on a filter type of query. I have a set of attributes in my data that are set up to allow the user to search by a specific attribute. Each attribute type becomes an OR statement if multiple items are selected for the same name in the name/value collection. If attributes of different types are selected (e.g. color and size), they are combined with an AND statement.
SELECT food.id,
food.description,
food.tags,
food.foodGroup
FROM food
JOIN tag1 IN food.tags
JOIN tag2 IN food.tags
WHERE (tag1.name = "snacks" OR tag1.name = "granola bars")
AND (tag2.name = "microwave")
This query works beautifully in the playground.
The main issue is that I have up to 12 attributes, and maybe more. Once I hit 5 joins I have reached the maximum allowed number of joins, so the query below doesn't work. (Note that this isn't playground data, but a sample of my own.)
SELECT s.StyleID FROM StyleSearch s
JOIN a0 in s.Attributes
JOIN a1 in s.Attributes
JOIN a2 in s.Attributes
JOIN a3 in s.Attributes
JOIN a4 in s.Attributes
JOIN a5 in s.Attributes
WHERE (a0 = "color-finish|Grey" OR a0 = "color-finish|Brown" OR a0 = "color-finish|Beige")
AND (a1 = "fabric-type|Polyester" OR a1 = "fabric-type|Faux Leather")
AND (a2 = "design-features|Standard" OR a2 = "design-features|Reclining")
AND (a3 = "style_parent|Contemporary" OR a3 = "style_parent|Modern" OR a3 = "style_parent|Transitional")
AND (a4 = "price_buckets|$1500 - $2000" OR a4 = "price_buckets|$2000 and Up")
AND (a5 = "dimension_width|84 in +")
I am not 100% sure I am using the proper query to perform this, but a simple WHERE clause like the one below (which works in SQL) brings back anything matching any of the OR conditions, so I end up with items from each "AND" statement.
SELECT s.StyleID FROM StyleSearch s
JOIN a in s.Attributes
WHERE (a = "color-finish|Grey" OR a = "color-finish|Brown" OR a = "color-finish|Beige")
AND (a = "fabric-type|Polyester" OR a = "fabric-type|Faux Leather")
AND (a = "design-features|Standard" OR a = "design-features|Reclining")
AND (a = "style_parent|Contemporary" OR a = "style_parent|Modern" OR a = "style_parent|Transitional")
AND (a = "price_buckets|$1500 - $2000" OR a = "price_buckets|$2000 and Up")
AND (a = "dimension_width|84 in +")
Here is an example of the data:
{
"StyleID": "chf_12345-bmc",
"Attributes": [
"brand|chf",
"color|red",
"color|yellow",
"dimension_depth|30 in +",
"dimension_height|counter height",
"materials_parent|wood",
"price_buckets|$500 - $1000",
"style_parent|rustic",
"dimension_width|55 in +"
]
}
I am looking for the proper way to handle this. Thanks in advance.
Is it possible for you to change the structure of your document to add filter attributes specifically for your query, e.g.:
{
"StyleID": "chf_12345-bmc",
"Attributes": [
"brand|chf",
"color|red",
"color|yellow",
"dimension_depth|30 in +",
"dimension_height|counter height",
"materials_parent|wood",
"price_buckets|$500 - $1000",
"style_parent|rustic",
"dimension_width|55 in +"
],
"filter_color": "red,yellow",
"filter_fabric_type":"Polyester,leather"
}
This would eliminate the join restriction because now your query looks something like this:
SELECT s.StyleID FROM StyleSearch s
WHERE (CONTAINS(s.filter_color, "Grey") OR CONTAINS(s.filter_color, "Red"))
AND (CONTAINS(s.filter_fabric_type, "Polyester") OR CONTAINS(s.filter_fabric_type, "Leather"))
Of course this does mean that you have additional fields to maintain.
You might also consider writing a stored proc for this and using javascript to loop through your collection and filtering that way: DocumentDB stored procedure tutorial

Need to fetch n percentage of rows in u-sql query

I need help writing a U-SQL query to fetch the top n percent of rows. I have one dataset from which I need to take the total row count and then take the top 3% of rows, based on col1. The code I have written is:
@count = SELECT Convert.ToInt32(COUNT(*)) AS cnt FROM @telData;
@count1 = SELECT cnt/100 AS cnt1 FROM @count;
DECLARE @cnt int = SELECT Convert.ToInt32(cnt1*3) FROM @count1;
@EngineFailureData =
    SELECT vin, accelerator_pedal_position, enginefailure=1
    FROM @telData
    ORDER BY accelerator_pedal_position DESC
    FETCH @cnt ROWS;
@telData is my basic dataset. Thanks for the help.
Some comments first:
FETCH currently only takes literals as arguments (https://msdn.microsoft.com/en-us/library/azure/mt621321.aspx)
@var = SELECT ... will assign the name @var to the rowset expression that starts with the SELECT. U-SQL (currently) does not provide you with stateful scalar variable assignment from query results. Instead you would use a CROSS JOIN or other JOIN to join the scalar value in.
Now to the solution:
To get the percentage, take a look at the ROW_NUMBER() and PERCENT_RANK() functions. For example, the following shows you how to use either to answer your question. Given the simpler code for PERCENT_RANK() (no need for the MAX() and CROSS JOIN), I would suggest that solution.
DECLARE @percentage double = 0.25; // 25%
@data = SELECT *
        FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)
             ) AS T(pos);
@data =
    SELECT PERCENT_RANK() OVER(ORDER BY pos) AS p_rank,
           ROW_NUMBER() OVER(ORDER BY pos) AS r_no,
           pos
    FROM @data;
@cut_off =
    SELECT ((double) MAX(r_no)) * (1.0 - @percentage) AS max_r
    FROM @data;
@r1 =
    SELECT *
    FROM @data CROSS JOIN @cut_off
    WHERE ((double) r_no) > max_r;
@r2 =
    SELECT *
    FROM @data
    WHERE p_rank >= 1.0 - @percentage;
OUTPUT @r1
TO "/output/top_perc1.csv"
ORDER BY p_rank DESC
USING Outputters.Csv();
OUTPUT @r2
TO "/output/top_perc2.csv"
ORDER BY p_rank DESC
USING Outputters.Csv();
