I'm new to the world of SSAS and cubes, and this question/title might be way off, as I have no idea how to formulate it; apologies if that's the case.
Anyway, here goes. I was asked to take a look at a cube (not made by me, which makes this more complex) that allows user-uploaded *.csv files to limit the data in the cube.
The setup seems to match the Dynamic Security used here: Analysis Services Dynamic Security
Three tables are in play:
+-----------------+
| User |
+-----------------+
| (PK) DW_EK_User |
| User |
+-----------------+
+--------------+
| UserUpload |
+--------------+
| DW_EK_Upload |
| DW_EK_User |
| DW_EK_Person |
| GroupNo |
| GroupLabel |
+--------------+
+-------------------+
| Person |
+-------------------+
| (PK) DW_EK_Person |
| __ |
| __ |
| __ |
+-------------------+
The user then uploads a *.csv with the IDs of interest, including a label. These are temporarily stored in the fact table UserUpload and used to filter the cube so that only results for the included IDs are shown.
My question is whether it's possible to also use the uploaded GroupLabels as a filter.
If my *.csv looks like this:
ID1 GroupA
ID5 GroupA
ID2 GroupA
ID2 GroupB
I would like to be able to see the measures for the individual groups; right now I only see the measure across all IDs.
I'm looking into Named Sets, but the data is in the "wrong" table to do something like this:
Exists(
    StrToSet("[User].[User].[All].[" + UCase(Mid(Username, InStr(1, Username, "\") + 1)) + "]"),
    [Person].[DW EK Person].[All].Children,
    "Measure")
This will return the Username from the User-Dimension.
I have two pandas DataFrames:
df1 from database A with connection parameters {"host":"hostname_a","port": "5432", "dbname":"database_a", "user": "user_a", "password": "secret_a"}. The column key is the primary key.
df1:
| | key | create_date | update_date |
|---:|------:|:-------------|:--------------|
| 0 | 57247 | 1976-07-29 | 2018-01-21 |
| 1 | 57248 | | 2018-01-21 |
| 2 | 57249 | 1992-12-22 | 2016-01-31 |
| 3 | 57250 | | 2015-01-21 |
| 4 | 57251 | 1991-12-23 | 2015-01-21 |
| 5 | 57262 | | 2015-01-21 |
| 6 | 57263 | | 2014-01-21 |
df2 from database B with connection parameters {"host": "hostname_b","port": "5433", "dbname":"database_b", "user": "user_b", "password": "secret_b"}. The column id is the primary key (these values are originally the same as those in the column key in df1; it's only a renaming of df1's primary key column).
df2:
| | id | create_date | update_date | user |
|---:|------:|:-------------|:--------------|:------|
| 0 | 57247 | 1976-07-29 | 2018-01-21 | |
| 1 | 57248 | | 2018-01-21 | |
| 2 | 57249 | 1992-12-24 | 2020-10-11 | klm |
| 3 | 57250 | 2001-07-14 | 2019-21-11 | ptl |
| 4 | 57251 | 1991-12-23 | 2015-01-21 | |
| 5 | 57262 | | 2015-01-21 | |
| 6 | 57263 | | 2014-01-21 | |
Notice that row[2] and row[3] in df2 have more recent update_date values (2020-10-11 and 2019-21-11 respectively) than their counterparts in df1 (where id = key), because their create_date values have been modified (by the given users).
I would like to update the rows of df1 (concretely, their create_date and update_date values) wherever the update_date in df2 is more recent than the original value in df1 (for the same primary keys).
This is how I'm tackling this for the moment, using sqlalchemy and psycopg2 + the .to_sql() method of pandas' DataFrame:
import psycopg2
from sqlalchemy import create_engine

def connector():
    # creator expects a callable that returns a new DBAPI connection
    return psycopg2.connect(**database_parameters_dictionary)

engine = create_engine('postgresql+psycopg2://', creator=connector)

df1.update(df2)  # 1) maybe there is something better to do here?

with engine.connect() as connection:
    df1.to_sql(
        name="database_table_name",
        con=connection,
        schema="public",
        if_exists="replace",  # 2) maybe there is also something better to do here?
        index=True,
    )
The problem I have is that, according to the documentation, the if_exists argument can only do three things:
if_exists{‘fail’, ‘replace’, ‘append’}, default ‘fail’
Therefore, to update these two rows, I have to:
1) use the .update() method on df1 with df2 as an argument, together with
2) replacing the whole table inside the .to_sql() method, which means "drop + recreate".
As the tables are really large (more than 500'000 entries), I have the feeling that this involves a lot of unnecessary work!
How could I efficiently update only those two newly updated rows? Do I have to generate custom SQL queries that compare the dates for each row and only take the ones that have really changed? Here again, my intuition is that looping through all rows to compare the update dates will take "a lot" of time. What is the most efficient way to do that? (It would have been easier in pure SQL if the two tables were on the same host/database, but unfortunately that's not the case.)
Pandas can't do partial updates of a table, no. There is a longstanding open bug for supporting sub-whole-table-granularity updates in .to_sql(), but you can see from the discussion there that it's a very complex feature to support in the general case.
However, limiting it to just your situation, I think there's a reasonable approach you could take.
Instead of using df1.update(df2), put together an expression that yields only the changed records with their new values (I don't use pandas often so I don't know this offhand); then iterate over the resulting dataframe and build the UPDATE statements yourself (or with the SQLAlchemy expression layer, if you're using that). Then, use the connection to DB A to issue all the UPDATEs as one transaction. With an indexed PK, it should be as fast as this would ever be expected to be.
BTW, I don't think df1.update(df2) is exactly correct: from my reading, that would update all rows with any differing fields, not just those where update_date in df2 is newer than in df1. But it's a moot point if update_date in df2 is only ever more recent than in df1.
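A minimal sketch of that approach, assuming SQLAlchemy 1.4+, a connection URL built from the parameters given for database A, and the table/column names used above (database_table_name, key, create_date, update_date); the date columns are assumed to be directly comparable (datetimes or ISO-formatted strings):

from sqlalchemy import create_engine, text

# Connect to database A (host/port/credentials taken from the question's parameters).
engine_a = create_engine("postgresql+psycopg2://user_a:secret_a@hostname_a:5432/database_a")

# Align df2 on df1's primary key and keep only the rows whose update_date is newer.
df1_keyed = df1.set_index("key")
df2_keyed = df2.rename(columns={"id": "key"}).set_index("key")
newer = df2_keyed["update_date"] > df1_keyed["update_date"].reindex(df2_keyed.index)
changed = df2_keyed.loc[newer, ["create_date", "update_date"]]

# Issue one parameterized UPDATE per changed row, all inside a single transaction.
update_stmt = text(
    "UPDATE database_table_name "
    "SET create_date = :create_date, update_date = :update_date "
    "WHERE key = :key"
)
with engine_a.begin() as connection:
    for key, row in changed.iterrows():
        connection.execute(
            update_stmt,
            {"key": key, "create_date": row["create_date"], "update_date": row["update_date"]},
        )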
I've got a Microsoft Access database with several tables. I've thrown 2 of those into an Excel file to simplify my work, but either an Access or Excel solution can be used for this. Below are examples of the data that needs to be manipulated, but those records contain a lot of other columns and information.
I've got Table 1 (Input Table):
| Bank | Reference |
|-----------------|-----------|
| Chase Bank LLC | |
| JPMorgan Chase | |
| Chase | |
| Bank of America | |
| Bank of America | |
| Wells Fargo | |
The Reference column is empty. I want to fill it based on the reference table, which contains the IDs that would go into the Reference column.
Table 2 (Reference Table):
| Bank | ID |
|-----------------|-----------|
| Chase Bank | 1 |
| Bank of America | 2 |
| Wells Fargo | 3 |
So the solution would fill the "Reference" column like this:
| Bank | Reference |
|-----------------|-----------|
| Chase Bank LLC | 1 |
| JPMorgan Chase | 1 |
| Chase | 1 |
| Bank of America | 2 |
| Bank of America | 2 |
| Wells Fargo | 3 |
Since this is taken from a database's table, these aren't really ordered records. The purpose of this is to create a relationship in an already-existing database that didn't have those relationships set up.
A join between the 2 text fields, in an Update query, will write the ID for those records that match exactly.
There is no technology/option for the non-matching records; you can only apply some creative designs. For instance, "Chase Bank" does match on the first 10 characters, so for the non-matched records you could set up a temp table with a new field that is Left(fieldname, 10), join on this new field to get the ID into the temp table, and then do a 2nd Update query to move the ID into the original table, this time joining on the full name.
I'm using MySQL. I want a column to have unique values only in some cases.
For example, the table can have the following values:
+----+-----------+----------+------------+
| id | user_id | col1 | col2 |
+----+-----------+----------+------------+
| 1 | 2 | no | no |
| 2 | 2 | no | no |
| 3 | 3 | no | yes |
| 4 | 2 | yes | no |
| 5 | 2 | no | yes |
+----+-----------+----------+------------+
I want the no|no combination to be able to repeat for the same user, but not the yes|no combination. Is this possible in MySQL? And with knex?
My migration for that table looks like this:
return knex.schema.createTable('myTable', table => {
  table.increments('id').unsigned().primary();
  table.integer('user_id').unsigned().notNullable().references('id').inTable('table_user').onDelete('CASCADE').index();
  table.string('col1').defaultTo('yes');
  table.string('col2').defaultTo('no');
});
That doesn't seem to be an easy task to do. You would need a partial unique index over multiple columns.
I couldn't find anything indicating that MySQL supports partial indexes: https://dev.mysql.com/doc/refman/8.0/en/create-index.html
So it could be something like what is described here, though using triggers for that seems a bit overkill: https://dba.stackexchange.com/questions/41030/creating-a-partial-unique-constraint-for-mysql
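For what it's worth, a rough sketch of the trigger-based check described in that link, using the table and column names from the migration above; the mysql-connector-python wrapper and its connection details are placeholders, and a matching BEFORE UPDATE trigger would be needed as well:

import mysql.connector

# Placeholder connection details; only the CREATE TRIGGER statement matters here.
conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="mydb")
cur = conn.cursor()

# Reject an insert when the (user_id, col1, col2) combination already exists,
# except for the 'no'/'no' combination, which is allowed to repeat.
cur.execute("""
CREATE TRIGGER myTable_partial_unique
BEFORE INSERT ON myTable
FOR EACH ROW
BEGIN
  IF NOT (NEW.col1 = 'no' AND NEW.col2 = 'no') THEN
    IF EXISTS (
      SELECT 1 FROM myTable
      WHERE user_id = NEW.user_id AND col1 = NEW.col1 AND col2 = NEW.col2
    ) THEN
      SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'Duplicate user_id/col1/col2 combination';
    END IF;
  END IF;
END
""")
conn.commit()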
We are creating Gherkin feature files for our application to create executable specifications. Currently we have files that look like this:
Given product <type> is found
When the product is clicked
Then detailed information on the product appears
And the field text has a value
And the field price has a value
And the field buy is available
We are wondering whether this whole list of And keywords that validate whether fields are visible on the screen is the way to go, or whether we should shorten it to something like 'validate input'.
We have a similar case, in that our service can return many tens of elements for each case that we could validate. We do not validate every element for each interaction; we only test the elements that are relevant to the test case.
To make it easier to maintain and switch which elements we are using, we use scenario outlines and tables of examples.
Scenario Outline: PO Boxes correctly located
When we search in the USA for "<Input>"
Then the address contains
  | Label      | Text        |
  | PO Box     | <PoBox>     |
  | City name  | <CityName>  |
  | State code | <StateCode> |
  | ZIP Code   | <ZipCode>   |
  | +4 code    | <ZipPlus4>  |
Examples:
  | ID | Input                 | PoBox      | CityName  | StateCode | ZipCode | ZipPlus4 |
  | 01 | PO Box 123, 12345     | PO Box 123 | Boston    | MA        | 12345   |          |
  | 02 | PO Box 321, Whitefish | PO Box 321 | Whitefish | MN        | 54321   |          |
By doing it this way, we have a generic step, "the address contains", that uses the Label and Text columns to test the individual elements. It is a neat and tidy way to test a lot of potential combinations, but it probably depends on your individual use case and how important all of the fields are.
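As an illustration, a minimal sketch of what the step definition behind "the address contains" could look like, assuming Python's behave framework and a hypothetical address_page page object (the actual binding depends on your toolchain):

from behave import then

@then("the address contains")
def step_address_contains(context):
    # context.table holds the | Label | Text | rows attached to the step.
    for row in context.table:
        label = row["Label"]
        expected = row["Text"]
        actual = context.address_page.get_field(label)  # hypothetical page-object lookup
        assert actual == expected, f"{label}: expected {expected!r}, got {actual!r}"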
You only need to validate the ones that provide business value, which is probably all of them. I would avoid using tech terms like "field" because they aren't related to a behavior. Al Mills is right on for using the tables.
I'd word it like this:
Scenario Outline: Review product details
Given I find the product <Type>
When I select the product
Then detailed information on the product appears including
  | Description | <Description> |
  | Price       | <Price>       |
And I can buy the product
Examples:
  | Type      | Description        | Price |
  | Hose      | Rubber Hose        | 31.99 |
  | Sprinkler | Rotating Sprinkler | 12.99 |
The words I chose are behaviors or whats, not technical implementations or hows.
Considering I have Core Data objects stored like this:
|Name | ActionType | Content | Date |
|-----|------------|---------|-----------|
|Abe | Create | "Hello" | 2014-07-01|
|Cat | Create | "Well" | 2014-07-01|
|Abe | Create | "Hi" | 2014-07-02|
|Bob | Edit | "Yo" | 2014-07-03|
|Cat | Delete | "What" | 2014-07-04|
|Abe | Edit | "Haha" | 2014-07-05|
I would like to get the last action of each user, so the results would be:
|Abe | Edit | "Haha" | 2014-07-05|
|Cat | Delete | "What" | 2014-07-04|
|Bob | Edit | "Yo" | 2014-07-03|
Does anyone know how to do that with an NSFetchRequest? So far, from what I've gathered, if you want to use "group by", you can only retrieve the values in the group-by clause (it will return "Abe, Cat, Bob" without the rest of the data in the Core Data object). Similarly with "returnsDistinctResults": it will not return the whole object.
I have a feeling that Core Data is not equipped for this; any help and hints would be appreciated!
Core Data is an object graph, not a database. Core Data itself has no concept of uniqueness; it's up to you to implement that in your application. This is most typically done using the find-or-create pattern, which helps you prevent duplicate objects from being stored.
That said, you CAN return distinct results from Core Data using the NSDictionaryResultType. This will not prevent duplicates from being stored, but can be used to return distinct results from a fetch. There is an example of this in the programming guide. You can give this request all properties for a given entity by working with the NSEntityDescription of the managed object you are fetching.
For getting the object with the "last" timestamp for each user, you actually want the object with the maximum value for that key path. That can be done a number of ways: a subquery, key path operators, expressions, etc.