What algorithm to use to exchange data between multiple parties - security

Let's say there are four parties: Alice, Bob, Eve and an Arbitrator.
Alice has a table of records:
| id | pet type | birth date |
|----|----------|------------|
| 1 | cat | 2010-03-03 |
| 2 | dog | 2011-06-12 |
Bob has a table of records:
| id | pet type | color |
|----|----------|-------|
| 2 | dog | white |
| 3 | bird | green |
Eve has a table of records:
| id | pet type | size |
|----|----------|------|
| 1 | cat | small |
| 3 | bird | small |
Now each party wants to enrich its own records with the other parties' data for the same id, but without disclosing that id. For example,
Alice wants her data to look like this:
| id | pet type | birth date | color | size |
|----|----------|------------|-------|------|
| 1 | cat | 2010-03-03 | | small |
| 2 | dog | 2011-06-12 | white | |
Bob wants his data to look like this:
| id | pet type | birth date | color | size |
|----|----------|------------|-------|------|
| 2 | dog | 2011-06-12 | white | |
| 3 | bird | | green | small |
and so on.
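Setting privacy aside, the desired result is a per-id left join: each party keeps all of its own rows and picks up the columns of any other party holding the same id. The plaintext Python sketch below shows that target (table contents copied from above; the names ALICE, BOB, EVE and enrich are illustrative only) and could serve as a reference for checking the output of whatever private protocol is chosen.

```python
# Plaintext reference of the desired enrichment; no privacy at all.
ALICE = {1: {"pet type": "cat", "birth date": "2010-03-03"},
         2: {"pet type": "dog", "birth date": "2011-06-12"}}
BOB = {2: {"pet type": "dog", "color": "white"},
       3: {"pet type": "bird", "color": "green"}}
EVE = {1: {"pet type": "cat", "size": "small"},
       3: {"pet type": "bird", "size": "small"}}


def enrich(own: dict, others: list) -> dict:
    """Left join on id: keep every id in `own` and add columns from other
    parties that hold the same id. Missing columns are simply absent (blank)."""
    enriched = {}
    for record_id, row in own.items():
        merged = dict(row)
        for other in others:
            for column, value in other.get(record_id, {}).items():
                merged.setdefault(column, value)  # own values win on overlap
        enriched[record_id] = merged
    return enriched
```

For example, `enrich(BOB, [ALICE, EVE])` reproduces Bob's desired table.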
The Arbitrator coordinates all exchange operations between the parties and also matches the data using the encrypted id field from each party's dataset, so the parties must communicate through the Arbitrator rather than directly with each other.
The Arbitrator must also be able to verify that
hash(Alice's id = 2) = hash(Bob's id = 2), hash(Bob's id = 3) = hash(Eve's id = 3)
and so on, but it must not be able to recover the original identifiers, nor be able to brute-force the encrypted identifiers (so if some kind of hash is used, it must be salted).
To keep things simple for Alice, Bob and Eve, each would like to use only a single key to encrypt its own identifiers, but the key should be different for each party, i.e. for equal identifiers
F1(alice_key(alice_id)) = F2(bob_key(bob_id)) = F3(eve_key(eve_id))
where F1, F2, F3 are functions the Arbitrator applies to the encrypted identifiers of Alice, Bob and Eve. These functions do not decrypt the original identifiers, but they map the encrypted forms of equal identifiers to the same value.
So the question: is there an algorithm that can solve this problem?
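One family of techniques often discussed for this kind of matching is commutative blinding, the idea behind Diffie-Hellman-style private set intersection: every id is hashed and raised to each party's secret exponent, and because exponentiation commutes, equal ids end up as equal tokens regardless of the order in which the keys were applied. Note that this relaxes the stated constraint: the Arbitrator cannot compute F1, F2, F3 on its own; it relays each party's blinded ids to the other two parties for re-blinding. Because the Arbitrator holds no key, it cannot test candidate ids by itself. The sketch below is a toy under stated assumptions (the modulus, the helper names hash_to_group, blind, fully_blinded, arbitrator_matches, and the message flow are all illustrative), not a vetted protocol.

```python
import hashlib
import secrets

# Toy parameters for illustration only: the Mersenne prime 2**127 - 1.
# A real system would use a vetted group (e.g. an elliptic curve) and an
# audited protocol; nothing here is production-grade.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Hypothetical helper: map an identifier to a nonzero residue mod P."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(value: int, key: int) -> int:
    """Exponentiation commutes: blind(blind(x, a), b) == blind(blind(x, b), a)."""
    return pow(value, key, P)


def new_key() -> int:
    """Secret exponent in [2, P - 2], kept private by each party."""
    return secrets.randbelow(P - 3) + 2


# Each party holds its own key; the Arbitrator holds none.
KEYS = {"alice": new_key(), "bob": new_key(), "eve": new_key()}
IDS = {"alice": ["1", "2"], "bob": ["2", "3"], "eve": ["1", "3"]}


def fully_blinded(owner: str) -> dict:
    """Owner blinds its ids; the Arbitrator relays each value to the other
    parties, who blind it with their keys. The result is H(id)^(a*b*e) mod P
    whatever the order. The id -> token mapping stays with the owner."""
    tokens = {}
    for ident in IDS[owner]:
        value = blind(hash_to_group(ident), KEYS[owner])
        for other, key in KEYS.items():
            if other != owner:
                value = blind(value, key)
        tokens[ident] = value
    return tokens


def arbitrator_matches(tokens_by_party: dict) -> dict:
    """Arbitrator's view: only token values per party, never plaintext ids."""
    seen = {}
    for party, tokens in tokens_by_party.items():
        for token in tokens.values():
            seen.setdefault(token, []).append(party)
    return {token: parties for token, parties in seen.items() if len(parties) > 1}


TOKENS = {party: fully_blinded(party) for party in IDS}
```

Whether relaying intermediate blinded values through the Arbitrator fits your trust model is something to evaluate against a published PSI protocol before relying on it.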

Related

How to create sketches for millions of rows using Spark?

I have a DataFrame like this:
| UserID | Platform | Genre | Publisher |
| -------- | ------- |-----------|-----------|
| 1 | PS2 | FPS | Activision |
| 2 | PS1 | Race | EA Sports |
| 3 | PS2 | RTS | Microsoft |
| 4 | Xbox | Race | EA Sports |
From this DataFrame, I want to build a Map whose keys combine the column name and the value, and whose values are the sets of user IDs.
For example:
Platform_PS2 = [1,3]
Platform_Xbox = [4]
Platform_PS1 = [2]
Genre_Race = [2,4]
Ultimately, I want to build sketches from these arrays.
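The target structure is an inverted index from "Column_Value" to the set of user IDs. The plain-Python sketch below (not Spark; the rows are the sample above, written without trailing dots so the keys match the expected output, and build_index is an illustrative name) shows exactly what that map contains. In PySpark the same shape is usually obtained by unpivoting the attribute columns and grouping with `collect_set`, which is not shown here; the sketch construction itself is also out of scope.

```python
from collections import defaultdict

ROWS = [
    {"UserID": 1, "Platform": "PS2", "Genre": "FPS", "Publisher": "Activision"},
    {"UserID": 2, "Platform": "PS1", "Genre": "Race", "Publisher": "EA Sports"},
    {"UserID": 3, "Platform": "PS2", "Genre": "RTS", "Publisher": "Microsoft"},
    {"UserID": 4, "Platform": "Xbox", "Genre": "Race", "Publisher": "EA Sports"},
]


def build_index(rows, id_col="UserID"):
    """Map "<column>_<value>" to the set of ids having that value."""
    index = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            if column == id_col:
                continue  # the id is the payload, not a key
            index[f"{column}_{value}"].add(row[id_col])
    return dict(index)


INDEX = build_index(ROWS)
```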

Excel formula to apply a type of true/false condition

I have two Excel files. The original file contains 10K+ patient names and medical conditions. The goal is to identify the patients (about 400+) with special conditions so that the mail sent to them is different from the mail sent to the rest of the list.
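Original File Template:
| Last Name | First Name | Diagnosis |
|-----------|------------|-----------|
| Doe | John | Cancer |
| Smith | John | HIV |
| Smith | Jayne | Broken Arm |
| Rock | Dwayne | Common Cold |
| Foster | Jane | Common Cold |
Mailing Template:
| Last Name | First Name | Type of Mail |
|-----------|------------|--------------|
| Doe | John | |
| Smith | John | |
| Smith | Jayne | |
| Rock | Dwayne | |
| Foster | Jane | |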
In the Mailing Template, I want to classify the Type of Mail based on the diagnosis. A common diagnosis would be "LV1", and anything I identify as a special diagnosis, like cancer or HIV, would be "LV2".
My initial approach would be to filter the Original File by the special diagnoses, then use a True/False comparison of that filtered list against the Mailing Template and manually flag LV1 or LV2. But is there a method or formula that could scan the Original File for the keywords (e.g. cancer and HIV) and automatically assign "LV1" or "LV2" to the corresponding names in the Mailing List?

I believe that if you can cover all the cases you're interested in, it's possible with an IF(OR(...)) statement, for example:
Let's say B5 is the cell with the diagnosis. In your target cell (where you want "LV1" or "LV2" to appear), write the following formula:
=IF(OR(COUNTIF(B5,"Common*"), COUNTIF(B5,"Broken*")), "LV1", "LV2")
Note the "*" in the condition text: it makes any cell that begins with that text count as a match. For example, "Common*" treats both "Common Cold" and "Common Fever" as "LV1" cases. The wildcard has to go through a function that supports it, such as COUNTIF, because a plain comparison like B5="Common*" looks for the literal text "Common*".
This solution may become hard to maintain if you have many different diagnoses to cover.

Exact Matches
If you expect to add additional condition/mailing types down the road, =XLOOKUP() would be a good option.
In column D this would match the diagnosis to a set of values in column F, and return the value in column G.
You can add as many diagnosis/mailing type values as you need without changing formulas.
In cell D2: =XLOOKUP(C2,F:F,G:G):
| | A | B | C | D | E | F | G |
|---+-----------+------------+-------------+----------------------+---+-------------+-------|
| 1 | Last Name | First Name | Diagnosis | Type of Mail | | Match | Index |
| 2 | Doe | John | Cancer | =XLOOKUP(C2,F:F,G:G) | | Cancer | LV2 |
| 3 | Smith | John | HIV | LV2 | | HIV | LV2 |
| 4 | Smith | Jayne | Broken Arm | LV1 | | Broken Arm | LV1 |
| 5 | Rock | Dwayne | Common Cold | LV1 | | Common Cold | LV1 |
| 6 | Foster | Jane | Common Cold | LV1 | | | |
Note that =XLOOKUP() uses the same concept as =INDEX(G:G, MATCH(C2, F:F, 0)) in earlier versions of Excel (and produces identical results).
Wildcard Matching
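To support keywords, set the [match_mode] argument of =XLOOKUP() to 2, which enables wildcards (e.g. * and ?).
The following takes the first word of the diagnosis (the text before the first space, or the whole text if there is no space), appends "*", and looks that pattern up in column F. For example, "Common Cold" becomes "Common*", which matches the entry "Common". The 0 in the fourth argument is what the formula returns when nothing matches.
In cell D2: =XLOOKUP(LEFT(C2, IFERROR(SEARCH(" ", C2)-1, LEN(C2)))&"*",F:F,G:G,0,2)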
| | A | B | C | D | E | F | G |
|---+-----------+------------+-----------------+-------------------------------------------------------------------------+---+------------+----------------|
| 1 | Last Name | First Name | Diagnosis | Type of Mail | | Match | Index |
| 2 | Doe | John | Cancer | =XLOOKUP(LEFT(C2, IFERROR(SEARCH(" ", C2)-1, LEN(C2)))&"*",F:F,G:G,0,2) | | Cancer | LV2 |
| 3 | Smith | John | HIV | LV2 | | HIV | LV2 |
| 4 | Smith | Jayne | Broken Arm | LV1 | | Broken Arm | LV1 |
| 5 | Rock | Dwayne | Common Cold | Matches Common | | Common | Matches Common |
| 6 | Foster | Jane | Common Anything | Matches Common | | | |
You may need to adjust this if keywords overlap or if you need to match multi-word keywords, but it should be a good place to start.
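To reason about what the wildcard formula does (for instance, which entry wins when keywords overlap), here is a small Python model of it. It is not Excel code: the lookup list mirrors columns F:G above, and first_word and wildcard_lookup are illustrative names. It takes the first word like LEFT/SEARCH, appends "*", and returns the Index of the first Match entry that fits the pattern, or 0 like the formula's fourth argument. Matching is lowercased to approximate Excel's case-insensitive lookup.

```python
from fnmatch import fnmatchcase

# Mirrors columns F (Match) and G (Index); order matters, first hit wins.
LOOKUP_TABLE = [
    ("Cancer", "LV2"),
    ("HIV", "LV2"),
    ("Broken Arm", "LV1"),
    ("Common", "Matches Common"),
]


def first_word(text: str) -> str:
    """Like LEFT(C2, IFERROR(SEARCH(" ", C2)-1, LEN(C2)))."""
    return text.split(" ", 1)[0]


def wildcard_lookup(diagnosis: str, table=LOOKUP_TABLE, if_not_found=0):
    """Model of XLOOKUP(first_word & "*", F:F, G:G, 0, 2).
    Caveat: fnmatch also treats [ ] as character classes; Excel does not."""
    pattern = first_word(diagnosis).lower() + "*"
    for match, index in table:
        if fnmatchcase(match.lower(), pattern):
            return index
    return if_not_found
```

For example, "Common Anything" yields "Matches Common", while a diagnosis with no entry, such as "Flu", yields 0.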

Cross-referencing values from a reference table with fuzzy inputs

I've got a Microsoft Access database with several tables. I've thrown 2 of those into an Excel file to simplify my work, but either an Access or Excel solution can be used for this. Below are examples of the data that needs to be manipulated, but in those records there's a lot of other columns and information.
I've got Table 1 (Input Table):
| Bank | Reference |
|-----------------|-----------|
| Chase Bank LLC | |
| JPMorgan Chase | |
| Chase | |
| Bank of America | |
| Bank of America | |
| Wells Fargo | |
The Reference column is empty. I want to fill it based on the reference table, which contains the IDs that would go into the Reference column.
Table 2 (Reference Table):
| Bank | ID |
|-----------------|-----------|
| Chase Bank | 1 |
| Bank of America | 2 |
| Wells Fargo | 3 |
So the solution would fill the "Reference" column like this:
| Bank | Reference |
|-----------------|-----------|
| Chase Bank LLC | 1 |
| JPMorgan Chase | 1 |
| Chase | 1 |
| Bank of America | 2 |
| Bank of America | 2 |
| Wells Fargo | 3 |
Since this is taken from a database's table, these aren't really ordered records. The purpose of this is to create a relationship in an already-existing database that didn't have those relationships set up.

A join between the two text fields in an Update query will write the ID for the records that match exactly.
There is no technology or option for the non-matching records; you can only apply some creative designs. For instance, "Chase Bank LLC" does match "Chase Bank" on the first 10 characters. So for the unmatched records you could set up a temp table with a new field that is Left(fieldname,10), join on this new field to get the ID into the temp table, and then run a second Update query to move the ID into the original table, this time joining on the full name.
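The two-pass idea (exact match first, then a match on the first 10 characters) can be modelled in a few lines of Python to see which rows it will and will not fill before writing the Access queries. This is an illustrative sketch, not Access SQL; REFERENCE and fill_references are hypothetical names, and the comparison here is case-sensitive, unlike a default Access text join.

```python
REFERENCE = {"Chase Bank": 1, "Bank of America": 2, "Wells Fargo": 3}


def fill_references(banks, reference, prefix_len=10):
    """Pass 1: exact name match. Pass 2: match on Left(name, prefix_len).
    Rows that match neither way keep None (still need manual handling)."""
    by_prefix = {}
    for name, ref_id in reference.items():
        by_prefix.setdefault(name[:prefix_len], ref_id)
    filled = []
    for bank in banks:
        ref_id = reference.get(bank)  # exact match
        if ref_id is None:
            ref_id = by_prefix.get(bank[:prefix_len])  # first-N-characters match
        filled.append((bank, ref_id))
    return filled


INPUT = ["Chase Bank LLC", "JPMorgan Chase", "Chase",
         "Bank of America", "Bank of America", "Wells Fargo"]
```

On the sample data, "JPMorgan Chase" and "Chase" stay unmatched, which is exactly the limitation described above.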

Grouping a list of items into groups as equal in number as possible

Requirement: Split a list into 4 separate groups, based on a value for each row.
| Player | Skill |
| ------------- |:-------------:|
| Player 1 | 10000 |
| Player 2 | 50000 |
| Player 3 | 2000 |
| Player 4 | 11000 |
| Player 5 | 7525 |
| Player 6 | 100 |
| Player 7 | 999 |
| Player 8 | 14579 |
| Player 9 | 26700 |
So in the example above, these players would be split into 4 groups:
| Group | # of players |
| ------------- |:-------------:|
| Group1 | 2 |
| Group2 | 2 |
| Group3 | 2 |
| Group4 | 3 |
The number of players in each group needs to be as equal as possible, and each group's total Skill also needs to be roughly similar every time.
Before I go too far down the rabbit hole (wording a question like this as a simple Google search is not turning out very well), are there any built-in Excel functions that can be leveraged to achieve this, or possible approaches in VBA that could be explored to achieve the required result?

This isn't an answer! But suppose you try a simple algorithm:
Calculate the average total skill per group (ASL): the combined skill of all 9 players divided by 4.
Set each group's total skill (TSG) to zero.
Loop: take the largest skill level (LSL) of the remaining players.
Starting from Group 1: if the group already has players and TSG+LSL>ASL
Go to the next group
Else
Add the player to this group and add LSL to its total skill (TSG)
Remove the player from the list
Repeat the loop until no players remain.
If you apply this by hand to your data you should get:
Average=30725.75
+---------+---------+---------+---------+
| Group 1 | Group 2 | Group 3 | Group 4 |
+---------+---------+---------+---------+
| 50000 | 26700 | 14579 | 10000 |
| | 2000 | 11000 | 7525 |
| | 999 | | |
| | 100 | | |
| | | | |
| 50000 | 29799 | 25579 | 17525 |
+---------+---------+---------+---------+
Clearly there are a couple of issues: you might not want a single group containing only the player with the highest skill level. You might also want to re-average the remaining players after taking out the most skilful player. It should be a starting point, though, and could be implemented fairly easily with formulas or VBA.
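The hand-worked table corresponds to a "first fit decreasing" pass: players are taken largest first, and each goes into the first group that is still empty or whose total stays at or below the average. A Python sketch of that loop (illustrative names; it is not VBA) reproduces the totals above. The fallback to the lightest group when nothing fits is an assumption added here; the algorithm above does not say what to do in that case, and it never triggers on this data.

```python
SKILLS = {
    "Player 1": 10000, "Player 2": 50000, "Player 3": 2000,
    "Player 4": 11000, "Player 5": 7525, "Player 6": 100,
    "Player 7": 999, "Player 8": 14579, "Player 9": 26700,
}


def first_fit_decreasing(skills, n_groups=4):
    """Largest player first; place each in the first group that is empty or
    stays within the per-group average (ASL) after adding it."""
    target = sum(skills.values()) / n_groups  # 30725.75 for the sample
    groups = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for player, skill in sorted(skills.items(), key=lambda kv: kv[1], reverse=True):
        chosen = None
        for i in range(n_groups):
            if not groups[i] or totals[i] + skill <= target:
                chosen = i
                break
        if chosen is None:
            # Assumption (not in the answer): fall back to the lightest group.
            chosen = min(range(n_groups), key=lambda i: totals[i])
        groups[chosen].append(player)
        totals[chosen] += skill
    return groups, totals


GROUPS, TOTALS = first_fit_decreasing(SKILLS)
```

Swapping the "first group that fits" rule for "group with the lowest current total" is a common alternative that tends to spread the strongest players more evenly.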

Cognos BI - Join Results of Multiple Queries into Single Table

Cognos BI question here - I have two data tables – one contains the Last Name and SS # of customers, and another table has “Extended Info” about those customers. Element ID is the data element being stored, Ext Cis Value has the data value, and SS Number ties it back to a customer.
I want to build a single list which lists all customers, as well as the corresponding values for each of the three data elements in the ExtendedInfo table. In this case it’s #13 (Email Address), #15 (Prospect Type) and #16 (Prospect Source)
Here is the data I have today:
ProspectData table:
| Last Name | SS # |
|-----------------------|-----------|
| ABC Construction, LLC | S10000104 |
| XYZ Construction, LLC | S10000106 |
ExtendedInfo table:
| Element Id | Ext Cis Value | SS Number |
|------------|---------------|-----------|
| 13 | HAS#EMAIL.COM | S10000104 |
| 13 | NO#EMAIL.COM | S10000106 |
| 15 | HOT PROSPECT | S10000104 |
| 15 | WARM PROSPECT | S10000106 |
| 16 | External | S10000106 |
| 16 | Internal | S10000104 |
I've been able to JOIN these two tables together to create a result like this, but only by applying a filter to ExtendedInfo to return a single field. Example as shown:
| SS # | Last Name | Email Address |
|-----------|-----------------------|---------------|
| S10000104 | ABC Construction, LLC | HAS#EMAIL.COM |
| S10000106 | XYZ Construction, LLC | NO#EMAIL.COM |
I am trying to set up a single query which will contain five columns: SS Number, Last Name, Email Address (#13 on Element ID), Prospect Type (#15) and Prospect Source (#16). I envision it looking like this:
| SS # | Last Name | Email Address | Prospect Type | Prospect Source |
|-----------|-----------------------|---------------|---------------|-----------------|
| S10000104 | ABC Construction, LLC | HAS#EMAIL.COM | HOT PROSPECT | Internal |
| S10000106 | XYZ Construction, LLC | NO#EMAIL.COM | WARM PROSPECT | External |
So far, the closest I’ve come to this is adding a new query on the ExtendedInfo table which has a filter applied for Element ID, then using JOIN to join the result of that query and the ProspectData table. However, I don’t know how (or if it’s practical) to create 3 individual queries on ExtendedInfo (Email, Prospect Type, Prospect Source) and join them all to ProspectData.
This seems like a simple task, but I’m not sure how to do it. Any suggestions? Thanks in advance for your help.

You don't have to join the tables three times. In fact, you only have to join once. You can construct your custom columns at the model or report layer.
Join ProspectData and ExtendedInfo on SS # = SS Number with a standard inner join.
The result will look like this:
| Element Id | Ext Cis Value | SS Number | SS # | Last Name |
|------------|---------------|-----------|-----------|-----------------------|
| 13 | HAS#EMAIL.COM | S10000104 | S10000104 | ABC Construction, LLC |
| 13 | NO#EMAIL.COM | S10000106 | S10000106 | XYZ Construction, LLC |
| 15 | HOT PROSPECT | S10000104 | S10000104 | ABC Construction, LLC |
| 15 | WARM PROSPECT | S10000106 | S10000106 | XYZ Construction, LLC |
| 16 | External | S10000106 | S10000106 | XYZ Construction, LLC |
| 16 | Internal | S10000104 | S10000104 | ABC Construction, LLC |
Now, at the model layer (if doing this in Framework Manager) or in the report's query (if doing this in a report), add three new data items, Email Address, Prospect Type and Prospect Source, with the following expressions. Keying on Element Id rather than on the text of the value keeps them correct even if a value happens to contain "#", "PROSPECT" or "ternal":
Email Address
CASE
WHEN [Element Id] = 13 THEN [Ext Cis Value]
ELSE null
END
Prospect Type
CASE
WHEN [Element Id] = 15 THEN [Ext Cis Value]
ELSE null
END
Prospect Source
CASE
WHEN [Element Id] = 16 THEN [Ext Cis Value]
ELSE null
END
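Set the Aggregate Function property of the three new data items to 'Maximum', and leave Element Id and Ext Cis Value out of the final list. Because each CASE returns null for every other element, this should roll your result up to a single row per customer, with a value in each of the three new data items.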
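The roll-up works because, after the single inner join, each CASE item is null on every row except the one for its own element, so taking the maximum per customer leaves exactly that value. The Python model below (not Cognos code; the data is the sample above and pivot is an illustrative name) picks the column by Element Id (13, 15, 16 as listed in the question) and reproduces the envisioned five-column result.

```python
PROSPECTS = [("ABC Construction, LLC", "S10000104"),
             ("XYZ Construction, LLC", "S10000106")]
EXTENDED = [(13, "HAS#EMAIL.COM", "S10000104"), (13, "NO#EMAIL.COM", "S10000106"),
            (15, "HOT PROSPECT", "S10000104"), (15, "WARM PROSPECT", "S10000106"),
            (16, "External", "S10000106"), (16, "Internal", "S10000104")]
COLUMNS = {13: "Email Address", 15: "Prospect Type", 16: "Prospect Source"}


def pivot(prospects, extended):
    """Inner join on SS number, then one output row per customer where each
    new column holds the max of its CASE item (null except on its element)."""
    names = {ss: name for name, ss in prospects}
    rows = {}
    for element_id, value, ss in extended:
        if ss not in names or element_id not in COLUMNS:
            continue  # inner join drops unmatched rows; other elements ignored
        row = rows.setdefault(ss, {"SS #": ss, "Last Name": names[ss]})
        column = COLUMNS[element_id]
        current = row.get(column)
        row[column] = value if current is None else max(current, value)
    return [rows[ss] for ss in sorted(rows)]


RESULT = pivot(PROSPECTS, EXTENDED)
```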
