How to join two tables separately for each group (time range) in Presto

I have two tables whose data looks like this:
table 1:
col_time_range col_user_id
A 1
A 2
A 3
B 1
B 2
table 2:
col_user_id
1
2
3
4
I want to join these two tables to get, separately for time ranges A and B, the user IDs from table 2 that do not exist in table 1 for that time range, like:
col_time_range col_user_id
A 4
B 3
B 4
How can I write this query?
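One possible approach, sketched here rather than taken from an answer in the thread: cross join the distinct time ranges with every user ID, then left join back to table 1 and keep the pairs that found no match. The query below is plain standard SQL that should carry over to Presto, but it is only exercised here against an in-memory SQLite database through Python. The table names table1 and table2 and the helper missing_users are assumptions made for the example.

```python
import sqlite3

# Anti-join: every (time range, user) pair that has no row in table1.
# Standard SQL; assumed, not verified here, to run unchanged on Presto.
MISSING_USERS_SQL = """
SELECT r.col_time_range, u.col_user_id
FROM (SELECT DISTINCT col_time_range FROM table1) AS r
CROSS JOIN table2 AS u
LEFT JOIN table1 AS t
       ON t.col_time_range = r.col_time_range
      AND t.col_user_id = u.col_user_id
WHERE t.col_user_id IS NULL
ORDER BY r.col_time_range, u.col_user_id
"""


def missing_users():
    """Load the sample data from the question and run the anti-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col_time_range TEXT, col_user_id INTEGER)")
    conn.execute("CREATE TABLE table2 (col_user_id INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2)],
    )
    conn.executemany("INSERT INTO table2 VALUES (?)", [(1,), (2,), (3,), (4,)])
    rows = conn.execute(MISSING_USERS_SQL).fetchall()
    conn.close()
    return rows


print(missing_users())  # [('A', 4), ('B', 3), ('B', 4)]
```

With the sample data, time range A is missing user 4 and time range B is missing users 3 and 4.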

Related

Matching values in Excel

I have two tables in this form:
Name Function1
A 3
B 4
C 20
E 5
Name Function2
A 6
B 8
C 2
D 25
And I would like to create a table that looks like this:
Name Function1 Function2
A 3 6
B 4 8
C 20 2
D - 25
E 5 -
How can I pair those values and create this type of table?

If those functions are numbers, as in your example, you can use Consolidate (on the Data tab).
As a quick example, I consolidated both tables into a single table with the "Top row" and "Left column" options enabled.
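For readers working outside Excel, the same pairing can be sketched as an outer merge on the Name column. This is a pandas illustration, not the Consolidate feature itself. The helper name pair_by_name is an assumption, and missing values appear as NaN rather than "-".

```python
import pandas as pd


def pair_by_name(left, right):
    """Outer-merge two tables on Name so names from either table are kept."""
    merged = left.merge(right, on="Name", how="outer").sort_values("Name")
    return merged.reset_index(drop=True)


# Sample data from the question.
t1 = pd.DataFrame({"Name": list("ABCE"), "Function1": [3, 4, 20, 5]})
t2 = pd.DataFrame({"Name": list("ABCD"), "Function2": [6, 8, 2, 25]})

result = pair_by_name(t1, t2)
print(result)  # D has no Function1 and E has no Function2 (shown as NaN)
```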

Filtering two columns: keep all the rows associated to one ID if exists a value in the second column

I have a table with various columns, but I need to filter it based on two of them. The table structure is the following:
ID Test
1 A
1 B
1 C
2 B
2 D
3 A
4 A
4 C
4 D
5 B
5 C
What I need to do is keep all rows associated with an ID if at least one row for that ID has the test "A". The filtered table should then be:
ID Test
1 A
1 B
1 C
3 A
4 A
4 C
4 D
Is there a way to do this?

Alternatively you can use:
=FILTER(A1:B11,COUNTIFS(A1:A11,A1:A11,B1:B11,"A"))
Or, based on your comment:
=LET(X,COUNTIFS(B1:B11,B1:B11,L1:L11,"A"),INDEX(FILTER(B1:L11,X),SEQUENCE(SUM(X)),{1;11}))

If you have Excel 365 and access to dynamic array formulas, try the following:
=FILTER(A1:B11,ISNUMBER(MATCH(A1:A11,UNIQUE(FILTER(A1:A11,B1:B11="A")),0)))
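The logic behind these formulas (collect the IDs that have an "A" row, then keep every row whose ID is in that set) can be sketched in pandas as follows. This is an illustration of the idea rather than a translation of either formula, and the function name keep_ids_with_test is an assumption.

```python
import pandas as pd


def keep_ids_with_test(df, test="A"):
    """Keep all rows of every ID that has at least one row with the given test."""
    ids_with_test = df.loc[df["Test"] == test, "ID"]
    return df[df["ID"].isin(ids_with_test)].reset_index(drop=True)


# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5],
    "Test": list("ABCBDAACDBC"),
})

filtered = keep_ids_with_test(df)
print(filtered)  # rows for IDs 1, 3 and 4 only
```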

Subtract a subset of columns from a key column in Pandas Pivot

I have a pivot table with multiple columns of data in a time series:
A B C D
11/1/2018 1 5 5 7
11/2/2018 2 6 6 8
11/3/2018 3 7 7 9
The values in the data columns are not important for this example. I would like to subtract the value in the "key" column (column A in this case) from a subset of columns: B & C in this case. I would then like to drop any columns not in the subset or the key column. Result would be:
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4
I have subtracted columns in the past via code like this:
df['dif'] = df['B'] - df['A']
But this adds a new "dif" column. I would like to replace column B with the B-A values instead. Also, rather than passing the instructions one at a time (B-A, C-A), I would like to pass a list, something like "if column in list, subtract key column, else drop column."
Thanks

pandas.DataFrame.sub with axis=0
When subtracting a Series from a DataFrame Pandas will align the columns of the DataFrame with the index of the Series by default. This is what happens when you use the - operator. However, when you use the pandas.DataFrame.sub method, you can override that default and specify that the DataFrame should align its index with the index of the Series.
def f(d, key, subset):
    # Keep the key column, then join the subset columns with the key subtracted row by row
    return d[[key]].join(d[subset].sub(d[key], axis=0))

f(df, 'A', ['B', 'C'])
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4
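The alignment difference described above can be checked with a small self-contained sketch. The frame is rebuilt from the question's sample data, with the dates kept as plain string labels (an assumption made for simplicity).

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [5, 6, 7], "C": [5, 6, 7], "D": [7, 8, 9]},
    index=["11/1/2018", "11/2/2018", "11/3/2018"],
)

# The - operator matches the Series index (the dates) against the
# DataFrame's column labels (B, C). Nothing overlaps, so every value is NaN.
misaligned = df[["B", "C"]] - df["A"]

# sub(..., axis=0) matches the Series index against the row index instead,
# so each row has its own A value subtracted.
aligned = df[["B", "C"]].sub(df["A"], axis=0)

print(misaligned.isna().all().all())  # True
print(aligned)
```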

You can use apply to subtract A from the subset of columns that you choose, and then join the result back with A.
df['A'].to_frame().join(df[['B','C']].apply(lambda x: x - df['A']))
A B C
11/1/2018 1 4 4
11/2/2018 2 4 4
11/3/2018 3 4 4

How to divide a single row into multiple columns in Excel (I have thousands of records)

How can I split rows by ID?
Example:
Consider two columns, A and B:
A B
1 apple,orange
2 orange,banana
How can I split the data based on the ID? The expected output is below:
A B
1 apple
1 orange
2 orange
2 banana
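The thread gives no Excel answer. If the data can be processed outside Excel, one way to get this shape is to split each comma-separated cell into a list and expand the list into rows. The sketch below uses pandas for that, and the function name split_rows is an assumption.

```python
import pandas as pd


def split_rows(df, column="B", sep=","):
    """Split a delimited column into one row per item, repeating the other columns."""
    out = df.assign(**{column: df[column].str.split(sep)}).explode(column)
    return out.reset_index(drop=True)


# Sample data from the question.
df = pd.DataFrame({"A": [1, 2], "B": ["apple,orange", "orange,banana"]})

long_df = split_rows(df)
print(long_df)
```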

How to rank entries (according to a field value) which are dynamically grouped by another field?

Given the following (toy model) tables:
Table1
Company Date User FK_Subject
A 10.05.2016 1 1
A 10.06.2016 1 2
A 10.07.2016 2 3
B 10.03.2016 1 4
B 10.04.2016 2 5
B 15.05.2016 1 6
Table2
PK_Subject Subject_text
1 One
2 Two
3 Three
4 Four
5 Five
6 Six
There is a relation between the primary key PK_Subject and the foreign key FK_Subject.
I want to filter entries from Table1 displaying Company and Subject_text of only the most recent date given a User (or combination of Users).
For this I created a calculated column in Table1 called [DateRank] with the formula
=RANKX(FILTER(Table1; [Company]=EARLIER([Company]) && [User]=EARLIER([User])); [Date])
giving a new Table1
Company Date User FK_Subject DateRank
A 10.05.2016 1 1 2
A 10.06.2016 1 2 1
A 10.07.2016 2 3 1
B 10.03.2016 1 4 2
B 10.04.2016 2 5 1
B 15.05.2016 1 6 1
I can create this pivot table which works as intended if you filter only a single user. When you keep a combination of users (in this toy-model case: all 2 of them), it shows two entries per company:
Company Date Subject_text Sum of FK_Subject
A 10.06.2016 Two 2
A 10.07.2016 Three 3
B 10.04.2016 Five 5
B 15.05.2016 Six 6
However, the desired output for both users selected would be the most recent entry taken over both users (i.e. the whole table):
Company Date Subject_text Sum of FK_Subject
A 10.07.2016 Three 3
B 15.05.2016 Six 6
As hinted in the text, the solution should generalize to arbitrarily many users and combinations thereof.
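The desired behaviour amounts to filtering by the selected users first and only then taking the most recent row per company, so the ranking depends on the current selection instead of being fixed per user. The sketch below shows that logic in pandas. It is not a DAX measure, and the function name latest_per_company is an assumption.

```python
import pandas as pd

# Sample data from the question.
table1 = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B", "B"],
    "Date": ["10.05.2016", "10.06.2016", "10.07.2016",
             "10.03.2016", "10.04.2016", "15.05.2016"],
    "User": [1, 1, 2, 1, 2, 1],
    "FK_Subject": [1, 2, 3, 4, 5, 6],
})
table2 = pd.DataFrame({
    "PK_Subject": [1, 2, 3, 4, 5, 6],
    "Subject_text": ["One", "Two", "Three", "Four", "Five", "Six"],
})


def latest_per_company(users):
    """Most recent entry per company, considering only the selected users."""
    rows = table1[table1["User"].isin(users)].copy()
    # Dates are day.month.year strings, so parse them before comparing.
    rows["parsed"] = pd.to_datetime(rows["Date"], format="%d.%m.%Y")
    latest = rows.loc[rows.groupby("Company")["parsed"].idxmax()]
    merged = latest.merge(table2, left_on="FK_Subject", right_on="PK_Subject")
    return merged[["Company", "Date", "Subject_text"]].reset_index(drop=True)


print(latest_per_company([1, 2]))  # A 10.07.2016 Three / B 15.05.2016 Six
print(latest_per_company([1]))     # A 10.06.2016 Two / B 15.05.2016 Six
```

Because the user filter is applied before the grouping, any combination of users can be passed without changing the function.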
