Conditional Inner join in sqlite python - python-3.x

I have three tables a, b and c.
Table a is related with table b through column key.
table b is related with table c through columns word, sense and speech. In addition table c holds column id.
Now some rows in a.word have no matching value with b.word, based on that
I want to inner join tables on condition if a.word = b.word then join, otherwise compare only a.end_key = b.key.
As a result I want to have table in form of a with extra columns of start_id and end_id from c matching with key_start and key_end.
I tried following sql command with python:
CREATE TABLE relations
AS
SELECT * FROM
c
INNER JOIN
a
INNER JOIN
b
ON
a.end_key = b.key
AND
a.start_key = b.key
AND
b.word = c.word
AND
b.speech = c.speech
AND
b.sense = c.sense
OR
a.word = b.word
a:
+-----------+---------+------+-----------+
| key_start | key_end | word | relation |
+-----------+---------+------+-----------+
| k5 | k1 | tree | h |
| k7 | k2 | car | m |
| k200 | k3 | bad | ho |
+-----------+---------+------+-----------+
b:
+-----+------+--------+-------+
| key | word | speech | sense |
+-----+------+--------+-------+
| k5 | sky | a | 1 |
| k2 | car | a | 1 |
| k3 | bad | n | 2 |
+-----+------+--------+-------+
c:
+----+---------+--------+-------+
| id | word | speech | sense |
+----+---------+--------+-------+
| 0 | light | a | 1 |
| 0 | dark | b | 3 |
| 1 | neutral | a | 2 |
+----+---------+--------+-------+
Edit for clarification:
The values of tables a, b and c hold hundreds thousands lines, so there are matching values in the tables. Table a is related to table b with end_key ~ key and start_key~key relation. Table b is related to c through word sense and speech, there are values which match in each of these columns.
The desired table is in form
start_id|key_start|key_end|end_id|relation
Where start_id matches key_start and key_end matches end_id.

EDIT new answer
The problem with the proposed query lies in the use of AND's and OR's (and likely missing (...)). This statement
a.word = b.word then join, otherwise compare only a.end_key = b.key.
would translate to:
AND (a.word= b.word OR a.end_key = b.key).
Maybe try it like this:
ON
b.word = c.word
AND
b.speech = c.speech
AND
b.sense = c.sense
AND
(a.word = b.word OR a.end_key = b.key)
It would be a good idea to test in a sqlite manager (eg command line sqlite3, DB Browser for sqlite) before you try it in python; troubleshooting is much easier. And of course test the SELECT before you implement it in a CREATE TABLE.
You could clarify your question by showing the desired columns and result in relations table that this sample data would create (there is nothing between b and c that would match on word, speech, sense). Also the description of the relationship between a and b is confusing. In the first paragraph it says Table a is related with table b through column key. Should key be word?

Related

Combining rows of a table using similar cells of one of its columns in Excel

I have a table (1) in Excel, with two columns, in which at the first column (A) there are some numbers and at the second column (B) there are some letters. I want to have a method to make another table (2) from (1) to put different letters at the first column then to put in each row the numbers that were corresponded to letters in table (1).
For example, let the table (1) is:
| A | B |
|---|---|
| 1 | a |
| 1 | b |
| 2 | a |
| 2 | c |
| 3 | b |
| 4 | b |
What is a method in Excel which make the following combination table:
| a | 1 | 2 | |
| b | 1 | 3 | 4 |
| c | 2 | | |
in which letters are in first column and in each row there are the numbers that were in relationship with the row's letter in table (1)?
As per below screenshot use below formula to C1 cell.
=UNIQUE(B1:B6)
And following formula to D2 cell then drag down
=TRANSPOSE(UNIQUE(FILTER($A$1:$A$6,$B$1:$B$6=C1)))

Create multiple fields as arrays in Pyspark?

I have a dataframe with multiple columns as such:
| ID | Grouping | Field_1 | Field_2 | Field_3 | Field_4 |
|----|----------|---------|---------|---------|---------|
| 1 | AA | A | B | C | M |
| 2 | AA | D | E | F | N |
I want to create 2 new columns and store an list of of existing columns in new fields with the use of a group by on an existing field. Such that my new dataframe would look like this:
| ID | Grouping | Group_by_list1 | Group_by_list2 |
|----|----------|----------------|----------------|
| 1 | AA | [A,B,C,M] | [D,E,F,N] |
Does Pyspark have a way of handling this kind of wrangling with a dataframe to create this kind of an expected result?
Added inline comments, Check below code.
df \
.select(F.col("id"),F.col("Grouping"),F.array(F.col("Field_1"),F.col("Field_2"),F.col("Field_3"),F.col("Field_4")).as("grouping_list"))\ # Creating array of required columns.
.groupBy(F.col("Grouping"))\ # Grouping based on Grouping column.
.agg(F.first(F.col("id")).alias("id"),F.first(F.col("grouping_list")).alias("Group_by_list1"),F.last(F.col("grouping_list")).alias("Group_by_list2"))\ # first value from id, first value from grouping_list list, last value from grouping_list
.select("id","Grouping","Group_by_list1","Group_by_list2")\ # selecting all columns.
.show(false)
+---+--------+--------------+--------------+
|id |Grouping|Group_by_list1|Group_by_list2|
+---+--------+--------------+--------------+
|1 |AA |[A, B, C, M] |[D, E, F, N] |
+---+--------+--------------+--------------+
Note: This solution will give correct result only if DataFrame has two rows.

How to combine two columns into one in Sqlite and also get the underlying value of the Foreign Key?

I want to be able to combine two columns from a table into one column then to to be able to get the actual value of the foreign keys. I can do these things individually but not together.
Following the answer below I was able to combine the two columns into one using the first sql statement below.
How to combine 2 columns into a new one in sqlite
The combining process is shown below:
+---+---+
|HT | AT|
+---+---+
|1 | 2 |
|5 | 7 |
|9 | 5 |
+---+---+
into one column as shown:
+---+
|HT |
+---+
| 1 |
| 5 |
| 9 |
| 2 |
| 7 |
| 5 |
+---+
The second SQL statement show's the actual value of each foreign key corresponding to each foreign key id. The Foreign Key Table.
+-----+------------------------+
|T_id | TN |
+-----+------------------------+
| 1 | 'Dallas Cowboys |
| 2 | 'Chicago Bears' |
| 5 | 'New England Patriots' |
| 7 | 'New York Giants' |
| 9 | 'New York Jets' |
+-----+------------------------+
sql = "SELECT * FROM (SELECT M.HT FROM M UNION SELECT M.AT FROM Match)t"
The second sql statement lets me get the foreign key values for each value in M.HT.
sql = "SELECT M.HT, T.TN FROM M INNER JOIN T ON M.HT = T.Tid WHERE strftime('%Y-%m-%d', M.ST) BETWEEN \'2015-08-01\' AND \'2016-06-30\' AND M.Comp = 6 ORDER BY M.ST"
Result of second SQL statement:
+-----+------------------------+
| HT | TN |
+-----+------------------------+
| 1 | 'Dallas Cowboys |
| 5 | 'New England Patriots' |
| 9 | 'New York Jets' |
+-----+------------------------+
But try as I might I have not been able to combine these queries!
I believe the following will work (assuming that the tables are Match and T and baring the WHERE and ORDER BY clauses for brevity/ease) :-
SELECT DISTINCT(m.ht), t.tn
FROM
(SELECT Match.HT FROM Match UNION SELECT Match.AT FROM Match) AS m
JOIN T ON t.tid = m.ht
JOIN Match ON (m.ht = Match.ht OR m.ht = Match.at)
/* WHERE and ORDER BY clauses using Match as m only has columns ht and at */
WHERE strftime('%Y-%m-%d', Match.ST)
BETWEEN \'2015-08-01\' AND \'2016-06-30\' AND Match.Comp = 6
ORDER BY Match.ST
;
Note only tested without the WHERE and ORDER BY clause.
That is using :-
DROP TABLE IF EXISTS Match;
DROP TABLE IF EXISTS T;
CREATE TABLE IF NOT EXISTS Match (ht INTEGER, at INTEGER, st TEXT DEFAULT (datetime('now')));
CREATE TABLE IF NOT EXISTS t (tid INTEGER PRIMARY KEY, tn TEXT);
INSERT INTO T (tn) VALUES('Cows'),('Bears'),('a'),('b'),('Pats'),('c'),('Giants'),('d'),('Jets');
INSERT INTO Match (ht,at) VALUES (1,2),(5,7),(9,5);
/* Directly without the Common Table Expression */
SELECT
DISTINCT(m.ht), t.tn,
Match.st /*<<<<< Added to show results of obtaining other values from Matches >>>>> */
FROM
(SELECT Match.HT FROM Match UNION SELECT Match.AT FROM Match) AS m
JOIN T ON t.tid = m.ht
JOIN Match ON (m.ht = Match.ht OR m.ht = Match.at)
/* WHERE and ORDER BY clauses here using Match */
;
Noting that limited data (just the one extra column) was used for brevity
Results in :-

SUM of multiple VLOOKUP

It seems like a simple problem, but I do not manage to solve it. I have the following tables:
Values
| Key | Value |
|-----|-------|
| A | 1 |
| B | 2 |
| C | 3 |
Results
| Foo | Bar |
|-----|-----|
| A | B |
| C | B |
| A | A |
| B | C |
| ... | ... |
What I am looking for is a final row in the Results table that looks for the key in the Values table, takes its value and sums all the keys in a column (i.e. FOO and BAR). The final result would be:
| Foo | Bar |
|-----|-----|
| A | B |
| C | B |
| A | A |
| B | C |
|-----|-----|
| 7 | 8 |
I have been trying with different VLOOKUP, INDEX and MATCH functions, but still I am not able. Any ideas?
I asume you want a solution without extra columns. Then you are into Array formulas (a.k.a CSE or ControlShiftEnter functions).
Combination of {=SUM(VLOOKUP(...))} doesn't work, but combination of {=SUM(SUMIF(...))} does:
in A12 enter =SUM(SUMIF($A$1:$A$3;A7:A10;$B$1:$B$3)) and save with Ctrl+Shift+Enter. You then can copy this to B12.
Problem is you will need to change the Array function every time you add values to the list A7:B10 (or you initially make the range sufficiently large) ... this would speak more for extra =VLOOKUP() columns as suggested by CustomX.
I'm not sure of other solutions, but you could solve this by using an extra 2 columns, E and F for example.
Enter this in column E: =VLOOKUP(C2;$A$1:$B$3;2;0)
Enter this in column F: =VLOOKUP(D2;$A$1:$B$3;2;0)
Pull the formulas down and add a SUM at the bottom of column C and D to calculate columns E and F.
Extra: These are the columns I used for your examples.
Key = column A
Value = column B
Foo = column C
Bar = column D

Display all matching values in one comma separated cell

I have two columns of data in an Excel 2010 spreadsheet. In Column A is a category, and in Column B is a value. There will be multiple values in Column B for each unique category in Column A.
What I want to achieve in a separate sheet is to display all of the values for each each unique category in one comma (or semi-colon etc) separated cell.
For example, if my first sheet looks like this:
----------------------
| Category | Value |
----------------------
| Cat1 | Val A |
| Cat1 | Val B |
| Cat1 | Val C |
| Cat2 | Val D |
| Cat3 | Val E |
| Cat3 | Val F |
| Cat3 | Val G |
| Cat3 | Val H |
----------------------
I'd want to display the following in another sheet:
---------------------------------------
| Category | Value |
---------------------------------------
| Cat1 | Val A,Val B,Val C |
| Cat2 | Val D |
| Cat3 | Val E,Val F,Val G, Val H |
---------------------------------------
Can this be achieved with a formula? Vlookup will only find the first matching value, of course. I've Googled it, but the individual search terms involved in the query are so generic I'm getting swamped with inappropriate results.
Please try (in a copy on another sheet):
Insert a column on the left with =IF(B2<>B3,"","x") in A2 (assuming Category is in B1). In D2 put =IF(B1=B2,D1&", "&C2,C2) and copy both formulae down to suit. Copy and Paste Special Values over the top. Filter on ColumnA for x and delete selected rows. Unfilter and delete ColumnA.

Resources