I have an Oracle DB table with one column that contains comma-separated values (not my design!). In a web application, when a user attempts to add an object that results in a new entry in this table, I am trying to check for duplicates.
I know how to check for a single value (e.g. trying to add "ABC"), but I am not sure how to do it if the user is adding ABC, DEF, GHI, ...
Let's say the table is called PRINTER_MAPPING_CFG and the column in question is called RUNOUTS. A typical row might look like:
001, 002, 006, 008, 009
I use the following to check for a single value:
SELECT COUNT(*)
FROM PRINTER_MAPPING_CFG
WHERE ',' || RUNOUTS || ',' LIKE '%,' || '001' || ',%'
If the user is adding 003, 004, 006, 007, 008, I am not sure how to proceed (here 006 and 008 are already in the table).
I can split the input and search for each value separately, but that looks wasteful if there is an alternative.
Right, not the best design one could imagine.
See whether this helps:
SQL> with
  2  printer_mapping_cfg (id, runouts) as
  3  -- this is what you currently have. I included the ID column
  4  -- as you probably don't have just one row in that table, do you?
  5  (select 1, '001, 002, 006, 008, 009' from dual union all
  6   select 2, '005, 006, 007' from dual
  7  ),
  8  new_value (runouts) as
  9  -- this is what user enters
 10  (select '003, 004, 006, 007, 008' from dual), --> 006 and 008 exist for ID = 1
 11  split_pmc as                                  --> 006 and 007 exist for ID = 2
 12  (select p.id,
 13          trim(regexp_substr(p.runouts, '[^,]+', 1, column_value)) val
 14   from printer_mapping_cfg p cross join
 15        table(cast(multiset(select level
 16                            from dual
 17                            connect by level <= regexp_count(p.runouts, ',') + 1
 18                           ) as sys.odcinumberlist))
 19  )
 20  select s.id,
 21         listagg(s.val, ', ') within group (order by s.val) duplicates
 22  from split_pmc s
 23  where s.val in (select trim(regexp_substr(n.runouts, '[^,]+', 1, level))
 24                  from new_value n
 25                  connect by level <= regexp_count(n.runouts, ',') + 1
 26                 )
 27  group by s.id
 28  order by s.id;
        ID DUPLICATES
---------- ------------------------------
         1 006, 008
         2 006, 007

SQL>
What does it do?
lines #1 - 7 represent the data you already have
lines #8 - 10 are the input string, the one the user types
lines #11 - 19 split the existing values into rows
lines #20 - 28 are the final select, where
lines #23 - 26 check whether the values found in the split_pmc CTE exist among the newly added values (split into rows the same way)
line #21 aggregates the duplicate values into a single string
[EDIT]
As you already have the PRINTER_MAPPING_CFG table, your code would begin with
SQL> with
2 new_value (runouts) as ...
You'd still reference the PRINTER_MAPPING_CFG table just as I did.
Any additional conditions would be added around line #18 (in the code I posted above).
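For illustration, here is a sketch of how the adapted statement might look. The user's input is passed as a single comma-separated bind variable (:new_runouts is a hypothetical name):

-- a sketch, assuming the user's input arrives as one comma-separated
-- string in a bind variable named :new_runouts (hypothetical name)
with split_pmc as
  (select p.id,
          trim(regexp_substr(p.runouts, '[^,]+', 1, column_value)) val
   from printer_mapping_cfg p cross join
        table(cast(multiset(select level
                            from dual
                            connect by level <= regexp_count(p.runouts, ',') + 1
                           ) as sys.odcinumberlist))
  )
select s.id,
       listagg(s.val, ', ') within group (order by s.val) duplicates
from split_pmc s
where s.val in (select trim(regexp_substr(:new_runouts, '[^,]+', 1, level))
                from dual
                connect by level <= regexp_count(:new_runouts, ',') + 1
               )
group by s.id
order by s.id;

If the query returns no rows, none of the entered values already exist.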
Related
I have the following table.
Columns:
colA - row number
colB - an integer, a random number (might be any value)
colC - an id, a short string, might be null

colA  colB  colC
1     2     'a'
2     4     'b'
3     6
4     1     'c'
Expected result:

colA  colB  colC  colD
1     2     'a'
2     4     'b'   'a'
3     6           'a, b'
4     1     'c'   NULL
5     3     'd'   'b'
The process for generating the last column (colD): for each row (the "examined row"), after ordering by colA, test each row above it (the "tested row").
If examined_row.colB >= tested_row.colB, we take that tested row's colC.
After checking all the tested rows, we string_agg() the collected colC values.
Does anyone have an idea how to do this in SQL without a massive inner join?
I thought about string_agg, but for the first part (the condition) I have no idea how to compare the value from the examined row to the tested rows, since an analytic function can't be placed inside string_agg().
A subquery might not help either, as I see it.
The only solution I could think of is an inner join, but it is not efficient.
A solution with array_agg would also be fine.
Thanks!
Below is for BigQuery Standard SQL
#standardSQL
select colA, colB, colC,
  (
    -- aggregate colC from all preceding rows whose colB does not exceed
    -- this row's colB (the stated condition: examined.colB >= tested.colB)
    select string_agg(colC, ', ' order by colA)
    from t.candidates
    where t.colB >= colB
  ) as colD
from (
  -- per row, collect all preceding rows (ordered by colA) into an array
  select *, array_agg(struct(colA, colB, colC)) over win as candidates
  from `project.dataset.table`
  window win as (order by colA rows between unbounded preceding and 1 preceding)
) t
Applied to the sample data from your question, it produces the expected output described above.
I have two data frames.
The first frame holds my IDs: each 'OLD CODE' maps to one 'MASTER ID', and some OLD CODE values are not matched to any MASTER ID.
ID dataframe

MASTER ID   OLD CODE
MASTER1     1A
MASTER1     1B
MASTER2     2
MASTER3     3
            4

Sales

OLD CODE   Salesvalues
1A         10
1B         15
2          6
3          8
4          5
If I do a right join or an outer join, it returns more rows than my original Sales table. How can I join on the first matching 'MASTER ID' while keeping the same number of rows (no duplicate rows)? If there is no 'MASTER ID' match for the 'OLD CODE', the result should be NA.
Expected merged dataframe

OLD CODE   Salesvalues   MASTER ID (join column)
1A         10            MASTER1
1B         15            MASTER1
2          6             MASTER2
3          8             MASTER3
4          5              NA
See if this works for you (assuming the ID dataframe is named id_df; a left join keeps exactly one row per Sales row and fills NaN where there is no MASTER ID):

# drop potential duplicate OLD CODE rows so the join cannot multiply rows
Sales.merge(id_df.drop_duplicates('OLD CODE'), on='OLD CODE', how='left')
I need to compare one column's values to another column's values in the table BRAND below. I tried comparing just the string lengths between the two columns, but the actual values can differ even when the lengths match, so that gives incorrect results.
Table BRAND:

ID  brand_1       brand_2       Status
----------------------------------------
1   SAC           SAC           True
2   APP BBB       BBB APP       True
3   ABC OND DEG   DEG ABC OND   True
4   GIF           APP GIF       False
5   GHY PPA       GHY PPA ABC   False
6   MNC CGA IPK   GIT ABC ITY   False
I need to return the rows that are False, i.e. where there is no match between the token sets in brand_1 and brand_2. The Status column doesn't exist in the data; I added it here only to demonstrate which rows are considered "false" and should be returned. This column shouldn't appear in the output either.
Output:

ID  brand_1       brand_2
------------------------------
4   GIF           APP GIF
5   GHY PPA       GHY PPA ABC
6   MNC CGA IPK   GIT ABC ITY
Please help me out.
Here's one way (not the fastest, but easy to write and maintain). The idea is to split each input string into its components, sort them alphabetically and then reassemble them. I do that using lateral (so each row is processed independently of other rows), a JSON trick to split the strings, and LISTAGG to put them back together.
with
brand (id, brand_1, brand_2) as (
select 1, 'SAC' , 'SAC' from dual union all
select 2, 'APP BBB' , 'BBB APP' from dual union all
select 3, 'ABC OND DEG', 'DEG ABC OND' from dual union all
select 4, 'GIF' , 'APP GIF' from dual union all
select 5, 'GHY PPA' , 'GHY PPA ABC' from dual union all
select 6, 'MNC CGA IPK', 'GIT ABC ITY' from dual
)
select id, brand_1, brand_2,
case when b1 = b2 then 'True' else 'False' end as status
from brand,
lateral( select listagg(token, ' ') within group (order by token) as b1
from json_table( '["' || replace(brand_1, ' ','","') || '"]',
'$[*]' columns(token varchar2 path '$'))
),
lateral( select listagg(token, ' ') within group (order by token) as b2
from json_table( '["' || replace(brand_2, ' ','","') || '"]',
'$[*]' columns(token varchar2 path '$'))
)
;
ID BRAND_1     BRAND_2     STATUS
-- ----------- ----------- ------
 1 SAC         SAC         True
 2 APP BBB     BBB APP     True
 3 ABC OND DEG DEG ABC OND True
 4 GIF         APP GIF     False
 5 GHY PPA     GHY PPA ABC False
 6 MNC CGA IPK GIT ABC ITY False
Edit: To get the desired result in the edited question, remove the case expression from the SELECT clause, and add a WHERE clause at the end of the query: ... where b1 != b2 (assuming the input strings can't be null; if they can be null, you will have to handle according to your business needs).
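For illustration, the final query after that change might look like this (same CTE and lateral joins as above; only the select list and the added where clause differ):

select id, brand_1, brand_2
from brand,
     lateral( select listagg(token, ' ') within group (order by token) as b1
              from json_table( '["' || replace(brand_1, ' ','","') || '"]',
                               '$[*]' columns(token varchar2 path '$'))
            ),
     lateral( select listagg(token, ' ') within group (order by token) as b2
              from json_table( '["' || replace(brand_2, ' ','","') || '"]',
                               '$[*]' columns(token varchar2 path '$'))
            )
where b1 != b2;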
Customer#  Date      Qty  Cost
12         1/2/2013  3    500
12         1/3/2013  5    200
12         1/4/2013  4    200
13         1/5/2013  1    150
14         1/6/2013  2    110
14         1/7/2013  1    110
15         1/8/2013  1    110
I have a table similar to the one above (with millions of records and 26 columns). I would like to create two tables based on it: the first table should show the first order of each customer with its associated columns, and the second should show the data for the second order of each customer (NULL if they don't have one).
The result I am looking for:
Table one - first order

Customer#  Date      Qty  Cost
12         1/2/2013  3    500
13         1/5/2013  1    150
14         1/6/2013  2    110
15         1/8/2013  1    110

Table two - second order

Customer#  Date      Qty  Cost
12         1/3/2013  5    200
14         1/7/2013  1    110
The formula I tried, which failed to work:
=INDEX(B:D,MATCH(A3,A:A,0))
I would appreciate it if someone could share ideas on how to use the INDEX and MATCH functions in Excel to solve this.
I was able to solve the issue above using Tableau. I used the Index() function to compute a rank based on order date and customer id, then filtered on that rank to get the first-order and second-order tables.
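For reference, the same ranking idea can be expressed in SQL; below is just a sketch, assuming a hypothetical ORDERS table with columns customer_id, order_date, qty, and cost. ROW_NUMBER plays the role of Tableau's Index():

-- a sketch over an assumed ORDERS table; rn = 1 yields the first-order
-- table, rn = 2 the second-order table
with ranked as (
  select customer_id, order_date, qty, cost,
         row_number() over (partition by customer_id order by order_date) as rn
  from orders
)
select customer_id, order_date, qty, cost
from ranked
where rn = 1;  -- use rn = 2 for the second-order table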
I want to do some cohort analysis on a user base. We have two tables, "signups" and "sessions", and both have a "date" field. I'm looking to formulate a query that yields a table of numbers (with some blanks) showing, for each signup date, a count of the users who created an account that day and who also have a session on the 1st, 3rd, 7th, and 14th day after signup, indicating that they returned on those days.
created_at   d1  d3  d7  d14
05/07/2007   12   *   *    *
04/07/2007   49  21   1    2
03/07/2007   45  30   *    3
02/07/2007   47  41  18   12
...
In this case, 41 users who created an account on 02/07/2007 returned after 3 days (d3).
Can I perform this in a single MySQL query?
Yes you can:
select signups.date as created_at,
       -- one column per day offset; distinct counts each user at most once
       count(distinct case when datediff(sessions.date, signups.date) = 1  then signups.users end) as d1,
       count(distinct case when datediff(sessions.date, signups.date) = 3  then signups.users end) as d3,
       count(distinct case when datediff(sessions.date, signups.date) = 7  then signups.users end) as d7,
       count(distinct case when datediff(sessions.date, signups.date) = 14 then signups.users end) as d14
from signups
left join sessions using (users)
group by 1
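For context, a minimal sketch of the table layout the query above assumes (the column names users and date come from the query itself; the types are assumptions):

-- assumed minimal schema; adjust names and types to your actual tables
create table signups  (users int, date date);
create table sessions (users int, date date);

With this layout, using (users) joins every session of a user to their signup row, and each datediff bucket then counts a user at most once per day offset.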