I have a dataset like this:
Group Owner
ABC John
ABC
TTT
TTT
TTT
CBS Alen
CBS Tim
SGD
SGD
Now I need to search the dataset to find all groups whose Owner values are ALL empty, like TTT and SGD (not ABC, because it has a row whose owner is John). I only need to select one row rather than all of them (preferably the first one). How could I do this using C#?
Since you haven't specified a database, I'll use MySQL 5.6 on sqlfiddle.com, but the approach would most likely be similar in any relational database.
First, let's set up the schema:
create table x (grp varchar(10), ownr varchar(10), row int);
insert into x (grp, ownr, row) values ('abc', 'john', 1);
insert into x (grp, ownr, row) values ('abc', '', 2);
insert into x (grp, ownr, row) values ('ttt', '', 3);
insert into x (grp, ownr, row) values ('ttt', '', 4);
insert into x (grp, ownr, row) values ('ttt', '', 5);
insert into x (grp, ownr, row) values ('cbs', 'alan', 6);
insert into x (grp, ownr, row) values ('cbs', 'tim', 7);
insert into x (grp, ownr, row) values ('sgd', '', 8);
insert into x (grp, ownr, row) values ('sgd', '', 9);
The first step is to get a list of the groups that you don't want in the output. The query for that would be:
select distinct grp from x where ownr <> ''
grp
---
abc
cbs
So then you simply ask for all rows with a group other than those (I'll also order by row here):
select * from x where grp not in (
select distinct grp from x where ownr <> ''
) order by row
and that gets you all the other rows:
grp ownr row
--- ---- ---
ttt 3
ttt 4
ttt 5
sgd 8
sgd 9
Now here's where it becomes slightly unclear what you want. If you just want the first of that overall set, you can simply use a limiting clause such as:
select * from x where grp not in (
select distinct grp from x where ownr <> ''
) order by row limit 1
grp ownr row
--- ---- ---
ttt 3
If, however, you need the first of each group, it can be done with an aggregating clause as follows:
select grp, '' as ownr, min(row) as row
from x where grp not in (
select distinct grp from x where ownr <> ''
) group by grp
grp ownr row
--- ---- ---
sgd 8
ttt 3
Obviously I've made certain assumptions about things like:
what an empty owner is;
what you consider the "first" of each subset to be; and
which database you're using for a back end.
But the general approach should remain the same even if those assumptions need to be modified.
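As a rough way to try the queries above without a MySQL server, the following sketch runs them against an in-memory SQLite database from Python. SQLite standing in for MySQL, the column name row_no (used instead of row to steer clear of keyword clashes), and the alias first_row are all assumptions made for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema above.
conn = sqlite3.connect(":memory:")
conn.execute("create table x (grp text, ownr text, row_no integer)")
conn.executemany(
    "insert into x (grp, ownr, row_no) values (?, ?, ?)",
    [
        ("abc", "john", 1), ("abc", "", 2),
        ("ttt", "", 3), ("ttt", "", 4), ("ttt", "", 5),
        ("cbs", "alan", 6), ("cbs", "tim", 7),
        ("sgd", "", 8), ("sgd", "", 9),
    ],
)

# Groups with at least one non-empty owner are the ones to exclude.
EXCLUDE = "select distinct grp from x where ownr <> ''"

# First row overall among the groups whose owners are all empty.
first_overall = conn.execute(
    f"select grp, ownr, row_no from x where grp not in ({EXCLUDE}) "
    "order by row_no limit 1"
).fetchone()

# First row of each such group.
first_per_group = conn.execute(
    f"select grp, '' as ownr, min(row_no) as first_row from x "
    f"where grp not in ({EXCLUDE}) group by grp order by first_row"
).fetchall()

print(first_overall)
print(first_per_group)
```

The same SQL strings could be sent from C# through whatever data provider matches the actual database.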
I have two SQLite tables as Sqlite Table 1 and Sqlite Table 2.
Table1 has ID,Name and Code columns. Table2 has ID, Values and Con columns.
I want to create an Excel file with ID, Name, Code and Values columns. The ID, Name and Code columns come from Table1, and the Values column is the sum of Table2's Values column under two conditions: the ID columns must match and the Con column must equal Done.
Below image is for reference:
I would approach this problem in steps.
First extract the sql tables into pandas dataframes. I am no expert on that aspect of the problem, but assuming you have two dataframes like the following:
df1 =
   ID Name Code
0   1    a   1a
1   2    b   2b
2   3    a   3c
and
df2 =
   ID  Values   Con
0   1       5  Done
1   2       9    No
2   1       7  Done
3   2       4    No
4   1       8    No
5   3       1  Done
and these two functions:
def sumByIndex(dx, row):
    # Return the summed value for this row's ID, or 0 if the ID doesn't exist
    idx = row['ID']
    st = list(dx['ID'])
    if idx in st:
        return dx[dx['ID'] == idx]['Values'].values[0]
    else:
        return 0

def combineFrames(d1, d2):
    # Return updated version of d1 with "Values" column added.
    # Only the numeric Values column is summed, so the text column Con is left out.
    d3 = d2[d2['Con'] == 'Done'].groupby('ID', as_index=False)['Values'].sum()
    d1['Values'] = d1.apply(lambda row: sumByIndex(d3, row), axis=1)
    return d1
then print(combineFrames(df1, df2)) yields:
ID Name Code Values
0 1 a 1a 12
1 2 b 2b 0
2 3 a 3c 1
My program obtains the data from sqlite table 1 and sqlite table 2 as lists (of lists and tuples) holding the values of ID, Name, Code and ID, Values, Con respectively, by querying the database with something like 'SELECT * FROM sqlite table 1':
# sqlite table 1
table1 = [[5674, 'a', '1a'], [3385, 'b', '2b'], [5548, 'a', '3c']]
# sqlite table 2
table2 = [(5674, 5, 'Done'), (3385, 9, 'No'), (5674, 7, 'Done'), (3385, 4, 'No'), (5674, 8, 'No'), (5548, 1, 'Done')]
To begin, I sum the Values entries into a dictionary keyed by the corresponding ID, counting only the rows where Con is 'Done':
map_values = {row[0]: 0 for row in table2}
for row_id, value, con in table2:
    if con == 'Done':
        map_values[row_id] += value
Then I create the pandas.DataFrame() instance from sqlite table 1 like this:
import pandas as pd
df = pd.DataFrame(table1, index=range(1, len(table1) + 1), columns=["ID", "Name", "Code"])
The totals are then looked up by ID and added as a new Values column. Mapping by ID avoids relying on the dictionary happening to be in the same order as table 1, and an ID with no rows in table 2 gets 0:
df["Values"] = df["ID"].map(map_values).fillna(0).astype(int)
output:
ID Name Code Values
1 5674 a 1a 12
2 3385 b 2b 0
3 5548 a 3c 1
excel:
df.to_excel(r'./excel_file.xlsx', index=False)
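Since the data already lives in SQLite, the conditional sum can also be pushed into the query itself. The sketch below is one way to do that with the standard-library sqlite3 module. The table names table1 and table2, the alias total, and the in-memory database are assumptions made for this illustration. Note that Values has to be quoted because it is an SQL keyword.

```python
import sqlite3

# In-memory stand-ins for the two SQLite tables (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (ID integer, Name text, Code text)")
conn.execute('create table table2 (ID integer, "Values" integer, Con text)')
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [(5674, "a", "1a"), (3385, "b", "2b"), (5548, "a", "3c")],
)
conn.executemany(
    "insert into table2 values (?, ?, ?)",
    [(5674, 5, "Done"), (3385, 9, "No"), (5674, 7, "Done"),
     (3385, 4, "No"), (5674, 8, "No"), (5548, 1, "Done")],
)

# LEFT JOIN keeps every table1 row; only 'Done' rows take part in the sum,
# and COALESCE turns "no matching rows" into 0.
rows = conn.execute(
    """
    select t1.ID, t1.Name, t1.Code,
           coalesce(sum(t2."Values"), 0) as total
    from table1 t1
    left join table2 t2
           on t2.ID = t1.ID and t2.Con = 'Done'
    group by t1.ID, t1.Name, t1.Code
    order by t1.ID
    """
).fetchall()

print(rows)
```

The resulting rows can be handed to pd.DataFrame(rows, columns=["ID", "Name", "Code", "Values"]) and written out with to_excel as in the answer above.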
I'm attempting to perform some sort of upsert operation in U-SQL where I pull data every day from a file and compare it with yesterday's data, which is stored in a table in Data Lake Storage.
I have created an ID column in the table in DL using row_number(), and it is this "counter" I wish to continue when appending new rows to the old dataset. E.g.
Last inserted row in DL table could look like this:
ID | Column1 | Column2
---+------------+---------
10 | SomeValue | 1
I want the next rows to have the following ascending ids
11 | SomeValue | 1
12 | SomeValue | 1
How would I go about making sure that the next X rows continue the ID count, so that each new row's ID is 1 higher than the previous one?
You could use ROW_NUMBER and then add it to the max value from the original table (i.e. using CROSS JOIN and MAX). A simple demo of the technique:
DECLARE @outputFile string = @"\output\output.csv";

@originalInput =
    SELECT *
    FROM ( VALUES
        ( 10, "SomeValue 1", 1 )
    ) AS x ( id, column1, column2 );

@newInput =
    SELECT *
    FROM ( VALUES
        ( "SomeValue 2", 2 ),
        ( "SomeValue 3", 3 )
    ) AS x ( column1, column2 );

@output =
    SELECT id, column1, column2
    FROM @originalInput
    UNION ALL
    SELECT (int)(x.id + ROW_NUMBER() OVER()) AS id, column1, column2
    FROM @newInput
    CROSS JOIN ( SELECT MAX(id) AS id FROM @originalInput ) AS x;

OUTPUT @output
TO @outputFile
USING Outputters.Csv(outputHeader:true);
You will have to be careful if the original table is empty and add some additional conditions / null checks but I'll leave that up to you.
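To make the empty-table caveat concrete, here is a small Python sketch of the same "max id plus row number" logic. It is not U-SQL, and the function name append_with_ids and the tuple layout are hypothetical. The point is only that the maximum needs a default (0 here) when there are no existing rows.

```python
def append_with_ids(existing, new_rows):
    """Return existing rows plus new_rows, numbering new rows after the current max id.

    existing: list of (id, column1, column2) tuples
    new_rows: list of (column1, column2) tuples
    """
    # default=0 plays the role of the null check: an empty table starts at id 1.
    last_id = max((row[0] for row in existing), default=0)
    numbered = [
        (last_id + offset, column1, column2)
        for offset, (column1, column2) in enumerate(new_rows, start=1)
    ]
    return existing + numbered


original = [(10, "SomeValue 1", 1)]
incoming = [("SomeValue 2", 2), ("SomeValue 3", 3)]

combined = append_with_ids(original, incoming)
from_empty = append_with_ids([], incoming)
print(combined)
print(from_empty)
```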
I have a situation like this (T-SQL):
Table 1: dbo.Printers
EmulationID  EmulationDescription  PrinterID  Name
34,15,2      NULL                  12         HP 1234
15,2         NULL                  13         IBM 321
15           NULL                  14         XYZ
Table 2: dbo.Emulations
EmulationID  Description
34           HP
15           IBM
2            Dell
EmulationID column in dbo.Printers table is nvarchar/unicode string datatype, and integer datatype in the dbo.Emulations table.
Now I have to UPDATE the **EmulationDescription** column in the dbo.Printers table using a lookup on the dbo.Emulations table through the EmulationID column.
I need to get data like this in the dbo.Printers table:
EmulationID  EmulationDescription  PrinterID  Name
34,15,2      HP,IBM,Dell           12         HP 1234
15,2         IBM,Dell              13         IBM 321
15           IBM                   14         XYZ
Can someone explain in detail how to resolve this?
I created the user-defined function dbo.ParseIdListToTable to convert string data in one row into multiple rows. However, I do not know how to proceed from there, i.e. how exactly to join and then update.
Any suggestion will be greatly appreciated.
You could do something like this:
CREATE FUNCTION [dbo].[CSVToTable] (@InStr VARCHAR(MAX))
RETURNS @TempTab TABLE
    (id int not null)
AS
BEGIN
    -- Ensure input ends with comma
    SET @InStr = REPLACE(@InStr + ',', ',,', ',')
    DECLARE @SP INT
    DECLARE @VALUE VARCHAR(1000)
    WHILE PATINDEX('%,%', @InStr) <> 0
    BEGIN
        SELECT @SP = PATINDEX('%,%', @InStr)
        SELECT @VALUE = LEFT(@InStr, @SP - 1)
        SELECT @InStr = STUFF(@InStr, 1, @SP, '')
        INSERT INTO @TempTab(id) VALUES (@VALUE)
    END
    RETURN
END
GO
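-- A variable cannot be assigned inside a subquery, so the descriptions are
-- concatenated with FOR XML PATH('') and the leading comma is removed with STUFF.
-- Note that the order of the concatenated descriptions is not guaranteed.
SELECT P.EmulationID,
       STUFF((SELECT ',' + E.Description
              FROM dbo.Emulations E
              WHERE E.EmulationID IN (SELECT id FROM dbo.CSVToTable(P.EmulationID))
              FOR XML PATH('')), 1, 1, '') AS [Emulation Description],
       P.PrinterID,
       P.Name
FROM dbo.Printers P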
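The split-then-lookup step can be easier to see outside SQL. The following Python sketch mirrors it on the sample data from the question. The function name describe and the list/dict layout are hypothetical, and it keeps the descriptions in the order in which the IDs appear in the comma-separated list.

```python
# Lookup table corresponding to dbo.Emulations.
emulations = {34: "HP", 15: "IBM", 2: "Dell"}

# Rows corresponding to dbo.Printers before the update.
printers = [
    {"EmulationID": "34,15,2", "EmulationDescription": None, "PrinterID": 12, "Name": "HP 1234"},
    {"EmulationID": "15,2", "EmulationDescription": None, "PrinterID": 13, "Name": "IBM 321"},
    {"EmulationID": "15", "EmulationDescription": None, "PrinterID": 14, "Name": "XYZ"},
]


def describe(id_list, lookup):
    """Turn a comma-separated id string into a comma-separated description string."""
    ids = [int(part) for part in id_list.split(",") if part.strip()]
    # IDs without a matching description are skipped rather than raising an error.
    return ",".join(lookup[i] for i in ids if i in lookup)


# The "update": fill in EmulationDescription for every printer row.
for printer in printers:
    printer["EmulationDescription"] = describe(printer["EmulationID"], emulations)

descriptions = [p["EmulationDescription"] for p in printers]
print(descriptions)
```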
Is there a way to delete all duplicate rows and the original entry in either excel or access?
I need to delete whole rows that match in 3 columns. Here is a visual (Bottom table is what the table should become; in this case the duplicates + original with the same Part number, manufacturer and manufacture number are deleted):
This seems to work for me in Access:
DELETE FROM parts
WHERE EXISTS
(
SELECT p2.[PART NUMBER], p2.[MANUFACTURER], p2.[MANUFACTURER NUMBER]
FROM parts p2
WHERE parts.[PART NUMBER] = p2.[PART NUMBER]
AND parts.[MANUFACTURER] = p2.[MANUFACTURER]
AND parts.[MANUFACTURER NUMBER] = p2.[MANUFACTURER NUMBER]
GROUP BY p2.[PART NUMBER], p2.[MANUFACTURER], p2.[MANUFACTURER NUMBER]
HAVING COUNT(*) > 1
)
When I run it on my test data...
PART NUMBER MANUFACTURER QUALITY MANUFACTURER NUMBER
----------- ------------ ------- -------------------
123 GORD 1 750
123 OTHER 3 321
123 OTHER 4 321
...it deletes the two "OTHER" rows but leaves the "GORD" row alone.
DELETE * FROM MyTable WHERE PartNumber in (SELECT MyTable.PartNumber
FROM MyTable
GROUP BY MyTable.PartNumber
HAVING (((Sum(1))>1)));
This should do it for you. It checks all three fields, and deletes the original and all duplicates.
DELETE
parts.*
FROM parts
WHERE (( ((SELECT Count (*)
FROM parts AS P
WHERE ( P.partnum & P.manf & P.manfnum =
parts.partnum & parts.manf & parts.manfnum )
AND ( P.partnum <= parts.partnum ))) > 1 ));
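The rule all of these queries implement is "count the rows per key, then keep only the rows whose key occurs exactly once". A minimal Python sketch of that rule on the test data above follows. The list-of-dicts layout and the helper key_of are assumptions made for this illustration.

```python
from collections import Counter

parts = [
    {"PART NUMBER": 123, "MANUFACTURER": "GORD", "QUALITY": 1, "MANUFACTURER NUMBER": 750},
    {"PART NUMBER": 123, "MANUFACTURER": "OTHER", "QUALITY": 3, "MANUFACTURER NUMBER": 321},
    {"PART NUMBER": 123, "MANUFACTURER": "OTHER", "QUALITY": 4, "MANUFACTURER NUMBER": 321},
]

# The three columns that decide whether two rows are duplicates.
KEY_COLUMNS = ("PART NUMBER", "MANUFACTURER", "MANUFACTURER NUMBER")


def key_of(row):
    """Build the comparison key for one row."""
    return tuple(row[column] for column in KEY_COLUMNS)


# Count how often each key occurs, then drop every row whose key is repeated,
# which removes the duplicates together with the original entry.
counts = Counter(key_of(row) for row in parts)
remaining = [row for row in parts if counts[key_of(row)] == 1]

print(remaining)
```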
Can you give me a query that combines the row values, which are of type varchar, into a single column with any delimiter?
e.g
table with 2 columns
col1 |col2
1 | m10
1 | m31
2 | m20
2 | m50
now i want output as
col1| col2
1|m10:m31
2|m20:m50
Do you always have matched pairs, no more no less?
select
    col1,
    count(*)
from tablename
group by col1
having count(*) <> 2
would give you zero results?
If so, you can just self join:
declare @delimiter varchar(1)
set @delimiter = ':'
select
    t1.col1, t1.col2 + @delimiter + t2.col2
from tablename t1
inner join tablename t2
    on t1.col1 = t2.col1
    and t1.col2 < t2.col2 -- "<" rather than "<>" so each pair appears only once
One way to do that is using cursors.
With the cursor you can fetch a row at a time!
Pseudo-code would be:
if actual_col1 = last_col1
then
    col2_value = col2_value + delimiter + actual_col2
else
    insert into #temptable values (last_col1, col2_value)
    col2_value = actual_col2
end
Check HERE to know how to use them.
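The row-at-a-time idea behind the cursor approach can be sketched in Python as follows. This is only an illustration of the loop logic, not cursor code for any particular database. The function name concat_by_key is hypothetical, and the rows are assumed to arrive sorted by col1.

```python
rows = [(1, "m10"), (1, "m31"), (2, "m20"), (2, "m50")]


def concat_by_key(sorted_rows, delimiter=":"):
    """Walk rows sorted by col1 and join the col2 values of each run of equal col1."""
    result = []
    have_group = False
    last_col1 = None
    parts = []
    for actual_col1, actual_col2 in sorted_rows:
        if have_group and actual_col1 == last_col1:
            # Same key as the previous row: extend the current group.
            parts.append(actual_col2)
        else:
            # Key changed: emit the finished group (if any) and start a new one.
            if have_group:
                result.append((last_col1, delimiter.join(parts)))
            last_col1, parts, have_group = actual_col1, [actual_col2], True
    # Emit the final group, which the loop itself never flushes.
    if have_group:
        result.append((last_col1, delimiter.join(parts)))
    return result


combined = concat_by_key(sorted(rows))
print(combined)
```

The final flush after the loop is the step that is easy to forget when writing the cursor version.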
Use this solution:
SELECT col1, list(col2, ':') AS col2 FROM table_name GROUP BY col1;
Please use the below logic, the table #t1 will be the final table.
create table #t123(a char(2), b char(2))
go
create table #t1(a char(2), c char(100) default '')
go
Insert into #t123 values ('a','1')
Insert into #t123 values ('a','2')
Insert into #t123 values ('a','3')
Insert into #t123 values ('b','1')
Insert into #t123 values ('c','1')
Insert into #t123 values ('d','1')
Insert into #t123 values ('d','1')
go
insert into #t1 (a) Select distinct a from #t123
go
Select distinct row_id = identity(8), a into #t1234 from #t123
go
Declare @a int, @b int, @c int, @d int, @e int, @f char(2), @g char(2), @h char(2)
Select @a = min(row_id), @b = max(row_id) from #t1234
While @a <= @b
Begin
    Select @f = a, @h = '', @g = '' from #t1234 where row_id = @a
    Update #t1 set c = '' where a = @f
    Select row_id = identity(8), b into #t12345 from #t123 where a = @f
    Select @c = min(row_id), @d = max(row_id) from #t12345
    While @c <= @d
    begin
        Select @g = b from #t12345 where row_id = @d
        Update #t1 set c = @g + ' ' + c where a = @f --change delimiter
        Select @d = @d - 1
    End
    Drop table #t12345
    Select @a = @a + 1
End
go
Select * from #t1 -- final table with transposed values