Using result of a set in a select query in python - python-3.x

I have a select query to fetch specific columns from a mysql table. These specific columns must come from the intersection of the two sets.
Suppose I have two sets
A = {'age', 'description', 'name', 'payment_mode', 'id'}
# has all the column names of the table
and
B={'name', 'id', 'class'}
# input column names coming from a file (where values can be changed)
Now I do
inter= A.intersection(B)
# would result in {'name', 'id'}
So I would want the select query to be select {inter} from customers.
How can I write a dynamic query so that the column names come from the result of the intersection?

Since you already have the desired columns in the set inter, all you need to do is construct the query string.
In Python 3.6 and above, you can use f-strings to generate the query as follows:
query = f"select {', '.join(inter)} from customers"
Or, you can do normal concatenation as follows:
query = 'select ' + ', '.join(inter) + ' from customers'
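Both cases produce a query such as "select name, id from customers" when inter is {'name', 'id'}. Sets are unordered, so the columns may equally come out as "id, name"; use sorted(inter) if you need a stable order.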
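Column names cannot be passed as query parameters, so they have to be interpolated into the SQL text. A slightly more defensive sketch of the same idea is shown here; the helper name build_select is hypothetical, and the table name customers is taken from the question. It only ever interpolates names that exist in the known column set and sorts them for a stable order:

```python
def build_select(table_columns, requested_columns, table="customers"):
    """Build a SELECT over the requested columns that really exist in the table."""
    # Only names present in table_columns survive the intersection, so nothing
    # from the input file reaches the SQL text unless it is a known column.
    inter = sorted(set(table_columns) & set(requested_columns))
    if not inter:
        raise ValueError("none of the requested columns exist in the table")
    return f"select {', '.join(inter)} from {table}"


A = {'age', 'description', 'name', 'payment_mode', 'id'}  # columns of the table
B = {'name', 'id', 'class'}                               # names read from the file
query = build_select(A, B)
print(query)
```

The resulting string can then be passed to cursor.execute(query) as usual.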

Related

Python SQLite 3 query multiple query

I have a problem building a query for Python SQLite3 that does the following:
Count a word which appears in the columns; if the word appears more than once, count it only once.
I've attached a picture to illustrate my table format.
I tried this but the result still counts duplicate values with same ID.
"SELECT id, value, count(value) FROM table WHERE type like'%hi%' GROUP BY value ORDER BY COUNT(*)<1 DESC"
The result needs to be like:
Hi, all you need can be achieved with a GROUP BY clause.
This should help:
SELECT
id
,value
,1 AS cnt
FROM table
GROUP BY id, value
ORDER BY id
What you're looking for is the DISTINCT clause, or GROUP BY as mentioned by Peter.
For GROUP BY use this syntax:
SELECT
id
,value
,1 AS cnt
FROM table
GROUP BY id, value
For DISTINCT use this one:
SELECT DISTINCT
id
,value
,1 AS cnt
FROM table
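To try both variants from Python, here is a minimal sketch using the standard sqlite3 module. The table name words and the sample rows are made up for illustration, since the original picture of the table is not available. The last query is an assumption about the goal: it counts, per word, the number of distinct ids it occurs in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO words (id, value) VALUES (?, ?)",
    [(1, "hi"), (1, "hi"), (1, "bye"), (2, "hi")],  # (1, 'hi') is duplicated
)

# DISTINCT collapses repeated (id, value) pairs into one row
distinct_rows = conn.execute(
    "SELECT DISTINCT id, value, 1 AS cnt FROM words ORDER BY id, value"
).fetchall()

# GROUP BY id, value gives the same result
grouped_rows = conn.execute(
    "SELECT id, value, 1 AS cnt FROM words GROUP BY id, value ORDER BY id, value"
).fetchall()

# Assumed goal: count each word once per id
per_value = conn.execute(
    "SELECT value, COUNT(DISTINCT id) FROM words GROUP BY value ORDER BY value"
).fetchall()

print(distinct_rows)
print(per_value)
```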

SQL - Insert a row into table only if not exists and count is less than a threshold

Other similar questions deal with these two problems separately, but I want to handle both in a single statement. I'm using Python 3 with the psycopg2 library on a PostgreSQL database.
I have a table with 3 fields: ID, name, bool (BIT)
I want to insert a row into this table only if there is no other row with the same 'name' and 'bool' = 0, and the total count of rows with the same ID is less than a given threshold.
More specifically, my table should contain at most a given threshold number of rows with the same ID. Those rows can have the same 'name', but only one of the rows with the same ID and the same 'name' can have 'bool' = 0.
I tried with this:
INSERT INTO table
SELECT 12345, abcdf , 0 FROM table
WHERE NOT EXISTS(
SELECT * FROM table
WHERE ID = 12345 AND name = abcdf AND bool = 0)
HAVING (SELECT count(*) FROM table
WHERE ID = 12345) < threshold
RETURNING ID;
but the row is inserted anyway.
Then I tried the same statement replacing 'HAVING' with 'AND', but it inserts all the threshold rows together.
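One likely cause of the multiple inserts is the FROM table in the outer SELECT: it makes the SELECT return one row per existing row in the table. A possible shape for the statement is a SELECT of constants without FROM, with both conditions in the WHERE clause. The sketch below is not tested against PostgreSQL; it uses the standard sqlite3 module as a stand-in so it can run anywhere, and the table name items, the column name flag (instead of bool), and the helper insert_if_allowed are all hypothetical. With psycopg2 the named placeholders would be written %(id)s instead of :id.

```python
import sqlite3

INSERT_SQL = (
    "INSERT INTO items (id, name, flag) "
    "SELECT :id, :name, 0 "  # constants only, no FROM: at most one row
    "WHERE NOT EXISTS ("
    "  SELECT 1 FROM items WHERE id = :id AND name = :name AND flag = 0"
    ") "
    "AND (SELECT COUNT(*) FROM items WHERE id = :id) < :threshold"
)


def insert_if_allowed(conn, row_id, name, threshold):
    """Insert the row if both conditions hold; return True when it was inserted."""
    before = conn.total_changes
    conn.execute(INSERT_SQL, {"id": row_id, "name": name, "threshold": threshold})
    conn.commit()
    return conn.total_changes - before == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, flag INTEGER)")
results = [
    insert_if_allowed(conn, 12345, "abcdf", 2),  # table is empty
    insert_if_allowed(conn, 12345, "abcdf", 2),  # same name with flag = 0 exists
    insert_if_allowed(conn, 12345, "other", 2),  # second row for this id
    insert_if_allowed(conn, 12345, "third", 2),  # threshold of 2 already reached
]
rows = conn.execute("SELECT id, name, flag FROM items ORDER BY name").fetchall()
print(results, rows)
```

Note that with concurrent writers the check and the insert can still race, so some form of locking or a constraint may be needed on top of this.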

Microsoft Excel Power Query: Select columns that contain strings from a string list

Background
I have a dataset with 10,000+ variables as column headers, which I want to reduce to the amount needed. I know how to select a sample of columns by listing columns that contain manually specified strings, say "glu" and "pep", that the columns must contain in order to be selected. This is the M code used to select the sample columns:
let
Source = Excel.CurrentWorkbook(){[Name="data"]}[Content],
ColumnsToSelect = List.Select(Table.ColumnNames(Source), each Text.Contains(_, "glu") or Text.Contains(_, "pep")),
SelectColumns = Table.SelectColumns(Source, ColumnsToSelect)
in
SelectColumns
This Power Query produces a table that I call "Data". Since I want to select columns based on multiple strings they must contain, I have made a dynamic list of strings that I have called "Outcomes". I want my Power Query to utilize this list of strings when choosing what columns to select.
Question
Is it possible to get my Power Query to utilize this dynamic list in the List.Select() or Table.SelectColumns() function or any other function, that will make my Power Query select only the columns that contain the strings on the list?
Use these lines:
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
Source2 = Excel.CurrentWorkbook(){[Name="Outcomes"]}[Content],
Outcomes = Source2[Outcomes],
UnpivotedColumns = Table.UnpivotOtherColumns(Source, {}, "ColumnNames", "Filters"),
FilteredRows = Table.SelectRows(UnpivotedColumns, each List.AnyTrue(List.Transform(Outcomes, (substring) => Text.Contains([Filters], substring)))),
ColumnNames = List.Sort(List.Distinct(FilteredRows[ColumnNames]),Order.Ascending),
SelectColumns = Table.SelectColumns(Source,ColumnNames)
in
SelectColumns
The magic is in this line:
FilteredRows = Table.SelectRows(UnpivotedColumns, each List.AnyTrue(List.Transform(Outcomes, (substring) => Text.Contains([Filters], substring)))),
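For readers who think more easily in Python, the "contains any string from the list" test that List.AnyTrue and List.Transform express can be sketched as follows. This is only an illustration of the matching logic, applied directly to column names; the function name select_columns and the sample column names are invented.

```python
def select_columns(column_names, outcomes):
    """Return the column names that contain at least one of the outcome strings."""
    return [
        name for name in column_names
        if any(substring in name for substring in outcomes)
    ]


columns = ["id", "glucose_t0", "peptide_a", "height", "c_pep_ratio"]
outcomes = ["glu", "pep"]  # plays the role of the dynamic "Outcomes" list
selected = select_columns(columns, outcomes)
print(selected)
```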

WHERE variable = ( subquery ) in OpenSQL

I'm trying to retrieve rows from a table where a subquery matches a variable. However, it seems as if the WHERE clause only lets me compare fields of the selected tables against a constant, variable or subquery.
I would expect to write something like this:
DATA(lv_expected_lines) = 5.
SELECT partner contract_account
INTO TABLE lt_bp_ca
FROM table1 AS tab1
WHERE lv_expected_lines = (
SELECT COUNT(*)
FROM table2
WHERE partner = tab1~partner
AND contract_account = tab1~contract_account ).
But obviously this select treats my local variable as a field name and it gives me the error "Unknown column name "lv_expected_lines" until runtime, you cannot specify a field list."
But in standard SQL this is perfectly possible:
SELECT PARTNER, CONTRACT_ACCOUNT
FROM TABLE1 AS TAB1
WHERE 5 = (
SELECT COUNT(*)
FROM TABLE2
WHERE PARTNER = TAB1.PARTNER
AND CONTRACT_ACCOUNT = TAB1.CONTRACT_ACCOUNT );
So how can I replicate this logic in RSQL / Open SQL?
If there's no way I'll probably just write native SQL and be done with it.
The program below might lead you to an Open SQL solution. It uses the SAP demo tables to determine the plane types that are used on a specific number of flights.
REPORT zgertest_sub_query.
DATA: lt_planetypes TYPE STANDARD TABLE OF s_planetpp.
PARAMETERS: p_numf TYPE i DEFAULT 62.
START-OF-SELECTION.
SELECT planetype
INTO TABLE lt_planetypes
FROM sflight
GROUP BY planetype
HAVING COUNT( * ) EQ p_numf.
LOOP AT lt_planetypes INTO DATA(planetype).
WRITE: / planetype.
ENDLOOP.
It only works if you don't need to read fields from TAB1. If you do you will have to gather these with other selects while looping at your results.
For those who found this question in 2020: this construction has been supported since ABAP 7.50, so no workarounds are needed:
SELECT kunnr, vkorg
FROM vbak AS v
WHERE 5 = ( SELECT COUNT(*)
FROM vbap
WHERE kunnr = v~kunnr
AND vkorg = v~vkorg )
INTO TABLE @DATA(customers).
This selects all customers who made 5 sales orders within some sales organization.
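The comparison of a value against a correlated COUNT subquery can be tried outside SAP as well. The sketch below runs the standard SQL form from the question against an in-memory SQLite database through Python's sqlite3 module; the sample rows and the expected count of 2 are made up for illustration, and this says nothing about ABAP behaviour itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (partner TEXT, contract_account TEXT);
CREATE TABLE table2 (partner TEXT, contract_account TEXT);
INSERT INTO table1 VALUES ('P1', 'CA1'), ('P2', 'CA2');
INSERT INTO table2 VALUES ('P1', 'CA1'), ('P1', 'CA1'), ('P2', 'CA2');
""")

expected_lines = 2  # plays the role of lv_expected_lines

# The bound value sits on the left-hand side of the comparison,
# the correlated subquery on the right-hand side.
rows = conn.execute(
    """
    SELECT partner, contract_account
    FROM table1 AS tab1
    WHERE ? = (
        SELECT COUNT(*)
        FROM table2
        WHERE table2.partner = tab1.partner
          AND table2.contract_account = tab1.contract_account
    )
    """,
    (expected_lines,),
).fetchall()
print(rows)
```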
In ABAP there is no way to do the query as in NATIVE SQL.
I would advise against using NATIVE SQL; instead, give the SELECT/ENDSELECT statement a try.
DATA: ls_table1 type table1,
lt_table1 type table of table1,
lv_count type i.
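" Classic Open SQL syntax: no commas in the field list
SELECT partner contract_account
INTO CORRESPONDING FIELDS OF ls_table1
FROM table1.
" Count the matching rows in table2 for the current row of table1
SELECT COUNT(*)
INTO lv_count
FROM table2
WHERE partner = ls_table1-partner
AND contract_account = ls_table1-contract_account.
" CHECK skips to the next loop pass when the count differs
CHECK lv_count EQ 5.
APPEND ls_table1 TO lt_table1.
ENDSELECT.
Here you append to lt_table1 only those rows for which the count in table2 equals 5.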
Hope it helps.

how to join two or more tables and result set having all distinct values

I have some 20 Excel files containing data. All the tables have the same columns, such as id, name, age, location, etc. Each file has distinct data, but I don't know whether data in one file is repeated in another file. So I want to join all the files, and the result set should contain distinct values. Please help me out with this problem as soon as possible. I want the result set to be stored in an Access database.
I would recommend either linking the sheets in Access, or importing the sheets as tables.
From there, use a DISTINCT select over the tables/sheets to determine the required keys, and select only the records you need.
In SQL, you can use JOIN or NATURAL JOIN to join tables. I would look into NATURAL JOIN since you said all tables have the same columns.
After that you can use DISTINCT to get distinct values.
I'm not sure if this is what you're looking for though: your question asks about Excel but you've tagged it with SQL.
If you can use all the tables in one query, you can use a union to get the distinct rows:
select id, name, age, location from Table1
union
select id, name, age, location from Table2
union
select id, name, age, location from Table3
union
...
You can insert the records directly from the result:
insert into ResultTable
select id, name, age, location from Table1
union
....
If you can only select from one table at a time, you can skip inserting rows that are already in the table:
insert into ResultTable
select t.id, t.name, t.age, t.location from Table1 as t
left join ResultTable as r on r.id = t.id
where r.id is null
(Assuming that id is a unique field identifying the record.)
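The UNION approach is easy to check on a small scale. The sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for Access; the sample rows are invented, and the table names follow the answer above. Row (2, 'Bob') appears in both source tables but only once in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER, name TEXT, age INTEGER, location TEXT);
CREATE TABLE Table2 (id INTEGER, name TEXT, age INTEGER, location TEXT);
CREATE TABLE ResultTable (id INTEGER, name TEXT, age INTEGER, location TEXT);
INSERT INTO Table1 VALUES (1, 'Ann', 30, 'Oslo'), (2, 'Bob', 41, 'Rome');
INSERT INTO Table2 VALUES (2, 'Bob', 41, 'Rome'), (3, 'Cy', 25, 'Lima');
""")

# UNION (unlike UNION ALL) removes duplicate rows before the insert
conn.execute("""
INSERT INTO ResultTable
SELECT id, name, age, location FROM Table1
UNION
SELECT id, name, age, location FROM Table2
""")

rows = conn.execute("SELECT id, name FROM ResultTable ORDER BY id").fetchall()
print(rows)
```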
It seems the unique set of data you want is this:
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
].[Sheet1$] AS T1
UNION
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
].[Sheet1$] AS T1
...but that you then want to arbitrarily apply a sequence of integers as id (rather than using the id values from the Excel tables).
Because Access Database Engine does not support common table expressions and Excel does not support VIEWs, you will have to repeat that UNION query as derived tables (hopefully the optimizer will recognize the repeat?) e.g. using a correlated subquery to get the row number:
SELECT (
SELECT COUNT(*) + 1
FROM (
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
].[Sheet1$] AS T1
UNION
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
].[Sheet1$] AS T1
) AS DT1
WHERE DT1.name < DT2.name
) AS id,
DT2.name, DT2.loc
FROM (
SELECT T2.name, T2.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;
].[Sheet1$] AS T2
UNION
SELECT T2.name, T2.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;
].[Sheet1$] AS T2
) AS DT2;
Note: "i want the result set to be stored in an access database"
Then maybe you should migrate the Excel data into a staging table in your Access database and do the data scrubbing from there. At least you could put that derived table into a VIEW :)
A join combines two tables by matching the values in corresponding columns. The result is a merged table that consists of the first table plus the matched rows copied from the second table. You can use the DIGBD add-in for Excel.
