How do you specify column names with psycopg2 - python-3.x

I have a SQL statement such as:
INSERT INTO my_table (col1, col2, col3) VALUES (1,2,3)
I am using psycopg2 to insert data as follows:
cur.execute(
    sql.SQL("INSERT INTO {} VALUES (%s, %s, %s)").format(sql.Identifier('my_table')),
    [1, 2, 3]
)
I don't see how to specify column names in the INSERT statement, though. The sql.SQL above assumes that 1, 2, 3 are in the order of col1, col2, and col3. For instance, when I want to insert only col3, how would I specify the column name with sql.SQL?

execute just runs the SQL you give it, so you can list the columns as in a standard PostgreSQL INSERT statement:
INSERT INTO TABLE_ABC (col_name_1, col_name_2, col_name_3) VALUES (1, 2, 3)

Related

How to get top 3 columns and their values across multiple columns (dynamically) in BigQuery

I have a table that looks like this
select 'Alice' AS ID, 1 as col1, 3 as col2, -2 as col3, 9 as col4
union all
select 'Bob' AS ID, -9 as col1, 2 as col2, 5 as col3, -6 as col4
I would like to get the top 3 absolute values for each record across the four columns and then format the output as a dictionary or STRUCT like below
select
'Alice' AS ID, [STRUCT('col4' AS column, 9 AS value), STRUCT('col2',3), STRUCT('col3',-2)] output
union all
select
'Bob' AS ID, [STRUCT('col1' AS column, -9 AS value), STRUCT('col4',-6), STRUCT('col3',5)]
output
I would like it to be dynamic, to avoid writing out columns individually; there could be up to 100 columns, and they change.
For more context, I am trying to get the top three features from the batch local explanations output in Vertex AI
https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions
I have looked up some examples and would like something similar to the second answer here: How to get max value of column values in a record? (BigQuery)
EDIT: the data is actually structured like this. If this is easier to work with, it would be a better option to start from:
select 'Alice' AS ID, STRUCT(1 as col1, 3 as col2, -2 as col3, 9 as col4) AS featureAttributions
union all
SELECT 'Bob' AS ID, STRUCT(-9 as col1, 2 as col2, 5 as col3, -6 as col4) AS featureAttributions
Consider below query.
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM sample_table UNPIVOT (value FOR column IN (col1, col2, col3, col4))
)
GROUP BY ID;
Query results
Dynamic Query
I would like it to be dynamic, so avoid writing out columns individually
You need dynamic SQL for this. Following the answer from Mikhail that you linked in the post, you can write a dynamic query like the one below.
EXECUTE IMMEDIATE FORMAT("""
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM sample_table UNPIVOT (value FOR column IN (%s))
)
GROUP BY ID
""", ARRAY_TO_STRING(
REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT AS STRUCT * EXCEPT (ID) FROM sample_table LIMIT 1)), r'"([^,{]+)":'), ',')
);
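The TO_JSON_STRING/REGEXP_EXTRACT_ALL trick above builds the column list by scraping the keys out of one row serialized as JSON. The same idea can be illustrated in Python (the sample row mirrors the question's data; names are just for illustration):

```python
import json
import re

# One row of the sample table, serialized the way TO_JSON_STRING would do it.
row = {"ID": "Alice", "col1": 1, "col2": 3, "col3": -2, "col4": 9}
without_id = {k: v for k, v in row.items() if k != "ID"}  # SELECT AS STRUCT * EXCEPT (ID)
json_str = json.dumps(without_id, separators=(",", ":"))  # '{"col1":1,"col2":3,...}'

# Same pattern as REGEXP_EXTRACT_ALL(..., r'"([^,{]+)":')
cols = re.findall(r'"([^,{]+)":', json_str)
column_list = ",".join(cols)                              # ARRAY_TO_STRING(..., ',')
print(column_list)  # col1,col2,col3,col4
```

The resulting comma-separated string is what FORMAT substitutes into the UNPIVOT column list.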
For updated sample table
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM (SELECT ID, featureAttributions.* FROM sample_table)
UNPIVOT (value FOR column IN (col1, col2, col3, col4))
)
GROUP BY ID;
EXECUTE IMMEDIATE FORMAT("""
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM (SELECT ID, featureAttributions.* FROM sample_table)
UNPIVOT (value FOR column IN (%s))
)
GROUP BY ID
""", ARRAY_TO_STRING(
REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT featureAttributions FROM sample_table LIMIT 1)), r'"([^,{]+)":'), ',')
);

Column list specification in INSERT OVERWRITE statement

While trying to OVERWRITE a Hive table with specific columns from Spark (PySpark) using a dataframe, I get the error below:
pyspark.sql.utils.ParseException:
mismatched input 'col1' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'} (line 1, pos 36)

== SQL ==
insert OVERWRITE table DB.TableName (Col1, Col2, Col3) select Col1, Col2, Col3 FROM dataframe
------------------------------------^^^
Based on https://issues.apache.org/jira/browse/HIVE-9481 it looks like a column list is still not supported in INSERT OVERWRITE, but running without the OVERWRITE keyword still gives me the same error:
sparkSession.sql("insert into table DB.TableName (Col1, Col2, Col3) select Col1, Col2, Col3 FROM dataframe")
Note: the above works fine when the column list is not specified and the columns between the tables match.
Trying the same via the Hive terminal, however, goes through fine:
INSERT INTO TABLE DB.TableName (Col1, Col2, Col3) select Col1, Col2, Col3 from DB.TableName2;
Should any property or configuration be set or passed through spark-submit?
Please let me know if you need more data or information.
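Since the column-list syntax is what Spark's parser rejects here, a common workaround (a sketch, not a confirmed fix for this Spark version) is to drop the column list entirely: reorder the dataframe's columns to match the target table's schema and use insertInto. The reordering step, illustrated in plain Python with assumed column names:

```python
# Assumed target schema, matching the question's example.
table_columns = ["Col1", "Col2", "Col3"]

# A record whose fields arrive in a different order, as a dict.
row = {"Col3": 3, "Col1": 1, "Col2": 2}
ordered = [row[c] for c in table_columns]
print(ordered)  # [1, 2, 3]

# With a real SparkSession/dataframe, the equivalent would be (sketch):
# df.select(*table_columns).write.insertInto("DB.TableName", overwrite=True)
```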

Python and SQLite3: how to write fewer values than a table has columns

I am using Python 3 and SQLite3. I have a table with 6 columns, but I am only inserting values into 4 of them.
execute("INSERT INTO VALUES (?, ?, ?, ?)", [list_name])
The error I get:
ERROR: table blacklisted_ips has 6 columns but 4 values were supplied
list_name = [(24, 'somestring', '2018/05/06', '2018-05-13 10:16:34'), (24, 'somestring', '2018/05/06', '2018-05-13 10:16:34')]
So the question is how do I write to only 4 of the columns?
Specify the columns you care about.
INSERT INTO sometable
(foo, bar, baz, quux)
VALUES
(42, 420, 3.14, 0)
The SQLite page for the INSERT operation specifies:
If a column-name list is specified, then the number of values in each term of the VALUES list must match the number of specified columns. Each of the named columns of the new row is populated with the results of evaluating the corresponding VALUES expression.
What you want would be:
INSERT INTO blacklisted_ips (col1, col2, col3, col4) VALUES (?, ?, ?, ?)
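A self-contained sketch with the stdlib sqlite3 module (the table name comes from the error message; the column names and the two unused columns are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Six columns, as in the error message; the names here are made up.
cur.execute("""CREATE TABLE blacklisted_ips
               (id INTEGER, name TEXT, added TEXT, last_seen TEXT,
                extra1 TEXT, extra2 TEXT)""")

list_name = [(24, 'somestring', '2018/05/06', '2018-05-13 10:16:34'),
             (24, 'somestring', '2018/05/06', '2018-05-13 10:16:34')]

# Naming the four target columns lets the remaining two default to NULL.
cur.executemany(
    "INSERT INTO blacklisted_ips (id, name, added, last_seen) VALUES (?, ?, ?, ?)",
    list_name)
conn.commit()
```

Note executemany here: list_name is a list of row tuples, so passing it to a single execute would also fail.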

ERROR CASSANDRA: 'ascii' codec can't decode byte 0xe1 in position 27: ordinal not in range(128) cqlsh

I'm new to Cassandra and I'm having trouble inserting some rows in a database, getting the error in the title.
I use cassandra 1.0.8 and cqlsh for doing changes in my database.
These are the steps I take before I get the error:
CREATE A COLUMN FAMILY
CREATE TABLE test (
col1 int PRIMARY KEY,
col2 bigint,
col3 boolean,
col4 timestamp
);
INSERT SEVERAL ROWS WITHOUT SPECIFYING ALL OF THE COLUMNS OF THE TABLE
insert into test (col1, col2, col3) values (1, 100, true);
insert into test (col1, col2, col3) values (2, 200, false);
SELECT FOR CHECKING THAT ROWS HAVE BEEN INSERTED CORRECTLY
select * from test;
The result is the following:
INSERT A ROW SPECIFYING A VALUE FOR col4 (NOT SPECIFIED BEFORE)
insert into test (col1, col2, col3, col4) values (3, 100, true, '2011-02-03');
SELECT FOR CHECKING THAT ROW HAS BEEN INSERTED CORRECTLY
select * from test;
In this SELECT is the error. The result is the following:
SELECT EACH COLUMN OF THE TABLE SEPARATELY
select col1 from test;
select col2 from test;
select col3 from test;
select col4 from test;
it works fine and shows the right values:
Then, my question is: what's the problem in the first SELECT? what's wrong?
Thanks in advance!!
NOTE:
If I define col4 as an Integer rather than a timestamp, it works. However, I've tried inserting col4 in the normalized format yyyy-mm-dd HH:mm (with '2011-02-03 01:05' and with '2011-02-03 01:05:10') and it doesn't work.
Cassandra 1.0.8 shipped with CQL2, and that's where your problem is coming from. I managed to recreate this in 1.0.8, but it works fine with 1.2.x, so my advice is to upgrade if you can.
In C* 1.2.10
cqlsh> update db.user set date='2011-02-03 01:05' where user='JCTYpjJlM';
cqlsh> SELECT * from db.user ;
user | date | password
-----------+--------------------------+----------
xvkYQKerQ | null | 765
JCTYpjJlM | 2011-02-03 01:05:00+0200 | 391
@mol
Weird. Try inserting col4 as an Integer (convert to milliseconds first) or use the normalized format yyyy-mm-dd HH:mm.
According to the doc here, you can omit the time and just input the date, but it seems that breaks something in your case.
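The "convert to milliseconds first" suggestion can be sketched in Python: Cassandra timestamps are epoch milliseconds, so the date can be supplied as a plain integer (the table and values match the question; the time zone is assumed UTC for this sketch):

```python
from datetime import datetime, timezone

# 2011-02-03 01:05, taken as UTC for this sketch.
dt = datetime(2011, 2, 3, 1, 5, tzinfo=timezone.utc)
millis = int(dt.timestamp() * 1000)
print(millis)  # 1296695100000

# The insert would then be (sketch):
# insert into test (col1, col2, col3, col4) values (3, 100, true, 1296695100000);
```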

SELECT statement from two dataframes using RMySQL

Consider two data frames, dataFrame1 and dataFrame2:
dataFrame1 has N columns (colmn1, ..., colmnN)
dataFrame2 has 3 columns (col1, col2, col3)
Can I write a statement like:
Select colmn1, colmn2, ..., colmnN, col1, col2 from dataFrame1, dataFrame2
using RMySQL?
Maybe you want the sqldf package instead.
Try this:
library("sqldf")
sqldf("select colmn1, colmn2, ..., colmnN, col1, col2 from dataFrame1, dataFrame2")
Of course, you must replace ... with the actual column names.