U-SQL CREATE TABLE statement failing - Azure

I'm trying to create a U-SQL table from two tables using CREATE TABLE AS SELECT (CTAS), as below:
DROP TABLE IF EXISTS tpch_query2_result;
CREATE TABLE tpch_query2_result
(
INDEX idx_query2
CLUSTERED(P_PARTKEY ASC)
DISTRIBUTED BY HASH(P_PARTKEY)
) AS
SELECT
a.P_PARTKEY
FROM part AS a INNER JOIN partsupp AS b ON a.P_PARTKEY == b.PS_PARTKEY;
But when I run the U-SQL query, I get the error below:
E_CSC_USER_QUALIFIEDCOLUMNNOTFOUND: Column 'P_PARTKEY' not found in rowset 'a'.
Line 11
E_CSC_USER_QUALIFIEDCOLUMNNOTFOUND: Column 'PS_PARTKEY' not found in rowset 'b'.
I'm not sure what is causing this. Can someone provide some insight into this error? Thanks.

The error normally indicates that the specified column does not exist in the rowset referenced by a (i.e., part) or b (i.e., partsupp). What is the schema of each of these tables? Do they have columns with the expected names?

Is there an equivalent query in CQL/Cassandra for "alter table add column if column not exists"?

I need to add a column to a table only if the column does not already exist.
I ran ALTER TABLE <table> ADD <column_name> <type>; however, it returns this error message if the column already exists:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid column name <column_name> because it conflicts with an existing column"
Is there a way to check whether the column exists first?
Thanks
So I checked Tasos's answer, and it works with Apache Cassandra 4.1. It does not work with 4.0 or any version prior to that.
If you're using an older version, you can try querying system_schema like this:
SELECT COUNT(*) FROM system_schema.columns
WHERE keyspace_name = 'nosql1' AND table_name = 'users' AND column_name='description';
If you're trying to do this programmatically, you can check whether that query returns 0 or 1, and apply the ALTER TABLE command only when it returns 0.
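A minimal sketch of that check-then-alter flow in Python. It assumes a session object whose execute method behaves like the one in the DataStax cassandra-driver (%s placeholders, a result with .one()). The function name add_column_if_missing and its parameters are hypothetical. Identifiers are formatted into the ALTER TABLE statement unquoted, so they must come from trusted input.

```python
def add_column_if_missing(session, keyspace, table, column, cql_type):
    """Add `column` to keyspace.table only if system_schema does not list it.

    `session` is assumed to behave like a cassandra-driver Session:
    execute(query, params) returns a result whose .one() yields a row
    with the count in position 0. Returns True if ALTER TABLE was issued.
    """
    row = session.execute(
        "SELECT COUNT(*) FROM system_schema.columns "
        "WHERE keyspace_name = %s AND table_name = %s AND column_name = %s",
        (keyspace, table, column),
    ).one()
    if row[0] > 0:
        return False  # column already exists; nothing to do
    # Identifiers cannot be bound as parameters, so they are formatted in.
    session.execute(f"ALTER TABLE {keyspace}.{table} ADD {column} {cql_type}")
    return True
```

The check and the ALTER are two separate statements, so two clients running this at the same time could both attempt the ALTER; the second would still see the "conflicts with an existing column" error.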
According to the official documentation of Cassandra 4.1 you can use the IF NOT EXISTS clause, i.e.:
ALTER TABLE addamsFamily ADD IF NOT EXISTS gravesite varchar;
I quote (emphasis mine):
The ALTER TABLE statement can:
ADD a new column to a table. The primary key of a table cannot ever be altered. A new column, thus, cannot be part of the primary key. Adding a column is a constant-time operation based on the amount of data in the table. If the new column already exists, the statement will return an error, unless IF NOT EXISTS is used in which case the operation is a no-op.
[...]
For versions before 4.1, you need to use Aaron's answer based on system_schema.

Azure Data Factory - Exists transformation in Data Flow with generic dataset

I'm having issues using the Exists Transformation within a Data Flow with a generic dataset.
I have two sources (one from a staging table, "sourceStg", and one from a DWH table, "sourceDwh") and want to check whether the value in the UniqueIdentifier column of the staging table exists in the UniqueIdentifier column of the DWH table. For that I have a generic dataset which I query with a SQL statement containing parameters.
When I open the "Exists settings" I cannot choose any Column from the source in the conditions since the source is generic and has no Projection until I run the data flow. However, I have a parameter which I get from the parent pipeline which provides me the name of the Column containing the UniqueIdentifier (both column names in staging / DWH are the same).
I tried to add the statement "byName($UniqueIdentifier)" in both the left and right column fields, but the engine resolves both as the sourceStg column, since the prefix of the source transformation is missing and it defaults to the first one. What I am trying to achieve is a statement like the following, which names the correct source transformation and takes the column containing the unique identifier from a parameter.
exists(sourceStg#$UniqueIdentifier == sourceDwh#$UniqueIdentifier)
But either the expression cannot be parsed or the result does not retrieve the actual UniqueIdentifier value from the column but writes the statement (e.g. sourceStg#$UniqueIdentifier) as column value.
The only workaround I have found so far is to use two derived columns that add a suffix to the UniqueIdentifier column in one source, plus a new parameter $UniqueIdentiferDwh that is populated with the value of $UniqueIdentifier and the same suffix as used in the derived column.
Any Azure Data Factory experts out there to help?
Thanks in advance!

How do we create a generic mapping dataflow in datafactory that will dynamically extract data from different tables with different schema?

I am trying to create an Azure Data Factory mapping data flow that is generic for all tables. I am going to pass the table name, the primary column for join purposes, and the other columns to be used in groupBy and aggregate functions as parameters to the data flow.
I am unable to reference this parameter in groupBy:
Error: DF-AGG-003 - Groupby should reference atleast one column -
MapDrifted1 aggregate(
) ~> Aggregate1,[486 619]
Has anyone tried this scenario? Please help if you know how to do this, or whether it can be handled in a U-SQL script.
We need to first lookup your parameter string name from your incoming source data to locate the metadata and assign it.
Just add a Derived Column before your Aggregate and it will work. Call the column 'groupbycol' in your Derived Column and use this formula: byName($group1).
In your Agg, select 'groupbycol' as your groupby column.

How can we write pandas dataframe to a Netezza Database directly using pyodbc?

I have a Netezza database on a remote server and I am trying to write to it using pyodbc.
The connection works when reading from the database. However, when I try to write, the insert fails with the following error:
"Error: ('HY000', '[HY000] ERROR: Column 4 : Field cannot contain null values (46) (SQLExecDirectW)')"
On further inspecting Column 4, I found no null values in it.
The snippet of code I am using to write to the database is as follows:
for row in Full_Text_All.itertuples():
    srows = str(row[1:]).strip("()")
    query2 = "insert into MERGED_SOURCES values('+srows+')"
where,
Full_Text_All is the name of the dataframe
MERGED_SOURCES is the name of the table.
It might be that Column 4 was defined as NOT NULL when the table was created.
If you have access to the DDL of the table, you should be able to check this.
If the NOT NULL option was specified for Column 4, I suggest you double-check the data you are trying to insert into the table: in every row, the value corresponding to Column 4 must not be null.
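One further thing worth checking, offered as a hedged sketch and not a confirmed fix: as written in the question, the string "insert into MERGED_SOURCES values('+srows+')" is a single literal, so srows is never concatenated into it and the row values do not reach the statement. An alternative is a parameterized insert using pyodbc's ? placeholders, with float NaN converted to None so that missing values are sent as SQL NULL. The helper name build_insert is hypothetical, and whether this resolves the Netezza error has not been verified.

```python
import math

def build_insert(table, rows):
    """Return (sql, params) for a parameterized multi-row insert.

    `rows` is an iterable of equal-length tuples, e.g.
    Full_Text_All.itertuples(index=False, name=None).
    """
    params = [
        # Replace float NaN with None so the driver sends SQL NULL.
        tuple(None if isinstance(v, float) and math.isnan(v) else v for v in row)
        for row in rows
    ]
    if not params:
        raise ValueError("no rows to insert")
    # One "?" placeholder per column, e.g. "?, ?, ?" for three columns.
    placeholders = ", ".join("?" * len(params[0]))
    return f"INSERT INTO {table} VALUES ({placeholders})", params
```

With an open pyodbc connection conn (assumed here), this could be used as sql, params = build_insert("MERGED_SOURCES", Full_Text_All.itertuples(index=False, name=None)), followed by conn.cursor().executemany(sql, params) and conn.commit(). Any None that lands in Column 4 would then point to the rows that violate the NOT NULL constraint.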

Converting a list column to a set column in cassandra/DSE

While iterating on a new feature my team created a list column in our DSE database. We now want it to be a set column. I dropped the column and created it again as a set column and got this error:
ALTER TABLE sometable DROP somecolumn;
ALTER TABLE sometable ADD somecolumn set<text>;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot add a collection with the name integrations because a collection with the same name and a different type (list) has already been used in the past"
There isn't even any data in the column. Is there not some sort of hard delete override? We can change the name but I really don't like the idea of a name that will not work if anyone tries to use it. Do I have to remake the whole table?
The best option is to add an alternate column with a different name or create a new table.
Technically it is possible to drop and recreate columns, but if you already have data in these columns on disk and in backups, it may create problems bringing nodes back online if they crash. (You can't load old data of one format into new columns with a different format.)
If you really must do this, you can do the following:
ALTER TABLE sometable DROP somecolumn;
ALTER TABLE sometable ADD somecolumn int;
ALTER TABLE sometable DROP somecolumn;
ALTER TABLE sometable ADD somecolumn set<text>;
(Based on a comment in Cassandra: Adding new column to the table)
