CQL User Defined Type data import Syntax Errors - cassandra

I have created a UDT made up of fields from three or four columns of data. One of the fields contains a letter inside parentheses, for example (c) or (d). When importing the CSV file using cqlsh's COPY FROM, I get an error message:
Syntax error in CQL query ….. mismatched input '(' expecting ')' (….column 3, column 4) VALUES (10.2[(]c…).
I have tried importing the CSV file with fields where the letter has no parentheses and get:
Syntax error in CQL query ….. mismatched input 'c' expecting ')' (….column 3, column 4) VALUES (10.2[c]…)
I have tried importing the CSV file without a letter in the field and get:
Syntax error in CQL query ….. mismatched input ',' expecting ')' (….column 4) VALUES (10.2,…)
The UDT is made up of integers and text. It appears that the import fails with a syntax error whether the UDT's text field contains a letter inside parentheses (e.g. (c)), a bare letter, or no value at all.

Have you tried character escaping using double dollar signs ($$) or doubled single quotes ('')? http://docs.datastax.com/en/cql/3.3/cql/cql_reference/escape_char_r.html
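If escaping the value proves fiddly, an alternative is to bypass cqlsh's COPY parser entirely and load the rows through the Python driver with a prepared statement, which binds the UDT as data so parentheses never reach the CQL parser. A minimal sketch - keyspace, table, UDT and column layout are all hypothetical stand-ins (ks, readings, reading_info, with the text component in CSV column 3):

from collections import namedtuple
import csv

from cassandra.cluster import Cluster

# Hypothetical UDT: CREATE TYPE ks.reading_info (val double, label text);
Info = namedtuple("Info", ("val", "label"))

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("ks")
# Map the Python class onto the UDT so the driver can encode it.
cluster.register_user_type("ks", "reading_info", Info)

insert = session.prepare("INSERT INTO readings (id, info) VALUES (?, ?)")
with open("data.csv", newline="") as f:
    for i, row in enumerate(csv.reader(f)):
        # row[2] may be "(c)", a bare letter, or empty - the driver binds it
        # as data, so no CQL escaping is needed.
        session.execute(insert, (i, Info(val=float(row[1]), label=row[2] or None)))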

Related

How to separate string in Databricks

I am trying to separate a string like LESOES DO OMBRO (M75) using the function split_part in Databricks, but an error occurs: AnalysisException: Undefined function: 'SPLIT_part'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'. I need to separate the code in parentheses from the rest of the text.
I have a column "patologia"; the column contains, for example, LESOES DO OMBRO (M75), and I need a new column with the value M75.
If I understood correctly and you need a new column with the value that's between parentheses in another column, then you can extract such a value with a regular expression - \(([^)]+)\) captures everything between the parentheses - like this:
from pyspark.sql.functions import regexp_extract
regex_df = spark.createDataFrame([("LESOES DO OMBRO (M75)",)], "patologia: string")
extracted_col_df = regex_df.withColumn("extracted_value", regexp_extract("patologia", r'\(([^)]+)\)', 1))
extracted_col_df.show()
+---------------------+---------------+
|patologia            |extracted_value|
+---------------------+---------------+
|LESOES DO OMBRO (M75)|M75            |
+---------------------+---------------+

ADF: Pass dynamic Where Clause as a string with quotes

I have a lookup that retrieves a few records from a SQL Server table containing server, database, schema, table name and a whole WHERE clause. These values are passed to a Copy Data activity (within a ForEach). In the Copy Data activity I have tried two different dynamic query statements, but I seem to get an error and can't figure out where I'm going wrong.
Values in table:
SRC_SERVERNAME | SRC_DATABASE | SRC_SCHEMANAME | SRC_TABLENAME | SRC_WHERE_DATE_CLAUSE
SQ01 | NAV | dbo | Company$Sales Invoice Header | where [Posting Date] >= '2021-01-01'
Source setup:
Error for statement 1:
A database operation failed with the following error: 'Incorrect syntax near '.'.'
Incorrect syntax near '.'., SqlErrorNumber=102,Class=15,State=1,
Error for statement 2:
A database operation failed with the following error: 'Incorrect syntax near '.'.'
Incorrect syntax near '.'., SqlErrorNumber=102,Class=15,State=1,
Statement 1 (query):
SELECT *
FROM #{item().SRC_SERVERNAME}.#{item().SRC_DATABASENAME}.#{item().SRC_SCHEMANAME}.#{item().SRC_TABLENAME},' ',#{item().SRC_WHERE_DATE_CLAUSE}
Statement 2 (dynamic query with concat):
#concat('select * from ',item().SRC_SERVERNAME,'.',item().SRC_DATABASENAME,'.',item().SRC_SCHEMANAME,'.',item().SRC_TABLENAME,' ',item().SRC_WHERE_DATE_CLAUSE)
There is a syntax error in your query.
In SQL four-part naming, the server name, database name, schema name and table name should be separated by '.'.
When the name of a server/database/schema/table contains spaces or other special characters, it should be wrapped in square brackets [].
#concat('select * from [',item().SRC_SERVERNAME, '].[',item().SRC_DATABASENAME,'].[',item().SRC_SCHEMANAME,'].[',item().SRC_TABLENAME, '] ',item().SRC_WHERE_DATE_CLAUSE)
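With the sample row above (and assuming the lookup exposes the database name under the SRC_DATABASENAME property used in the expression), this renders as:
select * from [SQ01].[NAV].[dbo].[Company$Sales Invoice Header] where [Posting Date] >= '2021-01-01'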

PySpark - data mismatch error when trying to split a column content

I'm trying to use PySpark's split() method on a column that has data formatted like:
[6b87587f-54d4-11eb-95a7-8cdcd41d1310, 603, landing-content, landing-content-provider]
my intent is to extract the 4th element, i.e. the part after the last comma.
I'm using a syntax like:
mydf.select("primary_component").withColumn("primary_component_01",f.split(mydf.primary_component, "\,").getItem(0)).limit(10).show(truncate=False)
But I'm consistently getting this error:
"cannot resolve 'split(mydf.primary_component, ',')' due to data
type mismatch: argument 1 requires string type, however,
'mydf.primary_component' is of
structuuid:string,id:int,project:string,component:string
type.;;\n'Project [primary_component#17,
split(split(primary_component#17, ,)[1], \,)...
I've also tried escaping the "," as \, or \\,, and not escaping it at all, and it makes no difference. Removing the .getItem(0) makes no difference either.
What am I doing wrong? Feeling a dumbass but I don't know how to fix this...
Thank you for any suggestions
You are getting the error:
"cannot resolve 'split(mydf.`primary_component`, ',')' due to data
type mismatch: argument 1 requires string type, however,
'mydf.`primary_component`' is of
struct<uuid:string,id:int,project:string,component:string>
because your column primary_component is using a struct type when split expects string columns.
Since primary_component is already a struct and you are interested in the value after your last comma, you may try the following using dot notation (note that withColumn expects a Column, not a bare string):
mydf.withColumn("primary_component_01", f.col("primary_component.component"))
In the error message, spark has shared the schema for your struct as
struct<uuid:string,id:int,project:string,component:string>
i.e.
+---------+---------+
|column   |data type|
+---------+---------+
|uuid     |string   |
|id       |int      |
|project  |string   |
|component|string   |
+---------+---------+
For future debugging purposes, you may use mydf.printSchema() to show the schema of the Spark DataFrame in use.
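As a self-contained illustration of the fix - the struct fields come from the error message above, the data row from the question:

from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()
# Rebuild a DataFrame whose primary_component is a struct, as in the error message.
struct_df = spark.createDataFrame(
    [(("6b87587f-54d4-11eb-95a7-8cdcd41d1310", 603, "landing-content", "landing-content-provider"),)],
    "primary_component: struct<uuid:string,id:int,project:string,component:string>",
)
struct_df.printSchema()  # confirms the struct type that made split() fail
# Dot notation pulls the desired field directly - no string splitting needed.
struct_df.withColumn("primary_component_01", f.col("primary_component.component")).show(truncate=False)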

Convert varchar string to Currency format in db2 SQL

I have a column from which I have to extract a string and then format it back to US currency format with 2 decimal places.
For example :
Column value : {tag}0000020000890|
From this, I have to match the tag and extract 20000890, and format it to 200,008.90
I have extracted the part with below code:
LTRIM(REGEXP_SUBSTR('match pattern', 1, 1, 'i', 1), '0')
Where match pattern is '\{tag\}(.*?)\|'
With this, I am able to extract 20000890
And then I tried the below TO_CHAR and TO_NUMBER functions on top of it to format it as a comma-separated currency value with 2 decimal places.
to_char(ltrim(Regexp_substr('match pattern',1,1,'i',1),'0'), '99G999G999D99')
But this throws the below error:
SQL error -20447, SQLSTATE 22007, SQLERRMC 99G999G999D99;
SYSIBM.VARCHAR_FORMAT
Then I tried,
to_char(to_number(ltrim(Regexp_substr('match pattern',1,1,'i',1),'0')), '99G999G999D99')
But this also throws an error:
SQL error -20476, SQLSTATE 22018, SQLERRMC DECFLOAT_FORMAT; 99G999G999D99
I'm not sure what causes this error.
The format string that you are trying to use is only supported starting from Db2 V11.5.
TO_CHAR V11.5
TO_CHAR V11.1
Compare the "Table 2. Format elements for decimal floating-point to varchar" table in both links.
Moreover, you must cast the string to a numeric value in the first parameter of TO_CHAR, and divide by 100 to supply the implied decimal places:
SELECT TO_CHAR(DECFLOAT(REGEXP_SUBSTR(V, '\{tag\}(.*?)\|', 1, 1, 'i', 1)) / 100, '99,999,999.99')
FROM (VALUES '{tag}0000020000890|') T(V);
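With the sample value this returns 200,008.90; without the division by 100 it would render as 20,000,890.00.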
Take a look at VARCHAR_FORMAT. It is the function TO_CHAR is mapped to. The group separator is not G, but "," or ".". Basically, you have to replace your formatting string 99G999G999D99 with something like 99,999,999.99.
The Db2 documentation has more examples on that.
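For comparison, a client-side Python sketch of the same transformation (sample value from the question) makes the implied-decimal step explicit:

import re

raw = "{tag}0000020000890|"
# Same extraction as REGEXP_SUBSTR: capture the digits between {tag} and |.
digits = re.search(r"\{tag\}(.*?)\|", raw).group(1)  # "0000020000890"
# The last two digits are implied cents, so divide by 100 before formatting.
print(f"{int(digits) / 100:,.2f}")  # 200,008.90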

Postgresql COPY empty string as NULL not work

I have a CSV file with an integer column whose empty values are saved as "" (empty string).
I want to COPY them to a table as NULL values.
With Java code, I have tried these:
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv, DELIMITER ',', HEADER true)";
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv, DELIMITER ',', NULL '', HEADER true)";
I get: PSQLException: ERROR: invalid input syntax for type numeric: ""
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv,DELIMITER ',', NULL '\"\"' HEADER true)";
I get: PSQLException: ERROR: CSV quote character must not appear in the NULL specification
Any one has done this before ?
I assume you are aware that numeric data types have no concept of an "empty string" (''). It's either a number or NULL (or 'NaN' for numeric - but not for integer et al.).
Looks like you exported from a string data type like text and had some actual empty strings in there - which are now represented as "" - " being the default QUOTE character in CSV format.
NULL would be represented by nothing, not even quotes. The manual:
NULL
Specifies the string that represents a null value. The default is \N
(backslash-N) in text format, and an unquoted empty string in CSV format.
You cannot define "" to generally represent NULL since that already represents an empty string. Would be ambiguous.
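For illustration: in CSV format the line 1,,foo carries NULL in the second column, while 1,"",foo carries an empty string - and the latter is what trips the numeric conversion here.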
To fix, I see two options:
Edit the CSV file / stream before feeding it to COPY and replace "" with nothing. Might be tricky if you have actual empty strings in there as well - or "" escaping a literal " inside strings.
(What I would do.) Import to an auxiliary temporary table with identical structure, except for the integer column converted to text. Then INSERT (or UPSERT?) into the target table from there, converting the integer value properly on the fly:
-- empty temp table with identical structure
CREATE TEMP TABLE tbl_tmp AS TABLE tbl LIMIT 0;
-- ... except for the int / text column
ALTER TABLE tbl_tmp ALTER col_int TYPE text;
COPY tbl_tmp ...;
INSERT INTO tbl -- identical number and names of columns guaranteed
SELECT col1, col2, NULLIF(col_int, '')::int -- list all columns in order here
FROM tbl_tmp;
Temporary tables are dropped at the end of the session automatically. If you run this multiple times in the same session, either just truncate the existing temp table or drop it after each transaction.
Related:
How to update selected rows with values from a CSV file in Postgres?
Rails Migrations: tried to change the type of column from string to integer
postgresql thread safety for temporary tables
Since Postgres 9.4 you have the ability to use FORCE_NULL. This causes an empty string to be converted into NULL. Very handy, especially with CSV files (in fact, it is only allowed when using CSV format).
The syntax is as follows:
COPY table FROM '/path/to/file.csv'
WITH (FORMAT CSV, DELIMITER ';', FORCE_NULL (columnname));
Further details are explained in the documentation: https://www.postgresql.org/docs/current/sql-copy.html
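Since the question loads data from code rather than psql, here is a minimal sketch of the same FORCE_NULL approach from a driver - shown with Python/psycopg2 rather than the asker's Java, and with hypothetical table/column names (tbl, col_int):

import psycopg2

conn = psycopg2.connect("dbname=test")
with conn, conn.cursor() as cur, open("data.csv") as f:
    # FORCE_NULL makes a quoted empty string ("") in col_int load as NULL.
    cur.copy_expert(
        "COPY tbl FROM STDIN (FORMAT csv, HEADER true, FORCE_NULL (col_int))",
        f,
    )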
If you want to replace all blank and empty values with NULL, just add emptyasnull blanksasnull to the COPY command. Note that these options (like iam_role and manifest below) belong to Amazon Redshift's COPY, not to stock PostgreSQL.
Syntax:
copy Table_name (columns_list)
from 's3://{bucket}/{s3_bucket_directory_name + manifest_filename}'
iam_role '{REDSHIFT_COPY_COMMAND_ROLE}' emptyasnull blanksasnull
manifest DELIMITER ',' IGNOREHEADER 1 compupdate off csv gzip;
Note: this applies to all records that contain empty/blank values.
