I have a DataFrame which I am trying to store into a database like below:
oversampled_df.write \
    .format('jdbc') \
    .option('truncate', 'true') \
    .options(url=EXT_DB_URL,
             driver='oracle.jdbc.driver.OracleDriver',
             dbtable=DEST_DB_TBL_NAME) \
    .mode('overwrite') \
    .save()
yet it keeps adding double quotes (") to the column names. How can I remove this so that I can query the tables without including them, i.e.
instead of
select "description" from schema.table;
to be
select description from schema.table;
I faced the same problem too; my workaround is to:
create the table manually in Oracle
CREATE TABLE schema_name.table_name(
    table_catalog VARCHAR2(255 BYTE),
    table_schema  VARCHAR2(255 BYTE)
)
then add option("truncate", "true"):
oversampled_df.write.format('jdbc').options(
        url='jdbc:oracle:thin:schema/user@ip:port/dbname',
        driver='oracle.jdbc.driver.OracleDriver',
        dbtable='schema_name.table_name',
        user='user',
        password='password').option("truncate", "true") \
    .mode('overwrite').save()
This worked for me; hope it helps.
From the sounds of it, Oracle is treating your column names as quoted identifiers, so to query against them you need the double quotes, and they are also case sensitive. A workaround I found was to make sure that all the columns in your DataFrame are capitalised (they can contain digits and underscores as well) before saving to Oracle, so it treats them as non-quoted identifiers. Then you should be able to query them in either lower or upper case, e.g. DESCRIPTION or description, without needing double quotes.
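A minimal sketch of that rename step, assuming a PySpark DataFrame named df (the helper itself is plain Python, so it is shown standalone here):

```python
def to_oracle_identifiers(columns):
    """Uppercase column names so Oracle treats them as non-quoted identifiers."""
    return [c.upper() for c in columns]

# With a PySpark DataFrame named df, the rename would be applied as (sketch):
#   df = df.toDF(*to_oracle_identifiers(df.columns))
print(to_oracle_identifiers(["description", "load_date", "row_id"]))
# → ['DESCRIPTION', 'LOAD_DATE', 'ROW_ID']
```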
I have the below Apache Spark Dataframe df_result where it has one column Name.
df_result.repartition(1).write.option('header','true').option("delimiter","|").option("escape", "\\").mode("overwrite").csv(path)
In this column, the values are like below.
Name
.....
John
Mathew\M
In the second row, there is a \ character. When I export this to csv using the above script, it generates the value as Mathew\M in the file. Ideally, I need the value as Mathew\\M in the file (i.e., a single \ should be replaced with \\). Is there a way to do this using an option, or any other way?
I am using Apache Spark 3.2.1.
Does this help? It seems to work for me:
df.withColumn('Name', regexp_replace('Name', r'\\', r'\\\\')).write.option('header','true').option("delimiter","|").option("escape", "\\").mode("overwrite").csv("/tmp/output/wk/bn.cv")
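The doubling can be sanity-checked outside Spark with Python's re.sub, which uses the same convention: the pattern r'\\' matches one literal backslash and the replacement r'\\\\' emits two:

```python
import re

# 'Mathew\M' (one backslash) becomes 'Mathew\\M' (two backslashes).
print(re.sub(r"\\", r"\\\\", "Mathew\\M"))
```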
I have a string like "Column";"Column";"Column".
However, several times I see:
"Column";"Column;";"Column"
(Notice the extra semicolon in the second field).
Is it possible to find all instances where a semicolon (;) is not surrounded by double quotes (") and replace these with nothing?
Something like replace(@string, '[a-z][0-9];', '')?
"Column";"Column;";"Column" turns into "Column";"Column";"Column"
"Value";"Value;";"Value" turns into "Value";"Value";"Value"
"Something";";Something else;";"Another ;thing" turns into "Something";"Something else";"Another thing"
Without knowing your table's definition, this is a vague answer.
In SQL Server 2017 (if I recall correctly), support for CSV formats was added to BULK INSERT, meaning that you can specify both your column and row separators and a quote character. For the above, this would mean your FIELDTERMINATOR would need the value ';' and the FIELDQUOTE would need the value '"'. This will, however, leave behind the ; characters that are surrounded by double quotes.
As such, what I would propose is to create a staging table where all the columns are (n)varchar, BULK INSERT your data into that, and then INSERT the data into your production table, using REPLACE to remove the remaining ; characters and strongly typing the values.
In pseudo-SQL this would look like something like this:
BULK INSERT Staging.YourTable
FROM 'C:\YourFilePath\YourFile.txt'
WITH (FORMAT = 'CSV',
      FIELDQUOTE = '"',
      FIELDTERMINATOR = ';');

INSERT INTO Production.YourTable (Column1, Column2, Column3, Column4)
SELECT REPLACE(Column1, ';', ''),
       TRY_CONVERT(int, REPLACE(Column2, ';', '')),
       TRY_CONVERT(date, REPLACE(Column3, ';', ''), 103),
       REPLACE(Column4, ';', '')
FROM Staging.YourTable;
Not sure if this is an oversimplification, but if you really have that string in a @string variable then I see no reason this shouldn't work:
replace(@string, ';";"', '";"')
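For reference, the general rule implied by the examples (keep only the ; characters that act as field delimiters, i.e. sit between two quotes as '";"') can be sketched in plain Python with a placeholder swap; this is an illustration of the logic, not T-SQL:

```python
def strip_stray_semicolons(line):
    """Remove every ';' except those acting as field delimiters ('";"')."""
    token = "\x00"                     # placeholder assumed absent from the data
    line = line.replace('";"', token)  # protect the real delimiters
    line = line.replace(";", "")       # drop stray semicolons
    return line.replace(token, '";"')  # restore the delimiters

print(strip_stray_semicolons('"Something";";Something else;";"Another ;thing"'))
# → "Something";"Something else";"Another thing"
```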
I have a table that was given to me with some 'incorrect' data. The format of the data should be:
"000 00000"
There are TWO spaces. Where the spaces are can differ between records, so for example one record could be the previous example and another could be "00 00 0000". The problem is that the data came in, in some instances, with only a single space (so "000 00000").
Ideally, I'd like to fix the already-loaded data with an UPDATE statement. If this is easier done outside of Oracle, that's fine; I can re-load the data (it's a fair amount of data, at almost 400,000 rows).
What would be the easiest way to find the single space and add another as needed, or leave it alone if there are already two spaces?
I am currently working on a query to split the string on the spaces, trim the data, then put it all back together with two spaces... it's not working out too well in testing.
Thanks in advance!
Here is a query to find the single-space records; build a CASE statement from it as needed.
WITH sample_data AS (
    SELECT '000 00000' value FROM dual UNION ALL
    SELECT '00 00 0000' value FROM dual UNION ALL
    SELECT '000 00000' value FROM dual
)
SELECT * FROM sample_data WHERE REGEXP_COUNT(value, '[[:space:]]') = 1
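The same check can be prototyped outside Oracle. A minimal Python sketch that flags records containing exactly one whitespace character, mirroring REGEXP_COUNT(value, '[[:space:]]') = 1 (the first sample value below is assumed to carry the two spaces the question describes):

```python
import re

samples = ["000  00000", "00 00 0000", "000 00000"]

# Records with exactly one whitespace character need a second space added.
single_space = [s for s in samples if len(re.findall(r"\s", s)) == 1]
print(single_space)  # only the last sample has a single space
```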
I'm using Mariadb and have the table setup with VARCHAR(30). When I insert a string containing numbers like "192" and then select it I'm able to print out 192. When I insert a string like "a48" it just seems to be ignored. I've tried inserting a complete letter string "a" and I still get nothing. In the Mariadb documentation for VARCHAR(M) I found this:
"If a unique index consists of a column where trailing pad characters are stripped or ignored, inserts into that column where values differ only by the number of trailing pad characters will result in a duplicate-key error"
I'm not sure if that could have anything to do with it? I am using letters just to make it easier to parse the data on my client side program. If I don't find a solution I will probably just pad it on the server after selecting.
Does anybody have any suggestions on what's going on here, or things I could try to find the problem?
Assuming that melon is the column to receive the string, then you should put single quotes around the $melon variable in the query, like this:
query("REPLACE INTO state (id, melon, image) VALUES (1, '$melon', $image)");
String values should be surrounded by single quotes; numeric values don't need to be.
Because the target column is a varchar(30) the value should always be surrounded by single quotes. MariaDB works out what you mean when you supply a numeric value, but it doesn't understand an alphanumeric value without single quotes. Both will work if you use single quotes, as shown.
To avoid SQL injection vulnerabilities, it is better to use prepared statements, as described at https://www.w3schools.com/php/php_mysql_prepared_statements.asp.
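A sketch of the same idea with placeholders, using Python's stdlib sqlite3 purely as a stand-in for MariaDB (with PHP you would use mysqli/PDO prepared statements instead; the placeholder principle is the same):

```python
import sqlite3

# In-memory database stands in for the MariaDB table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (id INTEGER PRIMARY KEY, melon VARCHAR(30))")

# Placeholders quote string values correctly and prevent SQL injection;
# 'a48' is stored intact rather than being misparsed as an identifier.
conn.execute("REPLACE INTO state (id, melon) VALUES (?, ?)", (1, "a48"))
row = conn.execute("SELECT melon FROM state WHERE id = 1").fetchone()
print(row[0])  # a48
```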
The emp table below has no ENAME ending in three spaces. However, the following SQL statement behaves like the clause is trimmed at the end (like a '%' pattern), because it returns all records:
select ENAME from dbo.emp where ENAME like '%   '
I tried many other database platforms (including SQL Server, SQL Anywhere, Oracle, PostgreSQL, MySQL etc), I've seen this happening only in Sybase/SAP ASE (version 16). Is this a bug or is it "by design"? Nothing specific found in the online spec in this regard.
I'm looking for a generic fix, to apply some simple transformation to the pattern and return what is expected from any other platform. Without knowing in advance what data type the field is or what kind of data it holds.
This is caused by the VARCHAR semantics in ASE, which always strip trailing spaces from a value before operating on it. This is applied to the string '%   ' before it is used, since that is a VARCHAR value by definition. This is indeed a particular semantic of ASE.
Now, you could try working around this by using the [ ] wildcard to match a space, but there are some things to be aware of. First, the column being matched (ENAME) must be CHAR, not VARCHAR, otherwise any trailing spaces will have been stripped as well before they were stored. Assuming the column is CHAR, using a pattern '%[ ][ ][ ]' unfortunately still does not appear to work; I think there may be some trailing-space stripping happening here as well.
The best way to work around this is to use an artificial end-of-field delimiter which will not occur in the data, e.g.
ENAME||'~EOF~' like '%   ~EOF~'
This works. But note that the column ENAME must still be CHAR rather than VARCHAR.
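The sentinel trick can be sanity-checked outside the database with a short Python sketch (plain-string suffix matching stands in for LIKE here):

```python
SENTINEL = "~EOF~"

def ends_with_three_spaces(value):
    # Appending a sentinel stops the trailing spaces from being stripped
    # before the match, mirroring ENAME||'~EOF~' like '%   ~EOF~'.
    return (value + SENTINEL).endswith("   " + SENTINEL)

print(ends_with_three_spaces("JOHN   "))  # True
print(ends_with_three_spaces("JOHN "))    # False
```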
This LIKE behavior is somewhat documented in the ASE documentation.
For VARCHAR columns this will never work, because ASE removes the trailing spaces.
For CHAR it depends on how you insert the data: in a char(10) column, if you insert 2 characters, ASE appends 8 blank spaces to pad the value to 10. So when you query, that 2-character entry will appear in the result set, because it ends with more than 3 trailing spaces.
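That padding behaviour (and why it produces false positives) is easy to illustrate in plain Python, standing in for an ASE CHAR(10) column:

```python
def char_pad(value, width=10):
    """Mimic how a CHAR(width) column pads short values with trailing spaces."""
    return value.ljust(width)

stored = char_pad("AB")           # 'AB' followed by 8 trailing spaces
print(stored.endswith("   "))     # True: short CHAR entries match a 3-space suffix
```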
If this is not a problem for you, instead of LIKE you can use charindex(), which counts the trailing spaces and won't strip them the way LIKE does, so you could write something like:
select ENAME from dbo.emp where charindex('   ', ENAME) > 0
Or you can calculate the number of trailing spaces, then check whether your 3 spaces come after them or not, like:
select a from A
where charindex('   ', a) > (len(a) - len(convert(varchar(10), a)))
Now again, this will return more rows than expected if the data were inserted with non-uniform padding, but it will work perfectly if you know exactly what to search for.
select ENAME from dbo.emp where RIGHT(ENAME, 3) = '   '