Replace semi-colon with nothing if NOT preceded by " (double quote) - string

I have a string like "Column";"Column";"Column".
However, several times I see:
"Column";"Column;";"Column"
(Notice the extra semicolon in the second field).
Is it possible to find all instances where a semicolon (;) is not surrounded by double quotes (") and replace these with nothing?
Something like replace(#string,'[a-z][0-9];','')?
"Column";"Column;";"Column" turns into "Column";"Column";"Column"
"Value";"Value;";"Value" turns into "Value";"Value";"Value"
"Something";";Something else;";"Another ;thing" turns into "Something";"Something else";"Another thing"

Without knowing your table's definition, this is a vague answer.
In SQL Server 2017 (if I recall correctly), support for CSV formats was added to BULK INSERT, meaning that you can specify both your column and row separators and your quote character. For the above, this would mean your FIELDTERMINATOR would need the value ';' and the FIELDQUOTE would need the value '"'. This will, however, leave behind the ; characters that are surrounded by double quotes.
As such, what I would propose is to create a staging table where all the columns are (n)varchar, BULK INSERT your data into that, and then INSERT the data into your production table, using REPLACE to remove the remaining ; characters and strongly typing the columns.
In pseudo-SQL, this would look something like this:
-- Load the raw file into the all-varchar staging table
BULK INSERT Staging.YourTable
FROM 'C:\YourFilePath\YourFile.txt'
WITH (FORMAT = 'CSV',
      FIELDQUOTE = '"',
      FIELDTERMINATOR = ';');

-- Remove the remaining ; characters and strongly type the data
INSERT INTO Production.YourTable (Column1, Column2, Column3, Column4)
SELECT REPLACE(Column1, ';', ''),
       TRY_CONVERT(int, REPLACE(Column2, ';', '')),
       TRY_CONVERT(date, REPLACE(Column3, ';', ''), 103),
       REPLACE(Column4, ';', '')
FROM Staging.YourTable;

Not sure if this is an oversimplification, but if you really have that string in #string, then I see no reason this shouldn't work:
replace(#string, ';";"', '";"')
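Note that this handles a ; that sits immediately before a closing quote, but not the leading or mid-field semicolons in the third example. If pre-processing the file outside the database is an option, here is a minimal Python sketch using the csv module that removes every ; left inside a quoted field (the file names are placeholders):

import csv

# Read the ;-delimited, "-quoted rows, drop any ';' inside a field,
# and write the cleaned file back out.
with open('input.txt', newline='') as src, \
     open('clean.txt', 'w', newline='') as dst:
    reader = csv.reader(src, delimiter=';', quotechar='"')
    writer = csv.writer(dst, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(field.replace(';', '') for field in row)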

Related

Insert in column names with blank space in sqlite3

I'm trying to fill a sqlite3 database in Python 3 with data stored in a dict. The problem in my code seems to be in this snippet:
matchlist=['id','handicap','goal line','corner line']
heads='"'+'","'.join([a for a in list(match.keys()) if a in matchlist])+'"'
llaves=':'+',:'.join([a for a in list(match.keys()) if a in matchlist])
cur.execute('''INSERT or IGNORE INTO preodds ({}) VALUES ({});'''.format(heads,llaves),
match)
This gives me an OperationalError near "line".
Apparently, when you try to insert into columns whose names have blank spaces, you need to "escape" them. To do so, I modified my llaves string as well, doing the following:
llaves='":'+'",":'.join([a for a in list(match.keys()) if a in matchlist])+'"'
Doing so fixed my issue, but instead of filling the table with the values provided by the dict, it is filled with the literal values in the llaves string.
Why is this? And more importantly, if I have a table with columns whose names contain blank spaces, how do you fill rows using dict data?
Your original code is likely producing this
INSERT or IGNORE INTO preodds ("id","handicap","goal line","corner line")
VALUES (:id,:handicap,:goal line,:corner line)
Notice the spaces in the parameter names... the names starting with a colon (:). That's the problem.
Your attempted solution is just submitting string literals as values, so that is exactly what is inserted.
INSERT or IGNORE INTO preodds ("id","handicap","goal line","corner line")
VALUES (":id",":handicap",":goal line",":corner line")
Quotes can be used to delimit object names, but in other contexts quotes are interpreted as literal string value delimiters. The precise rules for determining how SQLite interprets quoted values are found here.
As far as I can tell, parameters cannot be escaped and so cannot contain spaces or other special characters, at least nothing that is documented. See sqlite docs.
If you are going to build the parameter list dynamically, you should strip out all spaces from the parameter names. Or alternatively, you could just use unnamed parameters using the ? character. Either way the parameters are assigned values in the order they appear, so there would be no difference.
Something like:
INSERT or IGNORE INTO preodds ("id","handicap","goal line","corner line")
VALUES (:id,:handicap,:goalline,:cornerline)
or
INSERT or IGNORE INTO preodds ("id","handicap","goal line","corner line")
VALUES (?, ?, ?, ?)
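For illustration, here is a minimal, self-contained Python sketch of the second option (unnamed ? parameters); the sample match dict and the in-memory database are made up, but the query-building part carries over directly:

import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('CREATE TABLE preodds (id, handicap, "goal line", "corner line")')

match = {'id': 1, 'handicap': 0.5, 'goal line': 2.5, 'corner line': 9.5, 'extra': 'x'}
matchlist = ['id', 'handicap', 'goal line', 'corner line']

keys = [k for k in match if k in matchlist]            # keep only known columns
heads = ','.join('"{}"'.format(k) for k in keys)       # quoted column names
marks = ','.join('?' for _ in keys)                    # one ? per column

cur.execute('INSERT OR IGNORE INTO preodds ({}) VALUES ({})'.format(heads, marks),
            [match[k] for k in keys])                  # values in column order
print(cur.execute('SELECT * FROM preodds').fetchall())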

CSV File with values having single quote within quote text qualifier

I am trying to parse a CSV file which uses a single quote as the text qualifier. The problem here is that some values wrapped in the single-quote text qualifier themselves contain a single quote,
e.g.:
'Fri, 24 Feb 2017 17:44:57 +0700','th01ham000tthxs','/','','Writer's Tools Data','7.1.0.0',
I am struggling to parse the file, as after this row all of the remaining rows get displaced.
I tried working with OpenCSV and univocity-parsers but didn't have any luck.
If I place the above row in Excel (Excel image) and set the text qualifier to a single quote, it gives the correct result without any displacement of rows.
If using Java, the JRecord library should handle the file.
How it works: if a field starts with a quote (i.e. the field begins with ,'), it specifically looks for ', or ''', or ''''', etc. (an odd number of quotes followed by either a comma or an end-of-line marker) to close the field; a rough sketch of this rule is given after the list below. This approach breaks down if:
The embedded quote is the last character in a field, i.e. 'Field with quote '',
There is white space between the quote and the comma, i.e. 'Field' , or , '
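This is not JRecord itself, but a rough stand-alone Python sketch of that scanning rule (the function name is made up, and the same caveats listed above still apply):

def split_single_quoted(line, quote="'", delim=","):
    # A quoted field only ends at an odd-length run of quotes that is
    # immediately followed by the delimiter or the end of the line.
    fields, i, n = [], 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: look for the closing run of quotes.
            j, end = i + 1, None
            while j < n:
                if line[j] != quote:
                    j += 1
                    continue
                k = j
                while k < n and line[k] == quote:
                    k += 1
                run = k - j
                if run % 2 == 1 and (k == n or line[k] == delim):
                    end = k - 1                      # position of the closing quote
                    break
                j = k                                # embedded quotes, keep scanning
            if end is None:                          # unterminated field: take the rest
                end, nxt = n, n + 1
            else:
                nxt = end + 2                        # skip closing quote and delimiter
            fields.append(line[i + 1:end].replace(quote * 2, quote))
            i = nxt
        else:
            # Unquoted field: ends at the next delimiter.
            j = line.find(delim, i)
            if j == -1:
                fields.append(line[i:])
                i = n + 1
            else:
                fields.append(line[i:j])
                i = j + 1
        if i > n:
            break
    return fields

row = "'Fri, 24 Feb 2017 17:44:57 +0700','th01ham000tthxs','/','','Writer's Tools Data','7.1.0.0',"
print(split_single_quoted(row))   # the fifth field comes out as: Writer's Tools Data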
Here is the line in ReCsvEditor
Also, in the ReCsvEditor, when editing the file, if you select Generate >>> Java Code >>> ... it will generate Java/JRecord code to read the file.
Disclaimer: I am the author of JRecord / ReCsvEditor. Also, the ReCsvEditor Generate function is new and needs more work.
Try configuring univocity-parsers to handle the unescaped quote according to your scenario. 'Writer's Tools Data' has an unescaped quote. From your input, I can see you want to use STOP_AT_CLOSING_QUOTE as the strategy to work around these values.
Add this line to your code and it should work fine:
parserSettings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
Hope this helps.

Azure Data Factory v2 pipeline double quotes

My source file has nvarchar and numeric columns.
The numeric column has a thousands separator, so to identify it the value comes wrapped in double quotes. When I use quoteChar ("\"") in the file format, the numeric value works fine.
At the same time, the nvarchar column (Name) has multiple double quotes within the data; if I use the quoteChar, the values are split into more columns based on the number of double quotes.
Is there any fix/solution for this?
Based on the properties in Text format, only one character is allowed, so you can't separate the different data types with different delimiters.
You could try using | as the column delimiter if your nvarchar data doesn't contain the | character. Or you may have to parse your source file to remove the thousands separator in an Azure Function activity before the transfer; then it could be handled by the Copy activity in ADF.
The ADF parser fails while reading text that is encapsulated in double quotes and also has a comma within the text, like "Hello, World". To make it work, set the Escape character and Quote character properties to a double quote. This will save the whole text, with the double quotes, into the destination.

My database won't accept strings with letters

I'm using MariaDB and have the table set up with VARCHAR(30). When I insert a string containing numbers like "192" and then select it, I'm able to print out 192. When I insert a string like "a48" it just seems to be ignored. I've tried inserting a complete letter string "a" and I still get nothing. In the MariaDB documentation for VARCHAR(M) I found this:
"If a unique index consists of a column where trailing pad characters are stripped or ignored, inserts into that column where values differ only by the number of trailing pad characters will result in a duplicate-key error"
I'm not sure if that could have anything to do with it. I am using letters just to make it easier to parse the data in my client-side program. If I don't find a solution, I will probably just pad it on the server after selecting.
Does anybody have any suggestions on what's going on here, or things I could try to find the problem?
Assuming that melon is the column to receive the string, you should put single quotes around the $melon variable in the query, like this:
query("REPLACE INTO state (id, melon, image) VALUES (1, '$melon', $image)");
String values should be surrounded by single quotes; numeric values don't need to be.
Because the target column is a varchar(30) the value should always be surrounded by single quotes. MariaDB works out what you mean when you supply a numeric value, but it doesn't understand an alphanumeric value without single quotes. Both will work if you use single quotes, as shown.
To avoid SQL injection vulnerabilities, it is better to use prepared statements, as described at https://www.w3schools.com/php/php_mysql_prepared_statements.asp.
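In Python, for example, a parameterized version of the same statement could look like this minimal sketch (using MariaDB Connector/Python; the connection details and sample values are made up, and the linked page shows the PHP equivalent):

import mariadb

# Hypothetical connection details; the table and columns are the ones from the question.
conn = mariadb.connect(host="localhost", user="user", password="pass", database="mydb")
cur = conn.cursor()

melon, image = "a48", "image.png"   # example values

# The driver quotes and escapes the values itself, so 'a48' is stored as-is
# and the statement is safe from SQL injection.
cur.execute("REPLACE INTO state (id, melon, image) VALUES (?, ?, ?)",
            (1, melon, image))
conn.commit()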

LIKE clause in Sybase/SAP ASE trimmed at the end?

The emp table below has no ENAME ending in three spaces. However, the following SQL statement behaves like the clause is trimmed at the end (like a '%' pattern), because it returns all records:
select ENAME from dbo.emp where ENAME like '%   '
I tried many other database platforms (including SQL Server, SQL Anywhere, Oracle, PostgreSQL, MySQL, etc.), and I've seen this happening only in Sybase/SAP ASE (version 16). Is this a bug or is it "by design"? Nothing specific is found in the online spec in this regard.
I'm looking for a generic fix, to apply some simple transformation to the pattern and return what is expected from any other platform. Without knowing in advance what data type the field is or what kind of data it holds.
This is caused by the VARCHAR semantics in ASE, which will always strip trailing spaces from a value before operating on it. This is applied to the string '%   ' before it is used, since that is a VARCHAR value by definition. This is indeed a particular semantic of ASE.
Now, you could try working around this by using the [ ] wildcard to match a space, but there are some things to be aware of. First, the column being matched (ENAME) must be CHAR, not VARCHAR, otherwise any trailing spaces will have been stripped as well before they were stored. Assuming the column is CHAR, using a pattern '%[ ][ ][ ]' unfortunately still does not appear to work; I think there may be some trailing-space stripping still happening here.
The best way to work around this is to use an artificial end-of-field delimiter which will not occur in the data, e.g.
ENAME||'~EOF~' like '%   ~EOF~'
This works. But note that the column ENAME must still be CHAR rather than VARCHAR.
LIKE behavior is somewhat documented here.
For VARCHAR columns this will never work, because ASE removes the trailing spaces.
For CHAR it depends on how you insert the data: in a char(10) column, if you insert 2 characters, ASE will add 8 blank spaces after the 2 characters to make them 10, so when you query, you will get this 2-character entry as part of the result set because it includes more than 3 trailing spaces.
If this is not a problem for you, instead of LIKE you can use charindex(), which takes the trailing spaces into account and won't truncate them the way LIKE does, so you could write something like:
select ENAME from dbo.emp where charindex(' ', ENAME) > 0
Or you can calculate the trailing spaces, then check whether your 3 spaces come after that or not, like:
select a from A
where charindex(' ', a) > (len(a) - len(convert(varchar(10), a)))
Now again, this will get you more rows than expected if the data were inserted with non-uniform trailing spaces, but it will work perfectly if you know exactly what to search for.
SELECT ENAME FROM dbo.emp WHERE RIGHT(ENAME, 3) = '   '
