PostgreSQL COPY empty string as NULL not working

I have a CSV file with some integer columns, where empty values were saved as "" (an empty string).
I want to COPY them to a table as NULL values.
Using Java, I have tried these:
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv, DELIMITER ',', HEADER true)";
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv, DELIMITER ',', NULL '', HEADER true)";
I get: PSQLException: ERROR: invalid input syntax for type numeric: ""
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv, DELIMITER ',', NULL '\"\"', HEADER true)";
I get: PSQLException: ERROR: CSV quote character must not appear in the NULL specification
Has anyone done this before?

I assume you are aware that numeric data types have no concept of an "empty string" (''). A value is either a number or NULL (or 'NaN' for numeric, but not for integer et al.).
It looks like you exported from a string data type like text and had some actual empty strings in there, which are now represented as "" (" being the default QUOTE character in CSV format).
NULL would be represented by nothing, not even quotes. The manual:
NULL
Specifies the string that represents a null value. The default is \N
(backslash-N) in text format, and an unquoted empty string in CSV format.
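For illustration, in a CSV line like the following, the second field is NULL while the third is an empty string:
42,,""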
You cannot define "" to generally represent NULL, since that already represents an empty string. It would be ambiguous.
To fix it, I see two options:
Edit the CSV file / stream before feeding it to COPY and replace "" with nothing (see the Java sketch after the SQL below). This might be tricky if you have actual empty strings in there as well, or "" escaping a literal " inside strings.
(What I would do.) Import into an auxiliary temporary table with identical structure, except for the integer column converted to text. Then INSERT (or UPSERT?) into the target table from there, converting the integer value properly on the fly:
-- empty temp table with identical structure
CREATE TEMP TABLE tbl_tmp AS TABLE tbl LIMIT 0;
-- ... except for the int / text column
ALTER TABLE tbl_tmp ALTER col_int TYPE text;
COPY tbl_tmp ...;
INSERT INTO tbl -- identical number and names of columns guaranteed
SELECT col1, col2, NULLIF(col_int, '')::int -- list all columns in order here
FROM tbl_tmp;
Temporary tables are dropped at the end of the session automatically. If you run this multiple times in the same session, either just truncate the existing temp table or drop it after each transaction.
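For option 1, a minimal Java sketch of the pre-filtering, assuming the only "" fields are the spurious empty integers (no legitimate empty strings, and no "" escaping a quote inside a quoted value):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;

public class StripEmptyQuotes {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new FileReader("in.csv"));
             PrintWriter out = new PrintWriter(new FileWriter("out.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Replace fields consisting solely of "" with nothing,
                // which COPY ... (FORMAT csv) reads as NULL.
                out.println(line.replaceAll("(^|,)\"\"(?=,|$)", "$1"));
            }
        }
    }
}
The filtered out.csv can then be fed to COPY unchanged.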
Related:
How to update selected rows with values from a CSV file in Postgres?
Rails Migrations: tried to change the type of column from string to integer
postgresql thread safety for temporary tables

Since Postgres 9.4 you can use FORCE_NULL. It causes a quoted null string, such as "", to be converted into NULL. This is very handy, especially with CSV files (in fact, the option is only allowed in CSV format).
The syntax is as follows:
COPY table FROM '/path/to/file.csv'
WITH (FORMAT CSV, DELIMITER ';', FORCE_NULL (columnname));
Further details are explained in the documentation: https://www.postgresql.org/docs/current/sql-copy.html
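In the asker's JDBC setting, the same COPY can be driven through the driver's CopyManager. A minimal sketch; mytable and col_int are placeholder names:
import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyForceNull {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret")) {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            String sql = "COPY mytable FROM STDIN"
                       + " (FORMAT csv, HEADER true, FORCE_NULL (col_int))";
            try (Reader in = new FileReader("/path/to/file.csv")) {
                long rows = copy.copyIn(sql, in); // number of rows loaded
                System.out.println(rows + " rows copied");
            }
        }
    }
}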

If you want to load all empty and blank values as NULL, in Amazon Redshift you just add emptyasnull blanksasnull to the COPY command.
Syntax:
copy table_name (columns_list)
from 's3://{bucket}/{s3_bucket_directory_name + manifest_filename}'
iam_role '{REDSHIFT_COPY_COMMAND_ROLE}'
emptyasnull blanksasnull
manifest DELIMITER ',' IGNOREHEADER 1 compupdate off csv gzip;
Note: this applies to every record that contains empty/blank values. These options are specific to Redshift's COPY; they do not exist in stock PostgreSQL.

Related

SQL Server: escape punctuation in string

I am exporting data from a SQL Server table to a .csv file, and then I use sp_send_dbmail to email the file with the data.
My problem is with this value:
Cantata Number 212 "Peasants Cantata", BWV 212
The value gets split into two columns in the .csv file that gets emailed, but it should be in one column only.
Some titles might contain a comma, which needs to be left in the string for those instances.
For example, the value ends up like this:
Cantata Number 212 Peasants Cantata" BWV 212"
I tried this method, but it is not working:
Note: this SELECT statement resides inside a view vw_WeeklyReport
SELECT TOP 100 PERCENT
    '"' + [p].[Title] + '"' AS [Title]
FROM
    [table] AS [p]
The code that exports the data and emails the .csv file:
BEGIN
    SET NOCOUNT ON;
    DECLARE @qry VARCHAR(8000);
    -- Create the query, concatenating the column name as an alias
    SET @qry = 'SET NOCOUNT ON; SELECT Title FROM [vw_WeeklyReport] SET NOCOUNT OFF';
    -- Send the e-mail with the query results in attachment.
    EXEC [msdb].[dbo].[sp_send_dbmail]
        @profile_name = 'default',
        @recipients = '6lack@email.com',
        @subject = 'Weekly Report',
        @body = 'An attachment has been included in this email.',
        @query_attachment_filename = 'WeeklyRep.csv',
        @query = @qry,
        @attach_query_result_as_file = 1,
        @query_result_separator = ',',
        @query_result_width = 32767,
        @query_result_no_padding = 1;
END;
When there are commas (or other separators) in a field, that field should be enclosed in double quotes, and any double quotes within it have to be escaped with another double quote:
"Cantata Number 212 ""Peasants Cantata"", BWV 212"
Once double quotes are used around fields, all fields containing double quotes should also be quoted, and the quotes inside them escaped as well.
Maybe you can look for an option to export to CSV using quoted fields.
Removing all the commas could also be an option, but then you lose some information.
On the other hand, if there is only one column (as in your SELECT statement), there is no need for CSV at all; a plain text file can be used instead.
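The quoting rule itself is easy to apply in a client language. A minimal Java sketch, as a generic illustration (not tied to the sp_send_dbmail pipeline above):
public class CsvEscape {
    // Quote a field if it contains a comma, quote or newline; double inner quotes.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(escape("Cantata Number 212 \"Peasants Cantata\", BWV 212"));
        // prints: "Cantata Number 212 ""Peasants Cantata"", BWV 212"
    }
}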
Change your query in the stored proc to something like this:
SET @qry = 'SET NOCOUNT ON; SELECT REPLACE(Title, '','', '''') AS Title FROM [vw_WeeklyReport] SET NOCOUNT OFF';
Note this is untested, but it should give you what you're looking for, under the presumption that stripping out commas is acceptable, as was indicated in the initial post. If the commas need to remain intact, the answer isn't quite as simple.

New line symbol when exporting to Excel

I need to fill a cell with data separated by a 'new line' symbol.
I've tried:
DATA: l_con_sepa TYPE c VALUE cl_abap_char_utilities=>newline.
...
CONCATENATE <gf_aufk>-tplnr " 6000000159 Korchagin AS 02.02.2017
            <gf_aufk>-pltxt
            l_con_sepa
            <gf_aufk>-aufnr
       INTO lv_str
  SEPARATED BY space.
I also tried CL_ABAP_CHAR_UTILITIES=>CR_LF, the "&" and "#" symbols, and wrapping lv_str in quotes. Nothing worked.
I either got the symbols as is, or just a blank space instead of an 'Alt+Enter' equivalent.
A simple experiment with Excel, namely creating a cell with Alt+Enter line breaks and saving it as a CSV file, shows that such a new line symbol is LF, not CR LF. Moreover, the cell value is put in double quotes.
So just use double quotes and CL_ABAP_CHAR_UTILITIES=>NEWLINE.
This must work with CSV. You did not specify which API you use to export your data to XLS format, so I cannot test it. If you do not mind putting those details in the question, please do so.
Assuming you use the FM SAP_CONVERT_TO_XLS_FORMAT, there is even no need for double quotes.
REPORT yyy.

TYPES: BEGIN OF gty_my_type,
         col1 TYPE char255,
         col2 TYPE char255,
       END OF gty_my_type,
       gtty_my_type TYPE STANDARD TABLE OF gty_my_type WITH EMPTY KEY.

START-OF-SELECTION.
  DATA(gt_string_table) = VALUE gtty_my_type(
    ( col1 = 'aaa'
          && cl_abap_char_utilities=>newline
          && 'bbb'
          && cl_abap_char_utilities=>newline
          && 'ccc'
      col2 = 'ddd' )
  ).

  CALL FUNCTION 'SAP_CONVERT_TO_XLS_FORMAT'
    EXPORTING
      i_filename        = 'D:\temp\abap.xlsx'
    TABLES
      i_tab_sap_data    = gt_string_table
    EXCEPTIONS
      conversion_failed = 1
      OTHERS            = 2.
  ASSERT sy-subrc = 0.
The result: 'aaa', 'bbb' and 'ccc' end up on separate lines within a single cell, with 'ddd' in the next column.
I thought it might be caused by CONCATENATE .. INTO .. SEPARATED BY space, but it is not. Please execute the following program to check it out.
REPORT yyy.

TYPES: BEGIN OF gty_my_type,
         col1 TYPE char255,
         col2 TYPE char255,
       END OF gty_my_type,
       gtty_my_type TYPE STANDARD TABLE OF gty_my_type WITH EMPTY KEY.

DATA: gs_string       TYPE gty_my_type.
DATA: gt_string_table TYPE gtty_my_type.

START-OF-SELECTION.
  CONCATENATE 'aaa' cl_abap_char_utilities=>newline 'bbb' cl_abap_char_utilities=>newline 'ccc'
         INTO gs_string-col1 SEPARATED BY space.
  gs_string-col2 = 'ddd'.
  APPEND gs_string TO gt_string_table.

  CALL FUNCTION 'SAP_CONVERT_TO_XLS_FORMAT'
    EXPORTING
      i_filename        = 'D:\temp\abap.xlsx'
    TABLES
      i_tab_sap_data    = gt_string_table
    EXCEPTIONS
      conversion_failed = 1
      OTHERS            = 2.
  ASSERT sy-subrc = 0.
So the problem must be somewhere else. You are not showing us your whole code. Maybe you use some kind of third-party package to process your Excel files?
I don't remember whether an "end of line" symbol needs to be added.
Just append each line to an internal table and download the full table using the FM SAP_CONVERT_TO_XLS_FORMAT.

Populating a wide table with SSRS text parameter with a delimiter

I am trying to populate a table variable in SSRS and subsequently call a stored procedure to process the data in it:
DECLARE @Tbl1 TABLE
(
D01 float,
D02 float,
D03 float,
D04 float,
D05 float,
...
D96 float
)
To populate it I use a text parameter @LS. The input is a comma-delimited string with 96 elements:
0.635316969,0.756943899,0.890520142,1.028008362,1.166350106,1.30511861,1.444527254,1.580948571,1.578743639,1.575542931,1.573195746,1.571346448,1.571275321,1.56992391,1.568003484,1.567221089,1.556836567,1.543820351,1.53037, ...., ,0.514543561
In a dataset I tried to populate the table first (after the table variable declaration):
insert into @Tbl1
VALUES (@LS)
But I got this error at run-time: "Column name or number of supplied values does not match table definition."
I also tried JOIN(SPLIT()) with a comma, without luck. Any ideas?
Thanks!
The problem is that the @LS parameter is a single-value text parameter, so you can't use it like that; you'd need a multi-value parameter.
So let's try something different. You don't need to create a temporary table, because you can build your column values to give the dataset you want using SQL like this:
SELECT 0.635316969 AS D01, 0.756943899 AS D02, ... , 0.514543561 AS D96
Fortunately almost everything in SSRS is an expression, so we just need to build this SQL statement dynamically from the @LS parameter using an expression. Go to the Report menu, then Report Properties... and click the Code tab. Enter the following code:
Function MakeSql(LS As String) As String
    Dim Sql As String
    Dim Values() As String
    Dim i As Integer
    Sql = "SELECT "
    Values = Split(LS, ",")
    For i = 0 To Values.Length - 1
        Sql = Sql + Values(i) + " AS D" + Right("0" + CStr(i + 1), 2) + ", "
    Next i
    Sql = Left(Sql, Len(Sql) - 2) ' Remove the trailing comma and space
    Return Sql
End Function
So what we are doing is splitting the string into an array of values, then looping through it to build a SQL statement that aliases those values to the field names we want (e.g. an input of '0.1,0.2' yields SELECT 0.1 AS D01, 0.2 AS D02).
Right-click your dataset, choose Dataset Properties and press the fx button beside the query textbox. This lets us enter a text expression for our SQL statement rather than an actual SQL statement. Here we call the custom code function created above, which builds our SQL on the fly:
=Code.MakeSql(Parameters!LS.Value)
Make sure your dataset has the fields D01 to D96 (you'll have to set these up manually, because SSRS can't analyse the SQL expression to determine the field names) and you're done!

Replace empty strings with null values

I am rolling up a huge table by counts into a new table, where I want to change all the empty strings to NULL and typecast some columns as well. I read through some posts, but I could not find a query that would let me do this across all columns in a single statement.
Let me know if it is possible to iterate across all columns and replace empty strings with NULL.
Ref: How to convert empty spaces into null values, using SQL Server?
To my knowledge there is no built-in function to replace empty strings across all columns of a table, but you can write a plpgsql function to take care of that.
The following function replaces empty strings in all basic character-type columns of a given table with NULL. You can then cast to integer if the remaining strings are valid number literals (e.g. ALTER TABLE tbl ALTER col_int TYPE int USING col_int::int).
CREATE OR REPLACE FUNCTION f_empty_text_to_null(_tbl regclass, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- ARRAY of all basic character types
_sql text;
BEGIN
SELECT INTO _sql -- build SQL command
'UPDATE ' || _tbl
|| E'\nSET ' || string_agg(format('%1$s = NULLIF(%1$s, '''')', col), E'\n ,')
|| E'\nWHERE ' || string_agg(col || ' = ''''', ' OR ')
FROM (
SELECT quote_ident(attname) AS col
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND NOT attnotnull -- exclude columns defined NOT NULL!
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) sub;
-- RAISE NOTICE '%', _sql; -- test?
-- Execute
IF _sql IS NULL THEN
updated_rows := 0; -- nothing to update
ELSE
EXECUTE _sql;
GET DIAGNOSTICS updated_rows = ROW_COUNT; -- Report number of affected rows
END IF;
END
$func$;
Call:
SELECT f_empty_text_to_null('mytable');
SELECT f_empty_text_to_null('myschema.mytable');
To also see the column name updated_rows in the result:
SELECT * FROM f_empty_text_to_null('mytable');
Major points
The table name has to be valid and visible, and the calling user must have all necessary privileges. If any of these conditions is not met, the function does nothing, i.e. nothing can be destroyed either. Casting to the object identifier type regclass makes sure of that.
The table name can be supplied as is ('mytable'), then the search_path decides. Or schema-qualified to pick a certain schema ('myschema.mytable').
Query the system catalog to get all (character-type) columns of the table. The provided function covers the basic character types text, bpchar and varchar (the _typ array in the declaration). Only relevant columns are processed.
Use quote_ident() or format() to sanitize column names and safeguard against SQLi.
The updated version uses the basic SQL aggregate function string_agg() to build the command string without looping, which is simpler and faster. And more elegant. :)
Has to use dynamic SQL with EXECUTE.
The updated version excludes columns defined NOT NULL and only updates each row once in a single statement, which is much faster for tables with multiple character-type columns.
Should work with any modern version of PostgreSQL. Tested with Postgres 9.1, 9.3, 9.5 and 13.

Inserting a number as a String into a Text column and SQLite stills removes the leading zero

I got the following number as a string: String numberString = "079674839";
When I insert this number into a SQLite DB, SQLite automatically removes the leading zero and stores the string as 79674839. Considering affinity and that the column stores TEXT, shouldn't SQLite store the whole string and keep the leading zero?
Thanks
Double-check your database schema. As documented in Datatypes in SQLite Version 3, the declared column type name affects how values are processed before being stored.
Here's a Python program to demonstrate, using an in-memory database:
import sqlite3

# In-memory database with one column per storage class
db = sqlite3.connect(':memory:')
val = "0796"
db.execute('CREATE TABLE test (i INTEGER, r REAL, t TEXT, b BLOB);')
db.execute('INSERT INTO test VALUES (?, ?, ?, ?);', (val, val, val, val))
res = db.execute('SELECT * FROM test')
print('\t'.join(x[0] for x in res.description))
for row in res.fetchall():
    print('\t'.join(repr(x) for x in row))
The output is:
i       r       t       b
796     796.0   '0796'  '0796'
So, it looks like your column is actually an integer type. Take a look at the schema definition (sqlite3 database.db .schema works from the command line), look at the documentation again, and make sure you are using one of the type names that map to TEXT affinity. Type names that match none of the affinity rules get NUMERIC affinity, which converts well-formed numeric strings just as INTEGER does.
In my own case, I was using 'STR', which matches no rule and therefore got NUMERIC affinity. I changed it to 'TEXT', and SQLite started respecting my leading zeros.
Use single quotes around the number (i.e. '079674839') if it appears anywhere in inline SQL. Also, if you're doing this programmatically, make sure that you are not going through a numeric conversion.
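For example, with JDBC this might look as follows (a sketch assuming the sqlite-jdbc driver on the classpath; table and column names are made up):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LeadingZero {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:")) {
            conn.createStatement().execute("CREATE TABLE t (num TEXT)");
            try (PreparedStatement ps = conn.prepareStatement("INSERT INTO t VALUES (?)")) {
                ps.setString(1, "079674839"); // bound as a string, no numeric conversion
                ps.executeUpdate();
            }
            try (ResultSet rs = conn.createStatement().executeQuery("SELECT num FROM t")) {
                rs.next();
                System.out.println(rs.getString(1)); // prints 079674839, zero intact
            }
        }
    }
}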
