I am rolling up a huge table by counts into a new table, where I want to change all the empty strings to NULL and typecast some columns as well. I read through some of the posts, but I could not find a query that would let me do this across all columns in a single query, without using multiple statements.
Is it possible to iterate across all columns and replace cells containing empty strings with NULL?
Ref: How to convert empty spaces into null values, using SQL Server?
To my knowledge there is no built-in function to replace empty strings across all columns of a table. You can write a plpgsql function to take care of that.
The following function replaces empty strings in all basic character-type columns of a given table with NULL. You can then cast to integer if the remaining strings are valid number literals.
CREATE OR REPLACE FUNCTION f_empty_text_to_null(_tbl regclass, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- ARRAY of all basic character types
_sql text;
BEGIN
SELECT INTO _sql -- build SQL command
'UPDATE ' || _tbl
|| E'\nSET ' || string_agg(format('%1$s = NULLIF(%1$s, '''')', col), E'\n ,')
|| E'\nWHERE ' || string_agg(col || ' = ''''', ' OR ')
FROM (
SELECT quote_ident(attname) AS col
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND NOT attnotnull -- exclude columns defined NOT NULL!
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) sub;
-- RAISE NOTICE '%', _sql; -- test?
-- Execute
IF _sql IS NULL THEN
updated_rows := 0; -- nothing to update
ELSE
EXECUTE _sql;
GET DIAGNOSTICS updated_rows = ROW_COUNT; -- Report number of affected rows
END IF;
END
$func$;
Call:
SELECT f_empty_text_to_null('mytable');
SELECT f_empty_text_to_null('myschema.mytable');
To also get the column name updated_rows:
SELECT * FROM f_empty_text_to_null('mytable');
Major points
The table name has to be valid and visible, and the calling user must have all necessary privileges. If any of these conditions is not met, the function raises an exception and changes nothing, i.e. nothing can be destroyed either. I cast to the object identifier type regclass to make sure of it.
The table name can be supplied as is ('mytable'), then the search_path decides. Or schema-qualified to pick a certain schema ('myschema.mytable').
Query the system catalog to get all (character-type) columns of the table. The provided function uses these basic character types: text, bpchar, varchar. Only relevant columns are processed.
Use quote_ident() or format() to sanitize column names and safeguard against SQL injection.
The updated version uses the basic SQL aggregate function string_agg() to build the command string without looping, which is simpler and faster. And more elegant. :)
Dynamic SQL with EXECUTE is required, because the table and column names are only known at run time.
The updated version excludes columns defined NOT NULL and only updates each row once in a single statement, which is much faster for tables with multiple character-type columns.
Should work with any modern version of PostgreSQL. Tested with Postgres 9.1, 9.3, 9.5 and 13.
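For readers who want to try the same single-statement idea outside PostgreSQL, here is a rough Python analogue against SQLite. SQLite is swapped in only because it ships with Python; this is not the plpgsql function. It reads the column list from PRAGMA table_info instead of pg_attribute, keeps nullable columns whose declared type gives TEXT affinity, and builds one UPDATE with NULLIF() plus an OR-ed WHERE clause, mirroring the string_agg() approach. The names quote_ident, empty_text_to_null and mytable are illustrative. Unlike the regclass cast, an unknown table name simply yields no columns here, so the sketch returns 0 instead of raising.

```python
import sqlite3

# Declared-type substrings that give a column TEXT affinity in SQLite
# (only checked when the type does not contain "INT", which takes precedence).
TEXT_MARKERS = ("CHAR", "CLOB", "TEXT")


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def empty_text_to_null(conn: sqlite3.Connection, table: str) -> int:
    """Replace '' with NULL in every nullable text-affinity column of `table`
    using a single UPDATE; return the number of rows changed."""
    tbl = quote_ident(table)
    cols = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, decl_type, notnull, _dflt, _pk in conn.execute(
        f"PRAGMA table_info({tbl})"
    ):
        t = (decl_type or "").upper()
        is_text = "INT" not in t and any(m in t for m in TEXT_MARKERS)
        if is_text and not notnull:  # skip NOT NULL columns, like the plpgsql version
            cols.append(quote_ident(name))
    if not cols:
        return 0  # unknown table or no candidate columns: nothing to update
    set_list = ", ".join(f"{c} = NULLIF({c}, '')" for c in cols)
    where = " OR ".join(f"{c} = ''" for c in cols)
    cur = conn.execute(f"UPDATE {tbl} SET {set_list} WHERE {where}")
    return cur.rowcount


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE mytable (id INTEGER, name TEXT, code VARCHAR(10), qty TEXT)")
demo.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, "", "x", "5"), (2, "bob", "", ""), (3, "amy", "y", "7")],
)
print(empty_text_to_null(demo, "mytable"))  # 2
# After the cleanup, the remaining numeric strings can be cast during the roll-up
print(demo.execute(
    "SELECT id, name, code, CAST(qty AS INTEGER) FROM mytable ORDER BY id"
).fetchall())
# [(1, None, 'x', 5), (2, 'bob', None, None), (3, 'amy', 'y', 7)]
```

The final CAST shows the typecasting step from the question: once empty strings are NULL, the cast yields NULL, whereas casting '' directly gives 0 in SQLite and an error in PostgreSQL.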
I have a static prompt that is a single select with two values; let's call them A and B. When I select option 'A', my report pulls all data from the DB, which is expected. When the user selects option 'B', the report should pull only the records whose code = 'M'. Here, code is a column name in the report.
Note: For option 'A' I don't need to set any prompt in the report because it should pull all records by default.
Let's assume your parameter name is param and data item is named item.
Filter expression:
if (?param? = 'A')
then ([item])
else ('M')
= [item]
Note: You absolutely need to use a prompt. The result of selecting A should be to not filter.
I think I understand, try this:
Make the prompt a single value (i.e. B) with a use value of 'M'
Make the HEADER TEXT for the prompt A (so it is not an actual selection)
Make the filter optional
if the user selects A - the prompt is NULL and the optional filter is ignored
if the user selects B - the filter [Some data item] = ?YourParm? will occur
Also, if you prefer not to have header text, you can make static values A and B (with B's use value still 'M') and modify the optional filter like this:
(?YourParm? <> 'M') OR ([Some data item] = ?YourParm?)
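To compare the two answers, here is a minimal sketch that runs both filter expressions as plain SQL in SQLite from Python. This is an assumption for illustration: it does not execute Cognos, and the table report and column code are hypothetical stand-ins for the report's data item. The prompt value is passed as the named parameter :param.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER, code TEXT)")
conn.executemany("INSERT INTO report VALUES (?, ?)", [(1, "M"), (2, "X"), (3, None)])

# First answer: if (?param? = 'A') then ([item]) else ('M') = [item]
FILTER_IF = "CASE WHEN :param = 'A' THEN code ELSE 'M' END = code"
# Second answer (static values A and B, B's use value 'M'):
# (?YourParm? <> 'M') OR ([Some data item] = ?YourParm?)
FILTER_OR = "(:param <> 'M') OR (code = :param)"


def ids(where: str, param: str) -> list:
    """Return the ids of rows that pass `where` for the given prompt value."""
    sql = f"SELECT id FROM report WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, {"param": param})]


print(ids(FILTER_IF, "A"), ids(FILTER_IF, "B"))  # [1, 2] [1]
print(ids(FILTER_OR, "A"), ids(FILTER_OR, "M"))  # [1, 2, 3] [1]
```

In SQL, NULL = NULL is unknown, so the first form drops rows whose data item is NULL even when A is selected, while the OR form keeps them. Whether a Cognos report behaves the same depends on the SQL it generates, so this is worth checking against your data.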
An Excel table used as a data source may contain error values (#N/A, #DIV/0!), which can disrupt later steps of the transformation process in Power Query.
Depending on the following steps, we may get no output but an error. So how should these cases be handled?
I found two standard steps in Power Query to catch them:
Remove errors (UI: Home/Remove Rows/Remove Errors) -> all rows with an error will be removed
Replace error values (UI: Transform/Replace Errors) -> the columns must be selected first before this operation can be performed.
The first possibility is not a solution for me, since I want to keep the rows and just replace the error values.
In my case, my data table will change over time, meaning the column names may change (e.g. years) or new columns may appear. So the second possibility is too static, since I do not want to change the script each time.
So I've tried to find a dynamic way to clean all columns, independent of the column names (and the number of columns). It replaces the errors with a null value.
let
Source = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
//Remove errors of all columns of the data source. ColumnName doesn't play any role
Cols = Table.ColumnNames(Source),
ColumnListWithParameter = Table.FromColumns({Cols, List.Repeat({null}, List.Count(Cols))}, {"ColName", "ErrorHandling"}),
ParameterList = Table.ToRows(ColumnListWithParameter),
ReplaceErrorSource = Table.ReplaceErrorValues(Source, ParameterList)
in
ReplaceErrorSource
Here are the messages from the three different queries after I added two new columns (with errors) to the source:
If anybody has another solution for this kind of data cleaning, please post it here.
let
src = Excel.CurrentWorkbook(){[Name="Tabelle1"]}[Content],
cols = Table.ColumnNames(src),
replace = Table.ReplaceErrorValues(src, List.Transform(cols, each {_, "!"}))
in
replace
Just for novices like me in Power Query:
"!" could be any string used as a substitute for error values. I initially thought it was a wildcard.
List.Transform(cols, each {_, "!"}) generates the list of per-column error replacements for the main function:
Table.ReplaceErrorValues(table_with_errors, {{col1, error_str1}, {col2, error_str2}, ..., {coln, error_strn}})
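To make the pattern concrete outside Power Query, here is a hypothetical Python analogue (not M code). CellError is an invented stand-in for an Excel error value, and the replacement pairs are built from whatever column names exist at run time, the same way List.Transform(cols, each {_, "!"}) does. Columns missing from the list keep their errors, as with Table.ReplaceErrorValues.

```python
from typing import Any, Dict, List, Tuple


class CellError:
    """Hypothetical stand-in for an Excel error value such as #N/A or #DIV/0!."""

    def __init__(self, code: str) -> None:
        self.code = code


def replace_error_values(rows: List[Dict[str, Any]],
                         replacements: List[Tuple[str, Any]]) -> List[Dict[str, Any]]:
    """Return new rows where error cells in the listed columns get the paired substitute."""
    subst = dict(replacements)
    return [
        {col: (subst[col] if col in subst and isinstance(val, CellError) else val)
         for col, val in row.items()}
        for row in rows
    ]


rows = [
    {"2019": 10, "2020": CellError("#DIV/0!")},
    {"2019": CellError("#N/A"), "2020": 7},
]
cols = list(rows[0].keys())            # like Table.ColumnNames(src)
pairs = [(col, None) for col in cols]  # like List.Transform(cols, each {_, null})
print(replace_error_values(rows, pairs))
# [{'2019': 10, '2020': None}, {'2019': None, '2020': 7}]
```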
Nice elegant solution, Sergei
I have a query calculation that should throw me either a value (if conditions are met) or a blank/null value.
The code is in the following form:
if([attribute] > 3)
then ('value')
else ('')
At the moment, the only way I have found to obtain the result is to use '' (i.e. an empty character string), but this is a value as well, so when I subsequently count the number of distinct values in another query, I struggle to get the correct number (the empty string should be excluded from the count, if present).
I can get the result with the following code:
if ('' in ([first_query].[attribute]))
then (count(distinct [attribute]) - 1)
else (count(distinct [attribute]))
How can I avoid this double calculation in all later queries that count attribute?
I use this Cognos expression, which always returns NULL:
nullif(1, 1)
I found out that this can be managed using the case when function:
case
when ([attribute] > 3)
then ('value')
end
The difference is that a CASE expression doesn't need a branch for every possible outcome: when no WHEN condition matches and there is no ELSE, it returns NULL, which shows up as a blank cell and, as in standard SQL, is ignored by count(distinct ...).
Perfect for what I needed (and not as well documented on the web as the opposite case, i.e. dealing with null cases that should be zero).
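A quick way to see why the empty string skews the count is to run the three variants as plain SQL. The sketch below uses SQLite from Python as a stand-in for the Cognos data source (an assumption; the Cognos expressions themselves are not executed), with a hypothetical table t. COUNT(DISTINCT ...) ignores NULLs but counts '' as one more distinct value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (attribute INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (4,), (5,), (5,)])

QUERIES = {
    # if/else with '': non-matching rows become the empty string, which is a value
    "else_empty": "CASE WHEN attribute > 3 THEN 'value' || attribute ELSE '' END",
    # CASE without ELSE: non-matching rows become NULL
    "no_else": "CASE WHEN attribute > 3 THEN 'value' || attribute END",
    # Explicit NULL via nullif(1, 1)
    "nullif": "CASE WHEN attribute > 3 THEN 'value' || attribute ELSE NULLIF(1, 1) END",
}

counts = {
    name: conn.execute(f"SELECT COUNT(DISTINCT {expr}) FROM t").fetchone()[0]
    for name, expr in QUERIES.items()
}
print(counts)  # {'else_empty': 3, 'no_else': 2, 'nullif': 2}
```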
I have a Crystal report that I need to modify to leave out duplicate rows by "name". In the Section Expert I am putting a formula in Suppress, but I cannot figure out how to compare the current name field being added with all the previous names already in the group. I tried the Filter() function, but for the String array parameter I don't know what to enter to represent all the other names previously added to the group. I need to check whether the current name is already in the group; if it is, I want to compare another field called "date", and if the date of the row being added is more recent than the date of the duplicate, it should overwrite that row, so that only the row with the most recent date is shown.
Basically, the question is: how do I create an array of all the fields already in the group (or does one already exist), so that I can use the Filter() function to check whether the current name is already in that array?
Well I figured it out, so for anyone who runs into this here is my solution.
First, I made a formula in the "Formula Fields" section that builds two arrays while reading the data from the database, keeping only one copy of each id and its date. For any later record with the same id, it compares that record's date with the date stored in the array for that id, and if the new date is greater (later), it replaces the stored date with the one just read. I named this formula field idArray.
Global StringVar Array idArray;
Global DateVar Array expArray;
BooleanVar addName;
NumberVar x;
StringVar idTest;
StringVar expDateTest;
whilereadingrecords;
(
addName := true;
for x := 1 to Ubound(idArray) step 1 do
(
if({hrpersnl.p_empno} = idArray[x]) then
(
addName := false;
if(Date({nemphist.enddate}) > expArray[x]) then
expArray[x] := Date({nemphist.enddate});
)
);
if(addName = true) then
(
reDim Preserve idArray[Ubound(idArray) + 1];
reDim Preserve expArray[Ubound(expArray) + 1];
idArray[Ubound(idArray)] := {hrpersnl.p_empno};
expArray[Ubound(expArray)] := Date({nemphist.enddate});
//idTest := idTest + ' ' + {hrpersnl.p_empno};
//expDateTest := expDateTest + ' ' + toText(Date({nemphist.enddate}));
);
//idTest
//Ubound(idArray)
//expDateTest
)
The commented-out lines are what I used for testing to see how the arrays were being built. I left them in as an example of how to debug Crystal Reports, since it doesn't come with a debugger.
The next step is to create a record suppression formula. From the Report menu I opened the "Section Expert", and in the "Details" section of my group I clicked the little x+2 (formula) button next to the "Suppress (No Drill-Down)" option. I then inserted the code below. It looks up the current record's id in the first array, uses that position to retrieve the maximum date from the second array, and suppresses the record if its own date is earlier than that maximum.
Global StringVar Array idArray;
Global DateVar Array expArray;
NumberVar x;
BooleanVar suppress := false;
for x := 1 to Ubound(idArray) do
(
if({hrpersnl.p_empno} = idArray[x]) then
if(Date({nemphist.enddate}) < expArray[x]) then
suppress := true;
);
if(suppress = true) then
true
else
false
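The two formulas amount to a two-pass algorithm: collect the latest date per id while reading records, then suppress any row older than that date while printing. Here is a sketch of the same logic in Python rather than Crystal syntax; the (id, end date) tuples are hypothetical stand-ins for {hrpersnl.p_empno} and {nemphist.enddate}, and a dict replaces the two parallel arrays and the linear search.

```python
from datetime import date
from typing import Dict, List, Tuple

Record = Tuple[str, date]  # (employee id, end date): hypothetical field pair


def build_latest(records: List[Record]) -> Dict[str, date]:
    """Pass 1 (like the whilereadingrecords formula): keep the latest date per id."""
    latest: Dict[str, date] = {}
    for emp_id, end in records:
        if emp_id not in latest or end > latest[emp_id]:
            latest[emp_id] = end
    return latest


def suppress(record: Record, latest: Dict[str, date]) -> bool:
    """Pass 2 (like the Suppress formula): hide a row older than its id's latest date."""
    emp_id, end = record
    return end < latest[emp_id]


records = [
    ("100", date(2020, 1, 31)),
    ("200", date(2021, 6, 30)),
    ("100", date(2022, 3, 15)),
]
latest = build_latest(records)
shown = [r for r in records if not suppress(r, latest)]
print(shown)
```

As in the Crystal formula, the comparison is a strict "<", so two rows that share the same id and the same latest date are both shown.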
Some lessons learned along the way...
Crystal Reports handles global variables in a weird way. It took me a few hours of fiddling with them to figure out that you can use them basically anywhere in the report, as long as you declare them in each section you use them in with the Global <type> <name> syntax (e.g. Global StringVar Array idArray;). Even though you re-declare the variable each time, Crystal does not remove or reset its value.
The := operator is different from =. The := operator assigns values to variables, whereas = seems to be used only for comparisons.
Crystal Reports is really weird in its design. If you want your formula field to return a specific variable, you just type that variable name without a ";" after it. Anything without a ";" after it is considered the end of the formula, so if you get the "The remaining text does not appear to be part of the formula" error, it is because you didn't put a ";" after something and Crystal assumes your formula ends at that point. But if you don't end with an expression that lacks a ";", your formula will just return "false" by default. So in my formula, where I have the commented-out lines //idTest, //Ubound(idArray) and //expDateTest, all I have to do is uncomment the variable I want returned and the formula will return it.
I have the following number as a string: String numberString = "079674839";
When I insert this number into a SQLite DB, SQLite automatically removes the leading zero and stores the string as 79674839. Considering affinity and that the column stores TEXT, shouldn't SQLite store the whole string and keep the leading zero?
Double-check your database schema. As documented on Datatypes in SQLite Version 3, the column type name affects how values are processed before being stored.
Here's a Python program to demonstrate, using an in-memory database:
import sqlite3
db = sqlite3.connect(':memory:')
val = "0796"
db.execute('CREATE TABLE test (i INTEGER, r REAL, t TEXT, b BLOB);')
db.execute('INSERT INTO test VALUES (?, ?, ?, ?);', (val, val, val, val))
res = db.execute('SELECT * FROM test')
# column names come from the cursor description
print('\t'.join([x[0] for x in res.description]))
for row in res.fetchall():
    print('\t'.join([repr(x) for x in row]))
The output (Python 3) is:
i r t b
796 796.0 '0796' '0796'
So it looks like your column actually has integer (or numeric) affinity. Take a look at the schema definition (sqlite3 database.db .schema works from the command line), read the documentation again, and make sure you are using one of the type names that map to TEXT affinity. Declared type names that match none of the affinity rules get NUMERIC affinity, which converts well-formed numeric strings to numbers.
In my own case, I was using 'STR', which matches none of the rules and therefore gets NUMERIC affinity. I changed it to 'TEXT', and SQLite started respecting my leading zeros.
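The effect is easy to reproduce. This minimal Python sketch (hypothetical table phones) declares one column as STR and one as TEXT, inserts the same string into both, and uses SQLite's typeof() to show what was actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'STR' matches none of SQLite's affinity rules, so that column gets NUMERIC affinity
conn.execute("CREATE TABLE phones (as_str STR, as_text TEXT)")
number_string = "079674839"
conn.execute("INSERT INTO phones VALUES (?, ?)", (number_string, number_string))
row = conn.execute(
    "SELECT as_str, typeof(as_str), as_text, typeof(as_text) FROM phones"
).fetchone()
print(row)  # (79674839, 'integer', '079674839', 'text')
```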
Use single quotes around the number (i.e., '079674839') if it appears anywhere in inline SQL code. Also, if you're doing this programmatically, make sure the value does not go through a numeric conversion.
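The sketch below (hypothetical table phones with a TEXT column) shows both pitfalls: an unquoted literal is parsed as a number before TEXT affinity turns it back into a string, and a round trip through int() in application code drops the zero just the same. Only the quoted literal and the bound string parameter keep it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phones (num TEXT)")
conn.execute("INSERT INTO phones VALUES (079674839)")    # unquoted: numeric literal
conn.execute("INSERT INTO phones VALUES ('079674839')")  # quoted: string literal
conn.execute("INSERT INTO phones VALUES (?)", ("079674839",))  # bound str parameter
conn.execute("INSERT INTO phones VALUES (?)", (str(int("079674839")),))  # numeric round trip
stored = [r[0] for r in conn.execute("SELECT num FROM phones ORDER BY rowid")]
print(stored)  # ['79674839', '079674839', '079674839', '79674839']
```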