Query failing for seemingly identical strings?

I have the following two example queries:
select * from schema.test where id = 'Z6_SGIL_2115'
select * from schema.test where id = 'Z6_SGIL_2115'
The top query fails; the second one works. When the first query failed, I copied and pasted the id value directly from the table into the query, and that version succeeded.
The strings look identical but obviously aren't.
What is the difference between the two? Is it possible one or more of the characters are encoded differently?

Unprintable data in the search patterns/strings can be detected with any hex editor or: http://www.babelstone.co.uk/Unicode/whatisit.html
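You can also inspect the bytes from inside the database. A minimal sketch, assuming Oracle (the question doesn't name the platform; other databases have similar facilities, e.g. encode(id::bytea, 'hex') in PostgreSQL):
-- dump() lists the byte values of every character, so a lookalike such as
-- a non-breaking space or zero-width character stands out immediately
select id, dump(id) from schema.test where id like 'Z6_SGIL%';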

Expression.Error: We cannot convert the value null to type Text. Details: Value= Type=Type

I want to combine multiple files with different headings, and I found this video which was perfect. I get up to minute 12 and it fails with this error: "Expression.Error: We cannot convert the value null to type Text.
Details:
Value=
Type=Type"
My code for the Power Query step is:
= Table.TransformColumnNames(stuff_Table, each List.Accumulate(Table.ToRecords(Headings), _, (state, current) => Text.Replace(Text.Upper(state), current[BEFORE], current[AFTER])))
I want to combine three files: one with a template that will be used for Tableau but is otherwise blank, and the other two with the data, but in inconsistent formatting. How do I fix this?
It sounds like you are trying to apply a transformation that takes TEXT as input to data of type NULL. This means that Power Query is expecting a textual input but receives nothing. You need to make sure it receives a textual input (by adapting your source and replacing null with a text value), or that your Power Query formula handles null values explicitly (typically by some variation of "if field = null then "empty" else field").
Text.Replace(Text.Upper(state), ...) ==> Most likely, the state field is NULL in at least one instance (one line) of your source. Maybe the Tableau template file?
Try replacing empty fields with "null" or "empty" or "no state", or even a white space " ". This can be done either manually within the source (transform the column or create a new column) or within Power Query.
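As a minimal sketch of that explicit null handling (assuming the null comes from a blank BEFORE/AFTER row in your Headings table, which is a guess), you could guard each replacement step:
= Table.TransformColumnNames(
    stuff_Table,
    (name) => List.Accumulate(
        Table.ToRecords(Headings),
        Text.Upper(name),
        (state, current) =>
            // skip heading rows whose BEFORE or AFTER is null instead of
            // passing null into Text.Replace, which raises this exact error
            if current[BEFORE] = null or current[AFTER] = null
            then state
            else Text.Replace(state, current[BEFORE], current[AFTER])))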
A quick Google search for "Power Query replace null with text" gives a range of options.
How can I perform COALESCE in power query?
https://www.edureka.co/community/40467/replace-null-values-custom-values-power-power-query-editor
https://community.powerbi.com/t5/Desktop/Replace-NULL/m-p/106183

Oracle conditionally adding spaces into data

I have a table that was given to me with some 'incorrect' data. The format of the data should be:
"000 00000"
There are TWO spaces. WHERE the spaces are can be different for different records, so for example one record could be the previous example and another could be "00 00 0000". The problem is that the data came in, in some instances with only a single space. (so "000 00000").
Ideally, I'd like to fix the already-loaded data in a query, via an update statement. If this is easier done outside of Oracle, that's fine; I can re-load the data (it's a fair amount of data, at almost 400,000 rows).
What would be the easiest way to find the single space and add another as needed, or leave it alone if there are already two spaces?
I am currently working on a query to split the string on the spaces, trim the data, then put it all back together with 2 spaces... it's not working out too well in testing.
Thanks in advance!
Here is a query to find the single-space records; try building a CASE statement on top of it as needed.
WITH sample_data AS (
  SELECT '000  00000' value FROM dual UNION ALL
  SELECT '00 00 0000' value FROM dual UNION ALL
  SELECT '000 00000' value FROM dual
)
SELECT * FROM sample_data WHERE REGEXP_COUNT(value, '[[:space:]]') = 1
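Since you said you want to "find the single space and add another", one hedged sketch of the update (table and column names below are placeholders, and it assumes doubling the lone space is the correct fix):
UPDATE my_table
   SET my_col = REPLACE(my_col, ' ', '  ')
 WHERE REGEXP_COUNT(my_col, '[[:space:]]') = 1;
-- rows that already contain two spaces are untouched by the WHERE clause;
-- verify afterwards that every row now has exactly two spaces:
SELECT COUNT(*) FROM my_table WHERE REGEXP_COUNT(my_col, '[[:space:]]') <> 2;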

LIKE clause in Sybase/SAP ASE trimmed at the end?

The emp table below has no ENAME ending in three spaces. However, the following SQL statement behaves as if the pattern were trimmed at the end (i.e. as if it were just '%'), because it returns all records:
select ENAME from dbo.emp where ENAME like '%   '
I tried many other database platforms (including SQL Server, SQL Anywhere, Oracle, PostgreSQL, MySQL, etc.); I've seen this happen only in Sybase/SAP ASE (version 16). Is this a bug, or is it "by design"? I found nothing specific about it in the online documentation.
I'm looking for a generic fix, to apply some simple transformation to the pattern and return what is expected from any other platform. Without knowing in advance what data type the field is or what kind of data it holds.
This is caused by the VARCHAR semantics in ASE, which always strip trailing spaces from a value before operating on it. This is applied to the string '%   ' before it is used, since a string literal is a VARCHAR value by definition. This is indeed a particular semantic of ASE.
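A minimal demo sketch of that stripping (hypothetical temp table; the column is CHAR so its stored trailing spaces survive):
-- both SELECTs return the row: the three trailing spaces in the first
-- pattern are stripped from the VARCHAR literal before LIKE runs
create table #t (c char(10))
insert into #t values ('ab')   -- stored padded as 'ab' + 8 spaces
select c from #t where c like '%   '
select c from #t where c like '%'
drop table #t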
Now, you could try working around this by using the [ ] wildcard to match a space, but there are some things to be aware of. First, the column being matched (ENAME) must be CHAR, not VARCHAR, otherwise any trailing spaces will have been stripped as well before they were stored. Assuming the column is CHAR, a pattern '%[ ][ ][ ]' unfortunately still does not appear to work; I think there may be some trailing-space stripping still happening here.
The best way to work around this is to use an artificial end-of-field delimiter which will not occur in the data, e.g.
ENAME||'~EOF~' like '%   ~EOF~'
This works. But note that the column ENAME must still be CHAR rather than VARCHAR.
The LIKE behavior is somewhat documented in the ASE documentation.
For VARCHAR columns this will never work, because ASE removes the trailing spaces.
For CHAR it depends on how you insert the data: in a char(10) column, if you insert 2 characters, ASE appends 8 blank spaces to pad them to 10. So when you query, that 2-character entry will be part of the result set, because it ends with more than 3 trailing spaces.
If this is not a problem for you, instead of LIKE you can use charindex(), which counts the trailing spaces and does not truncate them the way LIKE does, so you could write something like:
select ENAME from dbo.emp where charindex('   ', ENAME) > 0
Or you can calculate the number of trailing spaces, then check whether your 3 spaces come after that point or not, like:
select a from A
where charindex('   ', a) > (len(a) - len(convert(varchar(10), a)))
Now again, this will get you more rows than expected if the data were inserted with varying lengths (and therefore varying trailing padding), but it will work perfectly if you know exactly what to search for.
SELECT ename FROM dbo.emp WHERE RIGHT(ENAME, 3) = '   '

Redshift: Truncate VARCHAR value automatically on INSERT or maybe use max length?

When performing an INSERT, Redshift does not allow you to insert a string value that is longer/wider than the target field in the table. Observe:
CREATE TEMPORARY TABLE test (col VARCHAR(5));
-- result: 'Table test created'
INSERT INTO test VALUES('abcdefghijkl');
-- result: '[Amazon](500310) Invalid operation: value too long for type character varying(5);'
One workaround for this is to cast the value:
INSERT INTO test VALUES('abcdefghijkl'::VARCHAR(5));
-- result: 'INSERT INTO test successful, 1 row affected'
The annoying part is that now all of my code will have to include these casts on every INSERT for each VARCHAR field, or else the application code will have to truncate each string before constructing the query; either way, the column's width specification leaks into the application code, which is annoying.
Is there any better way of doing this with Redshift? It would be great if there were some option to have the server just truncate the string and perform the INSERT (perhaps raising a warning), the way MySQL does.
One thing I could do is just declare these particular fields as a very large VARCHAR, perhaps even 65535 (the maximum).
create table analytics.testShort (a varchar(3));
create table analytics.testLong (a varchar(4096));
create table analytics.testSuperLong (a varchar(65535));
insert into analytics.testShort values('abc');
insert into analytics.testLong values('abc');
insert into analytics.testSuperLong values('abc');
-- Redshift reports the size for each table is the same, 4 mb
The one disadvantage of this approach I have found is that it will cause bad performance if this column is used in a group by/join/etc:
https://discourse.looker.com/t/troubleshooting-redshift-performance-extensive-guide/326
(search for VARCHAR)
I am wondering, though, whether there is any harm otherwise, if you never plan to use this field in a GROUP BY, JOIN, or the like.
Some things to note in my scenario: Yes, I really don't care about the extra characters that may be lost with truncation, and no, I don't have a way to enforce the length of the source text. I am capturing messages and URLs from external sources which generally fall into certain range in length of characters, but sometimes there are longer ones. It doesn't matter in our application if they get truncated or not in storage.
The only way to automatically truncate the strings to match the column width is to use the COPY command with the TRUNCATECOLUMNS option:
Truncates data in columns to the appropriate number of characters so
that it fits the column specification. Applies only to columns with a
VARCHAR or CHAR data type, and rows 4 MB or less in size.
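For example, a hedged sketch (the bucket path and IAM role below are hypothetical placeholders):
-- values longer than the target column are silently trimmed during the load
COPY test
FROM 's3://my-bucket/test-data.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV
TRUNCATECOLUMNS;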
Otherwise, you will have to take care of the length of your strings using one of these two methods:
Explicitly CAST your values to the VARCHAR you want:
INSERT INTO test VALUES(CAST('abcdefghijkl' AS VARCHAR(5)));
Use the LEFT and RIGHT string functions to truncate your strings:
INSERT INTO test VALUES(LEFT('abcdefghijkl', 5));
Note: CAST should be your first option because it handles multi-byte characters properly. LEFT will truncate based on the number of characters, not bytes, and if you have a multi-byte character in your string, you might end up exceeding the limit of your column.

Quick SQL question

Working on PostgreSQL.
I have a table with a column that contains values of the following format:
Set1/Set2/Set3/...
Each Seti (for each i) can be a set of values; they are delimited by '/'.
I would like to show the distinct entries of the form Set1/Set2; that is, I would like to trim or truncate the rest of the string in those entries.
That is, I want all distinct options for:
Set1/Set2
A regular expression would work great: I want a substring matching the pattern .*/.* to be displayed, without the rest of the string.
I got as far as:
select distinct column_name from table_name
but I have no idea how to make the trimming itself.
I tried looking on w3schools and other sites, as well as searching for "SQL trim" / "SQL truncate" on Google, but didn't find what I'm looking for.
Thanks in advance.
mu is too short's answer is fine if the lengths of the strings between the forward slashes are always consistent. Otherwise you'll want to use a regex with the substring function.
For example:
=> select substring('Set1/Set2/Set3/' from '^[^/]+/[^/]+');
substring
-----------
Set1/Set2
(1 row)
=> select substring('Set123/Set24/Set3/' from '^[^/]+/[^/]+');
substring
--------------
Set123/Set24
(1 row)
So your query on the table would become:
select distinct substring(column_name from '^[^/]+/[^/]+') from table_name;
The relevant docs are http://www.postgresql.org/docs/8.4/static/functions-string.html
and http://www.postgresql.org/docs/8.4/static/functions-matching.html.
Why do you store multiple values in a single record? The preferred solution would be multiple values in multiple records; your problem would then not exist at all.
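As a hedged sketch of that normalization done on the fly (reusing column_name/table_name from your query):
-- one output row per '/'-separated value; a trailing '/' yields an empty
-- string element, which you may want to filter out
select distinct regexp_split_to_table(column_name, '/') as value
from table_name;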
Another option would be to use an array of values, using the TEXT[] array datatype instead of TEXT. You can index an array field using a GIN index.
SUBSTRING() (like mu_is_too_short showed you) can solve the current problem; using an array and the array functions is another option:
SELECT array_to_string(
(string_to_array('Set1/Set2/Set3/', '/'))[1:2], '/' );
This makes it rather flexible; there is no need for a fixed length of the values, as the separator in the array functions does the job. The [1:2] picks the first 2 elements of the array; using [1:3] would pick elements 1 to 3. This makes it easy to change.
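Applied to the table from the question, that becomes:
select distinct
       array_to_string((string_to_array(column_name, '/'))[1:2], '/')
from table_name;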
If they really are that regular you could use substring; for example:
=> select substring('Set1/Set2/Set3/' from 1 for 9);
substring
-----------
Set1/Set2
(1 row)
There is also a version of substring that understands POSIX regular expressions if you need a little more flexibility.
The PostgreSQL online documentation is quite good BTW:
http://www.postgresql.org/docs/current/static/index.html
and it even has a usable index and sensible navigation.
If you want to use .*/.* then you'd want something like substring(s from '[^/]+/[^/]+'), for example:
=> select substring('where/is/pancakes/house?' from '[^/]+/[^/]+');
substring
-----------
where/is
(1 row)
