Azure SQL: join of 2 tables with 2 unicode fields returns empty when matching records exist

I have a table with a few key columns created as nvarchar(80), i.e. Unicode.
I can list the full dataset with a SELECT * statement (Table1) and can confirm the values I need to filter on are there.
However, I can't get any results from that table if I filter rows using alphabetic characters as input on any column.
The columns in Table1 store values in Cyrillic characters.
I know it must have to do with character encoding: what I see in the result list is not what I use as input characters.
The Unicode nvarchar type should resolve this character type mismatch automatically.
What do you suggest I do in order to get results?
Thank you very much.
Paulo
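One common cause, assuming the filter uses a plain string literal (the actual query isn't shown in the question), is that a literal without the N prefix is treated as varchar: SQL Server converts it to the database's default code page before comparing, and Cyrillic characters that don't exist in that code page typically become '?', so nothing matches. A minimal sketch, using Table1 from the question and a hypothetical column name and search value:
-- KeyColumn and the search value are hypothetical, for illustration only.
-- Without the N prefix the literal is varchar; the Cyrillic text may be
-- converted to '?' before the comparison, so no rows come back:
SELECT * FROM Table1 WHERE KeyColumn = 'Москва';
-- With the N prefix the literal stays nvarchar (Unicode) and can match:
SELECT * FROM Table1 WHERE KeyColumn = N'Москва';
The same applies to parameters: declare them as nvarchar rather than varchar so the Unicode value reaches the server intact.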

Related

How to convert a column of string array to array format and coalesce the first non null value in the dataset

I have a dataset which consists of two columns, where the "Values" column contains a string in list/array form and the column datatype is char. I need to coalesce the first non-null value into a new column, since we also have null values in other rows. I am new to SAS. Could you please help me with the solution?
So you have a long string with comma separated values? You can use SCAN() to select one item from the list. Since your list has extra [ and ] you can just include those extra characters in the set of delimiter characters for SCAN().
data want;
  set have;
  first = scan(values, 1, '[,]');
run;
If the values can include the delimiter, use the 'q' modifier. That will ignore delimiters that are inside quoted strings. If you want to remove the quotes from the result, use the DEQUOTE() function.
data want;
  set have;
  first = dequote(scan(values, 1, '[,]', 'q'));
run;

How to Flatten a semicolon Array properly in Azure Data Factory?

Context: I have a data flow that extracts data from a SQL DB. The data comes in as just one column containing a string with tab-separated values, so in order to manipulate the data properly, I've tried to separate each column with its corresponding data.
First, to 'rebuild' the table properly, I used a 'Derived Column' activity to replace tabs with semicolons instead (1):
dropLeft(regexReplace(regexReplace(regexReplace(descripcion, '[\t]', ';'), '[\n]', ';'), '[\r]', ';'), 1)
Then, after that, I used the split() function to get an array and build the columns (2):
split(descripcion, ';')
Problem: When I try to use the 'Flatten' activity (as described here: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-flatten), it's just not working: the data flow gives me only one column, or if I add an additional column in the 'Flatten' activity I just get another column with the same data as the first one.
Expected output:
column1 | column2                            | column3
--------+------------------------------------+--------
2000017 | ENVASE CORONA CLARA 24/355 ML GRAB | PC13
2004297 | ENVASE V FAM GRAB 12/940 ML USADO  | PC15
Could you tell me what I'm doing wrong? Thanks in advance.
You can use the Derived Column activity itself; try as below.
After the first derived column, what you have is a string array, which can just be split again using a derived column schema modifier.
Here firstc represents the source column equivalent to your column descripcion:
Column1: split(firstc, ';')[1]
Column2: split(firstc, ';')[2]
Column3: split(firstc, ';')[3]
Optionally, you can select the columns you need to write to the SQL sink.

Hive ORC table empty string

I have a Hive table with data stored as ORC.
I write empty values (blank, '') into some fields, but sometimes when I run a select query on this table the empty string columns are shown as NULL in the query result.
I would like to see the empty values I entered; how is this possible?
If you want to see empty values instead of NULL in a Hive table, you can use the NVL function, which produces a default value for NULL column values.
Below is the syntax:
NVL(arg1, arg2) - here arg1 is an expression or column and arg2 is the default value for NULL values.
e.g. SELECT NVL(blank, '') AS blank_1 FROM db.table;

Power Query: How to delete duplicate characters from a string (e.g. xzxxxzzzzxzzzzx -> leave only xz)?

I have a huge table in Power Query with text in cells that consist of multiple 'x's and 'z's. I want to deduplicate values so I have one x and one z only.
For example:
xzzzxxxzxz -> xz
zzzzzzzzzz -> z
The table is very big, so I don't want to create additional columns. Can you please help?
You can convert the string to a list of characters, make the list distinct (remove duplicates), sort (if desired), and then transform back to text.
= Table.TransformColumns(#"Previous Step", {{"ColumnName",
each Text.Combine( List.Sort( List.Distinct( Text.ToList(_) ) ) ),
type text}})

Need initial N characters of column in Postgres where N is unknown

I have one column in my Postgres table, let's say employeeId. We do some modification based on the employee type and store it in the DB. Basically, we append strings from these 4 strings ('ACR', 'AC', 'DCR', 'DC'). Now we can have any combination of these 4 strings appended after employeeId. For example, EMPIDACRDC, EMPIDDCDCRAC etc. These are valid combinations. I need to retrieve EMPID from this. The EMPID length is not fixed. The column is of varying length type. How can this be done in Postgres?
I am not entirely sure I understand the question, but regexp_replace() seems to do the trick:
with sample (employeeid) as (
  values
    ('1ACR'),
    ('2ACRDCR'),
    ('100DCRAC')
)
select employeeid,
       regexp_replace(employeeid, 'ACR|AC|DCR|DC.*$', '', 'gi') as clean_id
from sample;
returns:
employeeid | clean_id
-----------+---------
1ACR       | 1
2ACRDCR    | 2
100DCRAC   | 100
The regular expression matches any of those strings (and, for DC, everything after it up to the end of the string), and each match is then replaced with nothing. This, however, won't work if the actual empid contains any of the codes that get appended.
It would be much cleaner to store this information in two columns: one for the empid and one for those "codes".
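If the codes are only ever appended at the end of the id, a slightly tighter pattern (a sketch under that assumption) anchors the match to the end of the string, so codes that merely occur somewhere in the middle of a real empid are left alone:
-- strip one or more trailing codes; nothing earlier in the id is touched
select employeeid,
       regexp_replace(employeeid, '(ACR|AC|DCR|DC)+$', '') as clean_id
from sample;
It still can't tell when the empid itself happens to end in one of the codes, which is another reason the two-column design is preferable.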
