string concat operator(||) throwing error in hive - apache-spark

I am trying to concatenate string columns in a table with the concat operator || and it throws an error.
Here is the query: select "Bob"||'~'||"glad" from table
It throws this error: ParseException - cannot recognize input near '|' '"~"' '|' in expression specification
It works with the concat function but not with the concat operator:
select concat("bob","~","glad") from table - this works
I am using Hive version 2.1. Could anyone tell me why this operator is not working?
Thanks, Babu

Hive doesn't support the concat operator ||; that is Oracle syntax. Please use the concat function to concatenate multiple values. You can use concat_ws to concatenate with a delimiter.
concat
select concat('this','~','is','~','hello','~','world');
Output: this~is~hello~world
concat_ws
select concat_ws('~','hello','world','is','not','enough');
Output: hello~world~is~not~enough
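Applied to the query from the question, the same thing can be written with concat_ws so the delimiter is listed only once (a sketch; col1, col2 and some_table are hypothetical stand-ins for real string columns and a real table):
select concat_ws('~', 'Bob', 'glad');
Output: Bob~glad
select concat_ws('~', col1, col2) from some_table;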

Related

Cannot use custom SQL function with arguments inside transform scope [Spark SQL] (Error in SQL statement: AnalysisException: Resolved attribute(s)...)

I am using a Spark SQL context in Azure Databricks.
My query uses the transform function for handling an array like so:
SELECT
  colA,
  colB,
  transform(colC,
    x -> named_struct(
      "innerColA", functionA(x.innerColA),   -- does not work
      "innerColB", [...x.innerColB...],      -- works (same logic as functionA)
      "test1", test1(),                      -- works
      "test2", test2(x.innerColA)            -- does not work
    )
  )
FROM
  tableA
I get the following error regarding the use of functionA:
Error in SQL statement: AnalysisException: Resolved attribute(s) x#2723416 missing from in operator !Project [cast(lambda x#2723416 as string) AS arg1#2723417].
functionA is simple enough that, if I rewrite it directly into the query, it works (as shown by "innerColB" in my code example).
I have tested with simple functions that don't take any arguments and they can be used without any issues:
CREATE OR REPLACE FUNCTION test1() RETURNS STRING RETURN "test"
But if the function takes any arguments, it throws that error:
CREATE OR REPLACE FUNCTION test2(arg1 STRING) RETURNS STRING RETURN "test"
Is that a limitation of SparkSQL? Are there any workarounds?
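A condensed repro of the behaviour described above (a sketch; it assumes a table tableA whose column colC is an array of structs with a string field innerColA, as in the question):
CREATE OR REPLACE FUNCTION test1() RETURNS STRING RETURN "test";
CREATE OR REPLACE FUNCTION test2(arg1 STRING) RETURNS STRING RETURN "test";
-- works: the SQL function takes no argument from the lambda variable
SELECT transform(colC, x -> test1()) FROM tableA;
-- fails with "AnalysisException: Resolved attribute(s) x#... missing":
-- the lambda variable x cannot be passed as an argument to the SQL function
SELECT transform(colC, x -> test2(x.innerColA)) FROM tableA;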

Mapping data flow SQL query and Parameters failing

In my mapping data flow I have simplified this down to DimDate just for the test.
My parameters are in place, and the source even tells you exactly how to enter the select query if you are using parameters, which is what I'm trying to achieve.
Then I import, but I get two different errors.
For parameterizing a table:
SELECT * FROM {$df_TableName}
I get this error from a select * or from individual columns.
I've tried just the WHERE clause (what I actually need) as a parameter, but keep getting datatype mismatch errors.
I then started testing multiple ways, and it only allows the schema to be parameterised in my queries below;
all of the other options seem to fail no matter what I do:
SELECT * FROM [{$df_Schema}].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[{$df_TableName}] Where [Period] = {$df_incomingPeriod}
SELECT * FROM [dbo].[DimDate] Where [Period] = 2106
I know there's an issue with the integer datatype, but I don't know how to pass it to the query within the parameter without changing its type, as the SQL engine cannot run [Period] as a string.
Use the concat function in the expression builder to build the query in the data flow:
concat(<this> : string, <that> : string, ...) => string
Note: it concatenates a variable number of strings together. All the arguments should be strings.
Example 1:
concat(toString("select * from "), toString($df_tablename))
Example 2:
concat(toString("select * from "), toString($df_tablename), ' ', toString(" where incomingperiod = "), toString($df_incomingPeriod))
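Applied to the parameterised query from the question, the same approach might look like this (a sketch; the parameter names come from the question, and toString() is what turns the integer period into a string inside the concatenation):
concat('SELECT * FROM [', $df_Schema, '].[', $df_TableName, '] WHERE [Period] = ', toString($df_incomingPeriod))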
Awesome, it worked like magic for me. I was struggling with parameterizing table names, which I was passing through an array list.
Created a data flow parameter and gave this value:
#item().TABLE_NAME

Oracle INSTR equivalent in Spark SQL

I tried to replicate the Oracle INSTR function, but it seems that Spark's instr does not accept all the arguments that exist in Oracle. I receive the error below; I would like to use this transformation for a "plataforma" field in the table, but I can't:
SELECT
  SUBSTR(a.SOURCE, 0, INSTR(a.SOURCE, '-', 1, 2) - 1) AS plataforma,
  COUNT(*) AS qtd
FROM db1.table AS a
LEFT JOIN db1.table2 AS b ON a.ID = b.id
GROUP BY SUBSTR(a.SOURCE, 0, INSTR(a.SOURCE, '-', 1, 2) - 1)
ORDER BY qtd
The Apache Spark 2.0 database encountered an error while running this query.
Error running query: org.apache.spark.sql.AnalysisException: Invalid number of arguments for function instr. Expected: 2; Found: 4; line 8 pos 45
I tried transforming the field that way, but I don't know if it is the correct approach.
How can I replicate the same Oracle function in Spark? I need to do just this:
Source:
apache-spark-sql
sql-server-dw
Result:
apache-spark
sql-server
What you're looking for is the substring_index function:
substring_index('apache-spark-sql', '-', 2)
It returns the substring before 2 occurrences of -.
I suppose you want to get the substring before the last occurrence of -. So you can count the number of - in the input string and combine it with the substring_index function like this:
substring_index(col, '-', size(split(col, '-')) - 1)
Where size(split(col, '-')) - 1 gives the number of occurrences of -.
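Applied to the query from the question, the SUBSTR/INSTR expression can be swapped for substring_index directly (a sketch; the table and column names are taken from the question):
SELECT
  substring_index(a.SOURCE, '-', 2) AS plataforma,
  COUNT(*) AS qtd
FROM db1.table AS a
LEFT JOIN db1.table2 AS b ON a.ID = b.id
GROUP BY substring_index(a.SOURCE, '-', 2)
ORDER BY qtd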

What does "Correlated scalar subqueries must be Aggregated" mean?

I use Spark 2.0.
I'd like to execute the following SQL query:
val sqlText = """
select
  f.ID as TID,
  f.BldgID as TBldgID,
  f.LeaseID as TLeaseID,
  f.Period as TPeriod,
  coalesce(
    (select
       f.ChargeAmt
     from
       Fact_CMCharges f
     where
       f.BldgID = Fact_CMCharges.BldgID
     limit 1),
    0) as TChargeAmt1,
  f.ChargeAmt as TChargeAmt2,
  l.EFFDATE as TBreakDate
from
  Fact_CMCharges f
join
  CMRECC l on l.BLDGID = f.BldgID and l.LEASID = f.LeaseID and l.INCCAT = f.IncomeCat and date_format(l.EFFDATE,'D')<>1 and f.Period=EFFDateInt(l.EFFDATE)
where
  f.ActualProjected = 'Lease'
except(
  select * from TT1 t2 left semi join Fact_CMCharges f2 on t2.TID=f2.ID)
"""
val query = spark.sql(sqlText)
query.show()
It seems that the inner statement in coalesce gives the following error:
pyspark.sql.utils.AnalysisException: u'Correlated scalar subqueries must be Aggregated: GlobalLimit 1\n+- LocalLimit 1\n
What's wrong with the query?
You have to make sure that your sub-query by definition (and not by data) only returns a single row. Otherwise the Spark analyzer complains while analyzing the SQL statement.
So when Catalyst can't be 100% sure, just by looking at the SQL statement (without looking at your data), that the sub-query only returns a single row, this exception is thrown.
If you are sure that your subquery only gives a single row, you can wrap its value in one of the following standard aggregation functions so the Spark analyzer is happy (see the sketch after the list):
first
avg
max
min
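For the query above, that could look like this (a sketch: the inner alias is renamed to f_inner purely to keep the correlation with the outer f unambiguous, and the limit is dropped because the aggregate already guarantees a single row):
coalesce(
  (select first(f_inner.ChargeAmt)
   from Fact_CMCharges f_inner
   where f_inner.BldgID = f.BldgID),
  0) as TChargeAmt1,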

How to replace one or more consecutive symbols with one symbol in DB2

I am using DB2 LUW 9.5. In a field, I have a value like this one:
Test^test^^test^^^^test^^test^test
In a SELECT query, I would like to replace the duplicated ^ with only one ^. This would produce:
Test^test^test^test^test^test
The delimiter is known and static (can be hardcoded). Would you know a way to obtain the desired output using DB2 functions?
Thank you
You need one other character that can be used as a delimiter, for example the pipe sign (|).
Let's say the table is defined as
create table myTable (
  myColumn varchar(400)
);
Add a value for a test:
insert into myTable (myColumn) values
('Test^^^^^^^^test^^^^^^^test^^^^^^test^^^^^test^^^^test^^^test^^test^test');
Then do a smart replacement using the other delimiter:
select replace(replace(replace(myColumn, '^^', '^|^'), '|^^', ''), '^|^', '^')
from myTable;
The result:
Test^test^test^test^test^test^test^test^test^test
Instead of using a one-character delimiter, you can use a string that you are sure will not occur in the values, for example 'xy'. The next query will give the same result:
select replace(replace(replace(myColumn, '^^', '^xy^'), 'xy^^', ''), '^xy^', '^')
from myTable;
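The same triple replace applied to the exact value from the question (a sketch; sysibm.sysdummy1 is used here only to evaluate the expression):
select replace(replace(replace(
         'Test^test^^test^^^^test^^test^test',
         '^^', '^|^'), '|^^', ''), '^|^', '^')
from sysibm.sysdummy1;
This returns Test^test^test^test^test^test, the output asked for in the question.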
