Databricks Concatenation Issue - databricks

A community member called Saideep helped me with a coding issue I had here.
I have attempted to modify the code to find values without www (after removing either http or https) and prepend www. to the field value.
At the moment, my code does the following:
You can see from the image that the code successfully removes https:// and http://, but it fails to prepend www. to the value in the newwebsite_url field when www. doesn't exist in the homepage_url field.
For example, movingahead.com should appear as www.movingahead.com in newwebsite_url.
My code is as follows:
SELECT tt.homepage_url
,concat(iff(left(v1.RightString, 4)='www.', null, 'www.')) as addwww , LEFT(v1.RightString,COALESCE(NULLIF(CHARINDEX('/',v1.RightString)-1,-1),150)) as newwebsite_url
FROM basecrmcbreport.organizations tt
inner join (select (SUBSTRING(homepage_url,CHARINDEX('//',homepage_url)+2,150)) from basecrmcbreport.organizations)v1(RightString) on tt.homepage_url like concat('%',v1.RightString,'%') escape '|';
I know it's a concatenation issue, but I don't know where to fix it.
Any thoughts?

As you suspected, the issue here is with concatenation. The concat call in your query receives only one argument: the expression that yields 'www.' when the prefix is absent and null when it is present.
That prefix is never concatenated with the part extracted from the URL; it only creates a separate column (addwww). To fix this, put both the prefix expression and the extracted part inside the same concat.
The following is a demonstration using my sample data.
Using the following updated query, I was able to achieve your requirement.
%sql
SELECT tt.homepage_url
,concat(iff(left(v1.RightString, 4)='www.', '', 'www.'),LEFT(v1.RightString,COALESCE(NULLIF(CHARINDEX('/',replace(v1.RightString,'\\','/'))-1,-1),150))) as newwebsite_url
FROM demo tt
inner join (select (SUBSTRING(homepage_url,CHARINDEX('//',homepage_url)+2,150)) from demo)v1(RightString) on tt.homepage_url like concat('%',v1.RightString,'%') escape '|';
NOTE: I have also replaced \ with / before searching for the first /. I have also used '' (empty string) instead of null in concat, because concat returns null if any of its arguments is null.
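For readers who want to check the expected values outside SQL, the logic of the updated query can be sketched in Python. This is only an illustration under assumptions: the function name normalize_homepage is made up, it is not a line-by-line port (it does not apply the 150-character limit, and it leaves the string unchanged when no // is found), and it is not meant to replace the query.

```python
def normalize_homepage(url: str) -> str:
    """Drop the scheme, keep only the host, and prepend 'www.' if it is missing.

    Hypothetical helper that mirrors the intent of the updated SQL query.
    """
    # Roughly SUBSTRING(url, CHARINDEX('//', url) + 2, 150): skip past the scheme
    marker = url.find("//")
    rest = url[marker + 2:] if marker != -1 else url

    # Roughly replace(RightString, '\\', '/') followed by LEFT(..., CHARINDEX('/') - 1)
    host = rest.replace("\\", "/").split("/", 1)[0]

    # Use an empty string, not None, when the prefix is already present
    prefix = "" if host.startswith("www.") else "www."
    return prefix + host


for sample in ("https://movingahead.com", "http://www.example.com/about"):
    print(sample, "->", normalize_homepage(sample))
```

With this sketch, https://movingahead.com becomes www.movingahead.com, and a URL that already starts with www. is returned without a second prefix.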

Related

Why aren't the values inserted into the mysql2 query?

Can you tell me, please?
Why does mysql2 value substitution not work inside the IN operator?
I do this, but it doesn't work.
Only the first element of the array is substituted (the number 6):
"select * from products_categories WHERE category_id IN (?)", [6,3]);
You can do it like this, of course:
IN(?,?,?,?,?,?,?,?,?,?) [6,3,1,1,1,1,1,1,1,1,1]
But that's not right; I thought the IN list would be filled in automatically from an array =(
I haven't used this, but my gut feeling is that array items map to question marks by index, so in your case 6 binds to the first ? and 3 looks for a second one but doesn't find it.
If I were you, I'd make sure that the first array item is itself an array, so I'd rewrite it as:
"select * from products_categories WHERE category_id IN (?)", [[6,3]]);
I suspect you are using this with .execute(), which is shorthand for a prepared statement: prepare first if it has never been executed before, then execute. While the API is very similar to .query(), the biggest difference is that with a prepared statement only the parameters are sent at execution time, unlike .query(), where the whole query text is interpolated with all parameters on the client. As a result, you need to send exactly as many parameters as there are placeholders in the original query text (in your example, one ?). The whole [6,3,1,1,1,1,1,1,1,1,1] is treated as one parameter and sent to the server as the string "6,3,1,1,1,1,1,1,1,1,1" (because during the prepare step that parameter was likely reported by the server as VAR_CHAR).
The solution is either 1) use .query() and interpolate on the client, or 2) build enough ?s dynamically and prepare a different prepared statement for each number of IN parameters.
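Option 2 can be sketched as follows. This is a minimal illustration in Python using the standard-library sqlite3 module as a stand-in, not mysql2 itself; the function name select_by_categories and the two-column table layout are assumptions made up for the example. The idea carries over to any driver with positional ? placeholders: generate one ? per value so the placeholder count always matches the parameter count.

```python
import sqlite3


def select_by_categories(conn, category_ids):
    """Run the IN query with one '?' placeholder per value (hypothetical helper)."""
    if not category_ids:
        # "IN ()" is a syntax error in MySQL, so return early for an empty list
        return []
    placeholders = ", ".join("?" for _ in category_ids)
    sql = f"SELECT * FROM products_categories WHERE category_id IN ({placeholders})"
    # Only the placeholders are formatted into the SQL text; the values stay bound
    return conn.execute(sql, list(category_ids)).fetchall()


# Throwaway in-memory table; the column layout is an assumption for this demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products_categories (product_id INTEGER, category_id INTEGER)")
conn.executemany(
    "INSERT INTO products_categories VALUES (?, ?)",
    [(1, 6), (2, 3), (3, 9)],
)
rows = select_by_categories(conn, [6, 3])
print(rows)
```

Because each distinct number of placeholders produces a different query text, a driver that caches prepared statements will prepare one statement per list length, which is the trade-off mentioned in option 2.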

Extract a JSON property with an asterisk in the property name using get_json_object() in SparkSql

I have a table in Databricks that has a column (called "properties") which contains JSON data. I've successfully used get_json_object() in a SparkSql notebook to retrieve properties from it like so:
%sql
select distinct_id, get_json_object(properties, "$.time")
from my_table
This works well. However, there are sub-properties in the properties column that have asterisks in their names, e.g. *Plan. Accessing these properties in the standard way, e.g. $.*Plan doesn't work, since * has special meaning for get_json_object(). I've tried accessing these properties using escape chars, like so:
%sql
select distinct_id, get_json_object(properties, "$.\*Plan")
from my_table
... along with alternative escapes, but to no avail. Is there a way to extract JSON sub-properties that can escape the asterisk?
Thanks!
You can use LATERAL VIEW and json_tuple as a workaround. It's not so fussy about special characters, e.g.:
SELECT x.*
FROM my_table
LATERAL VIEW json_tuple( properties, '*Plan' ) x
Or if you are wedded to using get_json_object you can clean up the string beforehand (although you have somewhat defeated the point of using JSON):
%sql
select
distinct_id,
get_json_object(replace(properties, '*', ''), '$.Plan' ) z
from my_table
I couldn't personally get any escaping methods to work (e.g. \u0042 or \), but I'm happy to be corrected.
Ideally, don't put such strange characters in your JSON in the first place.
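The reason json_tuple copes with *Plan is that it looks the key up literally instead of parsing a path expression. The same idea can be sketched in plain Python with the standard json module. The function name extract_property is made up for illustration, and registering something like it as a Python UDF in Spark is an assumption that is not shown or tested here.

```python
import json
from typing import Optional


def extract_property(properties: str, key: str) -> Optional[str]:
    """Look up a top-level key literally, so '*' needs no escaping (hypothetical helper)."""
    try:
        parsed = json.loads(properties)
    except (TypeError, ValueError):
        # Not valid JSON (or not a string at all): behave like a missing property
        return None
    if not isinstance(parsed, dict):
        return None
    value = parsed.get(key)
    if value is None:
        return None
    # Return strings as-is and re-serialize anything else
    return value if isinstance(value, str) else json.dumps(value)


sample = '{"time": 1650000000, "*Plan": "gold"}'
print(extract_property(sample, "*Plan"))
```

This only handles top-level keys, which matches the json_tuple call above; nested paths would need extra handling.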

how to get substring in JCR:SQL2?

The use case I am trying to solve is:
Find all page references of all components under /apps,
i.e. first find all pages where a component is being used, and
then do this for all components under /apps.
I am using the report builder tool for Adobe AEM: https://adobe-consulting-services.github.io/acs-aem-commons/features/report-builder/configuring.html
The query I am trying:
SELECT * FROM [nt:base] AS s
WHERE [sling:resourceType] IN (SELECT path FROM [cq:Component] AS s WHERE [componentGroup] IS NOT NULL AND ISDESCENDANTNODE([/apps]))
AND ISDESCENDANTNODE([/content])
Background:
I only need to sanitize the result set from the inner query.
Without sanitization, it returns paths of the form /apps/acs-commons/components/utilities/report-builder/columns/text
while sling:resourceType in the outer query only accepts acs-commons/components/utilities/report-builder/columns/text.
So I need to strip /apps/ from the paths in the inner query's result set.
Here is the error message:
Caused by: java.text.ParseException: Query: SELECT * FROM [nt:base] AS s
WHERE [sling:resourceType] IN (SELECT(*)CAST(path, AS STRING) FROM [cq:Component] AS s WHERE [componentGroup] IS NOT NULL AND ISDESCENDANTNODE([/apps]))
AND ISDESCENDANTNODE([/content]); expected: static operand
I don't think you can manipulate a result set using JCR-SQL2 syntax. Stored procedures, akin to PL/SQL, are usually used to manipulate result sets, and I did not find any reference to them in the JCR docs. In fact, to my knowledge JCR-SQL2 does not even support aggregate functions like MAX(), COUNT(), etc.
A hacky way to do this: you would probably have to execute the inner query first to retrieve all the components under /apps, modify the result set manually (stripping out /apps/), and feed it to the outer query, for example:
SELECT * FROM [nt:unstructured] AS comp
WHERE ISDESCENDANTNODE(comp, "/content/prj")
AND [sling:resourceType] IN ("prj/components/content/accordion","prj/components/content/breadcrumb")
To speed up the process, you can use a text editor like Notepad++, which supports block selection (Ctrl + Alt + Shift, then left-click and drag) to remove /apps, add the opening and closing double quotes and commas, and replace the newline characters to get it all on one line and construct the overall query.
I would be interested to know what others think and whether this can be accomplished with just JCR-SQL2 syntax.
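The manual editing step could also be scripted. Below is a minimal Python sketch, assuming the inner query's component paths have already been exported as a list of strings; the function name build_reference_query is hypothetical, and the generated text simply mirrors the outer query from the question. It has not been run against an AEM instance.

```python
def build_reference_query(component_paths, content_root="/content"):
    """Turn /apps/... component paths into the outer query's IN list (hypothetical helper)."""
    if not component_paths:
        raise ValueError("at least one component path is required")

    quoted = []
    for path in component_paths:
        if '"' in path:
            # Keep the sketch simple: refuse values that would need escaping
            raise ValueError(f"unsupported character in path: {path!r}")
        # sling:resourceType holds the path relative to /apps/
        resource_type = path[len("/apps/"):] if path.startswith("/apps/") else path
        quoted.append(f'"{resource_type}"')

    return (
        "SELECT * FROM [nt:base] AS s\n"
        f"WHERE [sling:resourceType] IN ({','.join(quoted)})\n"
        f"AND ISDESCENDANTNODE([{content_root}])"
    )


paths = [
    "/apps/acs-commons/components/utilities/report-builder/columns/text",
    "/apps/prj/components/content/accordion",
]
print(build_reference_query(paths))
```

For a large component list the IN clause can get long, so splitting the paths into batches and running one query per batch may be more practical.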

pg-promise creating column error

var name = req.body.name;
db.any('alter table "houseList" add $1 text', [name])
I tried to add a new column to a database hosted on Heroku using the above code in Node.js, but I keep getting this error:
error: syntax error at or near "'haha'"
'haha' is the value inside name. Does anyone have any idea what is wrong?
You are escaping the column name incorrectly: it is being formatted as a regular string value.
Schema, table, and column names are referred to as SQL Names, and must be escaped using double quotes ("").
Within pg-promise that means you must use its SQL Names support, with the :name or ~ modifier.
db.any('alter table "houseList" add $1:name text', [name])
or
db.any('alter table "houseList" add $1~ text', [name])
Also, if you are sure that you are only using simple names, i.e. no white space and no capitals, then you can inject the name directly, unescaped, which means using Raw Text via the :raw or ^ modifier. But generally that is not recommended; escaping the names is safer ;)
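To make the difference between a string value and an SQL Name concrete, here is a small Python sketch of standard PostgreSQL identifier quoting: wrap the name in double quotes and double any embedded double quote. It is not pg-promise's implementation, and the function names quote_sql_name and add_text_column_sql are made up for the example.

```python
def quote_sql_name(name: str) -> str:
    """Quote a PostgreSQL identifier (hypothetical helper, not pg-promise code)."""
    if not name:
        raise ValueError("an identifier cannot be empty")
    if "\x00" in name:
        raise ValueError("an identifier cannot contain a NUL character")
    # Embedded double quotes are escaped by doubling them
    return '"' + name.replace('"', '""') + '"'


def add_text_column_sql(table: str, column: str) -> str:
    """Build the ALTER TABLE statement with both names quoted as identifiers."""
    return f"ALTER TABLE {quote_sql_name(table)} ADD {quote_sql_name(column)} text"


print(add_text_column_sql("houseList", "haha"))
```

A value formatted as a string would appear as 'haha' in single quotes, which is what triggers the syntax error in the question, whereas the identifier form is "haha" in double quotes.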

How do I make a WHERE clause with SQLalchemy to compare to a string?

Objective
All I am trying to do is retrieve a single record from a specific table where the primary key matches. I have a feeling I'm greatly overcomplicating this, as it seems to be a simple enough task. My theory is that it may not know the variable's value because it isn't actually pulling it from the Python code but is instead trying to find a variable with the same name in the database.
EDIT: Is it possible that I need to wrap my where clause in an expression statement?
Attempted
My Python code is
def get_single_record(name_to_search):
    my_engine = super_secret_inhouse_engine_constructor("sample_data.csv")
    print("Searching for " + name_to_search)
    statement = my_engine.tables["Users"].select().where(my_engine.tables["Users"].c.Name == name_to_search)
    # Print out the raw SQL so we can see what exactly it's checking for
    print("You are about to run: " + str(statement))
    # Print out each result (should only be one)
    print("Results:")
    for item in my_engine.execute(statement):
        print(item)
I tried hard coding a string in its place.
I tried using like instead of where.
All to the same end result.
Expected
I expect it to generate something along the lines of SELECT * FROM MyTable WHERE Name='Todd'.
Actual Result
Searching for Todd
STATEMENT: SELECT "Users"."Name", ...
FROM "Users"
WHERE "Users"."Name" = ?
That is an actual question mark appearing in my statement, not simply my own confusion. This is then followed by it printing out all the records from the table, as though it had successfully matched everything.
EDIT 2: Running either my own hard coded SQL string or the generated query by Alchemy returns every record from the table. I'm beginning to think the issue may be with the engine I've set up not accepting the query.
Why I'm Confused
According to the official documentation and third party sources, I should be able to compare to hardcoded strings and then, by proxy, be able to compare to a variable.

Resources