Turning a simple JSON array into a CSV string using PostgreSQL

I tried looking in the already open questions but nothing helped me unfortunately.
I'm trying to convert a JSON array to a comma-separated string using the jsonb_array_elements and string_agg functions, but I keep getting the same error no matter what I try: "ERROR: cannot extract elements from a scalar"
Let's assume that my JSON data looks like: { "id" : "hi", "list" : ["a", "b", "c"] }.
What I'm trying to get is a query that returns a row that looks like: [id, list participants] = [hi, a,b,c]
I tried some different methods but the most recent one was:
select (select string_agg(t->>0, ',') from jsonb_array_elements(data->'list') as t) from my_table
Would really appreciate any help with this

You can do this, for example:
with c as (
    select 1 as ord, '{ "id" : "hi", "list" : ["a", "b", "c"] }'::jsonb->>'id' as value
    union
    select 2, string_agg(t->>0, ',') from jsonb_array_elements('{ "id" : "hi", "list" : ["a", "b", "c"] }'::jsonb->'list') as t
)
-- order inside the aggregate so the id always comes before the list
select string_agg(c.value, ',' order by c.ord) from c

So, after a lot of attempts with many different methods, every one of them resulted in the same "ERROR: cannot extract elements from a scalar".
I eventually figured out that the query was not the problem; the data was!
If one of the lists in the data is empty ("{[]}"), it results in that error. After excluding those empty lists from the query with a WHERE clause, I finally got it working! Thanks, hopefully this post will help future frustrated query users :)
For reference, the method I eventually chose is:
SELECT t.tbl, d.list
FROM tbl t, LATERAL (
SELECT string_agg(value::text, ', ') AS list
FROM json_array_elements_text(t.data->'tags')
) d
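The point above (the query was fine, the data was not) can be reproduced outside the database. The following is a minimal sketch in plain Python, not PostgreSQL; the helper name list_to_csv and the sample rows are made up for illustration. It joins the "list" field only when it really is an array and returns None otherwise, which is roughly what a WHERE filter such as jsonb_typeof(data->'list') = 'array' would do on the SQL side.

```python
import json


def list_to_csv(raw, key="list", sep=","):
    """Return the array stored under `key` as a joined string, or None if it is not an array."""
    doc = json.loads(raw)
    value = doc.get(key) if isinstance(doc, dict) else None
    if not isinstance(value, list):
        # Scalars, nulls and missing keys are the inputs that break a naive
        # "expand the array" step, so they are skipped here.
        return None
    return sep.join(str(item) for item in value)


# Hypothetical sample rows: a normal list, an empty list, and a scalar.
rows = [
    '{ "id" : "hi", "list" : ["a", "b", "c"] }',
    '{ "id" : "empty", "list" : [] }',
    '{ "id" : "scalar", "list" : "oops" }',
]
results = [list_to_csv(row) for row in rows]
```

Checking the type first, instead of excluding specific bad values, also covers rows where the key is missing altogether.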

Related

Cosmos db null value

I have two kinds of records, shown below, in my table staudentdetail in Cosmos DB. In the example below, previousSchooldetail is a nullable field: it may or may not be present for a student.
Sample records:
{
"empid": "1234",
"empname": "ram",
"schoolname": "high school ,bankur",
"class": "10",
"previousSchooldetail": {
"prevSchoolName": "1763440",
"YearLeft": "2001"
} --(Nullable)
}
{
"empid": "12345",
"empname": "shyam",
"schoolname": "high school",
"class": "10"
}
I am trying to access the above records from Azure Databricks using PySpark or Scala code. But when we build the DataFrame by reading from Cosmos DB, it does not bring previousSchooldetail into the DataFrame. However, when we change the query to filter on an id for which previousSchooldetail exists, the field does show up in the DataFrame.
Case 1:-
val Query = "SELECT * FROM c "
Result when query fired directly
empid
empname
schoolname
class
Case2:-
val Query = "SELECT * FROM c where c.empid=1234"
Result when query fired with where clause.
empid
empname
schoolname
class
previousSchooldetail
prevSchoolName
YearLeft
Could you please tell me why I am not able to get previousSchooldetail in case 1, and how I should proceed?
As @Jayendran mentioned in the comments, the first query will give you the previousSchooldetail field wherever it is available. Otherwise, the column will not be present.
You can have this column present in all scenarios by using the IS_DEFINED function. Try tweaking your query as below:
SELECT c.empid,
c.empname,
IS_DEFINED(c.previousSchooldetail) ? c.previousSchooldetail : null
as previousSchooldetail,
c.schoolname,
c.class
FROM c
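To make the effect of the IS_DEFINED projection concrete, here is a small sketch in plain Python that applies the same rule to already-fetched documents: keep the nested object when it exists, otherwise add the key with a null value. The function name with_optional is hypothetical, and the sample documents are the two from the question.

```python
def with_optional(record, key="previousSchooldetail"):
    """Return a copy of `record` in which `key` is always present (None when absent)."""
    normalized = dict(record)          # shallow copy, the input is left untouched
    normalized.setdefault(key, None)   # same role as IS_DEFINED(...) ? value : null
    return normalized


records = [
    {"empid": "1234", "empname": "ram", "schoolname": "high school ,bankur", "class": "10",
     "previousSchooldetail": {"prevSchoolName": "1763440", "YearLeft": "2001"}},
    {"empid": "12345", "empname": "shyam", "schoolname": "high school", "class": "10"},
]
normalized = [with_optional(record) for record in records]
```

With every document carrying the same set of keys, whatever infers the schema downstream sees the optional column regardless of which documents it samples.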
If you are looking to get the result as a flat structure, it can be tricky and would need to use two separate queries such as:
Query 1
SELECT c.empid,
c.empname,
c.schoolname,
c.class,
p.prevSchoolName,
p.YearLeft
FROM c JOIN c.previousSchooldetail p
Query 2
SELECT c.empid,
c.empname,
c.schoolname,
c.class,
null as prevSchoolName,
null as YearLeft
FROM c
WHERE not IS_DEFINED (c.previousSchooldetail) or
c.previousSchooldetail = null
Unfortunately, Cosmos DB does not support LEFT JOIN or UNION. Hence, I'm not sure if you can achieve this in a single query.
Alternatively, you can create a stored procedure to return the desired result.
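If the flat shape is only needed on the client side, another option is to flatten after reading rather than in the query. The sketch below is plain Python with a hypothetical helper named flatten; it assumes the nested field names from the sample document. In Databricks the equivalent would more likely be a union of the two query results as DataFrames, which is not shown here.

```python
def flatten(record, nested_key="previousSchooldetail",
            nested_fields=("prevSchoolName", "YearLeft")):
    """Lift the nested fields to the top level, using None when they are missing."""
    flat = {k: v for k, v in record.items() if k != nested_key}
    nested = record.get(nested_key) or {}   # absent or null -> no nested values
    for field in nested_fields:
        flat[field] = nested.get(field)
    return flat


records = [
    {"empid": "1234", "empname": "ram", "schoolname": "high school ,bankur", "class": "10",
     "previousSchooldetail": {"prevSchoolName": "1763440", "YearLeft": "2001"}},
    {"empid": "12345", "empname": "shyam", "schoolname": "high school", "class": "10"},
]
flat_rows = [flatten(record) for record in records]
```

This produces the same columns as Query 1 and Query 2 combined, in a single pass over the documents.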

AWS Athena working with nested arrays, trying to search for a field within the array

I have a sql query:
SELECT id_str, entities.hashtags
FROM tweets, unnest(entities.hashtags) as t(hashtag)
WHERE cardinality(entities.hashtags)=2 and id_str='1248585590573948928'
limit 5
which returns:
id_str hashtags
1248585590573948928 [{text=LUCAS, indices=[75, 81]}, {text=WayV, indices=[83, 88]}]
1248585590573948928 [{text=LUCAS, indices=[75, 81]}, {text=WayV, indices=[83, 88]}]
The unnesting has returned what was originally one row twice, because there are 2 objects in this array.
Next, I wanted to add hashtag['text'] as htag to the existing select list, which should still return 2 rows, but this time with LUCAS and WayV in separate rows of the same column, named htag.
But I get this error - any idea what I am doing wrong?
Your query has the following error(s):
SYNTAX_ERROR: line 1:8: '[]' cannot be applied to row(text varchar,indices array(bigint)), varchar(4)
I assume it is because I have another array within this array.. ?
Thanks in advance
I'm not entirely sure where you're adding the hashtag['text'] expression, so I can't say with confidence what your problem is, but I have two suggestions for you to try:
The error says that hashtag is of type row(text varchar, …), which suggests that hashtag.text should work.
If that doesn't work, you can try using element_at e.g. element_at(hashtag, 'text').
I came across this issue as well, and since no solution has been provided, I'd like to chip in:
After you unnest an array, you can address the fields of the result with a . reference instead of ['']:
WITH dataset AS (
SELECT ARRAY[
CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
] AS users
)
SELECT
user,
user.name
FROM dataset
cross join unnest (users) as t(user)
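As a rough analogy for why the dot reference works, the sketch below models the unnest in plain Python. It assumes each array element behaves like a named record (a ROW in Athena), represented here by a NamedTuple; the Hashtag type and the tweets list are illustrative only and reuse the values from the question.

```python
from typing import List, NamedTuple


class Hashtag(NamedTuple):
    """Stand-in for row(text varchar, indices array(bigint))."""
    text: str
    indices: List[int]


tweets = [
    {
        "id_str": "1248585590573948928",
        "hashtags": [Hashtag("LUCAS", [75, 81]), Hashtag("WayV", [83, 88])],
    },
]

# Counterpart of CROSS JOIN UNNEST: one output row per array element.
rows = [
    (tweet["id_str"], hashtag.text)   # fields are read by name, like hashtag.text in SQL
    for tweet in tweets
    for hashtag in tweet["hashtags"]
]
```

Here rows holds one (id_str, htag) pair per hashtag, which is the two-row result the question was after.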

Cast some columns and select all columns without explicitly writing column names

I want to cast some columns and then select all others
id, name, property, description = column("id"), column("name"), column("property"), column("description")
select([cast(id, String).label('id'), cast(property, String).label('property'), name, description]).select_from(events_table)
Is there any way to cast some columns and select all the others without mentioning every column name?
I tried
select([cast(id, String).label('id'), cast(property, String).label('property')], '*').select_from(events_table)
py_.transform(return_obj, lambda acc, element: acc.append(dict(element)), [])
But I get two extra columns (total 7 columns) which are cast and I can't convert them to dictionary which throws key error.
I'm using FASTAPI, sqlalchemy and databases(async)
Thanks
Pretty sure you can do
select_columns = []
# column names live on the table's column collection (events_table.c)
for field in events_table.c.keys():
    select_columns.append(getattr(events_table.c, field))
select(select_columns).select_from(events_table)
to select all fields from that table. You can also keep a list of the fields you actually want to select instead of events_table.c.keys(), like
select_these = ["id", "name", "property", "description"]
select_columns = []
for field in select_these:
    select_columns.append(getattr(events_table.c, field))
select(select_columns).select_from(events_table)
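The same loop can also apply the casts, so that each column appears exactly once: cast when its name is in a chosen set, passed through otherwise. The sketch below shows that idea with the standard-library sqlite3 module instead of SQLAlchemy so that it runs on its own; build_select, the events table, and its contents are all made up for illustration. With SQLAlchemy, the analogous step would be to iterate over events_table.c and wrap only the selected columns in cast(...).label(...).

```python
import sqlite3


def build_select(conn, table, cast_to_text):
    """Build a SELECT that casts the named columns to TEXT and passes the rest through."""
    # PRAGMA table_info yields one row per column; position 1 holds the column name.
    # `table` is interpolated directly, so it must come from trusted code, not user input.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    parts = [
        f'CAST("{name}" AS TEXT) AS "{name}"' if name in cast_to_text else f'"{name}"'
        for name in columns
    ]
    return f'SELECT {", ".join(parts)} FROM {table}'


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row   # rows can then be converted with dict(row)
conn.execute("CREATE TABLE events (id INTEGER, name TEXT, property INTEGER, description TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'start', 42, 'first event')")

query = build_select(conn, "events", {"id", "property"})
rows = [dict(row) for row in conn.execute(query)]
```

Because every cast column keeps its original name as the label, converting rows to dictionaries yields four keys rather than the extra, unnamed cast columns described in the question.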

Azure CosmosDB Unexpected Pagination Behaviour

We are using CosmosDb C# SDK
We tried both: "Microsoft.Azure.Cosmos 3.4.1", "Microsoft.Azure.DocumentDB.Core 2.9.1 and 2.4.2"
We are getting invalid results, and the main problem is the ResponseContinuation:
[{"token":null,"range":{"min":"05C1DFFFFFFFFC","max":"FF"}}]
This started showing up in one of our smaller services, with only 14 documents.
In all queries we use the following headers:
"x-ms-documentdb-query-enablecrosspartition" = true
"x-ms-max-item-count" = 100
Query 1:
The query is the following: SELECT * FROM c.
We get the following response:
- 7 items
- ResponseContinuation [{"token":null,"range":{"min":"05C1DFFFFFFFFC","max":"FF"}}]
Then we use the continuation token to get the other 7 items.
Query 2:
If we modify the query to SELECT * FROM c ORDER BY c.property ASC, the order gets messed up! (responses are simplified)
- we get the following result ["A", "B", "C", "D", "F"]
- and the second query ["C", "D", "G"]
Query 3:
If we want to find only one item with SELECT TOP 1 * FROM c WHERE c.name = @name, and the item is in the "second query result":
- we get nothing, plus a RequestContinuation {"top":1,"sourceToken":"[{\"token\":null,\"range\":{\"min\":\"05C1DFFFFFFFFC\",\"max\":\"FF\"}}]"}
This is all really unexpected behaviour.
Why do ORDER BY and TOP even exist if we can't use them properly?
We can't afford to pull all the data from Cosmos to our server and then do the ordering there, especially on bigger containers.
Edit: github issue link: https://github.com/Azure/azure-cosmos-dotnet-v3/issues/1001
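One thing worth ruling out for Query 3: as far as I understand, a page that comes back empty or short while still carrying a continuation does not mean the query has finished, and the client is expected to keep requesting pages until no continuation is returned. The sketch below illustrates only that loop in plain Python. fetch_page is a hypothetical stand-in for one query round trip, not an SDK call, and the pages list is fabricated test data. It does not address the overlapping ORDER BY results, which are the subject of the linked GitHub issue.

```python
def fetch_page(pages, token):
    """Hypothetical stand-in for one round trip: returns (items, next_token)."""
    index = 0 if token is None else token
    items = pages[index]
    next_token = index + 1 if index + 1 < len(pages) else None
    return items, next_token


def find_first(pages, predicate):
    """Follow continuation tokens until a match is found or the pages run out."""
    token = None
    while True:
        items, token = fetch_page(pages, token)
        for item in items:
            if predicate(item):
                return item
        if token is None:   # no continuation left: there really is no match
            return None


# The first page is empty but still has a continuation, as in Query 3.
pages = [[], ["A", "B"], ["G"]]
found = find_first(pages, lambda name: name == "G")
missing = find_first(pages, lambda name: name == "Z")
```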

U-SQL Error in Naming the Column

I have a JSON where the order of fields is not fixed.
i.e. I can have [A, B, C] or [B, C, A]
A, B, and C are all JSON objects of the form {Name: x, Value: y}.
So, when I use U-SQL to extract the JSON (I don't know the order of the fields) and put it into a CSV (for which I will need column names):
@output =
SELECT
A["Value"] ?? "0" AS CAST ### (("System_" + A["Name"]) AS STRING),
B["Value"] ?? "0" AS "System_" + B["Name"],
System_da
So, I am trying to put column name as the "Name" field in the JSON.
But I am getting an error at the ### marker above:
Message
syntax error. Expected one of: FROM ',' EXCEPT GROUP HAVING INTERSECT OPTION ORDER OUTER UNION UNION WHERE ';' ')'
Resolution
Correct the script syntax, using expected token(s) as a guide.
Description
Invalid syntax found in the script.
Details
at token '(', line 74
near the ###:
**************
I can't find a way to set the column name dynamically, and it is an absolute necessity for my use case.
Input: [A, B, C], [C, B, A]
Output: A.name B.name C.name
Row 1's values
Row 2's values
This
@output =
SELECT
A["Value"] ?? "0" AS CAST ### (("System_" + A["Name"]) AS STRING),
B["Value"] ?? "0" AS "System_" + B["Name"],
System_da
is not a valid SELECT clause (neither in U-SQL nor any other SQL dialect I am aware of).
What is the JSON Array? Is it a key/value pair? Or positional? Or a single value in the array that you want to have a marker for whether it is present in the array?
From your example, it seems that you want something like:
Input:
[["A","B","C"],["C","D","B"]]
Output:
A B C D
true true true false
false true true true
If that is the case, I would write it as:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@input =
    SELECT "[[\"A\", \"B\", \"C\"],[\"C\", \"D\", \"B\"]]" AS json
    FROM (VALUES (1)) AS T(x);
@data =
    SELECT JsonFunctions.JsonTuple(arrstring) AS a
    FROM @input CROSS APPLY EXPLODE( JsonFunctions.JsonTuple(json).Values) AS T(arrstring);
@data =
    SELECT a.Contains("A") AS A, a.Contains("B") AS B, a.Contains("C") AS C, a.Contains("D") AS D
    FROM (SELECT a.Values AS a FROM @data) AS t;
OUTPUT @data
TO "/output/data.csv"
USING Outputters.Csv(outputHeader : true);
If you need something more dynamic, either use the resulting SqlArray or SqlMap or use the above approach to generate the script.
However, I wonder why you would model your information this way in the first place. I would recommend finding a more appropriate way to mark the presence of the value in the JSON.
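For comparison, the presence-matrix output described above can be sketched in a few lines of plain Python. This is not U-SQL; it only illustrates the transformation, using the sample input from this answer and a column list that, as in the script, has to be known up front.

```python
import csv
import io
import json

raw = '[["A", "B", "C"], ["C", "D", "B"]]'
columns = ["A", "B", "C", "D"]   # fixed ahead of time, as in the U-SQL script

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(columns)
for members in json.loads(raw):
    present = set(members)
    # One boolean per known column: is the value in this inner array?
    writer.writerow(["true" if name in present else "false" for name in columns])

csv_text = buffer.getvalue()
```

The resulting csv_text has the header A,B,C,D followed by one true/false row per inner array, matching the output table above.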
UPDATE: I missed your comment that the inner array members are objects with two key-value pairs, where one is always called name (for the property name) and one is always called value (for the property value). So here is the answer for that case.
First: Modelling key value pairs in JSON using {"Name": "propname", "Value" : "value"} is a complete misuse of the flexible modelling capabilities of JSON and should not be done. Use {"propname" : "value"} instead if you can.
So changing the input, the following will give you the pivoted values. Note that you will need to know the values ahead of time and there are several options on how to do the pivot. I do it in the statement where I create the new SqlMap instance to reduce the over-modelling, and then in the next SELECT where I get the values from the map.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@input =
    SELECT "[[{\"Name\":\"A\", \"Value\": 1}, {\"Name\": \"B\", \"Value\": 2}, {\"Name\": \"C\", \"Value\":3 }], [{\"Name\":\"C\", \"Value\": 4}, {\"Name\":\"D\", \"Value\": 5}, {\"Name\":\"B\", \"Value\": 6}]]" AS json
    FROM (VALUES (1)) AS T(x);
@data =
    SELECT JsonFunctions.JsonTuple(arrstring) AS a
    FROM @input CROSS APPLY EXPLODE( JsonFunctions.JsonTuple(json)) AS T(rowid, arrstring);
@data =
    SELECT new SqlMap<string, string>(
               a.Values.Select((kvp) =>
                   new KeyValuePair<string, string>(
                       JsonFunctions.JsonTuple(kvp)["Name"]
                     , JsonFunctions.JsonTuple(kvp)["Value"])
               )) AS kvp
    FROM @data;
@data =
    SELECT kvp["A"] AS A,
           kvp["B"] AS B,
           kvp["C"] AS C,
           kvp["D"] AS D
    FROM @data;
OUTPUT @data
TO "/output/data.csv"
USING Outputters.Csv(outputHeader : true);
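If the column names genuinely cannot be known ahead of time, one way around it is to do the pivot outside U-SQL, where the header can be computed from the data. The sketch below does this in plain Python with the same sample input; the pivot helper is hypothetical, and the header is simply the sorted set of every "Name" seen, which is an assumption about the desired column order.

```python
import csv
import io
import json

raw = ('[[{"Name":"A", "Value": 1}, {"Name": "B", "Value": 2}, {"Name": "C", "Value": 3}], '
       '[{"Name":"C", "Value": 4}, {"Name":"D", "Value": 5}, {"Name":"B", "Value": 6}]]')


def pivot(pairs):
    """Turn [{"Name": n, "Value": v}, ...] into {n: v, ...}."""
    return {pair["Name"]: pair["Value"] for pair in pairs}


records = [pivot(pairs) for pairs in json.loads(raw)]
# The header is derived from the data: every name that occurs in any row.
columns = sorted({name for record in records for name in record})

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(records)   # names missing from a row are written as empty cells
csv_text = buffer.getvalue()
```

Input order inside each inner array no longer matters, because values are placed by name rather than by position.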
