Group into ranges values extracted from an 'alphanumeric + symbols' string [PostgreSQL]

N.B. I am using PostgreSQL.
N.B. For simplicity I am showing only the two relevant columns for this post; the original table has more rows and columns.
I have a table called 'contents' like this:
<table border="1" style="width:100%">
<tr>
<td>id</td>
<td>data</td>
</tr>
<tr>
<td>4009</td>
<td>"duration"=>"101", "preview_version"=>"", "twitter_profile"=>"", "creator_category"=>"association", "facebook_profile"=>"", "linkedin_profile"=>"", "personal_website"=>"", "content_expertise_type"=>"image", "content_expertise_categories"=>"1,2,3"</td>
</tr>
<tr>
<td>4865</td>
<td>"duration"=>"108", "preview_version"=>"", "twitter_profile"=>"", "creator_category"=>"association", "facebook_profile"=>"", "linkedin_profile"=>"", "personal_website"=>"", "content_expertise_type"=>"image", "content_expertise_categories"=>"4,6"</td>
</tr>
</table>
From this table I need to extract the duration value using this query:
select id,data->'duration' as data from contents
which gives me the result below (again, the original table returns many more entries, and some values in the "data" column will coincide, which is why I need to group them into ranges):
+------+------+
| id | data |
+------+------+
| 4009 | 101 |
| 4865 | 108 |
+------+------+
Now that I have the 'data' values, I want to tag them with different ranges:
SELECT d.id,
case when d.data >= 0 and d.data< 10 then '0-9'
when d.data >= 10 and d.data< 20 then '10-19'
else '20-500' end as range
FROM (SELECT id,data->'duration' as data FROM contents) as d
But here the query returns this error:
ERROR: operator does not exist: text >= integer
LINE 3: case when d.data >= 0 and d.data< 10 then '0-9'
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
After this I was hoping to group the ranges like this:
SELECT t.range as score_range, count(*) as number_of_occurrences
FROM
(***ABOVE QUERY THAT CURRENTLY RETURNS AN ERROR TO BE PLACED IN HERE***) as t
GROUP BY t.range
ORDER BY score_range
Any help to achieve this grouping task will be very much appreciated!
Looking forward to getting an answer! :-)
Thanks!

Values extracted from json fields in Postgres are not directly comparable to integers; a cast is necessary. The ->> operator returns the field as text, which you can then cast to integer.
Try fetching duration as an integer value like:
select id, (data->>'duration')::int as data from contents
More info: http://www.postgresql.org/docs/9.3/static/functions-json.html

It should work if you cast the value:
SELECT d.id,
(case when d.data >= 0 and d.data< 10 then '0-9'
when d.data >= 10 and d.data< 20 then '10-19'
else '20-500'
end) as range
FROM (SELECT id, (data->'duration')::int as data FROM contents
) d
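Putting the two answers together, the full pipeline is cast, then bucket with CASE, then GROUP BY. A minimal sketch of that shape, run here against SQLite via Python's stdlib so it is self-contained (in PostgreSQL you would use `(data->>'duration')::int` instead of the `CAST` below; the extra rows are made up to exercise every bucket):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contents (id INTEGER, duration TEXT)")
conn.executemany(
    "INSERT INTO contents VALUES (?, ?)",
    [(4009, "101"), (4865, "108"), (5001, "5"), (5002, "15")],
)

# Cast the text value to integer, bucket it, then count per bucket.
rows = conn.execute("""
    SELECT t."range" AS score_range, COUNT(*) AS number_of_occurrences
    FROM (
        SELECT id,
               CASE WHEN CAST(duration AS INTEGER) < 10 THEN '0-9'
                    WHEN CAST(duration AS INTEGER) < 20 THEN '10-19'
                    ELSE '20-500'
               END AS "range"
        FROM contents
    ) AS t
    GROUP BY t."range"
    ORDER BY score_range
""").fetchall()
print(rows)  # [('0-9', 1), ('10-19', 1), ('20-500', 2)]
```

Quoting `"range"` keeps the alias portable, since RANGE is a reserved word in some SQL dialects.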

Related

Conditional Counting in Spark SQL

I'm trying to count conditionally on one column, here's my code.
spark.sql(
s"""
|SELECT
| date,
| seat,
| contract,
| project,
| COUNT(CASE WHEN type = 'ABC' THEN 1 ELSE 0 END) AS abc,
| COUNT(CASE WHEN type = 'DEF' THEN 1 ELSE 0 END) AS def,
| COUNT(CASE WHEN type = 'ABC' OR type = 'DEF' OR type = 'GHI' THEN 1 ELSE 0 END) AS all
|FROM someTable
|GROUP BY date, seat, contract, project
""".stripMargin).createOrReplaceTempView("something")
This throws up a weird error.
Diagnostic messages truncated, showing last 65536 chars out of 124764:
What am I doing wrong here?
Any help appreciated.
It seems you want the count of type = 'ABC', type = 'DEF', etc. per group.
If that is the case, COUNT will not give you the desired results: COUNT counts every non-NULL value, so each CASE branch gives the same result per group.
Use SUM instead of COUNT; SUM adds up the 0s and 1s and gives you the correct conditional count.
If you still want to resolve the error you are getting, please paste the full error and, if possible, some of the data you are using to create the DataFrame.
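The difference between the two aggregates can be seen on a tiny table. A sketch using SQLite via Python's stdlib (the same logic applies in Spark SQL; the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (type TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("ABC",), ("ABC",), ("DEF",), ("XYZ",)])

# COUNT counts every non-NULL value, and ELSE 0 is non-NULL, so COUNT sees
# all four rows. SUM adds the 1s and 0s, giving the real conditional count.
count_ver, sum_ver = conn.execute("""
    SELECT COUNT(CASE WHEN type = 'ABC' THEN 1 ELSE 0 END),
           SUM(CASE WHEN type = 'ABC' THEN 1 ELSE 0 END)
    FROM t
""").fetchone()
print(count_ver, sum_ver)  # 4 2
```

An equivalent fix is to keep COUNT but drop the ELSE branch (`CASE WHEN ... THEN 1 END`), so non-matching rows yield NULL and are not counted.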

Search query for multiple column with comma separated string

$category_string="244,46,45";
I want a query that will return only product id 239.
When I try to search with select * from category where category_id in($category_string), it gives me all rows.
<table>
<tr><td>category_id</td><td>product_id</td></tr>
<tr><td>244</td><td>239</td></tr>
<tr><td>46</td><td>239</td></tr>
<tr><td>45</td><td>239</td></tr>
<tr><td>45</td><td>240</td></tr>
<tr><td>46</td><td>240</td></tr>
<tr><td>45</td><td>241</td></tr>
<tr><td>46</td><td>241</td></tr>
<tr><td>45</td><td>242</td></tr>
<tr><td>46</td><td>242</td></tr>
</table>
If you want only 239
SELECT * FROM category WHERE product_id IN ( 239 );
OR
SELECT * FROM category WHERE product_id = 239;
Since you are comparing against only one product_id, I would suggest the second query.
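If the intent behind "$category_string" is instead to find the product_ids that belong to all of the listed categories (which would make 239 the only match in the sample data), one common pattern is IN plus GROUP BY ... HAVING. A sketch using SQLite via Python's stdlib, with the table contents copied from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [(244, 239), (46, 239), (45, 239), (45, 240), (46, 240),
     (45, 241), (46, 241), (45, 242), (46, 242)],
)

# Keep only products that appear in every one of the three categories:
# each matching row contributes one distinct category_id, so requiring
# a distinct count of 3 means "present in all three".
rows = conn.execute("""
    SELECT product_id
    FROM category
    WHERE category_id IN (244, 46, 45)
    GROUP BY product_id
    HAVING COUNT(DISTINCT category_id) = 3
""").fetchall()
print(rows)  # [(239,)]
```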

How to do negation for 'CONTAINS'

I have a Cassandra table with one column defined as a set.
How can I achieve something like this:
SELECT * FROM <table> WHERE <set_column_name> NOT CONTAINS <value>
A proper secondary index was already created.
From the documentation:
SELECT select_expression FROM keyspace_name.table_name WHERE
relation AND relation ... ORDER BY ( clustering_column ( ASC | DESC
)...) LIMIT n ALLOW FILTERING
then later:
relation is:
column_name op term
and finally:
op is = | < | > | <= | >= | CONTAINS | CONTAINS KEY
So there's no native way to perform such a query. You have to work around this by designing a new table that specifically satisfies the query.

Cognos Report : Crosstab with section, how to display all column?

First, I'm French, so sorry for my bad English.
In Report Studio I use a crosstab with sections, but for each section I want to display all columns (the columns come from the distinct values of the variable I use for my crosstab).
I think an example will make this clearer:
-----------------------Source
Var A | var B | var C | Number |
A1 | B1 | C1 | 120
A1 | B1 | C2 | 130
A1 | B2 | C1 | 10
A2 | B1 | C1 | 17
A2 | B1 | C2 | 16
I build the crosstab as follows:
Columns : Var B
Row : Var C
"Values" : sum (Number)
Section : Var A
So I have :
Section: Var A = A1
| B1 | B2
C1 | 120 | 10
C2 | 130 | 0
AND :
Section: Var A = A2
| B1
C1 | 17
C2 | 16
BUT I WANT :
Section: Var A = A2
| B1 | B2
C1 | 17 | 0
C2 | 16 | 0
I don't know how to do that properly (I have found a method where each variable must be isolated and cross-joined, but it is long, greedy, and ugly).
Best regards
I have found the solution on another forum (I had been searching for a long time, but I wasn't using the right keywords):
"http://www-01.ibm.com/support/docview.wss?uid=swg21341708
Title : Columns or rows missing from crosstab if they contain no data
Problem(Abstract)
If a crosstab row or column contains no data, it does not show up in the crosstab. This document describes a method of forcing all columns and rows to appear, whether they contain data or not.
Cause
Column and Row headings in Crosstab reports are determined by the result set of the query.
Environment
Relational Data Source.
Resolving the problem
Create separate queries for the column/row headings, and the data. Join these two queries with a 1..1 -> 0..n relationship so that even Columns and Rows with no data will be represented in the result set.
See the attached example written for the GO Sales and Retailers sample package. It is a simple crosstab filtered for 2004 data. There is no data for Mountaineering Equipment in 2004. The crosstab uses a joined query as described, and does contain a blank row for Mountaineering Equipment.
Steps: The following steps assume that both rows and columns could be missing. If you are concerned about rows only or columns only, you may skip steps 1-2 and create just the row or column data in step 3.
1) Create a "Column Query", containing only the column information and a dummy data item with a value of 1. In the attached example, this is named "Years"
2) Create a "Row Query", containing only the row information and a dummy data item with a value of 1. In the attached example, this is named "Product Lines"
3) Create a "Dimension Query" query that joins the queries from steps 1 and 2 on dummy. This requires that the Outer Join Allowed property of the query be set to Allowed. This creates a crossjoin that includes all possible combinations of rows and columns
4) Create a fourth query that contains the data for the crosstab. This is the same as a normal crosstab report.
5) Join the queries from steps 3 and 4, using cardinality of 1..1 and 0..n respectively. When dragging data items into this new query, ensure that you are dragging in the row and column headings from the "Dimension Query". This ensures that all possible rows and columns will be returned, even if there is no data associated with them."
Execution time is very good.
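In plain SQL terms, the joined-query approach above boils down to a cross join of the heading values, left-joined to the data, so that empty combinations still appear with zero. A minimal sketch with SQLite via Python's stdlib, using the question's example data (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (a TEXT, b TEXT, c TEXT, number INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?, ?, ?)", [
    ("A1", "B1", "C1", 120), ("A1", "B1", "C2", 130), ("A1", "B2", "C1", 10),
    ("A2", "B1", "C1", 17), ("A2", "B1", "C2", 16),
])

# "Dimension query": a cross join of all section, column, and row headings,
# so missing combinations (e.g. A2 x B2) still exist in the result set.
rows = conn.execute("""
    SELECT dim.a, dim.b, dim.c, COALESCE(SUM(src.number), 0) AS number
    FROM (SELECT DISTINCT s1.a, s2.b, s3.c
          FROM src s1, src s2, src s3) AS dim
    LEFT JOIN src
      ON src.a = dim.a AND src.b = dim.b AND src.c = dim.c
    GROUP BY dim.a, dim.b, dim.c
    ORDER BY dim.a, dim.c, dim.b
""").fetchall()
print(rows)
```

The A2 section now contains rows for B2 with value 0, which is exactly the "BUT I WANT" output from the question.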

Pull Oracle data From Excel File

So I have a lot of rows in Excel: 10,000 rows or so of data, covering 10,000 or so different IDs. Is there a way to query an Oracle database just once, capturing the entire ID column as a group and including that group in the WHERE clause, instead of looping over the 10,000 assets and querying the database 10,000 times?
Sorry for not providing code. I really have not attempted this because I don't know whether a solution exists.
Something like what you are asking can be accomplished in a two step process. First, by creating SELECT-FROM-DUAL queries for the relevant IDs, and second, inputting those queries into your main query and joining against them to limit to only the returns you need.
For the first step, use Excel to create SELECT-FROM-DUAL subqueries.
If your ID column starts in cell A2, copy the following formula into an empty cell on the same row and drag it down the column until all rows with an ID also have the formula. Alter the references to cells A2 and A3 if your IDs don't start in cell A2.
="SELECT "&A2&" AS id FROM DUAL"&IF(NOT(ISBLANK(A3)), " UNION ALL", "")
Ultimately, what we want is a block of SELECT-FROM-DUAL statements that look like the below. Note that the last statement will not end in "UNION ALL", but all other statements should.
| IDs | Formula |
|----- |------------------------------------ |
| 1 | SELECT 1 AS id FROM DUAL UNION ALL |
| 2 | SELECT 2 AS id FROM DUAL UNION ALL |
| 3 | SELECT 3 AS id FROM DUAL UNION ALL |
| 4 | SELECT 4 AS id FROM DUAL UNION ALL |
| 5 | SELECT 5 AS id FROM DUAL UNION ALL |
| 6 | SELECT 6 AS id FROM DUAL |
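If you would rather script it than drag an Excel formula, the same UNION ALL block can be generated in a couple of lines. A sketch in Python (the ID list is made up; in practice it would come from the spreadsheet's ID column):

```python
# Build the SELECT-FROM-DUAL block from a list of IDs, mirroring the
# Excel formula above: every statement but the last ends in UNION ALL.
ids = [1, 2, 3, 4, 5, 6]
block = " UNION ALL\n".join(f"SELECT {i} AS id FROM DUAL" for i in ids)
print(block)
```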
For the second step, add all the SELECT-FROM-DUAL statements to your main query and then add an appropriate JOIN condition.
SELECT
*
FROM table_you_need tyn
INNER JOIN (
SELECT 1 AS id FROM DUAL UNION ALL
SELECT 2 AS id FROM DUAL UNION ALL
SELECT 3 AS id FROM DUAL UNION ALL
SELECT 4 AS id FROM DUAL UNION ALL
SELECT 5 AS id FROM DUAL UNION ALL
SELECT 6 AS id FROM DUAL
) your_ids yi
ON tyn.id = yi.id
;
If you had a shorter list of IDs you could use a similar strategy to create an ID list for a WHERE ids IN (<list_of_numbers>) clause, but an Oracle IN list is limited to 1,000 items, and consequently would not work for your current question.
You can import data from Excel using Toad or SQL Developer. You need to create a table first in the database.
You can read the data directly with external tables if you save the excel file as a CSV file to a folder on the database server that the database can access.
You can read files as Excel (xls or xlsx format) using a PL/SQL library.
There are probably a few other ways I haven't thought of as well. This is a very common question.
