use row data as columns in PostgreSQL 9.1 - pivot

--------------------
|bookname |author |
--------------------
|book1 |author1 |
|book1 |author2 |
|book2 |author3 |
|book2 |author4 |
|book3 |author5 |
|book3 |author6 |
|book4 |author7 |
|book4 |author8 |
---------------------
But I want the book names as columns and the authors as the rows beneath them, for example:
----------------------------------
|book1 |book2 |book3 |book4 |
----------------------------------
|author1|author3 |author5|author7|
|author2|author4 |author6|author8|
----------------------------------
Is this possible in PostgreSQL? How can I do it?
I tried crosstab, but could not get it to work.

You can get the result using an aggregate function with a CASE expression but I would first use row_number() so you have a value that can be used to group the data.
If you use row_number() then the query could be:
select
max(case when bookname = 'book1' then author end) book1,
max(case when bookname = 'book2' then author end) book2,
max(case when bookname = 'book3' then author end) book3,
max(case when bookname = 'book4' then author end) book4
from
(
select bookname, author,
row_number() over(partition by bookname
order by author) seq
from yourtable
) d
group by seq
order by seq;
I added row_number() so that every author of a book ends up in its own row: it numbers the authors within each book, and grouping by that number lines up the first authors, the second authors, and so on. If you leave out row_number(), the aggregate with CASE returns only one value for each book.
This query gives the result:
| BOOK1 | BOOK2 | BOOK3 | BOOK4 |
-----------------------------------------
| author1 | author3 | author5 | author7 |
| author2 | author4 | author6 | author8 |
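The same idea can be sketched outside the database. The following is a minimal Python illustration, not PostgreSQL code: the function name pivot_books is hypothetical, and the position of an author in each sorted list stands in for the seq value produced by row_number().

```python
from collections import defaultdict
from itertools import zip_longest

rows = [
    ("book1", "author1"), ("book1", "author2"),
    ("book2", "author3"), ("book2", "author4"),
    ("book3", "author5"), ("book3", "author6"),
    ("book4", "author7"), ("book4", "author8"),
]

def pivot_books(rows):
    """Return (book names, output rows) with one column per book name."""
    authors_by_book = defaultdict(list)
    for bookname, author in rows:
        authors_by_book[bookname].append(author)
    books = sorted(authors_by_book)
    # Sorting each list mirrors ORDER BY author; an author's position in
    # the list plays the role of the seq value from row_number().
    columns = [sorted(authors_by_book[book]) for book in books]
    # zip_longest lines up the authors that share a position and pads
    # books with fewer authors with None, much like MAX(CASE ...) giving NULL.
    return books, list(zip_longest(*columns))

books, pivoted = pivot_books(rows)
print(books)
for row in pivoted:
    print(row)
```

Like the SQL version, this needs the set of book names to build the columns; in SQL they have to be written out by hand, one CASE expression per book.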


Correlated Subquery in Spark SQL

I have the following 2 tables for which I have to check the existence of values between them using a correlated sub-query.
The requirement is - for each record in the orders table check if the corresponding custid is present in the customer table, and then output a field (named FLAG) with value Y if the custid exists, otherwise N if it doesn't.
orders:
orderid | custid
12345 | XYZ
34566 | XYZ
68790 | MNP
59876 | QRS
15620 | UVW
customer:
id | custid
1 | XYZ
2 | UVW
Expected Output:
orderid | custid | FLAG
12345 | XYZ | Y
34566 | XYZ | Y
68790 | MNP | N
59876 | QRS | N
15620 | UVW | Y
I tried something like the following but couldn't get it to work -
select
o.orderid,
o.custid,
case when o.custid EXISTS (select 1 from customer c on c.custid = o.custid)
then 'Y'
else 'N'
end as flag
from orders o
Can this be solved with a correlated scalar sub-query? If not, what is the best way to implement this requirement?
Please advise.
Note: I am using Spark SQL v2.4.0.
Thanks.
In Spark 2.4, IN/EXISTS predicate sub-queries can only be used in a filter (a WHERE clause), not in the select list. A left join avoids the sub-query altogether.
The following works in a locally recreated copy of your data:
select orderid, custid, case when existing_customer is null then 'N' else 'Y' end existing_customer
from (select o.orderid, o.custid, c.custid existing_customer
from orders o
left join customer c
on c.custid = o.custid)
Here's how it works with recreated data:
import spark.implicits._ // needed for .toDS (already in scope in spark-shell)
def textToView(csv: String, viewName: String) = {
spark.read
.option("ignoreLeadingWhiteSpace", "true")
.option("ignoreTrailingWhiteSpace", "true")
.option("delimiter", "|")
.option("header", "true")
.csv(spark.sparkContext.parallelize(csv.split("\n")).toDS)
.createOrReplaceTempView(viewName)
}
textToView("""id | custid
1 | XYZ
2 | UVW""", "customer")
textToView("""orderid | custid
12345 | XYZ
34566 | XYZ
68790 | MNP
59876 | QRS
15620 | UVW""", "orders")
spark.sql("""
select orderid, custid, case when existing_customer is null then 'N' else 'Y' end existing_customer
from (select o.orderid, o.custid, c.custid existing_customer
from orders o
left join customer c
on c.custid = o.custid)""").show
Which returns:
+-------+------+-----------------+
|orderid|custid|existing_customer|
+-------+------+-----------------+
| 59876| QRS| N|
| 12345| XYZ| Y|
| 34566| XYZ| Y|
| 68790| MNP| N|
| 15620| UVW| Y|
+-------+------+-----------------+
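To try the left-join pattern without a Spark session, here is a minimal sketch using Python's built-in sqlite3 module. It runs on SQLite, not Spark, so it only illustrates the join logic. The DISTINCT sub-select is an addition that is not in the answer above: it guards against a custid that appears more than once in customer, which would otherwise duplicate order rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INTEGER, custid TEXT)")
conn.execute("CREATE TABLE customer (id INTEGER, custid TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(12345, "XYZ"), (34566, "XYZ"), (68790, "MNP"), (59876, "QRS"), (15620, "UVW")],
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "XYZ"), (2, "UVW")])

FLAG_QUERY = """
    SELECT o.orderid,
           o.custid,
           -- No matching customer row means the joined column is NULL.
           CASE WHEN c.custid IS NULL THEN 'N' ELSE 'Y' END AS flag
    FROM orders o
    LEFT JOIN (SELECT DISTINCT custid FROM customer) c
           ON c.custid = o.custid
    ORDER BY o.orderid
"""

flagged = conn.execute(FLAG_QUERY).fetchall()
for row in flagged:
    print(row)
```

The ORDER BY is only there to make the output order predictable; the original query does not guarantee any particular order.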

Select record from one column which are not in another column

Let's say I have 2 excel tabs (A) & (B):
TAB (A)
+----------+
|City |
+----------+
| Seattle |
| New York |
| Boston |
| Miami |
+----------+
TAB (B)
+------------+---------+
|City | Name |
+------------+---------+
| Seattle | Klay |
| Seattle | Walis |
| New York | Walis |
| Boston | Klay |
| Miami | John |
| New York | Klay |
+------------+---------+
I am trying to build a new tab (RESULT) that lists, for each name, the cities that person has NEVER been to:
TAB (RESULT)
+------------+---------+
|Name | City |
+------------+---------+
| Klay | Miami |
|----------------------|
| Walis | Boston |
| | Miami |
|----------------------|
| John |Seattle |
| |New York |
| |Boston |
+------------+---------+
The only solution I came up with was a pivot table, but that gives me the opposite result! I have also tried INDEX and MATCH, but could not get them to work.
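Since you mentioned you are trying to do this in Excel, here's an Excel solution. Let's pretend you have all of your data set up in one tab: the city list from tab A in A2:A5, the tab B cities in C2:C7 with the matching names in D2:D7, and the distinct names listed from F2 down.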
In cell G2 and copied over and down is this formula:
=IF(COLUMN(A2)>ROWS($A$2:$A$5)-COUNTIF($D$2:$D$7,$F2),"",INDEX($A$2:$A$5,MATCH(1,INDEX((COUNTIFS($D$2:$D$7,$F2,$C$2:$C$7,$A$2:$A$5)=0)*(COUNTIF($F2:F2,$A$2:$A$5)=0),),0)))
You can cut and paste each section to a different tab if desired.
In SQL Server it would be something like this:
--tsql
with tableC AS
(
SELECT
a.City
,b.name
FROM tableA a
cross join (select distinct name from tableB) b
)
SELECT
c.*
FROM tableC c
LEFT JOIN tableB b
ON c.City = b.City
AND c.name = b.name
WHERE b.city IS NULL
If this is indeed a MySQL problem, you need to get every combination of name and city, and then eliminate combinations that have visits.
SELECT bNames.Name, tableA.City
FROM (SELECT DISTINCT Name FROM tableB) AS bNames
CROSS JOIN tableA
WHERE (bNames.Name, tableA.City) NOT IN (SELECT Name, City FROM tableB)
ORDER BY bNames.Name, tableA.City
;
The result will not omit a repeated user name on successive entries, but that is something almost always better handled by post processing the results anyway.
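The "every combination, minus the visited ones" approach can be checked quickly with Python's built-in sqlite3 module. This is a sketch on SQLite, not MySQL or SQL Server, and it uses the LEFT JOIN ... IS NULL form of the anti-join because it runs on every SQLite version; the table names tableA and tableB follow the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (City TEXT)")
conn.execute("CREATE TABLE tableB (City TEXT, Name TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?)",
    [("Seattle",), ("New York",), ("Boston",), ("Miami",)],
)
conn.executemany(
    "INSERT INTO tableB VALUES (?, ?)",
    [("Seattle", "Klay"), ("Seattle", "Walis"), ("New York", "Walis"),
     ("Boston", "Klay"), ("Miami", "John"), ("New York", "Klay")],
)

NEVER_VISITED = """
    SELECT n.Name, a.City
    FROM (SELECT DISTINCT Name FROM tableB) AS n
    CROSS JOIN tableA AS a            -- every (name, city) combination
    LEFT JOIN tableB AS b             -- try to find a matching visit
           ON b.Name = n.Name AND b.City = a.City
    WHERE b.City IS NULL              -- keep combinations with no visit
    ORDER BY n.Name, a.City
"""

never_visited = conn.execute(NEVER_VISITED).fetchall()
for name, city in never_visited:
    print(name, city)
```

Each name is repeated on every row; blanking repeated names, as in the desired RESULT tab, is a presentation step for the spreadsheet or report layer.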
One possible solution
Select b.name, a.city city_to_visit
From a join b on 1 = 1
Minus -- some databases use EXCEPT
Select b.name, b.city city_visited
From b
Is this what you are looking for?
SELECT NAME,
CASE WHEN (SELECT
CITY
FROM TAB1) NOT IN
CITY
Then City
END CASE From Tab1 LEFT JOIN
TAB2 ON TAB1.CITY=Tab2.CITY
GROUP BY NAME;

How to pivot data using Informatica when you have variable amount of pivot rows?

Based on my earlier questions, how can I pivot data using Informatica PowerCenter Designer when I have variable amount of Addresses in my data. I would like to Pivot e.g four addresses from my data. This is the structure of the source data file:
+---------+--------------+-----------------+
| ADDR_ID | NAME | ADDRESS |
+---------+--------------+-----------------+
| 1 | John Smith | JohnsAddress1 |
| 1 | John Smith | JohnsAddress2 |
| 1 | John Smith | JohnsAddress3 |
| 2 | Adrian Smith | AdriansAddress1 |
| 2 | Adrian Smith | AdriansAddress2 |
| 3 | Ivar Smith | IvarAddress1 |
+---------+--------------+-----------------+
And this should be the resulting table:
+---------+--------------+-----------------+-----------------+---------------+----------+
| ADDR_ID | NAME | ADDRESS1 | ADDRESS2 | ADDRESS3 | ADDRESS4 |
+---------+--------------+-----------------+-----------------+---------------+----------+
| 1 | John Smith | JohnsAddress1 | JohnsAddress2 | JohnsAddress3 | NULL |
| 2 | Adrian Smith | AdriansAddress1 | AdriansAddress2 | NULL | NULL |
| 3 | Ivar Smith | IvarAddress1 | NULL | NULL | NULL |
+---------+--------------+-----------------+-----------------+---------------+----------+
I guess I can use
SOURCE --> SOURCE_QUALIFIER --> SORTER --> AGGREGATOR --> EXPRESSION --> TARGET TABLE
But what kind of port should I use in AGGREGATOR and EXPRESSION transforms?
You should use something along the lines of this:
Source->Expression->Aggregator->Target
In the expression, add a variable port:
v_count expr: IIF(ISNULL(v_COUNT) OR v_COUNT=3, 1, v_COUNT + 1)
OR
v_count expr: IIF(ADDR_ID=v_PREVIOUS_ADDR_ID, v_COUNT + 1, 1)
And 3 output ports:
o_addr1 expr: DECODE(TRUE, v_COUNT=1, ADDR_IN, NULL)
o_addr2 expr: DECODE(TRUE, v_COUNT=2, ADDR_IN, NULL)
o_addr3 expr: DECODE(TRUE, v_COUNT=3, ADDR_IN, NULL)
Then, in the aggregator, group by ID and always take the MAX,
e.g.
agg_addr1: expr: MAX(O_ADDR1)
agg_addr2: expr: MAX(O_ADDR2)
agg_addr3: expr: MAX(O_ADDR3)
If you need more denormalized ports, add additional ports and set the initial state
of the v_count variable accordingly.
Try this:
SOURCE --> SOURCE_QUALIFIER --> RANK --> AGGREGATOR -->TARGET
In RANK transformation, group by on ADDR_ID and select ADDRESS as rank port. In properties tab, select Number of ranks as 4.
In AGGREGATOR transformation group by on ADDR_ID and use the following output port expressions (RANKINDEX will be generated by RANK transformation):
ADDRESS1 = MAX(ADDRESS,RANKINDEX=1)
ADDRESS2 = MAX(ADDRESS,RANKINDEX=2)
ADDRESS3 = MAX(ADDRESS,RANKINDEX=3)
ADDRESS4 = MAX(ADDRESS,RANKINDEX=4)
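Both answers follow the same two steps: number the addresses within each ADDR_ID, then collapse each group into one row with one slot per number. The sketch below shows that logic in plain Python, not Informatica; the function name pivot_addresses and the slots parameter are hypothetical, and the input is assumed to be sorted by ADDR_ID already.

```python
source_rows = [
    (1, "John Smith", "JohnsAddress1"),
    (1, "John Smith", "JohnsAddress2"),
    (1, "John Smith", "JohnsAddress3"),
    (2, "Adrian Smith", "AdriansAddress1"),
    (2, "Adrian Smith", "AdriansAddress2"),
    (3, "Ivar Smith", "IvarAddress1"),
]

def pivot_addresses(rows, slots=4):
    """rows: (addr_id, name, address) tuples, already sorted by addr_id."""
    result = {}
    previous_id = None
    count = 0
    for addr_id, name, address in rows:
        # The counter restarts whenever the group key changes, like a
        # variable port that compares ADDR_ID with the previous row.
        count = count + 1 if addr_id == previous_id else 1
        previous_id = addr_id
        if addr_id not in result:
            # Unused slots stay None, which corresponds to NULL in the target.
            result[addr_id] = [addr_id, name] + [None] * slots
        # In this sketch, addresses beyond the last slot are simply dropped.
        if count <= slots:
            result[addr_id][1 + count] = address
    return [tuple(row) for row in result.values()]

pivoted_addresses = pivot_addresses(source_rows)
for row in pivoted_addresses:
    print(row)
```

Filling a fixed number of slots per group is what the MAX expressions in the aggregator achieve: each output port picks up the single non-NULL value for its position.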

How to unpivot a crosstab like table?

After importing data from an excel document I ended up with a table that looks like this (quite similar to a pivot table):
EMPLOYEEID | SKILL1 | SKILL2 | SKILL3
---------------------------------------
emp1 | 1 | | 3
emp2 | 2 | 3 |
emp3 | | | 1
emp4 | | 2 | 3
And in my database I have another table which stores each level of knowledge of each skill for
every employee:
EMPLOYEEID | SKILLID | LEVEL_OF_KNOWLEDGE
------------------------------------------
emp1 | SKILL1 | 1
emp1 | SKILL3 | 3
emp2 | SKILL1 | 2
emp2 | SKILL2 | 3
emp3 | SKILL3 | 1
emp4 | SKILL2 | 2
emp4 | SKILL3 | 3
My question is: how can I retrieve the data from the first table and store it in the second one? Is it possible using only Access queries, or do I have to use VBA?
I have found plenty of examples doing the opposite (pivoting the second table to get the first one), but I haven't managed to find a way to solve this case.
Sure
SELECT EmployeeID, "SKILL1" AS SkillID, SKILL1 AS Level_OF_Knowledge FROM YourTable WHERE SKILL1 IS NOT NULL
UNION ALL SELECT EmployeeID, "SKILL2" AS SkillID, SKILL2 AS Level_OF_Knowledge FROM YourTable WHERE SKILL2 IS NOT NULL
UNION ALL SELECT EmployeeID, "SKILL3" AS SkillID, SKILL3 AS Level_OF_Knowledge FROM YourTable WHERE SKILL3 IS NOT NULL
*YourTable is a placeholder for the name of your imported table; repeat the last line for each additional column in that table
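The UNION ALL query emits one row per non-empty cell. As a rough illustration of that unpivot step outside Access, here is a small Python sketch; the function name unpivot and the dictionary layout of the rows are assumptions made for the example.

```python
wide_rows = [
    {"EMPLOYEEID": "emp1", "SKILL1": 1, "SKILL2": None, "SKILL3": 3},
    {"EMPLOYEEID": "emp2", "SKILL1": 2, "SKILL2": 3, "SKILL3": None},
    {"EMPLOYEEID": "emp3", "SKILL1": None, "SKILL2": None, "SKILL3": 1},
    {"EMPLOYEEID": "emp4", "SKILL1": None, "SKILL2": 2, "SKILL3": 3},
]

def unpivot(rows, id_column, value_columns):
    """Turn one wide row per employee into one long row per non-empty cell."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            # Empty cells are skipped, like the IS NOT NULL filters.
            if row[column] is not None:
                long_rows.append((row[id_column], column, row[column]))
    return long_rows

long_rows = unpivot(wide_rows, "EMPLOYEEID", ["SKILL1", "SKILL2", "SKILL3"])
for employee_id, skill_id, level in long_rows:
    print(employee_id, skill_id, level)
```

The loop walks row by row, so the output is grouped by employee; the UNION ALL query produces the same set of rows but grouped by skill column unless an ORDER BY is added.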

Include Summation Row with Group By Clause

Query:
SELECT aType, SUM(Earnings - Expenses) "Rev"
FROM aTable
GROUP BY aType
ORDER BY aType ASC
Results:
| aType | Rev |
| ----- | ----- |
| A | 20 |
| B | 150 |
| C | 250 |
Question:
Is it possible to display a summary row at the bottom such as below by using Sybase syntax within my initial query, or would it have to be a separate query altogether?
| aType | Rev |
| ----- | ----- |
| A | 20 |
| B | 150 |
| C | 250 |
=================
| All | 420 |
I couldn't get the ROLLUP function from SQL to translate over to Sybase successfully but I'm not sure if there is another way to do this, if at all.
Thanks!
Have you tried just using a UNION ALL similar to this:
select aType, Rev
from
(
SELECT aType, SUM(Earnings - Expenses) "Rev", 0 SortOrder
FROM aTable
GROUP BY aType
UNION ALL
SELECT 'All', SUM(Earnings - Expenses) "Rev", 1 SortOrder
FROM aTable
) src
ORDER BY SortOrder, aType
With the sample data used in the SQL Fiddle demo, this gives the result:
| ATYPE | REV |
---------------
| A | 10 |
| B | 150 |
| C | 250 |
| All | 410 |
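The UNION ALL pattern with a sort column can be tried with Python's built-in sqlite3 module. This is a sketch on SQLite, not Sybase, and the Earnings and Expenses figures are made up so that the per-type totals come out as 20, 150 and 250.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aTable (aType TEXT, Earnings INTEGER, Expenses INTEGER)")
# Hypothetical sample figures, chosen only for this illustration.
conn.executemany(
    "INSERT INTO aTable VALUES (?, ?, ?)",
    [("A", 30, 10), ("B", 100, 20), ("B", 90, 20), ("C", 300, 50)],
)

SUMMARY_QUERY = """
    SELECT aType, Rev
    FROM (
        SELECT aType, SUM(Earnings - Expenses) AS Rev, 0 AS SortOrder
        FROM aTable
        GROUP BY aType
        UNION ALL
        -- The grand total gets a higher SortOrder so it sorts last.
        SELECT 'All', SUM(Earnings - Expenses), 1
        FROM aTable
    ) AS src
    ORDER BY SortOrder, aType
"""

summary = conn.execute(SUMMARY_QUERY).fetchall()
for a_type, rev in summary:
    print(a_type, rev)
```

The explicit SortOrder column is what keeps the total at the bottom; sorting on aType alone would place 'All' wherever it falls alphabetically.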
Maybe you can work it out with the COMPUTE clause in Sybase, like this:
create table #tmp1( name char(9), earning int , expense int)
insert into #tmp1 values("A",30,20)
insert into #tmp1 values("B",50,30)
insert into #tmp1 values("C",60,30)
select name, (earning-expense) resv from #tmp1
group by name
order by name,resv
compute sum(earning-expense)
OR
select name, convert(varchar(15),(earning-expense)) resv from #tmp1
group by name
union all
SELECT "------------------","-----"
union all
select "ALL",convert(varchar(15),sum(earning-expense)) from #tmp1
Thanks,
Gopal
Not all versions of Sybase support ROLLUP. You can do it the old fashioned way:
with t as
(SELECT aType, SUM(Earnings - Expenses) "Rev"
FROM aTable
GROUP BY aType
)
select t.*
from ((select aType, rev from t) union all
(select NULL, sum(rev) from t)
) t
ORDER BY (case when atype is NULL then 1 else 0 end), aType ASC
This is the yucky, brute force approach. If this version of Sybase doesn't support with, you can do:
select t.aType, t.Rev
from ((SELECT aType, SUM(Earnings - Expenses) "Rev"
FROM aTable
GROUP BY aType
) union all
(select NULL, SUM(Earnings - Expenses)
FROM aTable)
) t
ORDER BY (case when atype is NULL then 1 else 0 end), aType ASC
This is pretty basic, standard SQL.
