Pivot String Values in Snowflake

How can I pivot this table
ID attribute_name attribute_value
1 Name John
1 Country UK
1 City London
into this structure?
ID Name Country City
1 John UK London
According to the documentation, PIVOT requires an aggregate function:
SELECT ...
FROM ...
PIVOT ( <aggregate_function> ( <pivot_column> )
FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )
How can I apply this to string values?

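The aggregating function can be max(). MAX works on strings too, and as long as each ID/attribute pair has only one value, max() simply returns that value. For example: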
select *
from (
select xx.seq, xx.value:"#id" id, xx.value:"$" title
from BooksXML, table(flatten(xml:"$":"$")) xx
)
pivot(max(title) for id in ('bk101', 'bk102', 'bk103', 'bk104', 'bk105')) as p
order by seq
With the table:
CREATE temp TABLE BooksXML
as
select parse_xml('<catalog issue="spring">
<Books>
<book id="bk101">The Good Book</book>
<book id="bk102">The OK Book</book>
<book id="bk103">The NOT Ok Book</book>
<book id="bk104">All OK Book</book>
<book id="bk105">Every OK Book</book>
</Books>
</catalog>') xml
union all select parse_xml('
<catalog issue="spring">
<Books>
<book id="bk102">The OK Book1</book>
<book id="bk103">The NOT Ok Book1</book>
<book id="bk104">All OK Book1</book>
</Books>
</catalog>')
union all select parse_xml('
<catalog issue="spring">
<Books>
<book id="bk101">The Good Book2</book>
<book id="bk103">The NOT Ok Book2</book>
<book id="bk104">All OK Book2</book>
<book id="bk105">Every OK Book2</book>
</Books>
</catalog>');
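Applied to the table from the question, the Snowflake form would presumably be pivot(max(attribute_value) for attribute_name in ('Name', 'Country', 'City')). SQLite has no PIVOT clause, so the minimal sketch below uses Python's built-in sqlite3 module and the equivalent conditional aggregation (one MAX over a CASE expression per output column) to show why MAX works for strings. The table name attrs is an assumption.

```python
import sqlite3

# In-memory database standing in for the Snowflake table (table name "attrs" is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (id INTEGER, attribute_name TEXT, attribute_value TEXT)")
conn.executemany(
    "INSERT INTO attrs VALUES (?, ?, ?)",
    [(1, "Name", "John"), (1, "Country", "UK"), (1, "City", "London")],
)

# Conditional aggregation: one MAX(CASE ...) per pivoted column.
# Non-matching rows yield NULL, which MAX ignores, so each column gets the single string value.
rows = conn.execute(
    """
    SELECT id,
           MAX(CASE WHEN attribute_name = 'Name'    THEN attribute_value END) AS name,
           MAX(CASE WHEN attribute_name = 'Country' THEN attribute_value END) AS country,
           MAX(CASE WHEN attribute_name = 'City'    THEN attribute_value END) AS city
    FROM attrs
    GROUP BY id
    """
).fetchall()

print(rows)  # [(1, 'John', 'UK', 'London')]
```

If an ID had two values for the same attribute, MAX would keep only the lexicographically greatest one, so this pattern assumes one value per ID/attribute pair.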

Related

Combining data to 1 table from 2 tables when data is a partial match

I have two tables whose data I need to combine, but one column that refers to the same data is entered slightly differently in each table.
Example
Table A
ColA ColB ColC
ABC.1234 XYZ 123
ABC.5678 RST 890
Table B
ColA ColB ColC
1234 1A2B TTSS
5678 2E3F RRQQ
Output required
ColA ColB ColC ColD
1234 XYZ 1A2B TTSS
5678 RST 2E3F RRQQ
Basically, I need to drop the 'ABC.' prefix from Table A ColA, then match Table A ColA against Table B ColA to produce the output above.
I believe substr(....) can drop the ABC from Table A ColA, but I am struggling with the rest of the statement.
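One way to do this is to strip the prefix inside the join condition itself. The sketch below uses Python's sqlite3 module with hypothetical table names table_a and table_b, and assumes the prefix always ends at the first dot (if it is always exactly 'ABC.', substr(a.ColA, 5) would do as well). String functions differ between databases (for example SUBSTRING and CHARINDEX in SQL Server), so adapt the names to yours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names standing in for Table A and Table B.
conn.execute("CREATE TABLE table_a (ColA TEXT, ColB TEXT, ColC TEXT)")
conn.execute("CREATE TABLE table_b (ColA TEXT, ColB TEXT, ColC TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)",
                 [("ABC.1234", "XYZ", "123"), ("ABC.5678", "RST", "890")])
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?)",
                 [("1234", "1A2B", "TTSS"), ("5678", "2E3F", "RRQQ")])

# Strip everything up to and including the first '.' from Table A ColA, then join on it.
rows = conn.execute(
    """
    SELECT b.ColA AS ColA, a.ColB AS ColB, b.ColB AS ColC, b.ColC AS ColD
    FROM table_a AS a
    JOIN table_b AS b
      ON substr(a.ColA, instr(a.ColA, '.') + 1) = b.ColA
    ORDER BY b.ColA
    """
).fetchall()

print(rows)  # [('1234', 'XYZ', '1A2B', 'TTSS'), ('5678', 'RST', '2E3F', 'RRQQ')]
```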

Filter out data in XPages view

There is a Notes view (view1). Each document in view1 has an ID and a Name.
There is another view (view2) in another database. Each document in view2 also has an ID and a Name.
In an XPages view, I'd like to display the documents in view1, excluding those that also appear in view2.
For example,
view1 in DB1 has 4 documents.
Doc1 - ID1, AAA
Doc2 - ID2, BBB
Doc3 - ID3, CCC
Doc4 - ID4, DDD
view2 in DB2 has 2 documents.
Doc1 - ID2, BBB
Doc2 - ID3, CCC
I'd like the XPages view to show the following (view1 minus the entries in view2). Is this feasible?
Doc1 - ID1, AAA
Doc2 - ID4, DDD
I believe the 'filter by column value' option could give me the following data (the entries that match view2), but I want the opposite result in the XPages view:
Doc1 - ID2, BBB
Doc2 - ID3, CCC
If you retrieve a DocumentCollection for each view, you can use the following set operations on those NotesCollections: Intersect, Subtract and Merge. I think you need Subtract in your case. These operations can be very slow, in my experience.
See, e.g.: https://www.ibm.com/support/knowledgecenter/en/SSVRGU_8.5.3/com.ibm.designer.domino.main.doc/H_SUBTRACT_METHOD_COLLECTION.html
You're not filtering out documents from view 2 in results from view 1. Because they're two different databases, they're not the same document. At the very least, the UNID and NoteID will be different and as these are properties of the document, they're different documents. They just have the same values for the subset of fields you've chosen to include in your question.
You will need to extract the ViewEntries into a List of Java objects using only the values you want, then filter accordingly.
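In an XPages application this step would be written in Java or SSJS over the ViewEntry column values; the sketch below uses Python only to illustrate the filtering logic, with hardcoded lists standing in for the values extracted from the two views. The key point is comparing by value (here the ID) rather than by document identity.

```python
# Hypothetical stand-ins for (ID, Name) values read from the two views' entries.
view1_entries = [("ID1", "AAA"), ("ID2", "BBB"), ("ID3", "CCC"), ("ID4", "DDD")]
view2_entries = [("ID2", "BBB"), ("ID3", "CCC")]

# Collect the IDs present in view2. Documents in different databases never share
# UNIDs/NoteIDs, so matching has to be done on field values.
ids_in_view2 = {entry_id for entry_id, _name in view2_entries}

# Keep only view1 entries whose ID does not appear in view2.
filtered = [entry for entry in view1_entries if entry[0] not in ids_in_view2]

print(filtered)  # [('ID1', 'AAA'), ('ID4', 'DDD')]
```

Building a set first keeps each membership check cheap, which matters if the views are large.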
The only alternative is to write an additional property to the documents in database 1 for IsInDatabaseTwo, which you can then filter on in your view's selection formula.

PYSPARK : Join a table column with one of the two columns from another table

My problem is as follows:
Table 1
ID1 ID2
1 2
3 4
Table 2
C1 VALUE
1 London
4 Texas
Table 3
C3 VALUE
2 Paris
3 Arizona
Table 1 has primary and secondary IDs. I need to create a final output that aggregates the values from Table 2 and Table 3 based on the ID mapping in Table 1,
i.e. if a value in Table 2 or Table 3 is mapped to either of the IDs, it should be aggregated into one row.
My final output should look like:
ID Aggregated
1 [2, London, Paris] // since Paris is mapped to 2, which in turn is mapped to 1
3 [4, Texas, Arizona] // Texas is mapped to 4, which in turn is mapped to 3
Any suggestions on how to achieve this in PySpark?
I am not sure if joining the tables is going to help with this problem.
I was thinking a pair RDD might help, but I am not able to come up with a proper solution.
Thanks
Below is a very straightforward approach:
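# assumes a SparkSession; in the pyspark shell or a notebook, spark already exists
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark.sql(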
"""
select 1 as id1,2 as id2
union
select 3 as id1,4 as id2
""").createOrReplaceTempView("table1")
spark.sql(
"""
select 1 as c1, 'london' as city
union
select 4 as c1, 'texas' as city
""").createOrReplaceTempView("table2")
spark.sql(
"""
select 2 as c1, 'paris' as city
union
select 3 as c1, 'arizona' as city
""").createOrReplaceTempView("table3")
spark.table("table1").show()
spark.table("table2").show()
spark.table("table3").show()
# for simplicity, union table2 and table3
spark.sql(""" select * from table2 union all select * from table3 """).createOrReplaceTempView("city_mappings")
spark.table("city_mappings").show()
# now join to the ids:
spark.sql("""
select id1, id2, city from table1
join city_mappings on c1 = id1 or c1 = id2
""").createOrReplaceTempView("id_to_city")
# and finally you can aggregate:
spark.sql("""
select id1, id2, collect_list(city)
from id_to_city
group by id1, id2
""").createOrReplaceTempView("result")
spark.table("result").show()
# the result looks like this (collect_list does not guarantee element order); reshape it to suit your needs:
+---+---+------------------+
|id1|id2|collect_list(city)|
+---+---+------------------+
| 1| 2| [london, paris]|
| 3| 4| [texas, arizona]|
+---+---+------------------+
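To get the exact shape asked for in the question (the secondary ID followed by the cities), the join-then-collect logic can be traced in plain Python on the sample data. This is an illustration of the logic, not Spark code; in Spark SQL the same idea would presumably prepend id2 to the collected list, for example with concat(array(cast(id2 as string)), collect_list(city)), assuming Spark 2.4 or later where concat accepts arrays.

```python
# Sample data from the question; table1 maps a primary ID to a secondary ID.
table1 = [(1, 2), (3, 4)]
table2 = [(1, "London"), (4, "Texas")]
table3 = [(2, "Paris"), (3, "Arizona")]

# Union the two city tables, as the SQL answer does with UNION ALL.
city_mappings = table2 + table3

result = {}
for id1, id2 in table1:
    # Start each list with the secondary ID, as in the desired output.
    aggregated = [id2]
    # Same join condition as "c1 = id1 or c1 = id2".
    aggregated += [city for c1, city in city_mappings if c1 in (id1, id2)]
    result[id1] = aggregated

print(result)  # {1: [2, 'London', 'Paris'], 3: [4, 'Texas', 'Arizona']}
```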

Search query for multiple column with comma separated string

$category_string="244,46,45";
I want a query that returns only product_id 239, the only product that belongs to all three categories.
When I search with select * from category where category in($category_string), it returns all rows, because every product has category 45 or 46.
category_id product_id
244 239
46 239
45 239
45 240
46 240
45 241
46 241
45 242
46 242
If you only want product 239:
SELECT * FROM category WHERE product_id IN ( 239 );
OR
SELECT * FROM category WHERE product_id = 239;
Since you are comparing against only one product_id, I would suggest the second query.
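Hardcoding 239 only works when the answer is already known. A more general approach (often called relational division) keeps only the products that match every requested category: filter with IN, group by product_id, and require the number of distinct matched categories to equal the number requested. The sketch below runs this against an in-memory SQLite copy of the sample table using Python's sqlite3 module; the same GROUP BY/HAVING query should also work in MySQL, and binding the IDs as parameters avoids pasting $category_string directly into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [(244, 239), (46, 239), (45, 239), (45, 240), (46, 240),
     (45, 241), (46, 241), (45, 242), (46, 242)],
)

category_string = "244,46,45"
category_ids = [int(c) for c in category_string.split(",")]

# One "?" placeholder per ID, so the values are bound rather than concatenated.
placeholders = ",".join("?" * len(category_ids))
query = f"""
    SELECT product_id
    FROM category
    WHERE category_id IN ({placeholders})
    GROUP BY product_id
    HAVING COUNT(DISTINCT category_id) = ?
"""
# A product qualifies only if it matched every distinct requested category.
rows = conn.execute(query, [*category_ids, len(set(category_ids))]).fetchall()

print(rows)  # [(239,)]
```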

Split a string based on space in Hive

This is the format of my CSV file:
Chevrolet C10,13.0,8,350.0,145.0,4055,12.0,76,US
Ford F108,13.0,8,302.0,130.0,3870,15.0,76,US
Dodge D100,13.0,8,318.0,150.0,3755,14.0,76,US
Honda Accord CVCC,31.5,4,98.00,68.00,2045,18.5,77,Japan
Buick Opel Isuzu Deluxe,30.0,4,111.0,80.00,2155,14.8,77,US
Renault 5 GTL,36.0,4,79.00,58.00,1825,18.6,77,Europe
Plymouth Arrow GS,25.5,4,122.0,96.00,2300,15.5,77,US
I want to split the first field so that:
Chevrolet C10 becomes Chevrolet
Ford F108 becomes Ford
Honda Accord CVCC becomes Honda, etc. I will then use the brand name for further processing.
Solution in Pig (the schema declares all nine CSV fields, so country maps to the last column):
Code:
read = LOAD 'test.data' USING PigStorage(',') AS (name:chararray, val1:double, val2:int, val3:double, val4:double, val5:int, val6:double, val7:int, country:chararray);
sub_data = FOREACH read GENERATE SUBSTRING(name,0,(INDEXOF(name, ' ',0))) AS (subname:chararray);
DUMP sub_data;
Output:
(Chevrolet)
(Ford)
(Dodge)
(Honda)
(Buick)
(Renault)
(Plymouth)
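Alternatively, in Hive with regexp_extract (this version also keeps a leading US or Europe prefix together with the brand):
select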
case when MODEL like 'US % %' or MODEL like 'Europe % %'
then regexp_extract(MODEL, '^([^ ]* [^ ]*) ', 1)
when MODEL like '% %'
then regexp_extract(MODEL, '^([^ ]*) ', 1)
else MODEL
end as BRAND
from WHATEVER
Example results:
Chevrolet C10 => Chevrolet
US Honda Accord => US Honda
Zorglub => Zorglub
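To make the branching of that CASE expression explicit, here is a minimal sketch that mirrors it with Python's re module. The function name brand is hypothetical, and the sketch only aims to reproduce the three example mappings above; it is not Hive code.

```python
import re


def brand(model: str) -> str:
    """Mirror of the Hive CASE expression above (illustrative only)."""
    # Equivalent of: MODEL LIKE 'US % %' OR MODEL LIKE 'Europe % %'
    if re.match(r"^(US|Europe) .* ", model):
        # regexp_extract(MODEL, '^([^ ]* [^ ]*) ', 1): first two space-separated tokens
        return re.match(r"^([^ ]* [^ ]*) ", model).group(1)
    # Equivalent of: MODEL LIKE '% %'
    if " " in model:
        # regexp_extract(MODEL, '^([^ ]*) ', 1): text before the first space
        return re.match(r"^([^ ]*) ", model).group(1)
    # ELSE MODEL
    return model


for m in ["Chevrolet C10", "US Honda Accord", "Zorglub"]:
    print(m, "=>", brand(m))
```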
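Alternatively, use Hive's built-in function substring_index(string A, string delim, int count), which returns the part of A before the count-th occurrence of delim. For example, substring_index(carname, ' ', 1) returns the brand.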
Create a table with the schema you want:
CREATE TABLE carinfo (carname STRING, val1 DOUBLE, val2 INT, val3 DOUBLE, val4 DOUBLE, val5 INT, val6 DOUBLE, val7 INT, country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Load the data into that table:
LOAD DATA LOCAL INPATH '/hivesamples/splitstr.txt' OVERWRITE INTO TABLE carinfo;
Use CTAS (CREATE TABLE AS SELECT) to split carname and keep only the brand name. The new table has the same columns as carinfo, with carname now holding just the brand:
CREATE TABLE modified_carinfo
AS
SELECT split(carname, ' ')[0] as carname, val1, val2, val3, val4, val5 ,val6, val7, country
FROM carinfo;
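The core of the CTAS is split(carname, ' ')[0], i.e. the text before the first space. As a quick sanity check of that logic outside Hive, the sketch below applies the same rule in Python to a few rows copied from the question's CSV; it only illustrates the splitting, not the Hive table handling.

```python
import csv
import io

# A few rows from the question's CSV file.
data = """Chevrolet C10,13.0,8,350.0,145.0,4055,12.0,76,US
Honda Accord CVCC,31.5,4,98.00,68.00,2045,18.5,77,Japan
Renault 5 GTL,36.0,4,79.00,58.00,1825,18.6,77,Europe
"""

brands = []
for row in csv.reader(io.StringIO(data)):
    # Same idea as Hive's split(carname, ' ')[0]: keep the text before the first space.
    brands.append(row[0].split(" ")[0])

print(brands)  # ['Chevrolet', 'Honda', 'Renault']
```

A name without any space (such as Zorglub) comes back unchanged with this rule, which matches how split(...)[0] behaves in Hive.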
