I have two tables containing data I need to combine, but the column that relates them is entered in a slightly different way in each.
Example
Table A
ColA ColB ColC
ABC.1234 XYZ 123
ABC.5678 RST 890
Table B
ColA ColB ColC
1234 1A2B TTSS
5678 2E3F RRQQ
Output required
ColA ColB ColC ColD
1234 XYZ 1A2B TTSS
5678 RST 2E3F RRQQ
Basically, I need to drop the 'ABC.' prefix from Table A ColA, then link the Table A ColA and Table B ColA entries to produce the output above.
I believe this is done using substr(....) to drop the ABC from Table A ColA, but I am struggling with the rest of the statement.
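One way the join could look is sketched below. The question does not name a database, so SQLite (through Python's sqlite3 module) stands in here, and the table names table_a and table_b are made up for illustration. It assumes the prefix is always the four characters 'ABC.', so the key starts at position 5.

```python
import sqlite3

# In-memory database with the sample rows from the question.
# table_a / table_b are illustrative names, not from the original post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ColA TEXT, ColB TEXT, ColC TEXT);
    CREATE TABLE table_b (ColA TEXT, ColB TEXT, ColC TEXT);
    INSERT INTO table_a VALUES ('ABC.1234', 'XYZ', '123'), ('ABC.5678', 'RST', '890');
    INSERT INTO table_b VALUES ('1234', '1A2B', 'TTSS'), ('5678', '2E3F', 'RRQQ');
""")

# substr(a.ColA, 5) drops the 4-character 'ABC.' prefix (positions are 1-based),
# and the remainder is compared with Table B's ColA.
query = """
    SELECT b.ColA AS ColA, a.ColB AS ColB, b.ColB AS ColC, b.ColC AS ColD
    FROM table_a AS a
    JOIN table_b AS b
      ON substr(a.ColA, 5) = b.ColA
    ORDER BY b.ColA
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If the prefix length can vary, the start position would have to be computed from the position of the '.' instead of being hard-coded; the function for that differs between databases.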
There is a Notes view (view1). Each document in view1 has an ID and a Name.
There is another view (view2) in another DB. Each document in view2 also has an ID and a Name.
In an XPages view, I'd like to display the documents in view1 with those that appear in view2 filtered out.
For example,
view1 in DB1 has 4 documents.
Doc1 - ID1, AAA
Doc2 - ID2, BBB
Doc3 - ID3, CCC
Doc4 - ID4, DDD
view2 in DB2 has 2 documents.
Doc1 - ID2, BBB
Doc2 - ID3, CCC
I'd like to see the following data in an XPages view, with the view2 data filtered out. Is this feasible?
Doc1 - ID1, AAA
Doc2 - ID4, DDD
I think the following result would be possible using the 'filter by column value' option, but I'd like to get the opposite result in the XPages view.
Doc1 - ID2, BBB
Doc2 - ID3, CCC
If you retrieve a DocumentCollection for each view, you can use the following set operations on those NotesCollections: Intersect, Subtract and Merge. I think you need Subtract in your case. These operations can be very slow, in my experience.
See, e.g.: https://www.ibm.com/support/knowledgecenter/en/SSVRGU_8.5.3/com.ibm.designer.domino.main.doc/H_SUBTRACT_METHOD_COLLECTION.html
You're not filtering out documents from view 2 in results from view 1. Because they're two different databases, they're not the same document. At the very least, the UNID and NoteID will be different and as these are properties of the document, they're different documents. They just have the same values for the subset of fields you've chosen to include in your question.
You will need to extract the ViewEntries into a List of Java objects using only the values you want, then filter accordingly.
The only alternative is to write an additional property to the documents in database 1 for IsInDatabaseTwo, which you can then filter on in your view's selection formula.
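The "extract the values, then filter" idea can be sketched independently of the Domino API. The snippet below is plain Python rather than Java or SSJS, and the lists simply stand in for the (ID, Name) values that would be read from the entries of each view; exclude_matching is a hypothetical helper name.

```python
# Stand-ins for the (ID, Name) values read from each view's entries.
view1_entries = [("ID1", "AAA"), ("ID2", "BBB"), ("ID3", "CCC"), ("ID4", "DDD")]
view2_entries = [("ID2", "BBB"), ("ID3", "CCC")]

def exclude_matching(entries, excluded):
    """Return the entries whose (ID, Name) pair does not occur in `excluded`."""
    excluded_keys = set(excluded)  # set lookup keeps the filter linear in size
    return [entry for entry in entries if entry not in excluded_keys]

remaining = exclude_matching(view1_entries, view2_entries)
print(remaining)
```

Comparing on the extracted values rather than on UNID or NoteID is what makes the match work across two databases.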
My problem is as follows:
Table 1
ID1 ID2
1 2
3 4
Table 2
C1 VALUE
1 London
4 Texas
Table3
C3 VALUE
2 Paris
3 Arizona
Table 1 has primary and secondary IDs. I need to create a final output that aggregates the values from Table 2 and Table 3 based on the ID mapping in Table 1.
That is, if a value in Table 2 or Table 3 is mapped to either of the IDs, it should be aggregated into one row.
My final output should look like:
ID Aggregated
1 [2, London, Paris] // since Paris is mapped to 2 which in turn is mapped to 1
3 [4, Texas, Arizona] // Texas is mapped to 4 which in turn is mapped to 3
Any suggestions on how to achieve this in PySpark?
I am not sure whether joining the tables will help with this problem.
I was thinking a pair RDD might help, but I am not able to come up with a proper solution.
Thanks
Below is a very straightforward approach:
spark.sql(
"""
select 1 as id1,2 as id2
union
select 3 as id1,4 as id2
""").createOrReplaceTempView("table1")
spark.sql(
"""
select 1 as c1, 'london' as city
union
select 4 as c1, 'texas' as city
""").createOrReplaceTempView("table2")
spark.sql(
"""
select 2 as c1, 'paris' as city
union
select 3 as c1, 'arizona' as city
""").createOrReplaceTempView("table3")
spark.table("table1").show()
spark.table("table2").show()
spark.table("table3").show()
# for simplicity, union table2 and table 3
spark.sql(""" select * from table2 union all select * from table3 """).createOrReplaceTempView("city_mappings")
spark.table("city_mappings").show()
# now join to the ids:
spark.sql("""
select id1, id2, city from table1
join city_mappings on c1 = id1 or c1 = id2
""").createOrReplaceTempView("id_to_city")
# and finally you can aggregate:
spark.sql("""
select id1, id2, collect_list(city)
from id_to_city
group by id1, id2
""").createOrReplaceTempView("result")
spark.table("result").show()
# result looks like this, you can reshape to better suit your needs :
+---+---+------------------+
|id1|id2|collect_list(city)|
+---+---+------------------+
| 1| 2| [london, paris]|
| 3| 4| [texas, arizona]|
+---+---+------------------+
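To get from this result to the layout asked for in the question (the primary ID followed by a list that starts with the secondary ID), the same join-and-aggregate logic can be traced in plain Python. This is only a sketch of the logic on the sample rows, not Spark code, and the function name aggregate is made up for illustration.

```python
# Sample rows from the question.
table1 = [(1, 2), (3, 4)]                # (primary id, secondary id)
table2 = [(1, "London"), (4, "Texas")]   # (id, value)
table3 = [(2, "Paris"), (3, "Arizona")]  # (id, value)

def aggregate(id_pairs, *value_tables):
    """Group values under the primary id, whichever of the two ids they map to."""
    # Every id, primary or secondary, points at its primary id.
    primary_of = {}
    for id1, id2 in id_pairs:
        primary_of[id1] = id1
        primary_of[id2] = id1
    # Each list starts with the secondary id, as in the requested output.
    result = {id1: [id2] for id1, id2 in id_pairs}
    for table in value_tables:
        for key, value in table:
            if key in primary_of:
                result[primary_of[key]].append(value)
    return result

aggregated = aggregate(table1, table2, table3)
print(aggregated)
```

In Spark itself, the order of elements produced by collect_list is not guaranteed after a shuffle, so the order of the cities within each list should not be relied on.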
$category_string="244,46,45";
I want a query that returns only product id 239.
When I search with select * from category where category in($category_string), it gives me all rows.
<table>
<tr><td>category_id</td><td>product_id</td></tr>
<tr><td>244</td><td>239</td></tr>
<tr><td>46</td><td>239</td></tr>
<tr><td>45</td><td>239</td></tr>
<tr><td>45</td><td>240</td></tr>
<tr><td>46</td><td>240</td></tr>
<tr><td>45</td><td>241</td></tr>
<tr><td>46</td><td>241</td></tr>
<tr><td>45</td><td>242</td></tr>
<tr><td>46</td><td>242</td></tr>
</table>
If you want only 239
SELECT * FROM category WHERE product_id IN ( 239 );
OR
SELECT * FROM category WHERE product_id = 239;
Since you are comparing against only one product_id, I would suggest the second query.
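If the goal is instead to find the products that belong to every category in $category_string without knowing the product id in advance, one common pattern is to group by product and count the distinct matching categories. The sketch below assumes that reading of the question and uses SQLite through Python's sqlite3 module as a stand-in for the actual database; category_string mirrors the PHP variable.

```python
import sqlite3

category_string = "244,46,45"
category_ids = [int(c) for c in category_string.split(",")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [(244, 239), (46, 239), (45, 239), (45, 240), (46, 240),
     (45, 241), (46, 241), (45, 242), (46, 242)],
)

# Keep only rows in the wanted categories, then keep the products that
# matched as many distinct categories as were asked for.
placeholders = ",".join("?" for _ in category_ids)
query = f"""
    SELECT product_id
    FROM category
    WHERE category_id IN ({placeholders})
    GROUP BY product_id
    HAVING COUNT(DISTINCT category_id) = ?
"""
params = category_ids + [len(set(category_ids))]
product_ids = [row[0] for row in conn.execute(query, params)]
print(product_ids)
```

Only product 239 appears in all three categories, so it is the only id returned; products 240 to 242 match two of the three and are dropped by the HAVING clause.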
This is the format of my CSV file:
Chevrolet C10,13.0,8,350.0,145.0,4055,12.0,76,US
Ford F108,13.0,8,302.0,130.0,3870,15.0,76,US
Dodge D100,13.0,8,318.0,150.0,3755,14.0,76,US
Honda Accord CVCC,31.5,4,98.00,68.00,2045,18.5,77,Japan
Buick Opel Isuzu Deluxe,30.0,4,111.0,80.00,2155,14.8,77,US
Renault 5 GTL,36.0,4,79.00,58.00,1825,18.6,77,Europe
Plymouth Arrow GS,25.5,4,122.0,96.00,2300,15.5,77,US
I want to split the first field so that:
Chevrolet C10 should be Chevrolet
Ford F108 should be Ford
Honda Accord CVCC should be Honda, etc. I will then use the car name for further processing.
Solution in Pig
Code :
read = LOAD 'test.data' USING PigStorage(',') AS (name:chararray, val1:double, val2:int, val3:double, val4:double, val5:int, val6:double, val7:int, country:chararray);
sub_data = FOREACH read GENERATE SUBSTRING(name,0,(INDEXOF(name, ' ',0))) AS (subname:chararray);
DUMP sub_data;
Output :
(Chevrolet)
(Ford)
(Dodge)
(Honda)
(Buick)
(Renault)
(Plymouth)
select
case when MODEL like 'US % %' or MODEL like 'Europe % %'
then regexp_extract(MODEL, '^([^ ]* [^ ]*) ', 1)
when MODEL like '% %'
then regexp_extract(MODEL, '^([^ ]*) ', 1)
else MODEL
end as BRAND
from WHATEVER
Chevrolet C10 => Chevrolet
US Honda Accord => US Honda
Zorglub => Zorglub
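Use the UDF below. With a count of 1 it returns everything before the first occurrence of the delimiter, e.g. substring_index(carname, ' ', 1):
substring_index(string A, string delim, int count)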
Create a table with the schema you want.
CREATE TABLE carinfo (carname STRING, val1 DOUBLE, val2 INT, val3 DOUBLE, val4 DOUBLE, val5 INT, val6 DOUBLE, val7 INT, country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Load the data into the above table:
LOAD DATA LOCAL INPATH '/hivesamples/splitstr.txt' OVERWRITE INTO TABLE carinfo;
Use CTAS to split the carname and get the brand name. This new table will have the same schema which you defined earlier.
CREATE TABLE modified_carinfo
AS
SELECT split(carname, ' ')[0] as carname, val1, val2, val3, val4, val5 ,val6, val7, country
FROM carinfo;
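For checking the expected brands outside Pig or Hive, the same "take everything before the first space" rule can be sketched in plain Python. The sample below reuses three rows from the question; brand_of is a hypothetical helper name.

```python
import csv
import io

# Three rows copied from the CSV sample in the question.
sample = """Chevrolet C10,13.0,8,350.0,145.0,4055,12.0,76,US
Honda Accord CVCC,31.5,4,98.00,68.00,2045,18.5,77,Japan
Buick Opel Isuzu Deluxe,30.0,4,111.0,80.00,2155,14.8,77,US
"""

def brand_of(car_name):
    """Return the text before the first space, or the whole name if there is none."""
    return car_name.split(" ", 1)[0]

brands = [brand_of(row[0]) for row in csv.reader(io.StringIO(sample)) if row]
print(brands)
```

Like split(carname, ' ')[0] in Hive, this returns the full name unchanged when it contains no space.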