Combining data from 2 tables into 1 table when the key is a partial match - substr

I have 2 tables containing data that I need to combine, but the column that links them is entered in a slightly different way in each table.
Example
Table A
ColA ColB ColC
ABC.1234 XYZ 123
ABC.5678 RST 890
Table B
ColA ColB ColC
1234 1A2B TTSS
5678 2E3F RRQQ
Output required
ColA ColB ColC ColD
1234 XYZ 1A2B TTSS
5678 RST 2E3F RRQQ
Basically, I need to drop the 'ABC.' prefix from Table A ColA, then link the Table A ColA and Table B ColA entries to produce the output above.
I believe this is done with substr(....) to drop the prefix from Table A ColA, but I am struggling with the rest of the statement.
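One way the join described above could look, as a minimal sketch. It assumes SQLite syntax (run through Python's built-in sqlite3 module), where substr(X, Y) returns the rest of the string from position Y and instr(X, Y) finds the position of the '.' separator. The table names table_a and table_b are placeholders, and other databases may name or number these functions differently.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ColA TEXT, ColB TEXT, ColC TEXT);
    CREATE TABLE table_b (ColA TEXT, ColB TEXT, ColC TEXT);
    INSERT INTO table_a VALUES ('ABC.1234', 'XYZ', '123'), ('ABC.5678', 'RST', '890');
    INSERT INTO table_b VALUES ('1234', '1A2B', 'TTSS'), ('5678', '2E3F', 'RRQQ');
""")

# Strip everything up to and including the '.' from Table A ColA,
# then join the remainder to Table B ColA.
rows = conn.execute("""
    SELECT b.ColA, a.ColB, b.ColB, b.ColC
    FROM table_a AS a
    JOIN table_b AS b
      ON substr(a.ColA, instr(a.ColA, '.') + 1) = b.ColA
    ORDER BY b.ColA
""").fetchall()

for row in rows:
    print(row)
```

If the prefix always has a fixed length, substr(a.ColA, 5) would do the same job without instr.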

Related

How to get a subset of a Teradata table, i.e. the values from the nth row to the (n+3)th row

Assume I have a table A with 100 records in Teradata. I have to pass the rows to a specific process in 5 batches of 20 rows each. I am struggling to split the table into 5 such parts. Is there any SQL that can return the data this way?
Example:
table A
A AA
B BB
C CC
D DD
E EE
F FF
Here I have 6 records. I want to fetch the first 2, then the next 2, and then the last 2 records, one batch at a time. Any SQL help is appreciated.
If there is a unique column (or combination of columns), you can apply ROW_NUMBER:
select *
from table
QUALIFY
ROW_NUMBER() OVER (ORDER BY unique_column(s)) BETWEEN 3 AND 4
;
Of course, this is not very efficient on a big table, since each batch query has to order and number all of the rows again.
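To run the query above once per batch, the BETWEEN bounds have to be computed for each batch. A small sketch of that arithmetic follows. The helper name batch_bounds is hypothetical, and the bounds are 1-based and inclusive to match ROW_NUMBER.

```python
def batch_bounds(total_rows, batch_size):
    """Yield (first_row, last_row) pairs, 1-based and inclusive."""
    for first in range(1, total_rows + 1, batch_size):
        # The last batch may be shorter if total_rows is not a multiple of batch_size.
        yield first, min(first + batch_size - 1, total_rows)


# 100 rows in batches of 20, as in the question.
bounds = list(batch_bounds(100, 20))
for first, last in bounds:
    print(f"QUALIFY ROW_NUMBER() OVER (ORDER BY unique_column) BETWEEN {first} AND {last}")
```

Each printed line is the QUALIFY clause for one batch. The 6-row example from the question would use batch_bounds(6, 2).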

Replacing Null Values with Mean Value of the Column in GridDB

I am working with the GridDB Node.js connector. I know the query that finds the null values and shows the matching records/rows:
SELECT * FROM employees where employee_salary = NaN;
But I want to replace the null values in the column with the column's mean value, in order to keep the data consistent for analysis. How do I do that in GridDB?
The Employee table looks like the following:
employee_id    employee_salary  first_name     department
---------------+---------------+--------------+--------------
0                               John           Sales
1              60000            Lisa           Development
2              45000            Richard        Sales
3              50000            Lina           Marketing
4              55000            Anderson       Development
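The replacement itself can be sketched independently of GridDB. The plain-Python example below is not GridDB connector code. It assumes the rows have already been fetched into dictionaries and that the missing salary arrives as NaN, as the query above suggests. Writing the corrected rows back to the container is left out.

```python
import math
from statistics import mean

# Sample rows from the table above; the missing salary is assumed to be NaN.
employees = [
    {"employee_id": 0, "employee_salary": math.nan, "first_name": "John", "department": "Sales"},
    {"employee_id": 1, "employee_salary": 60000, "first_name": "Lisa", "department": "Development"},
    {"employee_id": 2, "employee_salary": 45000, "first_name": "Richard", "department": "Sales"},
    {"employee_id": 3, "employee_salary": 50000, "first_name": "Lina", "department": "Marketing"},
    {"employee_id": 4, "employee_salary": 55000, "first_name": "Anderson", "department": "Development"},
]


def is_missing(value):
    """Treat both None and NaN as a missing value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Mean of the salaries that are actually present.
known = [e["employee_salary"] for e in employees if not is_missing(e["employee_salary"])]
salary_mean = mean(known)

# Copy each row, substituting the mean where the salary is missing.
imputed = [
    {**e, "employee_salary": salary_mean if is_missing(e["employee_salary"]) else e["employee_salary"]}
    for e in employees
]
print(imputed[0])
```

The mean is computed only over the non-missing values, so the missing row does not distort it.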

PySpark: Join a table column with one of the two columns from another table

My problem is as follows:
Table 1
ID1 ID2
1 2
3 4
Table 2
C1 VALUE
1 London
4 Texas
Table 3
C3 VALUE
2 Paris
3 Arizona
Table 1 has primary and secondary IDs. I need to create a final output that aggregates the values from Table 2 and Table 3 based on the ID mapping in Table 1.
i.e. if a value in Table 2 or Table 3 is mapped to either of the IDs, it should be aggregated into one row.
i.e. my final output should look like:
ID Aggregated
1 [2, London, Paris] // since Paris is mapped to 2, which in turn is mapped to 1
3 [4, Texas, Arizona] // Texas is mapped to 4, which in turn is mapped to 3
Any suggestions on how to achieve this in PySpark?
I am not sure whether joining the tables will help with this problem.
I was thinking PairedRDD might help me with this, but I am not able to come up with a proper solution.
Thanks
Below is a very straightforward approach:
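from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the active session if one already exists
spark.sql(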
"""
select 1 as id1,2 as id2
union
select 3 as id1,4 as id2
""").createOrReplaceTempView("table1")
spark.sql(
"""
select 1 as c1, 'london' as city
union
select 4 as c1, 'texas' as city
""").createOrReplaceTempView("table2")
spark.sql(
"""
select 2 as c1, 'paris' as city
union
select 3 as c1, 'arizona' as city
""").createOrReplaceTempView("table3")
spark.table("table1").show()
spark.table("table2").show()
spark.table("table3").show()
# for simplicity, union table2 and table3
spark.sql(""" select * from table2 union all select * from table3 """).createOrReplaceTempView("city_mappings")
spark.table("city_mappings").show()
# now join to the ids:
spark.sql("""
select id1, id2, city from table1
join city_mappings on c1 = id1 or c1 = id2
""").createOrReplaceTempView("id_to_city")
# and finally you can aggregate:
spark.sql("""
select id1, id2, collect_list(city)
from id_to_city
group by id1, id2
""").createOrReplaceTempView("result")
spark.table("result").show()
# result looks like this, you can reshape to better suit your needs :
+---+---+------------------+
|id1|id2|collect_list(city)|
+---+---+------------------+
| 1| 2| [london, paris]|
| 3| 4| [texas, arizona]|
+---+---+------------------+
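The same join-then-aggregate logic can be sketched without Spark, which also shows one way to reshape the result into the [id2, city, city] form asked for in the question. This is plain Python on the sample data, not a PySpark implementation, and the variable names are illustrative only.

```python
from collections import defaultdict

# Sample data from the question.
table1 = [(1, 2), (3, 4)]  # (primary ID, secondary ID)
city_mappings = [(1, "London"), (4, "Texas"), (2, "Paris"), (3, "Arizona")]  # Table 2 + Table 3

# Map every ID, primary or secondary, to its primary ID.
primary_of = {}
for id1, id2 in table1:
    primary_of[id1] = id1
    primary_of[id2] = id1

# Collect the cities under the primary ID they resolve to
# (the equivalent of "c1 = id1 or c1 = id2" followed by collect_list).
aggregated = defaultdict(list)
for key, city in city_mappings:
    if key in primary_of:
        aggregated[primary_of[key]].append(city)

# Prepend the secondary ID to match the output format in the question.
result = {id1: [id2] + aggregated[id1] for id1, id2 in table1}
print(result)
```

In Spark the order of values inside collect_list is not guaranteed, so sort the list explicitly if the order matters.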

Dynamic Join in Power Query (Excel)

I have a big data table that needs to be filtered by several columns. I am thinking of inner joining the filter table with the data table to get the results. The problem is that the filters are dynamic.
For example, the user can filter on two columns (select data with Acct = 1001 or 1002 or 1003 or 1004 and Tran = 1 or 2 or 3). The filter table is shown below.
col1 col2
Acct Tran
==== ====
1001 1
1002 2
1003 3
1004
Or the user can add one column at the end of the table and filter on three columns (select data with Acct = 1001 or 1002 or 1003 or 1004 and Tran = 1 or 2 or 3 and Dept = a or b or c). The filter table is shown below.
col1 col2 col3
Acct Tran Dept
==== ==== ====
1001 1 a
1002 2 b
1003 3 c
1004
The number of columns and the column names may change. Does anyone know how to do this in Power Query or VBA?
Many thanks.
You could build the filter table in the following way:
Attribute Value
========= =====
Acct 1001
Acct 1002
Acct 1003
Tran 1
Tran 2
... ...
This gives you a fixed number of columns that you can import using Power Query. In your data table, you would need to unpivot the columns to get the same structure. Afterwards you can join on the attribute and value columns, then pivot the attributes back into columns to restore the original structure.
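The unpivot-and-join idea can be sketched outside Power Query to show the logic. The Python example below is illustrative only. The sample data rows are invented, and it assumes a row should be kept only when every attribute that appears in the filter table found a match, which is a step the description above leaves implicit.

```python
from collections import defaultdict

# Hypothetical data rows; only the column names come from the question.
data = [
    {"Acct": 1001, "Tran": 1, "Dept": "a"},
    {"Acct": 1002, "Tran": 9, "Dept": "b"},
    {"Acct": 2000, "Tran": 2, "Dept": "c"},
    {"Acct": 1004, "Tran": 3, "Dept": "a"},
]

# Filter table in Attribute/Value form.
filter_rows = [
    ("Acct", 1001), ("Acct", 1002), ("Acct", 1003), ("Acct", 1004),
    ("Tran", 1), ("Tran", 2), ("Tran", 3),
]
filter_pairs = set(filter_rows)
filter_attributes = {attribute for attribute, _ in filter_rows}

# Unpivot: one (row_id, attribute, value) record per cell.
unpivoted = [
    (row_id, attribute, value)
    for row_id, row in enumerate(data)
    for attribute, value in row.items()
]

# Join on (attribute, value) and remember which attributes matched per row.
matched_attributes = defaultdict(set)
for row_id, attribute, value in unpivoted:
    if (attribute, value) in filter_pairs:
        matched_attributes[row_id].add(attribute)

# Keep rows where every filtered attribute matched (assumed requirement).
kept = [row for row_id, row in enumerate(data) if matched_attributes[row_id] == filter_attributes]
print(kept)
```

Adding a Dept filter only requires more ("Dept", ...) rows in filter_rows. The rest of the code stays the same, which is the point of the fixed two-column layout.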
I would Merge with the filter table using the Acct and Tran columns, and expand the Filter.Dept column.
Next I would add a column "Matched" with a formula to evaluate the filter, along the lines of:
if [Filter.Dept] = null then true else if [Dept] = [Filter.Dept] then true else false
Finally I would filter the "Matched" column for TRUE.
Note the Filter Dept column will need to always be present, but it can be left blank for your first scenario.
I love the unpivot solution, but it can become quite slow if your BigDataTable is very large. In that case you can use this alternative instead:
= Table.NestedJoin(BigDataTable,Table.ColumnNames(FilterTable),FilterTable,Table.ColumnNames(FilterTable),"NewColumn",JoinKind.Inner)
It generates the list of column names dynamically using Table.ColumnNames(FilterTable), which returns the column names of your FilterTable. My expectation is that this would even fold back to a server.
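To illustrate what a join keyed on the filter table's own column names does, here is a rough Python equivalent. The function name dynamic_inner_join and the sample rows are hypothetical, and this is a sketch of the join semantics, not of Power Query itself.

```python
def dynamic_inner_join(data_rows, filter_rows):
    """Keep the data rows whose key columns match some row of the filter table.

    The key columns are taken from the filter table itself, so adding a
    column to the filter table changes the join without changing the code.
    """
    key_columns = list(filter_rows[0].keys())
    wanted = {tuple(row[column] for column in key_columns) for row in filter_rows}
    return [row for row in data_rows if tuple(row[column] for column in key_columns) in wanted]


filter_table = [{"Acct": 1001, "Tran": 1}, {"Acct": 1002, "Tran": 2}]
big_data_table = [
    {"Acct": 1001, "Tran": 1, "Amount": 10},
    {"Acct": 1001, "Tran": 2, "Amount": 20},
    {"Acct": 1002, "Tran": 2, "Amount": 30},
]

joined = dynamic_inner_join(big_data_table, filter_table)
print(joined)
```

Note that a join on all filter columns matches whole filter rows. In this sketch the row with Acct 1001 and Tran 2 is dropped because that combination is not a row of the filter table, so the filter table would need one row per allowed combination to express "Acct is any of these and Tran is any of those".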

Cognos Report: Crosstab with sections, how to display all columns?

First, I'm French, so sorry for my bad English.
In Report Studio I use a crosstab with sections, but for each section I want to display all columns (the columns come from the distinct values of the variable I use for my crosstab).
I think an example will make this clearer:
Source:
Var A | Var B | Var C | Number
A1    | B1    | C1    | 120
A1    | B1    | C2    | 130
A1    | B2    | C1    | 10
A2    | B1    | C1    | 17
A2    | B1    | C2    | 16
I build the crosstab as follows:
Columns: Var B
Rows: Var C
"Values": sum(Number)
Section: Var A
So I get:
Section: Var A = A1
| B1 | B2
C1 | 120 | 10
C2 | 130 | 0
and:
Section: Var A = A2
| B1
C1 | 17
C2 | 16
but I want:
Section: Var A = A2
| B1 | B2
C1 | 17 | 0
C2 | 16 | 0
I don't know how to do that properly (I have found a method where you have to isolate each variable and cross them with each other, but it is long, greedy and ugly).
Best regards
I have found the solution on another forum (I had been searching for a long time, but I was not using the right keywords):
"http://www-01.ibm.com/support/docview.wss?uid=swg21341708
Title : Columns or rows missing from crosstab if they contain no data
Problem(Abstract)
If a crosstab row or column contains no data, it does not show up in the crosstab. This document describes a method of forcing all columns and rows to appear, whether they contain data or not.
Cause
Column and Row headings in Crosstab reports are determined by the result set of the query.
Environment
Relational Data Source.
Resolving the problem
Create separate queries for the column/row headings, and the data. Join these two queries with a 1..1 -> 0..n relationship so that even Columns and Rows with no data will be represented in the result set.
See the attached example written for the GO Sales and Retailers sample package. It is a simple crosstab filtered for 2004 data. There is no data for Mountaineering Equipment in 2004. The crosstab uses a joined query as described, and does contain a blank row for Mountaineering Equipment.
Steps: The following steps assume that both rows or columns could be missing. If you are concerned about rows-only or columns-only, you may skip steps 1-2, and create just the row or column data in step 3.
1) Create a "Column Query", containing only the column information and a dummy data item with a value of 1. In the attached example, this is named "Years"
2) Create a "Row Query", containing only the row information and a dummy data item with a value of 1. In the attached example, this is named "Product Lines"
3) Create a "Dimension Query" query that joins the queries from steps 1 and 2 on dummy. This requires that the Outer Join Allowed property of the query be set to Allowed. This creates a crossjoin that includes all possible combinations of rows and columns
4) Create a fourth query that contains the data for the crosstab. This is the same as a normal crosstab report.
5) Join the queries from steps 3 and 4, using cardinality of 1..1 and 0..n respectively. When dragging data items into this new query, ensure that you are dragging in the row and column headings from the "Dimension Query". This ensures that all possible rows and columns will be returned, even if there is no data associated with them."
The execution time is very good.
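The idea behind the quoted steps (build every row/column combination first, then attach the data and fill the gaps) can be sketched in Python using the source data from the question. This only illustrates the logic. It is not Cognos code, and filling the empty cells with 0 is an assumption taken from the desired output in the question.

```python
# Source rows from the question: (Var A, Var B, Var C, Number).
source = [
    ("A1", "B1", "C1", 120),
    ("A1", "B1", "C2", 130),
    ("A1", "B2", "C1", 10),
    ("A2", "B1", "C1", 17),
    ("A2", "B1", "C2", 16),
]

# "Column query", "row query" and sections: distinct headings over the whole data set.
columns = sorted({b for _, b, _, _ in source})
rows = sorted({c for _, _, c, _ in source})
sections = sorted({a for a, _, _, _ in source})

# "Data query": sum(Number) per (section, column, row).
totals = {}
for a, b, c, number in source:
    totals[(a, b, c)] = totals.get((a, b, c), 0) + number

# Join every row/column combination to the data, using 0 where there is none.
crosstab = {
    a: {c: {b: totals.get((a, b, c), 0) for b in columns} for c in rows}
    for a in sections
}

for a in sections:
    print("Section: Var A =", a)
    for c in rows:
        print(c, [crosstab[a][c][b] for b in columns])
```

Because the headings are computed from the whole data set and not per section, section A2 gets a B2 column even though it has no B2 rows.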
