BUILD SAMPLE DATA
create table table1(
id int,
id_entry varchar(10),
tag int,
tag2 int
)
create table table2(
id int,
name varchar(50),
lastname varchar(50),
age int,
tel int
)
insert into table1
select 1, 'A1', 11, 12 union all
select 2, 'C2', 22, 13 union all
select 3, 'S5', 33, 14 union all
select 4, 'C2', 22, 13 union all
select 5, 'B6', 55, 16 union all
select 6, 'A1', 11, 12
insert into table2
select 1, 'ALFRED', 'DAVE', 21, 555 union all
select 2, 'FRED', 'SMITH', 22, 666 union all
select 3, 'MANNY', 'PAC', 23, 777 union all
select 4, 'FRED', 'DAVE', 22, 666 union all
select 5, 'JOHN', 'SMITH', 25, 999 union all
select 6, 'ALFRED', 'DAVE', 21, 555
SOLUTION
;with cte as(
select
t1.id_entry,
t1.tag,
t1.tag2,
t2.name,
t2.lastname,
t2.age,
t2.tel,
cc = count(*) over(partition by t1.id_entry),
rn = row_number() over(partition by t1.id_entry order by t2.lastname desc)
from table1 t1
inner join table2 t2
on t2.id = t1.id
)
select
id_entry,
tag,
tag2,
name,
lastname,
age,
tel
from cte
where
cc > 1
and rn = 1
DROP SAMPLE DATA
drop table table1
drop table table2
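To check what the CTE query above keeps, here is a minimal sketch that replays the sample data through Python's built-in sqlite3 module. It is a port under stated assumptions, not the original T-SQL: SQLite does not accept the `alias = expression` form, so the two window columns are written with `AS`, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table1(id int, id_entry varchar(10), tag int, tag2 int);
create table table2(id int, name varchar(50), lastname varchar(50), age int, tel int);
insert into table1 values
  (1,'A1',11,12),(2,'C2',22,13),(3,'S5',33,14),
  (4,'C2',22,13),(5,'B6',55,16),(6,'A1',11,12);
insert into table2 values
  (1,'ALFRED','DAVE',21,555),(2,'FRED','SMITH',22,666),(3,'MANNY','PAC',23,777),
  (4,'FRED','DAVE',22,666),(5,'JOHN','SMITH',25,999),(6,'ALFRED','DAVE',21,555);
""")

# Same logic as the CTE above: cc counts rows per id_entry,
# rn numbers them by lastname descending within each id_entry.
query = """
with cte as (
  select t1.id_entry, t1.tag, t1.tag2, t2.name, t2.lastname, t2.age, t2.tel,
         count(*) over (partition by t1.id_entry) as cc,
         row_number() over (partition by t1.id_entry
                            order by t2.lastname desc) as rn
  from table1 t1
  inner join table2 t2 on t2.id = t1.id
)
select id_entry, tag, tag2, name, lastname, age, tel
from cte
where cc > 1 and rn = 1
order by id_entry
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the id_entry values that occur more than once (A1 and C2) survive, and each contributes the single row whose lastname sorts last.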
Try this:
SELECT T1.ID_ENTRY, T1.TAG, T1.TAG2, T2.Name, T2.LastName, T2.Age, T2.Tel
FROM Table1 T1
INNER JOIN Table2 T2
ON T1.ID = T2.ID
GROUP BY T1.ID_ENTRY, T1.TAG, T1.TAG2, T2.Name, T2.LastName, T2.Age, T2.Tel
HAVING Count(T1.ID_ENTRY) > 1
I have the following dataframe:
+--------+------+---------+---------+
| Col1 | col2 | values1 | Values2 |
+--------+------+---------+---------+
| item1 | A1 | 5 | 11 |
| item1 | A2 | 5 | 25 |
| item1 | A3 | 5 | 33 |
| item1 | na | | 18 |
| item2 | A1 | 6 | 12 |
| item2 | A2 | 6 | 26 |
| item2 | A3 | 6 | 34 |
| item2 | na | 6 | |
+--------+------+---------+---------+
which can be created with this code (assuming a SparkSession named spark is in scope):
import spark.implicits._ // needed for toDF on a Seq
val df = Seq(
("item1", "A1", 5, 11),
("item1", "A2", 5, 25),
("item1", "A3", 5, 33),
("item1", "na", 0, 18),
("item2", "A1", 6, 12),
("item2", "A2", 6, 26),
("item2", "A3", 6, 34),
("item2", "na", 6, 0)).toDF("Col1", "col2", "values1", "Values2")
I want values1 not to be summed across all the records when doing a rollup or cube on it.
My desired output:
+-------+------+---------+---------+
| Col1 | col2 | values1 | values2 |
+-------+------+---------+---------+
| null | null | 17 | 159 |
| item1 | null | 5 | 87 |
| item1 | A1 | 5 | 11 |
| item1 | A2 | 5 | 25 |
| item1 | A3 | 5 | 33 |
| item1 | na | 0 | 18 |
| item2 | null | 12 | 72 |
| item2 | A1 | 6 | 12 |
| item2 | A2 | 6 | 26 |
| item2 | A3 | 6 | 34 |
| item2 | na | 6 | |
+-------+------+---------+---------+
How can I apply a rollup or cube function to this dataset so that, at the Col1 level, values1 is the sum of a single A1/A2/A3 value plus the na value, rather than the sum of every row?
For example, the second row shows
values1 = 5 = 5+0 and values2 = 87 = 11+25+33+18, and the seventh row shows
values1 = 12 = 6+6 and values2 = 72 = 12+26+34+0.
But the rollup operation I run now adds up every row in each aggregate, which I don't want for the values1 column:
df.rollup("Col1","col2").agg(sum("values1") as "values1",sum("values2") as "values2");
Current Output:
+-------+------+---------+---------+
| Col1 | col2 | values1 | values2 |
+-------+------+---------+---------+
| null | null | 39 | 159 |
| item1 | null | 15 | 87 |
| item1 | A1 | 5 | 11 |
| item1 | A2 | 5 | 25 |
| item1 | A3 | 5 | 33 |
| item1 | na | 0 | 18 |
| item2 | null | 24 | 72 |
| item2 | A1 | 6 | 12 |
| item2 | A2 | 6 | 26 |
| item2 | A3 | 6 | 34 |
| item2 | na | 6 | |
+-------+------+---------+---------+
(The link that was posted as a duplicate is not what is being asked here. The desired output differs from the answers in that link.)
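The arithmetic the question asks for can be restated as a small sketch. This is plain Python, not a Spark solution, and it rests on an assumption read off the sample data: within one Col1 value, the A1/A2/A3 rows all carry the same values1, so it is counted once and then added to the na row's values1. The function name `subtotal_per_item` is hypothetical.

```python
from collections import defaultdict

rows = [
    ("item1", "A1", 5, 11), ("item1", "A2", 5, 25),
    ("item1", "A3", 5, 33), ("item1", "na", 0, 18),
    ("item2", "A1", 6, 12), ("item2", "A2", 6, 26),
    ("item2", "A3", 6, 34), ("item2", "na", 6, 0),
]

def subtotal_per_item(rows):
    """Return {Col1: (values1, values2)} using the question's rule."""
    a_value = {}                    # values1 of the A-rows, taken once
    na_value = defaultdict(int)     # values1 of the "na" rows
    values2 = defaultdict(int)      # values2 is summed normally
    for col1, col2, v1, v2 in rows:
        values2[col1] += v2
        if col2 == "na":
            na_value[col1] += v1
        else:
            # Assumption: A1/A2/A3 repeat one value, so keep the first.
            a_value.setdefault(col1, v1)
    return {k: (a_value.get(k, 0) + na_value[k], values2[k]) for k in values2}

subtotals = subtotal_per_item(rows)
grand_total = (
    sum(v1 for v1, _ in subtotals.values()),
    sum(v2 for _, v2 in subtotals.values()),
)
print(subtotals)
print(grand_total)
```

The per-item subtotals and the grand total match the (item, null) and (null, null) rows of the desired output table.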
For the dataset below, to get the total summary values per Col1, I did
import org.apache.spark.sql.functions._
val totaldf = df.groupBy("Col1").agg(lit("Total").as("Col2"), sum("price").as("price"), sum("displayPrice").as("displayPrice"))
and then merged it with the original using
df.union(totaldf).orderBy(col("Col1"), col("Col2").desc).show(false)
df:
+-----------+-------+--------+--------------+
| Col1 | Col2 | price | displayPrice |
+-----------+-------+--------+--------------+
| Category1 | item1 | 15 | 14 |
| Category1 | item2 | 11 | 10 |
| Category1 | item3 | 18 | 16 |
| Category2 | item1 | 16 | 15 |
| Category2 | item2 | 11 | 10 |
| Category2 | item3 | 19 | 17 |
+-----------+-------+--------+--------------+
After merging.
+-----------+-------+-------+--------------+
| Col1 | Col2 | price | displayPrice |
+-----------+-------+-------+--------------+
| Category1 | Total | 44 | 40 |
| Category1 | item1 | 15 | 14 |
| Category1 | item2 | 11 | 10 |
| Category1 | item3 | 18 | 16 |
| Category2 | Total | 46 | 42 |
| Category2 | item1 | 16 | 15 |
| Category2 | item2 | 11 | 10 |
| Category2 | item3 | 19 | 17 |
+-----------+-------+-------+--------------+
Now I want a summary of the whole dataset as below: one extra row with Total in both Col1 and Col2, on top of all the existing Col1 and Col2 data.
Required:
+-----------+-------+-------+--------------+
| Col1 | Col2 | price | displayPrice |
+-----------+-------+-------+--------------+
| Total | Total | 90 | 82 |
| Category1 | Total | 44 | 40 |
| Category1 | item1 | 15 | 14 |
| Category1 | item2 | 11 | 10 |
| Category1 | item3 | 18 | 16 |
| Category2 | Total | 46 | 42 |
| Category2 | item1 | 16 | 15 |
| Category2 | item2 | 11 | 10 |
| Category2 | item3 | 19 | 17 |
+-----------+-------+-------+--------------+
How can I achieve the above result?
Create a third dataframe from totaldf as
val finalTotalDF= totaldf.select(lit("Total").as("Col1"), lit("Total").as("Col2"), sum("price").as("price"), sum("displayPrice").as("displayPrice"))
and then add it to the union as
df.union(totaldf).union(finalTotalDF).orderBy(col("Col1"), col("Col2").desc).show(false)
You should then have your required final dataframe.
Updated
If ordering matters to you, change the T of Total in the Col2 column to a lowercase t (total), so that it sorts after the item names in ascending order and therefore comes first in descending order:
import org.apache.spark.sql.functions._
val totaldf = df.groupBy("Col1").agg(lit("total").as("Col2"), sum("price").as("price"), sum("displayPrice").as("displayPrice"))
val finalTotalDF= totaldf.select(lit("Total").as("Col1"), lit("total").as("Col2"), sum("price").as("price"), sum("displayPrice").as("displayPrice"))
df.union(totaldf).union(finalTotalDF).orderBy(col("Col1").desc, col("Col2").desc).show(false)
and you should get
+---------+-----+-----+------------+
|Col1 |Col2 |price|displayPrice|
+---------+-----+-----+------------+
|Total |total|90 |82 |
|Category2|total|46 |42 |
|Category2|item3|19 |17 |
|Category2|item2|11 |10 |
|Category2|item1|16 |15 |
|Category1|total|44 |40 |
|Category1|item3|18 |16 |
|Category1|item2|11 |10 |
|Category1|item1|15 |14 |
+---------+-----+-----+------------+
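Why the lowercase t helps can be seen with plain string comparison. The sketch below uses Python's `sorted`, which compares ASCII strings by character code; the assumption is that Spark's default string ordering behaves the same way for these labels.

```python
# 'T' (code 84) sorts before 'i' (code 105), which sorts before 't' (code 116).
labels_upper = ["item1", "item2", "item3", "Total"]
labels_lower = ["item1", "item2", "item3", "total"]

# Descending order, as in orderBy(col("Col2").desc)
desc_upper = sorted(labels_upper, reverse=True)
desc_lower = sorted(labels_lower, reverse=True)

print(desc_upper)  # "Total" ends up last
print(desc_lower)  # "total" ends up first
```

With the capital T the total row lands below the items in descending order; with the lowercase t it lands above them.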
If ordering really matters to you, as mentioned in the comment
"I want the total data as a priority, so I want it to be at the top, which is actually my requirement"
then you can create another column for sorting:
import org.apache.spark.sql.functions._
val totaldf = df.groupBy("Col1").agg(lit("Total").as("Col2"), sum("price").as("price"), sum("displayPrice").as("displayPrice"), lit(1).as("sort"))
val finalTotalDF= totaldf.select(lit("Total").as("Col1"), lit("Total").as("Col2"), sum("price").as("price"), sum("displayPrice").as("displayPrice"), lit(0).as("sort"))
finalTotalDF.union(totaldf).union(df.withColumn("sort", lit(2))).orderBy(col("sort"), col("Col1"), col("Col2")).drop("sort").show(false)
and you should get
+---------+-----+-----+------------+
|Col1 |Col2 |price|displayPrice|
+---------+-----+-----+------------+
|Total |Total|90 |82 |
|Category1|Total|44 |40 |
|Category2|Total|46 |42 |
|Category1|item1|15 |14 |
|Category1|item2|11 |10 |
|Category1|item3|18 |16 |
|Category2|item1|16 |15 |
|Category2|item2|11 |10 |
|Category2|item3|19 |17 |
+---------+-----+-----+------------+
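The sort-column idea is independent of Spark. As a minimal sketch in plain Python (the tuple layout and variable names are illustrative, not part of the answer): tag the grand total with 0, the per-category totals with 1 and the detail rows with 2, sort on that tag first, then drop it.

```python
df = [
    ("Category1", "item1", 15, 14), ("Category1", "item2", 11, 10),
    ("Category1", "item3", 18, 16), ("Category2", "item1", 16, 15),
    ("Category2", "item2", 11, 10), ("Category2", "item3", 19, 17),
]

# Per-category totals, equivalent to groupBy("Col1") with two sums
per_category = {}
for col1, _, price, display in df:
    p, d = per_category.get(col1, (0, 0))
    per_category[col1] = (p + price, d + display)

# Prefix every row with its sort tag: 0 = grand total, 1 = subtotal, 2 = detail
tagged = [(0, "Total", "Total",
           sum(p for p, _ in per_category.values()),
           sum(d for _, d in per_category.values()))]
tagged += [(1, c, "Total", p, d) for c, (p, d) in per_category.items()]
tagged += [(2, *row) for row in df]

# Order by (sort, Col1, Col2), then drop the helper column
result = [row[1:] for row in sorted(tagged, key=lambda r: r[:3])]
for row in result:
    print(row)
```

Because the helper tag is the first sort key, the labels themselves no longer need any particular capitalization.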
Excel is beating me up for a day here.
I have this table:
+---+--------+--------+--------+--------+
| | A | B | C | D |
+---+--------+--------+--------+--------+
| 1 | AGE | EX# | DG1 | DG2 |
+---+--------+--------+--------+--------+
| 2 | 19 | C01 | ASC | |
+---+--------+--------+--------+--------+
| 3 | 45 | C02 | ATR | |
+---+--------+--------+--------+--------+
| 4 | 27 | C03 | LSI | |
+---+--------+--------+--------+--------+
| 5 | 15 | C04 | LSI | |
+---+--------+--------+--------+--------+
| 6 | 49 | C05 | ASC | AGC |
+---+--------+--------+--------+--------+
| 7 | 76 | C06 | AGC | |
+---+--------+--------+--------+--------+
| 8 | 33 | C07 | ASC | |
+---+--------+--------+--------+--------+
| 9 | 17 | C08 | LSI | |
+---+--------+--------+--------+--------+
Now I need to create a new table based on that data, with one row per DG code. I'll fill in column A myself and need a formula to fill column B:
+----+--------+---------------+
| | A | B |
+----+--------+---------------+
| | DG | AGE |
+----+--------+---------------+
| 10 | AGC | 49, 76 |
+----+--------+---------------+
| 11 | ASC | 19, 33, 49 |
+----+--------+---------------+
| 12 | ATR | 45 |
+----+--------+---------------+
| 13 | LSI | 15, 17, 27 |
+----+--------+---------------+
So I need a formula that checks columns C and D of the first table for each DG, looks up the age in column A for every matching row, and concatenates all matching values into one cell with ", " as the separator.
Can anyone help me?
Thanks
On Chip Pearson's great Excel website, I found the custom function StringConcat. Just copy and paste its code into a VBA module.
Something like the following formula (in cell B10, then filled down) should work for you. It's an array formula, so commit it with Ctrl+Shift+Enter:
=StringConcat(", ",IF(Sheet1!$C$2:$D$100=A10,Sheet1!$A$2:$A$100,""))
You'll have to adjust the ranges, of course.
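What the array formula computes can be sketched outside Excel: for a given DG, keep every row where DG1 or DG2 matches, collect the ages and join them. The Python below is only an illustration of that logic; the function name `ages_for` is hypothetical, and sorting the ages is an assumption made to match the desired table (the formula itself does not promise any particular order).

```python
# (AGE, EX#, DG1, DG2) rows from the first table
table = [
    (19, "C01", "ASC", ""), (45, "C02", "ATR", ""),
    (27, "C03", "LSI", ""), (15, "C04", "LSI", ""),
    (49, "C05", "ASC", "AGC"), (76, "C06", "AGC", ""),
    (33, "C07", "ASC", ""), (17, "C08", "LSI", ""),
]

def ages_for(dg, rows):
    """Join the ages of all rows whose DG1 or DG2 equals dg."""
    ages = sorted(age for age, _, dg1, dg2 in rows if dg in (dg1, dg2))
    return ", ".join(str(age) for age in ages)

summary = {dg: ages_for(dg, table) for dg in ("AGC", "ASC", "ATR", "LSI")}
for dg, ages in summary.items():
    print(dg, "|", ages)
```

Row C05 shows why both DG columns have to be checked: its age 49 appears under ASC (from DG1) and under AGC (from DG2).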
I have a set of data as below.
SHEET 1
+------+-------+
| JANUARY |
+------+-------+
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | ALFRED | 11 | 150 |
| 2 | ARIS | 22 | 120 |
| 3 | JOHN | 33 | 170 |
| 4 | CHRIS | 22 | 190 |
| 5 | JOE | 55 | 120 |
| 6 | ACE | 11 | 200 |
+----+----------+------+-------+
SHEET2
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | CHRIS | 13 | 123 |
| 2 | ACE | 26 | 165 |
| 3 | JOE | 39 | 178 |
| 4 | ALFRED | 21 | 198 |
| 5 | JOHN | 58 | 112 |
| 6 | ARIS | 11 | 200 |
+----+----------+------+-------+
The RESULT should look like this in sheet1 :
+------+-------++------+-------+
| JANUARY | FEBRUARY |
+------+-------++------+-------+
+----+----------+------+-------++-------+-------+
| ID | NAME |COUNT | PRICE || COUNT | PRICE |
+----+----------+------+-------++-------+-------+
| 1 | ALFRED | 11 | 150 || 21 | 198 |
| 2 | ARIS | 22 | 120 || 11 | 200 |
| 3 | JOHN | 33 | 170 || 58 | 112 |
| 4 | CHRIS | 22 | 190 || 13 | 123 |
| 5 | JOE | 55 | 120 || 39 | 178 |
| 6 | ACE | 11 | 200 || 26 | 165 |
+----+----------+------+-------++-------+-------+
I need a formula for the columns under "FEBRUARY" that looks up each row's match in sheet 2.
Assuming the first Count value should go in cell E3 of Sheet1, the following formula would be the usual way of doing it:
=INDEX(Sheet2!C:C,MATCH($B3,Sheet2!$B:$B,0))
Then the Price (in F3) would be given by
=INDEX(Sheet2!D:D,MATCH($B3,Sheet2!$B:$B,0))
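The INDEX/MATCH pair is an exact-match lookup keyed on NAME. A minimal Python sketch of the same idea, with the sheet contents typed in as lists (the variable names are illustrative): build a name-to-values map from Sheet2, keeping the first occurrence as MATCH with match type 0 does, then read it once per Sheet1 row.

```python
# (ID, NAME, COUNT, PRICE) rows
sheet1 = [
    (1, "ALFRED", 11, 150), (2, "ARIS", 22, 120), (3, "JOHN", 33, 170),
    (4, "CHRIS", 22, 190), (5, "JOE", 55, 120), (6, "ACE", 11, 200),
]
sheet2 = [
    (1, "CHRIS", 13, 123), (2, "ACE", 26, 165), (3, "JOE", 39, 178),
    (4, "ALFRED", 21, 198), (5, "JOHN", 58, 112), (6, "ARIS", 11, 200),
]

# MATCH(..., 0) returns the first exact match, so keep the first occurrence.
feb_by_name = {}
for _, name, count, price in sheet2:
    feb_by_name.setdefault(name, (count, price))

# A missing name yields (None, None) here, where Excel would show #N/A.
result = [
    (id_, name, count, price, *feb_by_name.get(name, (None, None)))
    for id_, name, count, price in sheet1
]
for row in result:
    print(row)
```

Matching on NAME rather than ID matters here, because the two sheets number the same people differently.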
I think this query will work fine for your requirement
SELECT `Sheet1$`.ID,`Sheet1$`.NAME, `Sheet1$`.COUNT AS 'Jan-COUNT',`Sheet1$`.PRICE AS 'Jan-PRICE', `Sheet2$`.COUNT AS 'Feb-COUNT',`Sheet2$`.PRICE AS 'Feb-PRICE'
FROM `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet1$` `Sheet1$`, `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet2$` `Sheet2$`
WHERE (`Sheet1$`.NAME=`Sheet2$`.NAME)
Provide the actual path to your workbook instead of
C:\Users\Nagendra\Desktop\aaaaa.xlsx
First you need to know how to set up the connection; see http://smallbusiness.chron.com/use-sql-statements-ms-excel-41193.html
Hi, I am currently working on a project that records individual information for each month, and I want to build a table or two to hold it (I don't want to create a table for each month). A simple illustration would be:
Jan
weight height
student a
student b
Feb
weight height
student a
student b
student c
What I want is to export the data to Excel in the form above. The weight and height columns are fixed, but I want the data grouped by month so that its organization is clearer.
How should I design the database to meet this requirement? Thanks.
Here are the tables you'll need to store the information:
students
id unsigned int(P)
name varchar(50)
+----+------+
| id | name |
+----+------+
| 1 | John |
| 2 | Mary |
| 3 | Tina |
| .. | .... |
+----+------+
In the measurements table the Primary Key is formed by the student_id, year and month. The student_id is also a foreign key to the students table.
measurements
student_id unsigned int(F students.id)-\
year unsigned int ----------------(P)
month unsigned int ---------------/
height unsigned int
weight unsigned int
+------------+------+-------+--------+--------+
| student_id | year | month | height | weight |
+------------+------+-------+--------+--------+
| 1 | 2013 | 11 | 70 | 200 |
| 2 | 2013 | 11 | 65 | 130 |
| 1 | 2013 | 12 | 70 | 192 |
| 2 | 2013 | 12 | 65 | 126 |
| 3 | 2013 | 12 | 68 | 140 |
| .......... | .... | ..... | ...... | ...... |
+------------+------+-------+--------+--------+
And then a query to extract the information:
SELECT name, height, weight, year, month
FROM students s
LEFT JOIN measurements m ON s.id = m.student_id
ORDER BY year, month, name
Which will give you:
+------+--------+--------+------+-------+
| name | height | weight | year | month |
+------+--------+--------+------+-------+
| John | 70 | 200 | 2013 | 11 |
| Mary | 65 | 130 | 2013 | 11 |
| John | 70 | 192 | 2013 | 12 |
| Mary | 65 | 126 | 2013 | 12 |
| Tina | 68 | 140 | 2013 | 12 |
+------+--------+--------+------+-------+
Which is the data you want, sorted in the way you want. Any further formatting of the data is up to your application.
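As a rough end-to-end check of this design, the sketch below builds both tables in an in-memory SQLite database through Python's sqlite3 module, runs the same query, and groups the rows by (year, month) the way an export step might. SQLite and the INTEGER column types are stand-ins chosen for the sketch; the answer does not name a database engine.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR(50));
CREATE TABLE measurements (
  student_id INTEGER REFERENCES students(id),
  year INTEGER, month INTEGER, height INTEGER, weight INTEGER,
  PRIMARY KEY (student_id, year, month)  -- one measurement per student per month
);
INSERT INTO students VALUES (1,'John'),(2,'Mary'),(3,'Tina');
INSERT INTO measurements VALUES
  (1,2013,11,70,200),(2,2013,11,65,130),
  (1,2013,12,70,192),(2,2013,12,65,126),(3,2013,12,68,140);
""")

rows = conn.execute("""
SELECT name, height, weight, year, month
FROM students s
LEFT JOIN measurements m ON s.id = m.student_id
ORDER BY year, month, name
""").fetchall()

# Rows are already ordered by (year, month), so groupby yields one block per month.
by_month = {
    key: [r[:3] for r in group]
    for key, group in groupby(rows, key=lambda r: (r[3], r[4]))
}
for (year, month), block in by_month.items():
    print(year, month, block)
```

Each value in `by_month` is one month's block of (name, height, weight) rows, which is the shape the question wants to write to Excel.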