Excel merge cells equivalent in SQL - excel

I am trying to generate SQL output that mimics the way Excel merges cells.
Sample output from SQL select statement:
Name   Country  State  years  lived
Alice  USA      CA     2      2
Alice  USA      NYC    1      1
Alice  USA      MI     5      5
Bob    USA      CA     1      1
Bob    USA      NYC    8      8
Bob    USA      IL     4      4
I want to produce the output in the format below using a SQL query, so that I can export it directly to Excel instead of modifying it in Excel afterwards.
Name   Country  State  years  lived
                CA     2
                NYC    1
Alice  USA      MI     5      8
                CA     1
                NYC    8
Bob    USA      IL     4      13
Any help is appreciated.
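One way to approximate this, sketched in Python with the standard-library sqlite3 module so that it can be run as is. SQL has no notion of merged cells, so the query below only blanks the repeated Name and Country values and shows the per-person total on the last row of each group. The table name residence and the id column (used to preserve the original row order) are assumptions, not part of the question. The window functions need SQLite 3.25 or later, and other databases may need small syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "id" is an assumed column that records the row order.
conn.execute(
    "CREATE TABLE residence ("
    "id INTEGER PRIMARY KEY, name TEXT, country TEXT, state TEXT, years INTEGER)"
)
rows = [
    ("Alice", "USA", "CA", 2),
    ("Alice", "USA", "NYC", 1),
    ("Alice", "USA", "MI", 5),
    ("Bob", "USA", "CA", 1),
    ("Bob", "USA", "NYC", 8),
    ("Bob", "USA", "IL", 4),
]
conn.executemany(
    "INSERT INTO residence (name, country, state, years) VALUES (?, ?, ?, ?)", rows
)

query = """
SELECT
    CASE WHEN rn = cnt THEN name ELSE '' END    AS name,
    CASE WHEN rn = cnt THEN country ELSE '' END AS country,
    state,
    years,
    CASE WHEN rn = cnt THEN total END           AS lived
FROM (
    SELECT id, name, country, state, years,
           ROW_NUMBER() OVER (PARTITION BY name, country ORDER BY id) AS rn,
           COUNT(*)     OVER (PARTITION BY name, country)             AS cnt,
           SUM(years)   OVER (PARTITION BY name, country)             AS total
    FROM residence
)
ORDER BY id
"""
# Name, country and the total appear only on the last row of each group.
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

The actual merging of cells would still have to happen in Excel; the query only leaves the cells that would be merged empty.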

Related

Create new dataframe column where cell value indicates animal type

Suppose that I have the following dataframe called pet_stores that consists of the number of cats and dogs per location of a pet store franchise
   Dog  Cat           City
0    5   11            NYC
1    4    1  San Francisco
How can I transform this dataframe such that instead of having separate Dog and Cat columns I have a single column called animal_type? I want the following result:
  animal_type  Count           City
0         Dog      5            NYC
1         Dog      4  San Francisco
2         Cat     11            NYC
3         Cat      1  San Francisco
Thanks!
Use melt:
>>> df.melt('City', var_name='animal_type', value_name='Count')
            City animal_type  Count
0            NYC         Dog      5
1  San Francisco         Dog      4
2            NYC         Cat     11
3  San Francisco         Cat      1
Without melt and index manipulation:
>>> df.set_index('City').rename_axis(columns='animal_type') \
...     .stack().rename('Count').reset_index()
            City animal_type  Count
0            NYC         Dog      5
1            NYC         Cat     11
2  San Francisco         Dog      4
3  San Francisco         Cat      1
Use sort_values to change the order of rows.
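For anyone who wants to try the melt approach end to end, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question and reorders the columns to match the requested layout; the variable name long_form is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
pet_stores = pd.DataFrame(
    {"Dog": [5, 4], "Cat": [11, 1], "City": ["NYC", "San Francisco"]}
)

# Keep City as the identifier; the Dog/Cat column names become animal_type values.
long_form = pet_stores.melt(id_vars="City", var_name="animal_type", value_name="Count")

# Reorder the columns to match the layout requested in the question.
long_form = long_form[["animal_type", "Count", "City"]]
print(long_form)
```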

Excel cell lookup in subtotaled range

I'd like to use INDEX/MATCH to look up values in a subtotaled range. Using the sample data below, I need to look up, from another sheet (Sheet 2), the total NY Company hours for each employee.
Sheet 2:
| Bob | NY Company | ???? |
This formula returns the first match of NY Company Total:
=INDEX(Sheet1!A1:C45,MATCH(Sheet2!B2 & " Total",Sheet1!B1:B45,0),3)
Now I need to expand the lookup to include the Employee (Bob). Also, Column A is blank on the total row. I've started working with something like the following, but no luck:
=INDEX(Sheet1!A1:C45,MATCH(1,(Sheet2!B2 & " Total"=Sheet1!B1:B45)*(Sheet2!B1=Sheet1!A1:A45)),3)
Also, as the sample data below looks perfect in the preview and then looks really bad after saving, I've added a pic with the sample data.
Sample data:
A         B                   C
Employee  Customer            Hours
Bob       ABC Company         5
Bob       ABC Company         3
          ABC Company Total   8
Bob       NY Company          7
Bob       NY Company          7
Bob       NY Company          5
Bob       NY Company          3
          NY Company Total    22
Bob       Jet Company         1
          Jet Company Total   1
Carrie    ABC Company         1
Carrie    ABC Company         4
          ABC Company Total   5
Carrie    NY Company          6
Carrie    NY Company          2
Carrie    NY Company          3
          NY Company Total    11
Carrie    Jet Company         7
Carrie    Jet Company         9
          Jet Company Total   16
Carrie    XYZ Company         4
          XYZ Company Total   4
Gale      Cats Service        2
Gale      Cats Service        6
Gale      Cats Service        1
          Cats Service Total  9
Gale      NY Company          6
Gale      NY Company          8
          NY Company Total    14
Gale      XYZ Company         1
          XYZ Company Total   1
John      NY Company          3
John      NY Company          5
          NY Company Total    8
John      XYZ Company         8
John      XYZ Company         5
          XYZ Company Total   13
Ken       ABC Company         10
          ABC Company Total   10
Ken       NY Company          2
Ken       NY Company          3
Ken       NY Company          5
          NY Company Total    10
          Grand Total         132
Any suggestions??
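If doing the lookup outside Excel is an option, the idea can be sketched with pandas. Each total row sits directly below the rows it sums, so the nearest non-blank Employee above a total row identifies whose subtotal it is. This is only a sketch on a subset of the sample data; the names sheet1, owner and subtotal_hours are hypothetical.

```python
import pandas as pd

# A subset of the sample data; None marks the blank Employee cell on total rows.
sheet1 = pd.DataFrame(
    [
        ("Bob", "ABC Company", 5),
        ("Bob", "ABC Company", 3),
        (None, "ABC Company Total", 8),
        ("Bob", "NY Company", 7),
        ("Bob", "NY Company", 7),
        ("Bob", "NY Company", 5),
        ("Bob", "NY Company", 3),
        (None, "NY Company Total", 22),
        ("Carrie", "NY Company", 6),
        ("Carrie", "NY Company", 2),
        ("Carrie", "NY Company", 3),
        (None, "NY Company Total", 11),
    ],
    columns=["Employee", "Customer", "Hours"],
)

# Forward-fill so that every total row carries the employee listed above it.
owner = sheet1["Employee"].ffill()


def subtotal_hours(employee, customer):
    """Return the subtotal for one employee/customer pair (hypothetical helper)."""
    match = sheet1[(owner == employee) & (sheet1["Customer"] == customer + " Total")]
    return int(match["Hours"].iloc[0])


print(subtotal_hours("Bob", "NY Company"))
print(subtotal_hours("Carrie", "NY Company"))
```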

Question about excel columns csv file how to combine columns

I have a quick question. I have a table like this, with each player's name and the percentage of matches won:
Rank  Country  Name                Matches Won %
1     ESP      Rafael Nadal        89.06%
2     SRB      Novak Djokovic      83.82%
3     SUI      Roger Federer       83.61%
4     RUS      Daniil Medvedev     73.75%
5     AUT      Dominic Thiem       72.73%
6     GRE      Stefanos Tsitsipas  67.95%
7     JPN      Kei Nishikori       67.44%
and I have another table like this, with the ace percentage:
Rank  Country  Name                Ace %
1     USA      John Isner          26.97%
2     CRO      Ivo Karlovic        25.47%
3     USA      Reilly Opelka       24.81%
4     CAN      Milos Raonic        24.63%
5     USA      Sam Querrey         20.75%
6     AUS      Nick Kyrgios        20.73%
7     RSA      Kevin Anderson      17.82%
8     KAZ      Alexander Bublik    17.06%
9     FRA      Jo Wilfried Tsonga  14.29%
---------------------------------------
85    ESP      RAFAEL NADAL        6.85%
My question is: can I align the two tables, keeping the order based on matches won? For example, I want to have:
Rank  Country  Name          Matches %  Aces %
1     ESP      RAFAEL NADAL  89.06%     6.85%
and likewise for all the players.
I agree with the comment above that it would be easiest to import both data sets and then use XLOOKUP() to add the Aces % column to the first one. A player's rank differs between the two tables (Nadal is 1 by matches won but 85 by aces), so the lookup should match on the name rather than the rank. If you import the first data set to Sheet1 and the second to Sheet2, and both have the name in Column C, your XLOOKUP() in Sheet1 Column E would look something like:
=XLOOKUP(C2, Sheet2!C:C, Sheet2!D:D)
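If the two CSV files can be loaded in Python instead, the same alignment can be sketched with pandas. This is an illustration on a few rows from the question, not a definitive solution: it joins on the player name, because the rank differs between the two tables, and the key column and variable names are assumptions.

```python
import pandas as pd

# A few rows from each table in the question.
matches = pd.DataFrame(
    {
        "Rank": [1, 2, 3],
        "Country": ["ESP", "SRB", "SUI"],
        "Name": ["Rafael Nadal", "Novak Djokovic", "Roger Federer"],
        "Matches Won %": ["89.06%", "83.82%", "83.61%"],
    }
)
aces = pd.DataFrame(
    {
        "Rank": [1, 2, 85],
        "Country": ["USA", "CRO", "ESP"],
        "Name": ["John Isner", "Ivo Karlovic", "RAFAEL NADAL"],
        "Ace %": ["26.97%", "25.47%", "6.85%"],
    }
)

# Names differ in letter case between the tables, so join on a normalised key.
matches["key"] = matches["Name"].str.casefold()
aces["key"] = aces["Name"].str.casefold()

# A left join keeps every player from the matches table, in matches-won order.
combined = matches.merge(aces[["key", "Ace %"]], on="key", how="left").drop(columns="key")
print(combined)
```

Players missing from the aces table end up with an empty Ace % value rather than being dropped.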

Excel - Count based on criteria in 3 other columns

I'm looking for help in getting a count based on criteria in 3 other columns. I started to do a pivot table, but I cannot see how to add an IF statement to the distinct count there.
I need a count of each customer within the customer type, by each supplier, if the Cases > 0 for that year.
Here's a sample data set:
Supplier  Customer    Type     2019 Cases  2020 Cases
ABC       Al's Store  Package  3           2
ABC       Ben's       Package  0           6
ABC       Kroger      Grocery  2           1
ABC       Publix      Grocery  1           0
XYZ       Al's Store  Package  0           5
XYZ       Ben's       Package  4           0
XYZ       Kroger      Grocery  0           1
XYZ       Publix      Grocery  3           7
I need a result like this. My actual report will have each supplier on their own tab.
Supplier  Type     2019 Customer Count  2020 Customer Count  My Reason
ABC       Package  1                    2                    Al's bought in both years, but Ben's only in 2020
ABC       Grocery  2                    1                    Kroger bought in both years, but Publix only in 2019
XYZ       Package  1                    1                    Al only bought in 2020, Ben only bought in 2019
XYZ       Grocery  1                    2                    Kroger only bought in 2020
Thanks!
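Outside Excel, the count described above can be sketched with pandas. The sketch assumes each customer appears at most once per supplier, as in the sample, so counting rows with Cases > 0 is the same as a distinct customer count. The variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
cases = pd.DataFrame(
    [
        ("ABC", "Al's Store", "Package", 3, 2),
        ("ABC", "Ben's", "Package", 0, 6),
        ("ABC", "Kroger", "Grocery", 2, 1),
        ("ABC", "Publix", "Grocery", 1, 0),
        ("XYZ", "Al's Store", "Package", 0, 5),
        ("XYZ", "Ben's", "Package", 4, 0),
        ("XYZ", "Kroger", "Grocery", 0, 1),
        ("XYZ", "Publix", "Grocery", 3, 7),
    ],
    columns=["Supplier", "Customer", "Type", "2019 Cases", "2020 Cases"],
)

year_cols = ["2019 Cases", "2020 Cases"]
# True where the customer bought at least one case in that year.
bought = cases[year_cols].gt(0)
# Summing the booleans per Supplier/Type counts the customers with Cases > 0.
counts = (
    bought.groupby([cases["Supplier"], cases["Type"]], sort=False)
    .sum()
    .rename(columns=lambda c: c.replace("Cases", "Customer Count"))
    .reset_index()
)
print(counts)
```

Filtering counts by Supplier would then give the rows for each supplier's own tab.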

Select two or more consecutive rows based on a criteria using python

I have a data set like this:
user time city cookie index
A 2019-01-01 11.00 NYC 123456 1
A 2019-01-01 11.12 CA 234567 2
A 2019-01-01 11.18 TX 234567 3
B 2019-01-02 12.19 WA 456789 4
B 2019-01-02 12.21 FL 456789 5
B 2019-01-02 12.31 VT 987654 6
B 2019-01-02 12.50 DC 157890 7
A 2019-01-03 09:12 CA 123456 8
A 2019-01-03 09:27 NYC 345678 9
A 2019-01-03 09:34 TX 123456 10
A 2019-01-04 09:40 CA 234567 11
In this data set I want to compare and select two or more consecutive rows that fit the following criteria:
User should be same
Time difference should be less than 15 mins
Cookie should be different
So if I apply the filter I should get the following data:
user time city cookie index
A 2019-01-01 11.00 NYC 123456 1
A 2019-01-01 11.12 CA 234567 2
B 2019-01-02 12.21 FL 456789 5
B 2019-01-02 12.31 VT 987654 6
A 2019-01-03 09:12 CA 123456 8
A 2019-01-03 09:27 NYC 345678 9
A 2019-01-03 09:34 TX 123456 10
So, in the above, the first two rows (index 1 and 2) satisfy all the conditions. The next two (index 2 and 3) have the same cookie, index 3 and 4 have different users, 5 and 6 are selected and displayed, and 6 and 7 have a time difference of more than 15 mins. 8, 9 and 10 fit the criteria, but 11 doesn't, as its date is 24 hours later.
How can I solve this using a Python dataframe? All help is appreciated.
What I have tried:
I tried creating flags using shift():
cookiediff=pd.DataFrame(df.Cookie==df.Cookie.shift())
cookiediff.columns=['Cookiediffs']
timediff=pd.DataFrame(pd.to_datetime(df.time) - pd.to_datetime(df.time.shift()))
timediff.columns=['timediff']
mask = df.user != df.user.shift(1)
timediff.timediff[mask] = np.nan
cookiediff['Cookiediffs'][mask] = np.nan
This will do the trick:
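import pandas as pd
# the sample data mixes "." and ":" as the time delimiter, so normalise it first
df["time"] = df["time"].str.replace(":", ".", regex=False)
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H.%M")
# <= rather than <: the expected output keeps the rows at 09:12 and 09:27,
# which are exactly 15 minutes apart
limit = pd.Timedelta(minutes=15)
# compare each row with the previous one
prev_ok = (
    df["time"].sub(df["time"].shift()).le(limit)
    & df["user"].eq(df["user"].shift())
    & df["cookie"].ne(df["cookie"].shift())
)
# compare each row with the next one (next minus current, so the gap is positive)
next_ok = (
    df["time"].shift(-1).sub(df["time"]).le(limit)
    & df["user"].eq(df["user"].shift(-1))
    & df["cookie"].ne(df["cookie"].shift(-1))
)
cond_ = prev_ok | next_ok
res = df.loc[cond_]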
A few points: you need to ensure your time column is a datetime so that the 15-minute condition can be checked.
Then the final filter (cond_) is obtained by comparing each row to the previous one and checking all 3 conditions, OR by doing the same against the next one (otherwise you would get all the consecutive matching rows except the first one of each run).
Outputs:
user time city cookie index
0 A 2019-01-01 11:00:00 NYC 123456 1
1 A 2019-01-01 11:12:00 CA 234567 2
4 B 2019-01-02 12:21:00 FL 456789 5
5 B 2019-01-02 12:31:00 VT 987654 6
7 A 2019-01-03 09:12:00 CA 123456 8
8 A 2019-01-03 09:27:00 NYC 345678 9
9 A 2019-01-03 09:34:00 TX 123456 10
You could use regular expressions to isolate the fields and use named groups and the groupdict() function to store the value of each field into a dictionary and compare the values from the last dictionary to the current one. So iterate through each line of the dataset with two dictionaries, the current dictionary and the last dictionary, and perform a re.search() on each line with the regex pattern string to separate each line into named fields, then compare the value of the two dictionaries.
So, something like:
import re
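# raw string so the backslashes reach the regex engine; [.:] accepts both time delimiters used in the data
c_dict=re.search(r'(?P<user>\w) +(?P<time>\d{4}-\d{2}-\d{2} \d{2}[.:]\d{2}) +(?P<city>\w+) +(?P<cookie>\d{6}) +(?P<index>\d+)',s).groupdict()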
for each line of your dataset. For the first line of your dataset, this would create the dictionary {'user': 'A', 'time': '2019-01-01 11.00', 'city': 'NYC', 'cookie': '123456', 'index': '1'}. With the fields isolated, you could easily compare the values of the fields to previous lines if you stored those in another dictionary.
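The loop described above could be sketched as follows. This is one possible reading of the approach rather than a definitive implementation: the helper names parse and select_rows are hypothetical, the pattern accepts both "." and ":" in the time, and the gap is tested with <= 15 minutes because the expected output keeps the rows at 09:12 and 09:27.

```python
import re
from datetime import datetime, timedelta

# Accepts both "11.00" and "09:12", since the sample data mixes the two.
PATTERN = re.compile(
    r"(?P<user>\w+) +(?P<time>\d{4}-\d{2}-\d{2} \d{2}[.:]\d{2}) +"
    r"(?P<city>\w+) +(?P<cookie>\d+) +(?P<index>\d+)"
)


def parse(line):
    """Split one data line into a dict of fields, with time as a datetime."""
    fields = PATTERN.search(line).groupdict()
    fields["time"] = datetime.strptime(
        fields["time"].replace(":", "."), "%Y-%m-%d %H.%M"
    )
    return fields


def select_rows(lines, limit=timedelta(minutes=15)):
    """Return the index values of rows that pair up with a neighbouring row."""
    keep = []
    last = None
    for line in lines:
        cur = parse(line)
        if (
            last is not None
            and cur["user"] == last["user"]
            and cur["cookie"] != last["cookie"]
            and cur["time"] - last["time"] <= limit
        ):
            # Both rows of a matching pair are kept, without duplicates.
            for idx in (last["index"], cur["index"]):
                if idx not in keep:
                    keep.append(idx)
        last = cur
    return keep


data = """\
A 2019-01-01 11.00 NYC 123456 1
A 2019-01-01 11.12 CA 234567 2
A 2019-01-01 11.18 TX 234567 3
B 2019-01-02 12.19 WA 456789 4
B 2019-01-02 12.21 FL 456789 5
B 2019-01-02 12.31 VT 987654 6
B 2019-01-02 12.50 DC 157890 7
A 2019-01-03 09:12 CA 123456 8
A 2019-01-03 09:27 NYC 345678 9
A 2019-01-03 09:34 TX 123456 10
A 2019-01-04 09:40 CA 234567 11
"""
print(select_rows(data.splitlines()))
```

Only the previous row's dictionary needs to be remembered, since each row is compared with its immediate predecessor.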

Resources