Pivot data on two columns - pivot

I have data as below.
Region Transactions Production Value
EAST Sales LUX 1000
EAST Sales Cinthol 1500
EAST Purchases LUX 1000
EAST Purchases Cinthol 1500
NORTH Sales LUX 3000
NORTH Sales Cinthol 3500
NORTH Purchases LUX 3000
NORTH Purchases Cinthol 3500
SOUTH Sales LUX 4000
SOUTH Sales Cinthol 4500
SOUTH Purchases LUX 4000
SOUTH Purchases Cinthol 4500
WEST Sales LUX 2000
WEST Sales Cinthol 2500
WEST Purchases LUX 2000
WEST Purchases Cinthol 2500
I have data in a table in the above format, nearly 100,000 rows.
So, through a query, I want the data in the format below.
         LUX                CINTHOL
Region   SALES  PURCHASES   SALES  PURCHASES
EAST     1000   1000        1500   1500
WEST     2000   2000        2500   2500
NORTH    3000   3000        3500   3500
SOUTH    4000   4000        4500   4500

If the RDBMS is Oracle 11g, you can use the PIVOT clause: 11g-pivot
WITH p1 AS
(
    SELECT region, SALES_SUM_VALUE, PURCHASES_SUM_VALUE
    FROM
    (
        SELECT *
        FROM
        (
            SELECT *
            FROM special_data
            WHERE production = 'LUX'
        )
        PIVOT( SUM(value) AS SUM_VALUE FOR(transactions) IN('Sales' AS Sales, 'Purchases' AS Purchases))
    )
)
, p2 AS
(
    SELECT region, SALES_SUM_VALUE, PURCHASES_SUM_VALUE
    FROM
    (
        SELECT *
        FROM
        (
            SELECT *
            FROM special_data
            WHERE production = 'Cinthol'
        )
        PIVOT( SUM(value) AS SUM_VALUE FOR(transactions) IN('Sales' AS Sales, 'Purchases' AS Purchases))
    )
)
SELECT p1.region, p1.SALES_SUM_VALUE LUX_SALES, p1.PURCHASES_SUM_VALUE LUX_PURCHASES, p2.SALES_SUM_VALUE Cinthol_SALES, p2.PURCHASES_SUM_VALUE Cinthol_PURCHASES
FROM p1, p2
WHERE p1.region = p2.region
I've prepared a fiddle for you: SQL FIDDLE
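To sanity-check the expected shape outside the database, the same two-level pivot can be sketched in plain Python. This is only an illustration and not part of the Oracle answer: the rows list is an excerpt of the question's data, and pivot_two_levels is a hypothetical helper name.

```python
from collections import defaultdict

# Excerpt of the question's data: (region, transactions, production, value).
rows = [
    ("EAST", "Sales", "LUX", 1000),
    ("EAST", "Sales", "Cinthol", 1500),
    ("EAST", "Purchases", "LUX", 1000),
    ("EAST", "Purchases", "Cinthol", 1500),
    ("WEST", "Sales", "LUX", 2000),
    ("WEST", "Sales", "Cinthol", 2500),
    ("WEST", "Purchases", "LUX", 2000),
    ("WEST", "Purchases", "Cinthol", 2500),
]


def pivot_two_levels(rows):
    """Sum value per region, keyed by the pair (production, transactions)."""
    result = defaultdict(lambda: defaultdict(int))
    for region, transactions, production, value in rows:
        result[region][(production, transactions)] += value
    return result


pivoted = pivot_two_levels(rows)

# Column order of the desired report: production first, then transaction type.
columns = [("LUX", "Sales"), ("LUX", "Purchases"),
           ("Cinthol", "Sales"), ("Cinthol", "Purchases")]
for region in sorted(pivoted):
    print(region, *(pivoted[region][col] for col in columns))
```

Each output row carries one region followed by the four summed values, which is the layout the query is expected to return.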

Related

How can I convert a repeated column element in to a title row?

I have some rather ugly post-pivot data, much like the following:
Location  Team  Staff  Sales
North     1     1100   55
North     2     2100   56
North     3     3200   91
South     1     7100   75
South     2     3100   16
South     3     9200   41
East      1     8100   25
East      2     9100   56
East      3     4200   31
My users don't like the duplication in the first column and would rather it be a header row with only one element, with the three resulting tables side-by-side. So, something like this:
with the obvious extension for East.
How can I achieve this automatically? I would do it by hand, but the real version of my table has a few hundred categories of values in the Location column.

How to create a Rank Column with recurring rank number? (Excel)

Please check the picture below first, so you can clearly and directly understand my question.
I want to generate a field that ranks every state within its assigned region.
These are my inputs:
| Region | State |
| West | California |
| West | Arizona |
| West | Washington |
| East | New York |
| East | Florida |
| East | North Carolina |
| South | Texas |
| South | Louisiana |
| South | Alabama |
I would like to generate the "Rank State" field
| Region | State | Rank State |
| West | California | 1 |
| West | Arizona | 2 |
| West | Washington | 3 |
| East | New York | 1 |
| East | Florida | 2 |
| East | North Carolina | 3 |
| South | Texas | 1 |
| South | Louisiana | 2 |
| South | Alabama | 3 |
The question is: what calculation or method can produce the "Rank State" column/field?
I'd be happy to accept Excel solutions if possible :)
The way I see it, you want to count how many states above or including the selected one are in the same region?
Assuming 'Region' is column A (in Excel),
paste this into row 2 of the Rank column:
=COUNTIF($A$2:$A2, $A2)
Then autofill it down the column (double-click or drag the little green square at the bottom right of the selected cell)
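For readers who want to see what the expanding-range COUNTIF computes, here is a minimal Python sketch of the same running count per region. It is only an illustration of the logic, not an Excel feature; running_rank is a hypothetical helper name and the regions list is taken from the question's input.

```python
from collections import Counter


def running_rank(regions):
    """For each row, count how often its region has appeared so far (inclusive)."""
    seen = Counter()
    ranks = []
    for region in regions:
        seen[region] += 1          # same idea as COUNTIF($A$2:$A2, $A2)
        ranks.append(seen[region])
    return ranks


regions = ["West", "West", "West", "East", "East", "East",
           "South", "South", "South"]
ranks = running_rank(regions)
print(ranks)  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```

Like the formula, this numbers rows in the order they appear, so the rank depends on how the sheet is sorted.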

Python: how to remove footnotes when loading data, and how to select the first when there is a pair of numbers

I am new to python and looking for help.
import requests
import bs4 as bs

resp = requests.get("https://en.wikipedia.org/wiki/World_War_II_casualties")
soup = bs.BeautifulSoup(resp.text, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})
deaths = []
# skip the header row, then read the sixth cell of every data row
for row in table.findAll('tr')[1:]:
    death = row.findAll('td')[5].text.strip()
    deaths.append(death)
It comes out as
['30,000',
'40,400',
'',
'88,000',
'2,000',
'21,500',
'252,600',
'43,600',
'15,000,000[35]to 20,000,000[35]',
'100',
'340,000 to 355,000',
'6,000',
'3,000,000to 4,000,000',
'1,100',
'83,000',
'100,000[49]',
'85,000 to 95,000',
'600,000',
'1,000,000to 2,200,000',
'6,900,000 to 7,400,000',
...
'557,000',
'5,900,000[115] to 6,000,000[116]',
'40,000to 70,000',
'500,000[39]',
'36,000–50,000',
'11,900',
'10,000',
'20,000,000[141] to 27,000,000[142][143][144][145][146]',
'',
'2,100',
'100',
'7,600',
'200',
'450,900',
'419,400',
'1,027,000[160] to 1,700,000[159]',
'',
'70,000,000to 85,000,000']
I want to plot a graph, but the [] footnotes would completely ruin it, and many of the values have footnotes. Is it also possible to select the first number when a cell contains a pair? I'd appreciate it if anyone could teach me. Thank you.
You can use .find_next() with the text=True parameter on each cell, then split/strip accordingly.
For example:
import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/World_War_II_casualties'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for tr in soup.table.select('tr:has(td)')[1:]:
    tds = tr.select('td')
    # skip rows whose first cell has no bold country name
    if not tds[0].b:
        continue
    name = tds[0].b.get_text(strip=True, separator=' ')
    # first text node following the cell tag, i.e. the text before any footnote marker
    casualties = tds[5].find_next(text=True).strip()
    # keep only the first number of a range such as '36,000–50,000' or '340,000 to 355,000'
    print('{:<30} {}'.format(name, casualties.split('–')[0].split()[0] if casualties else ''))
Prints:
Albania 30,000
Australia 40,400
Austria
Belgium 88,000
Brazil 2,000
Bulgaria 21,500
Burma 252,600
Canada 43,600
China 15,000,000
Cuba 100
Czechoslovakia 340,000
Denmark 6,000
Dutch East Indies 3,000,000
Egypt 1,100
Estonia 83,000
Ethiopia 100,000
Finland 85,000
France 600,000
French Indochina 1,000,000
Germany 6,900,000
Greece 507,000
Guam 1,000
Hungary 464,000
Iceland 200
India 2,200,000
Iran 200
Iraq 700
Ireland 100
Italy 492,400
Japan 2,500,000
Korea 483,000
Latvia 250,000
Lithuania 370,000
Luxembourg 5,000
Malaya & Singapore 100,000
Malta 1,500
Mexico 100
Mongolia 300
Nauru 500
Nepal
Netherlands 210,000
Newfoundland 1,200
New Zealand 11,700
Norway 10,200
Papua and New Guinea 15,000
Philippines 557,000
Poland 5,900,000
Portuguese Timor 40,000
Romania 500,000
Ruanda-Urundi 36,000
South Africa 11,900
South Pacific Mandate 10,000
Soviet Union 20,000,000
Spain
Sweden 2,100
Switzerland 100
Thailand 7,600
Turkey 200
United Kingdom 450,900
United States 419,400
Yugoslavia 1,027,000
Approx. totals 70,000,000
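If the raw strings have already been collected as in the question, a regular expression is another option. The following is a minimal sketch, assuming footnotes always look like [35] and that the first run of digits and commas is the number wanted; first_number is a hypothetical helper name, and the sample strings are copied from the question's output.

```python
import re

FOOTNOTE = re.compile(r"\[\d+\]")       # footnote markers such as [35]
FIRST_NUMBER = re.compile(r"\d[\d,]*")  # first run of digits and thousands separators


def first_number(cell):
    """Return the first number in a cell as an int, or None for an empty cell."""
    cleaned = FOOTNOTE.sub("", cell)
    match = FIRST_NUMBER.search(cleaned)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


samples = [
    "30,000",
    "",
    "15,000,000[35]to 20,000,000[35]",
    "340,000 to 355,000",
    "100,000[49]",
    "36,000–50,000",
]
values = [first_number(s) for s in samples]
print(values)
```

Returning None for empty cells leaves it to the plotting step to decide whether to skip those rows or treat them as zero.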

Subtotal for each level in Pivot table

I'm trying to create a pivot table that has, besides the general total, a subtotal between each row level.
I created my df.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([['SOUTH AMERICA', 'BRAZIL', 'SP', 500],
              ['SOUTH AMERICA', 'BRAZIL', 'RJ', 200],
              ['SOUTH AMERICA', 'BRAZIL', 'MG', 150],
              ['SOUTH AMERICA', 'ARGENTINA', 'BA', 180],
              ['SOUTH AMERICA', 'ARGENTINA', 'CO', 300],
              ['EUROPE', 'SPAIN', 'MA', 400],
              ['EUROPE', 'SPAIN', 'BA', 110],
              ['EUROPE', 'FRANCE', 'PA', 320],
              ['EUROPE', 'FRANCE', 'CA', 100],
              ['EUROPE', 'FRANCE', 'LY', 80]], dtype=object),
    columns=["CONTINENT", "COUNTRY", "LOCATION", "POPULATION"]
)
After that I created my pivot table as shown below:
table = pd.pivot_table(df, values=['POPULATION'], index=['CONTINENT', 'COUNTRY', 'LOCATION'], fill_value=0, aggfunc=np.sum, dropna=True)
table
To compute the subtotals, I started by summing at the CONTINENT level:
tab_tots = table.groupby(level='CONTINENT').sum()
tab_tots.index = [tab_tots.index, ['Total'] * len(tab_tots)]
Then I concatenated it with my first pivot to get the subtotals:
pd.concat([table, tab_tots]).sort_index()
And got this:
How can I get the values separated by level, like in the first table?
I can't find a way to do this.
Use margins=True, and change your pivot's index and columns a little:
newdf=pd.pivot_table(df, index=['CONTINENT'],values=['POPULATION'], columns=[ 'COUNTRY', 'LOCATION'], aggfunc=np.sum, dropna=True,margins=True)
newdf.drop('All').stack([1,2])
Out[132]:
                                 POPULATION
CONTINENT     COUNTRY   LOCATION
EUROPE        All                    1010.0
              FRANCE    CA            100.0
                        LY             80.0
                        PA            320.0
              SPAIN     BA            110.0
                        MA            400.0
SOUTH AMERICA ARGENTINA BA            180.0
                        CO            300.0
              All                    1330.0
              BRAZIL    MG            150.0
                        RJ            200.0
                        SP            500.0
IIUC:
contotal = table.groupby(level=0).sum().assign(COUNTRY='TOTAL', LOCATION='').set_index(['COUNTRY','LOCATION'], append=True)
coutotal = table.groupby(level=[0,1]).sum().assign(LOCATION='TOTAL').set_index(['LOCATION'], append=True)
df_out = (pd.concat([table,contotal,coutotal]).sort_index())
df_out
Output:
                                 POPULATION
CONTINENT     COUNTRY   LOCATION
EUROPE        FRANCE    CA              100
                        LY               80
                        PA              320
                        TOTAL           500
              SPAIN     BA              110
                        MA              400
                        TOTAL           510
              TOTAL                    1010
SOUTH AMERICA ARGENTINA BA              180
                        CO              300
                        TOTAL           480
              BRAZIL    MG              150
                        RJ              200
                        SP              500
                        TOTAL           850
              TOTAL                    1330
You want to do something like this instead
tab_tots.index = [tab_tots.index, ['Total'] * len(tab_tots), [''] * len(tab_tots)]
This gives the following, which I think is what you are after:
In [277]: pd.concat([table, tab_tots]).sort_index()
Out[277]:
                                 POPULATION
CONTINENT     COUNTRY   LOCATION
EUROPE        FRANCE    CA              100
                        LY               80
                        PA              320
              SPAIN     BA              110
                        MA              400
              Total                    1010
SOUTH AMERICA ARGENTINA BA              180
                        CO              300
              BRAZIL    MG              150
                        RJ              200
                        SP              500
              Total                    1330
Note that although this solves your problem, it isn't good style: the labels on your summed levels are inconsistent ('Total' on one level, an empty string on the other).
This makes sense for display in a UI, but if you are going to use the data further it would probably be better to use
tab_tots.index = [tab_tots.index, ['All'] * len(tab_tots), ['All'] * len(tab_tots)]
This follows SQL table logic and will give you
In [289]: pd.concat([table, tab_tots]).sort_index()
Out[289]:
                                 POPULATION
CONTINENT     COUNTRY   LOCATION
EUROPE        All       All            1010
              FRANCE    CA              100
                        LY               80
                        PA              320
              SPAIN     BA              110
                        MA              400
SOUTH AMERICA ARGENTINA BA              180
                        CO              300
              All       All            1330
              BRAZIL    MG              150
                        RJ              200
                        SP              500
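The idea of labelling every summed level consistently can also be checked without pandas. Below is a minimal plain-Python sketch, assuming the question's ten rows and using 'All' for every rolled-up level; with_subtotals is a hypothetical helper name. Unlike the pandas snippet above, it also adds a per-country subtotal.

```python
from collections import defaultdict

# Data from the question: (continent, country, location, population).
rows = [
    ("SOUTH AMERICA", "BRAZIL", "SP", 500),
    ("SOUTH AMERICA", "BRAZIL", "RJ", 200),
    ("SOUTH AMERICA", "BRAZIL", "MG", 150),
    ("SOUTH AMERICA", "ARGENTINA", "BA", 180),
    ("SOUTH AMERICA", "ARGENTINA", "CO", 300),
    ("EUROPE", "SPAIN", "MA", 400),
    ("EUROPE", "SPAIN", "BA", 110),
    ("EUROPE", "FRANCE", "PA", 320),
    ("EUROPE", "FRANCE", "CA", 100),
    ("EUROPE", "FRANCE", "LY", 80),
]


def with_subtotals(rows, label="All"):
    """Return detail rows plus country and continent subtotals in one dict."""
    totals = defaultdict(int)
    for continent, country, location, population in rows:
        totals[(continent, country, location)] += population  # detail row
        totals[(continent, country, label)] += population     # country subtotal
        totals[(continent, label, label)] += population       # continent subtotal
    return dict(totals)


totals = with_subtotals(rows)
for key in sorted(totals):
    print(*key, totals[key])
```

Because every rolled-up level carries the same label, a subtotal row can be selected later with a single comparison instead of special-casing empty strings.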

Dynamically merge row cells with the same values in Excel

In a datasheet with automatic filters, I have this (values and column names are just examples):
Continent  Country  City     Street
----------------------------------------------------------
Asia       Vietnam  Hanoi    egdsqgdfgdsfg
Asia       Vietnam  Hanoi    fhfdghdfdh
Asia       Vietnam  Hanoi    dfhdfhfdhfdhfdhfdh
Asia       Vietnam  Saigon   ggdsfgfdsdgsdfgdf
Asia       Vietnam  Hue      qsdfqsfqsdf
Asia       China    Beijing  qegfqsddfgdf
Asia       China    Canton   sdgsdfgsdgsdg
Asia       China    Canton   tjgjfgj
Asia       China    Canton   tzeryrty
Asia       Japan    Tokyo    ertsegsgsdfdg
Asia       Japan    Kyoto    qegdgdfgdfgdf
Asia       Japan    Sapporo  gsdgfdgsgsdfgf
Europa     France   Paris    qfqsdfdsqfgsdfgsg
Europa     France   Toulon   qgrhrgqzfqzetzeqrr
Europa     France   Lyon     pàjhçuhàçuh
Europa     Italy    Rome     qrgfqegfgdfg
Europa     Italy    Rome     qergqegsdfgsdfgdsg
I would like it to be displayed like this, with rows merged dynamically if the filters change:
Continent  Country  City     Street
----------------------------------------------------------
                             egdsqgdfgdsfg
                    Hanoi    fhfdghdfdh
           Vietnam           dfhdfhfdhfdhfdhfdh
                    Saigon   ggdsfgfdsdgsdfgdf
                    Hue      qsdfqsfqsdf
---
Asia                Beijing  qegfqsddfgdf
           China             sdgsdfgsdgsdg
                    Canton   tjgjfgj
                             tzeryrty
---
                    Tokyo    ertsegsgsdfdg
           Japan    Kyoto    qegdgdfgdfgdf
                    Sapporo  gsdgfdgsgsdfgf
---
                    Paris    qfqsdfdsqfgsdfgsg
           France   Toulon   qgrhrgqzfqzetzeqrr
Europa              Lyon     pàjhçuhàçuh
           Italy    Rome     qrgfqegfgdfg
                             qergqegsdfgsdfgdsg
Is a macro mandatory for this?
I don't want to merge values in the Street column. I want to keep all lines. I just want to work on the display of the first columns to avoid long series of identical values.
You can also set up a PivotTable - this would look like this:
Just go to "Insert -> PivotTable", select your data as the input, and create the PivotTable in a new worksheet ;)
Put all fields in the "Rows" section and remove any subtotal or sum calculations.
Because you don't have any values to sum up, you should just hide those columns to get a clear view.
If you want to use a formula instead, you can do it like this:
=IF(MATCH(Tabelle1!A1;(Tabelle1!A:A);0)=ROW();Tabelle1!A1;"")
Insert this formula in another sheet (Tabelle1 is the name of the sheet holding the data).
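The logic of that formula, showing a value only on the row where it first occurs in its column, can be sketched in Python as follows. This is just an illustration of what MATCH(...)=ROW() tests, not something Excel runs; blank_repeats is a hypothetical helper name.

```python
def blank_repeats(column):
    """Keep a value only at its first occurrence; replace later repeats with ''."""
    seen = set()
    shown = []
    for value in column:
        if value in seen:
            shown.append("")      # MATCH would point at an earlier row
        else:
            seen.add(value)
            shown.append(value)   # MATCH points at this row, so display it
    return shown


continents = ["Asia", "Asia", "Asia", "Europa", "Europa"]
print(blank_repeats(continents))  # ['Asia', '', '', 'Europa', '']
```

Note that, like the formula, this blanks every later repeat in the column, even one that is not adjacent, so a city name that appears under two different countries would be shown only once.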

Resources