How to create a rank column with a recurring rank number? (Excel)

Please look at the example below first; it should make my question clear.
I want to generate a field that ranks every state within its assigned region.
These are my inputs:
| Region | State |
| West | California |
| West | Arizona |
| West | Washington |
| East | New York |
| East | Florida |
| East | North Carolina |
| South | Texas |
| South | Louisiana |
| South | Alabama |
I would like to generate the "Rank State" field
| Region | State | Rank State |
| West | California | 1 |
| West | Arizona | 2 |
| West | Washington | 3 |
| East | New York | 1 |
| East | Florida | 2 |
| East | North Carolina | 3 |
| South | Texas | 1 |
| South | Louisiana | 2 |
| South | Alabama | 3 |
The question is: what calculation or method can produce the "Rank State" column/field?
I'd be happy to accept Excel solutions if possible :)

The way I see it, you want to count how many rows at or above the selected one are in the same region?
Assuming 'Region' is column A (in Excel) with a header in row 1,
paste this into row 2 of the Rank column:
=COUNTIF($A$2:$A2, $A2)
Then autofill it down the column (double-click or drag the little green square at the bottom right of the selected cell)
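For readers who want to see the logic outside Excel, here is a minimal Python sketch of the same running count. The function name running_rank is hypothetical and only for illustration; the idea is the same as the expanding COUNTIF range: for each row, count how many rows so far (including the current one) share its region.

```python
from collections import defaultdict

def running_rank(regions):
    """Return, for each row, how many rows so far (inclusive) share its region."""
    seen = defaultdict(int)          # region -> count of rows seen so far
    ranks = []
    for region in regions:
        seen[region] += 1            # mirrors the range $A$2:$A2 growing by one row
        ranks.append(seen[region])
    return ranks

regions = ["West", "West", "West", "East", "East", "East", "South", "South", "South"]
ranks = running_rank(regions)
print(ranks)
```

Like the formula, this does not require the rows of one region to be adjacent; the count simply continues whenever the region shows up again.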

Related

Compare three dataframe and create a new column in one of the dataframe based on a condition

I am comparing two region data frames with master_df and want to create a new column based on a condition.
For example, I have master_df and two region data frames, asia_df and europe_df. I want to check whether each company in master_df appears in either region data frame and create a new column, region, with the value Europe or Asia.
master_df
company product
ABC Apple
BCA Mango
DCA Apple
ERT Mango
NFT Oranges
europe_df
account sales
ABC 12
BCA 13
DCA 12
asia_df
account sales
DCA 15
ERT 34
My final output dataframe is expected to be
company product region
ABC Apple Europe
BCA Mango Europe
DCA Apple Europe
DCA Apple Asia
ERT Mango Asia
NFT Oranges Others
When I try to merge and compare, some rows are dropped. I need help fixing this.
final_df = europe_df.merge(master_df, left_on='company', right_on='account', how='left').drop_duplicates()
final1_df = asia_df.merge(master_df, left_on='company', right_on='account', how='left').drop_duplicates()
final['region'] = np.where(final_df['account'] == final_df['company'] ,'Europe','Others')
final['region'] = np.where(final1_df['account'] == final1_df['company'] ,'Asia','Others')
First use pd.concat to concatenate asia_df and europe_df, then use DataFrame.merge to merge the result with master_df, and finally use Series.fillna to fill the NaN values in Region with Others:
import pandas as pd
# Tag each region frame, stack them, and keep only the join key and the tag
r = pd.concat([europe_df.assign(Region='Europe'), asia_df.assign(Region='Asia')])\
    .rename(columns={'account': 'company'})[['company', 'Region']]
# A left join keeps every master_df row; companies with no match get NaN
df = master_df.merge(r, on='company', how='left')
df['Region'] = df['Region'].fillna('Others')
Result:
print(df)
company product Region
0 ABC Apple Europe
1 BCA Mango Europe
2 DCA Apple Europe
3 DCA Apple Asia
4 ERT Mango Asia
5 NFT Oranges Others

Python: how to remove footnotes when loading data, and how to select the first when there is a pair of numbers

I am new to Python and looking for help.
import requests
import bs4 as bs
resp = requests.get("https://en.wikipedia.org/wiki/World_War_II_casualties")
soup = bs.BeautifulSoup(resp.text)
table = soup.find("table", {"class": "wikitable sortable"})
deaths = []
for row in table.findAll('tr')[1:]:
    death = row.findAll('td')[5].text.strip()
    deaths.append(death)
It comes out as
['30,000',
'40,400',
'',
'88,000',
'2,000',
'21,500',
'252,600',
'43,600',
'15,000,000[35]to 20,000,000[35]',
'100',
'340,000 to 355,000',
'6,000',
'3,000,000to 4,000,000',
'1,100',
'83,000',
'100,000[49]',
'85,000 to 95,000',
'600,000',
'1,000,000to 2,200,000',
'6,900,000 to 7,400,000',
...
'557,000',
'5,900,000[115] to 6,000,000[116]',
'40,000to 70,000',
'500,000[39]',
'36,000–50,000',
'11,900',
'10,000',
'20,000,000[141] to 27,000,000[142][143][144][145][146]',
'',
'2,100',
'100',
'7,600',
'200',
'450,900',
'419,400',
'1,027,000[160] to 1,700,000[159]',
'',
'70,000,000to 85,000,000']
I want to plot a graph, but the [] footnotes would completely ruin it, and many of the values have footnotes. Is it also possible to select the first number when a cell contains a pair? I'd appreciate it if anyone could teach me. Thank you.
You can call find_next() with the text=True parameter on the cell, then split/strip accordingly.
For example:
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/wiki/World_War_II_casualties'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
for tr in soup.table.select('tr:has(td)')[1:]:
    tds = tr.select('td')
    if not tds[0].b:
        continue
    name = tds[0].b.get_text(strip=True, separator=' ')
    casualties = tds[5].find_next(text=True).strip()
    print('{:<30} {}'.format(name, casualties.split('–')[0].split()[0] if casualties else ''))
Prints:
Albania 30,000
Australia 40,400
Austria
Belgium 88,000
Brazil 2,000
Bulgaria 21,500
Burma 252,600
Canada 43,600
China 15,000,000
Cuba 100
Czechoslovakia 340,000
Denmark 6,000
Dutch East Indies 3,000,000
Egypt 1,100
Estonia 83,000
Ethiopia 100,000
Finland 85,000
France 600,000
French Indochina 1,000,000
Germany 6,900,000
Greece 507,000
Guam 1,000
Hungary 464,000
Iceland 200
India 2,200,000
Iran 200
Iraq 700
Ireland 100
Italy 492,400
Japan 2,500,000
Korea 483,000
Latvia 250,000
Lithuania 370,000
Luxembourg 5,000
Malaya & Singapore 100,000
Malta 1,500
Mexico 100
Mongolia 300
Nauru 500
Nepal
Netherlands 210,000
Newfoundland 1,200
New Zealand 11,700
Norway 10,200
Papua and New Guinea 15,000
Philippines 557,000
Poland 5,900,000
Portuguese Timor 40,000
Romania 500,000
Ruanda-Urundi 36,000
South Africa 11,900
South Pacific Mandate 10,000
Soviet Union 20,000,000
Spain
Sweden 2,100
Switzerland 100
Thailand 7,600
Turkey 200
United Kingdom 450,900
United States 419,400
Yugoslavia 1,027,000
Approx. totals 70,000,000
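If the strings have already been collected into a list like deaths in the question, a regular expression can do the cleanup after the fact. The following is only a sketch under that assumption: the helper name first_number is hypothetical, and it treats anything of the form [digits] as a footnote marker.

```python
import re

FOOTNOTE = re.compile(r"\[\d+\]")   # footnote markers such as "[35]"
NUMBER = re.compile(r"\d[\d,]*")    # a run of digits with thousands separators

def first_number(cell):
    """Return the first number in a cell as an int, or None if there is none."""
    cleaned = FOOTNOTE.sub("", cell)         # drop footnote markers first
    match = NUMBER.search(cleaned)           # first number wins when there is a pair
    if match is None:
        return None
    return int(match.group().replace(",", ""))

deaths = ["30,000", "", "15,000,000[35]to 20,000,000[35]", "36,000–50,000", "100,000[49]"]
values = [first_number(d) for d in deaths]
print(values)
```

Empty cells come back as None, so they can be filtered out or filled before plotting.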

How to create spark datasets from a file without using File reader

I have a data file with 4 data sections: header data, summary data, detail data and footer data. Each section has a fixed number of columns, but different sections have different numbers of columns. Sections are separated by two rows that contain only a single "#".
Is there a way to avoid creating new files and use Spark's TSV (tab-separated format) reader, or any other module, to read the file into 4 datasets directly? If I read the file directly, I lose the extra columns in the later sections: only as many columns as the first row of the file has are read.
#deptno dname location
10 Accounting New York
20 Research Dallas
30 Sales Chicago
40 Operations Boston
#
#
#grade losal hisal
1 700.00 1200.00
2 1201.00 1400.00
4 2001.00 3000.00
5 3001.00 99999.00
3 1401.00 2000.00
#
#
#ENAME DNAME JOB EMPNO HIREDATE LOC
ADAMS RESEARCH CLERK 7876 23-MAY-87 DALLAS
ALLEN SALES SALESMAN 7499 20-FEB-81 CHICAGO
BLAKE SALES MANAGER 7698 01-MAY-81 CHICAGO
CLARK ACCOUNTING MANAGER 7782 09-JUN-81 NEW YORK
FORD RESEARCH ANALYST 7902 03-DEC-81 DALLAS
JAMES SALES CLERK 7900 03-DEC-81 CHICAGO
JONES RESEARCH MANAGER 7566 02-APR-81 DALLAS
#
#
#Name Age Address
Paul 23 1115 W Franklin
Bessy the Cow 5 Big Farm Way
Zeke 45 W Main St
Output:
Dataset d1 :
#deptno dname location
10 Accounting New York
20 Research Dallas
30 Sales Chicago
40 Operations Boston
Dataset d2 :
#grade losal hisal
1 700.00 1200.00
2 1201.00 1400.00
4 2001.00 3000.00
5 3001.00 99999.00
3 1401.00 2000.00
Dataset d3 :
#ENAME DNAME JOB EMPNO HIREDATE LOC
ADAMS RESEARCH CLERK 7876 23-MAY-87 DALLAS
ALLEN SALES SALESMAN 7499 20-FEB-81 CHICAGO
BLAKE SALES MANAGER 7698 01-MAY-81 CHICAGO
CLARK ACCOUNTING MANAGER 7782 09-JUN-81 NEW YORK
FORD RESEARCH ANALYST 7902 03-DEC-81 DALLAS
JAMES SALES CLERK 7900 03-DEC-81 CHICAGO
JONES RESEARCH MANAGER 7566 02-APR-81 DALLAS
Dataset d4 :
#Name Age Address
Paul 23 1115 W Franklin
Bessy the Cow 5 Big Farm Way
Zeke 45 W Main St
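One way to think about the splitting step, independent of Spark, is to walk the lines once and start a new section whenever a row containing only "#" is seen. The sketch below is plain Python rather than a Spark API call; the function name split_sections is hypothetical, and it assumes the columns are tab-separated as described in the question. Each resulting section could then be handed to whatever dataset constructor is in use.

```python
def split_sections(lines):
    """Group rows into sections separated by rows whose only content is '#'."""
    sections, current = [], []
    for line in lines:
        stripped = line.strip()
        if stripped == "#":
            if current:                      # the first separator row closes the section
                sections.append(current)
                current = []
        elif stripped:                       # skip blank lines, keep header rows like "#deptno"
            current.append(line.rstrip("\n").split("\t"))
    if current:                              # the last section has no trailing separator
        sections.append(current)
    return sections

sample = "\n".join([
    "#deptno\tdname\tlocation",
    "10\tAccounting\tNew York",
    "#",
    "#",
    "#grade\tlosal\thisal",
    "1\t700.00\t1200.00",
])
sections = split_sections(sample.splitlines())
print(len(sections), [len(s[0]) for s in sections])
```

Because every section keeps its own header row, each one can carry a different number of columns without affecting the others.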

If a cell value equals this, another cell equals that

I have a spreadsheet with a column for cities, of which there are only 4 different values. What formula can I use to fill a new column with the corresponding state, applied to the entire list? Example:
Atlanta equals GA,
Phoenix equals AZ,
Chicago equals IL,
Nashville equals TN
Thanks!!
You can use the VLookup function for that:
Make a table with your city name in one column and the state in the next column. Then put the following formula next to the city that you want populated:
=VLOOKUP(A1,A$20:B$23,2,FALSE)
In this example, the city you want to identify is in A1, and this formula goes in B1. You can copy it down to B2, B3, etc because the table is hard-coded as A$20:B$23, rather than A20:B23 (where each successive copy down the column would look for a table one row down as well). This example put the lookup table in the A-B columns, but you could put it anywhere you like.
The FALSE at the end means "look for an exact match, not the closest one". So if you get a "Dallas" in your list, the function will return #N/A rather than guessing between the state for Chicago and the state for Nashville (either side of Dallas, alphabetically).
Hope that helps!
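To illustrate why the exact-match flag matters, here is a small Python sketch that contrasts the two behaviours on the four cities from the question. The function names are hypothetical, and lookup_approximate is only a rough model of an approximate match on a sorted table (it picks the last key that is less than or equal to the search value), not a description of Excel's internal algorithm. Python string comparison is also case-sensitive, unlike Excel's.

```python
from bisect import bisect_right

# Lookup table sorted by city, as an approximate match would require
TABLE = [("Atlanta", "GA"), ("Chicago", "IL"), ("Nashville", "TN"), ("Phoenix", "AZ")]

def lookup_exact(city):
    """Exact match: return the state, or None when the city is not in the table."""
    return dict(TABLE).get(city)

def lookup_approximate(city):
    """Approximate match on a sorted table: use the last key that is <= city."""
    keys = [key for key, _ in TABLE]
    i = bisect_right(keys, city) - 1
    return TABLE[i][1] if i >= 0 else None

print(lookup_exact("Chicago"), lookup_exact("Dallas"), lookup_approximate("Dallas"))
```

With an exact match, an unknown city is reported as missing; with the approximate model, "Dallas" silently picks up the state of a neighbouring entry, which is the guess the answer warns about.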
EDIT:
You added that you also need zipcode info, and that's easy enough to add.
Your table that defines everything would put the zipcode in the 3rd column, so down at A20:B23 (in my example above) you'd end up with A20:C23, where the table would look like
Atlanta GA 12345
Chicago IL 23456
Nashville TN 34567
Phoenix AZ 45678
The cell next to your city in the table you want to populate would be in B1 as shown above giving the state, and then in C1 you'd have the following formula:
=VLOOKUP(A1,A$20:C$23,3,FALSE)
The changes are that here the table is defined out to column C, and instead of "2" returning the second column (i.e. the state abbreviation shown in B), it returns the zipcode shown in column C, the third column.
Again, hope that helps.
Since you mention "only 4 different values" maybe:
=CHOOSE(MATCH(LEFT(A1),{"A","P","C","N"},0),"GA","AZ","IL","TN")
You can use a VLOOKUP Table that contains the city and state abbreviation.
Here is a table that has the Capital, State, State Abbreviation.
Montgomery Alabama AL
Juneau Alaska AK
Phoenix Arizona AZ
Little Rock Arkansas AR
Sacramento California CA
Denver Colorado CO
Hartford Connecticut CT
Dover Delaware DE
Tallahassee Florida FL
Atlanta Georgia GA
Honolulu Hawaii HI
Boise Idaho ID
Springfield Illinois IL
Indianapolis Indiana IN
Des Moines Iowa IA
Topeka Kansas KS
Frankfort Kentucky KY
Baton Rouge Louisiana LA
Augusta Maine ME
Annapolis Maryland MD
Boston Massachusetts MA
Lansing Michigan MI
Saint Paul Minnesota MN
Jackson Mississippi MS
Jefferson City Missouri MO
Helena Montana MT
Lincoln Nebraska NE
Carson City Nevada NV
Concord New Hampshire NH
Trenton New Jersey NJ
Santa Fe New Mexico NM
Albany New York NY
Raleigh North Carolina NC
Bismarck North Dakota ND
Columbus Ohio OH
Oklahoma City Oklahoma OK
Salem Oregon OR
Harrisburg Pennsylvania PA
Providence Rhode Island RI
Columbia South Carolina SC
Pierre South Dakota SD
Nashville Tennessee TN
Austin Texas TX
Salt Lake City Utah UT
Montpelier Vermont VT
Richmond Virginia VA
Olympia Washington WA
Charleston West Virginia WV
Madison Wisconsin WI
Cheyenne Wyoming WY
Then you would use =VLOOKUP(A1,A1:C50,3,FALSE) to look up A1 (Montgomery) in the table, and it would return AL, for example.

Dynamically merge row cells with the same values in Excel

In a data sheet with automatic filters, I have this (the values and column names are just examples):
Continent Country City Street
----------------------------------------------------------
Asia Vietnam Hanoi egdsqgdfgdsfg
Asia Vietnam Hanoi fhfdghdfdh
Asia Vietnam Hanoi dfhdfhfdhfdhfdhfdh
Asia Vietnam Saigon ggdsfgfdsdgsdfgdf
Asia Vietnam Hue qsdfqsfqsdf
Asia China Beijing qegfqsddfgdf
Asia China Canton sdgsdfgsdgsdg
Asia China Canton tjgjfgj
Asia China Canton tzeryrty
Asia Japan Tokyo ertsegsgsdfdg
Asia Japan Kyoto qegdgdfgdfgdf
Asia Japan Sapporo gsdgfdgsgsdfgf
Europa France Paris qfqsdfdsqfgsdfgsg
Europa France Toulon qgrhrgqzfqzetzeqrr
Europa France Lyon pàjhçuhàçuh
Europa Italy Rome qrgfqegfgdfg
Europa Italy Rome qergqegsdfgsdfgdsg
I would like it to be displayed like this, with rows merged dynamically whenever the filters change:
Continent Country City Street
----------------------------------------------------------
egdsqgdfgdsfg
Hanoi fhfdghdfdh
Vietnam dfhdfhfdhfdhfdhfdh
Saigon ggdsfgfdsdgsdfgdf
Hue qsdfqsfqsdf
---
Asia Beijing qegfqsddfgdf
China sdgsdfgsdgsdg
Canton tjgjfgj
tzeryrty
---
Tokyo ertsegsgsdfdg
Japan Kyoto qegdgdfgdfgdf
Sapporo gsdgfdgsgsdfgf
---
Paris qfqsdfdsqfgsdfgsg
France Toulon qgrhrgqzfqzetzeqrr
Europa Lyon pàjhçuhàçuh
Italy Rome qrgfqegfgdfg
qergqegsdfgsdfgdsg
Is a macro mandatory for this?
I don't want to merge values in the Street column. I want to keep all lines. I just want to change how the first columns are displayed to avoid long runs of identical values.
You can also set up a PivotTable.
Just go to Insert -> PivotTable, select your data as the input, and create the PivotTable in a new worksheet ;)
Put all fields in the Rows section and remove any subtotal or sum calculations.
Because you don't have any values to sum up, you should just hide those columns to get a clear view.
If you want to use a function, you can do it like this:
=IF(MATCH(Tabelle1!A1;(Tabelle1!A:A);0)=ROW();Tabelle1!A1;"")
Insert this formula in another sheet.
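The logic of that formula can be sketched in Python as follows. The function name show_first_occurrence is hypothetical; like MATCH with match type 0, it looks for the first row in the whole column that holds a value and shows the value only on that row.

```python
def show_first_occurrence(values):
    """Keep a value only on the row where it first appears; blank the rest."""
    first_row = {}
    for row, value in enumerate(values):
        first_row.setdefault(value, row)     # remember the first row of each value
    return [v if first_row[v] == row else "" for row, v in enumerate(values)]

continents = ["Asia", "Asia", "Asia", "Europa", "Europa"]
display = show_first_occurrence(continents)
print(display)
```

Note that a value which reappears later in the column, after other values in between, stays blank on its later rows, because only its very first occurrence counts.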
