converting comma separated data of excel file into new line using python, pandas - excel

I have an excel file with data as
StudentId Details
1234 John, Texas, United States
9887 Roma, Moscow, Russia
I want to convert it into the following format, such that:
StudentId Details
1234 John
Texas
United States
9887 Roma
Moscow
Russia
I am using pandas for this, but I am not getting the expected result. This is roughly the logic I am using:
for i in range(len(df['Details'])):
    df['Details'][i] = df['Details'][i].replace(',', '\n')

Read the file in (read_excel() for a workbook, or read_fwf() if the data is really fixed-width text), then split up that column:
df['Details'].str.split(',')
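A minimal sketch of how the split can be taken through to the layout in the question. The frame is built inline to mirror the sample data, and the names in_cell and per_row are only illustrative. Two readings of "new line" are shown, since the question does not say whether the line breaks should stay inside one cell or become separate rows.

```python
import pandas as pd

# Sample frame mirroring the question (in practice it would be loaded from the file)
df = pd.DataFrame({
    "StudentId": [1234, 9887],
    "Details": ["John, Texas, United States", "Roma, Moscow, Russia"],
})

# Option 1: keep one row per student and put line breaks inside the cell
in_cell = df.copy()
in_cell["Details"] = in_cell["Details"].str.replace(", ", "\n", regex=False)

# Option 2: one row per detail; StudentId is repeated on each new row
per_row = df.assign(Details=df["Details"].str.split(",")).explode("Details")
per_row["Details"] = per_row["Details"].str.strip()
per_row = per_row.reset_index(drop=True)

print(per_row)
```

With option 1, Excel only displays the line breaks when Wrap Text is enabled on the column. With option 2, the repeated StudentId values can be blanked afterwards if the output should match the question exactly.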

Related

Excel, how to count multiple records of words

So I have a table that looks roughly like this
f1 f2
EF German
EF German
EF America
RF Britain
RF Britain
DF German
DF America
DF Britain
How do I create another field that counts how often each combination of field 1 and field 2 occurs?
so it would end up like this
f1 f2 calculated field
EF German 2
EF America 1
RF Britain 2
DF German 1
DF America 1
DF Britain 1
I am very confused about how to do this in Microsoft Excel. Any help would be appreciated, thank you.
Based on the example you could use:
Formula:
=COUNTIFS($A$2:$A$9,A2,$B$2:$B$9,B2)
After you get the values you could go and remove duplicates:
Data tab
Data Tools
Remove Duplicates
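To cross-check the formula's result outside Excel, the same count can be sketched in Python. This is only an illustration using the standard library; the rows list is typed in by hand from the question's table.

```python
from collections import Counter

# (f1, f2) pairs copied from the table in the question
rows = [
    ("EF", "German"), ("EF", "German"), ("EF", "America"),
    ("RF", "Britain"), ("RF", "Britain"),
    ("DF", "German"), ("DF", "America"), ("DF", "Britain"),
]

# Counter tallies each distinct (f1, f2) pair, like COUNTIFS on both columns
counts = Counter(rows)

# Each pair appears once in first-seen order, which also covers Remove Duplicates
for (f1, f2), n in counts.items():
    print(f1, f2, n)
```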

How to merge two rows having same values into single row in python?

I have a table called 'data' whose values look like the following:
ID NAME DOB LOCATION
1 bob 08/10/1985 NEW JERSEY
1 bob 15/09/1987 NEW YORK
2 John 08/10/1985 NORTH CAROLINA
2 John 26/11/1990 OKLAHOMA
For example
I want output like,
ID NAME No.of.Days
1 bob difference of two given dates in days
2 John difference of two given dates in days
Please help me to form a python code to get the expected output.
If there are only two dates for a given ID, then the following works:
df.groupby(['ID','NAME'])['DOB'].apply(lambda x: abs(pd.to_datetime(list(x)[0]) - pd.to_datetime(list(x)[1]))).reset_index(name='No.Of.Days')
Output
ID NAME No.Of.Days
0 1 bob 766 days
1 2 John 1934 days
You can also use np.diff (this assumes DOB already holds datetime values and that numpy is imported as np):
df.groupby(['ID','NAME'])['DOB'].apply(lambda x: np.diff(list(x))[0]).reset_index(name='No.Of.Days')
First, you need to convert the date column into a datetime format. The sample dates are day-first (for example 15/09/1987), so pass dayfirst=True. If you are reading from a .csv file, read it as follows:
df = pd.read_csv('yourfile.csv', parse_dates=['DOB'], dayfirst=True)
Otherwise, convert the existing dataframe column as follows:
df['DOB'] = pd.to_datetime(df['DOB'], dayfirst=True)
Now you can perform the usual arithmetic on the dates:
df.groupby(['ID','NAME'])['DOB'].apply(lambda x: abs(pd.to_datetime(list(x)[0]) - pd.to_datetime(list(x)[1]))).reset_index(name='No.Of.Days')
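A hedged variant that does not depend on there being exactly two dates per ID: it takes the span between the earliest and latest date in each group. It assumes the DOB strings are day-first (dd/mm/yyyy), which the values 15/09/1987 and 26/11/1990 suggest. The 766 and 1934 day figures shown earlier come from 08/10/1985 being read as 10 August, so the numbers here differ.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "NAME": ["bob", "bob", "John", "John"],
    "DOB": ["08/10/1985", "15/09/1987", "08/10/1985", "26/11/1990"],
    "LOCATION": ["NEW JERSEY", "NEW YORK", "NORTH CAROLINA", "OKLAHOMA"],
})

# An explicit format avoids any month/day guessing (assumption: dd/mm/yyyy)
df["DOB"] = pd.to_datetime(df["DOB"], format="%d/%m/%Y")

# Latest minus earliest date per (ID, NAME), expressed as whole days
grouped = df.groupby(["ID", "NAME"])["DOB"]
result = (grouped.max() - grouped.min()).dt.days.reset_index(name="No.of.Days")

print(result)
```

Under the day-first assumption this gives 707 days for bob and 1875 days for John.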

Convert address into country

I have a pandas table with a location column that mixes cities, regions and countries. The table looks like this:
Real Table |After conversion
Edinburgh, Scotland |UK
Nairobi, Kenya| Kenya
Manchester| UK
uk |UK
Sirajganj |Bangladesh
How can I do that in Python?
Or how can coordinates be converted to a country?
+05.0738+047.3288 to Somalia
+60.45148+022.26869 to Finland
+51.50853-000.12574 to United Kingdom
+33.24428-086.81638 to USA
+47.55839+007.57327 to Switzerland
This problem can be solved with the Yandex provider of the geocoder package for Python.
import geocoder
g = geocoder.yandex([55.95, 37.96], method='reverse')
print(g.json)
g.json['country']
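The reverse lookup above takes a [latitude, longitude] pair, while the question's coordinates are single strings such as +51.50853-000.12574. A small sketch of splitting such a string into two floats, using only the standard library; the function name parse_coordinates is made up for this example, and the pattern assumes a signed latitude immediately followed by a signed longitude.

```python
import re

# Signed latitude followed by signed longitude, e.g. "+51.50853-000.12574"
COORD = re.compile(r"^([+-]\d+(?:\.\d+)?)([+-]\d+(?:\.\d+)?)/?$")

def parse_coordinates(text):
    """Return (latitude, longitude) as floats, or None if text has another shape."""
    match = COORD.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

print(parse_coordinates("+51.50853-000.12574"))
```

The resulting pair can then be passed, as a list, in place of [55.95, 37.96] in the call shown above. Place names such as Manchester return None here and would need a forward (name-based) lookup instead.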

Python: Read data file and work on process/filter the data

I have recently started learning Python and am trying to grasp the concepts. Meanwhile I got a sample data file of 30K rows, separated by spaces, as below.
P160543 East Asia and Pacific IN C
P166720 Africa IN N
P165276 East Asia and Pacific AD n IIST
P159835 Latin America and Caribbean LA B
P160778 Latin America and Caribbean LA B
P164290 South Asia AS N
P165493 South Asia SA N
P165585 Latin America and Caribbean LAC N
P157987 South Asia SA C ALAESH
P158364 South Asia SAS B EPATET
I need to skip the rows that contain 'N' or 'n' in column 4.
Then read each line and save the column values in variables.
Given a search such as TypeSt = 'IN', return the RegionName values
'East Asia and Pacific' and 'Africa' and the ids P160543, P166720.
If column 3 = 'AD', return the column 2 value 'East Asia and Pacific' and id = P165276.
If column 3 = 'LAC', return 'Latin America and Caribbean'.
I don't have NumPy or other libraries available, so I want to get this done with plain file handling.
I know how to read a file and display its contents, remove blank lines and skip comment lines, but I am stuck on this problem.
Please advise.
Create a generator to loop through the lines of your file,
and supply the column names as a header line, since the sample shows no header row:
def read_file(fullname):
    with open(fullname) as f:
        for line in f:
            # Strip the newline so the last column compares cleanly
            yield line.rstrip("\n")
header_line = "id\tRegionName\tTypeSt\tTypePD\tTypeCode"
header = header_line.split("\t")
myFile = read_file(r"Path/To/Your/File")
for line in myFile:
    # Here's a dictionary of the data for the current row
    # (assumes tab-separated columns; change the delimiter if yours differ)
    data = dict(zip(header, line.split("\t")))
    # You can access the elements of the row by name as follows in the filter example:
    if data.get("TypePD", "").lower() == "n":
        continue
    .....
That ought to be enough to get you started, since this smells like a homework assignment.
Be wary of anyone recommending pandas: I work in a big data environment, and pandas struggles with multi-gigabyte files or millions of records, whereas generators handle them.
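If the file really is space-separated, splitting on a delimiter will not work because the region names contain spaces. A sketch of one way around that with the standard library only. The regular expression encodes an assumption read off the sample rows (an id, a multi-word region, two or three capital letters, a single letter, and an optional trailing code), and the names ROW, FIELDS, parse_lines and search are invented for this example. The question's own examples return rows whose fourth column is N, so skipping those rows is left as an optional flag.

```python
import re

# Assumed row shape: id, multi-word region, 2-3 capitals, 1 letter, optional code
ROW = re.compile(r"^(\S+)\s+(.+?)\s+([A-Z]{2,3})\s+([A-Za-z])(?:\s+(\S+))?$")
FIELDS = ("id", "RegionName", "TypeSt", "TypePD", "TypeCode")

def parse_lines(lines):
    """Yield one dict per line that matches ROW; other lines are skipped."""
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            yield dict(zip(FIELDS, match.groups()))

def search(lines, type_st, skip_n=False):
    """Return (id, RegionName) pairs whose TypeSt equals type_st."""
    found = []
    for row in parse_lines(lines):
        if skip_n and row["TypePD"].lower() == "n":
            continue
        if row["TypeSt"] == type_st:
            found.append((row["id"], row["RegionName"]))
    return found

# A few rows from the question; an open file object can be passed the same way
sample = [
    "P160543 East Asia and Pacific IN C",
    "P166720 Africa IN N",
    "P165276 East Asia and Pacific AD n IIST",
    "P165585 Latin America and Caribbean LAC N",
    "P157987 South Asia SA C ALAESH",
]

print(search(sample, "IN"))
```

Because parse_lines is a generator, only one line is held in memory at a time, which matches the streaming approach described above.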

Excel: search two columns against another two columns to return a value from a third

I have a bit of a painful issue. I have a long list of entities (200 plus) that I need to match against a code held in another list. From the entity list I have the name and country of each entity (name in column A, country in column D), and I need to populate column F with the code from the other list, or add "unknown" if a code can't be found.
So I tried to build the query using the & operator:
=MATCH(A2&D2 to use as the key, giving me a value like 'cool companyUNITED KINGDOM'.
The second list (imported to Sheet2) contains the following columns
Code Name Country
So I want to search an array where Name and country have been combined:
=MATCH(A2&D2,Sheet2!B2:B99999&Sheet2!C2:C99999,0)
I then try to get the index back, so my complete formula looks like
=INDEX(Sheet2!A2:C99999, MATCH(Sheet2!A2&Sheet2!D2,Sheet2!B2:B99999&Sheet2!C2:C99999,0))
And all I get back is #VALUE!
Any suggestions?
Edit: More info
Sheet one looks like this (it's column C I need to populate from the code in column A of sheet two)
Entity name Status GIIN Country
Ben Dist Ltd NFFE N/a UNITED KINGDOM
Karamara Sdn Bhd PFFE N/a MALAYSIA
Farbion Trade (Curacao) N.V. LFFI N/a
Tentorim (International) B.V. LFFI N/a NETHERLANDS
Catamo B.V. TLTD N/a NETHERLANDS
Ben Dist Deutschland GmbH FLTD N/a GERMANY
Ben Dist Investments B.V. PFFE N/a NETHERLANDS
Ben Dist Limited TLTD N/a UNITED KINGDOM
Complete Solution Service Limited GLRS N/a UNITED KINGDOM
BDLT S.A. de C.V. TLTD N/a MEXICO
Telsa Telco Services SLTD N/a CHILE
And the second list will look like this
GIIN FINm CountryNm
AAAUG3.99999.SL.764 Asset Plus HSI Fund THAILAND
AABEIL.99999.SL.528 Gresham Capital CLO II B.V. NETHERLANDS
AAB36F.99999.SL.470 Maitland Malta Limited MALTA
AACRQK.99999.SL.756 BBGI GROUP SA SWITZERLAND
AADAD7.99999.SL.528 E-MAC DE 2009-I B.V. NETHERLANDS
AADDBX.99999.SL.060 GWD Limited BERMUDA
AAE9W5.99999.SL.764 Bualuang Money Market RMF THAILAND
AAGH8E.99999.SL.276 Sparda-Bank Baden-Wuerttemberg eG GERMANY
AAGR6U.99999.SL.438 Konsolidationsanstalt LIECHTENSTEIN
AAGWV3.99999.SL.360 BATAVIA PROTEKSI PRIMA 18 INDONESIA
AAGXH0.99999.SL.136 Monarch Capital Partners Ltd CAYMAN ISLANDS
AAHY1V.99999.SL.158 Pingtung County Farmers' Association TAIWAN
AAH0IZ.99999.SL.136 Diversified Absolute Return Fund CAYMAN ISLANDS
I suggest that you use following array formula:
= IFERROR(INDEX(List,SMALL(IF((INDEX(List,,2,1)=A2)*(INDEX(List,,3,1)=D2),ROW(List)-MIN(ROW(List))+1,""),1),1,1),"N/A")
To enter an array formula on Windows, press Ctrl+Shift+Enter.
On a Mac keyboard, use Command+Enter.
Then drag the formula downwards.
In this formula I have used named range List, which is equivalent to your Sheet2!$A$2:$C$99999. Named ranges make complicated formulas more readable and flexible.
If you do not want to use named ranges just replace List with Sheet2!$A$2:$C$99999.
=IFERROR(INDEX(Sheet2!$A$2:$C$99999,SMALL(IF((INDEX(Sheet2!$A$2:$C$99999,,2,1)=A2)*(INDEX(Sheet2!$A$2:$C$99999,,3,1)=D2),ROW(Sheet2!$A$2:$C$99999)-MIN(ROW(Sheet2!$A$2:$C$99999))+1,""),1),1,1),"N/A")
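It works if Sheet2 holds the code, name and country in columns A, B and C, and the first sheet holds the name in column A and the country in column D.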
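For comparison, the same two-column match can be sketched in Python as a dictionary keyed on the (name, country) pair. This is an illustration only: the two rows are copied from the second list in the question, and the names codes, make_key and find_code are invented here. Keys are upper-cased because Excel's = comparison ignores case, while a Python dictionary does not.

```python
# (GIIN, name, country) rows standing in for Sheet2
codes = [
    ("AABEIL.99999.SL.528", "Gresham Capital CLO II B.V.", "NETHERLANDS"),
    ("AAB36F.99999.SL.470", "Maitland Malta Limited", "MALTA"),
]

def make_key(name, country):
    """Normalise the pair so the match ignores case and stray spaces."""
    return (name.strip().upper(), country.strip().upper())

# setdefault keeps the first match for a pair, like SMALL(..., 1) in the formula
lookup = {}
for giin, name, country in codes:
    lookup.setdefault(make_key(name, country), giin)

def find_code(name, country, default="N/A"):
    """Return the code for (name, country), or default when there is no match."""
    return lookup.get(make_key(name, country), default)

print(find_code("Maitland Malta Limited", "MALTA"))
print(find_code("Ben Dist Ltd", "UNITED KINGDOM"))
```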
