create pandas column using cell values of another multi indexed data frame [closed] - python-3.x

I have a multi-indexed data frame, as in the attached image. Now I need to create new columns on a different data frame, where each column name is one of the unique room numbers.
For example, one expected output from the code would be as follows:
N.B. I want to avoid for loops to save memory and time. What would be the optimal way to get the desired output?
I have tried using for loops and could get the desired output, but I am not sure if it's a good idea for a large dataset. Here is the code snippet:
import numpy as np
import pandas as pd

d = np.array(['624: COUPLE , 507: DELUXE+ ,301: HONEYMOON',
              '624:FAMILY , 507: FAMILY+',
              '621:FAMILY , 517: FAMILY+',
              '696:FAMILY , 585: FAMILY+,624:FAMILY , 507: DELUXE'])
df = pd.Series(d)
df = df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')
gh = df[df['room'] == '507'].index
rf = pd.DataFrame(index=range(0, 4), columns=['room#507', 'room#624'],
                  dtype='float')
for i in range(0, rf.shape[0]):
    for j in range(0, gh.shape[0]):
        if i == gh[j][0]:
            rf['room#507'][i] = df.grd[gh[j][0]][gh[j][1]]

Use DataFrame.reset_index with DataFrame.pivot:
df = df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')
df = df.reset_index(level=1, drop=True).reset_index()
df = df.pivot(index='index', columns='room', values='grd').add_prefix('room_')
print (df)
room room_301 room_507 room_517 room_585 room_621 room_624 room_696
index
0 HONEYMOON DELUXE+ NaN NaN NaN COUPLE NaN
1 NaN FAMILY+ NaN NaN NaN FAMILY NaN
2 NaN NaN FAMILY+ NaN FAMILY NaN NaN
3 NaN DELUXE NaN FAMILY+ NaN FAMILY FAMILY
Or DataFrame.set_index with Series.unstack:
df = df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')
df = (df.reset_index(level=1, drop=True)
        .set_index('room', append=True)['grd']
        .unstack()
        .add_prefix('room_'))
print (df)
room room_301 room_507 room_517 room_585 room_621 room_624 room_696
0 HONEYMOON DELUXE+ NaN NaN NaN COUPLE NaN
1 NaN FAMILY+ NaN NaN NaN FAMILY NaN
2 NaN NaN FAMILY+ NaN FAMILY NaN NaN
3 NaN DELUXE NaN FAMILY+ NaN FAMILY FAMILY
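If, as in the question's rf frame, only a couple of specific rooms are needed, the pivoted result can simply be subset instead of being filled with nested loops. A minimal sketch under that assumption (the variable names s and wide are illustrative, not from the original):
import numpy as np
import pandas as pd

d = np.array(['624: COUPLE , 507: DELUXE+ ,301: HONEYMOON',
              '624:FAMILY , 507: FAMILY+'])
s = pd.Series(d)

# One extractall + pivot replaces the nested loops entirely.
wide = (s.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')
          .reset_index(level=1, drop=True)
          .reset_index()
          .pivot(index='index', columns='room', values='grd')
          .add_prefix('room_'))

# Keep only the rooms the original rf frame cared about.
rf = wide.reindex(columns=['room_507', 'room_624'])
print(rf)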

Related

How do I remove nan values from a dataframe in Python? dropna() does not seem to be working for me

How do I remove nan values from a dataframe in Python? I already tried dropna(), but that did not work for me. Also, is NaN different from nan? I am using Pandas.
When printing the data frame, the missing values print as nan rather than NaN.
1 2.11358 0.649067060588935
2 nan 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 nan
You can replace the 'nan' string values with NaN using replace() and then use dropna().
import numpy as np
# 'nan' here is a string, not a real missing value, so convert it first
df = df.replace('nan', np.nan)
df = df.dropna()
Update:
Original dataframe:
1 2.11358 0.649067060588935
2 nan 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 nan
Applied df.replace('nan', np.nan):
1 2.11358 0.649067060588935
2 NaN 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 NaN
Applied df.dropna():
1 2.11358 0.649067060588935
3 2.10066 0.3653980276694516
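An alternative, if the columns are meant to be numeric in the end, is to coerce everything with pd.to_numeric so that any unparseable entry (including the literal string 'nan') becomes a real NaN before dropping. A minimal sketch with made-up values in the shape of the question's data:
import pandas as pd

df = pd.DataFrame({'a': ['2.11358', 'nan', '2.10066', '2.10545'],
                   'b': ['0.649067', '0.609413', '0.365398', 'nan']})

# errors='coerce' turns anything that is not a valid number into NaN,
# so the string 'nan' and a real NaN end up handled the same way.
df = df.apply(pd.to_numeric, errors='coerce').dropna()
print(df)
print(df.dtypes)  # the columns are now float64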

Trying to append a single row of data to a pandas DataFrame, but instead adds rows for each field of input

I am trying to add a row of data to a pandas DataFrame, but it keeps adding a separate row for each piece of data. I feel I am missing something very simple and obvious, but what it is I do not know.
import pandas
colNames = ["ID", "Name", "Gender", "Height", "Weight"]
df1 = pandas.DataFrame(columns = colNames)
df1.set_index("ID", inplace=True, drop=False)
i = df1.shape[0]
person = [{"ID":i},{"Name":"Jack"},{"Gender":"Male"},{"Height":177},{"Weight":75}]
df1 = df1.append(pandas.DataFrame(person, columns=colNames))
print(df1)
Output:
ID Name Gender Height Weight
0 0.0 NaN NaN NaN NaN
1 NaN Jack NaN NaN NaN
2 NaN NaN Male NaN NaN
3 NaN NaN NaN 177.0 NaN
4 NaN NaN NaN NaN 75.0
You are using too many curly braces. All of your data should be inside one pair of curly braces; that creates a single Python dictionary. Change that line to:
person = [{"ID":i,"Name":"Jack","Gender":"Male","Height":177,"Weight":75}]
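For completeness, a minimal sketch of the corrected snippet. Note that DataFrame.append was deprecated and later removed (pandas 2.0), so pd.concat is used below as the forward-compatible equivalent; the set_index("ID") call from the original is omitted for brevity:
import pandas as pd

colNames = ["ID", "Name", "Gender", "Height", "Weight"]
df1 = pd.DataFrame(columns=colNames)

i = df1.shape[0]
# One dictionary -> one row; a list of single-key dictionaries -> one row per key.
person = [{"ID": i, "Name": "Jack", "Gender": "Male", "Height": 177, "Weight": 75}]

df1 = pd.concat([df1, pd.DataFrame(person, columns=colNames)], ignore_index=True)
print(df1)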

How reindex_like function works with method "ffill" & "bfill"?

I have two dataframes of shape (6,3) and (2,3). Now I want to reindex the second dataframe like the first one and fill the NaN values with either the ffill or the bfill method. My code is as follows:
df1 = pd.DataFrame(np.random.randn(6,3),columns = ['Col1','Col2','Col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns = ['Col1','Col2','Col3'])
df2 = df2.reindex_like(df1,method='ffill')
But this code is not working as expected; I am getting the following result:
Col1 Col2 Col3
0 0.578282 -0.199872 0.468505
1 1.086811 -0.707933 -0.924984
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
Any suggestion would be great.

Pandas append returns DF with NaN values

I'm appending data from a list to pandas df. I keep getting NaN in my entries.
Based on what I've read, I think I might have to specify the data type for each column in my code.
dumps = []
features_df = pd.DataFrame()
for i in range(int(len(ids)/50)):
    dumps = sp.audio_features(ids[i*50:50*(i+1)])
    for i in range(len(dumps)):
        print(list(dumps[0].values()))
        features_df = features_df.append(list(dumps[0].values()), ignore_index=True)
Expected results, something like-
[0.833, 0.539, 11, -7.399, 0, 0.178, 0.163, 2.1e-06, 0.101, 0.385, 99.947, 'audio_features', '6MWtB6iiXyIwun0YzU6DFP', 'spotify:track:6MWtB6iiXyIwun0YzU6DFP', 'https://api.spotify.com/v1/tracks/6MWtB6iiXyIwun0YzU6DFP', 'https://api.spotify.com/v1/audio-analysis/6MWtB6iiXyIwun0YzU6DFP', 149520, 4]
for one row.
Actual-
danceability energy ... duration_ms time_signature
0 NaN NaN ... NaN NaN
1 NaN NaN ... NaN NaN
2 NaN NaN ... NaN NaN
3 NaN NaN ... NaN NaN
4 NaN NaN ... NaN NaN
5 NaN NaN ... NaN NaN
For all rows
Calling append() in a tight loop isn't a great way to do this. Rather, you can construct an empty DataFrame and then use loc to specify an insertion point; the row's index label is used as the insertion key.
For example:
import pandas as pd
df = pd.DataFrame(data=[], columns=['n'])
for i in range(100):
    df.loc[i] = i
print(df)
time python3 append_df.py
n
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
real 0m13.178s
user 0m12.287s
sys 0m0.617s
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
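A minimal sketch of that list-then-concat pattern, using made-up dictionaries whose keys mirror a few of the audio-feature fields from the question:
import pandas as pd

# Collect each row as a plain dict in a list, then build the frame once.
rows = []
for i in range(5):
    rows.append({'danceability': 0.8, 'energy': 0.5, 'tempo': 99.9 + i})

features_df = pd.DataFrame(rows)
# or, to extend an existing frame:
# features_df = pd.concat([features_df, pd.DataFrame(rows)], ignore_index=True)
print(features_df)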

Pandas 0.17.1 dataframe manipulation: referring to a column and writing to csv

Hi friends, I am new to programming and I am having a difficult time accessing this dataframe that I read from an HTML page.
I just want to print a specific column, but because the columns are unnamed, everything I've tried to access them throws errors. This is what I've tried so far to print them out:
print [data{'Unnamed: 0'}]
[ Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 \
0 Pinnacle NaN NaN NaN NaN
1 10/25 02:06:10pm NaN #101 Miami NaN
2 10/25 02:06:10pm NaN #102 New England NaN
3 10/25 04:40:03pm NaN #101 Miami NaN
4 10/25 04:40:04pm NaN #102 New England 8½-05
5 10/25 04:40:12pm NaN #101 Miami NaN
6 10/25 04:40:12pm NaN #102 New England 8½ev
Also I tried to write them down to a csv file and I get this error:
AttributeError: 'list' object has no attribute 'to_csv'
and here is the code I am using:
import pandas as pd
url = 'http://exoweb0.donbest.com/checkHTML/servlet/send_archive_history.servlet?by_casino=1&for_casino=37&league=1&game=0&date=20151029'
data = pd.read_html(url, header=0)
data.to_csv('Pinacle Lines.csv', index_col=0)
#print (data)
try this:
data = pd.read_html(url, header=0)[0]
You get a list of DataFrames back when you call read_html, so you need to figure out which one you want. The edit above selects the first one; you might want to look at all of them.
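Putting that together with the CSV step, a minimal sketch (the URL and output filename are the question's own; note that to_csv takes index, not index_col):
import pandas as pd

url = 'http://exoweb0.donbest.com/checkHTML/servlet/send_archive_history.servlet?by_casino=1&for_casino=37&league=1&game=0&date=20151029'

tables = pd.read_html(url, header=0)  # read_html always returns a list of DataFrames
data = tables[0]                      # pick the table you want
data.to_csv('Pinacle Lines.csv', index=False)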
