create pandas column using cell values of another multi indexed data frame [closed] - python-3.x

I have a multi-indexed data frame, as in the attached image. Now I need to create new columns on a different data frame, where each column name is one of the unique room numbers.
For example, one expected output from the code would be as follows:
N.B. I want to avoid for loops to save memory and time. What would be the optimal way to get the desired output?
I have tried using for loops and could get the desired output, but I am not sure if it's a good idea for a large dataset. Here is the code snippet:
import numpy as np
import pandas as pd

d = np.array(['624: COUPLE , 507: DELUXE+ ,301: HONEYMOON',
              '624:FAMILY , 507: FAMILY+',
              '621:FAMILY , 517: FAMILY+',
              '696:FAMILY , 585: FAMILY+,624:FAMILY , 507: DELUXE'])
df = pd.Series(d)
df = df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')
gh = df[df['room'] == '507'].index
rf = pd.DataFrame(index=range(0, 4), columns=['room#507', 'room#624'],
                  dtype='float')
for i in range(0, rf.shape[0]):
    for j in range(0, gh.shape[0]):
        if i == gh[j][0]:
            rf['room#507'][i] = df.grd[gh[j][0]][gh[j][1]]

Use DataFrame.reset_index with DataFrame.pivot:
df = df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')
df = df.reset_index(level=1, drop=True).reset_index()
df = df.pivot(index='index', columns='room', values='grd').add_prefix('room_')
print (df)
room room_301 room_507 room_517 room_585 room_621 room_624 room_696
index
0 HONEYMOON DELUXE+ NaN NaN NaN COUPLE NaN
1 NaN FAMILY+ NaN NaN NaN FAMILY NaN
2 NaN NaN FAMILY+ NaN FAMILY NaN NaN
3 NaN DELUXE NaN FAMILY+ NaN FAMILY FAMILY
Or DataFrame.set_index with Series.unstack:
df = df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')
df = (df.reset_index(level=1, drop=True)
        .set_index('room', append=True)['grd']
        .unstack()
        .add_prefix('room_'))
print (df)
room room_301 room_507 room_517 room_585 room_621 room_624 room_696
0 HONEYMOON DELUXE+ NaN NaN NaN COUPLE NaN
1 NaN FAMILY+ NaN NaN NaN FAMILY NaN
2 NaN NaN FAMILY+ NaN FAMILY NaN NaN
3 NaN DELUXE NaN FAMILY+ NaN FAMILY FAMILY
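If, as in the question's rf frame, only a couple of specific rooms are needed, the pivoted result can simply be subset instead of being filled with nested loops. A minimal sketch under that assumption (the variable names s and wide are illustrative, not from the original):
import numpy as np
import pandas as pd

d = np.array(['624: COUPLE , 507: DELUXE+ ,301: HONEYMOON',
              '624:FAMILY , 507: FAMILY+'])
s = pd.Series(d)

# One extractall + pivot replaces the nested loops entirely.
wide = (s.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')
          .reset_index(level=1, drop=True)
          .reset_index()
          .pivot(index='index', columns='room', values='grd')
          .add_prefix('room_'))

# Keep only the rooms the original rf frame cared about.
rf = wide.reindex(columns=['room_507', 'room_624'])
print(rf)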

Related

How do I remove nan values from a dataframe in Python? dropna() does not seem to be working for me

How do I remove nan values from a dataframe in Python? I already tried dropna(), but that did not work for me. Also, is NaN different from nan? I am using Pandas.
When printing the data frame, the missing values print as nan rather than NaN.
1 2.11358 0.649067060588935
2 nan 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 nan
You can replace the 'nan' string values with NaN using replace() and then use dropna().
import numpy as np
# 'nan' here is a string, not a real missing value, so convert it first
df = df.replace('nan', np.nan)
df = df.dropna()
Update:
Original dataframe:
1 2.11358 0.649067060588935
2 nan 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 nan
Applied df.replace('nan', np.nan):
1 2.11358 0.649067060588935
2 NaN 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 NaN
Applied df.dropna():
1 2.11358 0.649067060588935
3 2.10066 0.3653980276694516
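An alternative, if the columns are meant to be numeric in the end, is to coerce everything with pd.to_numeric so that any unparseable entry (including the literal string 'nan') becomes a real NaN before dropping. A minimal sketch with made-up values in the shape of the question's data:
import pandas as pd

df = pd.DataFrame({'a': ['2.11358', 'nan', '2.10066', '2.10545'],
                   'b': ['0.649067', '0.609413', '0.365398', 'nan']})

# errors='coerce' turns anything that is not a valid number into NaN,
# so the string 'nan' and a real NaN end up handled the same way.
df = df.apply(pd.to_numeric, errors='coerce').dropna()
print(df)
print(df.dtypes)  # the columns are now float64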

Trying to append a single row of data to a pandas DataFrame, but instead adds rows for each field of input

I am trying to add a row of data to a pandas DataFrame, but it keeps adding a separate row for each piece of data. I feel I am missing something very simple and obvious, but what it is I do not know.
import pandas
colNames = ["ID", "Name", "Gender", "Height", "Weight"]
df1 = pandas.DataFrame(columns = colNames)
df1.set_index("ID", inplace=True, drop=False)
i = df1.shape[0]
person = [{"ID":i},{"Name":"Jack"},{"Gender":"Male"},{"Height":177},{"Weight":75}]
df1 = df1.append(pandas.DataFrame(person, columns=colNames))
print(df1)
Output:
ID Name Gender Height Weight
0 0.0 NaN NaN NaN NaN
1 NaN Jack NaN NaN NaN
2 NaN NaN Male NaN NaN
3 NaN NaN NaN 177.0 NaN
4 NaN NaN NaN NaN 75.0
You are using too many curly braces. All of your data should be inside one pair of curly braces; that creates a single Python dictionary. Change that line to:
person = [{"ID":i,"Name":"Jack","Gender":"Male","Height":177,"Weight":75}]
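For completeness, a minimal sketch of the corrected snippet. Note that DataFrame.append was deprecated and later removed (pandas 2.0), so pd.concat is used below as the forward-compatible equivalent; the set_index("ID") call from the original is omitted for brevity:
import pandas as pd

colNames = ["ID", "Name", "Gender", "Height", "Weight"]
df1 = pd.DataFrame(columns=colNames)

i = df1.shape[0]
# One dictionary -> one row; a list of single-key dictionaries -> one row per key.
person = [{"ID": i, "Name": "Jack", "Gender": "Male", "Height": 177, "Weight": 75}]

df1 = pd.concat([df1, pd.DataFrame(person, columns=colNames)], ignore_index=True)
print(df1)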

How reindex_like function works with method "ffill" & "bfill"?

I have two dataframes of shape (6,3) and (2,3). Now I want to reindex the second dataframe like the first one and fill the NaN values with either the ffill or the bfill method. My code is as follows:
df1 = pd.DataFrame(np.random.randn(6,3),columns = ['Col1','Col2','Col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns = ['Col1','Col2','Col3'])
df2 = df2.reindex_like(df1,method='ffill')
But this code is not working as expected; I am getting the following result:
Col1 Col2 Col3
0 0.578282 -0.199872 0.468505
1 1.086811 -0.707933 -0.924984
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
Any suggestion would be great.

Pandas append returns DF with NaN values

I'm appending data from a list to pandas df. I keep getting NaN in my entries.
Based on what I've read, I think I might have to specify the data type for each column in my code.
dumps = []
features_df = pd.DataFrame()
for i in range(int(len(ids)/50)):
    dumps = sp.audio_features(ids[i*50:50*(i+1)])
    for i in range(len(dumps)):
        print(list(dumps[0].values()))
        features_df = features_df.append(list(dumps[0].values()), ignore_index=True)
Expected results, something like-
[0.833, 0.539, 11, -7.399, 0, 0.178, 0.163, 2.1e-06, 0.101, 0.385, 99.947, 'audio_features', '6MWtB6iiXyIwun0YzU6DFP', 'spotify:track:6MWtB6iiXyIwun0YzU6DFP', 'https://api.spotify.com/v1/tracks/6MWtB6iiXyIwun0YzU6DFP', 'https://api.spotify.com/v1/audio-analysis/6MWtB6iiXyIwun0YzU6DFP', 149520, 4]
for one row.
Actual-
danceability energy ... duration_ms time_signature
0 NaN NaN ... NaN NaN
1 NaN NaN ... NaN NaN
2 NaN NaN ... NaN NaN
3 NaN NaN ... NaN NaN
4 NaN NaN ... NaN NaN
5 NaN NaN ... NaN NaN
For all rows
Calling append() in a tight loop isn't a great way to do this. Rather, you can construct an empty DataFrame and then use loc to specify an insertion point; the row's index label is used as the insertion key.
For example:
import pandas as pd
df = pd.DataFrame(data=[], columns=['n'])
for i in range(100):
    df.loc[i] = i
print(df)
time python3 append_df.py
n
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
real 0m13.178s
user 0m12.287s
sys 0m0.617s
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
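A minimal sketch of that list-then-concat pattern, using made-up dictionaries whose keys mirror a few of the audio-feature fields from the question:
import pandas as pd

# Collect each row as a plain dict in a list, then build the frame once.
rows = []
for i in range(5):
    rows.append({'danceability': 0.8, 'energy': 0.5, 'tempo': 99.9 + i})

features_df = pd.DataFrame(rows)
# or, to extend an existing frame:
# features_df = pd.concat([features_df, pd.DataFrame(rows)], ignore_index=True)
print(features_df)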

Pandas 0.17.1 dataframe manipulation: referring to a column and writing to csv

Hi friends, I am new to programming and I am having a difficult time accessing this dataframe that I read from an HTML page.
I just want to print a specific column, but because the columns are unnamed, everything I've tried to access them throws errors. This is what I've tried so far to print them out:
print [data{'Unnamed: 0'}]
[ Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 \
0 Pinnacle NaN NaN NaN NaN
1 10/25 02:06:10pm NaN #101 Miami NaN
2 10/25 02:06:10pm NaN #102 New England NaN
3 10/25 04:40:03pm NaN #101 Miami NaN
4 10/25 04:40:04pm NaN #102 New England 8½-05
5 10/25 04:40:12pm NaN #101 Miami NaN
6 10/25 04:40:12pm NaN #102 New England 8½ev
Also I tried to write them down to a csv file and I get this error:
AttributeError: 'list' object has no attribute 'to_csv'
and here is the code I am using:
import pandas as pd
url = 'http://exoweb0.donbest.com/checkHTML/servlet/send_archive_history.servlet?by_casino=1&for_casino=37&league=1&game=0&date=20151029'
data = pd.read_html(url, header=0)
data.to_csv('Pinacle Lines.csv', index_col=0)
#print (data)
try this:
data = pd.read_html(url, header=0)[0]
You get a list of DataFrames back when you call read_html, so you need to figure out which one you want. The edit above selects the first one; you might want to look at all of them.
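Putting that together with the CSV step, a minimal sketch (the URL and output filename are the question's own; note that to_csv takes index, not index_col):
import pandas as pd

url = 'http://exoweb0.donbest.com/checkHTML/servlet/send_archive_history.servlet?by_casino=1&for_casino=37&league=1&game=0&date=20151029'

tables = pd.read_html(url, header=0)  # read_html always returns a list of DataFrames
data = tables[0]                      # pick the table you want
data.to_csv('Pinacle Lines.csv', index=False)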
