Tried to convert dataframe to series in Jupyter - python-3.x

I loaded the following CSV into a DataFrame:
I ran this loop over it:
The result:
When I print s outside the loop, it shows only the volume column and not the others:

This is the expected behavior, because each iteration assigns a new Series to s instead of appending to it. At the end of the for loop, s holds only the Series for the last column, volume.
You can take a look at this page to learn more about appending Series to an existing series.
In short, replace
s = pd.Series(df2[column])
with
s = s.append(pd.Series(df2[column]))
although I'm not sure why you would want to do that. The documentation shows that you can also reindex while appending:
s = s.append(pd.Series(df2[column]), ignore_index=True)
Note that Series.append was deprecated in pandas 1.4 and removed in pandas 2.0. On current versions, use pd.concat([s, df2[column]], ignore_index=True) instead.
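A minimal sketch of collecting every column into one Series with pd.concat, which also works on pandas versions where Series.append is unavailable. The DataFrame df2 and its column names below are made up for illustration, since the original CSV is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the asker's df2; the real CSV is not shown.
df2 = pd.DataFrame(
    {"open": [1.0, 2.0], "close": [1.5, 2.5], "volume": [100, 200]}
)

# Collect the column Series in a list, then concatenate once
# instead of re-assigning s on every iteration.
parts = [df2[column] for column in df2.columns]
s = pd.concat(parts, ignore_index=True)

print(s)
```

Concatenating once at the end avoids building a new Series on every pass through the loop.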

Related

Pandas: Add missing column name

I have a CSV file that is missing the first two column names, so it looks like this:
,,MedKD,AvgKD,Q,AKD,PKD(.2),GPKD(.2),FS
Genomics,First Query,0.007704,0.008301,0.005379,0.002975,0.000551,0.000547,0.000100
Genomics,PayOff,44,51,62,17,82,3,-
Genomics,Convergence,-,-,*,*,4062,*,-
Genomics,Robustness,-,-,8E-07,3E-07,5E-08,1E-08,-
Genomics,Time,0.968427,1.007088,1.445911,0.313453,1.088019,1.142605,2.277355
Power,First Query,0.000084,0.000049,0.000034,0.000035,0.000016,0.000016,0.000014
Power,PayOff,4,2,2,1,0,0,-
Power,Convergence,-,-,7,12,16,1993,-
Power,Robustness,-,-,9E-11,7E-11,2E-11,2E-11,-
Power,Time,0.017216,0.023430,0.019345,0.022128,0.017878,0.019799,0.044971
When I read it into a pandas DataFrame, it looks like this:
I have tried all sorts of ways to name the first two columns, but every time I only overwrote the other names and left the first two untouched.
Example:
m.columns = ['Dataset', 'Measure'] + m.columns[2:].tolist()
This only shifts all the other names to the right, starting with MedKD.
How can I insert those two names using pandas?
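One possible approach, sketched under the assumption that the file is read with a plain pd.read_csv call: pandas labels empty header cells "Unnamed: 0", "Unnamed: 1", and so on, so those two labels can be renamed directly. The CSV text below is a shortened copy of the data in the question. If the file was read differently (for example with index_col), the labels may differ.

```python
import io

import pandas as pd

# Shortened copy of the CSV from the question; the first two header cells are empty.
csv_text = """,,MedKD,AvgKD,Q
Genomics,First Query,0.007704,0.008301,0.005379
Genomics,PayOff,44,51,62
Power,Time,0.017216,0.023430,0.019345
"""

m = pd.read_csv(io.StringIO(csv_text))

# Rename only the two auto-generated labels; the other columns keep their names.
m = m.rename(columns={"Unnamed: 0": "Dataset", "Unnamed: 1": "Measure"})

print(m.columns.tolist())
```

Renaming by label touches only the two placeholder columns, so the remaining headers cannot shift.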

Pandas DataFrame indexing problem from 40,000 to 49,999

I have a strange problem with my code (at least it is strange to me).
I have a pandas DataFrame called "A". One of its columns is named "asin". I want to extract all rows that match a specific value, so I wrote this simple code:
df2 = A[A['asin']=='B0000UYZG0']
It works as expected, except for rows 40,000 to 49,999.
It does not work on that range at all.
Refer to the picture: df2 = A[A['asin']=='0077614992'] (row 50,000) works, but df2 = A[A['asin']=='B0006O0WW6'] (row 49,999) does not.
I have not tried all 10,000 rows, but I tested them at random and none returned a result.
I have grown accustomed to fixing bugs like this one. Usually the cause is an unexpected dtype, or the string you see displayed is not exactly the string that is stored. Your issue looks like the second case.
So let's first strip any whitespace from your string column.
df2['asin'] = df2.asin.str.strip()
# Assuming df2 is the DataFrame on which the filter fails
After that, try rerunning your filter:
df2[df2['asin'].eq('0077614992')]
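A small sketch of the whitespace scenario described above. The data is made up: one value carries a trailing space, which is only a guess at what affects the asker's rows.

```python
import pandas as pd

# Made-up data: the second value has a hidden trailing space.
A = pd.DataFrame({"asin": ["0077614992", "B0006O0WW6 ", "B0000UYZG0"]})

# An exact comparison misses the padded value.
before = A[A["asin"] == "B0006O0WW6"]

# repr() makes hidden characters visible when inspecting the column.
print(A["asin"].map(repr).tolist())

# Strip surrounding whitespace, then filter again.
A["asin"] = A["asin"].str.strip()
after = A[A["asin"] == "B0006O0WW6"]

print(len(before), len(after))
```

Printing the repr of a few values from the failing range is a quick way to check whether padding or another invisible character is present before changing the data.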

How to reformat the output from iexfinance stock.get_financials(). The current output is a 3D nested dictionary, not a dataframe

According to the documentation (https://addisonlynch.github.io/iexfinance/stable/), the default output format for endpoint requests is a DataFrame.
However, the following sample code returns a nested dictionary (2x1x70):
from iexfinance.stocks import Stock
stocks=Stock(['NEM','FCX'],token='my_live_token',output_format='pandas')
fins_data=stocks.get_financials(period='annual')
print(fins_data)
Is this a standard nested dictionary?
The target output is a DataFrame with two rows, indexed by the outer keys (the stock tickers, here 'NEM' and 'FCX'). The remaining keys in the dictionary output are the column headings (there appears to be no heading for the ticker, since it is the index/key).
I would expect the same format as the one you get when you run the following:
from iexfinance.stocks import Stock
stocks=Stock(['NEM','FCX'],token='my_live_token')
co_info=stocks.get_company()
print(co_info)
Any ideas how to convert the output from get_financials() into a usable DataFrame?
I added the following:
fins_dict=stocks.get_financials(period='annual')
fins_data = pd.concat(fins_dict, axis=0, keys=fins_dict.keys())
fins_data.index = fins_data.index.droplevel(1)
print(type(fins_data))
This seems to strip away the outer dictionary, leave the inner DataFrame, and remove the date index. IEX Cloud has moved its 'financials' data into a time-series library, which appears to add a time index to everything. I am not sure this is the correct solution, as there still seems to be some inconsistency in the data structures. Any insights appreciated.
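The concat step can be tried without an API token on a stand-in dictionary. This is only a sketch: the dictionary below is hypothetical (ticker mapped to a one-row DataFrame indexed by date), and the column names revenue and netIncome are placeholders, not the real iexfinance fields.

```python
import pandas as pd

# Hypothetical stand-in for the returned dict: ticker -> one-row DataFrame indexed by date.
date_index = pd.to_datetime(["2020-12-31"])
fins_dict = {
    "NEM": pd.DataFrame({"revenue": [100], "netIncome": [10]}, index=date_index),
    "FCX": pd.DataFrame({"revenue": [200], "netIncome": [20]}, index=date_index),
}

# Passing a dict to concat uses its keys as the outer index level.
fins_data = pd.concat(fins_dict, axis=0)

# Drop the inner (date) level so each row is indexed by ticker alone.
fins_data.index = fins_data.index.droplevel(1)

print(fins_data)
```

If the dates need to be kept, fins_data.reset_index(level=1) before dropping would move them into a regular column instead.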

Slice copy error while using two data frames and updating a column of one from the other in pandas

I'm trying to compare two DataFrames and fill values from one into a new column of the other.
I have used the following code:
df['location']=df1['location']
for i in range(0,len(df)):
    for j in range(0,len(df1)):
        if df['Name'][i]==df1['Name'][j]:
            df['location'][i] =(df1['location'][j])
The DataFrames are listed below.
I am getting the following error.
<ipython-input-14-7b3141ebb9f0>:7: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['location'][i] =(df1['location'][j])
I am able to get the desired output despite the warning.
result :
If I use the following command, I can suppress this warning:
pd.options.mode.chained_assignment = None
Is there a way to avoid the warning without using that command? Thanks in advance.
Data
df=pd.DataFrame({'Name':['A','B','C'], 'location':['South','north','east']})
df1=pd.DataFrame({'Name':['A','B','C','A','B','C'], 'count':[1,2,3,4,5,6]})
df
Build a dict for reference
d=dict(zip(df.Name,df.location))
Map to transfer
df1['location']=df1.Name.map(d)
Output
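Put together, the mapping approach can be run as the short script below. It reuses the sample data from the answer and sets the whole column in one step, so no chained assignment takes place.

```python
import pandas as pd

# Sample data from the answer above.
df = pd.DataFrame({"Name": ["A", "B", "C"], "location": ["South", "north", "east"]})
df1 = pd.DataFrame({"Name": ["A", "B", "C", "A", "B", "C"], "count": [1, 2, 3, 4, 5, 6]})

# Lookup table: Name -> location.
d = dict(zip(df["Name"], df["location"]))

# Assign the whole column at once; names missing from d would become NaN.
df1["location"] = df1["Name"].map(d)

print(df1)
```

If an element-wise assignment is still needed, df.loc[i, 'location'] = value sets the cell in a single indexing step rather than through df['location'][i].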

I'm getting a Key Error Message after using a For loop and then trying to access the columns by name in a Python Series

I hope someone can help me here. I'm pretty new to Python and I got stuck on a for loop that creates a couple of time shifts for my datetime Series. After iterating over the shifts, when I try to access the new columns by name to calculate the percentage change, I get a KeyError.
Here is what my code looks like:
i=1
x=50
for i in range (x):
    df_data_1['visits_lag_',i] = df_data_1['visits'].shift(i)
The output looks like the following:
df.dtypes
Now, if I want to calculate with or access one of the newly created columns, I get a KeyError:
df_data_1['percent_change_test'] = (df_data_1['visits']/df_data_1['(visits_lag_, 1)'])*100
It says:
Can anyone please help me see what I'm doing wrong?
I think the problem is related to how you call the newly created column.
Instead of:
df_data_1["(visits_lag_, 1)"]
Try to do:
df_data_1[("visits_lag_", 1)]
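As an alternative to tuple keys, the column names can be built as plain strings, so that they can be looked up by an ordinary name afterwards. This is a sketch on made-up visit counts, not the asker's data, and it uses only two lags to keep the output short.

```python
import pandas as pd

# Made-up visit counts standing in for the asker's data.
df_data_1 = pd.DataFrame({"visits": [10, 20, 30, 40]})

for i in range(1, 3):
    # An f-string yields a single string label such as "visits_lag_1",
    # whereas df['visits_lag_', i] creates the tuple label ('visits_lag_', i).
    df_data_1[f"visits_lag_{i}"] = df_data_1["visits"].shift(i)

df_data_1["percent_change_test"] = (
    df_data_1["visits"] / df_data_1["visits_lag_1"] * 100
)

print(df_data_1)
```

The first row of each lag column is NaN because shift has no earlier value to pull from, so the first percentage is NaN as well.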
