Passing data from a for loop to a dataframe - python-3.x

I have just started out learning few things in python, I am stuck in between.
import yfinance as yf
import pandas as pd
import yahoo_fin.stock_info as si
ticker = ['20MICRONS.NS', '21STCENMGM.NS', '3IINFOTECH.NS', '3MINDIA.NS', '3PLAND.NS']
for i in ticker:
try:
quote = si.get_quote_table(i)
df = pd.DataFrame.from_dict(quote.items())
df = df.append(quote.items(), ignore_index=True)
except (ValueError, IndexError,TypeError):
continue
print(df)
Just for example: The value of i has more than 4 entries, whenever I am exiting the loop this data has to be added or should be appended in the dataframe.
But for some reason the dataframe is not appending these values.
Thanks in advance

You defined df within the loop, which means that a new dataframe is initialised in df at each iteration. Define a new dataframe in df before the loop and append to the df in the loop. Please add the information that you provided in the comments to the question.

Related

Is there a way to list the rows and columns in a pandas DataFrame that are empty strings?

I have a 1650x40 dataframe that is a matrix of people who worked on projects each day. It looks like this:
import pandas as pd
df = pd.DataFrame([['bob','11/1/19','X','','',''], ['pete','11/1/19','X','','',''],
['wendy','11/1/19','','','X',''], ['sam','11/1/19','','','',''],
['cara','11/1/19','','','X','']],
columns=['person', 'date', 'project1','project2','project3','project4'])
I am trying to sanity check the data by:
listing any columns that do not have an X in them (in this case
'project2' and 'project4')
listing any rows that do not have an X in them (in this case
'sam')
Desired outcome:
Something like df.show_empty(columns) returns ['project2','project4'] and df.show_empty(rows) returns ['sam']
Obviously the this method would need some way to tell it that the first two columns are not expected to be empty and they should be ignored.
My desired outcome above would return lists of column headings (or row indexes) so that I could go back and check my data and application to find out why there's no entry in the relevant cell (I am guessing there's a good chance that more than one row or column are affected). This seems like it should be trivial but I'm really stuck with figuring this out.
Thanks for any help offered!
For me, it is easier to use apply to accomplish this task. The working code is shown below:
import pandas as pd
df = pd.DataFrame([['bob','11/1/19','X','','',''], ['pete','11/1/19','X','','',''],
['wendy','11/1/19','','','X',''], ['sam','11/1/19','','','',''],
['cara','11/1/19','','','X','']],
columns=['person', 'date', 'project1','project2','project3','project4'])
import numpy as np
df = df.replace('', np.NaN)
colmns = df.apply(lambda x: x.count()==0, axis=0)
df[colmns.index[colmns]]
df[df.apply(lambda x: x[2:].count()==0, axis=1)]
df = df.replace('', np.NaN) will replace the '' with NaN, so that we can use count() function.
colmns = df.apply(lambda x: x.count()==0, axis=0): this will find the columns that are all NaN.
df[df.apply(lambda x: x[2:].count()==0, axis=1)]: this will ignore the first two columns.

log transformation of whole dataframe using numpy

I have a dataframe in python which is made using the following code:
import pandas as pd
df = pd.read_csv('myfile.txt', sep="\t")
df1 = df.iloc[:, 3:]
now in df1 there are 24 columns. I would like to transform the values to log2 value and make a new dataframe in which there are 24 columns with log value of original dataframe. to do so I used numpy.log like the following line:
df2 = (numpy.log(df1))
this code does not return what I would like to get. do you know how to fix it?

Convert panda column to a string

I am trying to run the below script to add to columns to the left of a file; however it keeps giving me
valueError: header must be integer or list of integers
Below is my code:
import pandas as pd
import numpy as np
read_file = pd.read_csv("/home/ex.csv",header='true')
df=pd.DataFrame(read_file)
def add_col(x):
df.insert(loc=0, column='Creation_DT', value=pd.to_datetime('today'))
df.insert(loc=1, column='Creation_By', value="Sean")
df.to_parquet("/home/sample.parquet")
add_col(df)
Any ways to make the creation_dt column a string?
According to pandas docs header is row number(s) to use as the column names, and the start of the data and must be int or list of int. So you have to pass header=0 to read_csv method.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Also, pandas automatically creates dataframe from read file, you don't need to do it additionally. Use just
df = pd.read_csv("/home/ex.csv", header=0)
You can try:
import pandas as pd
import numpy as np
read_file = pd.read_csv("/home/ex.csv")
df=pd.DataFrame(read_file)
def add_col(x):
df.insert(loc=0, column='Creation_DT', value=str(pd.to_datetime('today')))
df.insert(loc=1, column='Creation_By', value="Sean")
df.to_parquet("/home/sample.parquet")
add_col(df)

Trying to name multiple indexes getting Key error

Good morning,
I'm using python 3.6. I'm trying to name my index (see last line in code below) because I plan on joining to another DataFrame. The DataFrame should be multi-indexed. The index is the first two columns ('currency' and 'rtdate') and the data
rate
AUD 2010-01-01 0.897274
2010-02-01 0.896608
2010-03-01 0.895943
2010-04-01 0.895277
2010-05-01 0.894612
This is the code that I'm running:
import pandas as pd
import numpy as np
import datetime as dt
df=pd.read_csv('file.csv',index_col=0)
df.index = pd.to_datetime(df.index)
new_index = pd.date_range(df.index.min(),df.index.max(),freq='MS')
df=df.reindex(new_index)
df=df.interpolate().unstack()
rate = pd.DataFrame(df)
rate.columns = ['rate']
rate.set_index(['currency','rtdate'],drop=False)
Running this throw's an error message:
KeyError: 'currency'
What am I missing.
Thanks for the assistance
I think you need to set the names of the levels of MultiIndex by using rename_axis first and then reset_index for columns from MultiIndex:
So you'd end up with this:
rate = df.interpolate().unstack().set_axis(('currency','rtdate')).reset_index()
instead of this:
df=df.interpolate().unstack()
rate = pd.DataFrame(df)
rate.columns = ['rate']
rate.set_index(['currency','rtdate'],drop=False)

Pandas puts all data from a row in one column

I have another problem with csv. I am using pandas to remove duplicates from a csv file. After doing so I noticed that all data has been put in one column (preprocessed data has been contained in 9 columns). How to avoid that?
Here is the data sample:
39,43,197,311,112,88,47,36,Label_1
Here is the function:
import pandas as pd
def clear_duplicates():
df = pd.read_csv("own_test.csv", sep="\n")
df.drop_duplicates(subset=None, inplace=True)
df.to_csv("own_test.csv", index=False)
Remove sep, because default separator is , in read_csv:
def clear_duplicates():
df = pd.read_csv("own_test.csv")
df.drop_duplicates(inplace=True)
df.to_csv("own_test.csv", index=False)
Maybe not so nice, but works too:
pd.read_csv("own_test.csv").drop_duplicates().to_csv("own_test.csv", index=False)

Resources