Using pytrends library with PowerBI - powerbi-desktop

I am trying to extract data from Google Trends by using the pytrends library to analyze it in MS PowerBI by using the following script:
import pandas as pd
from pytrends.request import TrendReq
pytrends = TrendReq()
data = pd.DataFrame()
kw_list = ["Bitcoin", "Ethereum"]
pytrends.build_payload(kw_list, timeframe='today 3-m')
data = pytrends.interest_over_time()
print(data)
When I run this simple script in Power BI, the date column disappears. How can I include the date column?

import pandas as pd
from pytrends.request import TrendReq
pytrends = TrendReq()
data = pd.DataFrame()
kw_list = ["Bitcoin", "Ethereum"]
pytrends.build_payload(kw_list, timeframe='today 3-m')
data = pytrends.interest_over_time()
data.reset_index(inplace=True)
print(data)
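The date column is the DataFrame's index, so you just need to add the second-to-last line, data.reset_index(inplace=True), which turns the index into a regular column.
I hope this works for you.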
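To see what reset_index does without calling Google Trends, here is a minimal sketch on a hand-made frame. The values and the index name "date" are assumptions standing in for what interest_over_time() returns; only the index-to-column behaviour is the point.

```python
import pandas as pd

# Hypothetical stand-in for the pytrends result: the dates live in the index.
dates = pd.date_range("2021-01-01", periods=3, freq="D", name="date")
data = pd.DataFrame(
    {"Bitcoin": [50, 60, 55], "Ethereum": [20, 25, 30]},
    index=dates,
)

# Before: "date" is not among the columns, only in the index.
print(list(data.columns))

# After: reset_index() moves the index into an ordinary first column.
flat = data.reset_index()
print(list(flat.columns))
```

If the date column is still missing in Power BI after this, it is worth checking that the variable you select in the navigator is the one you called reset_index on.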

Related

Yahoo Finance: unable to read URL

I have been using the following code for a while to extract stock price from yahoo finance. This code is now generating an error saying it cannot read the url.
import pandas_datareader.data as web
stock = web.DataReader(i_allStock+'.L', 'yahoo', start, end)
Has anyone had this problem and found a solution?
Try it like this.
from math import sqrt
from sklearn.cluster import MiniBatchKMeans
import pandas_datareader as dr
from matplotlib import pyplot as plt
import pandas as pd
import matplotlib.cm as cm
import seaborn as sn
start = '2019-1-1'
end = '2020-1-1'
tickers = ['AXP','AAPL','BA','CAT','CSCO','CVX','XOM','GS','HD','IBM','INTC','JNJ','KO','JPM','MCD', 'MMM', 'MRK', 'MSFT', 'NKE','PFE','PG','TRV','UNH','RTX','VZ','V','WBA','WMT','DIS','DOW']
prices_list = []
for ticker in tickers:
    try:
        prices = dr.DataReader(ticker,'yahoo',start)['Adj Close']
        prices = pd.DataFrame(prices)
        prices.columns = [ticker]
        prices_list.append(prices)
    except:
        pass
prices_df = pd.concat(prices_list,axis=1)
prices_df.sort_index(inplace=True)
prices_df.head()
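The loop above collects one single-column frame per ticker and then joins them side by side. Here is a small offline sketch of just that combining step, with made-up prices instead of a DataReader call (the tickers and numbers are purely illustrative):

```python
import pandas as pd

# Made-up closing prices; in the answer above these come from DataReader.
fake_prices = {
    "AAPL": pd.Series(
        [10.0, 11.0, 12.0],
        index=pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04"]),
    ),
    "IBM": pd.Series(
        [20.0, 21.0],
        index=pd.to_datetime(["2019-01-03", "2019-01-04"]),
    ),
}

prices_list = []
for ticker, series in fake_prices.items():
    # One column per ticker, named after the ticker.
    prices_list.append(series.to_frame(name=ticker))

# axis=1 aligns the frames on their date index; missing dates become NaN.
prices_df = pd.concat(prices_list, axis=1).sort_index()
print(prices_df)
```

Because the frames are aligned on the index, a ticker with a shorter history simply gets NaN for the dates it lacks rather than shifting the other columns.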
You can put all the tickers in a single list, and yfinance will retrieve them all at once.
import yfinance as yf
etf = ['AXP','AAPL','BA','CAT','CSCO','CVX','XOM','GS','HD','IBM','INTC','JNJ','KO']
tit = yf.download(tickers=etf, period='max')

Import dataset from url and convert text to csv in python3

I am pretty new to Python (using Python 3) and am using pandas to import a dataset.
I need to import the dataset from this URL: https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt
and convert it to a CSV file, but the converted CSV contains some special characters: ��
I download the txt file and then convert it to CSV. Is this the right approach?
Also, the converted CSV puts the entire text into one column.
from urllib.request import urlretrieve
import pandas as pd
from pandas import DataFrame
url = 'https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt'
urlretrieve(url, 'index.txt')
df = pd.read_csv('index.txt', sep='/t', engine='python', lineterminator='\r\n')
csv_file = df.to_csv('index.csv', sep='\t', index=False, header=True)
print(csv_file)
After a successful import, I have to extract Y as the first column and X as all columns except the first.
I would appreciate any help.
from urllib.request import urlretrieve
import pandas as pd
url = 'https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt'
urlretrieve(url, 'index.txt')
df = pd.read_csv('index.txt', sep='\t',encoding='utf-16')
Y = df[['REMISS']]
X = df.drop(['REMISS'],axis=1)
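Why encoding='utf-16' helps can be shown without downloading anything. The sketch below assumes the file is tab-separated UTF-16 text with a byte-order mark, which is what the fix above suggests; the sample rows are made up. Decoding with the wrong codec leaves the BOM bytes behind as junk characters, which is most likely what shows up as �� in the question.

```python
import csv
import io

# Hypothetical sample standing in for index.txt: tab-separated, UTF-16 with a BOM.
raw = "REMISS\tCELL\tSMEAR\n1\t0.8\t0.83\n0\t0.9\t0.36\n".encode("utf-16")

# Wrong codec: the two BOM bytes survive as stray characters at the start.
wrong = raw.decode("latin-1")
# Right codec: the BOM is consumed and the text comes out clean.
right = raw.decode("utf-16")

# Split on real tab characters ("\t", not "/t") to get separate columns.
rows = list(csv.reader(io.StringIO(right), delimiter="\t"))
header, records = rows[0], rows[1:]

# Y is the first column, X is everything else.
y = [row[0] for row in records]
X = [row[1:] for row in records]
print(header, y, X)
```

The same two ingredients, the correct encoding and a real tab separator, are what the pandas call in the answer passes to read_csv.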

how to solve the keyerror when I load a CSV file using pandas

I use pandas to load a CSV file and want to print out one column of it. Here is the original data:
[screenshot: original data]
I want to print out the 'violence' data to make a bar chart, but a KeyError occurs. Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
c_data=pd.read_csv('crime.csv')
print(c_data.head())
print (c_data['violence'])
and the error:
[screenshot: error detail]
I also tried the upper-case name, print(c_data['VIOLENCE']), but that failed as well:
[screenshot: error detail]
Can someone tell me how to fix this?
Try the following if your data is small:
import csv
with open('crime.csv', 'r') as my_file:
    reader = csv.reader(my_file)
    rows = list(reader)
print(rows[3])
If your data is big, try this:
import csv
from itertools import islice
with open('crime.csv', 'r') as my_file:
    reader = csv.reader(my_file)
    print(next(islice(reader, 3, 4)))
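If you want to stay in pandas, a KeyError like this often means the header text is not exactly what you typed. The thread never shows the real header row, so the sketch below assumes stray spaces and upper case in the column names; the CSV content is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for crime.csv: headers with stray spaces and upper case.
csv_text = "Year, VIOLENCE , PROPERTY\n2010,120,300\n2011,110,280\n"
c_data = pd.read_csv(io.StringIO(csv_text))

# Printing the list shows the exact header strings, including hidden spaces.
print(c_data.columns.tolist())

# Normalise the names once, then index with the clean name.
c_data.columns = c_data.columns.str.strip().str.lower()
violence = c_data["violence"]
print(violence.tolist())
```

If the printed column list shows a different problem (for example, everything in one column because of the wrong separator), the fix belongs in the read_csv call instead.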

LabVIEW TDMS file read with python pandas

How can I read a standard labVIEW generated TDMS file using python?
For the benefit of the community, I am posting the sample code I have used to read a *.tdms file into a pandas DataFrame efficiently. After multiple trials, I simplified the code for ease of use and documentation.
#import required libraries
from nptdms import TdmsFile
import numpy as np
import pandas as pd
#bokeh plots
from bokeh.plotting import figure, output_file, show
from bokeh.io import output_notebook
#load the tdms file
tdms_file = TdmsFile("/Volumes/Data/dummy/sample.tdms")
#split all the tdms grouped channels to a separate dataframe
#tdms_file.as_dataframe()
for group in tdms_file.groups():
    grp1_data = tdms_file.object('grp1').as_dataframe()
    grp2_data = tdms_file.object('grp2').as_dataframe()
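#plot the data on bokeh plots
# Note: bokeh.charts only exists in old Bokeh releases (it was later split out as bkcharts),
# and the name `bokeh` must be imported explicitly for the calls below to resolve.
import bokeh.charts
import bokeh.io
# Use Bokeh chart to make plot
p = bokeh.charts.Line(grp1_data, x='time', y='values', color='parameter', xlabel='time (h)', ylabel='values')
# Display it
bokeh.io.show(p)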
Suggestions and improvements are welcome.
For clarity, I would further simplify the answer by Sundar to:
from nptdms import TdmsFile
tdms_file = TdmsFile(r"path_to_.tdms")
for group in tdms_file.groups():
    df = tdms_file.object(group).as_dataframe()
    print(df.head())
    print(df.keys())
    print(df.shape)
That will read the different groups of the tdms into pandas dataframes.
This worked for me:
import pandas as pd
from nptdms import TdmsFile
tdms_file = TdmsFile("path/to/tdms_file.tdms")
df = tdms_file['group'].as_dataframe()
print(df.head())
print(df.keys())
print(df.shape)
At least as of version 1.1.0, npTDMS TdmsFile objects have no object method, which the previous examples here rely on.
Here is a combination of the answers given by Joris and ax7ster, for npTDMS v1.3.1:
import nptdms
from nptdms import TdmsFile
print(nptdms.__version__)
fn = 'foo.tdms'
tdms_file = TdmsFile(fn)
for group in tdms_file.groups():
    df = group.as_dataframe()
    print(group.name)
    print(df.head())
    print(df.keys())
    print(df.shape)
This reads all the groups in the TDMS file and doesn't require group names to be known beforehand.
It is also possible to convert the whole TDMS file into one DataFrame; see the example below.
from nptdms import TdmsFile
fn = 'foo.tdms'
tdms_file = TdmsFile(fn)
df = tdms_file.as_dataframe()

Using PyFolio alongside Pandas

I aim to do a time series analysis of financial data. Since I am working on the Pakistan Stock Exchange (PSX), the data is not available on Yahoo. In the tutorials I looked at on Quantopian, the first step, data extraction, is done through Yahoo Finance.
When I use the PyFolio module on data read from a CSV file with pandas' read_csv, I run into a mismatch between the datetime formats of pandas and PyFolio. Below is the code of what I am doing.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pyfolio as pf
import datetime
from datetime import datetime
from datetime import timedelta
start_date = '2015-02-01'
end_date = '2017-03-20'
live_date = '2017-03-15'
symbols = ['KEL']
def converter(start_date):
    convert=datetime.strptime(start_date, "%Y-%m-%d")
    return convert
def data(symbols):
    dates=pd.date_range(start_date,end_date)
    df=pd.DataFrame(index=dates)
    df_temp=pd.read_csv('/home/furqan/Desktop/python_data/{}.csv'.format(str(symbols[0])),usecols=['Date','Close'],
                        parse_dates=True,index_col='Date',na_values=['nan'])
    df_temp = df_temp.rename(columns={'Close': symbols[0]})
    df=df.join(df_temp)
    df=df.fillna(method='ffill')
    df=df.fillna(method='bfill')
    return df
new_date = converter (live_date)
df= data(symbols)
sheet = pf.create_returns_tear_sheet(df, live_start_date=new_date)
The above code leads to the following error:
TypeError: Cannot compare tz-naive and tz-aware timestamps
Given the above information, I have two questions.
1) Can Quantopian be of any use for my analysis if the data is on my PC, since it is not available on Yahoo Finance?
2) What exactly does the above error mean, and how can I fix it?
For reference, below are the links to the PyFolio and pandas documentation.
https://quantopian.github.io/pyfolio/notebooks/single_stock_example/#fetch-the-daily-returns-for-a-stock
http://pandas.pydata.org/pandas-docs/stable/
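I got around this problem by adding timezone information to my series. If you know the timezone of your datetime index, you can localize it as shown below. Note that tz_localize returns a new object rather than modifying df in place, so assign the result:
df = df.tz_localize('UTC')
I hope it helps.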
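Here is a small sketch of what the error means and why localizing fixes it, using plain pandas rather than PyFolio. The series values are made up, and the assumption is that the comparison fails because one side carries a timezone (UTC here) while the CSV-derived index does not.

```python
import pandas as pd

# Made-up daily returns with a tz-naive index, as read_csv would produce.
idx = pd.date_range("2017-03-13", periods=5, freq="D")
returns = pd.Series([0.01, -0.02, 0.0, 0.03, 0.01], index=idx)

# A tz-aware cut-off date, standing in for the aware timestamp in the error.
live_start = pd.Timestamp("2017-03-15", tz="UTC")

try:
    returns.index < live_start  # naive vs aware: pandas refuses to order them
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print(exc)

# tz_localize returns a new series; the original stays tz-naive.
returns_utc = returns.tz_localize("UTC")
in_sample = returns_utc[returns_utc.index < live_start]
print(in_sample)
```

Localizing only labels the existing timestamps as UTC. If your data is really in another timezone, localize to that zone first and convert afterwards.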

Resources