I am new to Python.
After researching some code based on my idea, which is extracting historical stock data,
I now have working code (see below) for extracting an individual name and exporting it to a CSV file:
import investpy
import sys

sys.stdout = open("extracted.csv", "w")
df = investpy.get_stock_historical_data(stock='JFC',
                                        country='philippines',
                                        from_date='25/11/2020',
                                        to_date='18/12/2020')
print(df)
sys.stdout.close()
Now I'm trying to make it more advanced: I want to run this code automatically for multiple stock names (300-plus) and export each one to its own CSV file. I know this is possible, but I can't find the right terminology to search for. Hoping for your help.
Regards,
You can store the stock names in a list, then iterate through the list and save each DataFrame to a separate file:
import investpy

stocks_list = ['JFC','AAPL',....] # your stock lists

for stock in stocks_list:
    df = investpy.get_stock_historical_data(stock=stock,
                                            country='philippines',
                                            from_date='25/11/2020',
                                            to_date='18/12/2020')
    print(df)
    file_name = 'extracted_' + stock + '.csv'
    df.to_csv(file_name)  # keep the index: it holds the dates
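One practical note: with 300-plus symbols, a few lookups may fail or the remote server may throttle you, so it can be worth guarding each call. A minimal sketch of that idea (the try/except and the pause are my own additions, not something investpy requires):

import time
import investpy

stocks_list = ['JFC']  # plus the rest of your 300 symbols

for stock in stocks_list:
    try:
        df = investpy.get_stock_historical_data(stock=stock,
                                                country='philippines',
                                                from_date='25/11/2020',
                                                to_date='18/12/2020')
    except Exception as exc:
        print(stock, 'failed:', exc)  # skip names investpy cannot resolve
        continue
    df.to_csv('extracted_' + stock + '.csv')
    time.sleep(1)  # small pause between requests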
I have a question about the run-time efficiency of reading and concatenating files into a single DataFrame.
I have about 15 files, and I want to read each one, filter it, and concatenate it with the others.
Right now the average file size is 8,000 KB, and the code takes about 8 minutes to run.
So basically I want to ask if there is a faster way to do this.
Thanks in advance!
(The code is from another PC, so I copied it manually.)
import os
import pandas as pd

Path = ~mypath~  # placeholder
Fields = pd.read_excel("Fields.xlsx")
Variable = Fields["Variable"].values.tolist()
Segments = Fields["Segment"].values.tolist()
Components = Fields["Component"].values.tolist()

li = []
for file in os.listdir(Path):
    if file.endswith(".xlsx"):
        df = pd.read_excel(os.path.join(Path, file))
        li.append(df[(df["Variable"].isin(Variable)) &
                     (df["Segment"].isin(Segments)) &
                     (df["Component"].isin(Components))])

Frame = pd.concat(li, axis=0, ignore_index=True)
EDIT:
Since I run the code on a VDI, performance is low; when I tried it on a local PC, the execution time was a quarter of the VDI's. I've tried to search for a faster method.
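No answer was posted for this one, but since each file is read and filtered independently, one thing worth trying is reading the workbooks in parallel processes so the slow Excel parsing overlaps. A sketch of that idea, under the assumption that the .xlsx files sit directly in mypath (the actual gain depends on your core count, since most of the time goes into parsing the Excel format itself):

import os
import pandas as pd
from concurrent.futures import ProcessPoolExecutor

PATH = "mypath"  # placeholder folder, as in the question

Fields = pd.read_excel("Fields.xlsx")
VARIABLES = set(Fields["Variable"])
SEGMENTS = set(Fields["Segment"])
COMPONENTS = set(Fields["Component"])

def read_and_filter(name):
    # each worker process reads and filters one workbook on its own
    df = pd.read_excel(os.path.join(PATH, name))
    return df[df["Variable"].isin(VARIABLES)
              & df["Segment"].isin(SEGMENTS)
              & df["Component"].isin(COMPONENTS)]

if __name__ == "__main__":
    files = [f for f in os.listdir(PATH) if f.endswith(".xlsx")]
    with ProcessPoolExecutor() as pool:
        parts = list(pool.map(read_and_filter, files))
    Frame = pd.concat(parts, axis=0, ignore_index=True)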
I've got a main folder, and inside it folders for different countries containing Excel files.
I was wondering if someone knew how I can read all these Excel files and use the subfolder/country name as a column value.
Then I am planning to concatenate all these files, as they all have the same structure.
Thanks
You can try something like this:
import pandas as pd
import pathlib

main_folder = './data'
data = []
# '**/*.xlsx' also descends into the country subfolders
for xlsxfile in pathlib.Path(main_folder).glob('**/*.xlsx'):
    df = pd.read_excel(xlsxfile)
    df['dirpath'] = xlsxfile.parent  # folder the file came from
    data.append(df)
df = pd.concat(data)
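One tweak worth noting: xlsxfile.parent stores the full directory path. If the goal is just the country folder name, xlsxfile.parent.name gives the last path component, e.g. df['country'] = xlsxfile.parent.name (the 'country' column name here is just illustrative).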
I want to check a YouTube video's views and keep track of them over time. I wrote a script that works great:
import requests
import re
import pandas as pd
from datetime import datetime
import time

def check_views(link):
    todays_date = datetime.now().strftime('%d-%m')
    now_time = datetime.now().strftime('%H:%M')
    # get the site
    r = requests.get(link)
    text = r.text
    tag = re.compile(r'\d+ views')
    views = re.findall(tag, text)[0]
    # get the digit number of views; it's returned in a list so I need to get that item out
    cleaned_views = re.findall(r'\d+', views)[0]
    print(cleaned_views)
    # append to the df
    df.loc[len(df)] = [todays_date, now_time, int(cleaned_views)]
    # df = df.append([todays_date, now_time, int(cleaned_views)], axis=0)
    df.to_csv('views.csv')
    return df

df = pd.DataFrame(columns=['Date', 'Time', 'Views'])

while True:
    df = check_views('https://www.youtube.com/watch?v=gPHgRp70H8o&t=3s')
    time.sleep(1800)
But now I want to use this function for multiple links, with a different CSV file for each link. So I made a dictionary:
link_dict = {'link1': 'https://www.youtube.com/watch?v=gPHgRp70H8o&t=3s',
             'link2': 'https://www.youtube.com/watch?v=ZPrAKuOBWzw'}
# this makes it easy for each csv file to be named for the corresponding link
The loop then becomes:
for key, value in link_dict.items():
    df = check_views(value)
That seems to work, passing the value of the dict (the link) into the function. Inside the function, I just made sure to load the correct CSV file at the beginning:
# existing csv files
df = pd.read_csv(k + '.csv')  # k is the dictionary key for this link
But then I'm getting an error when I go to append a new row to the df ("cannot set a row with mismatched columns"). I don't get it, since it works just fine in the code written above. This is the part giving me the error:
df.loc[len(df)] = [todays_date, now_time, int(cleaned_views)]
What am I missing here? This dictionary method seems like a super messy way to do it (I only have 2 links I want to check, but rather than just duplicate the function I wanted to experiment more). Any tips? Thanks!
Figured it out! The problem was that I was saving the df as a CSV and then trying to read that CSV back later. When I saved it, I didn't use index=False with df.to_csv(), so the file gained an extra column. When I was first testing with the dictionary, the script kept reusing the in-memory df to do the actual adding of rows, so the mismatch never showed up even though I was also saving to a CSV.
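For anyone landing here later, a condensed sketch of the corrected per-link version; passing the dictionary key in as the file name is my own wiring, and it assumes each per-link CSV already exists with the three columns:

import re
import time
import requests
import pandas as pd
from datetime import datetime

link_dict = {'link1': 'https://www.youtube.com/watch?v=gPHgRp70H8o&t=3s',
             'link2': 'https://www.youtube.com/watch?v=ZPrAKuOBWzw'}

def check_views(link, name):
    df = pd.read_csv(name + '.csv')  # this link's own history: Date, Time, Views
    todays_date = datetime.now().strftime('%d-%m')
    now_time = datetime.now().strftime('%H:%M')
    text = requests.get(link).text
    views = re.findall(r'\d+ views', text)[0]
    cleaned_views = re.findall(r'\d+', views)[0]
    df.loc[len(df)] = [todays_date, now_time, int(cleaned_views)]
    # index=False is the actual fix: no stray index column on the next read
    df.to_csv(name + '.csv', index=False)
    return df

while True:
    for key, value in link_dict.items():
        check_views(value, key)
    time.sleep(1800)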
I am trying to convert .pdf data to a spreadsheet. Based on some research, several people recommended transforming it into a CSV first in order to avoid errors.
So I wrote the code below, which gives me:
"TypeError: cannot concatenate object of type ''; only Series and DataFrame objs are valid"
The error appears at the pd.concat call.
import tabula
import pandas as pd
import glob

path = r'C:\Users\REC.AC'
all_files = glob.glob(path + "/*.pdf")
print(all_files)

df = pd.concat(tabula.read_pdf(f1) for f1 in all_files)
df.to_csv("output.csv", index=False)
Since this might be a common issue, I am posting the solution I found.
"""
df = []
for f1 in all_files:
df = pd.concat(tabula.read_pdf(f1))
"""
The underlying issue is that tabula.read_pdf returns a list of DataFrames (one per table it finds), so the original generator handed pd.concat lists instead of DataFrames. Collecting the tables first and concatenating once, as above, produces the DataFrame I needed.
I have one *.csv file which has 250,000 rows and 16 columns. I would like to copy two specific columns of this file to a new *.csv file with Python. All the suggested codes I found do this by writing a for loop, but as the data is big it is very slow. Can anyone help me do this without a loop?
The csv headers look like this: Image_ID, Image_Class, Age, ...
import pandas as pd

# the file already has a header row, so read_csv picks the column names up itself
data = pd.read_csv(file)
image_id_column = data.ImageID.tolist()
image_class_column = data.Image_class.tolist()
You can do something like this:
test = data.loc[:, ['ImageID', 'Image_class']]
test.to_csv('test.csv', index=False)  # index=False: don't write the row numbers as an extra column
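As a follow-up, if the 16-column file is large enough that memory matters, pandas can also skip the unwanted columns at read time with usecols; a small sketch, assuming the header names used above ('input.csv' stands in for your file):

import pandas as pd

# load only the two needed columns instead of all 16
data = pd.read_csv('input.csv', usecols=['ImageID', 'Image_class'])
data.to_csv('test.csv', index=False)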