I have this code, but I need to import all the rows in a loop.
I have an Excel file with URL links in one column, and I need to read those links to perform NLP on them. How can I use a loop to read them? Here's my attempt so far:
import requests
link = 'https://www.sec.gov/Archives/edgar/data/3662/0000950170-98-000413.txt'
f = requests.get(link)
print(f.text)
If I have understood your question correctly, you want to read an Excel file with some URLs and perform an operation on every single URL. In this case I would read the file with pandas. Assuming your file is named file.xlsx and has a column named url that contains all the URLs, you could do the following:
import pandas as pd
df = pd.read_excel('file.xlsx')
for url in df['url']:
    print(url)  # or do some useful stuff with it
import pandas as pd
import requests as req
# assuming column A has the url links
mydata = pd.read_excel('/your_file.xlsx', usecols="A", header=None)
for index, row in mydata.iterrows():
    print(req.get(row[0]))  # row is a Series; row[0] is the url string itself
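Either answer can be combined with the original requests snippet. A minimal sketch of the loop (the `fetch_all` name and the `get` parameter are just illustrative here; in practice you would pass `requests.get` as the getter):

```python
def fetch_all(urls, get):
    """Fetch every url with the given getter (e.g. requests.get)
    and return a {url: page_text} dict ready for NLP."""
    texts = {}
    for url in urls:
        response = get(url)
        texts[url] = response.text  # raw page text
    return texts

# Assuming file.xlsx has a column named "url":
# import pandas as pd, requests
# df = pd.read_excel('file.xlsx')
# texts = fetch_all(df['url'], requests.get)
```

Injecting the getter also makes the loop easy to test without hitting the network.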
I want to check a YouTube video's views and keep track of them over time. I wrote a script that works great:
import requests
import re
import pandas as pd
from datetime import datetime
import time
def check_views(link):
    todays_date = datetime.now().strftime('%d-%m')
    now_time = datetime.now().strftime('%H:%M')
    # get the site
    r = requests.get(link)
    text = r.text
    tag = re.compile(r'\d+ views')
    views = re.findall(tag, text)[0]
    # get the digit number of views; it's returned in a list, so get that item out
    cleaned_views = re.findall(r'\d+', views)[0]
    print(cleaned_views)
    # append to the df
    df.loc[len(df)] = [todays_date, now_time, int(cleaned_views)]
    # df = df.append([todays_date, now_time, int(cleaned_views)], axis=0)
    df.to_csv('views.csv')
    return df

df = pd.DataFrame(columns=['Date', 'Time', 'Views'])

while True:
    df = check_views('https://www.youtube.com/watch?v=gPHgRp70H8o&t=3s')
    time.sleep(1800)
But now I want to use this function for multiple links. I want a different CSV file for each link. So I made a dictionary:
link_dict = {'link1':'https://www.youtube.com/watch?v=gPHgRp70H8o&t=3s',
'link2':'https://www.youtube.com/watch?v=ZPrAKuOBWzw'}
#this makes it easy for each csv file to be named for the corresponding link
The loop then becomes:
for key, value in link_dict.items():
    df = check_views(value)
That seems to work, passing the dict's value (the link) into the function. Inside the function, I just made sure to load the correct csv file at the beginning:
# Existing csv files
df = pd.read_csv(k + '.csv')
But then I'm getting an error when I go to append a new row to the df ("cannot set a row with mismatched columns"). I don't get that, since it works just fine in the code written above. This is the part giving me the error:
df.loc[len(df)] = [todays_date, now_time, int(cleaned_views)]
What am I missing here? This dictionary method seems super messy (I only have 2 links I want to check, but rather than just duplicating the function I wanted to experiment more). Any tips? Thanks!
Figured it out! The problem was that I was saving the df as a CSV and then trying to read that CSV back later. When I saved it, I didn't use index=False with df.to_csv(), so there was an extra column. When I was just testing with the dictionary, the script kept using the in-memory df to do the actual adding of rows, even though I was also saving it to a CSV.
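A minimal sketch of that fix for the per-link CSVs, assuming the same three columns (the `append_view` helper and its parameters are hypothetical names, not part of the original script):

```python
import pandas as pd

def append_view(csv_path, date, time_str, views):
    """Append one observation to a per-link CSV, creating it if needed.
    Writing with index=False keeps the row index out of the file, so
    re-reading it later yields the same three columns."""
    try:
        df = pd.read_csv(csv_path)
    except FileNotFoundError:
        df = pd.DataFrame(columns=['Date', 'Time', 'Views'])
    df.loc[len(df)] = [date, time_str, views]
    df.to_csv(csv_path, index=False)
    return df
```

Because the file round-trips cleanly, each call can read, append, and write without the column count drifting.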
I am new to Python. After researching some code based on my idea, which is extracting historical stock data, I now have working code (see below) for extracting an individual name and exporting it to a csv file:
import investpy
import sys
sys.stdout = open("extracted.csv", "w")
df = investpy.get_stock_historical_data(stock='JFC',
country='philippines',
from_date='25/11/2020',
to_date='18/12/2020')
print(df)
sys.stdout.close()
Now, I'm trying to make it more advanced. I want to run this code automatically with different stock names (about 300 of them) and export each one respectively. I know it is possible, but I cannot find the exact terminology for this problem.
Hoping for your help.
Regards,
You can store the stock names in a list, then iterate through the list and save each dataframe to a separate file:
import investpy

stocks_list = ['JFC', 'AAPL', ....]  # your stock list
for stock in stocks_list:
    df = investpy.get_stock_historical_data(stock=stock,
                                            country='philippines',
                                            from_date='25/11/2020',
                                            to_date='18/12/2020')
    print(df)
    file_name = 'extracted_' + stock + '.csv'
    df.to_csv(file_name, index=False)
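With ~300 tickers, one bad name would abort the whole run. A hedged sketch that collects failures instead of stopping (`export_all` and `fetch` are hypothetical names; in practice `fetch` would wrap the investpy call above):

```python
import os

def export_all(stocks, fetch, out_dir='.'):
    """Write one CSV per stock; return (ticker, error) pairs that failed."""
    failed = []
    for stock in stocks:
        try:
            df = fetch(stock)
        except Exception as exc:
            failed.append((stock, str(exc)))  # note the error, keep going
            continue
        df.to_csv(os.path.join(out_dir, 'extracted_' + stock + '.csv'),
                  index=False)
    return failed
```

Passing the download function in also makes the loop testable without the investpy dependency.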
I'm trying to put this information into an Excel file, but I can't seem to figure out how to use import csv for it. I looked at other posts as reference, but I can't seem to apply them to what I'm doing. I'm sort of new to selenium. Thank you.
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
import csv

driver = webdriver.Chrome()
driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")
state_drop = driver.find_element_by_id("state")
state = Select(state_drop)
state.select_by_visible_text("New Jersey")
driver.find_element_by_id("city").send_keys("Galloway")
driver.find_element_by_id("name").send_keys("Absegami High School")
driver.find_element_by_class_name("forms_input_button").send_keys(Keys.RETURN)
driver.find_element_by_id("hsSelectRadio_1").click()
# scraping the caption of the tables
all_sub_head = driver.find_elements_by_class_name("tableSubHeaderForWsrDetail")
# scraping all the headers of the tables
all_headers = driver.find_elements_by_class_name("tableHeaderForWsrDetail")
# filtering the desired headers
required_headers = all_headers[5:]
# scraping all the table data
all_contents = driver.find_elements_by_class_name("tdTinyFontForWsrDetail")
# filtering the desired table data
required_contents = all_contents[45:]
all_contents is a list of objects.
List comprehension is a quick and common way to gather property values from objects into another list.
Add these lines to the bottom of your script
lstdata = [e.text for e in required_contents]
with open('out.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(lstdata)
Note that you have newlines in your data (the school address column), so they will appear in the csv file.
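If you'd rather keep each record on a single line, one option is to strip embedded newlines before writing and chunk the flat cell list into fixed-width rows (`write_table` is a hypothetical helper, and `row_len` is an assumption; it should match however many columns the scraped table actually has):

```python
import csv

def write_table(path, cells, row_len):
    """Replace embedded newlines in each cell, then write the flat
    cell list as rows of row_len columns."""
    cleaned = [c.replace('\n', ' ').strip() for c in cells]
    rows = [cleaned[i:i + row_len] for i in range(0, len(cleaned), row_len)]
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    return rows
```

You would call it with the scraped text, e.g. `write_table('out.csv', [e.text for e in required_contents], row_len)`.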
I have created a CSV file and it is currently empty. My code checks whether the CSV file contains data or not. If it doesn't, it adds data to it. If it does, it doesn't do anything. This is what I tried so far:
import pandas as pd
df = pd.read_csv("file.csv")
if df.empty:
    ...  # code for adding in data
else:
    pass  # do nothing
But when implemented, I got the error:
pandas.errors.EmptyDataError: No columns to parse from file
Is there a better way to check if the CSV file is empty or not?
import pandas as pd

try:
    # file.csv is an empty csv file
    df = pd.read_csv('file.csv')
except pd.errors.EmptyDataError:
    ...  # code for adding data
else:
    pass  # the file already has data; do nothing
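An alternative that avoids parsing entirely is to check the file on disk first. A small sketch (`csv_is_empty` is a hypothetical helper; the header-only check assumes the header, if present, is a single line):

```python
import os

def csv_is_empty(path):
    """True if the file is missing, zero bytes, or contains at most
    a single (header) line."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return True
    with open(path) as f:
        f.readline()               # header (or only) line
        return f.readline() == ''  # no second line -> no data rows
```

This also sidesteps the pd.errors.EmptyDataError raised on zero-byte files.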
Code:
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd

links = pd.read_csv('C:\\Users\\acer\\Desktop\\hindustan_pages.csv', encoding='latin', dtype=str)

for i in range(1,3):
    link = links.iloc[i,0]
    r = requests.get(link)
    soup = BeautifulSoup(r.text,'lxml')
    div = soup.find('div',{"id":"company_list_grid"})
    for links in div.find_all('th',{"id":"c_name"}):
        link = links.find('a')
        print("https://www.hindustanyellowpages.in/Ahmedabad" + link['href'][2:])
Getting this error:

Traceback (most recent call last):
  File "C:\Users\acer\AppData\Local\Programs\Python\Python37\hindustanyellowpages.py", line 8, in <module>
    link = links.iloc[i,0]
TypeError: 'NoneType' object is not subscriptable
Please help me sort this out.
Found the issue, but first:
1) When pandas reads in the csv file, it assumes you have a header. So row 1 of your csv never gets processed, because it is stored as the column names; you need to add the parameter header=None to tell pandas there are no headers. Also, as a note (not sure if you realize it or not), the indices start at 0, so your code as written actually starts with the link at row 2 of the dataframe. If you want it to start at row 1, you'd need range(0,2).
2) The TypeError: 'NoneType' object is not subscriptable is raised because links, which stores the dataframe you iterate over, gets overwritten at for links in div.find_all('th',{"id":"c_name"}):. So when it goes to the 2nd row of your links dataframe, links is no longer a dataframe but an element from your div.find_all('th',{"id":"c_name"}) tags. To fix this, we'll just rename your links dataframe to links_df:
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd

# Renamed to links_df and added header=None
links_df = pd.read_csv('C:\\Users\\acer\\Desktop\\hindustan_pages.csv', encoding='latin', dtype=str, header=None)

# for i, row in links_df.iterrows():  <-- to process the whole dataframe, use .iterrows() instead of range()
for i in range(1,3):
    link = links_df.iloc[i,0]  # <-- refers to a row in your links_df
    r = requests.get(link)
    soup = BeautifulSoup(r.text,'lxml')
    div = soup.find('div',{"id":"company_list_grid"})
    for links in div.find_all('th',{"id":"c_name"}):
        link = links.find('a')
        print("https://www.hindustanyellowpages.in/Ahmedabad" + link['href'][2:])
Output:
https://www.hindustanyellowpages.in/Ahmedabad/Hetal-Mandap-Decorators/Satellite
https://www.hindustanyellowpages.in/Ahmedabad/Radhe-Krishna-Event-Management/Bapunagar
https://www.hindustanyellowpages.in/Ahmedabad/Amiraj-Decorators/Maninagar
https://www.hindustanyellowpages.in/Ahmedabad/Hiral-Handicraft/Saijpur-Bogha
https://www.hindustanyellowpages.in/Ahmedabad/S-D-Traders/Teen-Darwaja
https://www.hindustanyellowpages.in/Ahmedabad/Agro-Net-Plast/Kathwada
https://www.hindustanyellowpages.in/Ahmedabad/Shree-Krishna-Suppliers/Naroda
https://www.hindustanyellowpages.in/Ahmedabad/Bulakhidas-Vitthaldas/Panchkuva
https://www.hindustanyellowpages.in/Ahmedabad/Nagindas-AND-Sons-(Patva)/Relief-Road
https://www.hindustanyellowpages.in/Ahmedabad/New-Rahul-Electricals-OR-Dhruv-Light-Decoration-AND-Sound-System/Subhash-Bridge
https://www.hindustanyellowpages.in/Ahmedabad/Bhairavi-Craft/Thaltej
https://www.hindustanyellowpages.in/Ahmedabad/Saath-Sangath-Party-Plot/Rakanpur
https://www.hindustanyellowpages.in/Ahmedabad/Poonam-Light-Decoration-And-Electricals/Naroda
https://www.hindustanyellowpages.in/Ahmedabad/Muku-Enterprise/Isanpur
https://www.hindustanyellowpages.in/Ahmedabad/Malaviya-Decoration-Service/Nikol
https://www.hindustanyellowpages.in/Ahmedabad/Hariprabha-Enterprises/Paldi
https://www.hindustanyellowpages.in/Ahmedabad/Festo-Craft/Thaltej
https://www.hindustanyellowpages.in/Ahmedabad/Jay-Jognimata-Decoration/Kubernagar
https://www.hindustanyellowpages.in/Ahmedabad/Maruti-Decorators/Ranip
https://www.hindustanyellowpages.in/Ahmedabad/Krishna-Light-and-Decoration/Gandhi-Road
https://www.hindustanyellowpages.in/Ahmedabad/Shree-Ambika-Light-Decoration/Ghatlodiya
https://www.hindustanyellowpages.in/Ahmedabad/R-S-Power-House/Gandhi-Road
https://www.hindustanyellowpages.in/Ahmedabad/New-Amit-Stores/Paldi
https://www.hindustanyellowpages.in/Ahmedabad/Gayatri-Cetera-s-and-Decoration/Ranip
https://www.hindustanyellowpages.in/Ahmedabad/Poonam-lights-and-decoration/Navrangpura
https://www.hindustanyellowpages.in/Ahmedabad/Shree-Ambica-Engg-Works/Naroda
https://www.hindustanyellowpages.in/Ahmedabad/Vedant-Industries/Vatva-Gidc
https://www.hindustanyellowpages.in/Ahmedabad/Honest-Traders/Narol
https://www.hindustanyellowpages.in/Ahmedabad/Sai-Samarth-Industries/Nana-Chiloda
https://www.hindustanyellowpages.in/Ahmedabad/R-N-Industries/Odhav
https://www.hindustanyellowpages.in/Ahmedabad/Jay-Ambe-Enterprise/S-G-Road
https://www.hindustanyellowpages.in/Ahmedabad/Satyam-Enterprise/Narol
https://www.hindustanyellowpages.in/Ahmedabad/Maniar-And-Co/Rakhial
https://www.hindustanyellowpages.in/Ahmedabad/Shiv-Shakti-Plastic-Industries/Amraivadi
https://www.hindustanyellowpages.in/Ahmedabad/Dinesh-Engineering-Works/Kathwada
https://www.hindustanyellowpages.in/Ahmedabad/MARUTI-ENGINEERS-OR-MARUTI-SAND-BLASTING/Odhav
https://www.hindustanyellowpages.in/Ahmedabad/Cranetech-Equipments/Vatva
https://www.hindustanyellowpages.in/Ahmedabad/Suyog-Engineering-AND-Fabricators/Chhatral-GIDC
https://www.hindustanyellowpages.in/Ahmedabad/Sanghvii-Conveyors/Thaltej
https://www.hindustanyellowpages.in/Ahmedabad/Siddh-Kripa-Steel-Fab/Kathwada
https://www.hindustanyellowpages.in/Ahmedabad/Ashirvad-Industries/Vatva-Gidc
https://www.hindustanyellowpages.in/Ahmedabad/G-I-Lokhandwala/Chandola
https://www.hindustanyellowpages.in/Ahmedabad/M-P-Electrical-Engineering--OR--M-P-Crane/Naroda-Gidc
https://www.hindustanyellowpages.in/Ahmedabad/Krishna-Enterprise/Geeta-Mandir
https://www.hindustanyellowpages.in/Ahmedabad/Nmtg-Mechtrans-Techniques-Pvt-Ltd/Naroda-Gidc
https://www.hindustanyellowpages.in/Ahmedabad/Shree-Balaad-Handling-Works/Odhav
https://www.hindustanyellowpages.in/Ahmedabad/DLM-Enterprise/Vastral
https://www.hindustanyellowpages.in/Ahmedabad/Alfa-Engineers--OR--Everest-Sanitary/Amaraivadi
https://www.hindustanyellowpages.in/Ahmedabad/Parv-Engineering-Equipments/Vatva
https://www.hindustanyellowpages.in/Ahmedabad/Vikrant-Equipments/Kathwada
Additional:

Use:

for i, row in links_df.head(10).iterrows():
    link = links_df.iloc[i,0]

or:

for i, row in links_df.iloc[:10].iterrows():
    link = links_df.iloc[i,0]
Try this out:

link = df.link.iloc[i]

If I am not mistaken, your df is links, so:

link = links.link.iloc[i]

Let me know if this helps.