I am trying to read data from this link: https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY
using Python's requests and urllib libraries. I tried both, but I can't even get the status code of the URL. Please suggest what is wrong. I am attaching my code as well; please take a look and tell me where I am going wrong.
import csv
import requests
from csv import reader
import xlrd
import pandas
import urllib.request
from bs4 import BeautifulSoup
# open a connection to a URL using urllib
webUrl = urllib.request.urlopen('https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY')
#get the result code and print it
print ("result code: " + str(webUrl.getcode()))
# read the data from the URL and print it
data = webUrl.read()
print (data)
You need to add headers. I made this just to test it out:
import requests
from bs4 import BeautifulSoup
headers = {
'User-Agent': 'My User Agent 1.0',
'From': 'youremail@domain.com'  # This is another valid field
}
# fetch the URL using requests, passing the headers
webUrl = requests.get('https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY', headers=headers).text
#read the data from the URL and print it
soup = BeautifulSoup(webUrl, 'html.parser')
print (soup.prettify())
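A side note on the answer above: the NSE endpoint returns JSON, so `response.json()` is a more natural fit than BeautifulSoup once the request succeeds. The sketch below only *prepares* the request (no network call is made), just to show that the custom headers really are attached before sending; the User-Agent string is the same placeholder used in the answer.

```python
import requests

url = 'https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY'
headers = {'User-Agent': 'My User Agent 1.0'}

# Build and prepare the request without sending it, so we can inspect
# exactly what would go over the wire.
prepared = requests.Request('GET', url, headers=headers).prepare()

print(prepared.headers['User-Agent'])  # My User Agent 1.0
print(prepared.url)                    # query string is preserved
```

Sending it is then `requests.Session().send(prepared)` (or simply `requests.get(url, headers=headers)`), after which `.json()` parses the payload directly.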
I have a problem with my Python code.
I would like to open the Telegram API URL with a parameter, so that the item scraped from the site is sent to the chat.
# Import libraries
import requests
import urllib.request
import time
import sys
from bs4 import BeautifulSoup
stdoutOrigin=sys.stdout
sys.stdout = open("log.txt", "w")
# Set the URL you want to webscrape from
url = 'https://31asdasdasdasdasd.com/'
# Connect to the URL
response = requests.get(url)
# Parse HTML and save to BeautifulSoup object
soup = BeautifulSoup(response.text, "html.parser")
zapisane = ''
row = soup.find('strong')
print(">> Ilosc opinii ktora przeszla:")  # "Number of reviews that passed"
send = row.get_text()  # keep the row text so it can be sent to Telegram
print(send)
import urllib.request
u = urllib.request.urlopen("https://api.telegram.org/botid:ts/sendMessage?chat_id=-3channel1&text=")
You likely want to use a string format with a variable in your last line of code shown here. Here's a helpful resource for string formatting: https://www.geeksforgeeks.org/python-format-function/
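To make the suggestion concrete: build the `sendMessage` URL with an f-string and URL-encode the scraped text so spaces and punctuation survive the query string. The token, chat id, and sample text below are placeholders, not values from the question.

```python
from urllib.parse import quote

# Hypothetical placeholders -- substitute your real bot token and chat id.
bot_token = '<BOT_TOKEN>'
chat_id = '<CHAT_ID>'
scraped_text = 'Ilosc opinii: 123'  # e.g. row.get_text() from the scrape

# URL-encode the message before appending it to the query string
url = (f'https://api.telegram.org/bot{bot_token}/sendMessage'
       f'?chat_id={chat_id}&text={quote(scraped_text)}')
print(url)
```

Opening that URL with `urllib.request.urlopen(url)` (or `requests.get(url)`) then delivers the scraped text to the chat.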
I'm in the process of learning Python 3 and am trying to solve a simple task. I want to get the account name and the post date from an Instagram link.
import requests
from bs4 import BeautifulSoup
html = requests.get('https://www.instagram.com/p/BuPSnoTlvTR')
soup = BeautifulSoup(html.text, 'lxml')
item = soup.select_one("meta[property='og:description']")
name = item.find_previous_sibling().get("content").split("•")[0]
print(name)
This code sometimes works with links like this: https://www.instagram.com/kingtop
But I need it to also work with image posts like this: https://www.instagram.com/p/BuxB00KFI-x/
That's all I could come up with, but it is not working. And I can't get the date either.
Do you have any ideas? I appreciate any help.
I found a way to get the name of the account. Now I'm trying to find a way to get the upload date:
import requests
from bs4 import BeautifulSoup
import urllib.request
import urllib.error
import time
from multiprocessing import Pool
from requests.exceptions import HTTPError
start = time.time()
file = open('users.txt', 'r', encoding="ISO-8859-1")
urls = file.readlines()
for url in urls:
    url = url.strip('\n')
    try:
        req = requests.get(url)
        req.raise_for_status()
    except HTTPError as http_err:
        output = open('output2.txt', 'a')
        output.write('not found\n')
    except Exception as err:
        output = open('output2.txt', 'a')
        output.write('not found\n')
    else:
        output = open('output2.txt', 'a')
        soup = BeautifulSoup(req.text, "lxml")
        the_url = soup.select("[rel='canonical']")[0]['href']
        the_url2 = the_url.replace('https://www.instagram.com/', '')
        head, sep, tail = the_url2.partition('/')
        output.write(head + '\n')
I'm performing the same web scraping pattern that I just learned from a post; however, I'm unable to scrape the page using the script below. I keep getting an empty return, and I know the tags are there. I want to find_all "mubox" and then pull the values for O/U and goalie information. This is so weird; what am I missing?
from bs4 import BeautifulSoup
import requests
import pandas as pd
page_link = 'https://www.thespread.com/nhl-scores-matchups'
page_response = requests.get(page_link, timeout=10)
# here, we fetch the content from the url, using the requests library
page_content = BeautifulSoup(page_response.content, "html.parser")
# Take out the <div> of name and get its value
tables = page_content.find_all("div", class_="mubox")
print (tables)
# Iterate through rows
rows = []
This site loads its data from an internal API before rendering it. The API returns an XML file containing all the match information, which you can parse using Beautiful Soup:
from bs4 import BeautifulSoup
import requests
page_link = 'https://www.thespread.com/matchups/NHL/matchup-list_20181030.xml'
page_response = requests.get(page_link, timeout=10)
body = BeautifulSoup(page_response.content, "lxml")
data = [
(
t.find("road").text,
t.find("roadgoalie").text,
t.find("home").text,
t.find("homegoalie").text,
float(t.find("ot").text),
float(t.find("otmoney").text),
float(t.find("ft").text),
float(t.find("ftmoney").text)
)
for t in body.find_all('event')
]
print(data)
I am trying to fetch some data from a webpage using bs4, but I am having trouble opening the link. Here is the code I am using:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
my_url = "https://www.transfermarkt.com/wettbewerbe/europa/"
client = urlopen(my_url)
page_html = client.read()
client.close()
The curious thing is that only this particular link won't work. Others work completely fine. So what is so special about this link? And how can I open it?
The problem comes from the User-Agent. Use urllib.request.Request to set/change the header:
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup as soup
my_url = "https://www.transfermarkt.com/wettbewerbe/europa/"
client = Request(my_url, headers={"User-Agent" : "Mozilla/5.0"})
page = urlopen(client).read()
print(page)
This is the code that I wrote to fetch data from a webpage:
import urllib.request
import urllib
from bs4 import BeautifulSoup

def make_soup(url):
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    thepage = urllib.request.urlopen(req)
    soupdata = BeautifulSoup(thepage, 'html5lib')
    return soupdata

soup = make_soup("https://www.nseindia.com/live_market/dynaContent/live_analysis/top_gainers_losers.htm?cat=G")
t = soup.findAll('table')[0]
for record in t.findAll('tr'):
    print(record.td.text)
'''
for record in t.findAll('tr'):
    for data in record.findAll('td'):
        print(data.text)
'''
But this code fetches only the first td of each tr. How do I get the values for the remaining cells?
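The commented-out nested loop in the question is essentially the right idea: iterate over every tr, and within each tr over every td. Here is a minimal offline sketch of that pattern, using a small made-up two-row table instead of the live NSE page (which needs the User-Agent header trick shown above):

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the gainers/losers table (values are invented).
html = """
<table>
  <tr><td>INFY</td><td>1,500.00</td></tr>
  <tr><td>TCS</td><td>3,200.00</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')
table = soup.findAll('table')[0]

rows = []
for record in table.findAll('tr'):                    # every <tr>
    cells = [td.text for td in record.findAll('td')]  # every <td> in the row
    rows.append(cells)

print(rows)  # [['INFY', '1,500.00'], ['TCS', '3,200.00']]
```

`record.td` only resolves to the *first* td of the row, which is why the original loop printed a single value per tr; the inner `findAll('td')` collects all of them.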