Python web scraping with BeautifulSoup problem - python-3.x

I'm trying to scrape the box score table from https://www.nascar.com/stats/2021/1/box-score.
My code is not working; could someone take a look and point me in the right direction?
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.nascar.com/stats/2021/1/box-score'
# headers was not defined in the original; send a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# find() returns the single table; find_all() returns a list, which has no find_all() of its own
stats = soup.find('table', class_="stats-box-score-table-driver")
if stats is not None:
    for row in stats.find_all('tr'):
        for cell in row.find_all('td'):
            print(cell.text)
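Note that the box-score table may be rendered client-side with JavaScript, in which case requests will never see it in the downloaded HTML. If the table is present in the static page, pandas can grab it directly; a minimal sketch (just a quick check, not guaranteed to work against this site):
import pandas as pd

# Raises "No tables found" if the page builds the table with JavaScript
tables = pd.read_html('https://www.nascar.com/stats/2021/1/box-score')
print(tables[0])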

Related

Elements duplicated with BeautifulSoup

This is the URL: https://yorkathletics.com/sports/mens-swimming-and-diving/roster
If I run this command:
soup.find_all('span', class_="sidearm-roster-player-height")
and then check the length of the output, I get 20 when it should be 10.
I can't see why this happens.
Change your class selector as follows:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://yorkathletics.com/sports/mens-swimming-and-diving/roster')
soup = bs(r.content, 'lxml')
print([i.text for i in soup.select('.height')])
Note: You can grab the whole table with pandas:
import pandas as pd
table = pd.read_html('https://yorkathletics.com/sports/mens-swimming-and-diving/roster')[2]
print(table)
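To see where the extra matches in the question come from, a quick diagnostic (a sketch, continuing from the soup object built above) is to compare the two counts and inspect which parent elements hold the duplicated spans:
spans = soup.find_all('span', class_="sidearm-roster-player-height")
print(len(spans))                   # 20, as reported in the question
print(len(soup.select('.height')))  # 10 with the narrower selector

# Show which container tags the duplicated spans sit inside
print({span.parent.name for span in spans})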

How to message me (via Discord or something similar) if BeautifulSoup can't find a section on a website?

import requests
from bs4 import BeautifulSoup
source = requests.get('https://shop.travisscott.com/password').text
soup = BeautifulSoup(source,'lxml')
Article = soup.find('p')
print(Article.prettify())
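A common approach is to check whether the element came back as None and, if so, post a message to a Discord webhook. A sketch, assuming you have created a webhook for a channel you control (DISCORD_WEBHOOK_URL below is a placeholder):
import requests
from bs4 import BeautifulSoup

# Placeholder: replace with the webhook URL of your own Discord channel
DISCORD_WEBHOOK_URL = 'https://discord.com/api/webhooks/...'

source = requests.get('https://shop.travisscott.com/password').text
soup = BeautifulSoup(source, 'lxml')

article = soup.find('p')
if article is None:
    # Element not found: notify the Discord channel through its webhook
    requests.post(DISCORD_WEBHOOK_URL, json={'content': 'Could not find the section on the page!'})
else:
    print(article.prettify())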

How to scrape all children of an element with one class?

I have tried to get the highlighted area (see the screenshot) on the website using BeautifulSoup 4, but I cannot get what I want. Maybe you can recommend another way of doing it.
Screenshot of the website I need to get data from
from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip
import urllib
import csv
import html5lib
urls = ['https://e-mehkeme.gov.az/Public/Cases?page=1',
        'https://e-mehkeme.gov.az/Public/Cases?page=2']

# scrape elements
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    content = soup.findAll("input", class_="casedetail filled")
    print(content)
My expected output is like this:
Ətraflı məlumat:
İşə baxan hakim və ya tərkib
Xəyalə Cəmilova - sədrlik edən hakim
İlham Kərimli - tərkib üzvü
İsmayıl Xəlilov - tərkib üzvü
Tərəflər
Cavabdeh: MAHMUDOV MAQSUD SOLTAN OĞLU
Cavabdeh: MAHMUDOV MAHMUD SOLTAN OĞLU
İddiaçı: QƏHRƏMANOVA AYNA NUĞAY QIZI
İşin mahiyyəti
Mənzil mübahisələri - Mənzildən çıxarılma
Using the base URL, first collect all the case IDs, then pass each case ID to the target URL and take the value of the first td tag.
import requests
from bs4 import BeautifulSoup
urls = ['https://e-mehkeme.gov.az/Public/Cases?page=1',
        'https://e-mehkeme.gov.az/Public/Cases?page=2']
target_url = "https://e-mehkeme.gov.az/Public/CaseDetail?caseId={}"

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    for caseid in soup.select('input.casedetail'):
        # print(caseid['value'])
        soup1 = BeautifulSoup(requests.get(target_url.format(caseid['value'])).content, 'html.parser')
        print(soup1.select_one("td").text)
I would write it this way, extracting the ID that needs to be put in the GET request for the detailed info:
import requests
from bs4 import BeautifulSoup as bs
urls = ['https://e-mehkeme.gov.az/Public/Cases?page=1',
        'https://e-mehkeme.gov.az/Public/Cases?page=2']

def get_soup(url):
    r = s.get(url)  # s is the shared requests.Session opened below
    soup = bs(r.content, 'lxml')
    return soup

with requests.Session() as s:
    for url in urls:
        soup = get_soup(url)
        detail_urls = [f'https://e-mehkeme.gov.az/Public/CaseDetail?caseId={i["value"]}' for i in soup.select('.caseId')]
        for next_url in detail_urls:
            soup = get_soup(next_url)
            data = [string for string in soup.select_one('[colspan="4"]').stripped_strings]
            print(data)
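Since the question also imports csv, the printed lists can just as easily be written to a file; a minimal sketch (cases.csv is a hypothetical file name, and rows stands for the data lists collected in the loop above instead of printing them):
import csv

rows = []  # inside the loop above, do rows.append(data) instead of print(data)

with open('cases.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(rows)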

Python web scrape from multiple columns

I am trying to pull data from various columns in the odds table from this website:
https://www.sportsbookreview.com/betting-odds/nba-basketball/totals/?date=20190419
I have tried the following code, but I am only getting the opening lines. I want to be able to get specific columns, for example the Pinnacle and Bookmaker columns.
import urllib
import urllib.request
from bs4 import BeautifulSoup
theurl = "https://www.sportsbookreview.com/betting-odds/nba-
basketball/totals/?date=20190419"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage,"html.parser")
for lines in soup.findAll('span',{"class":"_3Nv_7"}):
print(lines.get_text())
import urllib
import urllib.request
from bs4 import BeautifulSoup
theurl = "https://www.sportsbookreview.com/betting-odds/nba-basketball/totals/?date=20190419"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage,"html.parser")
for lines in soup.findAll('span',{"class":"_3Nv_7 opener"}):
print(lines.get_text())
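One way to reach the other columns, assuming every event row renders the same number of _3Nv_7 cells (an assumption to verify in the page source; N_COLS is a placeholder), is to chunk the flat list of spans into rows and pick columns by index:
import urllib.request
from bs4 import BeautifulSoup

theurl = "https://www.sportsbookreview.com/betting-odds/nba-basketball/totals/?date=20190419"
soup = BeautifulSoup(urllib.request.urlopen(theurl), "html.parser")

cells = [s.get_text() for s in soup.findAll('span', {"class": "_3Nv_7"})]

N_COLS = 10  # placeholder: count the sportsbook columns shown per row on the page
rows = [cells[i:i + N_COLS] for i in range(0, len(cells), N_COLS)]
for row in rows:
    # Map indices to sportsbooks (e.g. Pinnacle, Bookmaker) by inspecting the header row
    print(row)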

Python REST API scraping KeyError

I have scraped a REST API, and here is my code:
import json
from pprint import pprint
import sqlite3
import datetime
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import requests
url = "https://cex.io/api/ohlcv/hd/20180125/BTC/USD"
headers = {'User-Agent':'Mozilla/5.0'}
page = requests.get(url)
soup = soup(page.text, "html.parser")
a = soup("data1d")
I want the data under "data1d" from soup, but when I try to do this it shows:
File "C:\Users\mubee\Downloads\Annaconda\lib\site-packages\bs4\element.py", line 1011, in __getitem__
return self.attrs[key]
KeyError: 'data1d'
while there is data present under "data1d" in the soup variable. How can I get just the "data1d" data from soup?
Since the page is just JSON, there is no need for BeautifulSoup at all; parse the response directly:
import requests
url = "https://cex.io/api/ohlcv/hd/20180125/BTC/USD"
page = requests.get(url)
page_dict=page.json()
print(page_dict['data1d'])
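If you want the candles as a table, the value under 'data1d' can be loaded into pandas. A sketch, assuming each entry is an OHLCV row of the form [timestamp, open, high, low, close, volume] (verify against the live response); the field is decoded defensively in case it arrives as a JSON-encoded string:
import json
import requests
import pandas as pd

page = requests.get("https://cex.io/api/ohlcv/hd/20180125/BTC/USD")
raw = page.json()['data1d']

# 'data1d' may be a JSON string rather than a list; decode it if needed
rows = json.loads(raw) if isinstance(raw, str) else raw

# Assumed column order; check it against the live response
df = pd.DataFrame(rows, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
print(df.head())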
