How can I convert a string to a float in Python? - python-3.5

I import an exchange rate from a website and use it in a calculation. Importing works, but the exchange rate arrives as a string, and I must convert it to a float before I can calculate with it. The conversion fails and I don't know where the problem is. The code below is just an example; it has the same problem.
I will use the data for calculations with the exchange rate.
import requests
from bs4 import BeautifulSoup
url = "https://www.bloomberght.com"
response = requests.get(url)
icerik = response.content  # raw page content
soup = BeautifulSoup(icerik, "html.parser")
liste = []
liste2=[]
# attrs should be a dict ({"class": "line2"}), not a set ({"class", "line2"})
for i in soup.find_all("div", {"class": "line2"}):
    i = i.text
    liste.append(i.strip())
A=8*float(liste[2])
print(A)
Traceback (most recent call last):
  File "C:/Users/proin/PycharmProjects/software222/BBBBBBB.py", line 15, in <module>
    A=8*float(liste[2])
ValueError: could not convert string to float: '6,5827'
Process finished with exit code 1

Use str.replace() to replace the comma (,) with a dot (.). Then the conversion to float will work:
import requests
from bs4 import BeautifulSoup
url = "https://www.bloomberght.com"
response = requests.get(url)
icerik = response.content
soup = BeautifulSoup(icerik, "html.parser")
liste = []
liste2= []
# attrs should be a dict ({"class": "line2"}), not a set ({"class", "line2"})
for i in soup.find_all("div", {"class": "line2"}):
    i = i.text
    # swap the decimal comma for a dot so float() can parse the value
    liste.append(i.strip().replace(',', '.'))
A=8*float(liste[2])
print(A)
Prints:
52.6456
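The same fix can be wrapped in a small helper. This is only a minimal sketch, assuming the scraped values always use a comma as the decimal separator and never contain a thousands separator; the name `to_float` is hypothetical and not part of the original code.

```python
def to_float(text):
    """Convert a decimal-comma string such as '6,5827' to a float.

    Assumes ',' is the decimal separator and there is no thousands separator.
    """
    return float(text.strip().replace(",", "."))


rate = to_float("6,5827")  # the string from the ValueError message
print(rate)
print(8 * rate)
```

If the site ever mixes in thousands separators (for example `1.234,56`), this helper would need to remove the dots first; that case is not covered here.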

Related

How to print item in a list without a for loop

I'm trying to get just the price from a website and found that the element with class="udYkAW2UrhZln2Iv62EYb" contains the price on one line. But when I try to print it, I keep getting
<span class="udYkAW2UrhZln2Iv62EYb">$0.312423</span>
and not just the price itself. I worked around this by using a for loop to get the item, but is there a way to display just the price with a print call, without a for loop?
Please and thank you.
Here's the code:
from bs4 import BeautifulSoup as bs
import requests
url = 'https://robinhood.com/crypto/DOGE'
r = requests.get(url)
# make the soup
soup = bs(r.content, 'lxml')
# the price was found in "span class='udYkAW2UrhZln2Iv62EYb'"
# using find() because this is the first instance of this class
price_class = soup.find('span', {'class' : 'udYkAW2UrhZln2Iv62EYb'})
print(price_class)
type(price_class)
# output: <span class="udYkAW2UrhZln2Iv62EYb">$0.312423</span>
# output: bs4.element.Tag
for i in price_class:
    print(i)
# output: $0.312423
Use .text or .get_text():
from bs4 import BeautifulSoup as bs
import requests
url = "https://robinhood.com/crypto/DOGE"
r = requests.get(url)
soup = bs(r.content, "lxml")
price = soup.find("span", {"class": "udYkAW2UrhZln2Iv62EYb"})
print(price.text) # <--- use .text
Prints:
$0.315917
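If the price is needed as a number rather than as text, the currency symbol has to be removed first. The following is a minimal sketch, assuming the text always has the form `$` followed by digits, as in the output above; `parse_price` is a hypothetical helper, and `Decimal` is just one possible choice for avoiding binary floating-point rounding.

```python
from decimal import Decimal


def parse_price(text):
    """Turn a string like '$0.312423' into a Decimal (assumes a leading '$')."""
    return Decimal(text.strip().lstrip("$"))


price = parse_price("$0.312423")  # sample value taken from the question
print(price)
print(price * 100)  # cost of 100 units at that price
```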

Iterate over all pages, crawl the table's elements, and save them as a DataFrame in Python

I need to loop over all the entries on all the pages from this link, then click the "check" link in the part marked in red (please see the image below) to open the detail page of each entry:
The objective is to crawl the information from pages such as the one in the image below, saving the left part as column names and the right part as rows:
The code I used:
import requests
import json
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
url = 'http://bjjs.zjw.beijing.gov.cn/eportal/ui?pageId=425000'
content = requests.get(url).text
soup = BeautifulSoup(content, 'lxml')
table = soup.find('table', {'class': 'gridview'})
df = pd.read_html(str(table))[0]
print(df.head(5))
Out:
序号 工程名称 ... 发证日期 详细信息
0 NaN 假日万恒社区卫生服务站装饰装修工程 ... 2020-07-07 查看
The code for entering the detailed pages:
url = 'http://bjjs.zjw.beijing.gov.cn/eportal/ui?pageId=308891&t=toDetail&GCBM=202006202001'
content = requests.get(url).text
soup = BeautifulSoup(content, 'lxml')
table = soup.find("table", attrs={"class":"detailview"}).findAll("tr")
for elements in table:
    inner_elements = elements.findAll("td", attrs={"class":"label"})
    for text_for_elements in inner_elements:
        print(text_for_elements.text)
Out:
工程名称:
施工许可证号:
所在区县:
建设单位:
工程规模(平方米):
发证日期:
建设地址:
施工单位:
监理单位:
设计单位:
行政相对人代码:
法定代表人姓名:
许可机关:
As you can see, I only get the column names; no entries were extracted.
To loop over all the pages, I think we need to use POST requests, but I don't know how to get the headers.
Thanks in advance for your help.
This script goes through all the pages, collects the data into a DataFrame, and saves it to data.csv.
(Warning: there are 2405 pages in total, so it takes a long time to get them all):
import requests
import pandas as pd
from pprint import pprint
from bs4 import BeautifulSoup
url = 'http://bjjs.zjw.beijing.gov.cn/eportal/ui?pageId=425000'
payload = {'currentPage': 1, 'pageSize': 15}
def scrape_page(url):
    # map each label cell (td.label) to the text of the cell that follows it;
    # both ASCII and full-width colons are removed from the label
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    return {
        td.get_text(strip=True).replace(':', '').replace('：', ''): td.find_next('td').get_text(strip=True)
        for td in soup.select('td.label')
    }
all_data = []
current_page = 1
while True:
    print('Page {}...'.format(current_page))
    payload['currentPage'] = current_page
    soup = BeautifulSoup(requests.post(url, data=payload).content, 'html.parser')
    # follow every "查看" (view) link to its detail page
    # (:-soup-contains needs soupsieve >= 2.1; older versions use :contains)
    for a in soup.select('a:-soup-contains("查看")'):
        u = 'http://bjjs.zjw.beijing.gov.cn' + a['href']
        d = scrape_page(u)
        all_data.append(d)
        pprint(d)
    # stop when there is no clickable "下一页" (next page) link
    page_next = soup.select_one('a:-soup-contains("下一页")[onclick]')
    if not page_next:
        break
    current_page += 1
df = pd.DataFrame(all_data)
df.to_csv('data.csv')
Prints the data to screen and saves data.csv (screenshot from LibreOffice):
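The core of the approach is pairing each label cell with the value cell next to it, so that labels become column names. The sketch below shows that step in isolation, without any network access. It assumes the cell texts have already been extracted; `to_record` is a hypothetical helper, and the sample label/value pairs are taken from the output shown in the question.

```python
import csv
import io


def to_record(pairs):
    """Build one row from (label, value) pairs, dropping colons from labels."""
    return {
        label.replace(":", "").replace("：", "").strip(): value.strip()
        for label, value in pairs
    }


# Sample cells: labels and values as they appear in the question's output.
pairs = [
    ("工程名称：", "假日万恒社区卫生服务站装饰装修工程"),
    ("发证日期：", "2020-07-07"),
]
record = to_record(pairs)

# Write the row to an in-memory CSV; the cleaned labels become the header.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
print(buffer.getvalue())
```

A list of such records can be passed to `pd.DataFrame` directly, which is what the script above does with `all_data`.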

Scraping a specific table with bs4

I hope you guys are always healthy
I want to scrape a more specific table using BS4. This is my code:
from bs4 import BeautifulSoup
import requests
url = 'test.com'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
for row in soup.select('tbody tr'):
    row_text = [x.text for x in row.find_all('td')]
    print(row_text)
How do I get results like this:
Number, Name, address, telp, komoditi
1, "ABON JUARA" JUARA FOOD INDUSTRY, Jl. Jend Sudirman 339, Salatiga, Jawa Tengah, 0298-324060, Abon Sapi Dan Ayam
and save them to a CSV file?
import requests
from bs4 import BeautifulSoup
import csv
def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # rows of the results table
    target = soup.select_one("table#newspaper-a").select("tr[valign=top]")
    with open("data.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["No", "Name", "Address", "Tel", "Komoditi"])
        for item in target:
            item = list(item.stripped_strings)
            item[3] = item[3][6:]  # drop the first 6 characters of the phone field
            writer.writerow(item)
main("https://kemenperin.go.id/direktori-perusahaan?what=&prov=&hal=1")
Output: view-online
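The desired row contains commas and double quotes inside individual fields, which is why `csv.writer` is a safer choice than joining the values with commas by hand. A minimal sketch, using the sample row from the question and an in-memory buffer instead of a file (the buffer is an assumption made to keep the example self-contained):

```python
import csv
import io

# One row using the sample values from the question.
row = [
    "1",
    '"ABON JUARA" JUARA FOOD INDUSTRY',
    "Jl. Jend Sudirman 339, Salatiga, Jawa Tengah",
    "0298-324060",
    "Abon Sapi Dan Ayam",
]

buffer = io.StringIO()
csv.writer(buffer).writerow(row)  # quotes fields that contain ',' or '"'
line = buffer.getvalue()
print(line)

# Reading the line back yields the same five fields.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == row)
```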

How to get the values from this with beautifulsoup?

I am trying to learn BeautifulSoup and am scraping this website.
My Python code looks like this:
import requests
from bs4 import BeautifulSoup
print("Enter the last 3 characters from the share link")
share_link = input()
link = "https://website.com" + share_link
print(link)
r = requests.get(link)
raw = r.text
soup = BeautifulSoup(raw, features="html.parser")
print(soup.prettify())
inputTag = soup.find("input", {"id": "hiddenInput"})
output = inputTag["value"]
print(output)
It gives me this output:
{"broadcastId":"BroadcastID: 252940","rtmp_url":"rtmp://live.gchao.cn/live/23331_9wx2w0c9","sex":0,"accountType":"26073","hls_url":"http://live.gchao.cn/live/23331_9wx2w0c9.m3u8","onlineNum":99,"likeNum":67,"live_id":282878,"flv_url":"http://live.gchao.cn/live/23331_9wx2w0c9.flv?txSecret=40d318efbbbca6afb8be2450b8d1f8fa&txTime=5D6086D1","user_id":252940,"stream_id":"23331_9wx2w0c9","nick_name":"Princess","sdkAppID":"1400088004","info_id":33189,"info_name":"Hi","IM_ID":"#TGS#aXMZYZ7FB","earning":424}
How do I get inside this and extract the values with BeautifulSoup?
If it is JSON, you can load it with the json library and then parse it, e.g.:
import json
s = '{"broadcastId":"BroadcastID: 252940","rtmp_url":"rtmp://live.gchao.cn/live/23331_9wx2w0c9","sex":0,"accountType":"26073","hls_url":"http://live.gchao.cn/live/23331_9wx2w0c9.m3u8","onlineNum":99,"likeNum":67,"live_id":282878,"flv_url":"http://live.gchao.cn/live/23331_9wx2w0c9.flv?txSecret=40d318efbbbca6afb8be2450b8d1f8fa&txTime=5D6086D1","user_id":252940,"stream_id":"23331_9wx2w0c9","nick_name":"Princess","sdkAppID":"1400088004","info_id":33189,"info_name":"Hi","IM_ID":"#TGS#aXMZYZ7FB","earning":424}'
data = json.loads(s)
print(data['broadcastId'])
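Once loaded, the result is an ordinary Python dict, so the other values can be read the same way. A short sketch using a shortened copy of the string from the question (the shortening and the variable name `broadcast_id` are only for illustration):

```python
import json

# Shortened version of the value shown in the question.
s = '{"broadcastId":"BroadcastID: 252940","onlineNum":99,"likeNum":67,"nick_name":"Princess"}'
data = json.loads(s)

# JSON numbers come back as int, JSON strings as str.
print(data["onlineNum"] + data["likeNum"])

# "broadcastId" embeds the number in a string; split it off if only the id is needed.
broadcast_id = int(data["broadcastId"].split(":")[1])
print(broadcast_id)

# .get() returns a default instead of raising KeyError for a missing key.
print(data.get("missing_key", "n/a"))
```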

Not able to use BeautifulSoup to get span content of Nasdaq100 future

from bs4 import BeautifulSoup
import re
import requests
url = 'www.barchart.com/futures/quotes/NQU18'
r = requests.get("https://" +url)
data = r.text
soup = BeautifulSoup(data)
price = soup.find('span', {'class': 'last-change',
                           'data-ng-class': "highlightValue('priceChange')"}).text
print(price)
Result:
[[ item.priceChange ]]
This is not the span content. The result should be the price. Where am I going wrong?
The following is the span tag of the page:
2nd screenshot: How can I get the time?
Use price = soup.find('span', {'class': 'up'}).text instead to get the +X.XX value:
from bs4 import BeautifulSoup
import requests
url = 'www.barchart.com/futures/quotes/NQU18'
r = requests.get("https://" +url)
data = r.text
soup = BeautifulSoup(data, "lxml")
price = soup.find('span', {'class': 'up'}).text
print(price)
Output currently is:
+74.75
The tradeTime you seek seems to not be present in the page_source, since it's dynamically generated through JavaScript. You can, however, find it elsewhere if you're a little clever, and use the json library to parse the JSON data from a certain script element:
import json
trade_time = soup.find('script', {"id": 'barchart-www-inline-data'}).text
json_data = json.loads(trade_time)
print(json_data["NQU18"]["quote"]["tradeTime"])
This outputs:
2018-06-14T18:14:05
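The tradeTime value is an ISO 8601 string, so it can be converted to a `datetime` object if it is needed for comparisons or formatting. A minimal sketch, assuming the string always has the format shown above with no timezone suffix, and Python 3.7 or later for `datetime.fromisoformat`:

```python
from datetime import datetime

# Value copied from the output above.
trade_time = datetime.fromisoformat("2018-06-14T18:14:05")

print(trade_time.date())
print(trade_time.strftime("%H:%M"))
```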
If these don't solve your problem then you will have to resort to something like Selenium that can run JavaScript to get what you're looking for:
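from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
url = "https://www.barchart.com/futures/quotes/NQU18"
driver.get(url)
# XPath attribute tests use @id; find_element(By.XPATH, ...) also works in Selenium 4
result = driver.find_element(By.XPATH, '//*[@id="main-content-column"]/div/div[1]/div[2]/span[2]/span[1]')
print(result.text)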
Currently the output is:
-13.00
