This is my code:
import requests
from bs4 import BeautifulSoup
r=requests.get('https://www.morningstar.com/stocks/xtse/enb/quote')
c=r.content
soup=BeautifulSoup(c,"html.parser")
print(soup.prettify())
for item in soup.find("byId: {}".text):
    print(item.text)
When I ran that, the very bottom of the HTML output shows:
window.__NUXT__=(function(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N){J.id=1014028;J.title="Enbridge Expects a Rebound in 2021, Increases Dividend by 3%";J.deck=y;J.locale=K;J.publishedDate=L;J.updatedDate=L;J.paywalled=b;J.authors=[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}];J.authorDisclosure=[];J.body=[{type:w,contentObject:[{type:x,content:"Wide-moat Enbridge announced 2021 full-year adjusted EBITDA guidance of CAD 13.9 billion-CAD 14.3 billion at its annual investor day on Dec. 8, which is above our previous forecasts. The midpoint of the guidance also implies a 6% increase from 2020 expected EBITDA. Enbridge believes that the increased performance will be driven by a recovery of Mainline volumes and the associated downstream pipelines; customer growth in the gas utilities business; rate increases on its gas pipelines; and the impact of new projects, including the Line 3 replacement. At this point, the Mainline’s heavy oil capacity is full, and demand for light capacity continues to increase. 
Accordingly, Enbridge expects first-quarter 2021 volumes to average 2.7 million barrels of oil per day, which is also above our previous expectations and compares favorably with 2.56 mmbbl\u002Fd in third-quarter 2020.",gated:a}],gated:a}];M.name=H;M.performanceId=A;M.secId=A;M.ticker=N;M.exchange=I;M.type=F;return {layout:"default",data:[{marketPrice:{value:43.55,filtered:a,date:{value:v,filtered:a}},premiumDiscount:{value:"-92.6",filtered:b,date:{value:v,filtered:a},text:{value:"Discount",filtered:a},type:{value:D,filtered:a}},threeStarRatingPrice:{value:"86.9",filtered:b},headquarterAddress1:{value:"425-1st Street SW",filtered:a},headquarterAddress2:{value:"Suite 200, Fifth Avenue Place",filtered:a},industry:{value:"Oil & Gas Midstream",filtered:a},stockStarRating:{value:"6",filtered:b,date:{value:v,filtered:a},text:{value:"Undervalued",filtered:a},type:{value:D,filtered:a}},fiscalYearEndDate:{value:"2020-12-31",filtered:a},headquarterState:{value:"AB",filtered:a},reportDate:{value:"2020-09-30",filtered:a},headquarterPostalCode:{value:"T2P 3L8",filtered:a},stewardshipRating:{value:"Bkwvsknl",filtered:b,date:{value:v,filtered:a}},companyProfile:{value:"Enbridge is an energy generation, distribution, and transportation company in the U.S. and Canada. Its pipeline network consists of the Canadian Mainline system, regional oil sands pipelines, and natural gas pipelines. The company also owns and operates a regulated natural gas utility and Canada’s largest natural gas distribution company. 
Additionally, Enbridge generates renewable and alternative energy with 2,000 megawatts of capacity.",filtered:a},fairValue:{value:"23.3",filtered:b,date:{value:"2020-12-08",filtered:a},type:{value:D,filtered:a}},fax:{value:"+1 403 231-5929",filtered:a},fiveStarRatingPrice:{value:"58.3",filtered:b},sector:{value:"Energy",filtered:a},economicMoat:{value:"Nsfq",filtered:b,date:{value:v,filtered:a}},ticker:N,website:{value:"https:\u002F\u002Fwww.enbridge.com",filtered:a},headquarterCountry:{value:"Canada",filtered:a},contactEmail:{value:"investor.relations#enbridge.com",filtered:a},fourStarRatingPrice:{value:"77.0",filtered:b},economicMoatTrend:{value:"Zcbqwh",filtered:b,date:{value:v,filtered:a}},headquarterCity:{value:"Calgary",filtered:a},phone:{value:"+1 403 231-3900",filtered:a},universe:{value:"EQ",filtered:a},exchange:I,totalEmployees:{value:11300,filtered:a},twoStarRatingPrice:{value:"54.91",filtered:b},fairValueUncertainty:{value:"Nnczrm",filtered:b},name:H,performanceId:A,secId:A,type:F,articles:[{title:"Epic Oil Crash Sets Up Brutal Downturn for Energy Sector",link:"\u002Farticles\u002F980350\u002Fepic-oil-crash-sets-up-brutal-downturn-for-energy-sector",caption:"But recovery is inevitable, and stocks look very cheap--just watch out for bankruptcy risk.",author:"Preston Caldwell",label:C,isVideo:a},{title:"Enbridge's Sell-Off Looks Exaggerated",link:"\u002Farticles\u002F978902\u002Fenbridges-sell-off-looks-exaggerated",caption:"The market is underestimating long-term cash flows once oil prices normalize.",author:c,label:z,isVideo:a},{title:"Executive Orders More Symbolic Than Material for Pipelines",link:"\u002Farticles\u002F923945\u002Fexecutive-orders-more-symbolic-than-material-for-pipelines",caption:"We're not changing our outlook for our midstream coverage.",author:"Stephen Ellis",label:C,isVideo:a},{title:"New Permit Moves Keystone XL Forward",link:"\u002Farticles\u002F922171\u002Fnew-permit-moves-keystone-xl-forward",caption:"Our fair value estimates 
for TransCanada and Enbridge are unchanged.",author:c,label:z,isVideo:a},{title:"Enbridge Hikes Dividend, Remains Deeply Undervalued",link:"\u002Farticles\u002F904922\u002Fenbridge-hikes-dividend-remains-deeply-undervalued",caption:"It has a wide moat and an attractive yield, and now's the time to invest.",author:c,label:z,isVideo:a},{title:"Concerns About Enbridge's Dividend Are Overblown",link:"\u002Farticles\u002F870105\u002Fconcerns-about-enbridges-dividend-are-overblown",caption:"The wide-moat company is on course to boost its dividend and offers hefty upside.",author:c,label:z,isVideo:a},{title:"Enbridge's Growth Portfolio Is Underappreciated",link:"\u002Farticles\u002F841990\u002Fenbridges-growth-portfolio-is-underappreciated",caption:"And the company continues to reward investors with annual dividend growth.",author:c,label:z,isVideo:a},{title:"Enbridge's Economic Moat Widens",link:"\u002Farticles\u002F565094\u002Fenbridges-economic-moat-widens",caption:"Shifting economics and supply dynamics provide growth opportunities.",author:"David McColl",label:C,isVideo:a}],analysis:{id:1014019,title:"Enbridge Increases Its Dividend by 3%",locale:K,publishedDate:B,updatedDate:B,paywalled:b,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],authorDisclosure:[],body:[],pillars:{investmentThesis:{title:"Business Strategy and Outlook",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Enbridge is an energy distribution and transportation company in the United States and Canada. It operates crude and natural gas pipelines, including the Canadian Mainline system. 
It also owns and operates Canada's largest natural gas distribution company.",gated:a}],gated:a}]},moat:{title:"Economic Moat",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Midstream companies process, transport, and store natural gas, natural gas liquids, crude oil, and refined products. There are multiple ways for midstream companies to build moats, but efficient scale is the dominant source. Hydrocarbons are produced and consumed in different places and in different forms from how they come out of the ground. Midstream firms transport and process hydrocarbons. Once a transport route is established, there's usually little need to build a competing route. Doing so would drive returns for both routes below the cost of capital. Thus, pipelines are generally moaty because they efficiently serve markets of limited size.",gated:a}],gated:a}]},managementAndStewardship:{title:"Stewardship",publishedDate:B,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"President and CEO Al Monaco has been with Enbridge since 1995, serving in his current role since 2012. During his tenure at Enbridge, Monaco has experience in all business segments, international business development, corporate planning, and finance. 
His experience in various business segments, corporate development, growth projects, and finance positions him to successfully lead the proposed pipeline and gas distribution growth projects.",gated:a}],gated:a}]},enterpriseRisk:{title:"Risk and Uncertainty",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Enbridge’s profitability is not directly tied to commodity prices, as pipeline transportation costs are not tied to the price of natural gas and crude oil. However, the cyclical supply and demand nature of commodities and related pricing can have an indirect impact on the business as shippers may choose to accelerate or delay certain projects. This can affect the timing for the demand of transportation services and\u002For new gas pipeline infrastructure.",gated:a}],gated:a}]},valuation:{title:"Fair Value and Profit Drivers",publishedDate:B,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Our fair value estimate of $43 (CAD 57) per share is based on a discounted cash flow model. We believe that Enbridge’s broad network of midstream assets and geographic diversification will serve it well in the low oil and gas price environment, and crude and natural gas pipeline expansions in growing regions will fuel EBITDA growth. 
Our cash flow forecasts incorporate the addition of the Line 3 replacement pipeline, but we adjusted our Canadian fair value downward to a reflect a risk-weighted probability of 80% that the pipeline is built.",gated:a}],gated:a}]},notes:J},notes:J},listedCurrency:{value:"CAD",filtered:a}},{}],error:y,state:{history:{currentRoute:"\u002Fstocks\u002Fxtse\u002Fenb\u002Fquote",previousRoute:y,returnRoute:y},ids:{byTicker:{"ST::XTSE::ENB":M},byId:{"0P0000681O":M}},markets:{movers:{gainers:[],losers:[],actives:[]},quotes:{},trailingReturns:{},intradayTimeSeries:{},lastRefreshed:y},player:{nowPlaying:y},siteAlert:{message:G,type:G},user:{userType:"visitor",isAdvisor:a,contentType:"e7FDDltrTy+tA2HnLovvGL0LFMwT+KkEptGju5wXVTU="}},serverRendered:b,serverDate:new Date(1607916811524)}}(false,true,"Joe Gemino","https:\u002F\u002Fim.mstar.com\u002FContent\u002FCMSImages\u002F78x78\u002F2008-jgemino-78x78.jpg","Joe Gemino, CPA, is a senior equity analyst for Morningstar.","Joe Gemino, CPA","Senior Equity Analyst","2008","MST50","SGDLX","FB","TRRNX","DODIX","DODGX","MORN","MOAT","AAPL","V","T","DIS","DODBX","2020-12-11","p","text",null,"Stock Strategist","0P0000681O","2020-12-08T18:13:00Z","Stock Strategist Industry Reports","Qual","2020-04-16T14:59:00Z","ST","","Enbridge Inc","XTSE",{},"en-US","2020-12-08T18:28:00Z",{},"ENB"));
My question: how do I extract the value inside "byId:" so that print(item.text) gives me only "0P0000681O"?
If you only need a single value, a plain string search with find() is enough:
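from simplified_scrapy import req
# req.get returns the page source as a string
html = req.get('https://www.morningstar.com/stocks/xtse/enb/quote')
marker = 'byId:{"'
start = html.find(marker)
# Drop everything up to and including the marker...
html = html[start + len(marker):]
# ...then keep everything before the closing '":'
end = html.find('":')
print(html[:end])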
Result:
0P0000681O
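If you would rather keep using requests, a regular expression can pull the key out of the byId:{"...":M} fragment in the __NUXT__ script. This is a minimal sketch, assuming the page keeps emitting byId:{"<id>":...} exactly as in the dump above; the helper name extract_by_id is hypothetical. Unlike the find() version, which silently slices the wrong text when the marker is missing (find() returns -1), this returns None.

```python
import re
from typing import Optional

# Captures the first quoted key inside byId:{...}, allowing optional whitespace.
BY_ID_PATTERN = re.compile(r'byId:\s*\{\s*"([^"]+)"')


def extract_by_id(html: str) -> Optional[str]:
    """Return the key stored under byId in the page's __NUXT__ script, or None."""
    match = BY_ID_PATTERN.search(html)
    return match.group(1) if match else None


# Fragment copied from the window.__NUXT__ dump in the question.
fragment = 'ids:{byTicker:{"ST::XTSE::ENB":M},byId:{"0P0000681O":M}}'
print(extract_by_id(fragment))  # 0P0000681O
```

For the live page, pass requests.get(url).text to extract_by_id and check for None, since the site can change its markup at any time.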
I want to scrape text from the website below using BeautifulSoup, but not all of it. I want to skip text in any of the following:
1- text contained in a link
2- text contained in, or describing, images
3- the closing sentences that contain words such as 'Disclosure'
I tried the following, but it did not work properly, so any help would be really appreciated:
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.cnbc.com/2020/07/16/perseverance-is-key-for-gen-z-to-succeedand-create-change-in-the-world.html')
soup = BeautifulSoup(r.text,'lxml')
txt = ''
for row in soup.find_all('div', {"class": "group"}):
    if row.a:
        continue
    txt += ''.join(row.text)
print(txt)
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.cnbc.com/2020/07/16/perseverance-is-key-for-gen-z-to-succeedand-create-change-in-the-world.html')
soup = BeautifulSoup(r.text,'lxml')
divs = soup.find_all("div", class_="group")
data = []
for div in divs:
    for i in div.find_all("p"):
        # Remove links and images, together with any text inside them
        for j in i.find_all(["a", "img"]):
            j.extract()
        # Skip the closing disclosure paragraph
        if "disclosure" not in i.text.lower():
            data.append(i.get_text(strip=True))
print("\n".join(data))
Output:
CNBC's "College Voices 2020" is a series written by CNBC summer interns from universities across the country about coming of age, launching new careers and job hunting during a global pandemic. They're finding their voices during a time of great social change and hope for a better future. What money issues are they facing? How are they navigating their student loans? How are they getting work experience, networking and applying for jobs when so many opportunities have been canceled or postponed? How important is diversity and a company's values to Gen Z job seekers?
In life, challenges arise, but they are meant to be conquered through perseverance, and never giving up. This is something I was taught at a young age. As a kid, I ran track and field, which presented obstacles — both literally and psychologically.
As much as I wanted to believe in never giving up, the notion of doing so always lingered in the back of my mind during harsh practices and races that did not go my way. I did not like losing (even today!), but I especially did not like knowing that I could put everything I had down on the line, and still come up short. It was through this that I realized hard work and dedication do not always bring wins, but they do bring a spirit of perseverance, and from this perseverance — hope.
I did not prepare for the world we are in today. No one did. Coronavirus has devastated the economy — the unemployment rate has skyrocketed and the job market is the worst since the Great Depression. Graduating, finding that first job and launching your adult life are difficult enough but add in all of this and it can be overwhelming. Yet, just like how I used to jump over hurdles as a runner back in high school, all of these issues are just obstacles that we need to jump over in order to press forward with our lives and our careers.
It all comes down to one thing: perseverance.
The coronavirus has changed all of our lives: The way we consume, go outside, and work have all changed because of this pandemic. We need perseverance more now than ever!
One obstacle in particular that I believe my generation will have to overcome is social justice. Through the death of George Floyd, protests have occurred all over the country. What many are hoping, including myself, is this wave of protests help bring not only light, but true change in terms of how African Americans are treated in the system and everywhere – including the workplace. This includes being equally paid, better represented in positions of power and more financially supported on the local level. As an African American, I know the continuous struggle of not only having to be your best in every place you step foot in, but knowing a mistake could be the spark that confirms a bias or stereotype that is the product of generations worth of racism and abuse. I hope that as bad as this coronavirus has been, and the protests that have come from generations worth of frustration and anger … that perseverance and goodness come out through it.
More From Invest in You:
Another hurdle I believe my generation has to overcome is student-loan debt. This has always been an issue but now, many classes are being pushed online, but the cost of these classes has not gone down. The price tag that comes along with a four-year+ degree adds up to decades worth of paying back debt. (Thefor student loans of between $20,000 and $40,000, according to the Department of Education.) And sometimes you don't even wind up working in the field you have your degree in! I know personally that my first priority in finding a job is to be able to pay off my loans as quickly as possible. The ever daunting thought of carrying thousands of dollars worth of debt does not rest easy in my mind. But I know that the education I received has given me the opportunity to garner wealth for generations to come. My hope is that I can persevere through this battlefield called life with my newfound skills and knowledge.
And, of course, for all of us in Generation Z, there is the challenge of finding our first job out of college. What would be a daunting task already, has been heightened by the coronavirus. For me personally, I find that my job search has turned from wanting to find a Fortune 500 company job, to wanting a more local business. My thought behind this is finding ways to help businesses that want bright and hungry young people who might otherwise not get that talent because of the enticing income that comes from larger corporations. I hope to put my skills to work to help a company move toward a brighter tomorrow!
in the U.S. right now, according to the Labor Department — and the number is bound to go higher if restrictions on businesses stay where they are or increase. The struggle for all of these people to find good paying jobs will be reminiscent of the Great Depression fight for jobs — long lines and countless job applications to simply find a job to help pay bills. There are already so many people out of work and an entire generation graduating college and trying to join the work force — it becomes simple math to understand that not every one of us will get a job. This makes me concerned and frustrated but this obstacle will not stop me. The perseverance I learned growing up will help me to not give up when faced with challenges, no matter how big they are. I owe it to my family and friends to strive for success, because it is only then that I can be a better individual.
I also want my generation to do better for the environment. Like most people, I take the danger of climate change and global warming very seriously. What we do today will impact the way we are all able to survive in the future. As a result, the way that we work, operate and consume must change as well. Like many Gen Z young adults, I want the impact I make in my everyday life to help maintain our Earth, especially for the future generations that come after us.
Hard work and dedication don't always bring wins, but they do bring a spirit of perseverance. Perseverance brings hope, and hope brings forth action. As an African American, I think the future looks bright, but only because of the groundwork that my generation is putting forth to not let underrepresented voices go unheard. I truly believe that as hard as this next decade will be from a financial and job-status perspective, there will be positive change.
I believe a lot of people in Gen Z and generations to come won't just go for jobs in typical fields because of the money, they will look for jobs that can make a true impact — both professionally and personally. Having representation in fields that are overly male or overly white do not help people. In fact, we as a society end up working backwards when that happens. Rather, we need to extend our arms out to bring everyone in and be honest about the history of our country -- and where we wish to go from here. These hurdles will be just like the hurdles I once used to jump over as a track athlete. And just like it took hard work, dedication and perseverance in order for me to do my best to get over those hurdles, we will all have to do the same to get over these hurdles.
It's time to tap into the perseverance within all of us to get through these challenging times and make the world a better place for generations to come.
You can select all <p> tags that are direct children of elements with class "group" and extract their text non-recursively, so text inside nested tags such as <a> is skipped:
import requests
from bs4 import BeautifulSoup
url = 'https://www.cnbc.com/2020/07/16/perseverance-is-key-for-gen-z-to-succeedand-create-change-in-the-world.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
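# uncomment if you want to print the first header:
#print(soup.select_one('.group em').text)
# [:-3] drops the last three paragraphs at the end of the article (where the disclosure sits)
for p in soup.select('.group > p')[:-3]:
    # recursive=False keeps only text nodes directly inside <p>, skipping link text
    for t in p.find_all(string=True, recursive=False):
        print(t.strip())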
Prints:
In life, challenges arise, but they are meant to be conquered through perseverance, and never giving up. This is something I was taught at a young age. As a kid, I ran track and field, which presented obstacles — both literally and psychologically.
As much as I wanted to believe in never giving up, the notion of doing so always lingered in the back of my mind during harsh practices and races that did not go my way. I did not like losing (even today!), but I especially did not like knowing that I could put everything I had down on the line, and still come up short. It was through this that I realized hard work and dedication do not always bring wins, but they do bring a spirit of perseverance, and from this perseverance — hope.
... and so on.
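To see what recursive=False changes, here is a small offline comparison on a hypothetical snippet shaped like the CNBC markup (a paragraph containing a link). The direct_strings helper is illustrative, not part of BeautifulSoup; find_all(string=True, recursive=False) is the same call used above, with string= being the current name for the older text= argument.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet shaped like the CNBC markup: a paragraph with a link inside.
html = '<div class="group"><p>The <a href="#">average payment</a> is high.</p></div>'
soup = BeautifulSoup(html, "html.parser")
p = soup.select_one(".group > p")


def direct_strings(tag):
    """Return the non-empty, stripped text nodes that are direct children of tag."""
    return [s.strip() for s in tag.find_all(string=True, recursive=False) if s.strip()]


# get_text() is recursive, so the link text is included.
print(p.get_text())        # The average payment is high.
# recursive=False skips the <a>, leaving the sentence with a gap.
print(direct_strings(p))   # ['The', 'is high.']
```

The gap is the trade-off of dropping link text: any sentence that contained a link comes out incomplete.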
An alternative approach is to search for everything you don't want, remove it with .extract(), and then loop through what is left. This may be easier to read and extend. You might also want to add validation that asserts certain phrases exist.
Here is an example:
from bs4 import BeautifulSoup  # py -m pip install beautifulsoup4 --user
import requests
def provide_soup(url):
    r = requests.get(url)
    r.raise_for_status()
    return BeautifulSoup(r.text, 'lxml')
def remove_noise(soup):
    # Paragraphs starting with any of these phrases are boilerplate, not article text
    noise_starting_phrases = ('CNBC\'s "College Voices 2020"', 'More From Invest in You:', 'SIGN UP:', 'CHECK OUT:', 'Disclosure:')
    for p in soup.find_all('p'):
        if p.text.strip().startswith(noise_starting_phrases):
            p.extract()
def remove_key_point(soup):
    key_point = soup.find('div', {"class": "RenderKeyPoints-wrapper"})
    # Guard against pages that have no key-points box
    if key_point is not None:
        key_point.extract()
def provide_content_as_text(soup):
    return ''.join(row.text for row in soup.find_all('div', {"class": "group"}))
soup = provide_soup('https://www.cnbc.com/2020/07/16/perseverance-is-key-for-gen-z-to-succeedand-create-change-in-the-world.html')
remove_key_point(soup)
remove_noise(soup)
results = provide_content_as_text(soup)
print(results)
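A minimal sketch of the same remove-then-collect idea that also covers links and images, run against an inline HTML snippet instead of the live page so it can be checked offline. The snippet, the NOISE_PREFIXES tuple and the clean_article helper are hypothetical; the validation step raises ValueError (rather than using assert, which python -O disables) when an expected phrase is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical boilerplate prefixes; adjust them for the real page.
NOISE_PREFIXES = ("More From Invest in You:", "Disclosure:")


def clean_article(html, required_phrases=()):
    """Return article text from .group paragraphs without link/image content or boilerplate."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for p in soup.select(".group p"):
        # Remove links and images together with any text they contain.
        for tag in p.find_all(["a", "img"]):
            tag.extract()
        text = p.get_text(" ", strip=True)
        # Skip paragraphs left empty (e.g. image-only ones) and known boilerplate.
        if text and not text.startswith(NOISE_PREFIXES):
            paragraphs.append(text)
    article = "\n".join(paragraphs)
    # Validation: fail loudly if the layout changed and expected text disappeared.
    missing = [phrase for phrase in required_phrases if phrase not in article]
    if missing:
        raise ValueError(f"expected phrases not found: {missing}")
    return article


sample = (
    '<div class="group">'
    '<p>Intro with a <a href="/x">link</a> inside.</p>'
    '<p><img src="a.jpg" alt="photo"></p>'
    '<p>Disclosure: example text.</p>'
    "</div>"
)
print(clean_article(sample, required_phrases=("Intro",)))  # Intro with a inside.
```

Passing " " as the separator to get_text() keeps words on either side of a removed link from running together.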