Beautiful Soup not finding element by ID - python-3.x

I am trying to scrape incoming headlines from forexfactory.com.
from bs4 import BeautifulSoup
import requests
source = requests.get("https://www.forexfactory.com/news").text
soup = BeautifulSoup(source, 'lxml')
# named "headlines" so the built-in list() is not shadowed
headlines = soup.find(id="ui-outer")
print(headlines)
This returns None when it should return the div containing all the headlines. Any idea what is wrong? I have tried searching by div, by ul id, by li id, and several other ways. It always returns None.
Thank you.

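The element appears to be created dynamically by JavaScript, so it is not in the HTML that requests downloads. For dynamically created elements, give Selenium a try:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Selenium 4 style: the driver path is passed through Service, and lookups use By locators
driver = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))
URL = 'https://www.forexfactory.com/news'
driver.get(URL)
Set an implicit wait so that element lookups allow time for dynamically loaded content:
driver.implicitly_wait(5)  # keep retrying lookups for up to 5 seconds
Get your element:
ui_outer = driver.find_element(By.ID, 'ui-outer')
Example for all links (story titles):
links = driver.find_elements(By.CSS_SELECTOR, 'div.flexposts__story-title a')
[x.text for x in links]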
Output
['EU\'s Barnier says "fundamental divergences" persist in UK trade talks',
'With end of crisis programs, Fed faces tricky post-pandemic transition',
'Markets Look Past Near-Term Challenges',
'Time is short, Divergences remain, but we continue to work hard for a deal',
'EURUSD consolidates around 1.1900; indicators are flat',
'Rush for emerging market company bonds as investors look beyond COVID-19',
'Europe’s Virus Lockdowns Push Economy Into Another Contraction',
'Interactive Brokers enhances Client Portal',
'BoE’s Haldane: Risk That Anxiety Leads To Gloom Loop',
'Sharpest fall in UK private sector output since May. Manufacturing growth offset by renewed...',
'Remote Working Shift Offers Silver Lining for Finance Gender Gap',
'EU Flash PMI signals steep downturn in November amid COVID-19 lockdowns',
'German PMI drops to five-month low in November due to tightening of COVID-19 restrictions, but...',
'Sharp decline in French business activity amid fresh COVID-19 lockdown',
'Rishi Sunak says Spending Review will not spell austerity',
'Remote Working Shift Offers Silver Lining for Finance Gender Gap',
'Japan’s Labor Thanksgiving Day goes viral',
'Ranking Asset Classes by Historical Returns (1985-2020)',
'Time is short, Divergences remain, but we continue to work hard for a deal',
'EURUSD consolidates around 1.1900; indicators are flat',
'US Dollar stuck near support, NZ$ strikes two-year high',
'US Dollar stuck near support, NZ$ strikes two-year high',
'Georgia confirms results in latest setback for Trump bid to overturn Biden win',
'Canada to roll over terms of EU trade deal with UK',
'Time is short, Divergences remain, but we continue to work hard for a deal',
'German PMI drops to five-month low in November due to tightening of COVID-19 restrictions, but...',
"COVID-19: 'No return to austerity', says chancellor as he hints at public sector pay freeze",
'EURUSD consolidates around 1.1900; indicators are flat',
'New Zealand Dollar May Rise as RBNZ Holds Fire on Negative Rates',
'Interactive Brokers enhances Client Portal']
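Before reaching for a browser, it can help to confirm that the element really is missing from the HTML that requests receives. The sketch below is only a rough check using the standard library. The helper name id_in_static_html and both sample strings are made up for illustration; in practice you would pass it requests.get(url).text.

```python
import re

def id_in_static_html(html: str, element_id: str) -> bool:
    """Return True if an id="..." attribute with this value appears in the raw HTML."""
    pattern = r'\bid\s*=\s*["\']' + re.escape(element_id) + r'["\']'
    return re.search(pattern, html) is not None

# Hypothetical samples: one page with the element in the markup,
# one page where the content would be filled in later by a script.
static_page = '<div id="ui-outer"><ul><li>Headline</li></ul></div>'
js_rendered_page = '<div id="app"></div><script src="bundle.js"></script>'

print(id_in_static_html(static_page, "ui-outer"))
print(id_in_static_html(js_rendered_page, "ui-outer"))
```

If the check fails on the real response text, the parser never sees the element, and no change to the find() call will help.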

Related

Removing specific <h2 class> from beautifulsoup4 web crawling results

I am currently trying to crawl headlines of the news articles from https://7news.com.au/news/coronavirus-sa.
After finding that all the headlines are inside <h2> tags, I wrote the following code:
import requests
from bs4 import BeautifulSoup
url = 'https://7news.com.au/news/coronavirus-sa'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
titles = soup.find('body').find_all('h2')
for i in titles:
    print(i.text.strip())
The result of this code was:
News
Discover
Connect
SA COVID cases surge into triple digit figures for first time
Massive headaches at South Australian testing clinics as COVID cases surge
Revellers forced into isolation after SA teen goes clubbing while infectious with COVID
COVID scare hits Ashes Test in Adelaide after two media members test positive
SA to ease restrictions despite record number of COVID cases
‘We’re going to have cases every day’: SA records biggest COVID spike in 18 MONTHS
Fears for Adelaide nursing homes after COVID infections creep detected
Families in pre-Christmas quarantine after COVID alert for Adelaide school
South Australia records a JUMP in new COVID-19 cases - including infections in children
‘LOCK IT IN’: Mark McGowan to reveal date of WA’s long-awaited reopening to Australia
BOOSTER BOOST-UP: Australia makes change to COVID-19 vaccinations amid Omicron concern
Frydenberg calls for Aussies to ‘keep calm and carry on’ in the face of COVID-19 Omicron strain
News Just In
Our Network
Our Partners
Connect with 7NEWS
which contains unwanted text such as 'News', 'Discover', and 'News Just In'.
This happens because that text is also inside <h2> tags. So I added the following code to delete it from the results:
soup.find('h2', id='css-1oh2gv-StyledHeading.e1fp214b7').decompose()
which raises an AttributeError:
AttributeError: 'NoneType' object has no attribute 'decompose'
I tried the clear() method as well, but it did not give the result I wanted.
Is there another way to remove the unnecessary text?
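What happens?
Your selection is too general: it matches every <h2> on the page. You do not need .decompose() to fix this. (The AttributeError arises because find() matched nothing and returned None; the value passed as id looks like a class name.)
How to fix?
Select the headlines more specifically:
soup.select('h2.Card-Headline')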
Example
import requests
from bs4 import BeautifulSoup
url = 'https://7news.com.au/news/coronavirus-sa'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
for h2 in soup.select('h2.Card-Headline'):
    print(h2.text)
Output
SA COVID cases surge into triple digit figures for first time
Massive headaches at South Australian testing clinics as COVID cases surge
Revellers forced into isolation after SA teen goes clubbing while infectious with COVID
COVID scare hits Ashes Test in Adelaide after two media members test positive
SA to ease restrictions despite record number of COVID cases
‘We’re going to have cases every day’: SA records biggest COVID spike in 18 MONTHS
Fears for Adelaide nursing homes after COVID infections creep detected
Families in pre-Christmas quarantine after COVID alert for Adelaide school
South Australia records a JUMP in new COVID-19 cases - including infections in children
‘LOCK IT IN’: Mark McGowan to reveal date of WA’s long-awaited reopening to Australia
BOOSTER BOOST-UP: Australia makes change to COVID-19 vaccinations amid Omicron concern
Frydenberg calls for Aussies to ‘keep calm and carry on’ in the face of COVID-19 Omicron strain
For completeness, to answer the question as asked:
decompose() also works if you select the elements to remove more specifically, though as mentioned it is not necessary here:
for i in titles:
    # .get() avoids a KeyError for <h2> tags that have no class attribute
    if 'Heading' in ' '.join(i.get('class', [])):
        i.decompose()
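To see why matching on the class keeps only the headlines, here is a small dependency-free sketch of the same idea. It does not use Beautiful Soup; it swaps in the standard library's html.parser. The HeadlineCollector class and the sample markup are invented for illustration and only loosely mimic the page's structure.

```python
from html.parser import HTMLParser

class HeadlineCollector(HTMLParser):
    """Collect the text of every <tag> whose class list contains css_class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.headlines = []
        self._buffer = None  # list of text chunks while inside a matching tag

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            self.headlines.append("".join(self._buffer).strip())
            self._buffer = None

# Hypothetical markup: navigation headings and article headlines are all <h2>.
sample_html = """
<h2 class="css-1oh2gv-StyledHeading">News</h2>
<h2 class="Card-Headline css-x">SA to ease restrictions</h2>
<h2>Our Network</h2>
<h2 class="Card-Headline">Families in quarantine</h2>
"""

collector = HeadlineCollector("h2", "Card-Headline")
collector.feed(sample_html)
headlines = collector.headlines
print(headlines)
```

With Beautiful Soup, soup.select('h2.Card-Headline') does this filtering in one call, which is why nothing has to be removed afterwards.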

Change line after saving web crawled data from Beautifulsoup4 as txt file

I wrote code to crawl headlines from the website https://7news.com.au/news/coronavirus-sa and tried to save the headlines to a text file.
I wrote the following code:
import requests
from bs4 import BeautifulSoup as bs
f = open("/Users/j/Desktop/Python/chatbot project/headlines.txt", 'w')
url = 'https://7news.com.au/news/coronavirus-sa'
r = requests.get(url)
soup = bs(r.text, 'html.parser')
headlines = soup.select('h2.Card-Headline')
for h in headlines:
    print(h.text)
    f.write(h.text)
f.close()
The result of print(h.text) was:
TENS OF THOUSANDS to spend Christmas in quarantine as Omicron causes COVID carnage
SA records ‘steep increase’ in COVID cases as premier issues ominous warning
South Australia gives nod to widespread rollout of rapid antigen COVID-19 tests
South Australia’s Omicron cases almost TRIPLE as COVID-19 cases surge
Leading doctor’s warning about ‘essential’ and ‘necessary’ spread of COVID-19
SA ambos sound the alarm over rising COVID-19 cases after state records surge
Scott Morrison flags potential changes to COVID-19 approach after National Cabinet
STATE OF THE NATION: Australia fighting to contain COVID as cases soar to record highs
WATCH LIVE: Scott Morrison provides COVID-19 update after National Cabinet meeting
Australia’s COVID cases could hit 250,000 DAILY unless restrictions return
PM’s plea ahead of emergency meeting as he declares ‘we’re not going back to lockdowns’
South Australia scraps testing rule as cases surge to all-time high
Each headline was printed on its own line.
However, when I checked the text file, the result was:
TENS OF THOUSANDS to spend Christmas in quarantine as Omicron causes COVID carnageSA records ‘steep increase’ in COVID cases as premier issues ominous warningSouth Australia gives nod to widespread rollout of rapid antigen COVID-19 tests South Australia’s Omicron cases almost TRIPLE as COVID-19 cases surgeLeading doctor’s warning about ‘essential’ and ‘necessary’ spread of COVID-19SA ambos sound the alarm over rising COVID-19 cases after state records surgeScott Morrison flags potential changes to COVID-19 approach after National CabinetSTATE OF THE NATION: Australia fighting to contain COVID as cases soar to record highsWATCH LIVE: Scott Morrison provides COVID-19 update after National Cabinet meetingAustralia’s COVID cases could hit 250,000 DAILY unless restrictions returnPM’s plea ahead of emergency meeting as he declares ‘we’re not going back to lockdowns’South Australia scraps testing rule as cases surge to all-time high
The headlines were not separated into lines as expected.
I tried reading the text file back and splitting it with the .split() method, which did not work.
Is there any way to read this file back and split it into lines, or to save it with the lines separated in the first place?
print() adds a newline after each call, but f.write() does not. Add \n in f.write() so that each headline is written on its own line:
for h in headlines:
    print(h.text)
    f.write(h.text + "\n")
f.close()
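The same fix can be sketched as a self-contained round trip: write one headline per line, then read the file back and split it into lines. The sample headlines and the temporary file path are placeholders chosen so the snippet runs anywhere; with open(...) is used here so the file is closed automatically.

```python
import os
import tempfile

# Placeholder data standing in for the scraped h.text values
headlines = [
    "SA records steep increase in COVID cases",
    "South Australia scraps testing rule as cases surge",
]

# Temporary location instead of a real Desktop path
path = os.path.join(tempfile.mkdtemp(), "headlines.txt")

# Write one headline per line
with open(path, "w", encoding="utf-8") as f:
    for headline in headlines:
        f.write(headline + "\n")

# Read the file back; splitlines() drops the trailing newline characters
with open(path, encoding="utf-8") as f:
    lines_read = f.read().splitlines()

print(lines_read)
```

Note that the file already written without separators cannot be split reliably afterwards, because nothing marks where one headline ends and the next begins.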

How to extract a particular element/text inside an HTML using Python 3.x

This is my code:
import requests
from bs4 import BeautifulSoup
r=requests.get('https://www.morningstar.com/stocks/xtse/enb/quote')
c=r.content
soup=BeautifulSoup(c,"html.parser")
print(soup.prettify())
for item in soup.find("byId: {}".text):
    print(item.text)
When I run that, the very end of the printed HTML shows:
window.__NUXT__=(function(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N){J.id=1014028;J.title="Enbridge Expects a Rebound in 2021, Increases Dividend by 3%";J.deck=y;J.locale=K;J.publishedDate=L;J.updatedDate=L;J.paywalled=b;J.authors=[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}];J.authorDisclosure=[];J.body=[{type:w,contentObject:[{type:x,content:"Wide-moat Enbridge announced 2021 full-year adjusted EBITDA guidance of CAD 13.9 billion-CAD 14.3 billion at its annual investor day on Dec. 8, which is above our previous forecasts. The midpoint of the guidance also implies a 6% increase from 2020 expected EBITDA. Enbridge believes that the increased performance will be driven by a recovery of Mainline volumes and the associated downstream pipelines; customer growth in the gas utilities business; rate increases on its gas pipelines; and the impact of new projects, including the Line 3 replacement. At this point, the Mainline’s heavy oil capacity is full, and demand for light capacity continues to increase. 
Accordingly, Enbridge expects first-quarter 2021 volumes to average 2.7 million barrels of oil per day, which is also above our previous expectations and compares favorably with 2.56 mmbbl\u002Fd in third-quarter 2020.",gated:a}],gated:a}];M.name=H;M.performanceId=A;M.secId=A;M.ticker=N;M.exchange=I;M.type=F;return {layout:"default",data:[{marketPrice:{value:43.55,filtered:a,date:{value:v,filtered:a}},premiumDiscount:{value:"-92.6",filtered:b,date:{value:v,filtered:a},text:{value:"Discount",filtered:a},type:{value:D,filtered:a}},threeStarRatingPrice:{value:"86.9",filtered:b},headquarterAddress1:{value:"425-1st Street SW",filtered:a},headquarterAddress2:{value:"Suite 200, Fifth Avenue Place",filtered:a},industry:{value:"Oil & Gas Midstream",filtered:a},stockStarRating:{value:"6",filtered:b,date:{value:v,filtered:a},text:{value:"Undervalued",filtered:a},type:{value:D,filtered:a}},fiscalYearEndDate:{value:"2020-12-31",filtered:a},headquarterState:{value:"AB",filtered:a},reportDate:{value:"2020-09-30",filtered:a},headquarterPostalCode:{value:"T2P 3L8",filtered:a},stewardshipRating:{value:"Bkwvsknl",filtered:b,date:{value:v,filtered:a}},companyProfile:{value:"Enbridge is an energy generation, distribution, and transportation company in the U.S. and Canada. Its pipeline network consists of the Canadian Mainline system, regional oil sands pipelines, and natural gas pipelines. The company also owns and operates a regulated natural gas utility and Canada’s largest natural gas distribution company. 
Additionally, Enbridge generates renewable and alternative energy with 2,000 megawatts of capacity.",filtered:a},fairValue:{value:"23.3",filtered:b,date:{value:"2020-12-08",filtered:a},type:{value:D,filtered:a}},fax:{value:"+1 403 231-5929",filtered:a},fiveStarRatingPrice:{value:"58.3",filtered:b},sector:{value:"Energy",filtered:a},economicMoat:{value:"Nsfq",filtered:b,date:{value:v,filtered:a}},ticker:N,website:{value:"https:\u002F\u002Fwww.enbridge.com",filtered:a},headquarterCountry:{value:"Canada",filtered:a},contactEmail:{value:"investor.relations#enbridge.com",filtered:a},fourStarRatingPrice:{value:"77.0",filtered:b},economicMoatTrend:{value:"Zcbqwh",filtered:b,date:{value:v,filtered:a}},headquarterCity:{value:"Calgary",filtered:a},phone:{value:"+1 403 231-3900",filtered:a},universe:{value:"EQ",filtered:a},exchange:I,totalEmployees:{value:11300,filtered:a},twoStarRatingPrice:{value:"54.91",filtered:b},fairValueUncertainty:{value:"Nnczrm",filtered:b},name:H,performanceId:A,secId:A,type:F,articles:[{title:"Epic Oil Crash Sets Up Brutal Downturn for Energy Sector",link:"\u002Farticles\u002F980350\u002Fepic-oil-crash-sets-up-brutal-downturn-for-energy-sector",caption:"But recovery is inevitable, and stocks look very cheap--just watch out for bankruptcy risk.",author:"Preston Caldwell",label:C,isVideo:a},{title:"Enbridge's Sell-Off Looks Exaggerated",link:"\u002Farticles\u002F978902\u002Fenbridges-sell-off-looks-exaggerated",caption:"The market is underestimating long-term cash flows once oil prices normalize.",author:c,label:z,isVideo:a},{title:"Executive Orders More Symbolic Than Material for Pipelines",link:"\u002Farticles\u002F923945\u002Fexecutive-orders-more-symbolic-than-material-for-pipelines",caption:"We're not changing our outlook for our midstream coverage.",author:"Stephen Ellis",label:C,isVideo:a},{title:"New Permit Moves Keystone XL Forward",link:"\u002Farticles\u002F922171\u002Fnew-permit-moves-keystone-xl-forward",caption:"Our fair value estimates 
for TransCanada and Enbridge are unchanged.",author:c,label:z,isVideo:a},{title:"Enbridge Hikes Dividend, Remains Deeply Undervalued",link:"\u002Farticles\u002F904922\u002Fenbridge-hikes-dividend-remains-deeply-undervalued",caption:"It has a wide moat and an attractive yield, and now's the time to invest.",author:c,label:z,isVideo:a},{title:"Concerns About Enbridge's Dividend Are Overblown",link:"\u002Farticles\u002F870105\u002Fconcerns-about-enbridges-dividend-are-overblown",caption:"The wide-moat company is on course to boost its dividend and offers hefty upside.",author:c,label:z,isVideo:a},{title:"Enbridge's Growth Portfolio Is Underappreciated",link:"\u002Farticles\u002F841990\u002Fenbridges-growth-portfolio-is-underappreciated",caption:"And the company continues to reward investors with annual dividend growth.",author:c,label:z,isVideo:a},{title:"Enbridge's Economic Moat Widens",link:"\u002Farticles\u002F565094\u002Fenbridges-economic-moat-widens",caption:"Shifting economics and supply dynamics provide growth opportunities.",author:"David McColl",label:C,isVideo:a}],analysis:{id:1014019,title:"Enbridge Increases Its Dividend by 3%",locale:K,publishedDate:B,updatedDate:B,paywalled:b,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],authorDisclosure:[],body:[],pillars:{investmentThesis:{title:"Business Strategy and Outlook",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Enbridge is an energy distribution and transportation company in the United States and Canada. It operates crude and natural gas pipelines, including the Canadian Mainline system. 
It also owns and operates Canada's largest natural gas distribution company.",gated:a}],gated:a}]},moat:{title:"Economic Moat",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Midstream companies process, transport, and store natural gas, natural gas liquids, crude oil, and refined products. There are multiple ways for midstream companies to build moats, but efficient scale is the dominant source. Hydrocarbons are produced and consumed in different places and in different forms from how they come out of the ground. Midstream firms transport and process hydrocarbons. Once a transport route is established, there's usually little need to build a competing route. Doing so would drive returns for both routes below the cost of capital. Thus, pipelines are generally moaty because they efficiently serve markets of limited size.",gated:a}],gated:a}]},managementAndStewardship:{title:"Stewardship",publishedDate:B,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"President and CEO Al Monaco has been with Enbridge since 1995, serving in his current role since 2012. During his tenure at Enbridge, Monaco has experience in all business segments, international business development, corporate planning, and finance. 
His experience in various business segments, corporate development, growth projects, and finance positions him to successfully lead the proposed pipeline and gas distribution growth projects.",gated:a}],gated:a}]},enterpriseRisk:{title:"Risk and Uncertainty",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Enbridge’s profitability is not directly tied to commodity prices, as pipeline transportation costs are not tied to the price of natural gas and crude oil. However, the cyclical supply and demand nature of commodities and related pricing can have an indirect impact on the business as shippers may choose to accelerate or delay certain projects. This can affect the timing for the demand of transportation services and\u002For new gas pipeline infrastructure.",gated:a}],gated:a}]},valuation:{title:"Fair Value and Profit Drivers",publishedDate:B,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Our fair value estimate of $43 (CAD 57) per share is based on a discounted cash flow model. We believe that Enbridge’s broad network of midstream assets and geographic diversification will serve it well in the low oil and gas price environment, and crude and natural gas pipeline expansions in growing regions will fuel EBITDA growth. 
Our cash flow forecasts incorporate the addition of the Line 3 replacement pipeline, but we adjusted our Canadian fair value downward to a reflect a risk-weighted probability of 80% that the pipeline is built.",gated:a}],gated:a}]},notes:J},notes:J},listedCurrency:{value:"CAD",filtered:a}},{}],error:y,state:{history:{currentRoute:"\u002Fstocks\u002Fxtse\u002Fenb\u002Fquote",previousRoute:y,returnRoute:y},ids:{byTicker:{"ST::XTSE::ENB":M},byId:{"0P0000681O":M}},markets:{movers:{gainers:[],losers:[],actives:[]},quotes:{},trailingReturns:{},intradayTimeSeries:{},lastRefreshed:y},player:{nowPlaying:y},siteAlert:{message:G,type:G},user:{userType:"visitor",isAdvisor:a,contentType:"e7FDDltrTy+tA2HnLovvGL0LFMwT+KkEptGju5wXVTU="}},serverRendered:b,serverDate:new Date(1607916811524)}}(false,true,"Joe Gemino","https:\u002F\u002Fim.mstar.com\u002FContent\u002FCMSImages\u002F78x78\u002F2008-jgemino-78x78.jpg","Joe Gemino, CPA, is a senior equity analyst for Morningstar.","Joe Gemino, CPA","Senior Equity Analyst","2008","MST50","SGDLX","FB","TRRNX","DODIX","DODGX","MORN","MOAT","AAPL","V","T","DIS","DODBX","2020-12-11","p","text",null,"Stock Strategist","0P0000681O","2020-12-08T18:13:00Z","Stock Strategist Industry Reports","Qual","2020-04-16T14:59:00Z","ST","","Enbridge Inc","XTSE",{},"en-US","2020-12-08T18:28:00Z",{},"ENB"));
My question: how do I extract the key inside "byId:" so that the script prints only "0P0000681O"?
If you only need a single value, a plain string search is enough:
from simplified_scrapy import utils, SimplifiedDoc, req
html = req.get(
'https://www.morningstar.com/stocks/xtse/enb/quote')
start = html.find('byId:{"')
html = html[start+len('byId:{"'):]
end = html.find('":')
print(html[:end])
Result:
0P0000681O
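The same extraction can also be done with a regular expression from the standard library. This is a minimal sketch that assumes the page keeps the byId:{"...":...} shape shown in the question; the function name extract_by_id_key is made up, and the sample string is a shortened copy of the script text above.

```python
import re

def extract_by_id_key(page_source):
    """Return the first key inside byId:{"..."}, or None if the pattern is absent."""
    match = re.search(r'byId:\{"([^"]+)"', page_source)
    return match.group(1) if match else None

# Shortened excerpt of the window.__NUXT__ script from the question
sample = 'ids:{byTicker:{"ST::XTSE::ENB":M},byId:{"0P0000681O":M}},markets:{}'

print(extract_by_id_key(sample))            # 0P0000681O
print(extract_by_id_key("no match here"))   # None
```

Returning None when nothing matches avoids the silent wrong slice that str.find() produces when it returns -1.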

Exclude text of more than one tag using Beautifulsoup

I want to scrape text from the website below using Beautiful Soup, but not all of it. I want to leave out:
1- text contained in a link
2- text contained in, or describing, the images
3- the closing sentences that contain words such as 'Disclosure'.
I tried the following, but it did not work properly, so any help would be really appreciated:
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.cnbc.com/2020/07/16/perseverance-is-key-for-gen-z-to-succeedand-create-change-in-the-world.html')
soup = BeautifulSoup(r.text,'lxml')
txt = ''
for row in soup.find_all('div', {"class": "group"}):
    if row.a:
        continue
    txt += ''.join(row.text)
print(txt)
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.cnbc.com/2020/07/16/perseverance-is-key-for-gen-z-to-succeedand-create-change-in-the-world.html')
soup = BeautifulSoup(r.text,'lxml')
divs = soup.find_all("div", class_="group")
data = []
for div in divs:
    p = div.find_all("p")
    for i in p:
        if i.find_all("a"):
            for j in i.find_all("a"): j.extract()
        if i.find_all("img"):
            for j in i.find_all("img"): j.extract()
        if "disclosure" not in i.text.lower(): data.append(i.get_text(strip=True))
print("\n".join(data))
Output:
CNBC's "College Voices 2020" is a series written by CNBC summer interns from universities across the country about coming of age, launching new careers and job hunting during a global pandemic. They're finding their voices during a time of great social change and hope for a better future. What money issues are they facing? How are they navigating their student loans? How are they getting work experience, networking and applying for jobs when so many opportunities have been canceled or postponed? How important is diversity and a company's values to Gen Z job seekers?
In life, challenges arise, but they are meant to be conquered through perseverance, and never giving up. This is something I was taught at a young age. As a kid, I ran track and field, which presented obstacles — both literally and psychologically.
As much as I wanted to believe in never giving up, the notion of doing so always lingered in the back of my mind during harsh practices and races that did not go my way. I did not like losing (even today!), but I especially did not like knowing that I could put everything I had down on the line, and still come up short. It was through this that I realized hard work and dedication do not always bring wins, but they do bring a spirit of perseverance, and from this perseverance — hope.
I did not prepare for the world we are in today. No one did. Coronavirus has devastated the economy — the unemployment rate has skyrocketed and the job market is the worst since the Great Depression. Graduating, finding that first job and launching your adult life are difficult enough but add in all of this and it can be overwhelming. Yet, just like how I used to jump over hurdles as a runner back in high school, all of these issues are just obstacles that we need to jump over in order to press forward with our lives and our careers.
It all comes down to one thing: perseverance.
The coronavirus has changed all of our lives: The way we consume, go outside, and work have all changed because of this pandemic. We need perseverance more now than ever!
One obstacle in particular that I believe my generation will have to overcome is social justice. Through the death of George Floyd, protests have occurred all over the country. What many are hoping, including myself, is this wave of protests help bring not only light, but true change in terms of how African Americans are treated in the system and everywhere – including the workplace. This includes being equally paid, better represented in positions of power and more financially supported on the local level. As an African American, I know the continuous struggle of not only having to be your best in every place you step foot in, but knowing a mistake could be the spark that confirms a bias or stereotype that is the product of generations worth of racism and abuse. I hope that as bad as this coronavirus has been, and the protests that have come from generations worth of frustration and anger … that perseverance and goodness come out through it.
More From Invest in You:
Another hurdle I believe my generation has to overcome is student-loan debt. This has always been an issue but now, many classes are being pushed online, but the cost of these classes has not gone down. The price tag that comes along with a four-year+ degree adds up to decades worth of paying back debt. (Thefor student loans of between $20,000 and $40,000, according to the Department of Education.) And sometimes you don't even wind up working in the field you have your degree in! I know personally that my first priority in finding a job is to be able to pay off my loans as quickly as possible. The ever daunting thought of carrying thousands of dollars worth of debt does not rest easy in my mind. But I know that the education I received has given me the opportunity to garner wealth for generations to come. My hope is that I can persevere through this battlefield called life with my newfound skills and knowledge.
And, of course, for all of us in Generation Z, there is the challenge of finding our first job out of college. What would be a daunting task already, has been heightened by the coronavirus. For me personally, I find that my job search has turned from wanting to find a Fortune 500 company job, to wanting a more local business. My thought behind this is finding ways to help businesses that want bright and hungry young people who might otherwise not get that talent because of the enticing income that comes from larger corporations. I hope to put my skills to work to help a company move toward a brighter tomorrow!
in the U.S. right now, according to the Labor Department — and the number is bound to go higher if restrictions on businesses stay where they are or increase. The struggle for all of these people to find good paying jobs will be reminiscent of the Great Depression fight for jobs — long lines and countless job applications to simply find a job to help pay bills. There are already so many people out of work and an entire generation graduating college and trying to join the work force — it becomes simple math to understand that not every one of us will get a job. This makes me concerned and frustrated but this obstacle will not stop me. The perseverance I learned growing up will help me to not give up when faced with challenges, no matter how big they are. I owe it to my family and friends to strive for success, because it is only then that I can be a better individual.
I also want my generation to do better for the environment. Like most people, I take the danger of climate change and global warming very seriously. What we do today will impact the way we are all able to survive in the future. As a result, the way that we work, operate and consume must change as well. Like many Gen Z young adults, I want the impact I make in my everyday life to help maintain our Earth, especially for the future generations that come after us.
Hard work and dedication don't always bring wins, but they do bring a spirit of perseverance. Perseverance brings hope, and hope brings forth action. As an African American, I think the future looks bright, but only because of the groundwork that my generation is putting forth to not let underrepresented voices go unheard. I truly believe that as hard as this next decade will be from a financial and job-status perspective, there will be positive change.
I believe a lot of people in Gen Z and generations to come won't just go for jobs in typical fields because of the money, they will look for jobs that can make a true impact — both professionally and personally. Having representation in fields that are overly male or overly white do not help people. In fact, we as a society end up working backwards when that happens. Rather, we need to extend our arms out to bring everyone in and be honest about the history of our country -- and where we wish to go from here. These hurdles will be just like the hurdles I once used to jump over as a track athlete. And just like it took hard work, dedication and perseverance in order for me to do my best to get over those hurdles, we will all have to do the same to get over these hurdles.
It's time to tap into the perseverance within all of us to get through these challenging times and make the world a better place for generations to come.
You can search for all <p> tags directly under tags with class group and extract the text non-recursively:
import requests
from bs4 import BeautifulSoup
url = 'https://www.cnbc.com/2020/07/16/perseverance-is-key-for-gen-z-to-succeedand-create-change-in-the-world.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# uncomment if you want to print the first header:
#print(soup.select_one('.group em').text)
for p in soup.select('.group > p')[:-3]:
    for t in p.find_all(text=True, recursive=False):
        print(t.strip())
Prints:
In life, challenges arise, but they are meant to be conquered through perseverance, and never giving up. This is something I was taught at a young age. As a kid, I ran track and field, which presented obstacles — both literally and psychologically.
As much as I wanted to believe in never giving up, the notion of doing so always lingered in the back of my mind during harsh practices and races that did not go my way. I did not like losing (even today!), but I especially did not like knowing that I could put everything I had down on the line, and still come up short. It was through this that I realized hard work and dedication do not always bring wins, but they do bring a spirit of perseverance, and from this perseverance — hope.
... and so on.
An alternative approach is to find everything you don't want, remove it with .extract(), and then loop through what is left. The benefit is that this may be easier to read and extend. You might also want to add validation to assert that certain phrases exist.
Here is an example:
from bs4 import BeautifulSoup  # py -m pip install beautifulsoup4 --user
import requests
def provide_soup(url):
    r = requests.get(url)
    r.raise_for_status()
    return BeautifulSoup(r.text, 'lxml')
def remove_noise(soup):
    # paragraphs that start with any of these phrases are dropped
    noise_starting_phrase = ('CNBC\'s "College Voices 2020"', 'More From Invest in You:', 'SIGN UP:', 'CHECK OUT:', 'Disclosure:')
    paragraph = soup.find_all('p')
    for p in paragraph:
        if p.text.strip().startswith(noise_starting_phrase):
            p.extract()
def remove_key_point(soup):
    key_point = soup.find('div', {"class": "RenderKeyPoints-wrapper"})
    if key_point is not None:  # the key-points block may be absent
        key_point.extract()
def provide_content_as_text(soup):
    return ''.join([row.text for row in soup.find_all('div', {"class": "group"})])
soup = provide_soup('https://www.cnbc.com/2020/07/16/perseverance-is-key-for-gen-z-to-succeedand-create-change-in-the-world.html')
remove_key_point(soup)
remove_noise(soup)
results = provide_content_as_text(soup)
print(results)

Python3: writing article in own words

I am trying to extract a summary from a news article. Here is what I have tried so far:
>>> from newspaper import Article
>>> url = 'http://abcnews.go.com/International/wireStory/north-korea-ready-deploy-mass-produce-missile-47552675'
>>> article = Article(url)
>>> article.download()
>>> article.parse()
>>> article.nlp()
>>> article.keywords
['ready', 'north', 'test', 'missiles', 'deploy', 'tested', 'korea', 'missile', 'launch', 'nuclear', 'capable', 'media', 'massproduce']
>>> article.summary
'North Korea says it\'s ready to deploy and start mass-producing a new medium-range missile capable of reaching Japan and major U.S. military bases there following a test launch it claims confirmed the missile\'s combat readiness and is an "answer" to U.S. President Donald Trump\'s policies.\nPyongyang\'s often-stated goal is to perfect a nuclear warhead that it can put on a missile capable of hitting Washington or other U.S. cities.\nAt the request of diplomats from the U.S., Japan and South Korea, a United Nations\' Security Council consultation on the missile test will take place Tuesday.\nNorth Korea a week earlier had successfully tested a new midrange missile — the Hwasong 12 — that it said could carry a heavy nuclear warhead.\nExperts said that rocket flew higher and for a longer time than any other missile previously tested by North Korea and represents another big advance toward a viable ICBM.'
I have noticed that the summary generated above is copied verbatim from the news article itself, whereas I want human-like summarization (in its own words, spun content, or anything similar, as long as it stays relevant).
Could you advise or suggest what I need to do so that my code produces what I want?
There is sumy, which offers several ways to summarize English texts. Most (if not all) of its algorithms extract sentences from the input document. You can then post-process those sentences by splitting or merging them and substituting synonyms.
Beyond that, this topic is still more a matter of research than of engineering. Try AI StackExchange.
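To make "extracting sentences from the input document" concrete, here is a toy frequency-based extractive summarizer using only the standard library. It is not how sumy or newspaper are implemented; the stopword list, the scoring rule, and the function name summarize are all simplifying assumptions for illustration. It shows why such summaries reuse the article's own sentences: the output is always a subset of the input.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real tools use much larger ones
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "that", "for", "on", "was", "by"}

def tokenize(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured automatically
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]

text = ("North Korea tested a missile. "
        "The missile flew higher than any previous missile. "
        "Diplomats will meet on Tuesday.")
summary = summarize(text, max_sentences=2)
print(summary)
```

Producing a summary in new words (abstractive summarization) needs a generative model on top of, or instead of, this kind of sentence selection.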

Resources