Scraping Data Using Requests and BeautifulSoup - python-3.x
I am trying to scrape data from this link. First, I want to find all the headings that are in bold.
I achieved this using the code below:
import requests
from bs4 import BeautifulSoup

url = 'https://www.emirates.com/pk/english/help/covid-19/dubai-travel-requirements/tourists/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
headers = []
for sib in soup.findAll('strong'):
    headers.append([sib.text])
The problem is that some bold text sits inside li tags, and I don't want that treated as a header. For example, "If you are flying from India, Pakistan, Nigeria or Bangladesh" is picked up as a header, but I don't want it included because it is inside an li tag. How can I solve this?
The next part where I am stuck is scraping all the text under each of these headers. To do that, I wrote the following code:
main_data = []
data_str = ''
for i in range(0, len(headers)):
    target = soup.find(['h3', 'p'], text=headers[i])
    for sib in target.find_next_siblings():
        if sib.name == "strong":
            break
        else:
            data_str = sib.text + "."
            main_data.append([data_str])
Currently the output is a list of lists, but each tag becomes its own list. The content and headers are also repeated.
The expected output is a list of lists, where each inner list contains the text scraped from under one header.
Example:
For the header "Passengers will need to do COVID‑19 PCR tests only if it is mandated by the country they are travelling to.":
main_data[0] = Please check the requirements of the country you are travelling to. The travel regulations change frequently. You may need to take a COVID‑19 PCR test before you depart or another particular type of COVID‑19 test specified by your destination.
This is a list of authorised COVID‑19 test laboratories in Dubai where you can get tested before you travel to your destination.
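The first part can be solved by selecting only the strong tags that are direct children of p or h3 tags, which skips the bold text inside li tags: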
import requests
from bs4 import BeautifulSoup

url = 'https://www.emirates.com/pk/english/help/covid-19/dubai-travel-requirements/tourists/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

headers = []
# Keep only <strong> tags that are direct children of <p> or <h3>,
# which leaves out bold text nested directly inside <li> items.
for sib in soup.select('p > strong'):
    headers.append([sib.text])
for sib in soup.select('h3 > strong'):
    headers.append([sib.text])
headers
Output:
[['Passengers will need to do COVID-19 PCR tests only if it is mandated by the country they are travelling to.'],
['Rapid COVID-19 testing at Dubai International airport for flights to China'],
['Special COVID-19 PCR test rates for Emirates passengers'],
['Requirements for all passengers arriving in Dubai'],
['Indian Nationals with a normal passport who are travelling to or from India via Dubai can obtain a visa on arrival in Dubai for a maximum stay of 14 days provided they:'],
['Test on arrival'],
['Transiting in Dubai'],
['Test exemptions:'],
['COVID-19 testing laboratories:'],
['Arriving passengers'],
['Vaccination certificate verification'],
['Before you book'],
['Before you travel'],
['When you arrive']]
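As an alternative to the two selectors above, one could collect every strong tag and drop those that have an li ancestor. This is only a sketch: the HTML snippet and the helper name bold_headers are made up for illustration and are not taken from the Emirates page. Unlike the two-loop version, it returns the headers in document order and as plain strings rather than one-element lists.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet that mimics the structure described in the question.
SAMPLE_HTML = """
<div>
  <p><strong>Test on arrival</strong></p>
  <p>Some passengers are tested again on arrival.</p>
  <h3><strong>Transiting in Dubai</strong></h3>
  <ul>
    <li><strong>If you are flying from India</strong>, extra rules apply.</li>
  </ul>
</div>
"""


def bold_headers(html):
    """Return the text of every <strong> tag that is not nested inside an <li>."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        tag.get_text(strip=True)
        for tag in soup.find_all("strong")
        if tag.find_parent("li") is None  # skip bold text inside list items
    ]


print(bold_headers(SAMPLE_HTML))
```

Note that this filter also keeps bold text in containers other than p and h3 (for example a div), so whether it fits depends on how the real page is marked up.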
For the second part, collect the text of the siblings that follow each header and stop when the next header is reached:
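main_data = []
for i in range(0, len(headers)):
    # headers[i] is a one-element list; text= matches a tag whose string is in that list
    target = soup.find(['h3', 'p'], text=headers[i])
    text = ''
    for sib in target.find_next_siblings():
        # Stop at the first sibling that contains a bold header (p > strong or h3 > strong)
        if sib.select_one('p > strong') is not None or sib.select_one('h3 > strong') is not None:
            break
        else:
            text += sib.text
    # One entry per header, appended after all of its siblings are collected
    main_data.append([text])
main_data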
Output:
[['Please check the requirements of the country you are travelling to. The travel regulations change frequently. You may need to take a COVID-19 PCR test before you depart or another particular type of COVID-19 test specified by your destination.This is a list of authorised COVID-19 test laboratories in Dubai\ufeff where you can get tested before you travel to your destination.'],
['All passengers, except children under the age of 1, who are travelling to China must have a negative rapid COVID-19 test certificate before travel. You must report to the check-in counter 5 hours before your flight and take this test at Dubai International airport at Emirates Terminal 3 departure area, next to Costa. For further information please refer to the travel requirements for China.'],
['Emirates has expanded its medical partnerships to offer all passengers exclusive home or office COVID-19 PCR testing rates at the following centres:Al Tadawi Medical CentreLocated at Al Masood building, Airport Road, Port Saeed area, Deira.The test costs AED 130 per person. Home or office testing within Dubai costs AED 240 per person. Test results will be available within 24 hours.Prime Medical CentresLocations in Dubai:\nAl Qusais Branch, Damascus StreetPremier Diagnostic and Medical Center, Salah Al Din Street\xa0Prime Corp Medical Center, Salah Al Din Street, DeiraSheikh Zayed Branch, Sheikh Zayed Street, near Noor Islamic BankPrime Specialist Medical Center Sharjah Branch, King Faisal St, Al MajazAjman Branch, Grand Mall, Sheikh Khalifa StThe test costs AED 150 per person. Home or office testing within Dubai for a minimum of two passengers is also available at AED 240 per person. Test results will be available within 24 hours.'],
["All passengers travelling to Dubai from any point of origin (GCC countries included) must hold a negative COVID-19 RT-PCR test certificate for a test taken no more than 72 hours before departure, except for travel from Bangladesh, Ethiopia, India, Nigeria, Pakistan, Sri Lanka, South Africa, Uganda, Vietnam, Zambia (for which specific requirements are stated above). Please see the requirements for travel from India below.The certificate must be a Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test. Other test certificates including antibody tests, NHS COVID Test certificates, Rapid PCR tests and home testing kits are not accepted in Dubai. Travellers must bring an official printed or digital certificate in English or Arabic to check in – SMS certificates are not accepted. PCR certificates in other languages are acceptable if they can be validated at the originating station.COVID-19 RT-PCR test certificates must be issued by an authorised facility in the passenger's departure country. Certificates that have already been presented for travel to another destination can't be used for re-entry even if they are still within the validity period.For passengers arriving from the following countries, it is mandatory that the COVID-19 PCR report includes a QR code linked to the original report for verification purposes. The QR code must be presented at check-in and to representatives of the Dubai Health Authority (DHA) upon arrival in Dubai airports: Indonesia, Sudan, Lebanon, Egypt and Ethiopia."],
['have a visitor visa or a green card issued by the Unites States, ora residence visa issued by the United Kingdom or Europe unionThe visa issued by United States, United Kingdom or Europe union has to be valid for a minimum of 6 months'],
['Passengers arriving in Dubai from the following countries will be required to take another COVID-19 PCR test on arrival at Dubai International airport:Afghanistan, Angola, Argentina, Azerbaijan, Bahrain, Bangladesh, Bosnia & Herzegovina, Brazil, Cambodia, Chile, Croatia, Cyprus, Djibouti, Egypt, Eritrea, Ethiopia, Georgia, Ghana, Greece, Guinea, Hungary, India, Indonesia, Iran, Iraq, Israel, Ivory Coast, Jordan, Kenya, Kuwait, Kyrgyzstan, Lebanon, Malta, Moldova, Montenegro, Morocco, Myanmar, Nepal, Pakistan, Poland, Philippines, Qatar, Rwanda, Russia, Senegal, Slovakia, Somaliland, Somalia, South Africa, South Sudan, Sudan, Syria, Tajikistan, Tanzania, Thailand, Tunisia, Turkey, Turkmenistan, Uganda, Ukraine, Uzbekistan, Zimbabwe.'],
['All transit passengers must complete all the requirements of their final destination.Transit passengers from the following countries must present a negative COVID-19 PCR test certificate for a test taken no more than 72 hours before departure:\xa0Bangladesh, Ethiopia, India, Nigeria, Pakistan, Sri Lanka, South Africa, Uganda, Vietnam, Zambia, IndonesiaAll other transit passengers are not required to present this certificate unless it is mandated by their final destination.'],
["UAE nationals are exempt from taking a COVID-19 PCR test before departing for Dubai. They must be tested on arrival in Dubai, irrespective if they are holding a valid negative COVID-19 RT-PCR certificate from the point of origin.\n\nThis is also applicable for:\nPassengers accompanying a 1st degree UAE nationals' relative or domestic workersDomestic workers escorting a UAE national sponsor during travel.Children under the age of 12 and passengers who have a moderate or severe disability are exempt from taking a COVID-19 RT-PCR test.\nModerate or severe disability includes neurological disorders and intellectual or developmental disabilities. For example: Acute spinal cord injury, Alzheimer's disease, Amyotrophic lateral sclerosis (ALS), Ataxia, Autism spectrum, Bell's palsy, Brain tumours, Cerebral aneurysm, Cerebral palsy, Down Syndrome, Epilepsy and seizuresAll other passengers, including those who are visually impaired, hearing impaired or physically challenged must hold a negative COVID-19 RT-PCR test certificate as per the requirements.There may be specific test exemptions in your country of origin and final destination. Please check the requirements before you travel."],
['The UAE government has specified designated laboratories.\ufeff You can either use the recommended laboratories in the list or any trusted and certified laboratories in your country of origin to get your COVID-19 RT-PCR test.If you are flying from India, Pakistan, Nigeria or Bangladesh , you must get your certificate from one of the labs listed in the designated laboratories document to be accepted on the flight.'],
['Passengers who are planning to travel to Abu Dhabi must comply with the following protocols in place at all Abu Dhabi borders. These procedures may affect travel time.Effective 5 September 2021, Abu Dhabi authorities have revised the rules and updated travel procedures for UAE citizens and residents as well as visitors entering Abu Dhabi.Vaccinated travellersVaccinated travellers from green list destinations must take a COVID-19 PCR test on arrival and on day 6 after arrival but do not have to undergo quarantine.When arriving from other destinations (non-green), they must take a COVID-19 PCR test on arrival and on days 4 and 8 after arrival but do not have to undergo quarantine.The protocol applies to fully vaccinated UAE citizens and residents as well as visitors, which is also documented in the Alhosn App.Unvaccinated travellersUnvaccinated citizens, residents and visitors arriving into Abu Dhabi from green list destinations must take a COVID-19 PCR test on arrival and on days 6 and 9 after arrival but do not have to undergo quarantine.When arriving from other destinations (non-green), they must take a COVID-19 PCR test on arrival, quarantine for 10 days, wear a medically approved wristband and take another COVID-19 PCR test on day 9 of quarantine.To be considered fully vaccinated, individuals must have received two doses of the same vaccine at least 14 days before departure.'],
['Before departure, visitors need to register in the Register Arrivals section of the Federal Authority for Identity and Citizenship (ICA) app, complete the register arrivals form and upload an international vaccination certificate. Visitors will then receive an SMS including a link to download the Alhosn app.Upon arrival in Abu Dhabi, visitors will receive a Unified Identification Number (UID) either at the airport or via ICA app or website. Visitors will then need to download and register on the Alhosn app using the UID and phone number used for ICA registration or when taking a COVID-19 PCR test in the UAE.Visitors will receive a one-time password (OTP) to complete the Alhosn app registration process. Alhosn app allows users to check status, vaccination information, test results and travel test requirements and use a live QR code.These tests and processes are a legal requirement and those failing to follow this process are liable for fines.Find out where to get tested in Dubai before you enter Abu Dhabi\ufeff.'],
['Check if you need a visa.\ufeff Depending on your nationality you can get a visa on arrival, or you can apply for your prearranged visit visa from Dubai Immigration before you travel.'],
['GDRFA\ufeff or ICA\ufeff approval is not required for tourists travelling to the UAE.Passengers arriving from the following countries must follow specific protocols:Bangladesh, Ethiopia, India, Nigeria, Pakistan, Sri Lanka, South Africa, Uganda, Vietnam, ZambiaRequirements for passengers from these countries:A valid negative COVID-19 PCR test certificate with a QR code issued within 48 hours prior to departure from an approved health facilityA rapid PCR test report with a QR code for a test conducted at the departure airport within six hours of departureFor passengers travelling to Dubai as their final destination from Bangladesh, Nigeria, Vietnam and Zambia, travel is currently not possible as there are no rapid PCR testing facilities at the airport.'],
['You may need to take another COVID-19 PCR test on arrival. If you take a test at the airport, you must remain in your hotel or residence until you receive the test result.If the test result is positive, you will be required to undergo isolation and follow the Dubai Health Authority guidelines.You must also download the COVID19 – DXB Smart App iOS\ufeff-Android\ufeff']]
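In the output above, neighbouring paragraphs run together (for example "destination.This is") and characters such as \ufeff and \xa0 are left in the text. A possible variant, sketched here on a made-up HTML snippet rather than the live page, joins the pieces with a space and strips those characters. The names clean, is_header and collect_sections are hypothetical helpers, and the result is a dict keyed by header text instead of a list of lists.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the real page structure may differ.
SAMPLE_HTML = """
<div>
  <p><strong>Test on arrival</strong></p>
  <p>You may be tested again.</p>
  <ul><li><strong>Note:</strong> remain in your hotel.</li></ul>
  <h3><strong>Transiting in Dubai</strong></h3>
  <p>Check the rules of your\xa0final destination.</p>
</div>
"""


def clean(text):
    """Drop byte-order marks and turn non-breaking spaces into plain spaces."""
    return text.replace("\ufeff", "").replace("\xa0", " ")


def is_header(tag):
    """A header is a <p> or <h3> that has a <strong> tag as a direct child."""
    return tag.name in ("p", "h3") and tag.find("strong", recursive=False) is not None


def collect_sections(html):
    """Map each header's text to the text of the siblings that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    sections = {}
    for header in soup.find_all(is_header):
        parts = []
        for sib in header.find_next_siblings():
            if is_header(sib):
                break  # the next header starts a new section
            text = clean(sib.get_text(" ", strip=True))
            if text:
                parts.append(text)
        sections[clean(header.get_text(strip=True))] = " ".join(parts)
    return sections


for title, body in collect_sections(SAMPLE_HTML).items():
    print(title, "->", body)
```

If the list-of-lists shape from the question is needed, `[[body] for body in collect_sections(html).values()]` converts the result.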
Related
ValueError: k must be less than or equal to the number of training points
I am trying BerTopic on a cluster of sentences. I have actually employed Agglomerative clustering using Bert sentence embeddings, the result has many clusters one of them is this docs=["PARIS:France’s trade unions called for mass protests and strikes over pension reform that have brought much of the country to a halt to carry on next week, piling more pressure on President Emmanuel Macron.Commuters faced severe disruption getting to work on Friday, hospitals have been left understaffed and Paris City Hall said dozens of schools in the capital would stay closed, as unions dug in over Macron’s plans to streamline one of the developed world’s most generous pension systems.Transport workers went on strike on Thursday and took to the streets – joined by teachers, doctors, police, firemen and civil servants. Smoke and tear gas swirled through parts of Paris and Nantes as protests turned violent.Union leaders said public workers should maintain their industrial action until Tuesday when they urged members to flood the streets once again.“Unions will meet on Tuesday evening to decide on our next actions if by then Macron and (Prime Minister) Edouard Philippe has not reversed course and opened negotiations,” Catherine Perret of the hard-left CGT union told reporters.The strike pits Macron, a 41-year-old former investment banker who took office in 2017 on a promise of opening up France’s highly regulated economy, against powerful unions who say he is set on dismantling worker protections.“We’re going to protest for a week at least, and at the end of that week it’s the government that’s going to back down,” said 50-year-old Paris transport employee Patrick Dos Santos.The outcome depends on who blinks first – the unions who risk losing public support if the disruption goes on for too long, or the president whose two-and-a-half years in office have been rocked by waves of social unrest.Macron’s pension tsar Jean-Paul Delevoye is due to hold talks with the unions on Monday 
before the prime minister presents the broad outlines of the proposal to the public mid-week.Education Minister Jean-Michel Blanquer said far-reaching reform was needed to put the generous pension system on a sustainable footing. Fewer teachers went on strike on Friday, education ministry data showed.“It would be much easier for us to do nothing,” Blanquer told BFM TV. “We could see through this five-year term without enacting deep reform. But if every presidency reasons in this way, our children will not have an acceptable pension system.”Police had used tear gas in central Paris on Thursday afternoon when hooded protesters on the fringes of the trade unions’ march threw fireworks at officers, ransacked bus stops, and set fire to rubbish bins.More than 800,000 people rallied in protests countrywide on Thursday. Union leaders put the numbers higher.“There’s a noise in the streets, I hope the windows of the Elysee are open,” said Philippe Martinez, secretary-general of the CGT union, referring to the president’s office.Macron wants to simplify the unwieldy pension system, which comprises more than 40 different plans. 
Rail workers and mariners can, for instance, retire up to a decade earlier than the average worker.The president says the system is unfair and too costly and that the French will have to work longer, though he appears reluctant to simply raise the retirement age of 62.One alternative is to curb benefits for those who stop working before 64 and give a benefits boost to those who leave later.", "The French -- and particularly Parisians -- are face to face with what may be the largest strike in the country's history.On the heals of the Yellow Vest protests, employees of various sectors are preparing to go on indefinite strike beginning Thursday to protest pension reforms by the government of French President Emmanuel Macron.The walkout was sealed when the government announced its determination to implement pension reform despite pushback.According to France’s National Institute of Statistics and Economic Studies, Macron has further fueled the sense of anger and rebellion among French people against their presidents, with his economic policies that have given the wealthy a greater share of national income since his inauguration on May 17, 2017.He has been facing the biggest crisis since the yellow vest protests.The reform would lift the privileges granted to civil servants and gradually increasing the retirement age from 62 to 64. It is expected to adversely affect many sectors.Long list of strikersAmong the strikers will be employees of national carrier Air France, state-owned Parisian public transport operator RATP, electricity company EDF that is largely owned by the government, state-owned national railway firm SNCF, and automobile manufacturer Renault.Police, healthcare professionals, teachers, lawyers, taxi and freight drivers, postal workers, farmers, civil servants, refinery workers and students will also participate.Over half of all schools across the country will be suspended, while nearly all commuter trains and buses will halt and or work in intermittently. 
Air France will cancel 30% of its flights.The Yellow Vest protests started Nov. 17, 2018 in reaction to rising fuel costs and economic injustice, later spiraling into deadly anti-government riots.Protesters used yellow vests, part of the standard safety kit in French cars, to make their members more easily visible.The demonstrations left 11 dead and more than 4,000 injured including protesters and the police, according to government figures.Activists claim that 24 protesters were blinded in one eye and that five lost one of their hands.At least some 8,400 people have been arrested since the beginning of the Yellow Vest protests, and 2,000 were remanded into custody.A total of 17 protestors were arrested in Toulouse and five people -- two police and three civilians -- were injured.", "PARIS-The Eiffel Tower shut down, France’s vaunted high-speed trains stood still and several thousand people protested in Paris as unions launched open-ended, nationwide strikes Thursday over the government’s plan to overhaul the retirement system.Paris authorities barricaded the presidential palace and deployed 6,000 police as activists - many in yellow vests representing France’s year-old movement for economic justice - gathered in the capital in a mass outpouring of anger at President Emmanuel Macron and his centerpiece reform.Unions and their supporters fear that the changes to how and when workers can retire will threaten the hard-fought French way of life. Macron himself remained “calm and determined” to push it through, according to a top presidential official.The Louvre Museum warned of strike disruptions, and subway stations across Paris shut their gates. Many visitors - including the U.S. energy secretary - canceled plans to travel to one of the world’s most-visited countries amid the strike. Unprepared tourists discovered historic train stations standing empty Thursday, with about nine out of 10 of high-speed TGV trains canceled. 
Signs at Paris’ Orly Airport showed “canceled” notices, as the civil aviation authority announced 20% of flights were grounded.Some travelers showed support for the striking workers, but others complained about being embroiled in someone else’s fight. “I had no idea about the strike happening, and I was waiting for two hours in the airport for the train to arrive and it didn’t arrive,” said vacationer Ian Crossen, from New York. “I feel a little bit frustrated. And I’ve spent a lot of money. I’ve spent money I didn’t need to, apparently.”Vladimir Madeira, a Chilean tourist vacationing in Paris, said the strike has been “a nightmare.” He hadn’t heard about the protest until he arrived, and transport disruptions foiled his plans to travel directly to Zurich.Beneath the closed Eiffel Tower, tourists from Thailand, Canada and Spain echoed those sentiments. Bracing for possible violence along the route of the Paris march, police ordered all businesses, cafes and restaurants in the area to close. Authorities banned protests in the more sensitive neighborhoods around the Champs-Elysees avenue, presidential palace, parliament and Notre Dame Cathedral.Police carried out security checks of more than 6,000 people arriving for the protest and detained 65 even before it started. Embassies warned tourists to avoid the protest area. The mood was impassioned in the crowd massed on Boulevard Magenta in eastern Paris.Health workers showed up to decry conditions in hospitals. Students pointed to recent student suicides and demanded government action. Environmentalists emphasized that climate justice and social justice are one and the same. 
And young and old roundly condemned the new retirement plan, which they fear would take money out of their pockets and reduce the period of repose the French expect in the last decades of their lives.Eric Mettling, who joined the yellow vests at the start of their movement, said the general strike had brought together social movements across France in a manner unprecedented in recent memory to denounce “the social crisis.”Skirmishes broke out between police firing tear gas and protesters throwing flares at a protest in the western French city of Nantes, and thousands of red-vested union activists marched through cities from Marseille on the Mediterranean to Lille in the north.Lacking public transport, commuters used shared bikes or electric scooters despite near-freezing temperatures. Many workers in the Paris region worked from home or took a day off to stay with their children, since 78% of teachers in the capital were on strike.The big question is how long the strike will last. Transport Minister Elisabeth Borne said she expects the travel troubles to be just as bad Friday, and unions said they’ll maintain the Paris subway system strike at least through Monday. Public sector workers fear Macron’s reform will force them to work longer and shrink their pensions. Some private sector workers share their worries, while others welcome the reform.Joseph Kakou, who works an overnight security shift in western Paris, walked an hour to get home to the eastern side of town Thursday morning. “It doesn’t please us to walk. It doesn’t please us to have to strike,” Kakou told The Associated Press. “But we are obliged to, because we can’t work until 90 years old.”To Macron, the retirement reform is central to his plan to transform France so it can compete globally in the 21st century. The government argues France’s 42 retirement systems need streamlining. 
While Macron respects the right to strike, he “is convinced that the reform is needed, he is committed, that’s the project he presented the French in 2017” during his election campaign, the presidential official said. The official was not authorized to be publicly named.After extensive meetings with workers, the high commissioner for pensions is expected to detail reform proposals next week, and the prime minister will release the government’s plan days after that.", "Protesters mobilized across France on Thursday in a nationwide strike challenging President Emmanuel Macron’s controversial pension reform plans.The Interior Ministry said 806,000 people took part, while labor unions put the number at nearly 1.5 million.Some 250,000 people took part in the protests in Paris, where police used smoke bombs to disperse the crowd.The unlimited strike impacted all public transport systems in the country, according to local media reports.A total of 90 people have been arrested so far in Paris, police said.Some train, subway and bus services were canceled and many schools were closed while the law and order situation led to the cancelation of 20% of flights to the country.In a tweet, the Paris Police Department said it had conducted 6,476 checks. 
Labor unions said the strike will continue until Monday.The Gare du Nord, a station of the SNCF railway network in Paris, was almost empty in the morning, according to broadcaster France 24.Protesters, however, made their way to the Gare du Nord in the afternoon to attend the main march to Place de la Nation square.They included police, healthcare professionals, teachers, lawyers, taxi and freight drivers, postal workers, farmers, civil servants, refinery workers and students, according to the Le Monde daily.The walkout came after the government announced its determination to implement pension reform despite a nationwide outcry.According to France’s National Institute of Statistics and Economic Studies, Macron has further fueled the sense of anger and rebellion among French people against their president with his economic policies that have given wealthy people a greater share of national income since his inauguration on May 17, 2017.He has been facing the biggest crisis since the beginning of the Yellow Vest protests in October last year.Proposed reformFrance currently has 42 different pension programs for different sectors, but the government proposed to unify them into one pension scheme.France’s current program is based on the principle of solidarity between generations under which the working population finances the pensioners of that year.But due to the aging population, fewer people are paying into the current system.To fix this, the government introduced a point-based system that would compensate workers with pension points for every day they work or every euro they contribute.The reform would lift the privileges granted to civil servants and gradually increase the retirement age from 62 to 64, a move expected to adversely affect many sectors.Workers will get a full pension if they retire at the age of 64. 
If they retired before, they would lose 5% of their pensions for every year they retire early.They would also gain a 5% increase in their pensions for every year if they retire after the age of 64.The demonstrations and strikes have been supported by numerous labor and police unions as well as the Yellow Vests.Macron paused his overseas visits for a while to focus on a solution to the problems caused by the strikes and demonstrations.", "Paris-A strike over planned pension reforms that paralysed France on Thursday has entered its second day.Several unions, including rail and metro workers, voted to extend the strike action, meaning another day of major disruptions to key services.It comes after more than 800,000 people protested on Thursday, with violent clashes reported in a number of cities.Workers are angry about planned pension reforms that would see them retiring later or facing reduced payouts.France currently has 42 different pension schemes across its private and public sectors, with variations in retirement age and benefits. President Emmanuel Macron says his plans for a universal points-based system would be fairer, but many disagree.Rail workers voted to extend their strike through Friday, while unions at the Parisian bus and metro operator said their walkout would continue until at least Monday.Numerous rush-hour trains into Paris were cancelled on Friday and 10 out of 16 metro lines were closed, while others ran limited services, Reuters news agency reports.Traffic jams of more than 350km (217 miles) were reported on major roads in and around the capital.A number of flights have also been disrupted, while many schools are expected to remain shuttered and hospitals understaffed. 
Protesters sang songs against President Macron in ParisMr Macron’s government has reportedly made plans to deal with the strike action at the weekend.Some trade union leaders have vowed to strike until Mr Macron abandons his campaign promise to overhaul the retirement system.“We’re going to protest for a week at least, and at the end of that week it’s the government that’s going to back down,” 50-year-old Paris transport employee Patrick Dos Santos told Reuters.What happened on Thursday?French police gave the figure of 800,000 people taking to the streets across the country, including 65,000 in Paris. Union leaders put the numbers higher, with the CGT union saying 1.5m people turned out across France.The disruption meant popular tourist sites in Paris, including the Eiffel Tower, were closed for the day and usually busy transport hubs like the Gare du Nord were unusually quiet.", "Paris (AFP): France was on Saturday expecting its most serious nationwide strike in years to paralyse the country over the weekend, with unions warning the turmoil would last well into next week.", "PARIS: The French government on Friday expressed determination to plough ahead with far-reaching pension reforms in the face of the biggest strikes in years, which have brought public transport in much of the country to a standstill.The strikes, which began on Thursday, have seen most high-speed trains cancelled, flights affected and most of the Paris metro shut down in a major challenge to the ambitious reform agenda of President Emmanuel Macron.The turmoil is expected to continue over the weekend and through until at least Tuesday when unions have called more nationwide protests to follow mass rallies on Thursday that brought over 800,000 people onto the streets.With Macron not yet speaking publicly about the strikes and seeking for now to rise above the fray, Prime Minister Edouard Philippe insisted that the government would not abandon a plan which would require the French “to work a bit 
longer.” He pledged to work with trade unions to introduce a single “fairer”, points-based pension scheme for all, scrapping the 42 more advantageous plans currently enjoyed by train drivers, soldiers and a host of other workers in the process.The centre-right premier added that the government was “very determined” to implement the reform, adding he did not believe the French would always accept a situation where some retire earlier, and with more money than others doing comparable jobs.But he emphasised that the changes, which he said would be unveiled on Wednesday, would be introduced “progressively, without harshness”. “My logic will never be one of confrontation,” he said.Dozens of trains, metros and flights were cancelled, many schools were again closed or offering only daycare, and four of the country’s eight oil refineries remained blocked on Friday.Rail operator SNCF has already halted ticket sales through the weekend, with 90 percent of high-speed TGV trains again cancelled on Friday and little improvement expected over the weekend.Half of the Eurostar trains between Paris and London were dropped, and just two of three Thalys trains serving Paris, Brussels and Amsterdam were running.“I was supposed to take a train to Metz (northeast France), I reserved my ticket three days ago but it’s been cancelled and I’ve gotten no information,” Rachel Pallamidessi said at a deserted station in the city of Strasbourg.Several airlines cancelled flights as air traffic controllers walked off the job, with Air France cancelling 30 percent of domestic flights and 10 percent of nearby international routes.In Paris, nine of the capital’s 16 metro lines were shut while many others were running only during rush hours, prompting commuters to turn to bicycles, electric scooters and other alternatives or to work from home.It remains to be seen if the protests will match the magnitude of the 1995 strikes against pension overhauls when France was paralysed for three weeks from 
November to December, ultimately forcing the government to back down. The walkout is the latest test of Macron’s mettle after months of protests from teachers, hospital workers, police and firefighters, capping a year of social unrest triggered by the “yellow vest” protest movement.Unions say Macron’s proposal for a single pension system would force millions of people in both the public and private sectors to work well beyond the official retirement age of 62.At least 800,000 took part in rallies around the country on Thursday, according to the interior ministry, one of the biggest demonstrations of union strength in nearly a decade.Another day of strikes and rallies has been called for Tuesday, a day after union leaders are to meet again with government officials over the pension reform.“There were lots of people on strike, now we need even more if we want to influence these decisions,” Philippe Martinez of the hard-line CGT union told LCI television.While most of the rallies were peaceful, police fired tear gas to disperse dozens of black-clad protesters smashing windows and throwing stones during the Paris march, with one construction trailer set on fire.Several dozens of people were arrested, and three journalists were injured after reportedly being hit by tear gas or stun grenades, including a Turkish journalist who was struck in the face.Published in Dawn, December 7th, 2019Copyright © 2019, DawnScribe Publishing Platform", ] The code is as follows. from bertopic import BERTopic # docs=[i for i in all_text if type(i)==str] # docs=docs.T topic_model = BERTopic() topics, probs = topic_model.fit_transform(docs) Any help is much appreciated Thanks
Change line after saving web crawled data from Beautifulsoup4 as txt file
I wrote code to crawl headlines from the website https://7news.com.au/news/coronavirus-sa and tried to save the headlines into a txt file. I wrote the following code:
import requests
from bs4 import BeautifulSoup as bs
f = open("/Users/j/Desktop/Python/chatbot project/headlines.txt", 'w')
url = f'https://7news.com.au/news/coronavirus-sa'
r = requests.get(url)
soup = bs(r.text, 'html.parser')
headlines = soup.select('h2.Card-Headline')
for h in headlines:
    print(h.text)
    f.write(h.text)
f.close()
The result of print(h.text) was:
TENS OF THOUSANDS to spend Christmas in quarantine as Omicron causes COVID carnage
SA records ‘steep increase’ in COVID cases as premier issues ominous warning
South Australia gives nod to widespread rollout of rapid antigen COVID-19 tests
South Australia’s Omicron cases almost TRIPLE as COVID-19 cases surge
Leading doctor’s warning about ‘essential’ and ‘necessary’ spread of COVID-19
SA ambos sound the alarm over rising COVID-19 cases after state records surge
Scott Morrison flags potential changes to COVID-19 approach after National Cabinet
STATE OF THE NATION: Australia fighting to contain COVID as cases soar to record highs
WATCH LIVE: Scott Morrison provides COVID-19 update after National Cabinet meeting
Australia’s COVID cases could hit 250,000 DAILY unless restrictions return
PM’s plea ahead of emergency meeting as he declares ‘we’re not going back to lockdowns’
South Australia scraps testing rule as cases surge to all-time high
The headlines were printed one per line.
However, when I checked the text file, the result was:
TENS OF THOUSANDS to spend Christmas in quarantine as Omicron causes COVID carnageSA records ‘steep increase’ in COVID cases as premier issues ominous warningSouth Australia gives nod to widespread rollout of rapid antigen COVID-19 tests South Australia’s Omicron cases almost TRIPLE as COVID-19 cases surgeLeading doctor’s warning about ‘essential’ and ‘necessary’ spread of COVID-19SA ambos sound the alarm over rising COVID-19 cases after state records surgeScott Morrison flags potential changes to COVID-19 approach after National CabinetSTATE OF THE NATION: Australia fighting to contain COVID as cases soar to record highsWATCH LIVE: Scott Morrison provides COVID-19 update after National Cabinet meetingAustralia’s COVID cases could hit 250,000 DAILY unless restrictions returnPM’s plea ahead of emergency meeting as he declares ‘we’re not going back to lockdowns’South Australia scraps testing rule as cases surge to all-time high
The lines were not separated as expected. I tried reading the text file back and using the .split() method, which did not work. Is there any way to read this file back and split it into lines, or to save it with the lines separated in the first place?
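print() appends a newline after each headline, but f.write() writes only the string it is given. Append "\n" in f.write() so that each headline goes on its own line:
for h in headlines:
    print(h.text)
    f.write(h.text + "\n")
f.close()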
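The question also asks how to read the file back split into lines. A minimal sketch of the round trip, assuming the headlines are already available as a list of strings; the `headlines` values and the temporary path below are placeholders, not the asker's data:

```python
import os
import tempfile

# Placeholder headlines standing in for [h.text for h in soup.select('h2.Card-Headline')]
headlines = ["First headline", "Second headline", "Third headline"]

# Hypothetical output path; the question writes to a file on the Desktop instead
path = os.path.join(tempfile.mkdtemp(), "headlines.txt")

# The with-block closes the file automatically, even if a write fails
with open(path, "w", encoding="utf-8") as f:
    for text in headlines:
        f.write(text + "\n")

# Read the file back and split it into one string per line
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

Reading with `splitlines()` only works if the newlines were written in the first place, which is why the file from the question could not be split afterwards.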
How to parse all <p> tags within a certain <div> tag?
I am using BeautifulSoup to parse an HTML page. I want to get all the text within the <p> tags under this <div id="commentary"> (link to an image of the HTML content I want to get).
When I use find_all to get all of the <p> tags, the list contains only the first one. I used the following code to count the number of <p> tags present under the <div>. You can clearly see from the above image that there are around 19 <p> tags within the highlighted <div> tag, yet my code prints 1.
content = soup.find('div', attrs={'class':'company-profile'})
points = content.find('div', attrs={'id':'commentary'})
count = 0
for point in points.find_all('p'):
    count = count + 1
print(count)
print(points.text)
I don't know why this is happening or why the find_all method won't return the complete list. I also tried using points.text to print all of the text within the <div id="commentary"> tag, but it prints the contents of the first <p> tag only.
(mlenv) chirag#debian10:~/ML/Finaments$ python main.py
<class 'bs4.element.Tag'>
State Bank of India is a Fortune 500 company. It is an Indian Multinational, Public Sector banking and financial services statutory body headquartered in Mumbai. It is the largest and oldest bank in India with over 200 years of history.#
1
1
Ratios (Q3FY21) Capital Adequacy Ratio - 14.50% Net Interest Margin - 3.34% Gross NPA - 4.77% Net NPA - 1.23% CASA Ratio - 45.15%#
(mlenv) chirag#debian10:~/ML/Finaments$ ^C
(mlenv) chirag#debian10:~/ML/Finaments$
Those 1's are from print(count), and then it only prints the content of the first <p> tag from print(points.text). I have just started using BeautifulSoup, please help me.
You can request the URL that serves that content directly. You'll need to pass the correct cookies and CSRF token with the request, though:
import requests
from bs4 import BeautifulSoup
url = 'https://www.screener.in/wiki/company/3188/commentary/'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'referer': 'https://www.screener.in/company/SBIN/consolidated/',
    'x-csrftoken': 'E8zDjm7CtmSqCM2B9rTYPXTcPMJ22w2oynWzWzT4bCgAIaKkt4DmrirBSEPdCP0W',
    'cookie': '_gcl_au=1.1.69436223.1621345270; _ga=GA1.2.2056656539.1621345271; _gid=GA1.2.1452432592.1621345271; csrftoken=E8zDjm7CtmSqCM2B9rTYPXTcPMJ22w2oynWzWzT4bCgAIaKkt4DmrirBSEPdCP0W; sessionid=mrdcmrlqpe72dqjrqgtrb2m2v375sjv0; _gat_UA-2456523-7=1'}
response = requests.post(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
# Count every <p> tag in the returned fragment
count = 0
for point in soup.find_all('p'):
    count = count + 1
print(count)
print(soup.text)
Output:
19
Ratios (Q3FY21) Capital Adequacy Ratio - 14.50% Net Interest Margin - 3.34% Gross NPA - 4.77% Net NPA - 1.23% CASA Ratio - 45.15%# Branch Network Presently, the bank operates a network of 22,330 branches and ~58,000 ATMs across India. It also operates ~71,000 business correspondent outlets across India.# Market Share The bank has a market share of 22.84% in deposits and 19.69% share in advances in India. It has a strong customer base of ~45 crore customers.# Loan Book Retail loans account for 39% of the loan book, followed by corporate (37%), SME (14%) and Agriculture (10%).# Retail Book - Home loans account for 68% of the retail book, followed by xpress credit (22%), auto loans (9%), personal gold loans (2%) and others (9%).# Exposure The bank has a well-diversified loan book exposed to various sectors. Top sectors include home loans (23%), infrastructure (15%), services (12%) and agriculture (10%). ~75% of the corporate advances are rated A and better ratings from rating agencies. 
38% of the corporate book accounts for PSUs & Govt. departments.# Segmental NPAs Presently, the total NPAs of the bank stands at 1,17,244 crores. agriculture segment accounts for the major ratio of NPAs i.e. 13.71% of all loans are NPA. Corporate segment accounts for 59,400 crores worth of NPAs i.e. 51% of total NPAs of the bank.# International Business The bank has a global footprint with a network of 233 branches/offices in 32 countries.# It has presence in USA, Canada, Brazil, Russia, Germany, France, Turkey, Australia, Bangladesh, Nepal, Sri Lanka and other countries.# Presently, Overseas business accounts for 3% of total deposits# and 13% of total advances.# Government Business SBI has always been the banker of choice to the government of India and is the market leader in government business. It had turnover of ~52,50,000 lakh crores and commissions of ~3,700 crores from government business in FY20.# Financial Inclusion Business The bank has ~71,000 BC outlets which has primary focus on financial inclusion customers.# The bank accounts for 40% of all PMJDY accounts i.e. more than 12 crore accounts.# Presently, the deposits from PMJDY accounts are ~42,500 crores i.e. 1.2% of total deposits of the bank. Digital Metrics Increasing digitization resulted in ~40% of asset accounts and ~60% of liability customers added via digital channels in FY21.# 67% of all transactions were initiated through digital channels in 2020 which is up from 58% in the previous year.# Subsidiaries Operations The bank owns various subsidiaries which are engaged in related business activities :- 1. SBI Capital Markets Ltd (100% stake) - SBICAP is a leading investment banker, offering investment banking and corporate advisory services to clients across three product categories i.e. project advisory and structured finance, equity capital markets and debt capital markets. This company further has wholly owned subsidiaries in related businesses viz. 
SBICAP Securities, SBICAP Trustee Co., SBICAP Ventures & others.# 2. SBI DHFI Ltd (72% stake) - It is a primary dealer and supports the book building process and provide depth and liquidity to secondary markets in G-Sec. It also deals in money market instruments, non G-Sec debt instruments, amongst others.# 3. SBI Cards and Payment Services Ltd (69% stake) - It is a non-banking financial company that offers extensive credit card portfolio to individual cardholders and corporate clients. It has diversified customer acquisition network that enables to engage prospective customers across multiple channels.# The IPO of SBI Cards was launched in March 2020 wherein the company sold ~13 crore equity shares for a consideration of ₹10,350 crores.# 4. SBI Life Insurance Co. Ltd (57.6% stake) - It is one of the leading life insurance company in India which offers a wide range of individual and group insurance solutions that meet various life stage needs of customers.# 5. SBI Funds Management Pvt Ltd (63% stake) - It is a JV between SBI and AMUNDI (France). It is an asset management company with the fastest CAGR of 33% as against industrial average of 14% in the last 3 years.# 6. SBI General Insurance Company Ltd (70% stake) - It is a general insurance company which focuses on profitable growth in banc-assurance channel along with other distribution channels and line of businesses. It is first non-life insurance company in India to cross 6,000 crores in a decade of operations.# Amalgamation of Associate Banks In March 2017, the bank acquired its 5 associate state banks and Bharatiya Mahila Bank by allotting ~13.5 crore equity shares of SBI.#
How to extract a particular element/text inside an HTML using Python 3.x
This is my code:
import requests
from bs4 import BeautifulSoup
r=requests.get('https://www.morningstar.com/stocks/xtse/enb/quote')
c=r.content
soup=BeautifulSoup(c,"html.parser")
print(soup.prettify())
for item in soup.find("byId: {}".text):
    print(item.text)
When I run that, the very bottom of the printed HTML shows:
window.__NUXT__=(function(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N){J.id=1014028;J.title="Enbridge Expects a Rebound in 2021, Increases Dividend by 3%";J.deck=y;J.locale=K;J.publishedDate=L;J.updatedDate=L;J.paywalled=b;J.authors=[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}];J.authorDisclosure=[];J.body=[{type:w,contentObject:[{type:x,content:"Wide-moat Enbridge announced 2021 full-year adjusted EBITDA guidance of CAD 13.9 billion-CAD 14.3 billion at its annual investor day on Dec. 8, which is above our previous forecasts. The midpoint of the guidance also implies a 6% increase from 2020 expected EBITDA. Enbridge believes that the increased performance will be driven by a recovery of Mainline volumes and the associated downstream pipelines; customer growth in the gas utilities business; rate increases on its gas pipelines; and the impact of new projects, including the Line 3 replacement. At this point, the Mainline’s heavy oil capacity is full, and demand for light capacity continues to increase. 
Accordingly, Enbridge expects first-quarter 2021 volumes to average 2.7 million barrels of oil per day, which is also above our previous expectations and compares favorably with 2.56 mmbbl\u002Fd in third-quarter 2020.",gated:a}],gated:a}];M.name=H;M.performanceId=A;M.secId=A;M.ticker=N;M.exchange=I;M.type=F;return {layout:"default",data:[{marketPrice:{value:43.55,filtered:a,date:{value:v,filtered:a}},premiumDiscount:{value:"-92.6",filtered:b,date:{value:v,filtered:a},text:{value:"Discount",filtered:a},type:{value:D,filtered:a}},threeStarRatingPrice:{value:"86.9",filtered:b},headquarterAddress1:{value:"425-1st Street SW",filtered:a},headquarterAddress2:{value:"Suite 200, Fifth Avenue Place",filtered:a},industry:{value:"Oil & Gas Midstream",filtered:a},stockStarRating:{value:"6",filtered:b,date:{value:v,filtered:a},text:{value:"Undervalued",filtered:a},type:{value:D,filtered:a}},fiscalYearEndDate:{value:"2020-12-31",filtered:a},headquarterState:{value:"AB",filtered:a},reportDate:{value:"2020-09-30",filtered:a},headquarterPostalCode:{value:"T2P 3L8",filtered:a},stewardshipRating:{value:"Bkwvsknl",filtered:b,date:{value:v,filtered:a}},companyProfile:{value:"Enbridge is an energy generation, distribution, and transportation company in the U.S. and Canada. Its pipeline network consists of the Canadian Mainline system, regional oil sands pipelines, and natural gas pipelines. The company also owns and operates a regulated natural gas utility and Canada’s largest natural gas distribution company. 
Additionally, Enbridge generates renewable and alternative energy with 2,000 megawatts of capacity.",filtered:a},fairValue:{value:"23.3",filtered:b,date:{value:"2020-12-08",filtered:a},type:{value:D,filtered:a}},fax:{value:"+1 403 231-5929",filtered:a},fiveStarRatingPrice:{value:"58.3",filtered:b},sector:{value:"Energy",filtered:a},economicMoat:{value:"Nsfq",filtered:b,date:{value:v,filtered:a}},ticker:N,website:{value:"https:\u002F\u002Fwww.enbridge.com",filtered:a},headquarterCountry:{value:"Canada",filtered:a},contactEmail:{value:"investor.relations#enbridge.com",filtered:a},fourStarRatingPrice:{value:"77.0",filtered:b},economicMoatTrend:{value:"Zcbqwh",filtered:b,date:{value:v,filtered:a}},headquarterCity:{value:"Calgary",filtered:a},phone:{value:"+1 403 231-3900",filtered:a},universe:{value:"EQ",filtered:a},exchange:I,totalEmployees:{value:11300,filtered:a},twoStarRatingPrice:{value:"54.91",filtered:b},fairValueUncertainty:{value:"Nnczrm",filtered:b},name:H,performanceId:A,secId:A,type:F,articles:[{title:"Epic Oil Crash Sets Up Brutal Downturn for Energy Sector",link:"\u002Farticles\u002F980350\u002Fepic-oil-crash-sets-up-brutal-downturn-for-energy-sector",caption:"But recovery is inevitable, and stocks look very cheap--just watch out for bankruptcy risk.",author:"Preston Caldwell",label:C,isVideo:a},{title:"Enbridge's Sell-Off Looks Exaggerated",link:"\u002Farticles\u002F978902\u002Fenbridges-sell-off-looks-exaggerated",caption:"The market is underestimating long-term cash flows once oil prices normalize.",author:c,label:z,isVideo:a},{title:"Executive Orders More Symbolic Than Material for Pipelines",link:"\u002Farticles\u002F923945\u002Fexecutive-orders-more-symbolic-than-material-for-pipelines",caption:"We're not changing our outlook for our midstream coverage.",author:"Stephen Ellis",label:C,isVideo:a},{title:"New Permit Moves Keystone XL Forward",link:"\u002Farticles\u002F922171\u002Fnew-permit-moves-keystone-xl-forward",caption:"Our fair value estimates 
for TransCanada and Enbridge are unchanged.",author:c,label:z,isVideo:a},{title:"Enbridge Hikes Dividend, Remains Deeply Undervalued",link:"\u002Farticles\u002F904922\u002Fenbridge-hikes-dividend-remains-deeply-undervalued",caption:"It has a wide moat and an attractive yield, and now's the time to invest.",author:c,label:z,isVideo:a},{title:"Concerns About Enbridge's Dividend Are Overblown",link:"\u002Farticles\u002F870105\u002Fconcerns-about-enbridges-dividend-are-overblown",caption:"The wide-moat company is on course to boost its dividend and offers hefty upside.",author:c,label:z,isVideo:a},{title:"Enbridge's Growth Portfolio Is Underappreciated",link:"\u002Farticles\u002F841990\u002Fenbridges-growth-portfolio-is-underappreciated",caption:"And the company continues to reward investors with annual dividend growth.",author:c,label:z,isVideo:a},{title:"Enbridge's Economic Moat Widens",link:"\u002Farticles\u002F565094\u002Fenbridges-economic-moat-widens",caption:"Shifting economics and supply dynamics provide growth opportunities.",author:"David McColl",label:C,isVideo:a}],analysis:{id:1014019,title:"Enbridge Increases Its Dividend by 3%",locale:K,publishedDate:B,updatedDate:B,paywalled:b,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],authorDisclosure:[],body:[],pillars:{investmentThesis:{title:"Business Strategy and Outlook",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Enbridge is an energy distribution and transportation company in the United States and Canada. It operates crude and natural gas pipelines, including the Canadian Mainline system. 
It also owns and operates Canada's largest natural gas distribution company.",gated:a}],gated:a}]},moat:{title:"Economic Moat",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Midstream companies process, transport, and store natural gas, natural gas liquids, crude oil, and refined products. There are multiple ways for midstream companies to build moats, but efficient scale is the dominant source. Hydrocarbons are produced and consumed in different places and in different forms from how they come out of the ground. Midstream firms transport and process hydrocarbons. Once a transport route is established, there's usually little need to build a competing route. Doing so would drive returns for both routes below the cost of capital. Thus, pipelines are generally moaty because they efficiently serve markets of limited size.",gated:a}],gated:a}]},managementAndStewardship:{title:"Stewardship",publishedDate:B,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"President and CEO Al Monaco has been with Enbridge since 1995, serving in his current role since 2012. During his tenure at Enbridge, Monaco has experience in all business segments, international business development, corporate planning, and finance. 
His experience in various business segments, corporate development, growth projects, and finance positions him to successfully lead the proposed pipeline and gas distribution growth projects.",gated:a}],gated:a}]},enterpriseRisk:{title:"Risk and Uncertainty",publishedDate:E,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Enbridge’s profitability is not directly tied to commodity prices, as pipeline transportation costs are not tied to the price of natural gas and crude oil. However, the cyclical supply and demand nature of commodities and related pricing can have an indirect impact on the business as shippers may choose to accelerate or delay certain projects. This can affect the timing for the demand of transportation services and\u002For new gas pipeline infrastructure.",gated:a}],gated:a}]},valuation:{title:"Fair Value and Profit Drivers",publishedDate:B,authors:[{holdings:[o,p,q,r,s,t,k,u,n,m,l,j,i],id:h,name:c,jobTitle:g,byLine:f,shortBio:e,image:d,isPrimary:b}],body:[{type:w,contentObject:[{type:x,content:"Our fair value estimate of $43 (CAD 57) per share is based on a discounted cash flow model. We believe that Enbridge’s broad network of midstream assets and geographic diversification will serve it well in the low oil and gas price environment, and crude and natural gas pipeline expansions in growing regions will fuel EBITDA growth. 
Our cash flow forecasts incorporate the addition of the Line 3 replacement pipeline, but we adjusted our Canadian fair value downward to a reflect a risk-weighted probability of 80% that the pipeline is built.",gated:a}],gated:a}]},notes:J},notes:J},listedCurrency:{value:"CAD",filtered:a}},{}],error:y,state:{history:{currentRoute:"\u002Fstocks\u002Fxtse\u002Fenb\u002Fquote",previousRoute:y,returnRoute:y},ids:{byTicker:{"ST::XTSE::ENB":M},byId:{"0P0000681O":M}},markets:{movers:{gainers:[],losers:[],actives:[]},quotes:{},trailingReturns:{},intradayTimeSeries:{},lastRefreshed:y},player:{nowPlaying:y},siteAlert:{message:G,type:G},user:{userType:"visitor",isAdvisor:a,contentType:"e7FDDltrTy+tA2HnLovvGL0LFMwT+KkEptGju5wXVTU="}},serverRendered:b,serverDate:new Date(1607916811524)}}(false,true,"Joe Gemino","https:\u002F\u002Fim.mstar.com\u002FContent\u002FCMSImages\u002F78x78\u002F2008-jgemino-78x78.jpg","Joe Gemino, CPA, is a senior equity analyst for Morningstar.","Joe Gemino, CPA","Senior Equity Analyst","2008","MST50","SGDLX","FB","TRRNX","DODIX","DODGX","MORN","MOAT","AAPL","V","T","DIS","DODBX","2020-12-11","p","text",null,"Stock Strategist","0P0000681O","2020-12-08T18:13:00Z","Stock Strategist Industry Reports","Qual","2020-04-16T14:59:00Z","ST","","Enbridge Inc","XTSE",{},"en-US","2020-12-08T18:28:00Z",{},"ENB")); My question: how do I extract the information inside "byID:" so that print(item.text) will give me "0P0000681O" only.
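If you only need a single value, a plain string search is enough: find the marker byId:{" and take everything up to the next ":
from simplified_scrapy import req
html = req.get('https://www.morningstar.com/stocks/xtse/enb/quote')
# Skip past the marker that precedes the ID
start = html.find('byId:{"')
html = html[start+len('byId:{"'):]
# The ID ends at the closing quote followed by a colon
end = html.find('":')
print(html[:end])
Result:
0P0000681O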
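The same value can also be pulled out with a regular expression from the standard library. This is only a sketch: the `html` string below is a shortened stand-in for the page source shown in the question, and the pattern assumes the `byId:{"...":` shape seen there does not change.

```python
import re

# Shortened stand-in for the page source; the real script block is much longer
html = 'ids:{byTicker:{"ST::XTSE::ENB":M},byId:{"0P0000681O":M}},markets:{}'

# Capture the first quoted key that follows byId:{
match = re.search(r'byId:\{"([^"]+)"', html)

# Fall back to None when the marker is missing instead of raising
sec_id = match.group(1) if match else None
print(sec_id)
```

Checking `match` before calling `group` matters here because the page layout may change, in which case `re.search` returns `None`.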
Determining customary distance unit from ISO 3166 country code
ISO 3166 defines country codes such as GB, US, FR or RU. I would like a reasonably definitive association from these country codes to the customary unit of measure for distances between places in those countries.
Specifically on iOS and OS X, the country code can be retrieved from NSLocale:
[[NSLocale currentLocale] objectForKey: NSLocaleCountryCode];
NSLocale also provides a way to see if a country uses metric or non-metric units:
const bool useMetric = [[[NSLocale currentLocale] objectForKey: NSLocaleUsesMetricSystem] boolValue];
However, this is not sufficient. For example, in Great Britain (GB) the metric system is widely used, but distances between places continue to be officially measured in miles rather than kilometres.
I also faced this problem :-)
Countries which use the metric system but still use miles:
1. GB is the only exception that still uses miles instead of metric units.
Note: Canada has also started using km for road transport, although it still uses miles for train and horse transport.
Countries which do not use the metric system:
Liberia, Myanmar and the United States of America.
Note: Myanmar (formerly Burma) is planning to move to the metric system. Currently, Myanmar uses its own system, different from both imperial and metric.
In my app, I check whether the country uses imperial or metric:
if (metric) then assign km for all countries except Britain
if (imperial) then assign miles for all countries except Burma
if Burma then assign the Burmese unit
if Britain then assign miles
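The rules in this answer can be sketched in Python as follows. This is an illustration rather than the answerer's actual app code: the function name, the `uses_metric` flag (standing in for what NSLocaleUsesMetricSystem reports in the question) and the "burmese" label are all assumptions made for the example.

```python
def distance_unit(country_code: str, uses_metric: bool) -> str:
    """Pick a distance unit from an ISO 3166-1 alpha-2 code and a metric flag."""
    code = country_code.upper()
    if code == "GB":
        # Britain is metric overall but measures road distances in miles
        return "mi"
    if code == "MM":
        # Myanmar (Burma) uses its own system; "burmese" is a hypothetical label
        return "burmese"
    # Everyone else follows the locale's metric / non-metric setting
    return "km" if uses_metric else "mi"


print(distance_unit("FR", True))
print(distance_unit("GB", True))
print(distance_unit("US", False))
```

The special cases are checked before the metric flag so that the result for GB and MM does not depend on what the locale reports.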
A chart showing countries using miles per hour for road speeds is available. It cites Wikipedia's article on miles per hour as its source, which has the following to say:
These include roads in the United Kingdom,[1] the United States,[2] and UK and US territories; American Samoa,[3] the Bahamas,[4] Belize,[5] British Virgin Islands,[6] the Cayman Islands,[7] Dominica,[8] the Falkland Islands,[9] Grenada,[10] Guam,[11] Burma,[12] The N. Mariana Islands,[13] Samoa,[14] St. Lucia,[15] St. Vincent & The Grenadines,[16] St. Helena,[17] St. Kitts & Nevis,[18] Turks & Caicos Islands,[19] the U.S. Virgin Islands,[20][21] Antigua & Barbuda (although km are used for distance),[22] and Puerto Rico (same as former).[22]
I don't see a way to download this as data keyed by ISO 3166 country code, but it's not a huge task to compile one. I'll leave this answer unaccepted in case a better suggestion is available.
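Compiling that list by hand into a lookup keyed by ISO 3166-1 alpha-2 code could look like the sketch below. Two assumptions are baked in: the quoted list is about speed limits, so treating it as a list of places that measure distance in miles is a guess, and Antigua & Barbuda (AG) and Puerto Rico (PR) are left out because the quote says they use km for distance. The names `MILES_COUNTRIES` and `road_distance_unit` are made up for this example.

```python
# Alpha-2 codes for the places named in the quoted list, minus AG and PR
MILES_COUNTRIES = frozenset({
    "GB", "US",              # United Kingdom, United States
    "AS", "BS", "BZ", "VG",  # American Samoa, Bahamas, Belize, British Virgin Islands
    "KY", "DM", "FK", "GD",  # Cayman Islands, Dominica, Falkland Islands, Grenada
    "GU", "MM", "MP", "WS",  # Guam, Burma (Myanmar), N. Mariana Islands, Samoa
    "LC", "VC", "SH", "KN",  # St. Lucia, St. Vincent, St. Helena, St. Kitts & Nevis
    "TC", "VI",              # Turks & Caicos Islands, U.S. Virgin Islands
})


def road_distance_unit(country_code: str) -> str:
    """Return "miles" for codes in the hand-compiled set, else "kilometres"."""
    return "miles" if country_code.upper() in MILES_COUNTRIES else "kilometres"


print(road_distance_unit("GB"))
print(road_distance_unit("FR"))
```

A set lookup like this is easy to correct as the data changes, for example if a country on the list switches to kilometres.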
Officially, road distances in the UK are in kilometres, but road signs are in miles. Confusing? Yes! When a road engineer gets a plan of a road, everything is in kilometres, and government statistics are in kilometres, but road signs and car odometers are in miles. See https://en.wikipedia.org/wiki/Driver_location_sign for more info.