I'm attempting to scrape Indeed.com and want to get the information for each job from its respective div. The response prints out in the terminal, but when I write it to a file or run the spider I get a blank file and no items are returned. How do I fix this issue?
I've tried making my XPaths relative to the container they're pulling from, and it still comes back blank.
def parse(self, response):
    html = response.body
    container3 = response.xpath(".//div[contains(@class,'jobsearch-SerpJobCard unifiedRow row result clickcard')]").extract()
    print(container3)
    with open('container.txt', 'w') as cont:
        cont.write(container3)
        cont.close()
    title = Selector(response=container3).xpath(".//*[@class='title']/a/@title").get()
    titles = container3.xpath(".//*[@class='title']/a/@title").getall()
    locations = container3.xpath(".//*[@class= 'sjcl']/span/text()").getall()
    companies = container3.xpath(".//*[@class= 'company']/a/text()").getall()
    summarys = container3.xpath(".//*[@class= 'summary']/.").getall()
    links = response.css("div.title a::attr(href)").getall()
    webscrape = WebscrapeItem()
    webscrape['title'] = []
    webscrape['company'] = []
    webscrape['location'] = []
    webscrape['desc'] = []
    webscrape['link'] = []
    for link in links:
        self.links.append('https://www.indeed.com/' + link)
        webscrape['link'].append('https://www.indeed.com/' + link)
    for title, local in itertools.zip_longest(titles, locations):
        webscrape['title'].append(title)
        webscrape['location'].append(local)
    for suma, com in itertools.zip_longest(summarys, companies):
        webscrape['desc'].append(suma)
        webscrape['company'].append(com)
    yield webscrape
container3 output:
<div class="jobsearch-SerpJobCard unifiedRow row result clickcard" id="pj_23e4270b7501bb9b" data-jk="23e4270b7501bb9b" data-empn="5625259597886418" data-ci="291406065">\n\n <div class="title">\n <a target="_blank" id="sja2" href="/pagead/clk?mo=r&ad=-6NYlbfkN0AGcPE08CwaySIkGkcc_oP1ITgH03VIz0r4xVHFv1QhAqfdykiPOMynTjgufJX7HvDowBKp7j-7NHJP9GOjbo56Vjxh5NURcHO8VKHA2Y_kPQaP89uziwg10G1Cy7gxqliSnkyvAjNozb3dIZaFvs20PbgIEbVp-Hlps87Ix3AR1T6shfkApixB3pFjOLL7mVL86YGAk8ZDtjg1RSW02V3Z21NoirneOsjdmwulvgL84YrSuUydYlJaqi5F8aPMUi7pz0h9-mKPlGF9g2xadVCCe2GDYCw9Svjigifq0j5m6WWsToS9ZsU4_uJu3ZNLRr92Eiwq9QHaT2tJcVrjqtO1X7Lz2bHVDj0RBD_MvoO_FmG0_Sr_tCm8gCxu55S7Vk4GEi0nBslmfj4br8hgZ1AuLs4D_XWmJF6MErKJSgPJFZWn7X2SAlVC&p=2&fvj=1&vjs=3" onmousedown="sjomd(\'sja2\'); clk(\'sja2\');" onclick=" setRefineByCookie([]); sjoc(\'sja2\', 0); convCtr(\'SJ\')" rel="noopener nofollow" title="EMS Executive Director" class="jobtitle turnstileLink " data-tn-element="jobTitle">\n EMS Executive Director</a>\n\n </div>\n\n <div class="sjcl">\n <div>\n <span class="company">\n <a data-tn-element="companyName" class="turnstileLink" target="_blank" href="/cmp/Remsa-1" onmousedown="this.href = appendParamsOnce(this.href, \'from=SERP&campaignid=serp-linkcompanyname&fromjk=23e4270b7501bb9b&jcid=1075eae744bf7959\')" rel="noopener">\n REMSA</a></span>\n\n <a data-tn-element="reviewStars" data-tn-variant="cmplinktst2" class="turnstileLink slNoUnderline " href="/cmp/Remsa-1/reviews" title="Remsa reviews" onmousedown="this.href = appendParamsOnce(this.href, \'?campaignid=cmplinktst2&from=SERP&jt=EMS+Executive+Director&fromjk=23e4270b7501bb9b&jcid=1075eae744bf7959\');" target="_blank" rel="noopener">\n <span class="ratings" aria-label="3.9 out of 5 star rating"><span class="rating" style="width:44.4px"><!-- --></span></span>\n<span class="slNoUnderline">7 reviews</span>\n </a>\n </div>\n<div id="recJobLoc_23e4270b7501bb9b" class="recJobLoc" data-rc-loc="United States" style="display: none"></div>\n\n <div class="location 
">United States</div>\n </div>\n\n <div class="summary">\n Responsible for the <b>financial</b>, operational and management performance of Healthcare services for the company. Directs daily operations in support of the mission…</div>
I expect each 'jobsearch-SerpJobCard unifiedRow row result clickcard' div to be extracted into a list, and then to get titles, locations, companies, and summarys from that list using relative XPaths.
However, what I'm getting is a blank container3 and no items returned. Here is the response.text info from the finished spider:
"{\"status\": \"ok\", \"items\": [], \"items_dropped\": [], \"stats\": {\"downloader/request_bytes\": 1132, \"downloader/request_count\": 3, \"downloader/request_method_count/GET\": 2, \"downloader/request_method_count/POST\": 1, \"downloader/response_bytes\": 1012262, \"downloader/response_count\": 3, \"downloader/response_status_count/200\": 2, \"downloader/response_status_count/404\": 1, \"finish_reason\": \"finished\", \"finish_time\": \"2019-08-21 06:29:40\", \"log_count/DEBUG\": 3, \"log_count/ERROR\": 1, \"log_count/INFO\": 8, \"log_count/WARNING\": 1, ...
Try this, it works. It loops over the matched selectors directly instead of calling .extract() on them first:
for item in response.xpath('//div[@class="jobsearch-SerpJobCard unifiedRow row result"]'):
    titles = item.xpath(".//*[@class='title']/a/@title").getall()
    print(titles)
    locations = item.xpath(".//*[@class= 'sjcl']/span/text()").getall()
    print(locations)
Output
['Python Developer Freshers Trainees', 'Python Developer', 'Python Developer', 'Python Developer', 'Python Developers', 'Software Trainee', 'Python\\Django Developer', 'Hiring 2016 / 2017 / 2018 / 2019 freshers as software trainee', 'Python/Django Developer', 'Senior Python Developer']
['Kochi, Kerala', 'Kochi, Kerala', 'Kochi, Kerala', 'Kochi, Kerala', 'Kochi, Kerala', 'Kochi, Kerala', 'Kochi, Kerala', 'Kochi, Kerala', 'Kochi, Kerala', 'Kochi, Kerala']
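A likely reason the spider in the question came back empty is that .extract() returns a list of plain strings, and strings have no .xpath() method, so the later relative queries cannot run. The sketch below shows the per-card pattern with the standard library only: xml.etree.ElementTree stands in for Scrapy's selectors, and the SAMPLE markup, the parse_cards name and the location span are assumptions made for illustration, not Indeed's real page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, well-formed stand-in for two job cards.
SAMPLE = """
<html><body>
<div class="jobsearch-SerpJobCard unifiedRow row result">
  <div class="title"><a href="/rc/clk?jk=1" title="Python Developer">Python Developer</a></div>
  <div class="sjcl"><span class="location">Kochi, Kerala</span></div>
</div>
<div class="jobsearch-SerpJobCard unifiedRow row result">
  <div class="title"><a href="/rc/clk?jk=2" title="Software Trainee">Software Trainee</a></div>
  <div class="sjcl"><span class="location">Kochi, Kerala</span></div>
</div>
</body></html>
"""

CARD_PATH = ".//div[@class='jobsearch-SerpJobCard unifiedRow row result']"


def parse_cards(markup, base_url="https://www.indeed.com/"):
    """Return one dict per card, using paths relative to that card."""
    root = ET.fromstring(markup.strip())
    items = []
    for card in root.iterfind(CARD_PATH):
        # Both lookups start at the card, not at the document root.
        link = card.find("./div[@class='title']/a")
        location = card.find(".//span[@class='location']")
        items.append({
            "title": link.get("title") if link is not None else None,
            # urljoin avoids the double slash produced by string concatenation.
            "link": urljoin(base_url, link.get("href")) if link is not None else None,
            "location": location.text.strip()
            if location is not None and location.text else None,
        })
    return items


items = parse_cards(SAMPLE)
for item in items:
    print(item)
```

Building one record per card keeps each title with its own location and link, which avoids the misalignment that zip_longest over separate lists can introduce when a card is missing a field.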
Related
My code is not printing all the strings that I want, and I am unsure how to change it.
I am trying to scrape all the strings, including things like 460 hp @ 7000 rpm, which it is currently not scraping. Ideally the strings in the strong elements are kept separate. I have tried adding another .next_sibling and changing the br to p and strong, but these just return an error.
The HTML is as follows:
<div class="specs-content">
<p>
<strong>Displacement:</strong>
" 307 cu in, 5038 "
<br>
<strong>Power:</strong>
" 460 hp @ 7000 rpm "
<br>
<strong>Torque:</strong>
" 420 lb-ft @ 4600 rpm "
</p>
<p>
<strong>TRANSMISSION:</strong>
" 10-speed automatic with manual shifting mode "
</p>
<p>
<strong>CHASSIS</strong>
<br>
" Suspension (F/R): struts/multilink "
<br>
" Brakes (F/R): 15.0-in vented disc/13.0-in vented disc "
<br>
" Tires: Michelin Pilot Sport 4S, F: 255/40ZR-19 (100Y) R: 275/40ZR-19 (105Y) "
</p>
</div>
I have written the following code thus far:
import requests
from bs4 import BeautifulSoup
URL = requests.get('https://www.LinkeHere.com')
soup = BeautifulSoup(URL.text, 'html.parser')
FindClass = soup.find(class_='specs-content')
FindElement = FindClass.find_all('br')
for Specs in FindElement:
    Specs = Specs.next_sibling
    print(Specs.string)
This returns:
Power:
Torque:
Suspension (F/R): struts/multilink
Brakes (F/R): 13.9-in vented disc/13.0-in vented disc
Tires: Michelin Pilot Sport 4S, 255/40ZR-19 (100Y)
You can use the get_text() method, passing a newline (\n) as the separator argument:
from bs4 import BeautifulSoup
html = """THE ABOVE HTML SNIPPET"""
soup = BeautifulSoup(html, "html.parser")
for tag in soup.find_all(class_="specs-content"):
    print(tag.get_text(strip=True, separator="\n").replace('"', ""))
Output:
Displacement:
307 cu in, 5038
Power:
460 hp @ 7000 rpm
Torque:
420 lb-ft @ 4600 rpm
TRANSMISSION:
10-speed automatic with manual shifting mode
CHASSIS
Suspension (F/R): struts/multilink
Brakes (F/R): 15.0-in vented disc/13.0-in vented disc
Tires: Michelin Pilot Sport 4S, F: 255/40ZR-19 (100Y) R: 275/40ZR-19 (105Y)
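The question also asks for the strong labels to be kept separate from their values. One way to sketch that is shown below. It swaps in the standard library's html.parser instead of BeautifulSoup, and SpecsParser, parse_specs and the shortened SAMPLE markup are hypothetical names and data chosen for illustration. It assumes, as the answer above does, that the quote characters are literally part of the text.

```python
from html.parser import HTMLParser

# Shortened copy of the markup from the question.
SAMPLE = """
<div class="specs-content">
<p>
<strong>Power:</strong>
" 460 hp @ 7000 rpm "
<br>
<strong>Torque:</strong>
" 420 lb-ft @ 4600 rpm "
</p>
<p>
<strong>CHASSIS</strong>
<br>
" Suspension (F/R): struts/multilink "
<br>
" Brakes (F/R): 15.0-in vented disc/13.0-in vented disc "
</p>
</div>
"""


class SpecsParser(HTMLParser):
    """Collect (label, [values]) pairs; text inside <strong> starts a new label."""

    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.specs = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        # Drop surrounding whitespace and the literal quote characters.
        text = data.strip().strip('"').strip()
        if not text:
            return
        if self.in_strong:
            self.specs.append((text.rstrip(":"), []))
        elif self.specs:
            self.specs[-1][1].append(text)


def parse_specs(markup):
    """Return a dict mapping each label to the list of strings that follow it."""
    parser = SpecsParser()
    parser.feed(markup)
    parser.close()
    return dict(parser.specs)


specs = parse_specs(SAMPLE)
for label, values in specs.items():
    print(label, "->", values)
```

Every text node that follows a label is attached to it until the next strong element appears, so the br-separated CHASSIS lines end up in a single list.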
I'm having issues with old, previously working code that no longer functions correctly.
My Python code scrapes a website using Beautiful Soup and extracts event data (date, event, link).
My code pulls all of the events located in the tbody. Each event is stored in a <tr class="Box">. The issue is that my scraper seems to stop after this row: <tr style="box-shadow: none;">. After it reaches this section (which contains 3 advertisements on the site for events that I don't want to scrape), the code stops pulling event data from the <tr class="Box"> rows. Is there a way to skip this tr and ignore future cases like it?
import pandas as pd
import bs4 as bs
from bs4 import BeautifulSoup
import urllib.request
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module='bs4')
source = urllib.request.urlopen('https://10times.com/losangeles-us/technology/conferences').read()
soup = bs.BeautifulSoup(source,'html.parser')
#---Get Event Data---
test1=[]
table = soup.find('tbody')
table_rows = table.find_all('tr') #find table rows (tr)
for x in table_rows:
    data = x.find_all('td') #find table data
    row = [x.text for x in data]
    if len(row) > 2: #Excludes rows with only event name/link, but no data.
        test1.append(row)
test1
The data is loaded dynamically via JavaScript, so you don't see more results. You can use this example to load more pages:
import requests
from bs4 import BeautifulSoup
url = "https://10times.com/ajax?for=scroll&path=/losangeles-us/technology/conferences"
params = {"page": 1, "ajax": 1}
headers = {"X-Requested-With": "XMLHttpRequest"}
for params["page"] in range(1, 4): # <-- increase number of pages here
    print("Page {}..".format(params["page"]))
    soup = BeautifulSoup(
        requests.get(url, headers=headers, params=params).content,
        "html.parser",
    )
    for tr in soup.select('tr[class="box"]'):
        tds = [td.get_text(strip=True, separator=" ") for td in tr.select("td")]
        print(tds)
Prints:
Page 1..
['Tue, 29 Sep - Thu, 01 Oct 2020', 'Lens Los Angeles', 'Intercontinental Los Angeles Downtown, Los Angeles', 'LENS brings together the entire Degreed community - our clients, invited prospective clients, thought leaders, partners, employees, executives, and industry experts for two days of discussion, workshops,...', 'Business Services IT & Technology', 'Interested']
['Wed, 30 Sep - Sat, 03 Oct 2020', 'FinCon', 'Long Beach Convention & Entertainment Center, Long Beach 20.1 Miles from Los Angeles', 'FinCon will be helping financial influencers and brands create better content, reach their audience, and make more money. Collaborate with other influencers who share your passion for making personal finance...', 'Banking & Finance IT & Technology', 'Interested 7 following']
['Mon, 05 - Wed, 07 Oct 2020', 'NetDiligence Cyber Risk Summit', 'Loews Santa Monica Beach Hotel, Santa Monica 14.6 Miles from Los Angeles', 'NetDiligence Cyber Risk Summit will conference are attended by hundreds of cyber risk insurance, legal/regulatory and security/privacy technology leaders from all over the world. Connect with leaders in...', 'IT & Technology', 'Interested']
... etc.
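The fixed range(1, 4) above has to be raised by hand. A minimal sketch of an open-ended loop follows. It assumes, and the source does not confirm this, that the endpoint returns no tr rows once the pages run out. collect_rows and fetch_rows are hypothetical names; in real use fetch_rows would wrap the requests and BeautifulSoup calls from the example above, while here a small dictionary of fake pages stands in for the network.

```python
def collect_rows(fetch_rows, max_pages=50):
    """Call fetch_rows(page) for page 1, 2, ... and stop at the first empty page.

    max_pages is a safety cap so that a misbehaving endpoint cannot loop forever.
    """
    collected = []
    for page in range(1, max_pages + 1):
        rows = fetch_rows(page)
        if not rows:
            break
        collected.extend(rows)
    return collected


# Fake pages used in place of real HTTP responses (illustrative data only).
fake_pages = {
    1: [["Lens Los Angeles"]],
    2: [["FinCon"], ["NetDiligence Cyber Risk Summit"]],
}

rows = collect_rows(lambda page: fake_pages.get(page, []))
print(len(rows))
```

Passing the fetch step in as a function keeps the stopping rule separate from the HTTP details, so the loop can be checked without hitting the site.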
Trying to get the text and href for top news but not able to scrape it.
website : News site
My code:
import requests
from bs4 import BeautifulSoup
import psycopg2
import time
def checkResponse(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.content
    else:
        return None
def getTitleURL():
    url = 'http://sandesh.com/'
    response = checkResponse(url)
    if response is not None:
        html = BeautifulSoup(response, 'html.parser')
        for values in html.find_all('div', class_='d-top-news-latest'):
            headline = values.find(class_='d-s-NSG-regular').text
            url = values.find(class_='d-s-NSG-regular').a['href']
            print(headline + "->" + url)
if __name__ == '__main__':
    print('Getting the list of names....')
    names = getTitleURL()
    print('... done.\n')
Output:
Getting the list of names....
Corona live
મેડિકલ સ્ટાફ પર હુમલા અંગે અમિત શાહે ડોક્ટર્સ સાથે કરી ચર્ચા, સુરક્ષાની ખાતરી આપતા કરી અપીલ
Ahmedabad
ગુજરાતમાં કૂદકેને ભૂસકે વધ્યો કોરોના વાયરસનો કહેર, આજે નવા 94 કેસ નોંધાયા, જાણો કયા- કેટલા કેસ નોંધાયા
Corona live
જીવન અને મોત વચ્ચે સંઘર્ષ કરી રહ્યો છે દુનિયાનો સૌથી મોટો તાનાશાહ કિમ જોંગ! ટ્રમ્પે કહી આ વાત
Ahmedabad
અમદાવાદમાં નર્સિંગ સ્ટાફનો ગુસ્સો ફૂટ્યો, ‘અમારું કોઈ સાંભળતું નથી, અમારો કોરોના ટેસ્ટ જલદી કરાવો’
Business
ભારતીય ટેલિકોમ જગતમાં સૌથી મોટી ડીલ, ફેસબુક બની જિયોની સૌથી મોટી શેરહોલ્ડર
->http://sandesh.com/amit-shah-talk-with-ima-and-doctors-through-video-conference-on-attack/
... done.
I want to skip the text inside the span tag, and I am only able to get one href. The headline also comes back as a single block containing every title.
How do I get each title and URL?
I am trying to scrape the part in red:
First, you don't need a for loop over html.find_all('div', class_='d-top-news-latest'), because the DOM has only one element with the class d-top-news-latest.
Second, to get the title, you can use select('span'), because your title is inside the span tag.
Third, as you noticed, the headline is a list, so you need a for loop to get each title and URL.
values = html.find('div', class_='d-top-news-latest')
for i in values.find_all('a', href = True):
    print(i.select('span'))
    print(i['href'])
OUTPUT
Getting the list of names....
[<span>
Corona live
</span>]
http://sandesh.com/maharashtra-home-minister-anil-deshmukh-issue-convicts-list-of-palghar-case/
[<span>
Corona live
</span>]
http://sandesh.com/two-doctors-turn-black-after-treatment-of-coronavirus-in-china/
[<span>
Corona live
</span>]
http://sandesh.com/bihar-asi-gobind-singh-suspended-for-holding-home-guard-jawans-after-stopping-officers-car-asi/
[<span>
Ahmedabad
</span>]
http://sandesh.com/jayanti-ravi-surprise-statement-sparks-outcry-big-decision-taken-despite-more-patients-in-gujarat/
[<span>
Corona live
</span>]
http://sandesh.com/amit-shah-talk-with-ima-and-doctors-through-video-conference-on-attack/
... done.
To remove the "span" part:
values = html.find('div', class_='d-top-news-latest')
for i in values.find_all('a', href=True):
    i.span.decompose()
    print(i.text)
    print(i['href'])
Output:
Getting the list of names....
ગુજરાતમાં કોરોનાનો કહેરઃ રાજ્યમાં આજે કોરોનાના 135 નવા કેસ, વધુ 8 લોકોનાં મોત
http://sandesh.com/gujarat-corona-update-206-new-cases-and-18-deaths/
ચીનના વૈજ્ઞાનિકોએ જ ખોલી જીનપિંગની પોલ, કોરોના વાયરસને લઈને કર્યો સનસની ખુલાસો
http://sandesh.com/chinese-scientists-claim-over-corona-virus/
શું લોકડાઉન ફરી વધારાશે? PM મોદી 27મીએ ફરી એકવાર તમામ CM સાથે કરશે ચર્ચા
http://sandesh.com/pm-modi-to-hold-video-conference-with-cms-on-april-27-lockdown-extension/
કોરોના વાયરસને લઈ મોટી ભવિષ્યવાણી, દુનિયાના 30 દેશો પર ઉભુ થશે ભયંકર સંકટ
http://sandesh.com/after-corona-attack-now-hunger-will-kill-many-people-in-the-world/
દેશમાં 24 કલાકમાં 1,486 કોરોનાનાં નવા કેસ, પરંતુ મળ્યા સૌથી મોટા રાહતનાં સમાચાર
http://sandesh.com/recovery-rate-increased-in-corona-patients-says-health-ministry/
... done.
I am trying to iterate over a large list of dealership names and cities. I want to have it refer back to the list and loop over each entry and get the results separately.
#this is only a portion of the dealers; the rest are in a file
Dealers= ['Mossy Ford', 'Abel Chevrolet Pontiac Buick', 'Acura of Concord', 'Advantage Audi' ]
driver=webdriver.Chrome("C:\\Users\\kevin\\Anaconda3\\chromedriver.exe")
driver.set_page_load_timeout(30)
driver.get("https://www.bbb.org/")
driver.maximize_window()
driver.implicitly_wait(10)
driver.find_element_by_xpath("/html/body/div[1]/div/div/div/div[2]/div[1]/div/div[2]/div/form/div[2]/div[2]/button").click()
driver.find_element_by_xpath("""//*[@id="findTypeaheadInput"]""").send_keys("Mossy Ford")
driver.find_element_by_xpath("""//*[@id="nearTypeaheadInput"]""").send_keys("San Diego, CA")
driver.find_element_by_xpath("""/html/body/div[1]/div/div/div/div[2]/div[1]/div/div[2]/div/form/div[2]/button""").click()
driver.implicitly_wait(10)
driver.find_element_by_xpath("/html/body/div[1]/div/div/div/div[2]/div/div[2]/div[2]/div[1]/div[6]/div").click()
driver.implicitly_wait(10)
driver.find_element_by_xpath('/html/body/div[1]/div/div/div/div[2]/div[2]/div/div/div[1]/div/div[2]/div/div[2]/a').click()
#contact_names= driver.find_elements_by_xpath('/html/body/div[1]/div/div/div/div[2]/div/div[5]/div/div[1]/div[1]/div/div/ul[1]')
#print(contact_names)
#print("Query Link: ", driver.current_url)
#driver.quit()
from selenium import webdriver
dealers= ['Mossy Ford', 'Abel Chevrolet Pontiac Buick', 'Acura of Concord']
cities = ['San Diego, CA', 'Rio Vista, CA', 'Concord, CA']
driver=webdriver.Chrome("C:\\Users\\kevin\\Anaconda3\\chromedriver.exe")
driver.set_page_load_timeout(30)
driver.get("https://www.bbb.org/")
driver.maximize_window()
driver.implicitly_wait(10)
driver.find_element_by_xpath("/html/body/div[1]/div/div/div/div[2]/div[1]/div/div[2]/div/form/div[2]/div[2]/button").click()
for d in dealers:
    driver.find_element_by_xpath("""//*[@id="findTypeaheadInput"]""").send_keys("dealers")
    for c in cities:
        driver.find_element_by_xpath("""//*[@id="nearTypeaheadInput"]""").send_keys("cities")
        driver.find_element_by_xpath("""/html/body/div[1]/div/div/div/div[2]/div[1]/div/div[2]/div/form/div[2]/button""").click()
        driver.implicitly_wait(10)
        driver.find_element_by_xpath("/html/body/div[1]/div/div/div/div[2]/div/div[2]/div[2]/div[1]/div[6]/div").click()
        driver.implicitly_wait(10)
        driver.find_element_by_xpath('/html/body/div[1]/div/div/div/div[2]/div[2]/div/div/div[1]/div/div[2]/div/div[2]/a').click()
        contact_names= driver.find_elements_by_class_name('styles__UlUnstyled-sc-1fixvua-1 ehMHcp')
        print(contact_names)
        print("Query Link: ", driver.current_url)
driver.quit()
I want to be able to go to each of these different dealerships pages and pull all of their details then loop thru the rest. I am just struggling with the ideas of for loops within selenium.
It's better to create a dictionary that maps each dealer to its city and loop through it:
Dealers_Cities_Dict = {
    "Mossy Ford": "San Diego, CA",
    "Abel Chevrolet Pontiac Buick": "City",
    "Acura of Concord": "City",
    "Advantage Audi": "City"
}
for dealer, city in Dealers_Cities_Dict.items():
    # This is where the code sits
    driver.find_element_by_id("findTypeaheadInput").send_keys(dealer)
    driver.find_element_by_id("nearTypeaheadInput").send_keys(city)
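If the dealers and cities already live in two parallel lists, as in the question's second attempt, pairing them with zip gives the same one-to-one mapping without nesting the loops. This is a minimal sketch: paired_queries is a hypothetical helper, the Selenium calls appear only as comments, and each real iteration would probably also need to return to the search page and clear both inputs, which is left out here.

```python
dealers = ['Mossy Ford', 'Abel Chevrolet Pontiac Buick', 'Acura of Concord']
cities = ['San Diego, CA', 'Rio Vista, CA', 'Concord, CA']


def paired_queries(dealers, cities):
    """Pair each dealer with the city at the same position in the other list."""
    if len(dealers) != len(cities):
        raise ValueError("dealers and cities must have the same length")
    return list(zip(dealers, cities))


for dealer, city in paired_queries(dealers, cities):
    # The Selenium calls from the answer above would go here, for example:
    # driver.find_element_by_id("findTypeaheadInput").send_keys(dealer)
    # driver.find_element_by_id("nearTypeaheadInput").send_keys(city)
    print(f"Search for {dealer!r} near {city!r}")
```

Note that the loop passes the variables dealer and city, not the literal strings "dealers" and "cities" that the question's code sends to the form.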
[Problem]
I can't trigger a left mouse click on a marker through Selenium to activate its tooltip.
[My intention]
Scrape (crawl) the text from the tooltip window of a map marker on this web service with Selenium (Python code).
Daum service web map: http://www.socar.kr/reserve#jeju
<map id="daum.maps.Marker.Area:13u" name="daum.maps.Marker.Area:13u"><area href="javascript:void(0)" alt="" shape="rect" coords="0,0,40,38" title="" style="-webkit-tap-highlight-color: transparent;"></map>
<div class="tooltip myInfoWindow"><h4><a class="map_zone_name" href="#"><em class="map_zone_id" style="display:none;">2390</em><span title="제주대 후문주차장">제주대 후문주차장</span><span class="bg"></span></a></h4><p><a title="제주도 제주시 아라1동 368-60">제주도 제주시 아라1동 368-6...</a><br>운영차량 : 총 <em>4</em>대</p><p class="btn"><em class="map_zone_id" style="display:none;">2390</em><a class="btn_overlay_search" href="#"><img src="/template/asset/images/reservation/btn_able_socar.png" alt="예약가능 쏘카 보기"></a></p><img src="/template/asset/images/reservation/btn_layer_close.png" alt="닫기"></div>
P.S.: Is it also possible to crawl the text of a tooltip window on a Google Maps marker?
When you click a tooltip, an XHR request is sent to https://api.socar.kr/reserve/zone_info using a zone_id. You may have to filter out the zones you want by using the page content. I don't have any more time to spend on this right now, but this recreates the requests:
import requests
from time import time, sleep
# These params will be for https://api.socar.kr/reserve/oneway_zone_list
# which we can get the zone_ids from.
params = {"type": "start", "_": str(time())}
# We use the zone_id from each dict we parse from the json received
params2 = {"zone_id": ""}
with requests.Session() as s:
    s.get("http://www.socar.kr/reserve#jeju")
    s.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"})
    r = s.get("https://api.socar.kr/reserve/oneway_zone_list", params=params)
    result = r.json()["result"]
    for d in result:
        params2["zone_id"] = d["zone_id"]
        params2["_"] = str(time())
        sleep(1)
        r2 = s.get("https://api.socar.kr/reserve/zone_info", params=params2)
        print(r2.json())
Each d in result is a dict like:
{u'zone_lat': u'37.248859', u'zone_id': u'2902', u'zone_region1_short': u'\uacbd\uae30', u'zone_open_time': u'00:00:00', u'zone_region1': u'\uacbd\uae30\ub3c4', u'zone_close_time': u'23:59:59', u'zone_name': u'SK\ud558\uc774\ub2c9\uc2a4 \uc774\ucc9c', u'open_weekend': u'close', u'zone_region3': u'\ubd80\ubc1c\uc74d', u'zone_region2': u'\uc774\ucc9c\uc2dc', u'zone_lng': u'127.490639', u'zone_addr': u'\uacbd\uae30\ub3c4 \uc774\ucc9c\uc2dc \ubd80\ubc1c\uc74d \uc544\ubbf8\ub9ac 707'}
There is probably other info in there that would allow you to filter by a specific place. I don't speak Korean, so I cannot completely follow how the data relates.
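As a rough sketch of that filtering step, the zone dicts could be narrowed by a substring of one of their fields before the second request is made. filter_zones is a hypothetical helper, and the two sample records are illustrative only: the first address is decoded from the dict shown above, the second is copied from the tooltip HTML in the question, and whether oneway_zone_list really returns Jeju zones in this shape is an assumption.

```python
def filter_zones(zones, keyword, field="zone_addr"):
    """Return the zone dicts whose `field` value contains `keyword`."""
    return [zone for zone in zones if keyword in (zone.get(field) or "")]


# Illustrative records, not real API output.
sample_zones = [
    {"zone_id": "2902", "zone_addr": "경기도 이천시 부발읍 아미리 707"},
    {"zone_id": "2390", "zone_addr": "제주도 제주시 아라1동 368-60"},
]

# "제주" (Jeju) is the region named in the question's URL fragment.
jeju_zones = filter_zones(sample_zones, "제주")
print([zone["zone_id"] for zone in jeju_zones])
```

Filtering before the loop also reduces the number of zone_info requests, since only the matching zone_ids need to be looked up.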
The second request gives us a dict like:
{u'retCode': u'1', u'retMsg': u'', u'result': {u'oper_way': u'\uc655\ubcf5', u'notice': u'<br>\u203b \ubc18\ub4dc\uc2dc \ubc29\ubb38\uc790 \uc8fc\ucc28\uc7a5 \uc9c0\uc815\uc8fc\ucc28\uad6c\uc5ed\uc5d0 \ubc18\ub0a9\ud574\uc8fc\uc138\uc694.<br>', u'notice_oneway': u'', u'zone_addr': u'\uacbd\uae30\ub3c4 \uc774\ucc9c\uc2dc \ubd80\ubc1c\uc74d \uc544\ubbf8\ub9ac 707', u'total_num': 2, u'able_num': 2, u'visit': u'\uc131\uc6b02\ub2e8\uc9c0 \uc544\ud30c\ud2b8 \uae30\uc900 \uc804\ubc29 \ud604\ub300\uc5d8\ub9ac\ubca0\uc774\ud130 \ubc29\uba74\uc73c\ub85c \ud6a1\ub2e8\ubcf4\ub3c4 \uc774\uc6a9 \ud6c4 \ud558\uc774\ub2c9\uc2a4 \uc774\ucc9c \ubc29\ubb38\uc790 \uc8fc\ucc28\uc7a5 \ub0b4 \uc3d8\uce74\uc804\uc6a9\uc8fc\ucc28\uad6c\uc5ed', u'zone_alias': u'\ud558\uc774\ub2c9\uc2a4 \ubc29\ubb38\uc790 \uc8fc\ucc28\uc7a5', u'zone_attr': u'[\uc774\ubca4\ud2b8]', u'state': True, u'link': u'http://blog.socar.kr/4074', u'oper_time': u'00:00~23:59', u'lat': u'37.248859', u'zone_name': u'SK\ud558\uc774\ub2c9\uc2a4 \uc774\ucc9c', u'lng': u'127.490639', u'zone_props': 0, u'visit_link': u'http://dmaps.kr/24ij6', u'zone_id': u'2902'}}
Again, I'm not sure of everything that is in there, but you can see HTML tags under u'notice' and lots of other info.