Scrape the salary from indeed.com [closed] - python-3.x

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am trying to scrape the salary from indeed.com using Beautiful Soup.
The salary is given as:
<div id="vjs-jobinfo">
<div id="vjs-jobtitle">Senior Data Scientist/ Machine learning engineer</div>
<div>
<span id="vjs-cn">Intellify</span>
<span id="vjs-loc"> - Sydney NSW</span>
</div>
<div>
<span>$120,000 - $160,000 a year</span>
-
<span>Full-time, Part-time</span>
</div>
</div>
My solution:
new_soup = BeautifulSoup(new_html, 'html.parser', from_encoding='utf-8')

for titles in new_soup.find_all('div', {'id': 'vjs-jobtitle'}):
    print(titles.text)
print('\n')
for company_name in new_soup.find_all('span', {'id': 'vjs-cn'}):
    print(company_name.text)
print('\n')
for company_location in new_soup.find_all('span', {'id': 'vjs-loc'}):
    print(company_location.text)
But I can't get the salary, since its span has no attribute to select on. Can anyone help, please?

You can use CSS-style selectors like:
new_soup.select_one("div#vjs-jobinfo div:nth-of-type(3)").findChild().text

One solution: since you know the salary is in the third span tag, you can access it directly:
all_span = new_soup.find_all("span")
salary = all_span[2].getText()
# $120,000 - $160,000 a year
EDIT: Since you know the salary must start with a dollar sign, you can also use a regex to find it:
import re

salary = new_soup.find('span', text=re.compile(r'^\$')).getText()
# $120,000 - $160,000 a year
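To make the answers easy to try, here is a self-contained sketch that runs both the CSS-selector and the regex approach against the HTML fragment from the question (the fragment is inlined here only for illustration):

```python
import re

from bs4 import BeautifulSoup

new_html = """
<div id="vjs-jobinfo">
  <div id="vjs-jobtitle">Senior Data Scientist/ Machine learning engineer</div>
  <div>
    <span id="vjs-cn">Intellify</span>
    <span id="vjs-loc"> - Sydney NSW</span>
  </div>
  <div>
    <span>$120,000 - $160,000 a year</span>
    -
    <span>Full-time, Part-time</span>
  </div>
</div>
"""

new_soup = BeautifulSoup(new_html, 'html.parser')

# CSS-selector approach: the salary sits in the third <div> inside #vjs-jobinfo.
by_selector = new_soup.select_one(
    "div#vjs-jobinfo div:nth-of-type(3)").findChild().text

# Regex approach: the salary is the only <span> whose text starts with "$".
by_regex = new_soup.find('span', text=re.compile(r'^\$')).getText()

print(by_selector)  # $120,000 - $160,000 a year
print(by_regex)     # $120,000 - $160,000 a year
```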

Related

get css element with space selenium [duplicate]

This question already has answers here:
Is there a way to find an element by attributes in Python Selenium?
(3 answers)
Closed 3 years ago.
Given the structure below, how do I get the text value between the tags?
For example
driver.find_element_by_xpath(u'//span[contains(text(),"name")]')
HTML
<span itemprop="name">Colombia U20 vs Ukraine U20</span>
Try the following xpath.
driver.find_element_by_xpath('//span[contains(text(),"Colombia U20 vs Ukraine U20")]').text
OR
driver.find_element_by_xpath('//span[contains(.,"Colombia U20 vs Ukraine U20")]').text
OR
driver.find_element_by_xpath('//span[@itemprop="name"]').text
If you want to use a CSS selector, try this:
driver.find_element_by_css_selector('span[itemprop="name"]').text
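As a side note (not part of the original answers), the attribute-based XPath variant can be sanity-checked without a browser using the standard library's xml.etree.ElementTree, which supports this subset of XPath:

```python
import xml.etree.ElementTree as ET

# Wrap the fragment in a parent element so .//span searches below the root.
root = ET.fromstring(
    '<div><span itemprop="name">Colombia U20 vs Ukraine U20</span></div>')

# Same attribute predicate as the Selenium XPath above.
text = root.find('.//span[@itemprop="name"]').text
print(text)  # Colombia U20 vs Ukraine U20
```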

Start scraping only after and before certain element [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Here's what the HTML looks like:
<h4>Categories</h4>
<ul>
<li>Cars</li>
<li>Bikes</li>
<li>Planes</li>
</ul>
<h4>Brands</h4>
<ul>
<li>Audi</li>
<li>BMW</li>
<li>Mercedes</li>
</ul>
<h4>FAQ</h4>
<ul>
<li>FAQ1</li>
<li>FAQ2</li>
<li>FAQ3</li>
</ul>
I'm trying to extract only the brands using Scrapy. There is no distinguishing feature between the categories section and the brands section except that each h4 begins a new section. Also, there are many categories and brands, so it's hard to hardcode anything.
You can use the following or following-sibling axis.
For instance, in order to get the brands you can get to the desired h4 element by text and then get to the next ul sibling via following-sibling:
//h4[. = 'Brands']/following-sibling::ul[1]/li/text()
Demo from the Scrapy shell:
$ scrapy shell ./index.html
>>> response.xpath("//h4[. = 'Brands']/following-sibling::ul[1]/li/text()").extract()
['Audi', 'BMW', 'Mercedes']
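If the Scrapy shell isn't handy, the same XPath can be checked with lxml (assuming it is installed), since lxml implements full XPath 1.0 including the following-sibling axis:

```python
from lxml import html

doc = html.fromstring("""
<h4>Categories</h4>
<ul><li>Cars</li><li>Bikes</li><li>Planes</li></ul>
<h4>Brands</h4>
<ul><li>Audi</li><li>BMW</li><li>Mercedes</li></ul>
<h4>FAQ</h4>
<ul><li>FAQ1</li><li>FAQ2</li><li>FAQ3</li></ul>
""")

# Anchor on the "Brands" heading, then take the first following <ul> sibling.
brands = doc.xpath("//h4[. = 'Brands']/following-sibling::ul[1]/li/text()")
print(brands)  # ['Audi', 'BMW', 'Mercedes']
```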

kentico site and DesignMode css conflicts [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have my custom CSS (all Compass/SASS based) added to the site from the master page, rather than including it in the site settings. While in Design View, my custom CSS overrides the rules from DesignMode.css.
My master page is loading in my compiled CSS this way:
<link href="/CMSPages/GetResource.ashx?stylesheetfile=/KFF/SalesForce/main.css" type="text/css" rel="stylesheet" />
What is the best method to isolate my CSS from the designview.css?
You might want to take a look at this article.
I think you can also do something like this in your master page:
{% ViewMode == "LiveSite" ? "StyleSheet Link": "" %}

Plotting Multiple barcharts using Flot API [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Hi, I am using the Flot charts API to show data as bar charts in my application. I need to display the data in a bar chart, grouped into categories with pre and post data, something like the picture shown here:
Sample Diagram
How should I structure the data to plot this bar chart?
You need a plugin for this.
Check OrderBars and then use data like:
var series = [];
series.push({
    data: [], // your raw data
    bars: {
        order: 0
    }
});
series.push({
    data: [], // your raw data
    bars: {
        order: 1
    }
});
Example: http://jsfiddle.net/ZRXP5/
My example uses MooTools, but you can find the jQuery version (.js file) at the link above.

Robots.TXT and Meta Tag Robots [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I want to make sure I understand this:
This: <meta content="noindex, nofollow" name="robots" /> in the <head> of a webpage
is the same as:
Disallow: /example-page.html in the Robots.txt
Right?
Yes, if you are talking about the <head> of example-page.html. The only difference is that with the meta-tag restriction, the page will still be requested by the spider. This can matter if the page is generated by a server-side script and you count the number of times it was displayed, or gather other information related to visits to the page (from access logs, for example).
A well-behaved bot (a valid bot from a normal search engine) will fetch the page, read the meta tag, and then not index it, whereas with a robots.txt rule the page is never requested at all, either by the generic spider or by the one named in the User-agent section of robots.txt.
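As an aside (not part of the original answer), Python's standard urllib.robotparser shows how a compliant crawler interprets such a Disallow rule:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Feed the rule directly instead of fetching a real robots.txt.
rp.parse([
    "User-agent: *",
    "Disallow: /example-page.html",
])

# A compliant crawler will not even request the disallowed page.
allowed = rp.can_fetch("*", "https://example.com/example-page.html")
print(allowed)  # False
```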
