How to get the value from a specific nested class with XPath - python-3.x

I'm trying to get the value of the class "sum_num" with XPath. There are 4 elements with the same class name.
When I run the code, I get either the value '0' or the value for the 3rd element, which is the span text "lblPrice1".
The class "sum_num" exists 4 times in the page,
but I need only the value of the 2nd one.
How do I get only the 2nd value from the class "sum_num"?
Also, is this the best way to crawl a web page?
Python (I have tried both options):
cost = product_link_selector.xpath('//div[./div/@class="product_code_price"]div/div/div/@class = sum_num/text()').get()
cost = product_link_selector.xpath('//*[contains(@class,"item_sum_group product compare_main")]//*[contains(@class, "sum_num")]').get()

You can use an index to get the 2nd item. The general pattern for indexing a set of matches is:
(//*[@attribute='attribute_value'])[index]
Try the following:
product_link_selector.xpath('(//*[contains(@class,"item_sum_group product compare_main")]//*[contains(@class, "sum_num")])[2]').get()
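The idea of "take the n-th match in document order" can also be sketched with only the standard library. This is not the Scrapy selector API: xml.etree.ElementTree supports just a small XPath subset (no contains() and no parenthesized expressions), so the class test and the 1-based index are done in plain Python here. The markup and the helper name nth_with_class are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: four elements share the class "sum_num".
PAGE = """
<div class="item_sum_group product compare_main">
  <div><span class="sum_num">0</span></div>
  <div><span class="sum_num">149.90</span></div>
  <div><span class="sum_num" id="lblPrice1">99.00</span></div>
  <div><span class="sum_num">12</span></div>
</div>
"""


def nth_with_class(root, class_name, n):
    """Return the n-th element (1-based, like XPath) carrying class_name, or None."""
    # root.iter() walks the tree in document order.
    matches = [el for el in root.iter() if class_name in el.get("class", "").split()]
    return matches[n - 1] if 0 < n <= len(matches) else None


root = ET.fromstring(PAGE)
second = nth_with_class(root, "sum_num", 2)
second_text = second.text if second is not None else None
print(second_text)
```

With the Scrapy selector from the question, the parenthesized XPath plays the same role as nth_with_class; appending /text() after the closing [2] would select the text node rather than the whole element.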

Related

Scrapy: Field Name Derived from Page Content

I am looking at pages that are structured in the following way, though the exact elements may not be a table. In general, there are key-value pairs where the number of keys is limited to at most 3 per page (but not necessarily in a particular order) and the keys vary from page to page (and I otherwise have no way to know what all of the keys may be without pre-scraping every possible page). Also, there should not be repeats of a key in the same page (e.g., A -> 1, B -> 2, A -> 3). I have no trouble isolating the keys and values from the page using XPath, only with storing and exporting the values from my Spider.
Approach 1
If I use the dictionary approach with something like this pseudocode:
for th, td in table:
    item[th.text()] = td.text()
Then the result would only show values for A, B, C because those values exist in the first page processed and only the headers and values for the first request are maintained.
Approach 2
If I use the scrapy.item.Item() and scrapy.item.Field() approach with something like this:
class MyItem(Item):
    A = Field()
    B = Field()
    C = Field()
Then I have no way of declaring a value for the unknown values (shown as ...). And I'll receive a KeyError when trying to set the value (either directly or using an ItemLoader.add_value()).
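To make the export problem concrete outside of Scrapy, here is a minimal standard-library sketch. It assumes all scraped rows can be held in memory until the crawl ends, so that the header can be built from the union of every key seen. The pages data and the helper union_of_keys are hypothetical and not part of the question's spider.

```python
import csv
import io

# Hypothetical scraped pages: each yields at most three key/value pairs,
# and the keys differ from page to page.
pages = [
    {"A": "1", "B": "2", "C": "3"},
    {"B": "4", "D": "5"},
    {"E": "6"},
]


def union_of_keys(rows):
    """Collect every key seen across all rows, in first-seen order."""
    seen = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


fieldnames = union_of_keys(pages)

# restval fills in the columns a given row does not have.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(pages)
csv_text = buffer.getvalue()
print(csv_text)
```

The design point is that the header cannot be written until every key is known, which is why an export that fixes its columns from the first item shows only A, B and C.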
I am using Python 3.8 and Scrapy 2.4.1.

Sorting custom columns with django-datatables-view

Using Django 3, python 3.6, django-datatable-view 1.19.1
I am trying to build a datatable with columns taken from my model plus columns that are computed before output.
All the values I need are rendered, but when I try to sort on a custom column I get an error:
Cannot resolve keyword 'XXXX' into field. Choices are: ...
I found a way to register a column as a virtual column without a database source, but that was in the django-datatable-view 0.5.4 docs, and those approaches no longer work. In the documentation for the latest version, the links to the information I need are unavailable.
Please help me figure out how to deal with custom columns computed from my model's fields (sorting, rendering).
This is a little tricky now.
To define computed fields:
class ListJson(BaseDatatableView):
    columns = ["id", "status_code", "computed_field"]
    order_columns = ["id", "status_code"]

    # Override render_column method
    def render_column(self, row, column):
        if column == "computed_field":
            return row.computed_field()
        else:
            return super(ListJson, self).render_column(row, column)
This lets you return the value of the computed_field() method.
Things get more complicated when you want to sort on computed fields. In that case, the simplest option is to disable serverSide processing in JavaScript:
$.extend($.fn.dataTable.defaults, {
    serverSide: false,
});
However, you will then have to return all the rows at once, which can overload the server.
If you want to sort on the backend, you need to build and return a queryset that includes the virtual field.
class ListJson(BaseDatatableView):
    def get_initial_queryset(self):
        return qs
Just build your query like THIS.

How to use Selenium with Python to open a new chat in WhatsApp (I need to target the second New Chat icon)

I need to target the second New Chat icon, but both icons have the same class name.
from selenium import webdriver
driver = webdriver.Chrome('C:/Users/ka-my/AppData/Local/Programs/Python/Python37-32/chromedriver')
driver.get('https://web.whatsapp.com/')
input('Enter anything after scanning QR code')
user1 = driver.find_element_by_class_name('_3j8Pd')
user1.click()
1. I need to target the second New Chat icon.
As with Facebook and Google, the class names are dynamically generated, so the best way around that is to look for something constant, which here is the icon's title string:
new_chat = driver.find_elements_by_xpath('//div[@title="New chat"]')  # returns a list
if new_chat:
    new_chat[0].click()
To get the 2nd New Chat icon, you can use this:
# get the 2nd element in the list
second_icon = driver.find_elements_by_xpath("//div[@class='_3j8Pd']")[1]
Or:
# get the 2nd match in the document
second_icon = driver.find_element_by_xpath("(//div[@class='_3j8Pd'])[2]")
In the first example, we get a list of all the matching div elements and pick the 2nd item with the [1] index. In the second example, we apply the XPath position [2] to the parenthesized expression to get the second match in the document; without the parentheses, [2] would test each div's position among its siblings instead. List indexes are 0-based and XPath positions are 1-based, which is why we see 1 and 2 here.
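The difference between a 0-based list index and a 1-based XPath position, and the fact that a bare position predicate counts siblings rather than matches in the whole document, can be checked without a browser. The sketch below uses the standard library's xml.etree.ElementTree (a limited XPath subset, not Selenium); the markup and the id values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: three divs share the same title, spread over two parents.
PAGE = """
<body>
  <header><div title="New chat" id="a"/></header>
  <main><div title="New chat" id="b"/><div title="New chat" id="c"/></main>
</body>
"""

root = ET.fromstring(PAGE)

# All matches in document order; Python list indexing is 0-based,
# so index 1 is the second match in the document.
all_matches = root.findall(".//div[@title='New chat']")
second_in_document = all_matches[1]

# An XPath position predicate is 1-based and is evaluated per parent:
# this selects every div that is the second div child of its own parent.
second_among_siblings = root.findall(".//div[2]")

print(second_in_document.get("id"))
print([el.get("id") for el in second_among_siblings])
```

Here the second match in the document is the div with id "b", while the sibling-position predicate picks only "c", so the two forms are not interchangeable when the matching elements live under different parents.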

Filtering Haystack (SOLR) results by django_id

With Django/Haystack/SOLR, I'd like to be able to restrict the result of a search to those records within a particular range of django_ids. Getting these IDs is not a problem, but trying to filter by them produces some unexpected effects. The code looks like this (extraneous code trimmed for clarity):
def view_results(request, arg):
    # django_ids list is first calculated using arg...
    sqs = SearchQuerySet().facet('example_facet')  # STEP_1
    sqs = sqs.filter(django_id__in=django_ids)  # STEP_2
    view = search_view_factory(
        view_class=SearchView,
        template='search/search-results.html',
        searchqueryset=sqs,
        form_class=FacetedSearchForm
    )
    return view(request)
At the point marked STEP_1 I get all the database records. At STEP_2 the records are successfully narrowed down to the number I'd expect for that list of django_ids. The problem comes when the search results are displayed in cases where the user has specified a search term in the form. Rather than returning all records from STEP_2 which match the term, I get all records from STEP_2 plus all from STEP_1 which match the term.
Presumably, therefore, I need to override one or more of the methods of SearchView in haystack/views.py, but which? Can anyone suggest a means of achieving what is required here?
After a bit more thought, I found a way around this. In the code above, the problem was occurring in the view = search_view_factory... line, so I needed to create my own SearchView class and override the get_results(self) method in order to apply the filtering after the search has been run with the user's search terms. The result is code along these lines:
class MySearchView(SearchView):
    def get_results(self):
        search = self.form.search()
        # The ID I need for the database search is at the end of the URL,
        # but this may have some search parameters on and need cleaning up.
        view_id = self.request.path.split("/")[-1]
        view_query = MyView.objects.filter(id=view_id.split("&")[0])
        # At this point the django_ids of the required objects can be found.
        if len(view_query) > 0:
            view_item = view_query[0]
            django_ids = []
            for thing in view_item.things.all():
                django_ids.append(thing.id)
            search = search.filter_and(django_id__in=django_ids)
        return search
Using search.filter_and rather than search.filter at the end was another thing which turned out to be essential, but which didn't do what I needed when the filtering was being performed before getting to the SearchView.

Binding an edit box within a custom control to a form field programmatically

I have a notes form with a series of fields such as city_1, city_2, city_3 etc.
I have an XPage and on that XPage I have a repeat.
The repeat is based on an array with ten values 1 - 10
var repArray = new Array();
for (var i = 1; i <= 10; i++) {
    repArray.push(i);
}
return repArray;
Within the repeat I have a custom control which is used to surface the fields city_1 through city_10
The repeat has a custom property docdatasource which is passed in
It also has a string custom property called cityFieldName which is computed using the repeat
collection name so that in the first repeat row it is city_1 and in the second it is city_2 etc..
The editable text field on the custom control is bound using the EL formula
compositeData.docdatasource[compositeData.cityFieldName]
This works fine but each time I add new fields I have to remember to create a new custom property and then a reference to it on the parent page.
I would like to be able to simply compute the data binding such as
compositeData.docdatasource['city_' + indexvar]
where indexvar is a variable representing the current row number.
Is this possible? I have read that you cannot use '+' in Expression Language.
First: you wouldn't need an array for a counter. Just 10 would do (the number) - repeats 10 times too. But you could build an array of arrays:
var repArray = [];
for (var i = 1; i <= 10; i++) {
    repArray.push(["city", "street", "zip", "country", "planet"]);
}
return repArray;
then you should be able to use
#{datasource.indexvar[0]}
to bind city,
#{datasource.indexvar[1]}
to bind street. etc.
This carries some risk of mixing up the order of the array; if that's a concern, you would need to dig deeper into using an Object here.
Compute the binding in JavaScript and use something like:
var viewnam = "#{" + (compositeData.searchVar) + "}"
return viewnam
Make sure this is computed on page load in the custom control.
I was never able to do the addition within EL but I have been very successful with simply computing the field names outside the custom control and then passing those values into the custom control.
I can send you some working code if you wish from a presentation I gave.
