Accessing Span Elements - web

When scraping the integer rating from IMDb reviews, I am confused about how to access the rating when the inspected HTML shows it only as a bare span holding the value, e.g. 10, and the value changes for each individual rating (e.g. 7). How would I use soup.find_all to access these values and add them to a list? I am confused about how to do this when the element has no class.
rate = soup.find_all('span')
rate_list = []
for span in rate:
    rate_list.append(span.get_text())

Try using the fact that the target span sits next to the star icon:
ratings = [i.text for i in soup.select('.ipl-star-icon + span')]
However, in case not every review has a rating, I would probably loop over the reviews (for review in soup.select('.lister-item-content'): ...) and test whether review.select_one('.ipl-star-icon + span') is not None.

Related

Add element in many to many field and preserve order

from django.db import models

class Country(models.Model):
    code = models.CharField(max_length=50)
    name = models.CharField(max_length=500)

    class Meta:
        unique_together = (('code', 'name'),)
        db_table = 'md_country'

class UserSettings(models.Model):
    ...
    default_countries = models.ManyToManyField(Country, db_table='user_default_countries', related_name='default_countries')
I have two Django models. When I add Country objects to default_countries, I want to preserve their order. Currently, when I append to the many-to-many field, Django automatically sorts by Country name (alphabetical order).
I have this code:
# iterate one by one to preserve fetching order
country_models = [Country.objects.get(id=_id) for _id in request.data['default_countries']]
user_settings.default_countries.clear()
for c in country_models:
    user_settings.default_countries.add(c)
After this, when I inspect user_settings.default_countries, the countries are ordered by name in alphabetical order.
I want to preserve the order in which elements were added. If I add France and then Australia, I want them in that same order when I pull the data from the database. In this example I currently get Australia and then France.
EDIT:
I checked the database, and the data is inserted in the right order.
For example, if I want France (73) and then Australia (13), the France row has the smaller id, so it is inserted first. The problem is in how Django pulls the data from the database.
If I understand correctly, you want to sort by insertion order:
someSetting = UserSettings.objects.first()
countries = someSetting.default_countries.order_by('id')
I found a workaround.
First, I defined a new property inside the model that holds default_countries:
@property
def ordered_default_countries(self):
    return self.default_countries.all().order_by('-id')
Then, in the serializer where I serialize this field, I pointed the default_countries field to ordered_default_countries.

How to get the value from a specific nested class with XPath

I'm trying to get the value of the element with class "sum_num" using XPath. The class "sum_num" exists four times on the page.
When I run the code, I get either the value '0' or the value of the third match, which is the span text "lblPrice1".
I need only the value of the second one.
How do I get only the second value with the class "sum_num"?
Also, is this the best way to crawl a web page?
Python (I have tried both options):
cost = product_link_selector.xpath('//div[./div/@class="product_code_price"]div/div/div/@class = sum_num/text()').get()
cost = product_link_selector.xpath('//*[contains(@class,"item_sum_group product compare_main")]//*[contains(@class, "sum_num")]').get()
You can use an index to get the second item. Wrap the whole expression in parentheses and put the index after it:
(//*[@attribute='attribute_value'])[index]
Try the following:
product_link_selector.xpath('(//*[contains(@class,"item_sum_group product compare_main")]//*[contains(@class, "sum_num")])[2]').get()
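Why the parentheses matter can be tried out without a crawler. The following is a minimal sketch under assumptions: the markup and values are a made-up stand-in for the real page, and it uses the standard library's xml.etree.ElementTree rather than the selector from the question. ElementTree supports only a subset of XPath and does not accept the parenthesised (...)[2] form, so the second match is taken from the Python list instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the product page markup.
PAGE = """
<div class="item_sum_group">
  <div><span class="sum_num">0</span></div>
  <div><span class="sum_num">129.90</span></div>
  <div><span class="sum_num" id="lblPrice1">99.00</span></div>
  <div><span class="sum_num">5</span></div>
</div>
"""

root = ET.fromstring(PAGE)

# Every element whose class attribute is exactly "sum_num", in document order.
matches = root.findall(".//*[@class='sum_num']")

# XPath positions start at 1, Python lists at 0: the 2nd match is index 1.
second = matches[1].text if len(matches) >= 2 else None
print(second)

# Without parentheses, [2] is evaluated per parent: "a span that is the
# second span child of its own parent". Each div here holds one span,
# so nothing matches.
per_parent = root.findall(".//span[2]")
print(per_parent)
```

The same distinction applies in full XPath engines: //x[2] indexes within each parent, while (//x)[2] indexes the combined result.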

How to check if all elements of a list are inside a list of strings

I'm parsing a website to find available products and their sizes. There are 3 products loaded. A list named 'find_id_1' holds 3 elements, and each element has the product name and its variant ids. I made 2 other lists, one named keywords and one named negative. The keywords list holds the keywords that my desired product title should have. If any element from the negative list is in the product title, then I don't want that product.
found_product = []
keywords = ['YEEZY','BOOST','700']
negative = ['INFANTS','KIDS']
find_id_1 = ['{"id":2069103968384,"title":
"\nYEEZY BOOST 700 V2","handle":**"yeezy-boost-700-v2-vanta-june-6"**,
[{"id":19434310238336,"parent_id":2069103968384,"available":true,
"sku":"193093889925","featured_image":null,"public_title":null,
"requires_shipping":true,"price":30000,"options"',
'{"id":2069103935616,"title":"\nYEEZY BOOST 700 V2 KIDS","handle":
"yeezy-boost-700-v2-vanta-kids-june-6",`
["10.5k"],"option1":"10.5k","option2":"",
`"option3":"","option4":""},{"id":19434309845120,"parent_id":2069103935616,
"available":false,"sku":"193093893625","featured_image":null,
"public_title":null,"requires_shipping":true,"price":18000,"options"',
'{"id":2069104001152,"title":"\nYEEZY BOOST 700 V2 INFANTS",
"handle":**"yeezy-boost-700-v2-vanta-infants-june-6"***,`
["4K"],"option1":"4k","option2":"",`
"option3":"","option4":""},{"id":161803398876,"parent_id":2069104001152,
"available":false,"sku":"193093893724",
"featured_image":null,"public_title":null,
"requires_shipping":true,"price":15000,"options"']
I've tried using a for loop to iterate through every element in find_id_1, then nested for loops that iterate through every element in keywords and negative, but I get the wrong product. Here's my code:
for product in find_id_1:
    for key in keywords:
        for neg in negative:
            if key in product:
                if neg not in product:
                    found_product = product
It prints the following:
'{"id":2069104001152,"title":"\nYEEZY BOOST 700 V2 INFANTS",
"handle":"yeezy-boost-700-v2-vanta-infants-june-6,`
["4K"],"option1":"4k","option2":"",`
"option3":"","option4":""},
{"id":161803398876,"parent_id":2069104001152,
"available":false,"sku":"193093893724",
"featured_image":null,"public_title":null,
"requires_shipping":true,"price":15000,"options"']
I'm trying to get it to return element 0 from find_id_1, because that's the only one that doesn't contain any of the elements from the negative list. Would a for loop be the best and fastest way to iterate through my list? Thank you! Any help is welcome!
First of all, you shouldn't treat JSON data as a string. Parse the JSON with the json library so that you can check just the title of the product. As the product list and the details of each product grow, the time taken to iterate increases.
To answer your question, you can simply do:
for product in find_id_1:
    if any(key in product for key in keywords):
        if not any(neg in product for neg in negative):
            found_product.append(product)
This will get you the element you specified. However, I made some changes to your data to make it valid Python code:
found_product = []
keywords = ['YEEZY','BOOST','700']
negative = ['INFANTS','KIDS']
find_id_1 = [""""'{"id":2069103968384,"title":
"\nYEEZY BOOST 700 V2","handle":**"yeezy-boost-700-v2-vanta-june-6"**,
[{"id":19434310238336,"parent_id":2069103968384,"available":true,
"sku":"193093889925","featured_image":null,"public_title":null,
"requires_shipping":true,"price":30000,"options"'""",
""""'{"id":2069103935616,"title":"\nYEEZY BOOST 700 V2 KIDS","handle":
"yeezy-boost-700-v2-vanta-kids-june-6",`
["10.5k"],"option1":"10.5k","option2":"",
`"option3":"","option4":""},{"id":19434309845120,"parent_id":2069103935616,
"available":false,"sku":"193093893625","featured_image":null,
"public_title":null,"requires_shipping":true,"price":18000,"options"'""",
""""'{"id":2069104001152,"title":"\nYEEZY BOOST 700 V2 INFANTS",
"handle":**"yeezy-boost-700-v2-vanta-infants-june-6"***,`
["4K"],"option1":"4k","option2":"",`
"option3":"","option4":""},{"id":161803398876,"parent_id":2069104001152,
"available":false,"sku":"193093893724",
"featured_image":null,"public_title":null,
"requires_shipping":true,"price":15000,"options"'"""]
for product in find_id_1:
    if any(key in product for key in keywords):
        if not any(neg in product for neg in negative):
            found_product.append(product)
print(found_product)
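The advice to parse the JSON and check only the title could look roughly like this. It is a sketch under assumptions: the records are shortened, hypothetical versions of the product data (the originals are truncated and are not valid JSON), the helper name title_matches is made up, and it requires every keyword to be present (all rather than any), which is one reading of "keywords that the title should have".

```python
import json

keywords = ['YEEZY', 'BOOST', '700']
negative = ['INFANTS', 'KIDS']

# Hypothetical, shortened records: valid JSON with only the fields used here.
raw_products = [
    '{"id": 2069103968384, "title": "\\nYEEZY BOOST 700 V2"}',
    '{"id": 2069103935616, "title": "\\nYEEZY BOOST 700 V2 KIDS"}',
    '{"id": 2069104001152, "title": "\\nYEEZY BOOST 700 V2 INFANTS"}',
]

def title_matches(title, keywords, negative):
    """True if the title has every keyword and none of the negative words."""
    words = title.upper().split()
    return (all(key in words for key in keywords)
            and not any(neg in words for neg in negative))

found_product = []
for raw in raw_products:
    product = json.loads(raw)  # parse once, then work with a dict
    if title_matches(product['title'], keywords, negative):
        found_product.append(product)

print([p['id'] for p in found_product])
```

Comparing whole words in the title, instead of searching the raw string, avoids accidental hits in other fields such as the handle or the SKU.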

Insert values into API request dynamically?

I have an API request I'm writing to query OpenWeatherMap's API to get weather data. I am using a city_id number to submit a request for a unique place in the world. A successful API query looks like this:
r = requests.get('http://api.openweathermap.org/data/2.5/group?APPID=333de4e909a5ffe9bfa46f0f89cad105&id=4456703&units=imperial')
The key part of this is 4456703, which is a unique city_ID
I want the user to choose a few cities. I'll then look up each city_ID in a JSON file and supply the city_IDs to the API request.
I can add multiple city_ID's by hard coding. I can also add city_IDs as variables. But what I can't figure out is if users choose a random number of cities (could be up to 20), how can I insert this into the API request. I've tried adding lists and tuples via several iterations of something like...
#assume the user already chose 3 cities, city_ids are below
city_ids = [763942, 539671, 334596]
r = requests.get(f'http://api.openweathermap.org/data/2.5/group?APPID=333de4e909a5ffe9bfa46f0f89cad105&id={city_ids}&units=imperial')
Maybe a list is not the right data type to use?
Successful code would look something like...
r = requests.get(f'http://api.openweathermap.org/data/2.5/group?APPID=333de4e909a5ffe9bfa46f0f89cad105&id={city_id1},{city_id2},{city_id3}&units=imperial')
Except, as I stated previously, the user could choose 3 cities or 10 so that part would have to be updated dynamically.
You can use a string method and a list comprehension to join all the values of the list into a single string, and then format that string into the API URL as follows:
city_ids_list = [763942, 539671, 334596]
city_ids_string = ','.join([str(city) for city in city_ids_list]) # Would output "763942,539671,334596"
r = requests.get('http://api.openweathermap.org/data/2.5/group?APPID=333de4e909a5ffe9bfa46f0f89cad105&id={city_ids}&units=imperial'.format(city_ids=city_ids_string))
hope it helps,
good luck
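As an alternative sketch, the query string can be assembled with the standard library instead of string formatting. The endpoint and parameter names are the ones from the question; the API key is a placeholder and build_group_url is a hypothetical helper name. No request is sent here, the code only builds the URL.

```python
from urllib.parse import urlencode

BASE_URL = 'http://api.openweathermap.org/data/2.5/group'

def build_group_url(city_ids, api_key, units='imperial'):
    """Build the group-query URL for any number of city ids."""
    params = {
        'APPID': api_key,
        # Join the ids into one comma-separated value, e.g. "1,2,3".
        'id': ','.join(str(city_id) for city_id in city_ids),
        'units': units,
    }
    # safe=',' keeps the commas as-is instead of encoding them as %2C.
    return BASE_URL + '?' + urlencode(params, safe=',')

url = build_group_url([763942, 539671, 334596], 'YOUR_API_KEY')
print(url)
```

The resulting string can be passed to requests.get(url). requests.get also accepts a params dictionary and performs the encoding itself, which is another way to avoid building the string by hand.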

How to efficiently use Django query and Q to filter each object in a queryset and return 1 field value for each unique field in the queryset

I have query that returns the following queryset:
results = <QuerySet [<Product: ItemA>, <Product: ItemA>, <Product: ItemB>, <Product: ItemB>, <Product: ItemB>, <Product: ItemC>, <Product: ItemC>]>
The __str__ representation of the model is name and each Product variation likely has a different value for the price field. After this query, I need to search my database for each Product in the queryset and return the lowest price for each unique name so like:
Lowest price for all in database where name is == to ItemA
Lowest price for all in database where name is == to ItemB
Lowest price for all in database where name is == to ItemC
I use the following block of code to accomplish this goal:
query_list = []
for each in results:
    if each.name not in query_list:  # Checks if the name of the object is not in the query list
        query_list.append(each.name)  # Adds just the name so there is only one of each name in query_list
for each in query_list:
    priced = results.filter(name=each).order_by('price').first()  # Lowest price for each name in query_list
This feels very inefficient. Is there a way to do a similar computation without appending the unique name of each Product to a separate list, iterating over that list, and then making a query for each one? I feel like there is a way to use some type of complex lookup to accomplish my goal, maybe even use less Python and make the db do more of the work, but the above is the best I've been able to figure out so far. There can be a lot of different hits in results, so I need this block to be as efficient as possible.
It is easy after reading the docs sections "Generating aggregates for each item in a QuerySet" and "Interaction with default ordering or order_by()".
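from django.db.models import Min

# First query: lowest price per distinct name.
# The empty order_by() clears any ordering that would otherwise affect the grouping.
prices = {x['name']: x['lowest_price']
          for x in results.values('name').annotate(lowest_price=Min('price')).order_by()}
# Second query: walk the rows and pick the first one that matches each lowest price.
for product in results:
    if product.name in prices and product.price == prices[product.name]:
        priced = product  # output the row
        del prices[product.name]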
That runs in two database queries.
An even more efficient solution with one query is probably possible with a Window function, but it requires a database backend that supports window functions, and it may not work, e.g., in tests with an older sqlite3.
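To see what "letting the database do the work" amounts to, here is a rough sketch using the standard library's sqlite3 and a hypothetical product table (the table name, columns, and rows are made up for illustration). The GROUP BY query is approximately what values('name').annotate(lowest_price=Min('price')).order_by() asks the database for; the exact SQL Django generates may differ.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE product (name TEXT, price REAL)')
conn.executemany(
    'INSERT INTO product (name, price) VALUES (?, ?)',
    [('ItemA', 12.0), ('ItemA', 9.5),
     ('ItemB', 30.0), ('ItemB', 25.0), ('ItemB', 27.5),
     ('ItemC', 4.0), ('ItemC', 6.0)],
)

# One query returns the lowest price for each distinct name.
rows = conn.execute(
    'SELECT name, MIN(price) AS lowest_price FROM product GROUP BY name'
).fetchall()

# Same shape as the prices dict in the answer: name -> lowest price.
prices = dict(rows)
print(prices)
conn.close()
```

The grouping and the minimum are computed inside the database, so Python only receives one row per name instead of every Product row.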
