Extract number when scraping - python-3.x

I'm trying to scrape some data from an apartment listing site.
I want to use the price in calculations, so I need to store it as a number. But on the website it's written as text, like this: 5 670 money/month
I want to remove all the letters and spaces, then convert it to an integer to save in my db.
I tried a regular expression, but I get this error:
TypeError: expected string or bytes-like object
This is the element I collect the price from:
<p class="info-price">399 euro per month</p>
I get the price with XPath like this:
p = response.xpath('//p[@class="info-price"]/text()').extract()
And when I collect the object name and price, the output looks like this:
{'object': ['North West End 24'], 'price': ['399\xa0euro\xa0per\xa0month']}
How and when should I convert it?

So I found a solution. Maybe it's a dirty one, and someone will come along with an elegant one-liner.
But as I understand it, the text I scrape with this line
p = response.xpath('//p[@class="info-price"]/text()').extract()
is a list object.
So I added a line to 'convert' it to a string with this code:
p = ''.join(map(str, p)) #Convert to string from list object
And finally, to remove all spaces and text so that I end up with just the digits of the price, I use this code:
p = re.sub(r'\D', '', p) #Remove all but numbers
So, all in all, this snippet takes the price text, converts it to a string and then removes everything but the digits:
import re
p = response.xpath('//p[@class="info-price"]/text()').extract()
p = ''.join(map(str, p)) #Convert to string from list object
p = re.sub(r'\D', '', p) #Remove all but numbers

What the .extract() method does is find all occurrences of your XPath expression; that's why it returns a list, since there might be more than one result. If you know there's only one result, or only care about the first one, use .extract_first() instead. It returns the first result as a string (or None if nothing matches), so you don't have to convert the list to a string. (See https://docs.scrapy.org/en/latest/topics/selectors.html#id1)
p = response.xpath('//p[@class="info-price"]/text()').extract_first()
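Since the goal is an integer for the database, here is a minimal sketch of the final conversion step. It assumes the input is the plain string that .extract_first() returns, or None when the XPath matches nothing; the helper name parse_price is hypothetical. Note that it keeps every digit, so a price with decimals such as '5 670,50' would come out as 567050.

```python
import re


def parse_price(raw):
    """Return the digits in raw as an int, or None if there are none.

    raw is expected to be a string such as the one returned by
    .extract_first(), or None when the XPath matched nothing.
    """
    if raw is None:
        return None
    # \D matches any non-digit, including the non-breaking spaces (\xa0)
    digits = re.sub(r'\D', '', raw)
    # Avoid int('') raising ValueError when no digits were found
    return int(digits) if digits else None


print(parse_price('399\xa0euro\xa0per\xa0month'))  # 399
print(parse_price('5 670 money/month'))            # 5670
```

In the spider this would be used as p = parse_price(response.xpath('//p[@class="info-price"]/text()').extract_first()).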

Related

Convert an unknown data item to string in Python

I have certain data that need to be converted to strings. Example:
[ABCGHDEF-12345, ABCDKJEF-123235,...]
The example above does not represent a constant or a string by itself but is taken from an Excel sheet (up to 30+ items per row). I want to convert these to strings. Since the items aren't defined as Python values, converting them explicitly doesn't work. Is there a way to do this iteratively without manually placing double/single quotes around each element?
What I want finally:
["ABCGHDEF-12345", "ABCDKJEF-123235",...]
To convert the string to a list of strings, you can try:
s = "[ABCGHDEF-12345, ABCDKJEF-123235]"
s = s.strip("[]").split(", ")
print(s)
Prints:
['ABCGHDEF-12345', 'ABCDKJEF-123235']
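If the spacing after the commas isn't always exactly one space (common with data copied out of a spreadsheet), a slightly more tolerant sketch strips each item individually. The function name to_string_list is hypothetical:

```python
def to_string_list(s):
    # Remove surrounding whitespace, then the brackets at both ends
    inner = s.strip().strip("[]")
    # Split on commas and strip each piece; skip empty pieces (e.g. from "[]")
    return [item.strip() for item in inner.split(",") if item.strip()]


print(to_string_list("[ABCGHDEF-12345,  ABCDKJEF-123235 ]"))
# ['ABCGHDEF-12345', 'ABCDKJEF-123235']
```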

Is there any way to pass letters to this library expecting a tuple?

I'm pulling data from Google Sheets columns like so:
latitude = list(map(float, wks.col_values(3)[1:]))
longitude = list(map(float, wks.col_values(4)[1:]))
name = list(map(float, wks.col_values(2)[1:]))
I use this loop to access my list:
for x,y in enumerate(latitude):
    gmap.marker(latitude[x], longitude[x], title=name[x])
name will only pass if the cells contain only numbers. If there's a single letter anywhere in any column, my map is never generated, and I don't even see an error. I'm using gmplot to generate my .html map. Is there any way to get text through here?
name = wks.col_values(2)[1:]  # keep the names as strings
print(name)
The names don't need to be floats, so don't map them through float().
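A minimal sketch of the same idea applied to all three columns: only the coordinates are converted to float, and the columns are paired with zip instead of indexing. build_markers and the sample values are hypothetical stand-ins for the wks.col_values(...)[1:] data:

```python
def build_markers(lat_col, lng_col, name_col):
    """Pair each row's coordinates with its name.

    Only the coordinates are converted to float; names stay as strings,
    so text such as 'Office A' is passed through unchanged.
    """
    return [(float(lat), float(lng), name)
            for lat, lng, name in zip(lat_col, lng_col, name_col)]


# Hypothetical sample values standing in for the sheet columns
markers = build_markers(['52.37', '48.86'], ['4.90', '2.35'], ['Office A', 'Site 2'])
print(markers)

# With gmplot, the loop from the question would then become:
# for lat, lng, title in markers:
#     gmap.marker(lat, lng, title=title)
```

zip stops at the shortest column, so if one column has fewer rows the extra rows are silently dropped rather than raising an IndexError.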

Comparing user input list with dictionary and printing out corresponding value

Let me start by saying this is for school and I'm still learning, so I'm not looking for a direct solution.
What I want to do is take input from a user (one or more words).
I then turn it into a list.
I have my dictionary and the code that I'm posting is printing out the values correctly.
My question is how do I compare the characters in my list to the keys in the dictionary and then print only those values that correspond to the keys?
I have also read a lot of other questions about dictionaries, but they didn't help.
Example output:
Word: wow
Output: 96669
user_word = input("Please enter a word: ")
user_listed = list(user_word)
def keypresses():
    my_dict = {'.':1, ',':11, '?':111, '!':1111, ':':11111, 'a':2, 'b':22, 'c':222, 'd':3, 'e':33, 'f':333, 'g':4, 'h':44,
               'i':444, 'j':5, 'k':55, 'l':555, 'm':6, 'n':66, 'o':666, 'p':7, 'q':77, 'r':777, 's':7777, 't':8, 'u':88,
               'v':888, 'w':9, 'x':99, 'y':999, 'z':9999, ' ':0}
    for key, value in my_dict.items():
        print(value)
I am not going to hand you code for the project, but I will definitely point you in the right direction.
As I see it, there are two parts to this: matching each character to a key to get its value, and combining the numbers into the output.
For the first part, you can iterate character by character with a simple for loop:
for letter in 'string':
    print(letter)
This outputs s, t, r, i, n, g, one per line, so you can use it to look up the value for each key (each letter).
Then you can get the value as a string (so the numbers are joined rather than added mathematically), something like:
letter = 'w'
value = my_dict[letter]
value_as_string = str(value)
Then combine all of this in a for loop, adding each string onto the previous ones to build the desired output.
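To illustrate why the value is converted to a string first, here is a small comparison using the values for 'w' and 'o' from the dictionary above. It is only the concatenation step, not the full solution:

```python
w_value, o_value = 9, 666  # my_dict['w'] and my_dict['o']

print(w_value + o_value)            # 675: the numbers are added
print(str(w_value) + str(o_value))  # 9666: the strings are joined
```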

Indexing the list in python

record=['MAT', '90', '62', 'ENG', '92','88']
course='MAT'
Suppose I want to get the marks for MAT or ENG; what do I do? I only know how to find the index of the course, which is new[4:10].index(course). I don't know how to get the marks.
Try this:
i = record.index('MAT')
grades = record[i+1:i+3]
Here i is the index of 'MAT' (or whichever course you look up), and grades is the slice containing the two items right after the course name.
You could also put it in a function:
def get_grades(course):
    i = record.index(course)
    return record[i+1:i+3]
Then you can just pass in the course name and get back the grades.
>>> get_grades('ENG')
['92', '88']
>>> get_grades('MAT')
['90', '62']
>>>
Edit
If you want to get a string of the two grades together instead of a list with the individual values you can modify the function as follows:
def get_grades(course):
    i = record.index(course)
    return ' '.join("'{}'".format(g) for g in record[i+1:i+3])
You can use the list's index method (see https://stackoverflow.com/a/176921/) and then take the following indexes, but I think you should use a dictionary instead.
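A minimal sketch of the dictionary approach, assuming the list always follows the pattern of a course name followed by exactly two marks:

```python
record = ['MAT', '90', '62', 'ENG', '92', '88']

# Step through the list three items at a time: name, mark 1, mark 2
grades = {record[i]: record[i + 1:i + 3] for i in range(0, len(record), 3)}

print(grades['MAT'])  # ['90', '62']
print(grades['ENG'])  # ['92', '88']
```

Once the dictionary is built, each lookup is a direct key access instead of a search through the list; grades.get(course) returns None for an unknown course instead of raising KeyError.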

How to get dictionary values in Python

I'm working with Python dictionaries and NLTK on some reviews. I have an input (.txt) file containing a simple review. In a dictionary file, all_dict.txt, I have all the words (negative and positive) with their polarities and values.
all_dict.txt looks like this:
"acceptable":("positive",1),"good":("positive",1),"shame":("negative",2),"bad":("negative",4),...
I want to know how I can get these polarities and the numeric value for each word from the dictionary, so that I get output like this:
"acceptable_positive":1,"good_positive":1,"shame_negative":2,"bad_negative":4
I tried dict.get() and dict.values(), but I don't get what I want. Is there a method to fetch keys and values automatically?
This is the code I tried:
f_all_dict=open('all_dict.txt','r',encoding='utf-8').read()
f = eval(f_all_dict)
result_all = {}
for word in f.items():
    suffix, pol=result_all[word] #pol->polarity
    result_all[word + "_" + suffix] = pol
But I get a KeyError if the word doesn't exist in the input file (review).
Thank you for your help
First off, dict.items() returns a view of (key, value) tuples, so in your loop word is a tuple, not a key. You then look that tuple up in result_all, which is still empty, so this line raises a KeyError:
suffix, pol=result_all[word]
Secondly, it's better to use a with statement when dealing with external resources like files, and to use ast.literal_eval() rather than eval() to evaluate your dictionary. You can also access the items of each value by unpacking them directly within a dict comprehension:
from ast import literal_eval
with open('all_dict.txt','r',encoding='utf-8') as f_all_dict:
    dictionary = literal_eval(f_all_dict.read().strip())
result_all = {"{}_{}".format(word, suffix): pol for word, (suffix, pol) in dictionary.items()}
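To check the comprehension without a file, here is a minimal sketch that evaluates an in-memory string instead. It assumes all_dict.txt holds a complete dict literal with surrounding braces; the excerpt shown in the question has none, and without them it wouldn't parse as a dict.

```python
from ast import literal_eval

# Stand-in for the contents of all_dict.txt (assumed to include the braces)
text = '{"acceptable": ("positive", 1), "good": ("positive", 1), "shame": ("negative", 2), "bad": ("negative", 4)}'

dictionary = literal_eval(text)
# Each value is a (polarity, score) tuple, unpacked into (suffix, pol)
result_all = {"{}_{}".format(word, suffix): pol for word, (suffix, pol) in dictionary.items()}
print(result_all)
# {'acceptable_positive': 1, 'good_positive': 1, 'shame_negative': 2, 'bad_negative': 4}
```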
After the modification my code looks like this. I didn't use a with statement, and it works well.
from ast import literal_eval
f_all_dict = open('all_dict.txt', 'r', encoding='utf-8').read()
f = literal_eval(f_all_dict)
# tokens: the list of words from the review (assumed to be defined earlier)
# A single comprehension builds the whole dict, so no outer for loop is needed
result_all = {"{}_{}".format(word, suffix): pol * tokens.count(word) for word, (suffix, pol) in f.items()}
print(result_all)
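To see what this follow-up computes (each score multiplied by how often the word occurs in the review), here is a self-contained sketch. The review text and the plain whitespace split are hypothetical stand-ins for the real review and its tokens, which would presumably come from NLTK:

```python
from ast import literal_eval

text = '{"acceptable": ("positive", 1), "good": ("positive", 1), "shame": ("negative", 2), "bad": ("negative", 4)}'
f = literal_eval(text)

# Hypothetical review, tokenized with str.split() instead of an NLTK tokenizer
review = "the food was good but the service was bad , really bad"
tokens = review.lower().split()

# Weight each polarity score by the word's count in the review
result_all = {"{}_{}".format(word, suffix): pol * tokens.count(word)
              for word, (suffix, pol) in f.items()}
print(result_all)
# {'acceptable_positive': 0, 'good_positive': 1, 'shame_negative': 0, 'bad_negative': 8}
```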
