How to get dictionary values in Python - python-3.x

I'm working with Python dictionaries and NLTK on some reviews. I have an input (txt) file which is a simple review. In a dictionary file, all_dict.txt, I have all words (negative and positive) with their polarities and values.
all_dict.txt looks like this:
"acceptable":("positive",1),"good":("positive",1),"shame":("negative",2),"bad":("negative",4),...
I want to know how I can get the polarity and the number value for each word from the dictionary, so that I can get an output like this:
"acceptable_positive":1,"good_positive":1,"shame_negative":2,"bad_negative":4
I tried dict.get() and dict.values(), but I don't get what I want. Is there a method to fetch keys and values automatically?
I tried with my code:
f_all_dict=open('all_dict.txt','r',encoding='utf-8').read()
f = eval(f_all_dict)
result_all = {}
for word in f.items():
    suffix, pol=result_all[word] #pol->polarity
    result_all[word + "_" + suffix] = pol
But I get KeyError if the word doesn't exist in an input file (review).
Thank you for your help

First off, dict.items() returns a view object containing (key, value) tuples, so word in your loop is a tuple. When you use that tuple as a key to look up result_all, which is still empty, it raises a KeyError:
suffix, pol=result_all[word]
Secondly, you'd better use a with statement when dealing with external resources like files, and use ast.literal_eval() instead of eval() to evaluate your dictionary. You can also reach the items of each value by unpacking the tuple within a dict comprehension.
from ast import literal_eval
with open('all_dict.txt','r',encoding='utf-8') as f_all_dict:
    dictionary = literal_eval(f_all_dict.read().strip())
result_all = {"{}_{}".format(word, suffix): pol for word, (suffix, pol) in dictionary.items()}
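To try the comprehension without a file, here is a minimal sketch that uses an in-memory string as a stand-in for all_dict.txt. It assumes the file holds a complete dict literal (including the surrounding braces); the variable name raw is only illustrative.

```python
from ast import literal_eval

# Stand-in for the contents of all_dict.txt (assumed to be a full dict literal).
raw = '{"acceptable":("positive",1),"good":("positive",1),"shame":("negative",2),"bad":("negative",4)}'

# literal_eval only accepts Python literals, so it cannot run arbitrary code.
dictionary = literal_eval(raw)

# Each value is a (polarity, number) tuple, unpacked directly in the loop target.
result_all = {
    "{}_{}".format(word, suffix): pol
    for word, (suffix, pol) in dictionary.items()
}
print(result_all)
```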

After modification my code looks like this. I didn't use the with statement and it is working well.
f_all_dict=open('all_dict.txt','r',encoding='utf-8').read()
f = literal_eval(f_all_dict)
result_all = {}
for word in f.items():
    result_all = {"{}_{}".format(word, suffix): pol * tokens.count(word) for word, (suffix, pol) in f.items()}
print(result_all)
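Note that the comprehension builds the whole dictionary on its own, so the for loop wrapped around it in the snippet above only rebuilds the same result on every pass. A sketch of the weighted version without the loop, assuming a hypothetical tokens list (in the question it would come from tokenizing the review) and skipping words that do not occur in it:

```python
from ast import literal_eval

# Stand-in for all_dict.txt; assumed to be a full dict literal.
raw = '{"good":("positive",1),"bad":("negative",4),"shame":("negative",2)}'
polarities = literal_eval(raw)

# Hypothetical tokenized review.
tokens = ["good", "film", "bad", "ending", "bad", "sound"]

# Multiply each word's value by how often the word occurs in the review.
result_all = {
    "{}_{}".format(word, suffix): pol * tokens.count(word)
    for word, (suffix, pol) in polarities.items()
    if word in tokens  # leave out dictionary words that are not in the review
}
print(result_all)
```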

Related

Iterating thru a not so ordinary Dictionary in python 3.x

Maybe this is an ordinary issue regarding iterating through a dict. Please find below the imovel.txt file, whose content is as follows:
{'Andar': ['primeiro', 'segundo', 'terceiro'], 'Apto': ['101','201','301']}
As you can see, this is not an ordinary dictionary of key/value pairs, but one key holding a list that I want to use as keys and another key holding a list that I want to use as values.
My code is:
#!/usr/bin/python
def load_dict_from_file():
    f = open('../txt/imovel.txt','r')
    data=f.read()
    f.close()
    return eval(data)
thisdict = load_dict_from_file()
for key,value in thisdict.items():
    print(value)
and it yields:
['primeiro', 'segundo', 'terceiro']
['101', '201', '301']
I would like to print key/value pairs like
{'primeiro':'101', 'segundo':'201', 'terceiro':'301'}
Given a txt file like the one above, is it possible?
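You should parse the file with a real parser rather than eval: the built-in json module if the file is valid JSON (which requires double-quoted strings), or ast.literal_eval for a Python literal like the one shown. Either way, you'll still have the same structure.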
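A small sketch of the parsing step, using the file content from the question as an in-memory string. The names data and parsed_as_json are illustrative, not part of the original code.

```python
import json
from ast import literal_eval

# Content of imovel.txt as shown in the question (single-quoted strings).
data = "{'Andar': ['primeiro', 'segundo', 'terceiro'], 'Apto': ['101','201','301']}"

try:
    json.loads(data)
    parsed_as_json = True
except json.JSONDecodeError:
    # JSON only allows double-quoted strings, so this text is rejected.
    parsed_as_json = False

# literal_eval accepts Python literals without executing arbitrary code.
thisdict = literal_eval(data)
print(parsed_as_json, thisdict)
```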
There are a few things you can do.
If you know both of the base key names ('Andar' and 'Apto'), you can do it with a one-line dict comprehension by zipping the values together.
# what you'll get from the file
thisdict = {'Andar': ['primeiro', 'segundo', 'terceiro'], 'Apto': ['101','201','301']}
# One line dict comprehension
newdict = {key: value for key, value in zip(thisdict['Andar'], thisdict['Apto'])}
print(newdict)
If you don't know the names of the keys, you could call next on an iterator, assuming they're the first two lists in your structure (dicts keep insertion order in Python 3.7 and later).
# what you'll get from the file
thisdict = {'Andar': ['primeiro', 'segundo', 'terceiro'], 'Apto': ['101','201','301']}
# create an iterator of the values since the keys are meaningless here
iterator = iter(thisdict.values())
# the first group of values are the keys
keys = next(iterator, None)
# and the second are the values
values = next(iterator, None)
# zip them together and have dict do the work for you
newdict = dict(zip(keys, values))
print(newdict)
As other folks have noted, that looks like JSON, and it'd probably be easier to parse it and read through it as such. But if that's not an option for some reason, you can loop through your dictionary this way, provided the lists at each key are all the same length:
for i, res in enumerate(thisdict[list(thisdict)[0]]):
    ith_values = [elem[i] for elem in thisdict.values()]
    print(ith_values)
If they're all different lengths, then you'll need to put some logic to check for that and print a blank or do some error handling for looking past the end of the list.
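One way to handle lists of different lengths is itertools.zip_longest, which pads the shorter lists with a fill value instead of raising an error. This is only a sketch; the shortened 'Apto' list is made-up data for illustration.

```python
from itertools import zip_longest

# Hypothetical data: 'Apto' has one entry fewer than 'Andar'.
thisdict = {'Andar': ['primeiro', 'segundo', 'terceiro'], 'Apto': ['101', '201']}

# zip_longest keeps going until the longest list is exhausted,
# filling the gaps with the given fillvalue.
rows = list(zip_longest(*thisdict.values(), fillvalue=''))
for row in rows:
    print(row)
```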

Assigning specific dictionary values to variables

I have a series of dictionaries which each contain the same keys but with different values, i.e. age in dictionary 1 = 2, age in dictionary 2 = 4, etc.; they are broadly identical in structure.
What I would like to do is randomly select one of these dictionaries and then assign specific values from that dictionary to variables, i.e. Python randomly chooses dictionary 1 and I then want to fill the dictAge variable with the age value from dictionary 1.
import random
dictList = ['myDict', 'otherDict']
mydict = {
    'age' : 10,
    'other': "dummy data"
}
.
.
.
randomDict = random.choice(dictList)
dictAge = randomDict['age']
print(dictAge)
In the case of the code above what should happen is:
randomDict is assigned a random value from the dictList variable (at the top). This sets which dictionary's values will be used going forward.
I next want the dictAge variable to be assigned the age value from the selected dictionary. In this case (as mydict was the only dictionary available) it should be assigned the age value of 10.
The error I am getting is:
TypeError: string indices must be integers
I know this is such a common error but my brain can't quite work out what the best solution is.
(Disclaimer: I haven't used python in ages so I know I am doing something really obviously silly but I can't quite work out what to do).
Right now, you are not actually using the definition of your dicts.
This is because dictList is comprised of strings: ['myDict', 'otherDict'].
So, when doing randomDict = random.choice(dictList), randomDict will either be the string 'myDict', or the string 'otherDict'.
Then you are doing randomDict['age'], which means you are trying to index a string with a string. As the error says, this can't be done: string indices can only be ints.
What you want to do is move the definition of dictList to be after the definitions of your dicts, and include references to the dicts themselves, not strings. Something like:
myDict = {
    'age' : 10,
    'other': "dummy data"
}
.
.
.
dictList = [myDict, otherDict]
In the following piece of code:
dictAge = randomDict['age']
You are trying to index the name of a dictionary variable (a string) returned by the random.choice function.
To make it work with names you would need to look the name up using locals:
locals()[randomDict]['age']
or rather correct the dictList to contain the dictionaries instead of their names:
dictList = [myDict, otherDict]
In the latter case please note that myDict and otherDict should be declared before dictList.
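Putting the corrected version together, a minimal runnable sketch looks like this. The contents of otherDict are made up for illustration, since the question elides them.

```python
import random

# Define the dictionaries first...
myDict = {'age': 10, 'other': "dummy data"}
otherDict = {'age': 20, 'other': "more dummy data"}  # hypothetical second dict

# ...then build the list from the dict objects themselves, not their names.
dictList = [myDict, otherDict]

randomDict = random.choice(dictList)  # one of the two dict objects
dictAge = randomDict['age']           # indexing a dict with a string key works
print(dictAge)
```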

List, tuples or dictionary, differences and usage, How can I store info in python

I'm very new to Python (I usually write PHP). I want to understand how to store information in an associative array, and it would be wonderful if you could explain the difference between "tuples", "arrays", "dictionaries" and "lists" (I tried to read different sources but I'm still not catching it).
So this is my code:
#!/usr/bin/python3.4
import csv
import string
nidless_keys = dict()
nidless_keys = ['test_string1','test_string2'] #this contain the string to
# be searched in linesreader
data = {'type':[],'id':[]} #here I want to store my information
with open('path/to/csv/file.csv',newline="") as csvfile:
    linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
    for row in linesreader: #every line in this csv have a url like
        #www.test.com/?test_string1&id=123456
        current_row_string = str(row)
        for needle in nidless_keys:
            current_needle = str(needle)
            if current_needle in current_row_string:
                data[current_needle[current_row_string[-8:]]) += 1 # also I
                #need to count per every id how much rows there are.
In conclusion:
my_data_stored = [current_needle][current_row_string[-8]]
current_row_string contains a URL, and the last 8 characters of the URL are an ID.
So the array should look like this at the end of the script:
test_string1 = 123456 = 20
= 256468 = 15
test_string2 = 123155 = 10
Edit 1:
Which type do I need here to store the information?
Can you tell me how to fix this script?
It seems you want to count how many times an ID in combination with a test string occurs.
There can be multiple ID/count combinations associated with every test string.
This suggests that you should use a dictionary indexed by the test strings to store the results. In that dictionary I would suggest to store collections.Counter objects.
With a plain dictionary, you would have to add a special case that inserts an empty Counter whenever a key isn't found in the results dictionary. This is a common problem, so there is a specialized form of dictionary in the collections module called defaultdict.
import collections
import csv
# Using a tuple for the keys so it cannot be accidentally modified
keys = ('test_string1', 'test_string2')
result = collections.defaultdict(collections.Counter)
with open('path/to/csv/file.csv',newline="") as csvfile:
    linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
    for row in linesreader:
        line = ','.join(row) # csv.reader yields a list of fields; rejoin it into one string
        for key in keys:
            if key in line:
                id = line[-6:] # ID's are six digits in your example.
                # The first index is into the dict, the second into the Counter.
                result[key][id] += 1
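To make the "special case" concrete, here is a small comparison of a plain dict and a defaultdict doing the same counting. The pairs list is made-up sample data standing in for the (test string, ID) combinations found in the file.

```python
import collections

# Hypothetical (test string, ID) pairs extracted from the rows.
pairs = [
    ('test_string1', '123456'),
    ('test_string1', '123456'),
    ('test_string2', '123155'),
]

# Plain dict: the missing-key case has to be handled by hand.
manual = {}
for key, row_id in pairs:
    if key not in manual:
        manual[key] = collections.Counter()
    manual[key][row_id] += 1

# defaultdict: a new Counter is created automatically on first access.
auto = collections.defaultdict(collections.Counter)
for key, row_id in pairs:
    auto[key][row_id] += 1

print(manual == dict(auto))
```

Both loops produce the same counts; the defaultdict version just removes the membership check.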
There is an even easier way, by using regular expressions.
Since you seem to treat every row in a CSV file as a string, there is little need to use the CSV reader, so I'll just read the whole file as text.
import re
with open('path/to/csv/file.csv') as datafile:
    text = datafile.read()
pattern = r'\?(.*)&id=(\d+)'
The pattern is a regular expression. This is a large topic in and of itself, so I'll only briefly cover what it does. (You might also want to check out the relevant HOWTO.) At first glance it looks like complete gibberish, but it is actually a complete language.
It looks for two things in a line: anything between ? and &id=, and a sequence of digits after &id=.
I'll be using IPython to give an example.
(If you don't know it, check out IPython. It is great for trying things out to see if they work.)
In [1]: import re
In [2]: pattern = r'\?(.*)&id=(\d+)'
In [3]: text = """www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=234567
....: www.test.com/?foo&id=234567
....: www.test.com/?foo&id=123456
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234"""
The text variable points to the string which is a mock-up for the contents of your CSV file.
I am assuming that:
every URL is on its own line
ID's are a sequence of digits.
If these assumptions are wrong, this won't work.
Use findall to extract every match of the pattern from the text.
In [4]: re.findall(pattern, text)
Out[4]:
[('test_string1', '123456'),
('test_string1', '123456'),
('test_string1', '234567'),
('foo', '234567'),
('foo', '123456'),
('foo', '1234'),
('foo', '1234'),
('foo', '1234')]
The findall function returns a list of 2-tuples (that is key, ID pairs). Now we just need to count those.
In [5]: import collections
In [6]: result = collections.defaultdict(collections.Counter)
In [7]: intermediate = re.findall(pattern, text)
Now we fill the result dict from the list of matches that is the intermediate result.
In [8]: for key, id in intermediate:
....:     result[key][id] += 1
....:
In [9]: print(result)
defaultdict(<class 'collections.Counter'>, {'foo': Counter({'1234': 3, '123456': 1, '234567': 1}), 'test_string1': Counter({'123456': 2, '234567': 1})})
So the complete code would be:
import collections
import re
with open('path/to/csv/file.csv') as datafile:
    text = datafile.read()
result = collections.defaultdict(collections.Counter)
pattern = r'\?(.*)&id=(\d+)'
intermediate = re.findall(pattern, text)
for key, id in intermediate:
    result[key][id] += 1
This approach has two advantages.
You don't have to know the keys in advance.
ID's are not limited to six digits.
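Once the result is filled, the nested structure can be read back in the "key, ID, count" layout the question asks for. The sketch below uses a shortened in-memory sample instead of the file; the sample lines and the name id_ are illustrative.

```python
import collections
import re

# Shortened stand-in for the file contents.
text = """www.test.com/?test_string1&id=123456
www.test.com/?test_string1&id=123456
www.test.com/?foo&id=1234"""

pattern = r'\?(.*)&id=(\d+)'
result = collections.defaultdict(collections.Counter)
for key, id_ in re.findall(pattern, text):
    result[key][id_] += 1

# Outer loop: one entry per test string; inner loop: IDs, most frequent first.
for key, counter in result.items():
    for id_, count in counter.most_common():
        print(key, id_, count)
```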
A brief summary of the python data types you mentioned:
A dictionary is an associative array, aka hashtable.
A list is a sequence of values.
An array (array.array) is essentially the same as a list, but limited to basic data types. My impression is that arrays only exist for performance reasons; I don't think I've ever used one. If performance is that critical to you, you probably don't want to use Python in the first place.
A tuple is an immutable sequence of values: its length and items are fixed once it is created (whereas lists and arrays can grow).
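The four types in the summary above can be compared side by side in a short sketch. All variable names here are illustrative.

```python
from array import array

# Dictionary: maps keys to values, like a PHP associative array.
d = {'potato': 3}
d['tomato'] = 5

# List: ordered, can grow, may mix types.
lst = [1, 'a', True]
lst.append(2.5)

# array.array: like a list, but every item must have the same basic type.
arr = array('i', [1, 2, 3])  # 'i' means signed int
arr.append(4)
try:
    arr.append('x')
    array_accepts_str = True
except TypeError:
    array_accepts_str = False

# Tuple: ordered like a list, but cannot be changed after creation.
tup = (1, 'a', True)
try:
    tup[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(d, lst, arr, tup)
```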
Let's take them one by one.
Lists:
A list is a very basic kind of data structure, written much like arrays in other languages:
['a','b','c']
This is a list in Python, and it looks very similar to an array.
However, there is a big difference between the way lists are used in Python and the usual arrays.
Lists are heterogeneous in nature. This means that we can store different kinds of data inside one list at the same time, like:
ls = [1,2,'a','g',True]
As you can see, we have various kinds of data within one list, and it is still a valid list.
One important thing about lists is that we can access their items using zero-based indices. So we can write:
print(ls[0], ls[3])
output: 1 g
Dictionary:
This data structure is similar to a hash map. It contains (key, value) pairs. An empty dictionary looks like:
dc = {}
Now, to store key/value pairs, e.g. ('potato', 3) and ('tomato', 5), we can write:
dc['potato'] = 3
dc['tomato'] = 5
and the data is saved in the dictionary dc.
The important thing is that we can even store another data structure, like a list, as a value within a dictionary:
dc['list1'] = ls
where ls is the list defined above.
This shows the power of dictionaries.
In your case, you have defined a dictionary like this:
data = {'type':[],'id':[]}
This means that your dictionary will consist of only two keys and each key corresponds to a list, which are empty for now.
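Talking a bit about your script, the expression:
current_row_string[-8:]
doesn't make sense if current_row_string is the URL itself: the IDs in your example are six digits long, so the slice should start at -6 instead of -8 to give you the id part of the current row. (Note that str(row) on a csv row gives the text of the whole list, which ends in '], so it is cleaner to work with the field itself.)
This part is the id and should be stored in a variable, say:
id = current_row_string[-6:]
Further steps can be performed as shown in the answer given by Roland.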

Indexing the list in python

record=['MAT', '90', '62', 'ENG', '92','88']
course='MAT'
Suppose I want to get the marks for MAT or ENG; what do I do? I only know how to find the index of the course, which is new[4:10].index(course). I don't know how to get the marks.
Try this:
i = record.index('MAT')
grades = record[i+1:i+3]
In this case i is the index/position of the 'MAT' or whichever course, and grades are the items in a slice comprising the two slots after the course name.
You could also put it in a function:
def get_grades(course):
    i = record.index(course)
    return record[i+1:i+3]
Then you can just pass in the course name and get back the grades.
>>> get_grades('ENG')
['92', '88']
>>> get_grades('MAT')
['90', '62']
>>>
Edit
If you want to get a string of the two grades together instead of a list with the individual values you can modify the function as follows:
def get_grades(course):
    i = record.index(course)
    return ' '.join("'{}'".format(g) for g in record[i+1:i+3])
You can use the index function (see https://stackoverflow.com/a/176921/) and then take the following indexes, but I think you should use a dictionary.
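A sketch of the dictionary approach, assuming (as in the example record) that every course name is followed by exactly two marks. The name marks is illustrative.

```python
record = ['MAT', '90', '62', 'ENG', '92', '88']

# Walk the list in steps of three: the course name, then its two marks.
marks = {record[i]: record[i + 1:i + 3] for i in range(0, len(record), 3)}

print(marks['MAT'])
print(marks['ENG'])
```

After the one-time conversion, each lookup is a plain key access instead of an index search plus a slice.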

How can I add to Python dictionary value using string keys

I want to add string dictionary keys like this:
x = "%s-%s-%s %s:%s:00"%(dt.year,dt.month,dt.day,dt.hour,dt.minute)
dict[x] +=a1
But it gives me an error like this:
KeyError: '2015-11-26 8:47:00'
If I try print type(x) it prints str.
But if I try this:
dict = {}
x = "abc"
dict[x] = 1
print dict
it prints this:
{'abc': 1}
I don't understand what is the difference.
The first error is that you named your dictionary dict. That name's
already being used; it's the name of the built-in dictionary type. Overwriting an
existing name like this is called "shadowing". Don't do it; it will mess
you up.
You're using +=. This implies that there's already a value associated
with the key, which can be incremented. If that key isn't in the dict
yet, you get a KeyError.
You probably want to set a default value of zero. This can be done in
various ways. The simplest is:
d[x] = d.get(x, 0) + a1
Also see the collections module in the standard library, which has a
defaultdict type.
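A minimal sketch of the defaultdict variant. The timestamp and the added amounts are made-up values; the key is built with the same format string as in the question.

```python
import collections
import datetime

# int() returns 0, so every missing key starts at zero.
totals = collections.defaultdict(int)

dt = datetime.datetime(2015, 11, 26, 8, 47)  # hypothetical timestamp
x = "%s-%s-%s %s:%s:00" % (dt.year, dt.month, dt.day, dt.hour, dt.minute)

totals[x] += 5  # no KeyError, even though x was not in the dict yet
totals[x] += 2
print(x, totals[x])
```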
