I have a long list of tuples, each holding a pair. It looks like this:
travels = [(passenger_1, destination_1), (passenger_2, destination_2), (passenger_1, destination_2), ...]
And so on. Passengers and destinations may repeat, and even the same passenger-destination tuple may repeat.
I want to build a dict with a comprehension that has each passenger as a key and that passenger's most frequent destination as the value.
My first try was this:
dictionary = {k:v for k,v in travels}
but each key overwrites the last. I was hoping to get multiple values for each key so that I could then count them per key. Then I tried this:
dictionary = {k:v for k,v in travels if k not in dictionary else dictionary[k].append(v)}
but I can't refer to dictionary inside its own definition. Any ideas on how I can get it done? It's important that it's done with a comprehension and not with loops.
This is how it can be done with a for loop:
result = dict()
for passenger, destination in travels:
    result.setdefault(passenger, list()).append(destination)
result is a single dictionary where the keys are passengers and the values are lists of destinations.
I doubt you can do the same with a single dictionary comprehension, since inside a comprehension you can only generate elements; you cannot go back and modify ones already generated.
EDIT.
If you want to (or have to) use a comprehension no matter what, you can do it like this (2 comprehensions and no explicit loops):
result = {
    passenger: [destination_
                for passenger_, destination_ in travels
                if passenger_ == passenger]
    for passenger, dummy_destination in travels}
This is a poor algorithm for what you want: it runs in O(n^2) time, while the first method runs in O(n).
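For the original goal (each passenger's most frequent destination), one possible sketch combines an O(n) grouping pass with collections.Counter. The sample travels data and the name favourite are made up for illustration. Ties go to the destination that was seen first, which is how Counter.most_common orders equal counts.

```python
from collections import Counter, defaultdict

# Hypothetical sample data, not from the question.
travels = [
    ("ann", "rome"), ("bob", "oslo"), ("ann", "paris"),
    ("ann", "rome"), ("bob", "oslo"), ("bob", "rome"),
]

# One pass over the list: count destinations per passenger.
counts = defaultdict(Counter)
for passenger, destination in travels:
    counts[passenger][destination] += 1

# A dict comprehension then picks the most common destination for each passenger.
favourite = {p: c.most_common(1)[0][0] for p, c in counts.items()}
print(favourite)  # {'ann': 'rome', 'bob': 'oslo'}
```

The counting step still uses a loop, but the final mapping is a comprehension and the whole thing stays linear in the length of travels.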
I have a dictionary which contains key-value pairs where the key is a string and the value is stored as a list.
I am looking to get the intersection of all the elements in the lists of each entry in the dictionary.
For instance, if I had a dictionary like this:
athletes = {"athlete_A" : [16,43,34,23], "athlete_B": [23,60,80,75]}
I would like to get the list [23]. I can find solutions on intersection of dictionaries, but I don't seem to find how to work with only the values of a dict.
You can use functools.reduce and set intersection:
from functools import reduce
reduce(set.intersection, map(set, athletes.values()))
# {23}
If there are duplicates within your lists and you want to catch all (e.g. if two 23s occur in each list), you can use Counter intersection instead:
from collections import Counter
[*reduce(Counter.__and__, map(Counter, athletes.values())).elements()]
# [23]
Create a set from the first athlete's list, then intersect it with the second athlete's list:
A_as_set = set(athletes['athlete_A'])
intersection = A_as_set.intersection(athletes['athlete_B'])
intersection_as_list = list(intersection)
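The two-athlete version can be extended to any number of entries. The following is a minimal sketch, assuming the dictionary is not empty (set.intersection raises a TypeError when called with no sets); athlete_C is a hypothetical extra entry added for illustration.

```python
athletes = {
    "athlete_A": [16, 43, 34, 23],
    "athlete_B": [23, 60, 80, 75],
    "athlete_C": [23, 16, 99],  # hypothetical third entry
}

# Turn every list into a set, then intersect them all in one call.
common = set.intersection(*map(set, athletes.values()))
common_as_list = sorted(common)
print(common_as_list)  # [23]
```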
With the following command I can print the balances of the assets in my Binance account.
Command:
USDT_BAL = client.futures_account_balance(asset='USDT')
Return:
[{'accountAlias': 'sRuXXqTioCfWFz', 'asset': 'BNB', 'balance': '0.00000142', 'withdrawAvailable': '0.00000142', 'updateTime': 1621516315044}, {'accountAlias': 'sRuXXqTioCfWFz', 'asset': 'USDT', 'balance': '0.00000000', 'withdrawAvailable': '0.00000000', 'updateTime': 0}, {'accountAlias': 'sRuXXqTioCfWFz', 'asset': 'BUSD', 'balance': '0.00000000', 'withdrawAvailable': '0.00000000', 'updateTime': 0}]
It returns the balances of other assets too, but I only need the balance of the USDT asset. How could I filter the USDT_BAL variable for it?
Expanding on my comment:
You have a list of dicts. List access is done by iteration (for loops) or by index: my_list[0], etc.
Dict access can also be done by iteration, but a big benefit is keyed access: my_dict['some_key'], etc.
Python has simplified ways to do common list and dict building commonly called "comprehensions".
So a list comprehension for something like:
my_list = []
for i in range(10):
    my_list.append(i)
Could be written as
my_list = [i for i in range(10)]
What I gave you isn't strictly a list comprehension, but it follows the same idea. It's called a "generator expression". The difference is that it produces its output lazily as you iterate over it, so its output as a whole isn't held in a built-in collection (list or dict).
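To make the difference concrete, here is a small sketch with made-up names (squares_list, squares_gen) comparing the two forms side by side:

```python
squares_list = [i * i for i in range(5)]   # built immediately, all values in memory
squares_gen = (i * i for i in range(5))    # nothing is computed yet

print(squares_list)        # [0, 1, 4, 9, 16]
print(next(squares_gen))   # 0
print(next(squares_gen))   # 1
print(list(squares_gen))   # [4, 9, 16]  (the remaining values)
print(list(squares_gen))   # []          (the generator is now exhausted)
```

A list can be read as often as you like; a generator hands out each value once.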
The reason it makes sense in this context is:
I need to iterate over the list to find dict with the correct 'asset' key.
I expect there is only one occurrence of this so I care only about the first occurrence.
So to break it down you have a generator expression:
(i['balance'] for i in USDT_BAL if i['asset'] == 'USDT')
Which is roughly equivalent to:
def my_gen():
    for i in USDT_BAL:
        if i['asset'] == 'USDT':
            yield i['balance']
Or if you're not familiar with generators and would like it as a list:
my_list = []
for i in USDT_BAL:
    if i['asset'] == 'USDT':
        my_list.append(i['balance'])
So now you can see we have a problem.
If we have it as a list comprehension it's in the form of a list with one element.
print(my_list) # ['0.00000000']
We could access it with my_list[0], but that looks ugly IMO. To each their own.
So that's where the next function comes in.
According to the docs, next calls the __next__ method on an iterator (which a generator is) and advances it by one item.
So if our generator were to produce 1, then 2, then 3, calling next on it would produce 1, calling it again would produce 2, and so on.
Since I expect this generator expression to only produce 1 item, I only call it once. Giving it a default of None means, if it's empty, rather than raising an error it will produce None.
So:
next((i['balance'] for i in USDT_BAL if i['asset'] == 'USDT'), None)
creates a generator that iterates over your list, produces only the 'balance' value of dicts whose 'asset' value equals 'USDT', and calls next on that generator with a default of None.
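Put together as a runnable sketch: the list below copies the shape of the return value from the question with the other fields left out, and balance_of is a hypothetical helper name, not part of any Binance client library.

```python
# Data in the shape shown in the question (only the fields used here).
USDT_BAL = [
    {'asset': 'BNB', 'balance': '0.00000142'},
    {'asset': 'USDT', 'balance': '0.00000000'},
    {'asset': 'BUSD', 'balance': '0.00000000'},
]

def balance_of(balances, asset):
    """Return the balance string of the first entry for `asset`, or None if absent."""
    return next((b['balance'] for b in balances if b['asset'] == asset), None)

print(balance_of(USDT_BAL, 'USDT'))  # 0.00000000
print(balance_of(USDT_BAL, 'BNB'))   # 0.00000142
print(balance_of(USDT_BAL, 'ETH'))   # None
```

The balances are strings, so convert with float() (or decimal.Decimal) before doing arithmetic on them.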
I have a mylist = [[a,b,c,d],...[]] with 650 lists inside. I am trying to insert this into a relational database with dictionaries. I have the following code:
for i in mylist:
    if len(i) == 4:
        cve_ent = {'state':[], 'muni':[], 'area':[]}
        cve_ent['state'].append(i[1])
        cve_ent['muni'].append(i[2])
        cve_ent['area'].append(i[3])
However this code just yields the last list in mylist in the dictionary. I have tried also with a counter and a while loop but I cannot make it run.
I do not know if this is the fastest way to store the data, what I will do is compare the values of the first and second keys with other tables to multiply the values of the third key.
First of all, pull
cve_ent = {'state':[], 'muni':[], 'area':[]}
out of your for loop. As written, the dictionary is recreated on every iteration, which is why only the last list survives.
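A minimal sketch of the corrected loop, using made-up rows in the [a, b, c, d] shape described in the question:

```python
# Hypothetical rows; the real mylist has 650 of them.
mylist = [
    ['id1', 'state_1', 'muni_1', 10.0],
    ['id2', 'state_1', 'muni_2', 20.0],
    ['bad', 'row'],                      # skipped: does not have 4 items
    ['id3', 'state_2', 'muni_1', 30.0],
]

# Create the dictionary once, before the loop, so it accumulates across rows.
cve_ent = {'state': [], 'muni': [], 'area': []}
for row in mylist:
    if len(row) == 4:
        cve_ent['state'].append(row[1])
        cve_ent['muni'].append(row[2])
        cve_ent['area'].append(row[3])

print(cve_ent['state'])  # ['state_1', 'state_1', 'state_2']
print(cve_ent['area'])   # [10.0, 20.0, 30.0]
```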
This is currently my code.
if Pokémon == 'Charmander':
    selectyourmove = input('Select your move: Flamethrower, Fire Fang, Scratch or Ember: ')  # select your move
    if selectyourmove == 'Flamethrower':
        numberchoosing1 = random.randint(20, 22)  # randomly decides damage of the chosen move in the range
        print(choice, 'has lost', numberchoosing1, 'health out of its', HP, 'health!')
My dictionary is quite simple. It is:
HP = {'Char':'60', 'Squir':'50', 'Pika':'80', 'Eve':'50', 'Bulb':'70', 'Clef':'100'}
Also all these have been defined.
How do I get a value that was randomly chosen from a dictionary?
The 1st way is to use dict.popitem:
Remove and return an arbitrary (key, value) pair from the dictionary.
popitem() is useful to destructively iterate over a dictionary, as often used in set algorithms. If the dictionary is empty, calling popitem() raises a KeyError.
Note that any apparent randomness of this method comes from the hashing algorithm and the internal layout of dict elements, which is obscurity rather than randomness. Since Python 3.7, popitem() is guaranteed to return pairs in LIFO order, so it is not random at all.
The 2nd way, the truly 'random' one, is to use random.choice. It doesn't modify the dict, since it only picks a random index in the list supplied to it:
import random
HP = {'Char':'60', 'Squir':'50', 'Pika':'80', 'Eve':'50', 'Bulb':'70', 'Clef':'100'}
print(random.choice(list(HP.keys())))
Illustration of working principle:
>>> random.choice(list(HP.keys()))
'Pika'
>>> random.choice(list(HP.keys()))
'Clef'
>>> random.choice(list(HP.keys()))
'Pika'
The list is constructed here from .keys(), but when you need pairs (like from popitem()) you could use .items():
>>> random.choice(list(HP.items()))
('Clef', '100')
>>> random.choice(list(HP.items()))
('Pika', '80')
>>> random.choice(list(HP.items()))
('Char', '60')
In the same way, .values() works too, but it yields only the right-hand elements of the dict items, so you lose track of which key each value belongs to. That makes it less useful here than .keys() or .items().
PS: If you need to reproduce a previous run, you can fix the 'randomness' with random.seed.
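As a sketch of the random.seed remark: seeding with the same value before drawing gives the same sequence of picks. pick_names is a hypothetical helper introduced only for this illustration.

```python
import random

HP = {'Char': '60', 'Squir': '50', 'Pika': '80', 'Eve': '50', 'Bulb': '70', 'Clef': '100'}

def pick_names(seed, n=3):
    """Return n random keys of HP, drawn right after seeding the generator."""
    random.seed(seed)
    return [random.choice(list(HP)) for _ in range(n)]

first_run = pick_names(42)
second_run = pick_names(42)
print(first_run == second_run)  # True: the same seed gives the same picks
```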
That depends on what you mean by "value". A dictionary is a set of key,value pairs, so technically the values of your dictionary are just the strings '50', '60', '50', '100', '70', '80', and the keys are the strings 'Eve', 'Char', 'Squir', 'Clef', 'Bulb', 'Pika'.
You can get these collections with HP.keys() and HP.values(), and you can use list() to convert them to lists. Then you can use random.choice to pick a random element.
So to get a random key from your dictionary (which it seems like is what you actually want), you could do:
import random
keys = HP.keys()
key_list = list(keys)
choice = random.choice(key_list)
Or, more concisely:
import random
choice = random.choice(list(HP.keys()))
Then you can get the associated value for that key with HP[choice]
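Tying this back to the print statement in the question, a possible sketch looks like the following. The names name, health and damage are illustrative, and the int() conversion is an assumption about how the HP strings are meant to be used.

```python
import random

HP = {'Char': '60', 'Squir': '50', 'Pika': '80', 'Eve': '50', 'Bulb': '70', 'Clef': '100'}

name = random.choice(list(HP))    # iterating a dict yields its keys
health = int(HP[name])            # values are strings in the question, so convert before arithmetic
damage = random.randint(20, 22)   # same damage range as in the question

print(name, 'has lost', damage, 'health out of its', health, 'health!')
```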
I'm very new to Python (I usually write PHP). I want to understand how to store information in an associative array, and it would be wonderful if you could explain the difference between "tuples", "arrays", "dictionary" and "list" (I have tried reading different sources but I'm still not catching it).
This is my code:
#!/usr/bin/python3.4
import csv
import string
nidless_keys = dict()
nidless_keys = ['test_string1','test_string2']  # this contains the strings to be searched in linesreader
data = {'type':[],'id':[]}  # here I want to store my information
with open('path/to/csv/file.csv',newline="") as csvfile:
    linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
    for row in linesreader:  # every line in this csv has a url like www.test.com/?test_string1&id=123456
        current_row_string = str(row)
        for needle in nidless_keys:
            current_needle = str(needle)
            if current_needle in current_row_string:
                data[current_needle[current_row_string[-8:]]) += 1  # also I need to count, for every id, how many rows there are
In conclusion:
my_data_stored = [current_needle][current_row_string[-8]]
current_row_string is a URL whose last 8 digits are an ID.
So the array should look like this at the end of the script:
test_string1 = 123456 = 20
= 256468 = 15
test_string2 = 123155 = 10
Edit 1:
Which type do I need here to store the information?
Can you tell me how to resolve this script?
It seems you want to count how many times an ID in combination with a test string occurs.
There can be multiple ID/count combinations associated with every test string.
This suggests that you should use a dictionary indexed by the test strings to store the results. In that dictionary I would suggest to store collections.Counter objects.
This way, you would have to add a special case when a key in the results dictionary isn't found to add an empty Counter. This is a common problem, so there is a specialized form of dictionary in the collections module called defaultdict.
import collections
import csv
# Using a tuple for the keys so it cannot be accidentally modified
keys = ('test_string1', 'test_string2')
result = collections.defaultdict(collections.Counter)
with open('path/to/csv/file.csv', newline="") as csvfile:
    linesreader = csv.reader(csvfile, delimiter=',', quotechar="|")
    for row in linesreader:
        # csv.reader yields each row as a list of fields; join them back into one string.
        line = ','.join(row)
        for key in keys:
            if key in line:
                id = line[-6:]  # IDs are six digits in your example.
                # The first index is into the dict, the second into the Counter.
                result[key][id] += 1
There is an even easier way, by using regular expressions.
Since you seem to treat every row in a CSV file as a string, there is little need to use the CSV reader, so I'll just read the whole file as text.
import re
with open('path/to/csv/file.csv') as datafile:
    text = datafile.read()
pattern = r'\?(.*)&id=(\d+)'
The pattern is a regular expression. This is a large topic in and of itself, so I'll only cover briefly what it does. (You might also want to check out the relevant HOWTO) At first glance it looks like complete gibberish, but it is actually a complete language.
It looks for two things in a line: anything between ? and &id=, and a sequence of digits after &id=.
I'll be using IPython to give an example.
(If you don't know it, check out IPython. It is great for trying things and see if they work.)
In [1]: import re
In [2]: pattern = r'\?(.*)&id=(\d+)'
In [3]: text = """www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=234567
....: www.test.com/?foo&id=234567
....: www.test.com/?foo&id=123456
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234"""
The text variable points to the string which is a mock-up for the contents of your CSV file.
I am assuming that:
every URL is on its own line
ID's are a sequence of digits.
If these assumptions are wrong, this won't work.
Using findall to extract every match of the pattern from the text.
In [4]: re.findall(pattern, text)
Out[4]:
[('test_string1', '123456'),
('test_string1', '123456'),
('test_string1', '234567'),
('foo', '234567'),
('foo', '123456'),
('foo', '1234'),
('foo', '1234'),
('foo', '1234')]
The findall function returns a list of 2-tuples (that is key, ID pairs). Now we just need to count those.
In [5]: import collections
In [6]: result = collections.defaultdict(collections.Counter)
In [7]: intermediate = re.findall(pattern, text)
Now we fill the result dict from the list of matches that is the intermediate result.
In [8]: for key, id in intermediate:
....:     result[key][id] += 1
....:
In [9]: print(result)
defaultdict(<class 'collections.Counter'>, {'foo': Counter({'1234': 3, '123456': 1, '234567': 1}), 'test_string1': Counter({'123456': 2, '234567': 1})})
So the complete code would be:
import collections
import re
with open('path/to/csv/file.csv') as datafile:
    text = datafile.read()
result = collections.defaultdict(collections.Counter)
pattern = r'\?(.*)&id=(\d+)'
intermediate = re.findall(pattern, text)
for key, id in intermediate:
    result[key][id] += 1
This approach has two advantages.
You don't have to know the keys in advance.
ID's are not limited to six digits.
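Once the counts are collected, the nested structure can be read back in roughly the layout asked for in the question. This is a sketch on a small mock-up string (the three URLs are invented, and id_ is used as a name only to avoid shadowing the built-in id):

```python
import collections
import re

# Mock-up of the file contents; in the answer this comes from datafile.read().
text = """www.test.com/?test_string1&id=123456
www.test.com/?test_string1&id=123456
www.test.com/?test_string2&id=123155"""

result = collections.defaultdict(collections.Counter)
for key, id_ in re.findall(r'\?(.*)&id=(\d+)', text):
    result[key][id_] += 1

# Walk the nested structure: outer dict by test string, inner Counter by ID.
for key, counter in result.items():
    for id_, count in counter.items():
        print(key, '=', id_, '=', count)
# test_string1 = 123456 = 2
# test_string2 = 123155 = 1

print(result['test_string1']['123456'])  # 2
print(result['test_string1']['000000'])  # 0 (a Counter returns 0 for missing keys)
```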
A brief summary of the python data types you mentioned:
A dictionary is an associative array, aka hashtable.
A list is a sequence of values.
An array is essentially the same as a list, but limited to basic data types. My impression is that arrays only exist for performance reasons; I don't think I've ever used one. If performance is that critical to you, you probably don't want to use Python in the first place.
A tuple is a fixed-length sequence of values (whereas lists and arrays can grow).
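A short sketch showing the four types next to each other; all names and values are made up, and the array here comes from the standard library's array module.

```python
from array import array

ages = {'alice': 30, 'bob': 25}     # dictionary: key -> value lookup
ages['carol'] = 41                  # add or overwrite by key

names = ['alice', 'bob']            # list: ordered, can grow, any element type
names.append('carol')

point = (3, 4)                      # tuple: fixed length, cannot be modified
x, y = point                        # unpacking is a common way to read one

numbers = array('i', [1, 2, 3])     # array: like a list, but every item is a C int here
numbers.append(4)

print(ages['carol'], names, x + y, numbers.tolist())
# 41 ['alice', 'bob', 'carol'] 7 [1, 2, 3, 4]
```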
Let's take them one by one.
Lists:
A list is a very basic data structure. It is written much like an array in other languages:
['a','b','c']
This is a list in Python, and it looks very similar to an array.
However, there is a big difference between how lists are used in Python and how arrays usually work.
Lists are heterogeneous. This means that we can store different kinds of data in the same list:
ls = [1,2,'a','g',True]
As you can see, this list holds several kinds of data and is still a valid list.
Another important point is that list items are accessed with zero-based indices. So we can write:
print(ls[0], ls[3])
output: 1 g
Dictionary:
This data structure is similar to a hash map. It stores (key, value) pairs. An empty dictionary looks like:
dc = {}
Now, to store key-value pairs such as ('potato', 3) and ('tomato', 5), we can write:
dc['potato'] = 3
dc['tomato'] = 5
and we saved the data in the dictionary dc.
The important thing is that we can even store another data structure, such as a list, inside a dictionary:
dc['list1'] = ls
where ls is the list defined above.
This shows the power of dictionaries.
In your case, you have defined a dictionary like this:
data = {'type':[],'id':[]}
This means that your dictionary will consist of only two keys and each key corresponds to a list, which are empty for now.
Regarding your script, the expression:
current_row_string[-8:]
doesn't make sense. The index should be -6 instead of -8, which gives you the ID part of the current row.
That part is the ID and should be stored in a variable, say:
id = current_row_string[-6:]
Further steps can be performed as shown in the answer given by Roland.
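A quick way to see what each index expression returns, assuming the row is a plain URL string like the one in the question's comment (in the original script, str(row) on a csv row would also include list brackets and quotes):

```python
current_row_string = 'www.test.com/?test_string1&id=123456'

print(current_row_string[-6:])   # 123456   (the last six characters)
print(current_row_string[-8:])   # d=123456 (two characters too many)
print(current_row_string[-8])    # d        (a single character, not a slice)

# Splitting on the marker avoids assuming a fixed ID length.
print(current_row_string.rsplit('&id=', 1)[-1])  # 123456
```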