How can I condense this function into returning a value from a list comprehension? - python-3.x

I understand it may not be best practices or conventional but this is more of a personal challenge.
def initialize_dataset(source):
    all_features = []
    targets = []
    for (sent, label) in source:
        feature_list = []
        feature_list.append(avg_number_chars(sent))
        feature_list.append(number_words(sent))
        all_features.append(feature_list)
        targets.append(0) if label == "austen" else targets.append(1)
    return all_features, targets
Here is an example of what I'm looking for. I understand that it might not be possible to get it down to a single list and/or value, but I'd like something close to it. I'd like to expand my thinking on writing list comprehensions.
def sample_function(data):
    return [i for i in data]

Success! My gaaaaaaaaaaaaaaawd it's ugly! 🤣
def initialize_dataset(source):
    all_features, targets = [],[]; [(all_features.append([avg_number_chars(sent), number_words(sent)]), targets.append(0) if label == "austen" else targets.append(1)) for (sent, label) in source]; return all_features, targets
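A comprehension used only for its append side effects builds a throwaway list of None values. A minimal sketch of a side-effect-free variant follows, using one comprehension per returned list. The bodies of avg_number_chars and number_words are not shown in the question, so the versions here are hypothetical stand-ins for illustration only.

```python
# Hypothetical stand-ins: the question does not show these helpers.
def avg_number_chars(sent):
    words = sent.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def number_words(sent):
    return len(sent.split())


def initialize_dataset(source):
    # Materialize once so a generator argument can be traversed twice.
    source = list(source)
    all_features = [[avg_number_chars(sent), number_words(sent)] for sent, _ in source]
    targets = [0 if label == "austen" else 1 for _, label in source]
    return all_features, targets
```

Each comprehension now produces the value that is actually returned, which is usually easier to read than packing both appends into one expression.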

Related

Write a recursive function to list all paths of parts.txt

Write a function list_files_recursive that returns a list of the paths of all the parts.txt files without using the os module's walk generator. Instead, the function should use recursion. The input will be a directory name.
Here is the code I have so far and I think it's basically right, but what's happening is that the output is not one whole list?
def list_files_recursive(top_dir):
    rec_list_files = []
    list_dir = os.listdir(top_dir)
    for item in list_dir:
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            list_files_recursive(item_path)
        else:
            if os.path.basename(item_path) == 'parts.txt':
                rec_list_files.append(os.path.join(item_path))
    print(rec_list_files)
    return rec_list_files
This is part of the output I'm getting (from the print statement):
['CarItems/Honda/Accord/1996/parts.txt']
[]
['CarItems/Honda/Odyssey/2000/parts.txt']
['CarItems/Honda/Odyssey/2002/parts.txt']
[]
So the problem is that it's not one list and that there are empty lists in there. I don't quite know why this isn't working and have tried everything to work through it. Any help is much appreciated!
This is very close, but the issue is that list_files_recursive's child calls don't pass results back to the parent. One way to do this is to concatenate all of the lists together from each child call, or to pass a reference to a single list all the way through the call chain.
Note that in rec_list_files.append(os.path.join(item_path)), there's no point in os.path.join with only a single parameter. print(rec_list_files) should be omitted as a side effect that makes the output confusing to interpret--only print in the caller. Additionally,
else:
    if ... :
can be more clearly written here as elif: since they're logically equivalent. It's always a good idea to reduce nesting of conditionals whenever possible.
Here's the approach that works by extending the parent list:
import os
def list_files_recursive(top_dir):
    files = []
    for item in os.listdir(top_dir):
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            files.extend(list_files_recursive(item_path))
            #     ^^^^^^ add child results to parent
        elif os.path.basename(item_path) == "parts.txt":
            files.append(item_path)
    return files
if __name__ == "__main__":
    print(list_files_recursive("foo"))
Or by passing a result list through the call tree:
import os
def list_files_recursive(top_dir, files=None):
    # Create a fresh list per top-level call; a default of [] would be
    # shared across calls and accumulate results from earlier searches.
    if files is None:
        files = []
    for item in os.listdir(top_dir):
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            list_files_recursive(item_path, files)
            #                               ^^^^^ pass our result list recursively
        elif os.path.basename(item_path) == "parts.txt":
            files.append(item_path)
    return files
if __name__ == "__main__":
    print(list_files_recursive("foo"))
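A caution when the result list is passed as a default argument: a mutable default such as files=[] is created once, when the function is defined, and is then shared by every call that omits the argument. The sketch below isolates that behavior with two hypothetical helper functions (collect_shared and collect_fresh are illustrative names, not part of the answer).

```python
def collect_shared(value, bucket=[]):
    # The same list object is reused on every call that omits `bucket`.
    bucket.append(value)
    return bucket


def collect_fresh(value, bucket=None):
    # A None sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(value)
    return bucket
```

Calling collect_shared twice without a bucket returns a list containing both values, while collect_fresh returns a one-element list each time. For the recursive search, this is the difference between a second top-level call repeating the first call's paths or not.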
A major problem with these functions is that they only work for finding files named precisely parts.txt, since that string literal is hard-coded. That makes them pretty much useless for anything but the immediate purpose. We should add a parameter that lets the caller specify the target file to search for, making the function general-purpose.
Another problem is that the function doesn't do what its name claims: list_files_recursive should really be called find_file_recursive, or, due to the hardcoded string, find_parts_txt_recursive.
Beyond that, the function is a strong candidate for turning into a generator function, which is a common Python idiom for traversal, particularly for situations where the subdirectories may contain huge amounts of data that would be expensive to keep in memory all at once. Generators also allow the flexibility of using the function to cancel the search after the first match, further enhancing its (re)usability.
The yield keyword also makes the function code itself very clean--we can avoid the problem of keeping a result data structure entirely and just fire off result items on demand.
Here's how I'd write it:
import os
def find_file_recursive(top_dir, target):
    for item in os.listdir(top_dir):
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            yield from find_file_recursive(item_path, target)
        elif os.path.basename(item_path) == target:
            yield item_path
if __name__ == "__main__":
    print(list(find_file_recursive("foo", "parts.txt")))
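To illustrate stopping after the first match, here is a small sketch that repeats the generator and wraps it with next(). The first_match and demo helpers and the temporary directory layout are assumptions made up for this illustration.

```python
import os
import tempfile


def find_file_recursive(top_dir, target):
    for item in os.listdir(top_dir):
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            yield from find_file_recursive(item_path, target)
        elif os.path.basename(item_path) == target:
            yield item_path


def first_match(top_dir, target):
    # next() pulls a single item; the rest of the tree is never visited.
    # The second argument is returned when nothing matches.
    return next(find_file_recursive(top_dir, target), None)


def demo():
    # Build a throwaway tree with two parts.txt files (hypothetical layout).
    with tempfile.TemporaryDirectory() as root:
        for sub in ("a", "b"):
            os.makedirs(os.path.join(root, sub))
            open(os.path.join(root, sub, "parts.txt"), "w").close()
        all_found = list(find_file_recursive(root, "parts.txt"))
        first = first_match(root, "parts.txt")
        missing = first_match(root, "nope.txt")
        return len(all_found), first in all_found, missing
```

Which file comes first depends on the order os.listdir returns, which is arbitrary, so the sketch only checks that the first match is one of the full results.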

How to create complex data structure with python hypothesis

I'm trying to use hypothesis to generate a text strategy with a complex format. I'm not sure how to build up this kind of data structure.
I've tried to build the various elements as composites to then use those as strategies for other composites. However the elements argument in the lists strategy requires a SearchStrategy instead of a composite like I had hoped. Looking through the docs I couldn't work out if the builds, mapping or flatmap would help in this case.
My (simplified) attempt is below.
@st.composite
def composite_coords(draw):
    lat = draw(st.decimals(min_value=-10, max_value=-1, allow_nan=False, places=16))
    long = draw(st.decimals(min_value=50, max_value=90, allow_nan=False, places=16))
    return [float(long), float(lat)]
@st.composite
def composite_polygon_coords(draw):
    polygon_coords = draw(st.lists(
        elements=composite_coords, min_size=3
    ))
    return polygon_coords.append(polygon_coords[0])
@st.composite
def composite_polygons(draw):
    polygons = draw(st.lists(
        elements=composite_polygon_coords, min_size=1
    ))
    polygon = {
        'type': 'Polygon',
        'coordinates': polygons
    }
    return poly.dumps(polygon)
@given(composite_polygons())
def test_valid_polygon(polygon):
    result = validate(polygon)
    assert result == polygon
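The @st.composite decorator gives you a function which returns a strategy - you just need to call them and you'll be good to go: pass elements=composite_coords() and elements=composite_polygon_coords() rather than the bare functions. Also note that list.append returns None, so composite_polygon_coords should append first and then return polygon_coords.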

Multiprocessing a function that tests a given dataset against a list of distributions. Returning function values from each iteration through list

I am working on processing a dataset that includes dense GPS data. My goal is to use parallel processing to test my dataset against all possible distributions and return the best one with the parameters generated for said distribution.
Currently, I have code that does this in serial thanks to this answer https://stackoverflow.com/a/37616966. Of course, it is going to take entirely too long to process my full dataset. I have been playing around with multiprocessing, but can't seem to get it to work right. I want it to test multiple distributions in parallel, keeping track of sum of square error. Then I want to select the distribution with the lowest SSE and return its name along with the parameters generated for it.
def fit_dist(distribution, data=data, bins=200, ax=None):
    # Block of code that tests the distribution and generates params
    return (distribution.name, best_params, sse)
if __name__ == '__main__':
    p = Pool()
    result = p.map(fit_dist, DISTRIBUTIONS)
    p.close()
    p.join()
I need some help with how to actually make use of the return values on each of the iterations in the multiprocessing to compare those values. I'm really new to python especially multiprocessing so please be patient with me and explain as much as possible.
The problem I'm having is it's giving me an "UnboundLocalError" on the variables that I'm trying to return from my fit_dist function. The DISTRIBUTIONS list is 89 objects. Could this be related to the parallel processing, or is it something to do with the definition of fit_dist?
With the help of Tomerikoo's comment and some further struggling, I got the code working the way I wanted it to. The UnboundLocalError was due to me not putting the return statement in the correct block of code within my fit_dist function. To answer the question I did the following.
from multiprocessing import Pool
def fit_dist(distribution, data=data, bins=200, ax=None):
    # ... block of code that tests the distribution and generates params ...
    # put this return under the right section of this function
    return [distribution.name, params, sse]
if __name__ == '__main__':
    p = Pool()
    result = p.map(fit_dist, DISTRIBUTIONS)
    p.close()
    p.join()
    # Filter out the None results. Due to the nature of the distribution fitting,
    # some distributions are so far off that they result in None objects.
    res = list(filter(None, result))
    # Iterate over the nested list, storing the lowest positive sum of
    # squared errors in best_sse.
    best_sse = float("inf")
    for dist in res:
        if best_sse > dist[2] > 0:
            best_sse = dist[2]
    # Iterate over the list again, pulling out the sublist of the distribution
    # with the best SSE. Each sublist is made up of a string, a tuple of
    # parameters, and a float SSE, which is why the SSE is always index 2.
    for dist in res:
        if dist[2] == best_sse:
            best_dist_list = dist
The rest of the code simply consists of me using that list to construct charts and plots with that best distribution overtop of a histogram of my raw data.
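The two selection loops can also be collapsed into a single min() call. This is a minimal sketch, assuming each result is either None or a [name, params, sse] list as described above; pick_best is a hypothetical helper name, and the sample values in the usage note are made up.

```python
def pick_best(results):
    """Return the [name, params, sse] entry with the lowest positive SSE, or None."""
    # Skip None results and entries whose SSE is not positive.
    valid = [r for r in results if r and r[2] > 0]
    return min(valid, key=lambda r: r[2]) if valid else None
```

With results such as [["norm", (0.0, 1.0), 2.5], None, ["expon", (0.0, 2.0), 0.7]], the call pick_best(results) returns the "expon" entry. It would be applied to the list returned by p.map.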

How to implement python dictionaries into code to do the same job as list functions

I need to use dictionaries in this code. Not everything needs to be changed, just the parts where I can change it and it still does the same job.
In a test file I have a list of three strings (1, once),(2,twice).(2, twice).
I'm guessing the number represents the value.
This code passes the tests, but I am struggling to understand how I can use dictionaries to make it do the same job.
If anyone can help, I'd be grateful.
The current is:
The list items are in a test file elsewhere.
class Bag:
    def __init__(self):
        """Create a new empty bag."""
        self.items = []
    def add(self, item):
        """Add one copy of item to the bag. Multiple copies are allowed."""
        self.items.append(item)
    def count(self, item):
        """Return the number of copies of item in the bag.
        Return zero if the item doesn't occur in the bag.
        """
        counter = 0
        for an_item in self.items:
            if an_item == item:
                counter += 1
        return counter
    def clear(self, item):
        """Remove all copies of item from the bag.
        Do nothing if the item doesn't occur in the bag.
        """
        index = 0
        while index < len(self.items):
            if self.items[index] == item:
                self.items.pop(index)
            else:
                index += 1
    def size(self):
        """Return the total number of copies of all items in the bag."""
        return len(self.items)
    def ordered(self):
        """Return the items by decreasing number of copies.
        Return a list of (count, item) pairs.
        """
        result = set()
        for item in self.items:
            result.add((self.count(item), item))
        return sorted(result, reverse=True)
I have been scratching my head over it for a while now. For dictionaries, I can only use these operations:
items[key] = value
len(items)
dict()
items[key]
key in items
del items[key]
Thank you
Start with the simplest possible problem. You have an empty bag:
self.items = {}
and now a caller is trying to add an item, with bag.add('twice').
Where shall we put the item?
Well, we're going to need some unique index.
Hmmm, different every time, different every time, what changes with each .add()?
Right, that's it, use the length!
n = len(self.items)
self.items[n] = new_item
So items[0] = 'twice'.
Now, does this still work after a 2nd call?
Yes. items[1] = 'twice'.
Following this approach you should be able to refactor the other methods to use the new scheme.
Use unit tests, or debug statements like print('after clear() items is: ', self.items), to help you figure out if the Right Thing happened.
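One thing to watch with length-based keys: once clear() deletes entries, len(self.items) can equal a key that is still in use, so a later add() may overwrite an existing item. A different scheme, sketched here as an alternative rather than as the approach above, maps each item to its number of copies. It sticks to the listed dictionary operations, plus plain iteration over the keys, which is an assumption about what the exercise allows.

```python
class Bag:
    def __init__(self):
        """Create a new empty bag, stored as item -> number of copies."""
        self.items = dict()

    def add(self, item):
        """Add one copy of item to the bag."""
        if item in self.items:
            self.items[item] = self.items[item] + 1
        else:
            self.items[item] = 1

    def count(self, item):
        """Return the number of copies of item, or zero if absent."""
        if item in self.items:
            return self.items[item]
        return 0

    def clear(self, item):
        """Remove all copies of item; do nothing if it is absent."""
        if item in self.items:
            del self.items[item]

    def size(self):
        """Return the total number of copies of all items."""
        total = 0
        for item in self.items:
            total += self.items[item]
        return total

    def ordered(self):
        """Return (count, item) pairs by decreasing number of copies."""
        pairs = []
        for item in self.items:
            pairs.append((self.items[item], item))
        return sorted(pairs, reverse=True)
```

With this layout, count() and clear() become single dictionary lookups instead of scans over every stored copy.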

Using if statements to terminate a for loop

I wasn't entirely sure how to word this question, so I'll stick with explaining my specific goal.
I'm trying to implement an 'eat' option to a user_input function for a text adventure. I want it to check if the verb is 'eat', check if the item is in the player's inventory, check if the item is consumable or not, and then - my goal - show that the player eats it (and only show it once, see below). The 'eat' option has no functionality as of yet, I'm only testing print statements.
I'm totally aware that the code below is not the most efficient way to handle user input (I'm still tweaking ways to handle input that is unexpected), so I'm 100% open to criticism. This was just my initial way of handling it and I'd like to try to make it work. I have a few other verbs ('go', 'look', 'take', etc.) that work, I'm just having a lot of trouble with 'eat'.
def user_input(self):
    player = Player()
    while True:
        user_input = input("> ").lower()
        words = user_input.split(" ")
        wordlist = list(words)
        verb = wordlist[0]
        if len(wordlist) == 2:
            noun = wordlist[1]
            if noun not in items:
                print("Somehow '{}' isn't the right word.".format(noun))
                continue
            else:
                pass
        # The above works fine because I don't have
        # an issue with any of the other verbs regarding
        # that bit of code.
        # There's more code between these blocks to:
        # handle if the user enters a noun that is > 1 word,
        # define a few more verbs,
        # then,
        if verb == "eat":
            for item in player.inventory:
                if item.name == noun:
                    if isinstance(item, Consumable):
                        print("You eat the {}.".format(noun))
                        break
                    else:
                        print("You can't eat that!")
                        break
                else:
                    print("You don't have '{}'.".format(noun))
I had to use the for loop (at least, I think I had to) because I'm iterating over a list that has objects in them, not strings, so I couldn't just use
if noun in player.inventory:
(I still tried it a million times though, it took forever to come up with a solution for that problem). Here's my specific example for the code above:
class Fists(Weapons):
    def __init__(self):
        self.name = "fists"
        # more instance variables for Fists()
class LeatherWallet(Classy):
    def __init__(self):
        self.name = "leather wallet"
        # more ...
class Pizza(Consumable):
    def __init__(self):
        self.name = "pizza"
        # more ...
class Player():
    def __init__(self):
        self.inventory = [Fists(), LeatherWallet(), Pizza()]
        # other stuff
*eat fists
You can't eat that!
*eat leather wallet
You don't have 'leather wallet'.
You can't eat that!
*eat pizza
You don't have 'pizza'.
You don't have 'pizza'.
You eat the pizza.
From the looks of it, it has got to be a simple fix, because it's clear what's happening as it iterates over the list. I just don't know how to (or if you can) wrangle a for loop to make it check conditions first then print later. At the risk of sounding dumb, I turn to you all for help!
Thank you, and please let me know if I can make this question/my goal any clearer.
Edit:
Tried to make the goal a little clearer in opening paragraphs.
You're not saying what you want the output to be, but my best guess is you don't want to say "you don't have X" more than once.
One way to do this is to use a flag, initialized to false, set to true when you find the item. When you get out of the loop the flag will tell you if you found it.
Something like this:
if verb == "eat":
    found = False
    for item in player.inventory:
        if item.name == noun:
            found = True
            if isinstance(item, Consumable):
                print("You eat the {}.".format(noun))
                break
            else:
                print("You can't eat that!")
                break
    if not found:
        print("You don't have '{}'.".format(noun))
for loops have else clauses that are run if the loop doesn't exit with a break. The idea is that the loop is looking for an item, will break when it finds it, and the else clause handles the default case where you don't find what you're looking for. The variables used inside the loop are still valid after the loop ends so the found (or defaulted) variable can be used in the next bit of code.
In your case, if you move that final else clause out one level, it will only run if the item is not found. In your example output, the two "You don't have 'pizza'." lines will no longer be printed.
for item in player.inventory:
    if item.name == noun:
        if isinstance(item, Consumable):
            print("You eat the {}.".format(noun))
            break
        else:
            print("You can't eat that!")
            break
else:
    print("You don't have '{}'.".format(noun))
    item = None # <-- you can give a default value if you want to
                # use the variable later
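The for/else behavior can be checked in isolation. This is a minimal sketch with simplified stand-in classes (Item and Consumable here are hypothetical and take the name as a constructor argument, unlike the game's classes), and it returns the message instead of printing it so the result is easy to inspect.

```python
class Item:
    def __init__(self, name):
        self.name = name


class Consumable(Item):
    pass


def eat(inventory, noun):
    for item in inventory:
        if item.name == noun:
            if isinstance(item, Consumable):
                message = "You eat the {}.".format(noun)
            else:
                message = "You can't eat that!"
            break
    else:
        # Runs only when the loop finished without hitting break,
        # i.e. no item had a matching name.
        message = "You don't have '{}'.".format(noun)
    return message
```

Because the else belongs to the for and not to the if, the "don't have" message is produced at most once per command.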
Looping through the list is okay until the list gets large. Then you are better off indexing the list with a dict for faster lookup.
self.inventory = dict((item.name, item)
                      for item in (Fists(), LeatherWallet(), Pizza()))
Then the for loop is replaced with
try:
    item = player.inventory[noun]
    if isinstance(item, Consumable):
        print("You eat the {}.".format(noun))
    else:
        print("You can't eat that!")
except KeyError:
    print("You don't have '{}'.".format(noun))
So, another approach is to just make this work, right?
if noun in player.inventory:
...
You have (at least) two ways to do it.
List comprehension
The first and easy one is using a list comprehension:
inventory_items = [item.name for item in player.inventory]
if noun in inventory_items:
    ...
That's it.
Custom class
The second one is creating an Inventory class inheriting from list and overriding the __contains__ method so you can compare them however you want.
Something like:
class Inventory(list):
    def __contains__(self, item):
        """
        Override `list.__contains__`.
        """
        # First, check whether `item` is in the inventory.
        #
        # This is what a normal `list` would do.
        if list.__contains__(self, item):
            return True
        # If we couldn't find the item in the list searching
        # the "normal" way, then try comparing it
        # against the `name` attribute of the items in our
        # inventory.
        for inventory_item in self:
            # Ensure we can do `inventory_item.name` so we don't get
            # an AttributeError.
            if 'name' not in dir(inventory_item):
                continue
            if inventory_item.name == item:
                return True
        return False
Then you instantiate your inventory like:
self.inventory = Inventory([Fists(), LeatherWallet(), Pizza()])
Remember that Inventory inherits from list, so if you can do list(x) then you should also be able to do Inventory(x) and get a very similar result (depending on how much you override in your class).
You could also get creative and make all items inherit from an Item class where you define the __eq__ method and make Pizza() == 'pizza' to simplify the Inventory comparisons.
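A rough sketch of that last idea, assuming a common Item base class (a hypothetical name; the question's classes inherit from Weapons, Classy and Consumable instead). Defining __eq__ removes the inherited __hash__, so the sketch defines one as well to keep items usable in sets and as dict keys.

```python
class Item:
    name = ""

    def __eq__(self, other):
        # Compare against plain strings by name, so Pizza() == "pizza".
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, Item):
            return self.name == other.name
        return NotImplemented

    def __hash__(self):
        return hash(self.name)


class Consumable(Item):
    pass


class Fists(Item):
    def __init__(self):
        self.name = "fists"


class Pizza(Consumable):
    def __init__(self):
        self.name = "pizza"
```

With this in place a plain list works: "pizza" in [Fists(), Pizza()] is true, and inventory.index("pizza") locates the object so it can still be checked with isinstance(item, Consumable).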
