join strings within a list of lists by 4 - python-3.x

Background
I have a list of lists as seen below
l = [['NAME',':', 'Mickey', 'Mouse', 'was', 'here', 'and', 'Micky', 'mouse', 'went', 'out'],
['Donal', 'duck', 'was','Date', 'of', 'Service', 'for', 'Donald', 'D', 'Duck', 'was', 'yesterday'],
['I', 'like','Pluto', 'the', 'carton','Dog', 'bc','he', 'is','fun']]
Goal
Join l by every 4 elements (when possible)
Problem
But sometimes 4 elements won't cleanly join as 4 as seen in my desired output
Desired Output
desired_l = [['NAME : Mickey Mouse', 'was here and Micky', 'mouse went out'],
['Donal duck was Date', 'of Service for Donald', 'D Duck was yesterday'],
['I like Pluto the', 'carton Dog bc he', 'is fun']]
Question
How do I achieve desired_l?

itertools has some nifty functions, one of which (zip_longest) can do just this.
from itertools import zip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
[[' '.join(filter(None, x)) for x in grouper(sentence, 4, fillvalue='')] for sentence in l]
Result:
[['NAME : Mickey Mouse', 'was here and Micky', 'mouse went out'],
['Donal duck was Date', 'of Service for Donald', 'D Duck was yesterday'],
['I like Pluto the', 'carton Dog bc he', 'is fun']]
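If the inner lists are plain lists (as they are here), slicing in steps of 4 gives the same result without padding and filtering. This is only a sketch of an alternative; the helper name join_in_chunks is made up for illustration:

```python
def join_in_chunks(words, size=4):
    """Join consecutive groups of `size` words; the last group may be shorter."""
    return [' '.join(words[i:i + size]) for i in range(0, len(words), size)]

l = [['NAME', ':', 'Mickey', 'Mouse', 'was', 'here', 'and', 'Micky', 'mouse', 'went', 'out'],
     ['Donal', 'duck', 'was', 'Date', 'of', 'Service', 'for', 'Donald', 'D', 'Duck', 'was', 'yesterday'],
     ['I', 'like', 'Pluto', 'the', 'carton', 'Dog', 'bc', 'he', 'is', 'fun']]

joined = [join_in_chunks(sentence) for sentence in l]
print(joined[0])  # ['NAME : Mickey Mouse', 'was here and Micky', 'mouse went out']
```

Unlike the zip_longest version, this needs a sequence that supports slicing, so it will not work on a generic iterator.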

Related

Best python data structure to replace values in a column?

I am working with a dataframe where I need to replace values in 1 column. My natural instinct is to go towards a python dictionary HOWEVER, this is an example of what my data looks like (original_col):
original_col desired_col
cat animal
dog animal
bunny animal
cat animal
chair furniture
couch furniture
Bob person
Lisa person
A dictionary would look something like:
my_dict = {'animal': ['cat', 'dog', 'bunny'], 'furniture': ['chair', 'couch'], 'person': ['Bob', 'Lisa']}
I can't use the typical my_dict.get() since I am looking to retrieve corresponding KEY rather than the value. Is dictionary the best data structure? Any suggestions?
Flip your dictionary:
my_new_dict = {v: k for k, vals in my_dict.items() for v in vals}
Note that this will not work if a value appears under more than one key (e.g. dog->animal and dog->person): the later key silently overwrites the earlier one.
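To see whether any value is listed under more than one key before flipping, you could collect all keys per value first. A minimal sketch, where 'dog' is added to 'person' on purpose to force a conflict and the names owners, conflicts and inverted are illustrative only:

```python
from collections import defaultdict

my_dict = {'animal': ['cat', 'dog', 'bunny'],
           'furniture': ['chair', 'couch'],
           'person': ['Bob', 'Lisa', 'dog']}  # 'dog' repeated on purpose

# Map every value to the list of keys it appears under.
owners = defaultdict(list)
for key, values in my_dict.items():
    for value in values:
        owners[value].append(key)

# Values with several keys are ambiguous; the rest invert safely.
conflicts = {v: ks for v, ks in owners.items() if len(ks) > 1}
inverted = {v: ks[0] for v, ks in owners.items() if len(ks) == 1}
print(conflicts)  # {'dog': ['animal', 'person']}
```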
DataFrame.replace already accepts a dictionary in a specific structure so you don't need to re-invent the wheel: {col_name: {old_value: new_value}}
df.replace({'original_col': {'cat': 'animal', 'dog': 'animal', 'bunny': 'animal',
                             'chair': 'furniture', 'couch': 'furniture',
                             'Bob': 'person', 'Lisa': 'person'}})
Alternatively you could use Series.replace, then only the inner dictionary is required:
df['original_col'].replace({'cat': 'animal', 'dog': 'animal', 'bunny': 'animal',
                            'chair': 'furniture', 'couch': 'furniture',
                            'Bob': 'person', 'Lisa': 'person'})
The pandas map() function uses a dictionary or another pandas Series to perform this kind of lookup, IIUC:
import pandas as pd
# original column / data
data = ['cat', 'dog', 'bunny', 'cat', 'chair', 'couch', 'Bob', 'Lisa']
# original dict
my_dict = {'animal': ['cat', 'dog', 'bunny'],
           'furniture': ['chair', 'couch'],
           'person': ['Bob', 'Lisa']}
# invert the dictionary
new_dict = {v: k
            for k, vs in my_dict.items()
            for v in vs}
# create series and use `map()` to perform dictionary lookup
df = pd.concat([
    pd.Series(data).rename('original_col'),
    pd.Series(data).map(new_dict).rename('desired_col')], axis=1)
print(df)
original_col desired_col
0 cat animal
1 dog animal
2 bunny animal
3 cat animal
4 chair furniture
5 couch furniture
6 Bob person
7 Lisa person

converting a file into dict

my_file = """The Itsy Bitsy Spider went up the water spout.
Down came the rain & washed the spider out.
Out came the sun & dried up all the rain,
And the Itsy Bitsy Spider went up the spout again. """
Expected output:
{'the': ['itsy', 'water', 'rain', 'spider', 'sun', 'rain', 'itsy', 'spout'], 'itsy': ['bitsy', 'bitsy'], 'bitsy': ['spider', 'spider'], 'spider': ['went', 'out', 'went'], 'went': ['up', 'up'], 'up': ['the', 'all', 'the'], 'water': ['spout'], 'spout': ['down', 'again'], 'down': ['came'], 'came': ['the', 'the'], 'rain': ['washed', 'and'], 'washed': ['the'], 'out': ['out', 'came'], 'sun': ['dried'], 'dried': ['up'], 'all': ['the'], 'and': ['the'], 'again': []}
My code:
import string
words_set = {}
for line in my_file:
    lower_text = line.lower()
    for word in lower_text.split():
        word = word.strip(string.punctuation + string.digits)
        if word:
            if word in words_set:
                words_set[word] = words_set[word] + 1
            else:
                words_set[word] = 1
You can reproduce your expected results with a few concepts:
Given
import string
import itertools as it
import collections as ct
data = """\
The Itsy Bitsy Spider went up the water spout.
Down came the rain & washed the spider out.
Out came the sun & dried up all the rain,
And the Itsy Bitsy Spider went up the spout again.
"""
Code
def clean_string(s: str) -> str:
    """Return a lowercased string without punctuation or newlines."""
    table = str.maketrans("", "", string.punctuation)
    return s.lower().translate(table).replace("  ", " ").replace("\n", " ")
def get_neighbors(words: list) -> dict:
    """Return a dict of right-hand, neighboring words."""
    dd = ct.defaultdict(list)
    for word, nxt in it.zip_longest(words, words[1:], fillvalue=""):
        dd[word].append(nxt)
    return dict(dd)
Demo
words = clean_string(data).split()
get_neighbors(words)
Results
{'the': ['itsy', 'water', 'rain', 'spider', 'sun', 'rain', 'itsy', 'spout'],
'itsy': ['bitsy', 'bitsy'],
'bitsy': ['spider', 'spider'],
'spider': ['went', 'out', 'went'],
'went': ['up', 'up'],
'up': ['the', 'all', 'the'],
'water': ['spout'],
'spout': ['down', 'again'],
'down': ['came'],
'came': ['the', 'the'],
'rain': ['washed', 'and'],
'washed': ['the'],
'out': ['out', 'came'],
'sun': ['dried'],
'dried': ['up'],
'all': ['the'],
'and': ['the'],
'again': ['']}
Details
clean_string
You can remove punctuation in any number of ways. Here we use a translation table to delete the punctuation characters; the leftover double spaces and newlines are then replaced via str.replace().
get_neighbors
A defaultdict makes a dict of lists. A new list value is made if a key is missing.
We make the dict by iterating two juxtaposed word lists, one ahead of the other.
These lists are zipped by the longest list, filling the shorter list with an empty string.
dict(dd) ensures a plain dict is returned.
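Note that the expected output in the question ends with 'again': [] rather than 'again': ['']. If that difference matters, one possible variant pairs the words with plain zip() and then makes sure the last word still gets a key. This is a sketch only; the function name get_neighbors_strict is invented here:

```python
from collections import defaultdict

def get_neighbors_strict(words):
    """Map each word to the words that directly follow it."""
    neighbors = defaultdict(list)
    # zip() stops at the shorter input, so the last word is never padded.
    for word, nxt in zip(words, words[1:]):
        neighbors[word].append(nxt)
    if words:
        neighbors[words[-1]]  # touching the key creates an empty list if missing
    return dict(neighbors)

print(get_neighbors_strict("the itsy bitsy spider the end".split()))
# {'the': ['itsy', 'end'], 'itsy': ['bitsy'], 'bitsy': ['spider'], 'spider': ['the'], 'end': []}
```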
If you solely wish to count words:
Demo
ct.Counter(words)
Results
Counter({'the': 8,
'itsy': 2,
'bitsy': 2,
'spider': 3,
'went': 2,
'up': 3,
'water': 1,
'spout': 2,
'down': 1,
'came': 2,
'rain': 2,
'washed': 1,
'out': 2,
'sun': 1,
'dried': 1,
'all': 1,
'and': 1,
'again': 1})
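A Counter also answers the usual follow-up questions directly. A small sketch on a shortened word list (the list here is just an example, not the full rhyme):

```python
from collections import Counter

words = "the itsy bitsy spider went up the water spout".split()
counts = Counter(words)

print(counts.most_common(1))  # [('the', 2)]
print(counts['spider'])       # 1
print(counts['missing'])      # 0, unknown words count as zero instead of raising
```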

How to create a matrix (5 x 5) with strings, without using numpy? (Python 3)

So I am creating a memory matching game, in which a player will pair two words from a list of words.
I am trying to create a 2D 5x5 matrix with strings without using numpy.
I've tried with for i in range(x): for j in range(x), but I can't get it to work.
How do I do this?
Python doesn't have a built-in matrix type like that, but you can pretty much emulate it with a list of lists, or with a dict keyed by ordered pairs.
Here's the list of lists approach using a list comprehension inside a list comprehension:
from pprint import pprint
matrix = [[c for c in line] for line in '12345 abcde ABCDE vwxyz VWXYZ'.split()]
pprint(matrix)
The result, pretty-printed.
[['1', '2', '3', '4', '5'],
['a', 'b', 'c', 'd', 'e'],
['A', 'B', 'C', 'D', 'E'],
['v', 'w', 'x', 'y', 'z'],
['V', 'W', 'X', 'Y', 'Z']]
You can split on different characters in the inner or outer loops.
matrix = [[word for word in line.split()] for line in 'foo bar;spam eggs'.split(';')]
You get and set elements with a double lookup, like matrix[2][3].
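If the words already live in one flat list, as they might in a memory game, the rows can be cut out with slices. A minimal sketch, assuming exactly size * size words; the helper name to_matrix and the placeholder words are made up for illustration:

```python
def to_matrix(words, size=5):
    """Split a flat list of size*size words into `size` rows."""
    if len(words) != size * size:
        raise ValueError(f"expected {size * size} words, got {len(words)}")
    return [words[row * size:(row + 1) * size] for row in range(size)]

words = [f"word{n}" for n in range(25)]  # placeholder words
matrix = to_matrix(words)

print(matrix[2][3])       # word13
matrix[2][3] = "changed"  # set an element with the same double lookup
```

Each row is its own list here, so changing one cell does not affect the other rows.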
Results can vary with pprint depending on the width of the words. Lists of lists are pretty easy to print in matrix form though. .join() is the inverse of .split().
print('\n'.join('\t'.join(line) for line in matrix))
And the result in this case,
foo bar
spam eggs
This just uses a tab character '\t', which may or may not produce good results depending on your tab stops and word widths. You can control this more precisely by using the justify string methods or .format() or f-strings with specifiers.
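As one possible way to do that, str.ljust() can pad every word to the width of the longest one. This sketch uses a single shared column width, which is a simplification:

```python
matrix = [['foo', 'bar'], ['spam', 'eggs']]

# Pad every cell to the width of the longest word in the whole matrix.
width = max(len(word) for row in matrix for word in row)
lines = [' '.join(word.ljust(width) for word in row).rstrip() for row in matrix]
print('\n'.join(lines))
# foo  bar
# spam eggs
```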
Here's one with the pair-keyed dict. Recall that tuples of hashable types are hashable too.
matrix = {(i, j): 'x' for i in range(5) for j in range(5)}
You get and set elements with a pair lookup, like matrix[2, 3].
Again, you can use words.
{(i, j): word
for i, line in enumerate("""\
1 2 3 4 5
foo bar baz quux norlf
FOO BAR BAZ QUUX NORLF
spam eggs sausage bacon ham
SPAM EGGS SAUSAGE BACON HAM""".split('\n'))
for j, word in enumerate(line.split())}

How to choose randomly in Python

I have to write a code in python that chooses a word from 7 lists (total of 7 words) and then runs a requested number of lines to form a "poem". Each line of the "poem" is meant to be a different combination of the 7 lists. Any ideas of how to get the program to run different combinations? Mine just runs the same line the number of times I asked:
people=['Amir', 'Itai', 'Sheli','Gil','Jasmin','Tal','Nadav']
verbs = ['talks', 'smiles', 'sings', 'listens', 'eats', 'waves', 'plays', 'swims']
Adverbs =['slowly', 'quickly', 'solemnly', 'nicely', 'beautifully']
Prepositions=['to a', 'with a' ,'towards a', 'at a' ,'out of a']
Adjectives =['white', 'blue', 'green', 'small', 'large', 'yellow', 'pretty', 'sad']
Animated=['fish', 'parrot', 'flower', 'tree', 'snake']
Inanimated=['chair', 'lamp', 'car', 'ship', 'boat']
x=eval(input("How many lines are in the poem?"))
y=(random.choice(people), random.choice (verbs) ,random.choice(Adverbs) ,random.choice(Prepositions) ,random.choice(Adjectives) ,random.choice(Animated+Inanimated))
for i in range (x):
    if (x< 10):
        print (y)
You have the right idea, but you just need to re-evaluate the random choice each time:
import random
people=['Amir', 'Itai', 'Sheli','Gil','Jasmin','Tal','Nadav']
verbs = ['talks', 'smiles', 'sings', 'listens', 'eats', 'waves', 'plays', 'swims']
Adverbs =['slowly', 'quickly', 'solemnly', 'nicely', 'beautifully']
Prepositions=['to a', 'with a' ,'towards a', 'at a' ,'out of a']
Adjectives =['white', 'blue', 'green', 'small', 'large', 'yellow', 'pretty', 'sad']
Animated=['fish', 'parrot', 'flower', 'tree', 'snake']
Inanimated=['chair', 'lamp', 'car', 'ship', 'boat']
x = int(input("How many lines are in the poem?"))  # int() is safer than eval() for user input
for i in range(x):
    # pick fresh random words on every pass through the loop
    y = (random.choice(people), random.choice(verbs), random.choice(Adverbs), random.choice(Prepositions), random.choice(Adjectives), random.choice(Animated + Inanimated))
    if (i < 10):
        print(y)
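The same idea can be packed into small functions that return each line as a single string. This is only a sketch: the names word_lists, make_line and make_poem are invented here, and passing a seeded random.Random instance is just a convenience for getting repeatable output while testing:

```python
import random

people = ['Amir', 'Itai', 'Sheli', 'Gil', 'Jasmin', 'Tal', 'Nadav']
verbs = ['talks', 'smiles', 'sings', 'listens', 'eats', 'waves', 'plays', 'swims']
adverbs = ['slowly', 'quickly', 'solemnly', 'nicely', 'beautifully']
prepositions = ['to a', 'with a', 'towards a', 'at a', 'out of a']
adjectives = ['white', 'blue', 'green', 'small', 'large', 'yellow', 'pretty', 'sad']
nouns = ['fish', 'parrot', 'flower', 'tree', 'snake',
         'chair', 'lamp', 'car', 'ship', 'boat']  # animated + inanimated

word_lists = [people, verbs, adverbs, prepositions, adjectives, nouns]

def make_line(rng=random):
    """Pick one word from each list and join them into a line."""
    return ' '.join(rng.choice(words) for words in word_lists)

def make_poem(n_lines, rng=random):
    """Build n_lines lines, drawing new random words for every line."""
    return [make_line(rng) for _ in range(n_lines)]

poem = make_poem(3, random.Random(0))  # seeded generator for repeatable runs
for line in poem:
    print(line)
```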
I think this can help you:
import random
people=['Amir', 'Itai', 'Sheli','Gil','Jasmin','Tal','Nadav']
verbs = ['talks', 'smiles', 'sings', 'listens', 'eats', 'waves', 'plays', 'swims']
Adverbs =['slowly', 'quickly', 'solemnly', 'nicely', 'beautifully']
Prepositions=['to a', 'with a' ,'towards a', 'at a' ,'out of a']
Adjectives =['white', 'blue', 'green', 'small', 'large', 'yellow', 'pretty', 'sad']
Animated=['fish', 'parrot', 'flower', 'tree', 'snake']
Inanimated=['chair', 'lamp', 'car', 'ship', 'boat']
while True:
    x = int(input("How many lines are in the poem?"))
    if x == 0:
        break
    for i in range(x):
        if (x < 10):
            y = (random.choice(people), random.choice(verbs), random.choice(Adverbs), random.choice(Prepositions), random.choice(Adjectives), random.choice(Animated + Inanimated))
            print(y)

Comparing like words between two dictionaries

I am using python 3.x,
I have 2 dictionaries (both very large but will substitute here). The values of the dictionaries contain more than one word:
dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_a
{'key1': 'Large left panel',
'key2': 'Orange bear rug',
'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}
dict_b
{'keyX': 'titanium panel',
'keyY': 'orange Ball and chain',
'keyZ': 'large bear musket'}
I am looking for a way to compare the individual words contained in the values of dict_a to the words contained in the values of dict_b and return a dictionary or data-frame that contains the word, and the keys from dict_a and dict_b it is associated with:
My desired output (not formatted any certain way):
bear: key2 (from dict_a), keyZ(from dict_b)
Luxo: key3
orange: key2 (from dict_a), keyY (from dict_b)
I've got code that works for looking up a specific word in a single dictionary but it's not sufficient for what I need to accomplish here:
def search(myDict, lookup):
    aDict = {}
    for key, value in myDict.items():
        for v in value:
            if lookup in v:
                aDict[key] = value
    return aDict
print (key, value)
dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
               'key3': 'Luxo jr. lamp'},
         'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
               'keyZ': 'large bear musket'}}
from collections import defaultdict
index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split():  # lower() to make Orange/orange match
            index[word].append((dname, key))
index now contains:
{'and' : [('b', 'keyY')],
'ball' : [('b', 'keyY')],
'bear' : [('a', 'key2'), ('b', 'keyZ')],
'chain' : [('b', 'keyY')],
'jr.' : [('a', 'key3')],
'lamp' : [('a', 'key3')],
'large' : [('a', 'key1'), ('b', 'keyZ')],
'left' : [('a', 'key1')],
'luxo' : [('a', 'key3')],
'musket' : [('b', 'keyZ')],
'orange' : [('a', 'key2'), ('b', 'keyY')],
'panel' : [('a', 'key1'), ('b', 'keyX')],
'rug' : [('a', 'key2')],
'titanium': [('b', 'keyX')] }
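Since the question is about words the two dictionaries have in common, the index can be filtered down to the words that occur in more than one dictionary. A sketch built on the same data; the name shared is illustrative:

```python
from collections import defaultdict

dicts = {'a': {'key1': 'Large left panel', 'key2': 'Orange bear rug',
               'key3': 'Luxo jr. lamp'},
         'b': {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain',
               'keyZ': 'large bear musket'}}

index = defaultdict(list)
for dname, d in dicts.items():
    for key, words in d.items():
        for word in words.lower().split():
            index[word].append((dname, key))

# Keep only the words whose hits come from at least two different dictionaries.
shared = {word: hits for word, hits in index.items()
          if len({dname for dname, _ in hits}) > 1}
print(sorted(shared))  # ['bear', 'large', 'orange', 'panel']
```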
Update to comments
Since your actual dictionary is a mapping from string to list (and not string to string), change your loops to
for dname, d in dicts.items():
    for key, wordlist in d.items():  # changed "words" to "wordlist"
        for words in wordlist:  # added extra loop to iterate over wordlist
            for word in words.split():  # removed .lower() since text is always uppercase
                index[word].append((dname, key))
Since your lists have only one item you could just do
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split():  # assumes single item list
            index[word].append((dname, key))
If you have words that you don't want to be added to your index you can skip adding them to the index:
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
Then filter them out with
if word in words_to_skip:
    continue
I noticed that you have some words surrounded by parentheses (such as (342) and (221)). If you want to get rid of the parentheses, do
if word[0] == '(' and word[-1] == ')':
    word = word[1:-1]
Putting this all together we get
words_to_skip = {'-', ';', '/', 'AND', 'TO', 'UP', 'WITH', ''}
for dname, d in dicts.items():
    for key, wordlist in d.items():
        for word in wordlist[0].split():  # assumes single item list
            if word[0] == '(' and word[-1] == ')':
                word = word[1:-1]  # remove outer parentheses
            if word in words_to_skip:  # skip unwanted words
                continue
            index[word].append((dname, key))
I think you can do what you want pretty easily. This code produces output in the format {word: {key: name_of_dict_the_key_is_in}}:
def search(**dicts):
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                result.setdefault(word, {})[key] = name
    return result
You call it with the input dictionaries as keyword arguments. The keyword you use for each dictionary will be the string used to describe it in the output dictionary, so use something like search(dict_a=dict_a, dict_b=dict_b).
If your dictionaries might have some of the same keys, this code might not work right, since the keys could collide if they have the same words in their values. You could make the outer dict contain a list of (key, name) tuples, instead of an inner dictionary, I suppose. Just change the assignment line to result.setdefault(word, []).append((key, name)). That would be less handy to search in though.
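For reference, the list-of-tuples variant described above could look like the sketch below. The function name search_pairs is made up here; the words are compared as written, so 'Orange' and 'orange' stay separate entries unless you lower-case them first:

```python
def search_pairs(**dicts):
    """Map each word to a list of (key, dict_name) pairs."""
    result = {}
    for name, dct in dicts.items():
        for key, value in dct.items():
            for word in value.split():
                # A list of pairs cannot collide, even if both dicts share a key.
                result.setdefault(word, []).append((key, name))
    return result

dict_a = {'key1': 'Large left panel', 'key2': 'Orange bear rug', 'key3': 'Luxo jr. lamp'}
dict_b = {'keyX': 'titanium panel', 'keyY': 'orange Ball and chain', 'keyZ': 'large bear musket'}

result = search_pairs(dict_a=dict_a, dict_b=dict_b)
print(result['bear'])  # [('key2', 'dict_a'), ('keyZ', 'dict_b')]
```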
