how to remove word that is split into characters from list of strings - python-3.x

I have a list of sentences, where some of them contain only one word but it is split into characters. How can I either merge the characters to make it one word or drop the whole row?
list = ['What a rollercoaster', 'y i k e s', 'I love democracy']

I try to avoid writing regular expressions as much as I can, but from what you told me, this one could work :
import re
a = ['What a rollercoaster', 'y i k e s', 'I love democracy']
regex = re.compile(r'^(\w ){2,}.')
result = list(filter(regex.search, a))
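This matches strings that start with at least two groups of one character plus a space, followed by at least one more character. Note that filter keeps the strings that match, so result is ['y i k e s']; to drop those rows instead, keep the strings for which regex.search returns None. This assumes you don't have a sentence beginning with something like 'a a foo'.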
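As a rough sketch of both options asked about in the question (dropping the row or merging the characters), the same pattern can be reused as follows. The helper names drop_split_words and merge_split_words are made up for illustration, and merging simply removes every space from a matching string.

```python
import re

# Same pattern as in the answer: two or more "character + space" groups at the start.
SPLIT_WORD = re.compile(r'^(\w ){2,}.')

def drop_split_words(sentences):
    # Keep only the strings that do NOT look like a spelled-out word.
    return [s for s in sentences if not SPLIT_WORD.search(s)]

def merge_split_words(sentences):
    # Join the characters of a spelled-out word; leave other strings untouched.
    return [s.replace(' ', '') if SPLIT_WORD.search(s) else s for s in sentences]

sentences = ['What a rollercoaster', 'y i k e s', 'I love democracy']
print(drop_split_words(sentences))   # ['What a rollercoaster', 'I love democracy']
print(merge_split_words(sentences))  # ['What a rollercoaster', 'yikes', 'I love democracy']
```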

Related

Caesar Cipher in Python - how to replace characters

I'm trying to re-arrange long sentence from a puzzle that is encoded using a Caesar Cipher.
Here is my code.
sentence="g fmnc wms bgblr rpylqjyrc gr zw fylb. rfyrq ufyr amknsrcpq ypc dmp. bmgle gr gl zw fylb gq glcddgagclr ylb rfyr'q ufw rfgq rcvr gq qm jmle. sqgle qrpgle.kyicrpylq() gq pcamkkclbcb. lmu ynnjw ml rfc spj."
import string
a=string.ascii_lowercase
b=a[2:]+a[:2]
for i in range(26):
    sentence.replace(sentence[sentence.find(a[i])],b[i])
Am I, missing anything in replace function?
When I tried sentence.replace(sentence[sentence.find(a[0])],b[0])
it worked but why I can't loop through?
Thanks.
sentence.replace returns a new string, which you are immediately throwing away. Note also that replacing each character in turn will cause letters that were already replaced to be replaced again. See @RemcoGerlich's answer for a more detailed explanation of what is wrong. As for the solution, what about
import string
letters = string.ascii_lowercase
shifted = {l: letters[(i + 2) % len(letters)] for i, l in enumerate(letters)}
sentence = ''.join(shifted.get(c, c) for c in sentence.lower())
or if you really want the tabled way:
from string import ascii_lowercase
rotated_lowercase = ascii_lowercase[2:] + ascii_lowercase[:2]
translation_table = str.maketrans(ascii_lowercase, rotated_lowercase)
sentence = sentence.translate(translation_table)
There are a few problems:
One, sentence[sentence.find(a[i])] is strange. It tries to look up where in the sentence the character a[i] occurs, and then looks up which character is there. Well, you already know: a[i]. Unless that character doesn't occur in the string; then .find will return -1, and sentence[-1] is the last character in the sentence. Probably not what you meant. So instead you meant sentence.replace(a[i], b[i]).
But you don't save the result anywhere. You meant sentence = sentence.replace(a[i], b[i]).
But that still doesn't work! What if a should be changed into b, and then b into c? Then the original a's are also changed into c! That's a fundamental problem with your approach.
Better solutions are given by modesitt. Mine would have been something like
lookupdict = {a_char: b_char for (a_char, b_char) in zip(a, b)}
sentence_translated = [lookupdict.get(s, s) for s in sentence]  # keep spaces and punctuation unchanged
sentence = ''.join(sentence_translated)
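To make the double-replacement problem concrete, here is a small sketch comparing the loop of str.replace calls with a single-pass translation on the first two words of the puzzle text. The function names are made up for this illustration.

```python
import string

letters = string.ascii_lowercase
rotated = letters[2:] + letters[:2]

def shift_sequentially(text):
    # Buggy on purpose: each replace sees the output of the previous one,
    # so a letter that was already shifted gets shifted again and again.
    for src, dst in zip(letters, rotated):
        text = text.replace(src, dst)
    return text

def shift_in_one_pass(text):
    # Each character is looked up exactly once in the translation table.
    return text.translate(str.maketrans(letters, rotated))

print(shift_sequentially("g fmnc"))  # a baba
print(shift_in_one_pass("g fmnc"))   # i hope
```

With the sequential loop every letter ends up as a or b, because each shifted letter is picked up again by a later iteration.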

Overlapping values of strings in Python

I am building a puzzle word game in Python. I have the correct puzzle word, and the guessed puzzle word. I want to build a third string which shows the correct letters in the guessed puzzle in the correct puzzle word, and _ at the position of the incorrect letters.
For example, say the correct word is APPLE and the guessed word is APTLE
then i want to have a third string: AP_L_
The guessed word and correct word are guaranteed to be 3 to 5 characters long, but the guessed word is not guaranteed to be the same length as the correct word
For example, correct word is TEA and the guessed word is TEAKO, then the third string should be TEA__ because the players guessed the last two letters incorrectly.
Another example, correct word is APPLE and guessed word is POP, the third string should be:
_ _ P_ _ (without space separation)
I can successfully get the matched indexes of the correct and guessed word; however, I am having problems building the third string. I just learned that strings in Python are immutable and that i cannot assign something like str1[index] = str2[index]
I have tried many things, including using lists, but i am not getting the correct answer. The attached code is my most recent attempt, would you please help me solve this?
Thank you
# find the match between puzzle_word and guess
def matcher(str_a, str_b):
    # find indexes where letters overlap
    matched_indexes = [i for i, (a, b) in enumerate(zip(str_a, str_b)) if a == b]
    result = []
    for i in str_a:
        result.append('_')
    for value in matched_indexes:
        result[value].replace('_', str_a[value])
    print(result)
matcher("apple", "allke")
The output right now is a list of five "_".
Cases:
correct word is APPLE and the guessed word is APTLE, third string: AP_L_
correct word is TEA and the guessed word is TEAKO, third string should be TEA__
correct word is APPLE and guessed word is POP, third string should be _ _ P_ _
You can use itertools.zip_longest here to pad the shorter word out to the length of the longer one, and then build a new string by joining each character that matches, or a _ otherwise, e.g.:
from itertools import zip_longest
correct_and_guess = [
    ('APPLE', 'APTLE'),
    ('TEA', 'TEAKO'),
    ('APPLE', 'POP')
]
for correct, guess in correct_and_guess:
    # If characters in same positions match - show character otherwise `_`
    new_word = ''.join(c if c == g else '_' for c, g in zip_longest(correct, guess, fillvalue='_'))
    print(correct, guess, new_word)
Will print the following:
APPLE APTLE AP_LE
TEA TEAKO TEA__
APPLE POP __P__
Couple of things here.
str.replace() does not replace inline; as you noted strings are immutable, so you have to assign the result of replace:
result[value] = result[value].replace('_', str_a[value])
However, there's no point doing this since you can just assign to the list element:
result[value] = str_a[value]
And finally you can assign a list of the length of str_a without the for loop, which might be more readable:
result = ['_'] * len(str_a)
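Putting these fixes together, a minimal sketch of the corrected function could look like this. Sizing the result to the longer of the two words is an assumption taken from the TEA / TEAKO example in the question, and returning a joined string instead of printing a list is a choice made for this illustration.

```python
def matcher(correct, guess):
    # Start with one '_' per position of the longer word.
    result = ['_'] * max(len(correct), len(guess))
    # zip stops at the shorter word, so the extra positions stay '_'.
    for i, (c, g) in enumerate(zip(correct, guess)):
        if c == g:
            result[i] = c  # assign to the list element directly
    return ''.join(result)

print(matcher("APPLE", "APTLE"))  # AP_LE
print(matcher("TEA", "TEAKO"))    # TEA__
print(matcher("APPLE", "POP"))    # __P__
```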

How can I replace each letter in a sentence with a string without breaking it?

Here's my problem.
sentence = "This car is awsome."
and what I want to do is
sentence.replace("a","<emoji:a>")
sentence.replace("b","<emoji:b>")
sentence.replace("c","<emoji:c>")
and so on...
But of course if I do it in that way the letters in "<emoji:>" will also be replaced as I go along. So how can I do it in other way?
As Carlos Gonzalez suggested:
create a mapping dict and apply it to each character in sequence:
sentence = "This car is awsome."
# mapping
up = {"a":"<emoji:a>",
      "b":"<emoji:b>",
      "c":"<emoji:c>",}
# apply mapping to create a new text (use up[k] if present else default to k)
text = ''.join( (up.get(k,k) for k in sentence) )
print(text)
Output:
This <emoji:c><emoji:a>r is <emoji:a>wsome.
The advantage of the generator expression inside the ''.join( ... generator ...) is that it takes each single character of sentence and either keeps it or replaces it. It only ever touches each char once, so there is no danger of multiple substitutions and it takes only one pass of sentence to convert the whole thing.
Docs: dict.get(key,default) and Why dict.get(key) instead of dict[key]?
If you used
sentence = sentence.replace("a","o")
sentence = sentence.replace("o","k")
you would first make o from a and then make k from any o (or a before) - and you would have to touch each character twice to make it happen.
Using
up = { "a":"o", "o":"k" }
text = ''.join( (up.get(k,k) for k in sentence) )
avoids this.
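Another single-pass option for one-character keys is str.translate, since str.maketrans accepts a dict whose values may be strings of any length. The sketch below assumes that every lowercase ASCII letter should get an "<emoji:x>" replacement; emoji_map and emojify are names invented for this example.

```python
import string

# Assumption: every lowercase ASCII letter maps to "<emoji:letter>".
emoji_map = {c: f"<emoji:{c}>" for c in string.ascii_lowercase}
table = str.maketrans(emoji_map)

def emojify(text):
    # translate looks at each input character once, so the letters inside
    # the inserted "<emoji:...>" strings are never replaced again.
    return text.translate(table)

print(emojify("ab c"))  # <emoji:a><emoji:b> <emoji:c>
```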
If you want to replace more than one character at a time, it would be easier to do this with a regex. Inspired by Passing a function to re.sub in Python:
import re
sentence = "This car is awsome."
up = {"is":"Yippi",
      "ws":"WhatNot",}
# modified it to create the groups using the dicts key
text2 = re.sub( "("+'|'.join(up)+")", lambda x: up[x.group()], sentence)
print(text2)
Output:
ThYippi car Yippi aWhatNotome.
Docs: re.sub(pattern, repl, string, count=0, flags=0)
You would have to take extra care with your keys if they contain characters that have a special meaning in a regex pattern, e.g. .+*?()[]^$
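One way to handle such keys is to pass each of them through re.escape before building the pattern. This is only a sketch: replace_many is a hypothetical helper, and sorting the keys longest first is an extra precaution so that a longer key is tried before a shorter key that is its prefix.

```python
import re

def replace_many(text, mapping):
    # Longest keys first, so that "(c)" would win over a shorter key like "(".
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape makes characters such as . ( ) match literally.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group()], text)

print(replace_many("a.b (c)", {".": "<dot>", "(c)": "<paren>"}))
# a<dot>b <paren>
```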

Print First Letter of Each Word in a String in Python (Keep Punctuation Marks)

First post to site so I apologize if I do something wrong. I have looked for an appropriate answer, but could not find one.
I am new to Python and have been playing around trying to take a long string (a passage in a book) and print only the first letter of each word while keeping the punctuation marks (though not apostrophes), and have been unsuccessful so far.
Example:
input = "Hello, I'm writing a sentence. (Though not a good one.)"
Code....
output = H, I W A S. (T N A G O.)
--Note the ",", ".", "()", but not the " ' ".
Any tips? Thank you all so much for taking the time to look
To help you on your adventure, here is the step-by-step logic:
1. Use .split() to separate the string by spaces.
2. Go through each string in the resulting list.
3. Go through every character in that string.
4. Print any punctuation marks that you specify and the first alphabetical character you find.
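The steps above can be sketched roughly as follows. The function name first_letters, the default set of kept punctuation, and the upper-casing of each letter (taken from the expected output in the question) are choices made for this illustration.

```python
def first_letters(text, keep=",.()!?;:"):
    words_out = []
    for word in text.split():          # step 1 and 2: split, then visit each word
        pieces = []
        seen_letter = False
        for ch in word:                # step 3: visit each character
            if ch.isalpha():
                if not seen_letter:    # step 4: only the first letter is kept
                    pieces.append(ch.upper())
                    seen_letter = True
            elif ch in keep:           # step 4: keep the chosen punctuation
                pieces.append(ch)
        if pieces:
            words_out.append(''.join(pieces))
    return ' '.join(words_out)

text = "Hello, I'm writing a sentence. (Though not a good one.)"
print(first_letters(text))  # H, I W A S. (T N A G O.)
```

Apostrophes are dropped simply because they are not in keep.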

How can I remove repeated characters in a string with R?

I would like to implement a function with R that removes repeated characters in a string. For instance, say my function is named removeRS, so it is supposed to work this way:
removeRS('Buenaaaaaaaaa Suerrrrte')
Buena Suerte
removeRS('Hoy estoy tristeeeeeee')
Hoy estoy triste
My function is going to be used with strings written in spanish, so it is not that common (or at least correct) to find words that have more than three successive vowels. No bother about the possible sentiment behind them. Nonetheless, there are words that can have two successive consonants (especially ll and rr), but we could skip this from our function.
So, to sum up, this function should replace the letters that appear at least three times in a row with just that letter. In one of the examples above, aaaaaaaaa is replaced with a.
Could you give me any hints to carry out this task with R?
I did not think very carefully on this, but this is my quick solution using references in regular expressions:
gsub('([[:alpha:]])\\1+', '\\1', 'Buenaaaaaaaaa Suerrrrte')
# [1] "Buena Suerte"
() captures a letter first, \\1 refers to that letter, + means to match it once or more; put all these pieces together, we can match a letter two or more times.
To include other characters besides letters, replace [[:alpha:]] with a regex matching whatever you wish to include.
I think you should pay attention to the ambiguities in your problem description. This is a first stab, but it clearly does not work with "Good Luck" in the manner you desire:
removeRS <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="")
removeRS('Buenaaaaaaaaa Suerrrrte')
#[1] "Buena Suerte"
Since you want to replace letters that appear AT LEAST 3 times, here is my solution:
gsub("([[:alpha:]])\\1{2,}", "\\1", "Buennaaaa Suerrrtee")
#[1] "Buenna Suertee"
As you can see the 4 "a" have been reduced to only 1 a, the 3 r have been reduced to 1 r but the 2 n and the 2 e have not been changed.
As suggested above, you can replace [[:alpha:]] with any character class such as [a-zA-KM-Z], or list only the characters you care about, e.g. [yQ] if you want your code to affect only repetitions of y and Q. Note that | is not an "or" operator inside square brackets; there it is a literal | character, so [yQ] already means "y or Q".
gsub("([ae])\\1{2,}", "\\1", "Buennaaaa Suerrrtee")
# [1] "Buenna Suerrrtee"
# triple r are not affected and there are no triple e.
