Making spaces between emojis in string [working with emot] - python-3.x

from emot.emo_unicode import UNICODE_EMOJI
tweet = "#homer #yolo good hello😂🚀bye evening :-) and :) you should've"
def add_spaces(tweet):
    words = tweet.split()
    print(words)
    for i, w in enumerate(words):
        for emot in UNICODE_EMOJI:
            if w == emot:
                words[i] = " " + w + " "
    new_tweet = " ".join(words)
    print(new_tweet)
result = add_spaces(tweet)
print(result)
With the function above I am trying to add spaces, but only between an emoji and the text directly before and after it. So the output should be:
#homer #yolo good hello 😂 🚀 bye evening :-) and :) you should've
When I run this function I get the following output:
for new_tweet = #homer #yolo good hello😂🚀bye evening :-) and :) you should've
for result = None
As you can see, new_tweet is the same as tweet. I hope somebody can tell me where I made a mistake.
FYI: I also tried it with this function:
import re

def add_spaces(s):
    # split the string into a list of words, emojis, and punctuation
    words = re.findall(
        r"(?:[\w’]+[\w']+|(?:[\U0001f300-\U0001f64f])|(?:[\U0001f680-\U0001f6ff])|(?:[\.,!?:;.##)(]))",
        s,
    )
    # loop through each word in the list
    for i, w in enumerate(words):
        # check if the word is an emoji
        if w.startswith("\\U") and not w.startswith("#"):
            # add a space before and after the emoji
            words[i] = " " + w + " "
        # check if the word is an "#" symbol
        elif w == "#":
            # do not add a space after the "#" symbol
            words[i] = w
    # join the words back together
    s = " ".join(words)
    return s
This function works... BUT:
It also puts a space between each # and the word that follows it, and it splits the text emoticons.
So the output here is:
# homer # yolo good hello 😂 🚀 bye evening : ) and : ) you should've
The program seems to treat the # as some kind of emoji as well. Maybe it is some kind of encoding problem?!
I hope somebody can give me some advice on how to make this function work. Thanks :)
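One possible approach, sketched here under assumptions: rather than splitting on whitespace (which never isolates an emoji that is glued to a word), substitute each emoji with a space-padded copy and then collapse repeated whitespace. The sketch treats only the two code-point ranges from the regex in the question as emojis, which is an assumption and does not cover every emoji. Unlike the first attempt, it also returns the result instead of only printing it.

```python
import re

# Assumption: only the two code-point ranges used in the question count as emojis.
EMOJI = re.compile("([\U0001f300-\U0001f64f\U0001f680-\U0001f6ff])")

def add_spaces(tweet):
    # Pad every emoji with a space on both sides ...
    spaced = EMOJI.sub(r" \1 ", tweet)
    # ... then collapse runs of whitespace into single spaces.
    return " ".join(spaced.split())

tweet = "#homer #yolo good hello😂🚀bye evening :-) and :) you should've"
print(add_spaces(tweet))
```

Because nothing is tokenized, hashtags and text emoticons such as :-) pass through unchanged.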

Related

how to remove special word from a string in function

Write a function fun(long_string) with one string parameter that returns a string. The function should extract the words separated by a single space " ", drop empty words as well as words that are equal to "end" or "exit", convert the remaining words to upper case, join them with the joining token string ";" and return this newly joined string.
My code is:
def fun(long_string):
    long_string = long_string.split(' ')
    try:
        if 'exit' in long_string:
            long_string.remove('exit')
        elif 'end' in long_string:
            long_string.remove('end')
    except ValueError:
        pass
But it does not remove "end" or "exit". Can someone please help me get them out? I'm a beginner in Python and I'm stuck here.
You could try this and convert it into a function as you wish; it's very straightforward.
The code has not been fully tested yet (but it works for your inputs), so please try different inputs and improve it to meet your requirements. Please ask if you have any questions.
inputs = "this is a long test exit string"
stop_words = ('end', 'exit')
# split on single spaces, then drop empty words and stop words
kept = [word for word in inputs.split(' ') if word and word not in stop_words]
ans = ';'.join(w.upper() for w in kept) # do the final conversion
Confirm it:
assert ans == "THIS;IS;A;LONG;TEST;STRING" # silent means True
Edit: add function:
def fun(long_string):
    # s = "this is a long test exit string"
    stop_words = ('end', 'exit')
    # split on single spaces, then drop empty words and stop words
    kept = [word for word in long_string.split(' ') if word and word not in stop_words]
    ans = ';'.join(w.upper() for w in kept)
    return ans
text = "this is a long test exit string"
print(fun(text))

Capitalizing each words with chr and ord

First I have to receive a string from the user. The function should capitalize that string: every word should start with an uppercase character and all remaining characters should be lowercase. Here is what I did:
ssplit = s.split()
for z in s.split():
    if ord(z[0]) < 65 or ord(z[0])>90:
        l=(chr(ord(z[0])-32))
        new = l + ssplit[1:]
        print(new)
    else:
        print(s)
I can't see what I am doing wrong.
Using str.title() as suggested by @Pyer is nice. If you need to use chr and ord you should get your variables right - see comments in code:
s = "this is a demo text"
ssplit = s.split()
# I dislike magic numbers, simply get them here:
small_a = ord("a") # 97
small_z = ord("z")
cap_a = ord("A") # 65
delta = small_a - cap_a
for z in ssplit: # use ssplit here - you created it explicitly
    if small_a <= ord(z[0]) <= small_z:
        l = chr(ord(z[0])-delta)
        new = l + z[1:] # need z here - not ssplit[1:]
        print(new)
    else:
        print(z) # print the unchanged word, not the whole string s
Output:
This
Is
A
Demo
Text
There are many python methods that could solve this easily for you. For example, the str.title() will capitalize the start of every word in the given string. If you wanted to ensure that all others were lowercase, you could first do str.lower() and then str.title().
s = 'helLO how ARE YoU'
s = s.lower().title() # strings are immutable, so reassign the result
# s == 'Hello How Are You'
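If chr and ord are required and the remaining characters must also end up lowercase, the idea can be sketched as follows. The function name capitalize_words is hypothetical, and the sketch only handles the ASCII letters a-z and A-Z.

```python
def capitalize_words(text):
    # Distance between a lowercase ASCII letter and its uppercase form (32).
    delta = ord("a") - ord("A")
    result = []
    for word in text.split():
        chars = []
        for pos, ch in enumerate(word):
            code = ord(ch)
            if pos == 0 and ord("a") <= code <= ord("z"):
                code -= delta  # first letter: lowercase -> uppercase
            elif pos > 0 and ord("A") <= code <= ord("Z"):
                code += delta  # other letters: uppercase -> lowercase
            chars.append(chr(code))
        result.append("".join(chars))
    return " ".join(result)

print(capitalize_words("helLO how ARE YoU"))
```

Note that joining with a single space normalizes any repeated whitespace in the input.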

Print all words in a String without split() function

I want to print out all the words in a string, line by line, without using the split() function in Python 3.
The phrase is a str(input) from the user, and the program has to print all the words in the string, no matter its size. Here's my code:
my_string = str(input("Phrase: "))
tam = len(my_string)
s = my_string
ch = " "
cont = 0
for i, letter in enumerate(s):
    if letter == ch:
        #print(i)
        print(my_string[cont:i])
        cont+=i+1
The output to this is:
Phrase: Hello there my friend
Hello
there
It is printing only two of the words in the string, and I need it to print all the words, line by line.
My apologies, if this isn't a homework question, but I will leave you to figure out the why.
a = "Hello there my friend"
b = "".join([[i, "\n"][i == " "] for i in a])
print(b)
Hello
there
my
friend
Some variants you can add to the process which you can't get easily with if-else syntax:
print(b.title()) # b.lower() or b.upper()
Hello
There
My
Friend
def break_words(x):
    x = x + " " # the extra space after x is necessary so that the last word gets printed
    strng = ""
    for i in x: # iterate through the string
        if i != " ": # if char is not a space
            strng = strng+i # append it to another string
        else:
            print(strng) # print that new string
            strng = "" # reset new string
break_words("hell o world")
output:
hell
o
world
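For comparison, the index-based loop from the question can be repaired with two changes: set the start index to i + 1 instead of adding to it, and emit whatever remains after the last space. The following is a minimal sketch with a hypothetical function name, returning the words so that they can be printed one per line.

```python
def words_without_split(phrase):
    words = []
    start = 0  # index where the current word begins
    for i, letter in enumerate(phrase):
        if letter == " ":
            if i > start:              # skip empty words from repeated spaces
                words.append(phrase[start:i])
            start = i + 1              # assign, do not accumulate
    if start < len(phrase):            # the last word has no trailing space
        words.append(phrase[start:])
    return words

for word in words_without_split("Hello there my friend"):
    print(word)
```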

cutting a string at spaces python

I am beginning to learn Python and want to cut a string at the spaces, so 'hello world' becomes 'hello' and 'world'. To do this I want to save the locations of the spaces in a list, but I can't figure out how to do this. In order to find the spaces I do this:
def string_splitting(text):
    i = 0
    for i in range(len(text)):
        if (text[i]==' '):
After saving them in the list I want to display the words with text[:list[1]] (or something like that).
Can anyone help me with the part that saves them in a list? And is this even possible?
(Another way to cut the string is welcome too :-) )
Thanks.
Use split:
"hello world my name is".split(' ')
It will give you a list of strings
Thanks, I tried to do it without the split option; I should have said that in the question.
Anyway, this worked:
def split_stringj(text):
    p = len(text)
    l = [-1] # sentinel so that the first word starts at index 0
    for x in range(p):
        if text[x] == ' ':
            l.append(x) # remember the position of each space
    l.append(p)
    print(l)
    for x in range(len(l)-1):
        print(text[l[x]+1:l[x+1]])

Find string between two substrings [duplicate]

This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed 4 years ago.
How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?
My current method is like this:
>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis
However, this seems very inefficient and un-pythonic. What is a better way to do something like this?
Forgot to mention:
The string might not start and end with start and end. They may have more characters before and after.
import re
s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))
s = "123123STRINGabcabc"
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""
def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""
print(find_between( s, "123", "abc" ))
print(find_between_r( s, "123", "abc" ))
gives:
123STRING
STRINGabc
I thought it should be noted - depending on what behavior you need, you can mix index and rindex calls or go with one of the above versions (it's equivalent of regex (.*) and (.*?) groups).
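As a rough illustration of that remark, the sketch below pairs index with index (stop at the first end marker, like a lazy (.*?) group) and index with rindex (stop at the last end marker, like a greedy (.*) group). The two helper names are hypothetical, and the comparison assumes the text contains no newlines, since . does not match them by default.

```python
import re

s = "123123STRINGabcabc"
first, last = "123", "abc"

def between_shortest(s, first, last):
    # index + index: stop at the first `last` after `first`, like (.*?)
    start = s.index(first) + len(first)
    return s[start:s.index(last, start)]

def between_longest(s, first, last):
    # index + rindex: stop at the final `last`, like (.*)
    start = s.index(first) + len(first)
    return s[start:s.rindex(last, start)]

lazy = re.search(re.escape(first) + "(.*?)" + re.escape(last), s).group(1)
greedy = re.search(re.escape(first) + "(.*)" + re.escape(last), s).group(1)

print(between_shortest(s, first, last), lazy)
print(between_longest(s, first, last), greedy)
```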
start = 'asdf=5;'
end = '123jasd'
s = 'asdf=5;iwantthis123jasd'
print(s[s.find(start)+len(start):s.rfind(end)])
gives
iwantthis
s[len(start):-len(end)]
String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be amended as desired.
import re
s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'
result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
Just converting the OP's own solution into an answer:
def find_between(s, start, end):
    return (s.split(start))[1].split(end)[0]
If you don't want to import anything, try the string method .index():
text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'
# Output: 'string'
print(text[text.index(left)+len(left):text.index(right)])
source='your token _here0#df and maybe _here1#df or maybe _here2#df'
start_sep='_'
end_sep='#df'
result=[]
tmp=source.split(start_sep)
for par in tmp:
    if end_sep in par:
        result.append(par.split(end_sep)[0])
print(result)
must show:
['here0', 'here1', 'here2']
The regex is better, but it requires importing an additional module, and you may want to stick to plain string methods.
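For reference, the regex variant mentioned above could look roughly like this. It is a sketch using re.findall from the standard library with a lazy group, so that each match stops at the nearest end separator; the separators are escaped in case they contain regex metacharacters.

```python
import re

source = 'your token _here0#df and maybe _here1#df or maybe _here2#df'
start_sep = '_'
end_sep = '#df'

# (.*?) is lazy: it stops at the first end_sep after each start_sep.
pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
result = re.findall(pattern, source)
print(result)
```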
Here is one way to do it:
_, _, rest = s.partition(start)
result, _, _ = rest.partition(end)
print(result)
Another way, using a regexp:
import re
print(re.findall(re.escape(start)+"(.*)"+re.escape(end),s)[0])
or
print(re.search(re.escape(start)+"(.*)"+re.escape(end),s).group(1))
Here is a function I did to return a list with a string(s) inbetween string1 and string2 searched.
def GetListOfSubstrings(stringSubject,string1,string2):
    MyList = []
    intstart=0
    strlength=len(stringSubject)
    continueloop = 1
    while(intstart < strlength and continueloop == 1):
        intindex1=stringSubject.find(string1,intstart)
        if(intindex1 != -1): #The substring was found, lets proceed
            intindex1 = intindex1+len(string1)
            intindex2 = stringSubject.find(string2,intindex1)
            if(intindex2 != -1):
                subsequence=stringSubject[intindex1:intindex2]
                MyList.append(subsequence)
                intstart=intindex2+len(string2)
            else:
                continueloop=0
        else:
            continueloop=0
    return MyList
#Usage Example
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y68")
for x in range(0, len(List)):
    print(List[x])
output: (nothing is printed, because "y68" does not occur in mystring)
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","3")
for x in range(0, len(List)):
    print(List[x])
output:
2
2
2
2
mystring="s123y123o123pp123y6"
List = GetListOfSubstrings(mystring,"1","y")
for x in range(0, len(List)):
    print(List[x])
output:
23
23o123pp123
To extract STRING, try:
myString = '123STRINGabc'
startString = '123'
endString = 'abc'
mySubString=myString[myString.find(startString)+len(startString):myString.find(endString)]
You can simply use this code or copy the function below. All neatly in one line.
def substring(whole, sub1, sub2):
    return whole[whole.index(sub1) : whole.index(sub2)]
If you run the function as follows:
print(substring("5+(5*2)+2", "(", ")"))
you will probably be left with the output:
(5*2
rather than
5*2
If you want both substrings to appear at the ends of the output, the code must look like this:
    return whole[whole.index(sub1) : whole.index(sub2) + 1]
But if you don't want the substrings in the output, the +1 must be on the first value (this works for single-character substrings; in general add len(sub1)):
    return whole[whole.index(sub1) + 1 : whole.index(sub2)]
These solutions assume the start string and final string are different. Here is a solution I use for an entire file when the initial and final indicators are the same, assuming the entire file is read using readlines():
def extractstring(line,flag='$'):
    if flag in line: # $ is the flag
        dex1=line.index(flag)
        subline=line[dex1+1:-1] #leave out flag (+1) to end of line
        dex2=subline.index(flag)
        string=subline[0:dex2].strip() #does not include last flag, strip whitespace
        return(string)
Example:
lines=['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
       'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    string=extractstring(line,flag='$')
    print(string)
Gives:
A NEWT?
I GOT BETTER!
This is essentially cji's answer - Jul 30 '10 at 5:58.
I changed the try except structure for a little more clarity on what was causing the exception.
import sys

def find_between( inputStr, firstSubstr, lastSubstr ):
    '''
    find between firstSubstr and lastSubstr in inputStr STARTING FROM THE LEFT
    http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
    above also has a func that does this FROM THE RIGHT
    '''
    start, end = (-1,-1)
    try:
        start = inputStr.index( firstSubstr ) + len( firstSubstr )
    except ValueError:
        print(' ValueError: ', end=' ')
        print("firstSubstr=%s - "%( firstSubstr ), end=' ')
        print(sys.exc_info()[1])
    try:
        end = inputStr.index( lastSubstr, start )
    except ValueError:
        print(' ValueError: ', end=' ')
        print("lastSubstr=%s - "%( lastSubstr ), end=' ')
        print(sys.exc_info()[1])
    return inputStr[start:end]
from timeit import timeit
from re import search, DOTALL
def partition_find(string, start, end):
    return string.partition(start)[2].rpartition(end)[0]
def re_find(string, start, end):
    # applying re.escape to start and end would be safer
    return search(start + '(.*)' + end, string, DOTALL).group(1)
def index_find(string, start, end):
    return string[string.find(start) + len(start):string.rfind(end)]
# The wikitext of the "Alan Turing law" article from English Wikipedia
# https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
string = """..."""
start = '==Proposals=='
end = '==Rival bills=='
assert index_find(string, start, end) \
== partition_find(string, start, end) \
== re_find(string, start, end)
print('index_find', timeit(
'index_find(string, start, end)',
globals=globals(),
number=100_000,
))
print('partition_find', timeit(
'partition_find(string, start, end)',
globals=globals(),
number=100_000,
))
print('re_find', timeit(
're_find(string, start, end)',
globals=globals(),
number=100_000,
))
Result:
index_find 0.35047444528454114
partition_find 0.5327825636197754
re_find 7.552149639286381
re_find was more than 20 times slower than index_find in this example.
My method will be to do something like,
find index of start string in s => i
find index of end string in s => j
substring = substring(i+len(start) to j-1)
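The steps above translate fairly directly into Python. The sketch below uses a hypothetical function name and returns None when either marker is missing, which is an assumption about the desired behavior; the end marker is searched for only after the start marker. The pseudocode's inclusive "to j-1" corresponds to Python's exclusive slice end j.

```python
def substring_between(s, start, end):
    i = s.find(start)                # index of the start string, or -1
    if i == -1:
        return None
    j = s.find(end, i + len(start))  # index of the end string after it, or -1
    if j == -1:
        return None
    return s[i + len(start):j]       # slice end is exclusive, so this stops at j-1

print(substring_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))
```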
This I posted before as code snippet in Daniweb:
# picking up piece of string between separators
# function using partition, like partition, but drops the separators
def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return before,a,after
s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print(between('<a>','</a>',s))
print(between('(',')',s))
print(between("'","'",s))
""" Output:
('bla bla blaa ', 'data', " lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdjöf (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""
Parsing text with delimiters from different email platforms posed a larger-sized version of this problem. They generally have a START and a STOP. Delimiter characters for wildcards kept choking regex. The problem with split is mentioned here & elsewhere - oops, delimiter character gone. It occurred to me to use replace() to give split() something else to consume. Chunk of code:
nuke = '~~~'
start = '|*'
stop = '*|'
julien = (textIn.replace(start,nuke + start).replace(stop,stop + nuke).split(nuke))
keep = [chunk for chunk in julien if start in chunk and stop in chunk]
logging.info('keep: %s',keep)
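To make the replace-then-split trick easier to try out, here is a self-contained sketch of the same idea. The function name and the sample text are hypothetical, and the sketch assumes that the nuke marker never occurs in the input.

```python
def keep_delimited(text, start="|*", stop="*|", nuke="~~~"):
    # Assumption: `nuke` does not appear anywhere in `text`.
    # Put a throwaway marker before each start and after each stop delimiter,
    # so split() consumes the marker and leaves the real delimiters in place.
    marked = text.replace(start, nuke + start).replace(stop, stop + nuke)
    return [chunk for chunk in marked.split(nuke)
            if start in chunk and stop in chunk]

sample = "Hi |*NAME*|, your code is |*CODE*|."
print(keep_delimited(sample))
```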
Following on from Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):
version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    #network_mode: host
    ports:
      - 443:9999
    ulimits:
      nofile:test
and this is how it worked for me (python script):
import re
with open('docker-compose.yml', 'r') as f:
    lines = f.read()
result = re.search('ui:(.*)-', lines)
print(result.group(1))
Result:
0.0.2
This seems much more straightforward to me:
import re
s = 'asdf=5;iwantthis123jasd'
x= re.search('iwantthis',s)
print(s[x.start():x.end()])
