Python 3 unicode ZWJ error with String replace - string

I need to replace ANSI characters with Unicode (Sinhala). I use lists with a loop to do that, as follows:
for i in range (len(charansi)):
    for j in range (len(charUni)):
        s = charansi[i] + ansimod[j]
        v = charUni[i] + modUni[j]
        textSource = textSource.replace(s, v)
If we use n + uu as ANSI input, it should give නූ as Unicode output. Instead, it gives න ූ
To clarify further:
charansi = n
ansimod = uu
charUni = න
modUni = ූ
The න and ූ must join without a space. I thought ZWJ (\u200D) might play a role here, so I tried
v = u"\u200D".join((consonantsUni[i], vowelModifiersUni[j]))
but it gives the same result.
How do I fix this issue?

Your question is a bit confusing, but this simply works:
#coding:utf8
charansi = 'n'
ansimod = 'uu'
charUni = 'න'
modUni = 'ූ'
s = charansi + ansimod  # sample input text, 'nuu'
v = s.replace(charansi+ansimod,charUni+modUni)
print(v)
Output:
නූ
Create a working example of the problem if this isn't what you want.
You could also use the following to make the characters more clear. At least on my browser, the modifier didn't display very well.
charUni = '\N{SINHALA LETTER DANTAJA NAYANNA}'
modUni = '\N{SINHALA VOWEL SIGN DIGA PAA-PILLA}'
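As a rough check that no joiner is needed, here is a minimal sketch that runs the replacement over small lookup tables and prints the code points of the result. The dictionary names and the transliterate helper are made up for illustration; they are not from the question.

```python
# Hypothetical lookup tables; only the one pair from the question is filled in.
consonants = {"n": "\N{SINHALA LETTER DANTAJA NAYANNA}"}
modifiers = {"uu": "\N{SINHALA VOWEL SIGN DIGA PAA-PILLA}"}

def transliterate(text):
    # Replace every consonant+modifier pair with the plain concatenation
    # of the two Unicode characters (no ZWJ in between).
    for c_ascii, c_uni in consonants.items():
        for m_ascii, m_uni in modifiers.items():
            text = text.replace(c_ascii + m_ascii, c_uni + m_uni)
    return text

result = transliterate("nuu")
print(result)
print([f"U+{ord(ch):04X}" for ch in result])  # ['U+0DB1', 'U+0DD6']
```

If the printed code points are just the letter followed by the vowel sign, the string itself is correct, and any visible gap is most likely down to the font or terminal rendering.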

Related

LUA: Generating Unique Mac from given Number Value

I am trying to generate a unique MAC ID from a given number. The number is between 1 and 5 digits long. I have set up a table of MAC templates in which the leading zeros mark where each digit should go, starting from the first character of the MAC.
local MacFormat ={[1] = "0A:BC:DE:FA:BC:DE",[2] = "00:BC:DE:FA:BC:DE",[3] = "00:0C:DE:FA:BC:DE",[4] = "00:00:DE:FA:BC:DE",[5] = "00:00:0E:FA:BC:DE"}
local idNumbers = {[1] = "1",[2]="12",[3]="123",[4]="1234",[5]="12345"}
for w in string.gfind(idNumbers[3], "(%d)") do
    print(w)
    str = string.gsub(MacFormat[3],"0",tonumber(w))
end
print(str)
---output 33:3C:DE:FA:BC:DE
--- Desired Output 12:3C:DE:FA:BC:DE
I have tried multiple Patterns with *, +, ., but none is working.
for w in string.gfind(idNumbers[3], "(%d)") do
    print(w)
    str = string.gsub(MacFormat[3],"0",tonumber(w))
end
print(str)
Your loop body is equivalent to
str = string.gsub("00:0C:DE:FA:BC:DE", "0",1)
str = string.gsub("00:0C:DE:FA:BC:DE", "0", 2)
str = string.gsub("00:0C:DE:FA:BC:DE", "0", 3)
So str is "33:3C:DE:FA:BC:DE"
MacFormat[3] is never altered and the result of gsub is overwritten in each line.
You can build the pattern and replacement dynamically:
local MacFormat ={[1] = "0A:BC:DE:FA:BC:DE",[2] = "00:BC:DE:FA:BC:DE",[3] = "00:0C:DE:FA:BC:DE",[4] = "00:00:DE:FA:BC:DE",[5] = "00:00:0E:FA:BC:DE"}
local idNumbers = {[1] = "1",[2]="12",[3]="123",[4]="1234",[5]="12345"}
local p = "^" .. ("0"):rep(string.len(idNumbers[3])):gsub("(..)", "%1:")
local repl = idNumbers[3]:gsub("(..)", "%1:")
local str = MacFormat[3]:gsub(p, repl)
print(str)
-- => 12:3C:DE:FA:BC:DE
See the online Lua demo.
The pattern is "^" .. ("0"):rep(string.len(idNumbers[3])):gsub("(..)", "%1:"). Here ^ matches the start of the string; ("0"):rep(string.len(idNumbers[3])) produces a run of zeros as long as idNumbers[3]; and :gsub("(..)", "%1:") inserts a : after each pair of zeros.
The replacement is the idNumbers item with a colon inserted after every second char with idNumbers[3]:gsub("(..)", "%1:").
In this current case, the pattern will be ^00:0 and the replacement will be 12:3.
See the full demo here.
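For readers more comfortable with Python, the same idea (build the pattern and the replacement from the digits, then substitute once at the start of the template) might be sketched as follows. The function names are hypothetical, and the sketch assumes the input consists only of decimal digits.

```python
import re

def pair_with_colons(s):
    # Insert ':' after every complete pair of characters,
    # mirroring Lua's gsub("(..)", "%1:").
    return re.sub(r"(..)", r"\1:", s)

def fill_mac(template, digits):
    # Pattern: as many leading zeros as there are digits, with colons in place.
    pattern = "^" + pair_with_colons("0" * len(digits))
    # Replacement: the digits themselves, with colons in the same places.
    replacement = pair_with_colons(digits)
    return re.sub(pattern, replacement, template)

print(fill_mac("00:0C:DE:FA:BC:DE", "123"))  # 12:3C:DE:FA:BC:DE
```

The substitution happens once against the untouched template, which avoids the overwrite problem in the original loop.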

How to convert a string looking like a list to list of floats?

I have this string:
s = '[ 0.00889175 -0.04808848 0.06218296 0.06312469 -0.00700571\n -0.08287739]'
It contains a '\n' character. I want to convert it to a list of floats like this:
l = [0.00889175, -0.04808848, 0.06218296, 0.06312469, -0.00700571, -0.08287739]
I tried this code, which is close to what I want:
l = [x.replace('\n','').strip(' []') for x in s.split(',')]
but it still keeps quotes that I couldn't remove (I tried str.replace("'","") but it didn't work). This is what I get:
['0.00889175 -0.04808848 0.06218296 0.06312469 -0.00700571 -0.08287739']
You were quite close. This will work:
s = '[ 0.00889175 -0.04808848 0.06218296 0.06312469 -0.00700571\n -0.08287739]'
l = [float(n) for n in s.strip("[]").split()]
print(l)
Output:
[0.00889175, -0.04808848, 0.06218296, 0.06312469, -0.00700571, -0.08287739]
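To see why the original attempt produced a single quoted item, it may help to compare the two splits side by side. This is only an illustrative sketch, and the variable names are made up.

```python
s = '[ 0.00889175 -0.04808848 0.06218296 0.06312469 -0.00700571\n -0.08287739]'

# The string contains no commas, so splitting on ',' yields one big piece.
parts_by_comma = s.split(',')
print(len(parts_by_comma))  # 1

# split() with no argument splits on any whitespace, including '\n'.
parts_by_space = s.strip('[]').split()
print(len(parts_by_space))  # 6

# The pieces are still str objects until each one is converted.
floats = [float(p) for p in parts_by_space]
print(floats)
```

The quotes in the original output are just how Python displays str elements; they disappear once each piece is passed through float().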
First, note that as long as you keep the values as str there will be quotes; you need to split the string and cast each element to float.
Here is my solution to your problem:
s='[ 0.00889175 -0.04808848 0.06218296 0.06312469 -0.00700571\n -0.08287739]'
#removing newline \n
new_str = s.replace('\n', '')
#stripping the brackets and extra space
new_str = new_str.strip(' []')
#splitting elements into a list
list_of_floats = new_str.split()
#typecasting from str to float
for _i, element in enumerate(list_of_floats):
    list_of_floats[_i] = float(element)
print(list_of_floats)
#output
#[0.00889175, -0.04808848, 0.06218296, 0.06312469, -0.00700571, -0.08287739]

Getting the largest and smallest word at a string

When I run this code the output is (" "," "), but it should be ("I","love"). There are no errors. What should I do to fix it?
sen="I love dogs"
function Longest_word(sen)
    x=" "
    maxw=" "
    minw=" "
    minl=1
    maxl=length(sen)
    p=0
    for i=1:length(sen)
        if(sen[i]!=" ")
            x=[x[1]...,sen[i]...]
        else
            p=length(x)
            if p<min1
                minl=p
                minw=x
            end
            if p>maxl
                maxl=p
                maxw=x
            end
            x=" "
        end
    end
    return minw,maxw
end
As @David mentioned, another and maybe better solution is to use the split function:
function longest_word(sentence)
    sp=split(sentence)
    len=map(length,sp)
    return (sp[indmin(len)],sp[indmax(len)])
end
The idea of your code is good, but there are a few mistakes.
You can see what's going wrong by debugging a bit. The easiest way to do this is with @show, which prints out the values of variables. When code doesn't work like you expect, this is the first thing to do -- just ask it what it's doing by printing everything out!
E.g. if you put
if(sen[i]!=" ")
    x=[x[1]...,sen[i]...]
    @show x
and run the function with
Longest_word("I love dogs")
you will see that it is not doing what you want it to do, which (I believe) is add the ith letter to the string x.
Note that the ith letter accessed like sen[i] is a character not a string.
You can try converting it to a string with
string(sen[i])
but this gives a Unicode string, not an ASCII string, in recent versions of Julia.
In fact, it would be better not to iterate over the string using
for i in 1:length(sen)
but iterate over the characters in the string (which will also work if the string is Unicode):
for c in sen
Then you can initialise the string x as
x = UTF8String("")
and update it with
x = string(x, c)
Try out some of these possibilities and see if they help.
Also, you have maxl and minl defined wrong initially -- they should be the other way round. Also, the names of the variables are not very helpful for understanding what should happen. And the strings should be initialised to empty strings, "", not a string with a space, " ".
@daycaster is correct that there seems to be a min1 that should be minl.
However, in fact there is an easier way to solve the problem, using the split function, which divides a string into words.
Let us know if you still have a problem.
Here is a working version following your idea:
function longest_word(sentence)
    x = UTF8String("")
    maxw = ""
    minw = ""
    maxl = 0 # counterintuitive! start the "wrong" way round
    minl = length(sentence)
    for i in 1:length(sentence) # or: for c in sentence
        if sentence[i] != ' ' # or: if c != ' '
            x = string(x, sentence[i]) # or: x = string(x, c)
        else
            p = length(x)
            if p < minl
                minl = p
                minw = x
            end
            if p > maxl
                maxl = p
                maxw = x
            end
            x = ""
        end
    end
    return minw, maxw
end
Note that this function does not work if the longest word is at the end of the string. How could you modify it for this case?
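One possible answer to that exercise, sketched here in Python rather than Julia, is to make sure the last word is also "flushed", for example by scanning the sentence with one extra trailing space. The function name is hypothetical, and ties are resolved in favour of the earlier word.

```python
def shortest_and_longest(sentence):
    shortest = longest = None
    current = ""
    # The extra trailing space forces the final word to be processed too.
    for ch in sentence + " ":
        if ch != " ":
            current += ch
        elif current:  # skip empty runs caused by repeated spaces
            if shortest is None or len(current) < len(shortest):
                shortest = current
            if longest is None or len(current) > len(longest):
                longest = current
            current = ""
    return shortest, longest

print(shortest_and_longest("I love dogs"))      # ('I', 'love')
print(shortest_and_longest("I love elephants")) # ('I', 'elephants')
```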

Change Letters in A String One at a Time (Pandas,Python3)

I have a list of words in Pandas (DF)
Words
Shirt
Blouse
Sweater
What I'm trying to do is swap out certain letters in those words with letters from my dictionary one letter at a time.
so for example:
mydict = {"e":"q,w",
"a":"z"}
would create a new list that first replaces all the "e" in a list one at a time, and then iterates through again replacing all the "a" one at a time:
Words
Shirt
Blouse
Sweater
Blousq
Blousw
Swqater
Swwater
Sweatqr
Sweatwr
Swezter
I've been looking around at solutions here: Mass string replace in python?
and have tried the following code, but it changes all instances of "e" instead of doing so one at a time. Any help?
mydict = {"e":"q,w"}
s = DF
for k, v in mydict.items():
    for j in v:
        s['Words'] = s["Words"].str.replace(k, j)
DF["Words"] = s
this doesn't seem to work either:
s = DF.replace({"Words": {"e": "q","w"}})
This answer is very similar to Brian's answer, but a little bit sanitized and the output has no duplicates:
words = ["Words", "Shirt", "Blouse", "Sweater"]
md = {"e": "q,w", "a": "z"}
md = {k: v.split(',') for k, v in md.items()}
newwords = []
for word in words:
    newwords.append(word)
    for c in md:
        occ = word.count(c)
        pos = 0
        for _ in range(occ):
            pos = word.find(c, pos)
            for r in md[c]:
                tmp = word[:pos] + r + word[pos+1:]
                newwords.append(tmp)
            pos += 1
Content of newwords:
['Words', 'Shirt', 'Blouse', 'Blousq', 'Blousw', 'Sweater', 'Swqater', 'Swwater', 'Sweatqr', 'Sweatwr', 'Swezter']
Prettyprint:
Words
Shirt
Blouse
Blousq
Blousw
Sweater
Swqater
Swwater
Sweatqr
Sweatwr
Swezter
Any errors are a result of the current time. ;)
Update (explanation)
tl;dr
The main idea is to find the occurrences of the character in the word one after another. For each occurrence, we replace it with each replacement character (again one after another). Each resulting word gets added to the output list.
I will try to explain everything step by step:
words = ["Words", "Shirt", "Blouse", "Sweater"]
md = {"e": "q,w", "a": "z"}
Well. Your basic input. :)
md = {k: v.split(',') for k, v in md.items()}
A simpler way to deal with the replacement dictionary. md now looks like {"e": ["q", "w"], "a": ["z"]}. Now we don't have to handle "q,w" and "z" differently: the replacement step is the same for both and ignores the fact that "a" has only one replacement character.
newwords = []
The new list to store the output in.
for word in words:
    newwords.append(word)
We have to do those actions for each word (I assume the reason is clear). We also append the word directly to our just-created output list (newwords).
for c in md:
c as short for character. So for each character we want to replace (all keys of md), we do the following stuff.
occ = word.count(c)
occ for occurrences (yeah. count would fit as well :P). word.count(c) returns the number of occurrences of the character/string c in word. So "Sweater".count("o") => 0 and "Sweater".count("e") => 2.
We use this here to know how often we have to look at word to get all those occurrences of c.
pos = 0
Our startposition to look for c in word. Comes into use in the next loop.
for _ in range(occ):
For each occurrence. The loop counter has no value for us here, so we "discard" it by naming it _. At this point we don't know where c is in word. Yet.
pos = word.find(c, pos)
Oh. Look. We found c. :) word.find(c, pos) returns the index of the first occurrence of c in word, starting at pos. At the beginning, this means from the start of the string => the first occurrence of c. But with this call we already update pos. This plus the last line (pos += 1) moves our search window for the next round to start just behind the previous occurrence of c.
for r in md[c]:
Now you see why we updated md previously: we can easily iterate over it now (a md[c].split(',') on the old md would do the job as well). So we are doing the replacement now for each of the replacement characters.
tmp = word[:pos] + r + word[pos+1:]
The actual replacement. We store it in tmp (for debugging reasons). word[:pos] gives us word up to the (current) occurrence of c (excluding c). r is the replacement. word[pos+1:] adds the remaining word (again without c).
newwords.append(tmp)
Our so created new word tmp now goes into our output-list (newwords).
pos += 1
The already mentioned adjustment of pos to "jump over c".
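The walkthrough above can also be condensed into a small helper. This is only a sketch: the function name is made up, it assumes every key is a single character, and it uses enumerate instead of the count/find bookkeeping, which is a different way of locating the same positions.

```python
def single_replacements(word, replacements):
    """Return the variants of word with exactly one character replaced."""
    variants = []
    for char, options in replacements.items():
        for pos, current in enumerate(word):
            if current != char:
                continue
            # One new word per replacement option at this position.
            for option in options:
                variants.append(word[:pos] + option + word[pos + 1:])
    return variants

replacements = {"e": ["q", "w"], "a": ["z"]}
print(single_replacements("Sweater", replacements))
# ['Swqater', 'Swwater', 'Sweatqr', 'Sweatwr', 'Swezter']
```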
Additional question from OP: Is there an easy way to dictate how many letters in the string I want to replace [(meaning e.g. multiple at a time)]?
Surely. But I have currently only a vague idea on how to achieve this. I am going to look at it, when I got my sleep. ;)
words = ["Words", "Shirt", "Blouse", "Sweater", "multipleeee"]
md = {"e": "q,w", "a": "z"}
md = {k: v.split(',') for k, v in md.items()}
num = 2 # this is the number of replaces at a time.
newwords = []
for word in words:
    newwords.append(word)
    for char in md:
        for r in md[char]:
            pos = multiples = 0
            current_word = word
            while current_word.find(char, pos) != -1:
                pos = current_word.find(char, pos)
                current_word = current_word[:pos] + r + current_word[pos+1:]
                pos += 1
                multiples += 1
                if multiples == num:
                    newwords.append(current_word)
                    multiples = 0
                    current_word = word
Content of newwords:
['Words', 'Shirt', 'Blouse', 'Sweater', 'Swqatqr', 'Swwatwr', 'multipleeee', 'multiplqqee', 'multipleeqq', 'multiplwwee', 'multipleeww']
Prettyprint:
Words
Shirt
Blouse
Sweater
Swqatqr
Swwatwr
multipleeee
multiplqqee
multipleeqq
multiplwwee
multipleeww
I added multipleeee to demonstrate how the replacement works: for num = 2, the first two occurrences are replaced, and after them the next two. So there is no overlap between the replaced parts. If you wanted something like ['multiplqqee', 'multipleqqe', 'multipleeqq'], you would have to store the position of the "first" occurrence of char. You can then restore pos to that position in the if multiples == num: block.
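A sketch of that overlapping variant is shown below. Rather than restoring pos, it collects all positions first and slides a window of size num over them, which is a different technique with the same intended result. The function name and signature are hypothetical.

```python
def sliding_replacements(word, char, repl, num):
    # All indices at which char occurs in word.
    positions = [i for i, ch in enumerate(word) if ch == char]
    results = []
    # Slide a window of num consecutive occurrences over the positions.
    for start in range(len(positions) - num + 1):
        chars = list(word)
        for p in positions[start:start + num]:
            chars[p] = repl
        results.append("".join(chars))
    return results

print(sliding_replacements("multipleeee", "e", "q", 2))
# ['multiplqqee', 'multipleqqe', 'multipleeqq']
```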
If you got further questions, feel free to ask. :)
Because you need to replace letters one at a time, this doesn't sound like a good problem to solve with pandas, since pandas is about doing everything at once (vectorized operations). I would dump out your DataFrame into a plain old list and use list operations:
words = DF.to_dict()["Words"].values()
for find, replace in reversed(sorted(mydict.items())):
    for word in words:
        occurences = word.count(find)
        if not occurences:
            print(word)
            continue
        start_index = 0
        for i in range(occurences):
            for replace_char in replace.split(","):
                modified_word = list(word)
                index = modified_word.index(find, start_index)
                modified_word[index] = replace_char
                modified_word = "".join(modified_word)
                print(modified_word)
            start_index = index + 1
Which gives:
Words
Shirt
Blousq
Blousw
Swqater
Swwater
Sweatqr
Sweatwr
Words
Shirt
Blouse
Swezter
Instead of printing the words, you can append them to a list and re-create a DataFrame if that's what you want to end up with.
If you are looping, you need to update s at each cycle of the loop. You also need to loop over v.
mydict = {"e":"q,w"}
s=deduped
for k, v in mydict.items():
    for j in v:
        s = s.replace(k, j)
Then reassign it to your dataframe:
df["Words"] = s
If you can write this as a function that takes in a 1-D array (list, NumPy array, etc.), you can apply it to any column with df.apply().

MATLAB generate combination from a string

I have a string like "FBECGHD" and I need to generate all of its possible permutations in MATLAB. Is there a specific MATLAB function that does this, or should I define a custom function for it?
Use the perms function. A string in MATLAB is an array of characters, so it will permute them:
A = 'FBECGHD';
perms(A)
You can also store the output (e.g. P = perms(A)), and, if A is an N-character string, P is a N!-by-N array, where each row corresponds to a permutation.
If you are interested in unique permutations, you can use:
unique(perms(A), 'rows')
to remove duplicates (otherwise something like 'ABB' would give 6 results, instead of the 3 that you might expect).
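For comparison only, the same "all permutations, then drop duplicates" idea might look like this in Python, using the standard library's itertools.permutations. The helper name is made up, and the results are sorted purely to make the output predictable.

```python
from itertools import permutations

def unique_permutations(text):
    # permutations() treats equal letters at different positions as distinct,
    # so a set is used to drop the duplicate arrangements.
    return sorted({"".join(p) for p in permutations(text)})

print(unique_permutations("ABB"))           # ['ABB', 'BAB', 'BBA']
print(len(unique_permutations("FBECGHD")))  # 5040, i.e. 7!
```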
As Richante answered, P = perms(A) is very handy for this. You may also notice that P is of type char and it's not convenient to subset/select individual permutation. Below worked for me:
str = 'FBECGHD';
A = perms(str);
B = cellstr(reshape(A,7,[])');
C = unique(B);
Note that unique(A, 'rows') compares entire rows, so it does not remove duplicate values within a single row:
>> A=[11, 11];
>> unique(A, 'rows')
ans =
11 11
However, unique(A) would:
>> unique(A)
ans =
11
I am not a MATLAB pro by any means and I didn't investigate this exhaustively, but at least in some cases it appears that reshape is not what you want. Notice that the code below gives 999 and 191 as permutations of 199, which isn't true. The reshape call as written appears to operate column-wise on A:
>> str = '199';
A = perms(str);
B = cellstr(reshape(A,3,[])');
C = unique(B);
>> C
C =
'191'
'199'
'911'
'919'
'999'
Below does not produce 999 or 191:
B = {};
index = 1;
while true
    try
        substring = A(index,:);
        B{index}=substring;
        index = index + 1;
    catch
        break
    end
end
C = unique(B)
C =
'199' '919' '991'
