How to get all combinations with repetition in Python - python-3.x

Code example
from itertools import *
from collections import Counter
from tqdm import *
#for i in tqdm(Iterable):
for i in combinations_with_replacement(['1','2','3','4','5','6','7','8'], 8):
    b = (''.join(i))
    if b == '72637721':
        print (b)
When I try product instead, I get:
for i in product(['1','2','3','4','5','7','6','8'], 8):
TypeError: 'int' object is not iterable
How can I get all combinations? (I assumed this worked without testing it, so now I doubt everything I have done.)
I read that combinations_with_replacement returns all of them, but as far as I can see that is not true.
I use Python 3.8.
Output at the start:
11111111 11111112 11111113 11111114 11111115 11111116 11111117
11111118 11111122 11111123 11111124 11111125 11111126 11111127
11111128 11111133 11111134 11111135 11111136 11111137 11111138
11111144 11111145 11111146 11111147 11111148 11111155 11111156
11111157 11111158 11111166 11111167 11111168 11111177 11111178
11111188 11111222 11111223 11111224 11111225 11111226 11111227
11111228 11111233 11111234 11111235 11111236 11111237 11111238
11111244 11111245 11111246 11111247 11111248 11111255 11111256
11111257 11111258 11111266 11111267 11111268 11111277 11111278
11111288
And what it gives at the end:
56666888 56668888 56688888 56888888 58888888 77777777 77777776
77777778 77777766 77777768 77777788 77777666 77777668 77777688
77777888 77776666 77776668 77776688 77776888 77778888 77766666
77766668 77766688 77766888 77768888 77788888 77666666 77666668
77666688 77666888 77668888 77688888 77888888 76666666 76666668
76666688 76666888 76668888 76688888 76888888 78888888 66666666
66666668 66666688 66666888 66668888 66688888 66888888 68888888
88888888
To be clearer: think of it as counting from 1111 1111 to 8888 8888, but over arbitrary characters, which is why I tried permutations/combinations with repetition.
The output misses some possible combinations of those symbols.
For example, I want to generate every possible sequence of hex digits, from 0 to F, but for any set of characters, not only hex digits.
['1','2','3','4','5','6','7','8'] is only an example;
it could just as well be ['a','b','x','c','d','g','r','8'], etc.

The solution is to use itertools.product instead of combinations_with_replacement. Note that product(symbols, 8) raises the TypeError because every positional argument must be an iterable; the length goes in the repeat keyword:
from itertools import *
for i in product(['1','2','3','4','5','6','7','8'], repeat=8):
    b = (''.join(i))
    if b == '72637721':
        print (b)
The four functions differ as follows:
itertools.product ('ABCD', 'ABCD') -> AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD # every ordered pair, repeated elements and mirrored pairs included
itertools.permutations ('ABCD', 2) -> AB AC AD BA BC BD CA CB CD DA DB DC # ordered pairs without repeated elements; mirrored pairs (AB, BA) are both included
itertools.combinations_with_replacement ('ABCD', 2) -> AA AB AC AD BB BC BD CC CD DD # repeated elements allowed, but no mirrored pairs
itertools.combinations ('ABCD', 2) -> AB AC AD BC BD CD # no mirrored pairs and no repeated elements
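To see why combinations_with_replacement misses strings such as 72637721, here is a minimal sketch on a smaller alphabet (three symbols, length two, both chosen only for illustration). It compares the two functions and checks for a string whose characters are not in sorted order:

```python
from itertools import combinations_with_replacement, product

# Small alphabet and length, chosen only to keep the output readable.
symbols = ['1', '2', '3']
length = 2

# product(..., repeat=n) yields every ordered sequence of n symbols.
ordered = [''.join(t) for t in product(symbols, repeat=length)]

# combinations_with_replacement yields each multiset once, in input order,
# so a string like '21' never appears (only '12' does).
unordered = [''.join(t) for t in combinations_with_replacement(symbols, length)]

print(len(ordered))                         # 9, i.e. 3 ** 2
print(len(unordered))                       # 6
print('21' in ordered, '21' in unordered)   # True False
```

With the 8 symbols and length 8 from the question, product yields 8 ** 8 = 16777216 strings, so iterating over it lazily rather than building a list is probably the safer choice.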

Here's updated code that prints all the combinations. It does not matter whether your list contains strings or numbers.
To make sure the combination length always matches the number of elements, I recommend defining:
comb_list = [1, 2, 3, 'a']
comb_len = len(comb_list)
and replacing the combinations line with:
comb = combinations_with_replacement(comb_list, comb_len)
The full code:
from itertools import combinations_with_replacement
comb = combinations_with_replacement([1, 2, 3, 'a'], 4)
for i in list(comb):
    print (''.join([str(j) for j in i]))
This produces the following:
1111
1112
1113
111a
1122
1123
112a
1133
113a
11aa
1222
1223
122a
1233
123a
12aa
1333
133a
13aa
1aaa
2222
2223
222a
2233
223a
22aa
2333
233a
23aa
2aaa
3333
333a
33aa
3aaa
aaaa
I don't know what you are trying to do. Here's an attempt to start a dialogue to get to the final answer:
samples = [1,2,3,4,5,'a','b']
len_samples = len(samples)
for elem in samples:
    print (str(elem)*len_samples)
The output of this will be as follows:
1111111
2222222
3333333
4444444
5555555
aaaaaaa
bbbbbbb
Is this what you want? If not, edit your question to explain what output you expect.

Related

Python : Split string every three words in dataframe

I've been searching around for a while now, but I can't seem to find the answer to this small problem.
I have this code that is supposed to split the string after every three words:
import pandas as pd
import numpy as np
df1 = {
    'State':['Arizona AZ asdf hello abc','Georgia GG asdfg hello def','Newyork NY asdfg hello ghi','Indiana IN asdfg hello jkl','Florida FL ASDFG hello mno']}
df1 = pd.DataFrame(df1,columns=['State'])
df1
def splitTextToTriplet(df):
    text = df['State'].str.split()
    n = 3
    grouped_words = [' '.join(str(text[i:i+n]) for i in range(0,len(text),n))]
    return grouped_words
splitTextToTriplet(df1)
Currently the output is as such:
['0 [Arizona, AZ, asdf, hello, abc]\n1 [Georgia, GG, asdfg, hello, def]\nName: State, dtype: object 2 [Newyork, NY, asdfg, hello, ghi]\n3 [Indiana, IN, asdfg, hello, jkl]\nName: State, dtype: object 4 [Florida, FL, ASDFG, hello, mno]\nName: State, dtype: object']
But I actually expect this output, as 5 rows in a single DataFrame column:
['Arizona AZ asdf', 'hello abc']
['Georgia GG asdfg', 'hello def']
['Newyork NY asdfg', 'hello ghi']
['Indiana IN asdfg', 'hello jkl']
['Florida FL ASDFG', 'hello mno']
How can I change the regex so it produces the expected output?
For efficiency, you can use a regex and str.extractall + groupby/agg:
(df1['State']
    .str.extractall(r'((?:\w+\b\s*){1,3})')[0]
    .groupby(level=0).agg(list)
)
output:
0 [Arizona AZ asdf , hello abc]
1 [Georgia GG asdfg , hello def]
2 [Newyork NY asdfg , hello ghi]
3 [Indiana IN asdfg , hello jkl]
4 [Florida FL ASDFG , hello mno]
regex:
(                # start capturing
  (?:\w+\b\s*)   # one word followed by optional whitespace
  {1,3}          # repeated one to three times, as many as possible
)                # end capturing
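The same pattern can be tried outside pandas with the standard re module. The sketch below is only an illustration; the helper name split_every_three_words is made up here, and the strip() call is an addition that removes the trailing space visible in the output above:

```python
import re

# Hypothetical helper: groups a string into chunks of up to three words.
def split_every_three_words(text):
    # The pattern has no capturing group, so findall returns whole matches.
    chunks = re.findall(r'(?:\w+\b\s*){1,3}', text)
    # Each chunk may end with the whitespace consumed by \s*, so strip it.
    return [chunk.strip() for chunk in chunks]

print(split_every_three_words('Arizona AZ asdf hello abc'))
# ['Arizona AZ asdf', 'hello abc']
```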
You can do:
def splitTextToTriplet(row):
    text = row['State'].split()
    n = 3
    grouped_words = [' '.join(text[i:i+n]) for i in range(0,len(text),n)]
    return grouped_words
df1.apply(lambda row: splitTextToTriplet(row), axis=1)
which gives the following output:
                                   0
0   ['Arizona AZ asdf', 'hello abc']
1  ['Georgia GG asdfg', 'hello def']
2  ['Newyork NY asdfg', 'hello ghi']
3  ['Indiana IN asdfg', 'hello jkl']
4  ['Florida FL ASDFG', 'hello mno']

How to escape a variable whose value is already a string?

Create a sample.csv for the discussion.
cat > sample.csv <<EOF
class;grade
tom:class(3+2);80
tom:class(2+2);90
marry:class(3+2);85
marry:class(2+2);70
EOF
Show the data in sample.csv.
cat sample.csv
class;grade
tom:class(3+2);80
tom:class(2+2);90
marry:class(3+2);85
marry:class(2+2);70
Let's read it with pandas:
import pandas as pd
df = pd.read_csv('sample.csv',sep=';')
df
              class  grade
0    tom:class(3+2)     80
1    tom:class(2+2)     90
2  marry:class(3+2)     85
3  marry:class(2+2)     70
Now I want to select all records whose class field contains the string class(3+2), as below:
tom:class(3+2) 80
marry:class(3+2) 85
This works:
classname = 'class\(3\+2\)'
df[df['class'].str.contains(pat=classname)]
              class  grade
0    tom:class(3+2)     80
2  marry:class(3+2)     85
The difficulty is that classname has already been assigned the value class(3+2):
classname='class(3+2)'
df[df['class'].str.contains(pat=classname)]
This code no longer works. How can I escape the variable classname when its value is already the string class(3+2)?
Note: you can't write classname = 'class\(3\+2\)'; its value is given as classname='class(3+2)'.
Set regex to False:
classname='class(3+2)' # ( ) and + are regex metacharacters; turn regex off to match the literal string
df[df['class'].str.contains(pat=classname, regex=False)]
Out[166]:
              class  grade
0    tom:class(3+2)     80
2  marry:class(3+2)     85
If you insist on using regex for the search, you need to escape the + as well, and use a raw string, like so:
classname = r'class\(3\+2\)'
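If the value arrives in a variable and regex matching is still wanted, the standard library's re.escape can build the escaped pattern at run time. The sketch below uses plain strings instead of a DataFrame so that it runs without pandas; the rows list simply repeats the sample data, and the resulting pattern could presumably be passed to str.contains(pat=pattern) in the same way:

```python
import re

# The value is given as a plain string, as in the question.
classname = 'class(3+2)'

# re.escape backslash-escapes the regex metacharacters ( + ) for us.
pattern = re.escape(classname)

# Sample values copied from the class column of sample.csv.
rows = ['tom:class(3+2)', 'tom:class(2+2)', 'marry:class(3+2)', 'marry:class(2+2)']

# re.search looks for the pattern anywhere in the string.
matches = [row for row in rows if re.search(pattern, row)]
print(matches)  # ['tom:class(3+2)', 'marry:class(3+2)']
```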

Create multiple possible email addresses based on names in Python

Given a dataframe as follows:
firstname lastname email_address \
0 Doug Watson douglas.watson#dignityhealth.org
1 Nick Holekamp nick.holekamp#rankenjordan.org
2 Rob Schreiner rob.schriener#wellstar.org
3 Austin Phillips austin.phillips#precmed.com
4 Elise Geiger egeiger#puracap.com
5 Paul Urick purick#diplomatpharmacy.com
6 Michael Obringer michael.obringer#lashgroup.com
7 Craig Heneghan cheneghan#west-ward.com
8 Kathy Hirst kathleen.hirst#sunovion.com
9 Stefan Bluemmers stefan.bluemmers#grunenthal.com
companyname
0 Dignity Health
1 Ranken Jordan Pediatric Bridge Hospital
2 WellStar Health System
3 Precision Medical Products, Inc.
4 puracap.com
5 Diplomat Specialty Pharmacy
6 Lash Group
7 West-Ward Pharmaceuticals
8 Sunovion Pharmaceuticals
9 Grünenthal Group
How could I create possible email addresses using common email patterns such as: firstlast#example.com, first.last#example.com, f.last#example.com, lastF#example.com, first_last#example.com, firstL#example.com, etc.?
df['email1'] = df.firstname.str.lower() + '.' + df.lastname.str.lower() + '#' + df.companyname.str.replace(r'\s+', '', regex=True).str.lower() + '.com'
print(df['email1'])
Out:
0 doug.watson#dignityhealth.com
1 nick.holekamp#rankenjordanpediatricbridgehospi... --->problematic
2 rob.schreiner#wellstarhealthsystem.com
3 austin.phillips#precisionmedicalproducts,inc..com --->problematic
4 elise.geiger#puracap.com.com --->problematic
...
9995 terry.hanley#kempersportsmanagement.com
9996 christine.marks#geocomp.com
9997 darryl.rickner#doe.com
9998 lalit.sharma#lovelylifestyle.com
9999 parul.dutt#infibeam.com
Some of them seem quite problematic. Could anyone help to solve this issue? Thanks a lot.
EDITED:
print(df) after applying @Sajith Herath's solution:
Out:
firstname lastname companyname \
0 Nick Holekamp Ranken ...
email
0 nick. ...
You can use a function that creates permutations of the username with different separators, and define a maximum length above which the domain is simplified from the company name, as follows:
import pandas as pd
import random
data = {"firstname":["Nick"],"lastname":["Holekamp"],"companyname":["Ranken Jordan Pediatric Bridge Hospital"]}
df = pd.DataFrame(data=data)
max_char = 5
emails = []
def simplify_domain(text):
    # names longer than max_char are abbreviated to their capital letters
    if len(text)>max_char:
        text = ''.join([c for c in text if c.isupper()])
        return text.lower()
    # str.replace does not interpret regex, so drop whitespace with split/join
    return ''.join(text.split()).lower()
def username_permutations(first_name,last_name):
    # define separators
    separators = [".", "_", "-"]
    # lower case
    combinations = list(map(lambda x:f"{first_name.lower()}{x}{last_name.lower()}",separators))
    # append a random number to tail
    n = random.randint(1, 100)
    combinations.extend(list(map(lambda x:f"{x}{n}",combinations)))
    return combinations
for index,row in df.iterrows():
    usernames = username_permutations(row["firstname"],row["lastname"])
    email_permutations = list(map(lambda x: f"{x}#{simplify_domain(row['companyname'])}.com",usernames))
    emails.append(','.join(email_permutations))
df["email"] = emails
The final result will be nick.holekamp#rjpbh.com,nick_holekamp#rjpbh.com,nick-holekamp#rjpbh.com,nick.holekamp66#rjpbh.com,nick_holekamp66#rjpbh.com,nick-holekamp66#rjpbh.com (the numeric suffix is random; it was 66 in this run).
You can modify simplify_domain to clean up the given string further, for example by removing "inc" or ".com".
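One possible shape for that cleanup step is sketched below. The function name clean_company_name and the lists of suffixes and top-level domains are assumptions made for illustration, not part of the answer above, and real company names will likely need a longer list:

```python
import re

# Hypothetical helper: turns a company name into a domain-like label.
def clean_company_name(name):
    name = name.lower()
    # Drop a trailing top-level domain such as ".com" (assumed list).
    name = re.sub(r'\.(com|org|net)$', '', name)
    # Drop a trailing legal suffix such as ", inc." (assumed list).
    name = re.sub(r',?\s*\b(inc|llc|ltd|corp)\.?$', '', name)
    # Keep only letters, digits and hyphens.
    return re.sub(r'[^a-z0-9-]', '', name)

print(clean_company_name('Precision Medical Products, Inc.'))  # precisionmedicalproducts
print(clean_company_name('puracap.com'))                       # puracap
```

This addresses the doubled ".com" and the stray ",inc." from the question's output; it does not shorten long names, which simplify_domain handles separately.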

In a comma delimited String, keep all but second part

I have a bunch of addresses:
123 Main Street, PO Box 345, Chicago, IL 92921
1992 Super Way, Bakersfield, CA
234 Wonderland Lane, Attn: Daffy Duck, Orlando, FL 09922
How could I cut out the second string in there, when I do myStr.split(',') on each?
The idea is that I want to return:
123 Main Street, Chicago, IL 92921
1992 Super Way, CA
234 Wonderland Lane, Orlando, FL 09922
I could loop through each part, and build yet another string, skipping the second index, but was wondering if there's a better way to do so.
What I have now:
def filter_address(address):
    print("Filtering address on",address)
    updated_addr = ""
    indx = 0
    for section in address.split(","):
        if indx != 1:
            updated_addr = updated_addr + "," + section
        indx += 1
    updated_addr = updated_addr[1:] # This is to remove the leading `,`
    return updated_addr
new_address = filter_address("123 Main Street, Chicago, IL 92921")
You could use del in Python and glue the components of the string back together with ", " after splitting them. Split on ", " as well, otherwise each part keeps its leading space and the joined result has double spaces.
For example:
address = "123 Main Street, PO Box 345, Chicago, IL 92921".split(", ")
del address[1]
pretty_address = ", ".join(address)
print(pretty_address) # Gives 123 Main Street, Chicago, IL 92921
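A variant that leaves the original list untouched uses slicing instead of del. This is only a sketch; the function name drop_second_part is made up here, and the parts are stripped so that inconsistent spacing after the commas does not matter:

```python
# Hypothetical helper: removes the second comma-separated part of an address.
def drop_second_part(address):
    # Strip each part so "a,b" and "a, b" are handled the same way.
    parts = [part.strip() for part in address.split(',')]
    # Keep the first part and everything from the third part onwards.
    return ', '.join(parts[:1] + parts[2:])

print(drop_second_part('123 Main Street, PO Box 345, Chicago, IL 92921'))
# 123 Main Street, Chicago, IL 92921
print(drop_second_part('1992 Super Way, Bakersfield, CA'))
# 1992 Super Way, CA
```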

How to split text file like this in python?

N-Heptane 100.20
Hexane 86.17
Hydrochloric Acid 36.47
Hydrogen, H2 2.016
Hydrogen Chloride 36.461
Hydrogen Sulfide 34.076
Hydroxyl, OH 17.01
Krypton 83.80
Methane, CH4 16.044
Methyl Alcohol 32.04
Methyl Butane 72.15
Methyl Chloride 50.488
Natural Gas 19.00
Neon, Ne 20.179
Nitric Oxide, NO 30.006
Nitrogen, N2 28.0134
Nitrous Oxide, NO2 44.012
N-Octane 114.22
Oxygen, O2 31.9988
Ozone 47.998
N-Pentane 72.15
Iso-Pentane 72.15
Propane, C3H8 44.097
Propylene 42.08
The text content looks like this. I'd like to split each line into the molecular formula and the molecular weight,
e.g.
{"Hydrogen, H2":2.016, "Hydrogen Chloride":36.461, etc........}
Simply iterate over each line and use rsplit to take the last whitespace-separated value as the dictionary value. The rest of the line becomes the key.
d = {}
with open(filename) as f:
    for line in f:
        key, value = line.rsplit(None, 1)
        d[key] = float(value)
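The same idea can be tried without a file. The sketch below runs the rsplit step over a few lines copied from the question; the empty string and the blank-line check are additions, since rsplit on an empty line would not yield two values:

```python
# A few sample lines from the question, plus an empty line for illustration.
lines = ['Hydrogen, H2 2.016', 'Hydrogen Chloride 36.461', '', 'N-Heptane 100.20']

weights = {}
for line in lines:
    # Skip blank lines, which cannot be unpacked into a key and a value.
    if not line.strip():
        continue
    # Split once from the right: everything before the last field is the name.
    name, weight = line.rsplit(None, 1)
    weights[name] = float(weight)

print(weights)
# {'Hydrogen, H2': 2.016, 'Hydrogen Chloride': 36.461, 'N-Heptane': 100.2}
```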
