Generate custom alpha numeric sequence - python-3.x

I am trying to generate a custom alphanumeric sequence.
The sequence should look like this:
AA0...AA9 AB0...AB9 AC0...AC9 and so on.
In short, there are three positions to fill:
The first position can take any value from A to Z.
The second position can take any value from A to Z.
The last position can take any value from 0 to 9.
Code:
s = list('AA0')
for i in range(26):
    for j in range(26):
        for k in range(10):
            if k < 10:
                print(s[0] + s[1] + str(k))
        s[1] = chr(ord(s[1]) + 1)
    s[0] = chr(ord(s[0]) + 1)
I was able to generate the sequence up to AZ9, but after that I get the output below.
It should continue with BA0...BZ9.
B[0
B[1
B[2
B[3
B[4
B[5
B[6
B[7
B[8
B[9
B\0
B\1
B\2
B\3
B\4
B\5
B\6

This is one way to do it:
from itertools import product
from string import ascii_uppercase, digits
for a, b, d in product(ascii_uppercase, ascii_uppercase, digits):
    print(f'{a}{b}{d}')
string.ascii_uppercase is just 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' and string.digits is '0123456789'; itertools.product then iterates over all combinations, with the rightmost argument varying fastest.
Instead of digits you could use range(10) just as well.
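For what it's worth, the loop in the question goes wrong because s[1] is never reset to 'A' once the inner loops finish, so chr(ord('Z') + 1) yields '[' on the next pass. Below is a minimal sketch of the same nested-loop idea that iterates over the letters directly; the name make_sequence is only illustrative and does not come from the question.

```python
from string import ascii_uppercase


def make_sequence():
    """Return every code from AA0 to ZZ9 in order (illustrative helper)."""
    codes = []
    for first in ascii_uppercase:        # first position: A..Z
        for second in ascii_uppercase:   # second position restarts at A each time
            for digit in range(10):      # last position: 0..9
                codes.append(f'{first}{second}{digit}')
    return codes


codes = make_sequence()
# Show the start of the sequence, the AZ9 -> BA0 transition, and the total count.
print(codes[:3], codes[259:261], len(codes))
```

Because the second letter comes from a fresh loop each time, there is nothing to reset by hand, which is essentially what itertools.product does for you.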

You can use itertools.product:
>>> import itertools
>>> letters = [chr(x) for x in range(ord('A'), ord('Z')+1)]
>>> letters
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
>>> combinations = ["".join(map(str, x)) for x in itertools.product(letters, letters, range(10))]
>>> combinations
['AA0', 'AA1', 'AA2', 'AA3', 'AA4', 'AA5', 'AA6', 'AA7', 'AA8', 'AA9', 'AB0', 'AB1', 'AB2', 'AB3', 'AB4', 'AB5', 'AB6', 'AB7', 'AB8', 'AB9', 'AC0', 'AC1', 'AC2', 'AC3', 'AC4', 'AC5', 'AC6', 'AC7', 'AC8', 'AC9', 'AD0', 'AD1', 'AD2', 'AD3', 'AD4', 'AD5', 'AD6', 'AD7', 'AD8', 'AD9', 'AE0', 'AE1', 'AE2', 'AE3', 'AE4', 'AE5', 'AE6', 'AE7', 'AE8', 'AE9', 'AF0', 'AF1', 'AF2', 'AF3', 'AF4', 'AF5', 'AF6', 'AF7', 'AF8', 'AF9', 'AG0', 'AG1', 'AG2', 'AG3', 'AG4', 'AG5', 'AG6', 'AG7', 'AG8', 'AG9', 'AH0', 'AH1', 'AH2', 'AH3', 'AH4', 'AH5', 'AH6', 'AH7', 'AH8', 'AH9', 'AI0', 'AI1', 'AI2', 'AI3', 'AI4', 'AI5', 'AI6', 'AI7', 'AI8', 'AI9', 'AJ0', 'AJ1', 'AJ2', 'AJ3', 'AJ4', 'AJ5', 'AJ6', 'AJ7', 'AJ8', 'AJ9', 'AK0'...]

Related

Remove redundant sublists within list in python

Hello everyone, I have a list of lists such as:
list_of_values=[['A','B'],['A','B','C'],['D','E'],['A','C'],['I','J','K','L','M'],['J','M']]
and I would like to keep only the sublists that are not fully contained in another, longer sublist.
For instance, in sublist1 ['A','B'], A and B are also present in sublist2 ['A','B','C'], so I remove sublist1.
The same goes for sublist4.
Sublist6 is also removed because J and M are present in the longer sublist5.
At the end I should get:
list_of_no_redundant_values=[['A','B','C'],['D','E'],['I','J','K','L','M']]
Another example:
list_of_values=[['A','B'],['A','B','C'],['B','E'],['A','C'],['I','J','K','L','M'],['J','M']]
Expected output:
[['A','B','C'],['B','E'],['I','J','K','L','M']]
Does anyone have an idea?
mylist=[['A','B'],['A','C'],['A','B','C'],['D','E'],['I','J','K','L','M'],['J','M']]
def remove_subsets(lists):
    outlists = lists[:]
    for s1 in lists:
        for s2 in lists:
            if set(s1).issubset(set(s2)) and (s1 is not s2):
                outlists.remove(s1)
                break
    return outlists
print(remove_subsets(mylist))
This should result in [['A', 'B', 'C'], ['D', 'E'], ['I', 'J', 'K', 'L', 'M']]
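A possible variation, sketched under the assumption that exact duplicates should be kept rather than removed: build each set once and drop only the sublists that are a strict subset of another one. The name drop_strict_subsets is made up for this example.

```python
def drop_strict_subsets(lists):
    """Keep, in original order, every sublist that is not a strict subset of another."""
    sets = [set(item) for item in lists]  # convert once instead of inside the loops
    kept = []
    for i, current in enumerate(sets):
        # '<' on sets means "strict subset", so two equal sublists never remove each other
        dominated = any(current < other for j, other in enumerate(sets) if j != i)
        if not dominated:
            kept.append(lists[i])
    return kept


second_example = [['A', 'B'], ['A', 'B', 'C'], ['B', 'E'], ['A', 'C'],
                  ['I', 'J', 'K', 'L', 'M'], ['J', 'M']]
result = drop_strict_subsets(second_example)
print(result)
```

For the second example from the question this keeps ['B', 'E'], since it shares B with another sublist but is not contained in it.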

Python program that extracts most frequently found names from .csv file

I have created a program that generates 5000 random names, ssn, city, address, and email and stored them in fakeprofile.csv file. I am trying to extract the most common names from the file. I was able to get the program to work syntactically but fail to extract frequent names.
Here's the code:
import re
import statistics
file_open = open('fakeprofile.csv').read()
frequent_names = re.findall('[A-Z][a-z]*', file_open)
print(frequent_names)
Sample in the file:
Alicia Walters 419-52-4141 Yorkstad 66616 Schultz Extensions Suite 225
Reynoldsmouth, VA 72465 stevenserin#stein.biz
Nicole Duffy 212-38-9009 West Timothy 51077 Phillips Ports Apt. 314
Hubbardville, IN 06723 kaitlinthomas#bennett-carter.com
Stephanie Lewis 442-20-1279 Jacquelineshire 650 Gutierrez Forge Apt. 839
West Christianbury, TN 13654 ukelley#gmail.com
Michael Harris 108-81-3733 East Toddberg 14387 Douglas Mission Suite 038
Garciaview, WI 58624 kshields#yahoo.com
Aaron Moreno 171-30-7715 Port Taraburgh 56672 Wagner Path
Lake Christopher, VA 37884 lucasscott#nguyen.info
Alicia Zimmerman 286-88-9507 Barberstad 5365 Heath Extensions Apt. 731
South Randyburgh, NJ 79367 daniellewebb#yahoo.com
Brittney Mcmillan 334-44-0321 Lisahaven PSC 3856, Box 2428
APO AE 03215 kevin95#hotmail.com
Amanda Perkins 327-31-6610 Perryville 8750 Hurst Harbor Apt. 929
Sample output:
', 'Lake', 'Brianna', 'P', 'A', 'Michael', 'Smith', 'Harveymouth', 'Patricia', 'Tunnel', 'West', 'William', 'G', 'A', 'Charles', 'Perkins', 'Lake', 'Marie', 'Lisa', 'Overpass', 'Suite', 'Kennedymouth', 'C', 'A', 'Barbara', 'Perez', 'Billyshire', 'Joshua', 'Village', 'Cindymouth', 'W', 'I', 'Curtis', 'Simmons', 'North', 'Mitchellport', 'Gordon', 'Crest', 'Suite', 'Jacksonburgh', 'C', 'O', 'Cameron', 'Berg', 'South', 'Dean', 'Christina', 'Coves', 'Williamton', 'T', 'N', 'Maria', 'Williams', 'North', 'Judith', 'Carson', 'Overpass', 'Apt', 'West', 'Amandastad', 'N', 'M', 'Hannah', 'Dennis', 'Rodriguezmouth', 'P', 'S', 'C', 'Box', 'A', 'P', 'O', 'A', 'E', 'Laura', 'Richardson', 'Lake', 'Kayla', 'Johnson', 'Place', 'Suite', 'Port', 'Jennifermouth', 'N', 'H', 'John', 'Lawson', 'Hintonhaven', 'Thomas', 'Via', 'Mossport', 'N', 'J', 'Jennifer', 'Hill', 'East', 'Phillip', 'P', 'S', 'C', 'Box', 'A', 'P', 'O', 'A', 'E', 'Cody', 'Jackson', 'Lake', 'Jessicamouth', 'Snyder', 'Ways', 'Apt', 'New', 'Stacey', 'M', 'E', 'Ryan', 'Friedman', 'Shahburgh', 'Jerry', 'Pike', 'Suite', 'Toddfort', 'N', 'V', 'Kathleen', 'Fox', 'Ferrellmouth', 'P', 'S', 'C', 'Box', 'A', 'P', 'O', 'A', 'P', 'Michael', 'Thompson', 'Port', 'Jessica', 'Boone', 'Spurs', 'Suite', 'Port', 'Ashleyland', 'C', 'O', 'Christopher', 'Marsh', 'North', 'Catherine', 'Scott', 'Trail', 'Apt', 'Baileyburgh', 'F', 'L', 'Richard', 'Rangel', 'New', 'Anna', 'Ray', 'Drive', 'Apt', 'Nunezland', 'I', 'A', 'Connor', 'Stanton', 'Troyshire', 'Rodgers', 'Hill', 'West', 'Annmouth', 'N', 'H', 'James', 'Medina',
My issue here is being unable to extract most frequently found first names as well as avoiding those capital letters. Instead, I have extracted all names (including the unnecessary capital letters) and the one seen above is a small sample of all names extracted. I noticed that the first names are always on the odd rows in the output, and I am trying to capture the most frequent first names in those odd rows.
The fakeprofile.csv file was created by this program:
import csv
import faker
from faker import Faker
fake = Faker()
name = fake.name(); print(name)
ssn = fake.ssn(); print(ssn)
city = fake.city(); print(city)
address = fake.address(); print(address)
email = fake.email(); print(email)
profile = fake.simple_profile()
for i, j in profile.items():
    print('{}: {}'.format(i, j))
print('Name: {}, SSN: {}, City: {}, Address: {}, Email: {}'.format(name, ssn, city, address, email))
with open('fakeprofile.csv', 'w') as f:
    for i in range(0, 5001):
        print(f'{fake.name()} {fake.ssn()} {fake.city()} {fake.address()} {fake.email()}', file=f)
Does this achieve what you want?
import collections, re
# Read in all lines into a list
with open('fakeprofile.csv') as f:
    lines = f.readlines()
# Throw out every other line
lines = [line for i, line in enumerate(lines) if i%2 == 0]
# Keep only first word of each line
names = [line.split()[0] for line in lines]
# Find most common names
n = 3
frequent_names = collections.Counter(names).most_common(n)
# Display most common names
for name, count in frequent_names:
    print(name, count)
To do the counting it uses collections.Counter together with its most_common() method.
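To see the counting step in isolation, here is a small sketch that runs on a few lines copied from the sample in the question instead of the real file. It assumes, as the sample suggests, that every record occupies exactly two lines; the variable names are only illustrative.

```python
from collections import Counter

# Three records from the question's sample, two lines each.
sample_lines = [
    "Alicia Walters 419-52-4141 Yorkstad 66616 Schultz Extensions Suite 225",
    "Reynoldsmouth, VA 72465 stevenserin#stein.biz",
    "Nicole Duffy 212-38-9009 West Timothy 51077 Phillips Ports Apt. 314",
    "Hubbardville, IN 06723 kaitlinthomas#bennett-carter.com",
    "Alicia Zimmerman 286-88-9507 Barberstad 5365 Heath Extensions Apt. 731",
    "South Randyburgh, NJ 79367 daniellewebb#yahoo.com",
]

# Every other line starts a record; its first word is the first name.
first_names = [line.split()[0] for line in sample_lines[::2]]
counts = Counter(first_names)
top = counts.most_common(1)
print(top)  # [('Alicia', 2)]
```

most_common(n) returns a list of (name, count) tuples sorted by descending count, which is why the answer can unpack them directly in its for loop.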
I think it would be better to use the pandas library for the CSV manipulation (collecting the desired information) and then apply a Python collection such as Counter(df['name']) to it. Otherwise, could you give us more information about the CSV file?
Thank you.
So the main problem is that your regexp captures every capitalized word, including single capital letters.
You are only interested in the first word of each odd line.
You can do something along these lines:
# either use a dict to count or a list to transform as counter.
dico_count = {}
with open('fakeprofile.csv') as file_open:  # use of context manager
    # enumerate from 1 so that line_number actually advances with each line
    for line_number, line in enumerate(file_open, start=1):  # iterates over all the lines
        if line_number % 2 != 0:  # odd line
            spt = line.strip().split()
            dico_count[spt[0]] = dico_count.get(spt[0], 0) + 1
# (name, count) pairs sorted by descending count
frequent_name_counter = sorted(dico_count.items(), key=lambda x: x[1], reverse=True)

Any reason not to convert string to list this way?

I'm using PyCharm on Windows (and very new to Python)
I'm a 'what happens when I try this?' person and so I tried:
alist = []
alist += 'wowser'
which returns ['w', 'o', 'w', 's', 'e', 'r']
Is there any reason not to convert a string to a list of individual characters like this? I know I could use For loop method OR I could .append or +concatenate (both seem to be too tedious!!), but I can't find anything that mentions using += to do this. So, since I'm new, I figure I should ask why not to do it this way before I develop a bad habit that will get me into trouble in the future.
Thanks for your help!
I think this would help: Why does += behave unexpectedly on lists?
As for the question "Is there any reason not to convert a string to a list of individual characters like this": I think it depends on your purpose. It is quite convenient if you need to split the string into letters. If you don't want to split the letters, just don't use it.
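To make the behaviour concrete, here is a short sketch comparing += with the alternatives. For lists, += extends the list in place with the items of any iterable, much like extend, whereas plain + only accepts another list. The variable names are just for illustration.

```python
alist = []
alias = alist            # second name for the same list object
alist += 'wowser'        # in-place: behaves like alist.extend('wowser')

blist = []
blist.extend('wowser')   # the explicit spelling of the same operation

clist = list('wowser')   # the direct conversion

try:
    [] + 'wowser'        # plain + requires both operands to be lists
    plus_error = None
except TypeError as exc:
    plus_error = exc

print(alist == blist == clist)     # True
print(alias is alist)              # True: += did not create a new list
print(type(plus_error).__name__)   # TypeError
```

So the trick works, but list('wowser') says what it does more directly, and += will silently split any string you meant to add as a single item.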
A string is a sequence type, so it can be indexed and iterated much like a list (although, unlike a list, it is immutable).
>>> # This way you would do it with a list:
>>> list('wowser')
['w', 'o', 'w', 's', 'e', 'r']
>>> lst=list('wowser')
>>> a='w'
>>> a is lst[0]
True
>>> # The String Version:
>>> strng = 'wowser'
>>> a is strng[0]
True
>>> # Iterate over the string like doing it with lists:
>>> [print(char) for char in 'wowser']
w
o
w
s
e
r
[None, None, None, None, None, None]
>>> [print(char) for char in ['w', 'o', 'w', 's', 'e', 'r']]
w
o
w
s
e
r
[None, None, None, None, None, None]
w3schools.com
docs.python.org

python3 split comma separated string ignoring comma within quotes [duplicate]

I have some input that looks like the following:
A,B,C,"D12121",E,F,G,H,"I9,I8",J,K
The comma-separated values can be in any order. I'd like to split the string on commas; however, in the case where something is inside double quotation marks, I need it to both ignore commas and strip out the quotation marks (if possible). So basically, the output would be this list of strings:
['A', 'B', 'C', 'D12121', 'E', 'F', 'G', 'H', 'I9,I8', 'J', 'K']
I've had a look at some other answers, and I'm thinking a regular expression would be best, but I'm terrible at coming up with them.
Lasse is right; it's a comma separated value file, so you should use the csv module. A brief example:
from csv import reader
# test
infile = ['A,B,C,"D12121",E,F,G,H,"I9,I8",J,K']
# real is probably like
# infile = open('filename', 'r')
# or use 'with open(...) as infile:' and indent the rest
for line in reader(infile):
    print(line)
# for the test input, prints
# ['A', 'B', 'C', 'D12121', 'E', 'F', 'G', 'H', 'I9,I8', 'J', 'K']
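If the input is a single string rather than a file, one way to apply the same idea in Python 3 is to wrap the string in a list and take the first row from the reader. This is only a sketch; the variable names are illustrative.

```python
import csv

line = 'A,B,C,"D12121",E,F,G,H,"I9,I8",J,K'

# csv.reader accepts any iterable of strings, so a one-element list is enough.
fields = next(csv.reader([line]))
print(fields)
# ['A', 'B', 'C', 'D12121', 'E', 'F', 'G', 'H', 'I9,I8', 'J', 'K']
```

The reader both keeps the comma inside "I9,I8" and strips the surrounding quotation marks, so no regular expression is needed.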

Is there any way to force ipython to interpret utf-8 symbols?

I'm using ipython notebook.
What I want to do is search a literal string for any Spanish accented letters (ñ,á,é,í,ó,ú,Ñ,Á,É,Í,Ó,Ú) and change them to their closest representation in the English alphabet.
I decided to write a simple function and give it a go:
def remove_accent(n):
    listn = list(n)
    for i in range(len(listn)):
        if listn[i] == 'ó':
            listn[i] = 'o'
    return listn
It seemed simple: check whether the accented character is there and change it to its closest representation. So I went ahead and tested it, getting the following output:
in []: remove_accent('whatever !## ó')
out[]: ['w',
'h',
'a',
't',
'e',
'v',
'e',
'r',
' ',
'!',
'#',
'#',
' ',
'\xc3',
'\xb3']
I've tried to change the default encoding from ASCII (which I presume is in use, since I'm getting two positions, '\xc3' and '\xb3', for the accented character instead of one) to UTF-8, but this didn't work. What I would like to get is:
in []: remove_accent('whatever !## ó')
out[]: ['w',
'h',
'a',
't',
'e',
'v',
'e',
'r',
' ',
'!',
'#',
'#',
' ',
'o']
P.S.: This wouldn't be so bad if the accented character yielded just one position instead of two; I would just need to change the if condition, but I haven't found a way to do that either.
Your problem is that you are getting two bytes for the 'ó' character instead of one character. Therefore, try decoding the string to unicode first, so that every character occupies exactly one position, as follows (Python 2):
def remove_accent(n):
    n_unicode = unicode(n, "UTF-8")
    listn = list(n_unicode)
    for i in range(len(listn)):
        if listn[i] == u'ó':
            listn[i] = 'o'.encode('utf-8')
        else:
            listn[i] = listn[i].encode('utf-8')
    return listn
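For comparison, here is a sketch of how the same task might look in Python 3, where str is already Unicode and 'ó' is a single character. It uses a different technique from the answer above: the text is decomposed with unicodedata.normalize and the combining marks are dropped. The function name remove_accents is made up for this example, and it returns a string rather than a list.

```python
import unicodedata


def remove_accents(text):
    """Strip combining marks so that accented letters fall back to their base letter."""
    # NFD splits 'ó' into 'o' followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize('NFD', text)
    # unicodedata.combining() is non-zero for combining marks, which are skipped.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_accents('whatever !## ó'))   # whatever !## o
print(remove_accents('ñáéíóúÑÁÉÍÓÚ'))     # naeiouNAEIOU
```

This covers all the letters listed in the question without one comparison per character, at the cost of also stripping marks from any other accented letters in the text.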
