python3 split comma separated string ignoring comma within quotes [duplicate] - python-3.x

I have some input that looks like the following:
A,B,C,"D12121",E,F,G,H,"I9,I8",J,K
The comma-separated values can be in any order. I'd like to split the string on commas; however, in the case where something is inside double quotation marks, I need it to both ignore commas and strip out the quotation marks (if possible). So basically, the output would be this list of strings:
['A', 'B', 'C', 'D12121', 'E', 'F', 'G', 'H', 'I9,I8', 'J', 'K']
I've had a look at some other answers, and I'm thinking a regular expression would be best, but I'm terrible at coming up with them.

Lasse is right; it's a comma separated value file, so you should use the csv module. A brief example:
from csv import reader
# test
infile = ['A,B,C,"D12121",E,F,G,H,"I9,I8",J,K']
# real is probably like
# infile = open('filename', 'r')
# or use 'with open(...) as infile:' and indent the rest
for line in reader(infile):
    print(line)
# for the test input, prints
# ['A', 'B', 'C', 'D12121', 'E', 'F', 'G', 'H', 'I9,I8', 'J', 'K']
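If the input is a single string rather than a file, the same approach should still apply, since csv.reader accepts any iterable of lines. A minimal sketch, assuming Python 3 (the helper name split_quoted is made up here for illustration and is not part of the csv module):

```python
import csv

def split_quoted(text):
    """Split one comma-separated line, honouring double-quoted fields."""
    # csv.reader expects an iterable of lines, so wrap the string in a list
    # and take the first (and only) parsed row.
    return next(csv.reader([text]))

fields = split_quoted('A,B,C,"D12121",E,F,G,H,"I9,I8",J,K')
print(fields)
```

The quotation marks are stripped and the comma inside "I9,I8" is kept, which is what the question asks for.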

Related

Duplicating one element of a list

I'm trying to duplicate just one element of a list.
list=['a','b','c','d','e','f']
So let's say I just want to duplicate the letter 'a'.
It would look like this:
list=['a','a','b','c','d','e','f']
Try to use insert:
list=['a','b','c','d','e','f']
list.insert(1,'a')
print(list)
Here, 1 is the index at which the new element is inserted.
You can use list.index() to find out where the element is located and then list.insert() to insert it.
For example:
lst = ['a', 'b', 'c', 'd', 'e', 'f']

def duplicate_element(lst, elem):
    # Insert a copy of elem just before its first occurrence (modifies lst in place)
    lst.insert(lst.index(elem), elem)

duplicate_element(lst, 'a')
print(lst)
Prints:
['a', 'a', 'b', 'c', 'd', 'e', 'f']
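If the original list should stay untouched, a variant based on slicing may be preferable. This is only a sketch; the function name duplicated is an assumption introduced here, not something from the answers above:

```python
def duplicated(items, elem):
    """Return a new list with elem repeated right after its first occurrence."""
    i = items.index(elem)  # raises ValueError if elem is not in the list
    return items[:i + 1] + [elem] + items[i + 1:]

letters = ['a', 'b', 'c', 'd', 'e', 'f']
result = duplicated(letters, 'c')
print(result)   # new list with 'c' doubled
print(letters)  # the original list is unchanged
```

Like list.index() in the answer above, this only handles the first occurrence of the element.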

Any reason not to convert string to list this way?

I'm using PyCharm on Windows (and very new to Python)
I'm a 'what happens when I try this?' person and so I tried:
alist = []
alist += 'wowser'
which returns ['w', 'o', 'w', 's', 'e', 'r']
Is there any reason not to convert a string to a list of individual characters like this? I know I could use a for loop, .append, or + concatenation (all of which seem too tedious!), but I can't find anything that mentions using += to do this. So, since I'm new, I figure I should ask why not to do it this way before I develop a bad habit that will get me into trouble in the future.
Thanks for your help!
I think this would help: Why does += behave unexpectedly on lists?
About the question "Is there any reason not to convert a string to a list of individual characters like this": I think it depends on your purpose. It is quite convenient if you need to split the string into letters. If you don't want to split the letters, just don't use it.
A string is a sequence, so it can be indexed and iterated over just like a list. For a list, += behaves like list.extend(), which accepts any iterable; that is why each character ends up as a separate element.
>>> # This is how you would do it with list():
>>> list('wowser')
['w', 'o', 'w', 's', 'e', 'r']
>>> lst = list('wowser')
>>> a = 'w'
>>> a == lst[0]
True
>>> # The string version:
>>> strng = 'wowser'
>>> a == strng[0]
True
>>> # Iterate over the string just as you would over a list:
>>> for char in 'wowser':
...     print(char)
...
w
o
w
s
e
r
>>> for char in ['w', 'o', 'w', 's', 'e', 'r']:
...     print(char)
...
w
o
w
s
e
r
w3schools.com
docs.python.org
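One practical reason for caution is that += splits any string it is given, which can be surprising when the intent is to add a whole word to an existing list. The sketch below, assuming Python 3, contrasts +=, extend() and plain +; the variable names are arbitrary:

```python
alist = []
alist += 'wowser'        # same effect as alist.extend('wowser')
print(alist)

blist = []
blist.extend('wowser')
print(alist == blist)    # both lists hold the six single characters

try:
    clist = [] + 'wowser'    # plain + only concatenates a list with a list
except TypeError as exc:
    print('TypeError:', exc)

words = ['hello']
words += 'hi'            # adds 'h' and 'i' separately, not the word 'hi'
print(words)
```

When the goal is just a list of characters, list('wowser') states the intent more directly; when the goal is to add one string as a single element, append() is the safer choice.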

Generate custom alpha numeric sequence

I am trying to generate a custom alphanumeric sequence.
The sequence would look like this:
AA0...AA9 AB0...AB9 AC0...AC9 and so on.
In short, there are three positions to fill:
In the first position, the values can go from A to Z.
In the second position, the values can go from A to Z.
In the last position, the values can go from 0 to 9.
Code:
s = list('AA0')
for i in range(26):
    for j in range(26):
        for k in range(10):
            if k < 10:
                print(s[0] + s[1] + str(k))
        s[1] = chr(ord(s[1]) + 1)
    s[0] = chr(ord(s[0]) + 1)
I was able to generate the sequence up to AZ9, but after that I get the sequence below, whereas it should continue with BA0...BZ9:
B[0
B[1
B[2
B[3
B[4
B[5
B[6
B[7
B[8
B[9
B\0
B\1
B\2
B\3
B\4
B\5
B\6
This is a way to do just that:
from itertools import product
from string import ascii_uppercase, digits
for a, b, d in product(ascii_uppercase, ascii_uppercase, digits):
    print(f'{a}{b}{d}')
string.ascii_uppercase is just 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'; string.digits is '0123456789' and itertools.product then iterates over all combinations.
Instead of digits you could use range(10) just as well.
You can use itertools.product:
>>> import itertools
>>> letters = [chr(x) for x in range(ord('A'), ord('Z')+1)]
>>> letters
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
>>> combinations = ["".join(map(str, x)) for x in itertools.product(letters, letters, range(10))]
>>> combinations
['AA0', 'AA1', 'AA2', 'AA3', 'AA4', 'AA5', 'AA6', 'AA7', 'AA8', 'AA9', 'AB0', 'AB1', 'AB2', 'AB3', 'AB4', 'AB5', 'AB6', 'AB7', 'AB8', 'AB9', 'AC0', 'AC1', 'AC2', 'AC3', 'AC4', 'AC5', 'AC6', 'AC7', 'AC8', 'AC9', 'AD0', 'AD1', 'AD2', 'AD3', 'AD4', 'AD5', 'AD6', 'AD7', 'AD8', 'AD9', 'AE0', 'AE1', 'AE2', 'AE3', 'AE4', 'AE5', 'AE6', 'AE7', 'AE8', 'AE9', 'AF0', 'AF1', 'AF2', 'AF3', 'AF4', 'AF5', 'AF6', 'AF7', 'AF8', 'AF9', 'AG0', 'AG1', 'AG2', 'AG3', 'AG4', 'AG5', 'AG6', 'AG7', 'AG8', 'AG9', 'AH0', 'AH1', 'AH2', 'AH3', 'AH4', 'AH5', 'AH6', 'AH7', 'AH8', 'AH9', 'AI0', 'AI1', 'AI2', 'AI3', 'AI4', 'AI5', 'AI6', 'AI7', 'AI8', 'AI9', 'AJ0', 'AJ1', 'AJ2', 'AJ3', 'AJ4', 'AJ5', 'AJ6', 'AJ7', 'AJ8', 'AJ9', 'AK0'...]
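For comparison, the nested-loop approach from the question can also be made to work. The output in the question suggests that the second character keeps incrementing past 'Z' because it is never reset to 'A' when the first character advances. A sketch that avoids the problem by looping over the letters directly instead of mutating a list (the name sequence is introduced here for illustration):

```python
from string import ascii_uppercase

sequence = []
for first in ascii_uppercase:
    for second in ascii_uppercase:
        for digit in range(10):
            # Each loop variable restarts on its own, so nothing needs resetting
            sequence.append(f'{first}{second}{digit}')

print(sequence[:3])        # start of the sequence
print(sequence[259:261])   # the AZ9 -> BA0 transition
print(len(sequence))       # 26 * 26 * 10 entries
```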

how to allow user input both str and int [duplicate]

This question already has answers here:
converting individual digits to string
(4 answers)
Closed 7 years ago.
I have a function that is supposed to take user input containing both letters and digits, such as 555-GOT-FOOD, and print the correct digit for each letter. How can I get my function to accept input like 555-GOT-EATS?
Code:
def num(n):
    chars = []
    for char in n:
        if char.isalpha():
            if char.lower() in ['A', 'B', 'C']:
                chars.append('2')
            elif char.lower() in ['D', 'E', 'F']:
                chars.append('3')
            elif char.lower() in ['G', 'H', 'I']:
                chars.append('4')
            elif char.lower() in ['J', 'K', 'L']:
                chars.append('5')
            elif char.lower() in ['M', 'N', 'O']:
                chars.append('6')
            elif char.lower() in ['P', 'R', 'S']:
                chars.append('7')
            elif char.lower() in ['T', 'U', 'V']:
                chars.append('8')
            else:
                chars.append('9')
        else:
            chars.append(char)
    return ''.join(chars)
num (555-"got"-"food")
Either take the input as a character array and convert the first three characters to an int afterwards, or take it as a string, pull out the three digits, and convert those to an int.
If you are getting user input, you shouldn't need to worry about the 5's being ints because they aren't ints! Python gets all user input as a string. So they are numbers, yes, but they are of the type string, not int. Think of them as just the symbols representing numbers.
If you modify the end of your code like this, it should work:
            elif char.upper() in ['T', 'U', 'V']:
                chars.append('8')
            elif char.upper() in ['W', 'X', 'Y', 'Z']:  # 'W' also belongs to 9
                chars.append('9')
        else:
            # not a letter: keep digits and punctuation unchanged
            chars.append(char)
    return ''.join(chars)
You will also want to use
n = n.replace("-", "")
to get rid of all "-", giving you "555GOTFOOD" instead of "555-GOT-FOOD". Strings are immutable, so the result has to be assigned back to n.
Also, change char.lower() to char.upper() throughout: your lists contain only uppercase letters, so a lowercased character never matches any of them. The list for 7 should also include 'Q'.
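The if/elif chain can also be replaced by a lookup table, which makes a missing letter easier to spot. A minimal sketch, assuming the standard phone keypad layout; the names KEYPAD, LETTER_TO_DIGIT and to_digits are invented for this example:

```python
# Standard phone keypad layout: digit -> letters on that key
KEYPAD = {
    '2': 'ABC', '3': 'DEF', '4': 'GHI', '5': 'JKL',
    '6': 'MNO', '7': 'PQRS', '8': 'TUV', '9': 'WXYZ',
}

# Invert the table so each letter maps to its digit
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in KEYPAD.items()
    for letter in letters
}

def to_digits(text):
    """Replace letters with keypad digits; leave everything else as is."""
    return ''.join(LETTER_TO_DIGIT.get(ch.upper(), ch) for ch in text)

print(to_digits('555-GOT-FOOD'))
```

Since input() returns a string in Python 3, to_digits(input()) would accept something like 555-GOT-EATS directly, with no need to treat the digits as ints.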

Is there any way to force ipython to interpret utf-8 symbols?

I'm using ipython notebook.
What I want to do is search a literal string for any Spanish accented letters (ñ,á,é,í,ó,ú,Ñ,Á,É,Í,Ó,Ú) and change them to their closest representation in the English alphabet.
I decided to write down a simple function and give it a go:
def remove_accent(n):
    listn = list(n)
    for i in range(len(listn)):
        if listn[i] == 'ó':
            listn[i] = 'o'
    return listn
It seemed simple enough: check whether the accented character is there and change it to its closest representation. So I went ahead and tested it, and got the following output:
in []: remove_accent('whatever !## ó')
out[]: ['w',
'h',
'a',
't',
'e',
'v',
'e',
'r',
' ',
'!',
'#',
'#',
' ',
'\xc3',
'\xb3']
I've tried to change the default encoding from ASCII to UTF-8 (I presume it is ASCII, since I'm getting two positions, '\xc3' and '\xb3', for the accented character instead of one), but this didn't work. What I would like to get is:
in []: remove_accent('whatever !## ó')
out[]: ['w',
'h',
'a',
't',
'e',
'v',
'e',
'r',
' ',
'!',
'#',
'#',
' ',
'o']
P.S.: this wouldn't be so bad if the accented character yielded just one position instead of two; I would just need to change the if condition. But I haven't found a way to do that either.
Your problem is that you are getting two characters for 'ó' instead of one: in Python 2, a plain str holds bytes, and 'ó' takes two bytes in UTF-8. Convert the string to unicode first, so that every character occupies exactly one position:
def remove_accent(n):
    n_unicode = unicode(n, "UTF-8")  # Python 2: decode the byte string
    listn = list(n_unicode)
    for i in range(len(listn)):
        if listn[i] == u'ó':
            listn[i] = 'o'.encode('utf-8')
        else:
            listn[i] = listn[i].encode('utf-8')
    return listn
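The answer above targets Python 2. In Python 3, where str is already Unicode, one common alternative is to decompose each character with the standard unicodedata module and drop the combining marks, which covers all the listed letters at once. This is a sketch of a swapped-in technique, not the method from the answer, and the function name remove_accents is an assumption:

```python
import unicodedata

def remove_accents(text):
    """Strip accents by decomposing characters and dropping combining marks."""
    # NFD splits an accented letter into its base letter plus combining marks
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(remove_accents('whatever !## ó'))
print(remove_accents('ñáéíóúÑÁÉÍÓÚ'))
```

Wrapping the result in list() gives the per-character list that the question asks for.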
