Why is the last element of the list not sorting as I'd expect? [duplicate] - python-3.x

Not looking for a work around. Looking to understand why Python sorts this way.
>>> a = ['aaa','Bbb']
>>> a.sort()
>>> print(a)
['Bbb', 'aaa']
>>> a = ['aaa','bbb']
>>> a.sort()
>>> print(a)
['aaa', 'bbb']

This is because uppercase characters have lower ASCII values than lowercase characters, so when strings are sorted in increasing order, uppercase letters come before lowercase ones.
ASCII of A is 65
ASCII of a is 97
65 < 97
Hence 'A' < 'a', and 'Bbb' sorts before 'aaa'.

str is sorted based on the raw byte values (Python 2) or Unicode ordinal values (Python 3); in ASCII and Unicode, all capital letters have lower values than all lowercase letters, so they sort before them:
>>> ord('A'), ord('Z')
(65, 90)
>>> ord('a'), ord('z')
(97, 122)
Some locales (e.g. en_US) will change this sort ordering; if you pass locale.strxfrm as the key function, you'll get case-insensitive sorts on those locales, e.g.
>>> import locale
>>> locale.setlocale(locale.LC_COLLATE, 'en_US.utf-8')
>>> a.sort(key=locale.strxfrm)
>>> a
['aaa', 'Bbb']

Python treats uppercase letters as lower than lowercase letters. If you want to sort ignoring case, you can do something like this:
a = ['aaa','Bbb']
a.sort(key=str.lower)
print(a)
Outputs:
['aaa', 'Bbb']
This ignores case. Passing str.lower as the key argument makes sort() compare the lowercased form of each string instead of the string itself. The following documentation should help: https://docs.python.org/3/howto/sorting.html
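To make the difference between the two orderings concrete, here is a minimal sketch. The word list is made up for illustration, and str.casefold is swapped in for str.lower as an assumption; it behaves the same for plain ASCII but also handles some non-ASCII case pairs.

```python
# Hypothetical sample data mixing upper and lower case
words = ['aaa', 'Bbb', 'AAA', 'bbb']

# Default sort compares code points, so every uppercase letter precedes lowercase
default_order = sorted(words)

# Case-insensitive sort; the sort is stable, so 'aaa' stays ahead of 'AAA'
folded_order = sorted(words, key=str.casefold)

print(default_order)  # ['AAA', 'Bbb', 'aaa', 'bbb']
print(folded_order)   # ['aaa', 'AAA', 'Bbb', 'bbb']
```

Because the sort is stable, strings that compare equal under the key keep their original relative order.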

Related

How to separate words with digits without changing the order of the input elements in python

I have a string of words and digits divided by comma. I need a program that separates words with digits without changing the order of the input elements. The output list (words and digits) should be separated by a pipe ('|').
e.g INPUT SAMPLE:
'8,33,21,0,16,50,37,0,melon,7,apricot,peach,pineapple,17,21,24,13,14,43,41'
OUTPUT SAMPLE:
'melon,apricot,peach,pineapple|8,33,21,0,16,50,37,0,7,17,21,24,13,14,43,41'
Then if the input elements are all digits or words, the pipe ('|') should be omitted.
e.g INPUT SAMPLE
'23,40,2,8'
OUTPUT SAMPLE
'23,40,2,8'
I have an idea of how to do it in C#; however, I want a Python script and I'm not familiar with the Python language.
Basically, the logic would be that you iterate over the input and make two lists, one for the strings and one for the digits. Now, for every string you encounter, you check if the string only has digits, if so, you append it to the digit list, otherwise you append it to a string list. Since python iterates over list elements sequentially, the order will be preserved. At the end, you simply join them together with a pipe.
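all_elements = input().split(",")
strings, digits = [], []
for element in all_elements:
    # isdigit() is True only for non-empty strings made up entirely of digits
    (digits if element.isdigit() else strings).append(element)
# Join only the non-empty groups, so the pipe is omitted when one group is missing
print("|".join(",".join(group) for group in (strings, digits) if group))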
Do keep in mind that although you could simply use a try/except block to get the job done, as a rule of thumb you should refrain from using try/except if you expect to hit the exception more than 50% of the time.
The code below should do the trick. Basically, create two lists, one holding the numbers and one holding the strings, and join them at the end.
In [21]: num_list = []
In [22]: string_list = []
In [23]: input_string = '8,33,21,0,16,50,37,0,melon,7,apricot,peach,pineapple,17,21,24,13,14,43,41'
In [24]: for elem in input_string.split(','):
    ...:     try:
    ...:         int(elem)
    ...:     except ValueError:
    ...:         string_list.append(elem)
    ...:     else:
    ...:         num_list.append(elem)
    ...:
In [25]: string_list
Out[25]: ['melon', 'apricot', 'peach', 'pineapple']
In [26]: num_list
Out[26]:
['8',
'33',
'21',
'0',
'16',
'50',
'37',
'0',
'7',
'17',
'21',
'24',
'13',
'14',
'43',
'41']
In [28]: '|'.join([','.join(string_list), ','.join(num_list)])
Out[28]: 'melon,apricot,peach,pineapple|8,33,21,0,16,50,37,0,7,17,21,24,13,14,43,41'
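Neither snippet above is wrapped in a function, and the question also asks for the pipe to be omitted when only words or only digits are present. A minimal sketch covering both cases might look like this; the function name separate is hypothetical, and it assumes the numbers are non-negative integers, since isdigit() rejects signs and decimal points.

```python
def separate(text):
    """Return the words, then the numbers, joined by '|'.

    The pipe is omitted when either group is empty.
    """
    words, numbers = [], []
    for item in text.split(","):
        # Appending in input order keeps the relative order within each group
        (numbers if item.isdigit() else words).append(item)
    # Only non-empty groups are joined, so a lone group gets no pipe
    return "|".join(",".join(group) for group in (words, numbers) if group)


sample = '8,33,21,0,16,50,37,0,melon,7,apricot,peach,pineapple,17,21,24,13,14,43,41'
print(separate(sample))
print(separate('23,40,2,8'))
```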

Why does Python list comprehension print a list of "None"s in the end?

From Python 3.6 prompt:
>>> [print(i) for i in range(3)]
0
1
2
[None, None, None]
A list comprehension creates a list containing the results of evaluating the expression for each iteration. As well as the "side-effect" of printing out whatever you give it, the print function returns None, so that is what gets stored in the list.
Since you're in an interactive console, the return value is printed out at the end. If you ran this as a script, the output wouldn't include [None, None, None].
If you're wanting to loop over 0, 1, 2 and print them out, I'm afraid you have to do it as a proper for-loop:
for i in range(3):
    print(i)
Though I fully empathise with the desire to use the neat one-line syntax of a list comprehension! 😞
As other people have commented, you can achieve the same effect by constructing the list (using a list comprehension) and then printing that out:
print(*[i for i in range(3)], sep="\n")
You need the sep argument if you want them on a new line each; without it they'll be on one line, each separated by a space.
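As a small sketch of the two points above: print always returns None, and the numbers can be put on separate lines without building a list of print results. The variable names are illustrative only.

```python
# print() writes to stdout and returns None, which is what the comprehension collected
result = print("hello")
print(result is None)  # True

# Build the text first, then print once; no list of None values is created
text = "\n".join(str(i) for i in range(3))
print(text)
```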
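When you put the print call inside the comprehension, it prints each value, and then the console echoes the resulting list, which holds only the None return values of print. Wrapping the whole comprehension in print shows the same thing from the other side: the list is printed, and print itself returns None.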
>>> this = print([(i) for i in range(3)])
[0, 1, 2]
>>> print(this)
None
>>>
This should show you the problem. What you want to use is
>>> [(i) for i in range(3)]
[0, 1, 2]

Which character comes first?

So the input is word and I want to know if a or b comes first.
I can use a_index = word.find('a') and compare this to b_index = word.find('b') and if a is first, a is first is returned. But if b isn't in word, .find() will return -1, so simply comparing b_index < a_index would return b is first. This could be accomplished by adding more if-statements, but is there a cleaner way?
function description:
input: word, [list of characters]
output: the character in the list that appears first in the word
Example: first_instance("butterfly", ['a', 'u', 'e']) returns u
You can create a function that takes word and a list of chars - convert those chars into a set for fast lookup and looping over word take the first letter found, eg:
# Chars can be any iterable whose elements are characters
def first_of(word, chars):
    # Remove duplicates and get O(1) lookup time
    lookup = set(chars)
    # Use optional default argument to next to return `None` if no matches found
    return next((ch for ch in word if ch in lookup), None)
Example:
>>> first_of('bob', 'a')
>>> first_of('bob', 'b')
'b'
>>> first_of('abob', 'ab')
'a'
>>> first_of("butterfly", ['a', 'u', 'e'])
'u'
This way you're only ever iterating over word once and short-circuit on the first letter found instead of running multiple finds, storing the results and then computing the lowest index.
Make a list of (position, char) pairs without the missing chars, then take the pair with the smallest position.
def first_found(word, chars):
    places = [x for x in ((word.find(c), c) for c in chars) if x[0] != -1]
    if not places:
        # no char was found
        return None
    else:
        return min(places)[1]
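A variation on the same idea, assuming Python 3.4 or later, lets min handle the "nothing found" case through its default argument. The function name first_present is hypothetical.

```python
def first_present(word, chars):
    """Return the character from chars that occurs earliest in word, or None."""
    # Record a position only for characters that actually occur,
    # so the -1 returned by find() for missing characters never enters the comparison
    found = {c: word.find(c) for c in chars if c in word}
    # default=None covers the case where no character was found
    return min(found, key=found.get, default=None)


print(first_present("butterfly", ['a', 'u', 'e']))  # u
print(first_present("bob", ['a']))                  # None
```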
In any case you need to check the type of the input:
if isinstance(your_input, str):
    a_index = your_input.find('a')
    b_index = your_input.find('b')
    # Compare the a and b indexes
elif isinstance(your_input, list):
    # Note: list.index raises ValueError if the item is missing
    a_index = your_input.index('a')
    b_index = your_input.index('b')
    # Compare the a and b indexes
else:
    # Do something else
    pass
EDIT:
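def first_instance(word, lst):
    indexes = {}
    for c in lst:
        pos = word.find(c)
        # Skip characters that are absent (find() returns -1) or already recorded
        if pos != -1 and c not in indexes:
            indexes[c] = pos
    # Return None when none of the characters occur in word
    return min(indexes, key=indexes.get) if indexes else None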
It will return the character from the list lst that comes first in the word.
If you need to return the index of this letter instead, replace the return statement with this:
return indexes[min(indexes, key=indexes.get)]

numpy ndarray throw exception when truncating string

I have an ndarray of ASCII strings of different lengths. Until now I used dtype=object for this. However, profiling showed that this is actually a bottleneck in my program. Using dtype=np.string_ is faster, but it has the downside that it silently truncates the values being set. Since this is a perfect recipe for hard-to-find bugs, I wonder whether it is possible either to resize the array (I know that this could be costly if it requires a full reallocation) or to raise an exception when truncation would occur.
I couldn't change ndarray.__setitem__ since it's a read-only attribute. Here is some code to demonstrate what I mean:
import numpy as np
def Foo(vec):
    vec[1] = 'FAIL'
    print('{:6s}: {}'.format(str(vec.dtype), vec))
VALUES = ['OK', 'OK', 'OK']
Foo(np.array(VALUES, dtype=object)) # Slow but it works
Foo(np.array(VALUES, dtype=np.string_)) # Fast but may fail silently
Resulting in:
object: ['OK' 'FAIL' 'OK']
|S2 : [b'OK' b'FA' b'OK']
Let's see if I can explain what's going on.
In [32]: ll=['one','two','three']
In [33]: a1=np.array(ll,dtype=object)
In [34]: a1
Out[34]: array(['one', 'two', 'three'], dtype=object)
In [35]: a1[1]='eleven'
In [36]: a1
Out[36]: array(['one', 'eleven', 'three'], dtype=object)
a1 just like ll consists of pointers - pointers to strings that reside elsewhere in memory. I can change any of those pointers, just as I could in a list. In most ways a1 behaves just like a list - except that it is possible to reshape, and do some other basic array things.
In [37]: a1.reshape(3,1)
Out[37]:
array([['one'],
['eleven'],
['three']], dtype=object)
But if I make a string array
In [38]: a2=np.array(ll)
In [39]: a2
Out[39]:
array(['one', 'two', 'three'],
dtype='<U5')
In [42]: a1.itemsize
Out[42]: 4
In [43]: a2.itemsize
Out[43]: 20
the values are stored in the array's data buffer. Here it made an array with 5 Unicode characters per element (Python 3), at 4 bytes per character, so 20 bytes per element.
Now if I replace an element of a2 I can get truncation
In [44]: a2[1]='eleven'
In [45]: a2
Out[45]:
array(['one', 'eleve', 'three'],
dtype='<U5')
because only 5 of the characters of the new value fit in the allocated space.
So there's a trade off - faster access speed because the bytes are stored in a fixed, known size array, but you can't store bigger things.
You could allocate more space per element:
In [46]: a3=np.array(ll,dtype='|U10')
In [47]: a3
Out[47]:
array(['one', 'two', 'three'],
dtype='<U10')
In [48]: a3[1]='eleven'
In [49]: a3
Out[49]:
array(['one', 'eleven', 'three'],
dtype='<U10')
genfromtxt is a common tool for creating arrays with string dtypes. That waits until it has read all the file before setting the string length (at least if using dtype=None). And the string fields are often part of a multi-field structured array. The string fields are usually labels or ids, not something you commonly change.
I can imagine writing a function that would check string length against the dtype and raise an error if the truncation would happen. But that's going to slow down the action.
def foo(A, i, astr):
    # A '<U' dtype stores 4 bytes per character
    if A.itemsize/4 < len(astr):
        raise ValueError('too long str')
    A[i] = astr
In [69]: foo(a2,1,'four')
In [70]: a2
Out[70]:
array(['one', 'four', 'three'],
dtype='<U5')
In [72]: foo(a2,1,'eleven')
...
ValueError: too long str
but is it worth the extra work?
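The check above hard-codes 4 bytes per character, which only fits Unicode ('U') arrays. A hedged sketch that also covers bytes ('S') arrays could derive the per-character size from a width-1 dtype of the same kind. The helper names char_capacity and checked_set are hypothetical and not part of NumPy.

```python
import numpy as np


def char_capacity(arr):
    """Number of characters each element of a fixed-width string array can hold."""
    if arr.dtype.kind not in "SU":
        raise TypeError("expected a fixed-width bytes ('S') or str ('U') array")
    # 'S1' occupies 1 byte and 'U1' occupies 4, so this gives bytes per character
    bytes_per_char = np.dtype(arr.dtype.kind + "1").itemsize
    return arr.dtype.itemsize // bytes_per_char


def checked_set(arr, index, value):
    """Assign value to arr[index], raising instead of silently truncating."""
    if len(value) > char_capacity(arr):
        raise ValueError("value longer than the array's item width")
    arr[index] = value


a2 = np.array(['one', 'two', 'three'])  # dtype '<U5'
checked_set(a2, 1, 'four')              # fits within 5 characters
try:
    checked_set(a2, 1, 'eleven')        # 6 characters, too long
except ValueError as exc:
    print('rejected:', exc)
```

This keeps the fixed-width storage and only adds a length comparison per assignment, but it still has to be called explicitly in place of plain indexing.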
I found a non-flexible solution by inheriting from ndarray. I will not accept this answer until Friday; maybe somebody will come up with something better. It does its job, even on views (e.g. StringArray(...)[1:4]).
import numpy as np
class StringArray(np.ndarray):
    def __new__(cls, val):
        field_length = max(map(len, val))
        # Could also be <U for unicode
        vec = super().__new__(cls, len(val), dtype='|S' + str(field_length))
        vec[:] = val[:]
        return vec

    def __setitem__(self, key, val):
        if isinstance(val, (list, tuple, np.ndarray)):
            if max(map(len, val)) > self.dtype.itemsize:
                raise ValueError('Itemsize too big')
        elif isinstance(val, str):
            if len(val) > self.dtype.itemsize:
                raise ValueError('Itemsize too big')
        else:
            raise ValueError('Unknown type')
        super().__setitem__(key, val)
val = StringArray(['a', 'ab', 'abc'])
print(val)
val[0] = 'xy'
print(val)
try:
    val[0] = 'xyze'
except ValueError:
    print('Catch')
try:
    val[1:2] = ['xyze', 'sd']
except ValueError:
    print('Catch')
produces:
[b'a' b'ab' b'abc']
[b'xy' b'ab' b'abc']
Catch
Catch

How to turn a string lists into a lists?

There are other threads about turning strings inside a list into different data types. I want to turn a string that is in the form of a list into an actual list. Like this: "[5,1,4,1]" becomes [5,1,4,1]
I need this because I am writing a program that requires the user to input a list.
Example of problem:
>>> x = input()
[3,4,1,5]
>>> x
'[3,4,1,5]'
>>> type(x)
<class 'str'>
If you mean evaluate python objects like this:
x = eval('[3,4,1,5]')
print(x)
print(type(x) is list)
[3, 4, 1, 5]
True
Use this with caution, as it will execute anything the user inputs. It is better to use a parser to get native lists, for example by taking the input as JSON and parsing it.
Use eval() for your purpose. eval() is used for converting code within a string to real code:
>>> mystring = '[3, 5, 1, 2, 3]'
>>> mylist = eval(mystring)
>>> mylist
[3, 5, 1, 2, 3]
>>> mystring = '{4: "hello", 2:"bye"}'
>>> eval(mystring)[4]
'hello'
>>>
Use exec() to actually run statements:
>>> while True:
...     inp = input('Enter your input: ')
...     exec(inp)
...
Enter your input: print('hello')
hello
Enter your input: x = 1
Enter your input: print(x)
1
Enter your input: import math
Enter your input: print(math.sqrt(4))
2.0
In your scenario:
>>> x = input()
[3,4,1,5]
>>> x = eval(x)
>>> x
[3, 4, 1, 5]
>>> type(x)
<class 'list'>
>>>
Thanks for your input, guys, but I would prefer not to use eval() because it is unsafe.
Someone actually posted the answer that allowed me to solve this, but then they deleted it. I am going to repost that answer:
import json
values = input("Enter values as lists here")
l1 = json.loads(values)
You can use ast.literal_eval for this purpose.
Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the
following Python literal structures: strings, bytes, numbers, tuples,
lists, dicts, sets, booleans, and None.
This can be used for safely evaluating strings containing Python
expressions from untrusted sources without the need to parse the
values oneself.
>>> import ast
>>> val = ast.literal_eval('[1,2,3]')
>>> val
[1, 2, 3]
Just remember to check that it's actually a list:
>>> isinstance(val, list)
True
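Putting the parsing and the type check together, a minimal sketch might look like the following. The function name parse_list and its error messages are hypothetical; the only library call used is ast.literal_eval.

```python
import ast


def parse_list(text):
    """Parse text such as '[3,4,1,5]' into a list, or raise ValueError."""
    try:
        # literal_eval accepts only Python literals, so function calls are rejected
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError(f"not a valid Python literal: {text!r}") from exc
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return value


print(parse_list('[3,4,1,5]'))  # [3, 4, 1, 5]
```

In the asker's program, the string returned by input() would be passed to parse_list, and a ValueError would signal that the user should be prompted again.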

Resources