If Python dictionaries are ordered, why can't I index them? - python-3.x

The question says it all. Python dictionaries have been insertion-ordered since Python 3.6 (in the CPython implementation), and this became a language guarantee in 3.7. I can even d.popitem(). So why can't I index them, i.e. d[3]? Or can I?
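For the record, d[3] is always a key lookup, never positional. Insertion order is preserved when you iterate, so positional access is possible, just not in O(1). A minimal sketch, assuming an O(n) walk over the keys is acceptable:
import itertools

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# d[3] would look up the key 3; positional access needs an explicit detour.
fourth_key = list(d)[3]                                      # O(n) copy of the keys
fourth_key_lazy = next(itertools.islice(iter(d), 3, None))   # O(n) scan, no copy
print(fourth_key, fourth_key_lazy)  # d d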

Related

How did numpy add the @ operator?

How did they do it? Can I also add my own new operators to Python 3? I searched on Google but did not find any information on this.
No, you can't add your own. The numpy team cooperated with the core Python team to add @ to the core language. It's in the core Python docs (for example, in the operator precedence table), although core Python doesn't use it for anything in the standard CPython distribution. The core distribution nevertheless recognizes the operator symbol and generates an appropriate BINARY_MATRIX_MULTIPLY opcode for it:
>>> import dis
>>> def f(a, b):
...     return a @ b
...
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_MATRIX_MULTIPLY
              6 RETURN_VALUE
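For context, the core language only dispatches @ to the __matmul__/__rmatmul__ special methods; numpy implements those on its arrays. A minimal sketch of the hook, using a hypothetical Vec class rather than numpy's actual implementation:
class Vec:
    def __init__(self, values):
        self.values = values

    def __matmul__(self, other):
        # Dot product as a stand-in for real matrix multiplication.
        return sum(a * b for a, b in zip(self.values, other.values))

print(Vec([1, 2, 3]) @ Vec([4, 5, 6]))  # 32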
Answering your second question,
Can I also add my own new operators to Python 3?
A similar question with some very interesting answers can be found here: Python: defining my own operators?
Recently at PyCon US 2022, Sebastiaan Zeeff delivered a talk showing how to implement a new operator. He warns that the implementation is purely educational, though. However, it turns out you actually can implement a new operator yourself! Locally, of course :). You can find his talk here, and his code repository here. And if you think your operator could enhance the Python language, why not propose a PEP for it?
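One popular trick from that linked question: you can't add new symbols to the grammar, but you can fake an infix operator by piggybacking on an existing one such as |. A rough sketch (the Infix helper is illustrative, not a stdlib API):
class Infix:
    """Fake infix operator: x |op| y behaves like fn(x, y)."""
    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, left):    # handles:  x |op
        return Infix(lambda right: self.fn(left, right))

    def __or__(self, right):    # handles:  ...| y
        return self.fn(right)

dot = Infix(lambda a, b: sum(x * y for x, y in zip(a, b)))
print([1, 2, 3] |dot| [4, 5, 6])  # 32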

Python can store different data types in a list or linked list

Python can store objects of multiple data types inside a list, while we cannot do the same in Java or C++.
What mechanisms does Python use to make that possible? And where can I read more about this?
To be fair, you are allowed to do the same in Java; you just need a more generic type.
For example, in Python you can write
array = []
array.append(1)
array.append("hi")
The equivalent code in Java would be:
List<Object> list = new ArrayList<>();
list.add(1);
list.add("hii");
These two pieces of code are functionally equivalent; however, in Java we need a cast to the desired type when we fetch from the list. In Python, the type is deduced at runtime, so we don't have to do any explicit casting.
Python is a fully object-oriented language, and also dynamically typed, so most things in Python behave the same way. Maps, sets, and more complex data structures allow you to mix value types easily. Even the key types for collections can be a mix of different data types, as long as the proper contract is implemented. To learn more about the Python type system, look here: https://blog.daftcode.pl/first-steps-with-python-type-system-30e4296722af
I think everything in Python is treated as an object, and when we store data in Python we are really storing references to the data. That may be the reason.
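A quick way to see that a list stores references, not copies (a minimal sketch):
x = [1, 2]
mixed = [x, "hi", 3.5]   # heterogeneous list: list, str, float

x.append(99)             # mutate the original object...
print(mixed[0])          # [1, 2, 99] -- the list saw the change
print(mixed[0] is x)     # True: the list holds a reference to the same object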

How to efficiently write raw bytes to numpy array data in python 3

While migrating some old python 2 code to python 3, I ran into some problems populating structured numpy arrays from bytes objects.
I have a parser that defines a specific dtype for each type of data structure I might encounter. Since, in general, a given data structure may have variable-length or variable-type fields, these have been represented in the numpy array as fields of object dtype (np.object, or equivalently np.dtype('O')).
The array is obtained from bytes (or a bytearray) by first populating the fixed-dtype fields. After this, the dtype of any sub-arrays (contained in 'object' fields) can be built using information from the fixed fields that precede it.
Here is a partial example of this process (dealing only with the fixed-dtype fields) that works in python 2. Note that we have a field named 'nSamples', which will presumably tell us the length of the sub-array pointed to by the 'samples' field; that sub-array would be interpreted as a numpy array with shape (2,) and dtype sampleDtype:
import numpy as np

fancyDtype = np.dtype([('blah', '<u4'),
                       ('bleh', 'S5'),
                       ('nSamples', '<u8'),
                       ('samples', 'O')])
sampleDtype = np.dtype([('sampleId', '<u2'),
                        ('val', '<f4')])
bytesFromFile = bytearray(
    b'*\x00\x00\x00hello\x02\x00\x00\x00\x00\x00\x00\x00\xd0\xb5'
    b'\x14_\xa1\x7f\x00\x00"\x00\x00\x00\x80?]\x00\x00\x00\xa0#')
arr = np.zeros((1,), dtype=fancyDtype)
numBytesFixedPortion = 17  # 4 ('blah') + 5 ('bleh') + 8 ('nSamples')
# Start out by just reading the fixed-type portion of the array
arr.data[:numBytesFixedPortion] = bytesFromFile[:numBytesFixedPortion]
memoryview(arr.data)[:numBytesFixedPortion] = bytesFromFile[:numBytesFixedPortion]
The last two statements here both work in python 2.7.
Of note: in python 2.7, if I type
arr.data
I get <read-write buffer for 0x7f7a93bb7080, size 25, offset 0 at 0x7f7a9339cf70>, which tells me this is a buffer. Obviously, memoryview(arr.data) returns a memoryview object.
Both of these statements raise the following exception in python 3.6:
NotImplementedError: memoryview: unsupported format T{I:blah:5s:bleh:=Q:nSamples:O:samples:}
This tells me that numpy is returning a different type with its data attribute access, a memoryview rather than a buffer. It also tells me that memoryviews worked in python 2.7 but don't in python 3.6 for this purpose.
I found a similar issue in numpy's issue tracker: https://github.com/numpy/numpy/issues/13617
However, the issue was closed quickly, with the numpy developer indicating that it is a bug in ctypes. Since ctypes is a builtin, I kind of gave up hope on just updating it to get a fix.
I did finally stumble upon a solution that works, though it takes roughly twice as long as the python 2.7 method. It is:
import struct

struct.pack_into(
    'B' * numBytesFixedPortion,            # fmt
    arr.data,                              # buffer
    0,                                     # offset
    *bytesFromFile[:numBytesFixedPortion]  # unpacked byte values
)
A coworker also suggested attempting to use this solution:
arrView = arr.view('u1')
arrView[:numBytesFixedPortion] = bytesFromFile[:numBytesFixedPortion]
However, on doing this, I get the exception:
File "/home/tintedFrantic/anaconda2/envs/py3/lib/python3.6/site-packages/numpy/core/_internal.py", line 461, in _view_is_safe
raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.
Note that I get this exception in both python 2.7 and 3.6. It appears numpy disallows views on arrays that have any object fields. (Aside: I was able to get numpy to do this correctly by commenting out the check for object-type fields in the numpy source, though that seems a dangerous solution, and not a very portable one either.)
I've also tried creating separate arrays, one with the fixed-dtype fields and one with the object-dtype field and then using numpy.lib.recfunctions.merge_arrays to merge them. That fails with a cryptic message that I can't remember.
I am at a bit of a loss. I just want to write some arbitrary bytes to the numpy array's underlying memory and do it efficiently. This doesn't seem like it should be too hard to do, but I haven't come across a good way to do it. I would like a solution that isn't a hack either, as this is going into systems that need high reliability. If nothing better exists, I will use the struct.pack_into() solution, but I am hoping someone out there knows a better way. By the way, NOT using object-dtype fields is NOT a viable option, as the cost of doing so would be prohibitive.
If it matters, I am using numpy 1.16.2 in python 2.7 and 1.17.4 for python 3.6.
Per the suggestion of @nawsleahcimnoraa, I found out that in python 3.3+ (so not in python 2.7), the memoryview object, which is returned by arr.data in my python 3 environment, has a cast() method. Thus, I can do
arr.data.cast('B')[startIdx:endIdx] = buf[:numBytes]
This is much more like what I had in python 2.7. It is a lot more concise and also performs a little better than the struct method above.
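Applied to the fixed-portion example above, that one-liner would be (same variables as before):
# cast('B') gives a flat unsigned-byte view of the array's underlying memory
arr.data.cast('B')[:numBytesFixedPortion] = bytesFromFile[:numBytesFixedPortion]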
One thing I noticed in testing these solutions is that, in general, the python 3 solutions were slower than the python 2 versions. For example, I tried the struct solution using both python 2 and python 3 and found a significant increase in processing time for python 3.
I also found fairly sizable discrepancies between different python environments of the same version. For example, I found that my system install of python 3.6 performed better than a virtual environment install of python 3.6, so it seems that the results will likely depend largely on a given environment's configuration.
Overall, I am happy with the results of using the cast() method of the memoryview object returned by arr.data and will use that for now. However, if someone discovers something that works better, I would still love to hear about it.

Way to use bisect module for sets in python

I was looking for something similar to the lower_bound() function for sets in Python, as we have in C++.
The task is to have a data structure which inserts elements in sorted order, stores only a single instance of each distinct value, and returns the left neighbor of a given value, with both operations in O(log n) worst-case time in Python.
Something similar to the bisect module for lists, but with efficient insertion, may work.
Python sets are unordered, and the standard library does not offer tree structures.
Maybe you could look at sortedcontainers (a third-party lib): http://www.grantjenks.com/docs/sortedcontainers/ It might offer a good approach to your problem.
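For example, a rough sketch with sortedcontainers' SortedSet (insertion and bisection are both roughly O(log n); the left_neighbor helper below is my own, not part of the library):
from sortedcontainers import SortedSet  # pip install sortedcontainers

ss = SortedSet()
for v in [5, 1, 9, 5, 3]:  # duplicates are stored only once
    ss.add(v)

def left_neighbor(ss, x):
    # Largest element strictly less than x, or None if there is none.
    idx = ss.bisect_left(x)
    return ss[idx - 1] if idx > 0 else None

print(list(ss))              # [1, 3, 5, 9]
print(left_neighbor(ss, 5))  # 3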

k-combinations in Python 3.x

What is the most efficient way to generate k-tuples with all k-combinations from a given python set? Is there an appropriate built-in function? Something tells me it should be possible with a two-line for loop.
P.S. I did conduct a search and found various entries on the topic "combinations from lists, etc. in Python", but all proposed solutions seem rather un-pythonic. I am hoping for a mind-blowing, idiomatic Python expression.
itertools has all of those types of functions:
import itertools
for combination in itertools.combinations(iterable, k):
    print(combination)
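For example, with a concrete input (combinations yields tuples in the order it receives elements; since sets are unordered, sorting first makes the output deterministic):
import itertools

print(list(itertools.combinations(sorted({'a', 'b', 'c'}), 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]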
