bitwise operations in python (or,and, |) [duplicate] - python-3.x

Consider this code:
x = 1 # 0001
x << 2 # Shift left 2 bits: 0100
# Result: 4
x | 2 # Bitwise OR: 0011
# Result: 3
x & 1 # Bitwise AND: 0001
# Result: 1
I can understand the arithmetic operators in Python (and other languages), but I never understood 'bitwise' operators quite well. In the above example (from a Python book), I understand the left-shift but not the other two.
Also, what are bitwise operators actually used for? I'd appreciate some examples.

Bitwise operators are operators that work on multi-bit values, but conceptually one bit at a time.
AND is 1 only if both of its inputs are 1, otherwise it's 0.
OR is 1 if one or both of its inputs are 1, otherwise it's 0.
XOR is 1 only if exactly one of its inputs are 1, otherwise it's 0.
NOT is 1 only if its input is 0, otherwise it's 0.
These can often be best shown as truth tables. Input possibilities are on the top and left, the resultant bit is one of the four (two in the case of NOT since it only has one input) values shown at the intersection of the inputs.
AND | 0 1 OR | 0 1 XOR | 0 1 NOT | 0 1
----+----- ---+---- ----+---- ----+----
0 | 0 0 0 | 0 1 0 | 0 1 | 1 0
1 | 0 1 1 | 1 1 1 | 1 0
One example is if you only want the lower 4 bits of an integer, you AND it with 15 (binary 1111) so:
201: 1100 1001
AND 15: 0000 1111
------------------
IS 9 0000 1001
The zero bits in 15 in that case effectively act as a filter, forcing the bits in the result to be zero as well.
In addition, >> and << are often included as bitwise operators, and they "shift" a value respectively right and left by a certain number of bits, throwing away bits that roll of the end you're shifting towards, and feeding in zero bits at the other end.
So, for example:
1001 0101 >> 2 gives 0010 0101
1111 1111 << 4 gives 1111 0000
Note that the left shift in Python is unusual in that it's not using a fixed width where bits are discarded - while many languages use a fixed width based on the data type, Python simply expands the width to cater for extra bits. In order to get the discarding behaviour in Python, you can follow a left shift with a bitwise and such as in an 8-bit value shifting left four bits:
bits8 = (bits8 << 4) & 255
With that in mind, another example of bitwise operators is if you have two 4-bit values that you want to pack into an 8-bit one, you can use all three of your operators (left-shift, and and or):
packed_val = ((val1 & 15) << 4) | (val2 & 15)
The & 15 operation will make sure that both values only have the lower 4 bits.
The << 4 is a 4-bit shift left to move val1 into the top 4 bits of an 8-bit value.
The | simply combines these two together.
If val1 is 7 and val2 is 4:
val1 val2
==== ====
& 15 (and) xxxx-0111 xxxx-0100 & 15
<< 4 (left) 0111-0000 |
| |
+-------+-------+
|
| (or) 0111-0100

One typical usage:
| is used to set a certain bit to 1
& is used to test or clear a certain bit
Set a bit (where n is the bit number, and 0 is the least significant bit):
unsigned char a |= (1 << n);
Clear a bit:
unsigned char b &= ~(1 << n);
Toggle a bit:
unsigned char c ^= (1 << n);
Test a bit:
unsigned char e = d & (1 << n);
Take the case of your list for example:
x | 2 is used to set bit 1 of x to 1
x & 1 is used to test if bit 0 of x is 1 or 0

what are bitwise operators actually used for? I'd appreciate some examples.
One of the most common uses of bitwise operations is for parsing hexadecimal colours.
For example, here's a Python function that accepts a String like #FF09BE and returns a tuple of its Red, Green and Blue values.
def hexToRgb(value):
# Convert string to hexadecimal number (base 16)
num = (int(value.lstrip("#"), 16))
# Shift 16 bits to the right, and then binary AND to obtain 8 bits representing red
r = ((num >> 16) & 0xFF)
# Shift 8 bits to the right, and then binary AND to obtain 8 bits representing green
g = ((num >> 8) & 0xFF)
# Simply binary AND to obtain 8 bits representing blue
b = (num & 0xFF)
return (r, g, b)
I know that there are more efficient ways to acheive this, but I believe that this is a really concise example illustrating both shifts and bitwise boolean operations.

I think that the second part of the question:
Also, what are bitwise operators actually used for? I'd appreciate some examples.
Has been only partially addressed. These are my two cents on that matter.
Bitwise operations in programming languages play a fundamental role when dealing with a lot of applications. Almost all low-level computing must be done using this kind of operations.
In all applications that need to send data between two nodes, such as:
computer networks;
telecommunication applications (cellular phones, satellite communications, etc).
In the lower level layer of communication, the data is usually sent in what is called frames. Frames are just strings of bytes that are sent through a physical channel. This frames usually contain the actual data plus some other fields (coded in bytes) that are part of what is called the header. The header usually contains bytes that encode some information related to the status of the communication (e.g, with flags (bits)), frame counters, correction and error detection codes, etc. To get the transmitted data in a frame, and to build the frames to send data, you will need for sure bitwise operations.
In general, when dealing with that kind of applications, an API is available so you don't have to deal with all those details. For example, all modern programming languages provide libraries for socket connections, so you don't actually need to build the TCP/IP communication frames. But think about the good people that programmed those APIs for you, they had to deal with frame construction for sure; using all kinds of bitwise operations to go back and forth from the low-level to the higher-level communication.
As a concrete example, imagine some one gives you a file that contains raw data that was captured directly by telecommunication hardware. In this case, in order to find the frames, you will need to read the raw bytes in the file and try to find some kind of synchronization words, by scanning the data bit by bit. After identifying the synchronization words, you will need to get the actual frames, and SHIFT them if necessary (and that is just the start of the story) to get the actual data that is being transmitted.
Another very different low level family of application is when you need to control hardware using some (kind of ancient) ports, such as parallel and serial ports. This ports are controlled by setting some bytes, and each bit of that bytes has a specific meaning, in terms of instructions, for that port (see for instance http://en.wikipedia.org/wiki/Parallel_port). If you want to build software that does something with that hardware you will need bitwise operations to translate the instructions you want to execute to the bytes that the port understand.
For example, if you have some physical buttons connected to the parallel port to control some other device, this is a line of code that you can find in the soft application:
read = ((read ^ 0x80) >> 4) & 0x0f;
Hope this contributes.

I didn't see it mentioned above but you will also see some people use left and right shift for arithmetic operations. A left shift by x is equivalent to multiplying by 2^x (as long as it doesn't overflow) and a right shift is equivalent to dividing by 2^x.
Recently I've seen people using x << 1 and x >> 1 for doubling and halving, although I'm not sure if they are just trying to be clever or if there really is a distinct advantage over the normal operators.

I hope this clarifies those two:
x | 2
0001 //x
0010 //2
0011 //result = 3
x & 1
0001 //x
0001 //1
0001 //result = 1

Think of 0 as false and 1 as true. Then bitwise and(&) and or(|) work just like regular and and or except they do all of the bits in the value at once. Typically you will see them used for flags if you have 30 options that can be set (say as draw styles on a window) you don't want to have to pass in 30 separate boolean values to set or unset each one so you use | to combine options into a single value and then you use & to check if each option is set. This style of flag passing is heavily used by OpenGL. Since each bit is a separate flag you get flag values on powers of two(aka numbers that have only one bit set) 1(2^0) 2(2^1) 4(2^2) 8(2^3) the power of two tells you which bit is set if the flag is on.
Also note 2 = 10 so x|2 is 110(6) not 111(7) If none of the bits overlap(which is true in this case) | acts like addition.

Sets
Sets can be combined using mathematical operations.
The union operator | combines two sets to form a new one containing items in either.
The intersection operator & gets items only in both.
The difference operator - gets items in the first set but not in the second.
The symmetric difference operator ^ gets items in either set, but not both.
Try It Yourself:
first = {1, 2, 3, 4, 5, 6}
second = {4, 5, 6, 7, 8, 9}
print(first | second)
print(first & second)
print(first - second)
print(second - first)
print(first ^ second)
Result:
{1, 2, 3, 4, 5, 6, 7, 8, 9}
{4, 5, 6}
{1, 2, 3}
{8, 9, 7}
{1, 2, 3, 7, 8, 9}

This example will show you the operations for all four 2 bit values:
10 | 12
1010 #decimal 10
1100 #decimal 12
1110 #result = 14
10 & 12
1010 #decimal 10
1100 #decimal 12
1000 #result = 8
Here is one example of usage:
x = raw_input('Enter a number:')
print 'x is %s.' % ('even', 'odd')[x&1]

Another common use-case is manipulating/testing file permissions. See the Python stat module: http://docs.python.org/library/stat.html.
For example, to compare a file's permissions to a desired permission set, you could do something like:
import os
import stat
#Get the actual mode of a file
mode = os.stat('file.txt').st_mode
#File should be a regular file, readable and writable by its owner
#Each permission value has a single 'on' bit. Use bitwise or to combine
#them.
desired_mode = stat.S_IFREG|stat.S_IRUSR|stat.S_IWUSR
#check for exact match:
mode == desired_mode
#check for at least one bit matching:
bool(mode & desired_mode)
#check for at least one bit 'on' in one, and not in the other:
bool(mode ^ desired_mode)
#check that all bits from desired_mode are set in mode, but I don't care about
# other bits.
not bool((mode^desired_mode)&desired_mode)
I cast the results as booleans, because I only care about the truth or falsehood, but it would be a worthwhile exercise to print out the bin() values for each one.

Bit representations of integers are often used in scientific computing to represent arrays of true-false information because a bitwise operation is much faster than iterating through an array of booleans. (Higher level languages may use the idea of a bit array.)
A nice and fairly simple example of this is the general solution to the game of Nim. Take a look at the Python code on the Wikipedia page. It makes heavy use of bitwise exclusive or, ^.

There may be a better way to find where an array element is between two values, but as this example shows, the & works here, whereas and does not.
import numpy as np
a=np.array([1.2, 2.3, 3.4])
np.where((a>2) and (a<3))
#Result: Value Error
np.where((a>2) & (a<3))
#Result: (array([1]),)

i didnt see it mentioned, This example will show you the (-) decimal operation for 2 bit values: A-B (only if A contains B)
this operation is needed when we hold an verb in our program that represent bits. sometimes we need to add bits (like above) and sometimes we need to remove bits (if the verb contains then)
111 #decimal 7
-
100 #decimal 4
--------------
011 #decimal 3
with python:
7 & ~4 = 3 (remove from 7 the bits that represent 4)
001 #decimal 1
-
100 #decimal 4
--------------
001 #decimal 1
with python:
1 & ~4 = 1 (remove from 1 the bits that represent 4 - in this case 1 is not 'contains' 4)..

Whilst manipulating bits of an integer is useful, often for network protocols, which may be specified down to the bit, one can require manipulation of longer byte sequences (which aren't easily converted into one integer). In this case it is useful to employ the bitstring library which allows for bitwise operations on data - e.g. one can import the string 'ABCDEFGHIJKLMNOPQ' as a string or as hex and bit shift it (or perform other bitwise operations):
>>> import bitstring
>>> bitstring.BitArray(bytes='ABCDEFGHIJKLMNOPQ') << 4
BitArray('0x142434445464748494a4b4c4d4e4f50510')
>>> bitstring.BitArray(hex='0x4142434445464748494a4b4c4d4e4f5051') << 4
BitArray('0x142434445464748494a4b4c4d4e4f50510')

the following bitwise operators: &, |, ^, and ~ return values (based on their input) in the same way logic gates affect signals. You could use them to emulate circuits.

To flip bits (i.e. 1's complement/invert) you can do the following:
Since value ExORed with all 1s results into inversion,
for a given bit width you can use ExOR to invert them.
In Binary
a=1010 --> this is 0xA or decimal 10
then
c = 1111 ^ a = 0101 --> this is 0xF or decimal 15
-----------------
In Python
a=10
b=15
c = a ^ b --> 0101
print(bin(c)) # gives '0b101'

You can use bit masking to convert binary to decimal;
int a = 1 << 7;
int c = 55;
for(int i = 0; i < 8; i++){
System.out.print((a & c) >> 7);
c = c << 1;
}
this is for 8 digits you can also do for further more.

Related

How does this code check for parity of a number? (even or odd)

I came across this piece of code on reddit
1 - ((num & 1) << 1) as i32
This code returns 1 for even numbers and -1 for odd numbers.
It takes less instructions than other ways of calculating the same thing, and is presumably, quite fast. So, how does it work? (A step-by-step breakdown would be helpful)
Note: I found What is the fastest way to find if a number is even or odd?, but don't understand how that works either.
Let's break this down from the inside out.
num & 1
This "masks" all but the least significant bit using a bitwise and. Since the least significant bit is the "ones" place, it will evaluate to 1 if the number is odd or 0 if the number is even.
(result1) << 1
This bitshifts that left by 1. This has the effect of multiplying by two. If num was odd, this will evaluate to 2 or still 0 if num was even. (0 * 2 = 0)
(result2) as i32
This casts the resulting unsigned integer (2 or 0) into a signed integer, allowing us to subtract it in the next operation. This is only for the compiler, it has no effect on the value in memory.
1 - result3
This subtracts the previous number from 1. If num was even, we get 1 - 0 which results in a final answer of 1. If num was odd, we get 1 - 2 which results in a final answer of -1.

Why does this hash calculating bit hack work?

For practice I've implemented the qoi specification in rust. In it there is a small hash function to store recently used pixels:
index_position = (r * 3 + g * 5 + b * 7 + a * 11) % 64
where r, g, b, and a are the red, green, blue and alpha channels respectively.
I assume this works as a hash because it creates a unique prime factorization for the numbers with the mod to limit the number of bytes. Anyways I implemented it naively in my code.
While looking at other implementations I came across this bit hack to optimize the hash calculation:
fn hash(rgba:[u8:4]) -> u8 {
let v = u32::from_ne_bytes(rgba);
let s = (((v as u64) << 32) | (v as u64)) & 0xFF00FF0000FF00FF;
s.wrapping_mul(0x030007000005000Bu64.to_le()).swap_bytes() as u8 & 63
}
I think I understand most of what's going on but I'm confused about the magic number (the multiplicand). To my understanding it should be flipped. As a step by step example:
let rgba = [0x12, 0x34, 0x56, 0x78].
On my machine (little endian) this gives v the value 0x78563412.
The bit shifting spreads the values, giving s = 0x7800340000560012.
Now here's where I get confused. The magic number has the values that should be multiplied aligned in a 64 bit field (3, 5, 7, 11), spaced the same way that the original values are. However they seem to be in reverse order from the values:
0x7800340000560012
0x030007000005000B
When multiplying it would seem that the highest value, the alpha channel (0x78), is being multiplied by 3, while the lowest value, the red channel (0x12), is being multiplied by 11. I'm also not entirely sure why this multiplication works anyway, after multiplying the values by various powers of 2.
I understand that the bytes are then swapped to big endian and trimmed, but that's not until after the multiplication step which loses me.
I know that the code produces the correct hash, but I don't understand why that's the case. Can anyone explain to me what I'm missing?
If you think about the way the math works, you want this flipped order, because it means all the results from each of the "logical" multiplications cluster in the same byte. The highest byte in the first value multiplied by the lowest byte in the second produces a result in the highest byte. The lowest byte in the first value's product with the highest byte in the second value produces a result in the same highest byte, and the same goes for the intermediate bytes.
Yes, the 0x78... and 0x03... are also multiplied by each other, but they overflow way past the top of the value and are lost. Having the order "backwards" means the result of the multiplications we care about all ends up summed in the uppermost byte (the total shift of the results we want is always 56 bits, because the 56th bit offset value is multiplied by the 0th, the 40th by the 16th, the 16th by the 40th, and the 0th by the 56th), with the rest of the multiplications we don't want having their results either overflow (and being lost) or appearing in lower bytes (which we ignore). If you flipped the bytes in the second value, the 0x78 * 0x0B (alpha value & multiplier) component would be lost to overflow, while the 0x12 * 0x03 (red value & multiplier) component wouldn't reach the target byte (every component we cared about would end up somewhat that wasn't the uppermost byte).
For a possibly more intuitive example, imagine doing the same work, but where all the bytes of one input except a single component are zero. If you multiply:
0x7800000000000000 * 0x030007000005000B
the logical result is:
0x1680348000258052800000000000000
but removing the overflow reduces that to:
0x2800000000000000
//^^ result we care about (actual product of 0x78 and 0x0B is 0x528, but only keeping low byte)
Similarly,
0x0000340000000000 * 0x030007000005000B
produces:
0x9c016c000104023c0000000000
overflowing to:
0x04023c0000000000
//^^ result we care about (actual product of 0x34 and 0x5 was 0x104, but only 04 kept)
In that case, the other multiplications did leave data in result (not all overflowed), but since we only look at the high byte, the rest gets ignored.
If you keep doing this math step by step and adding the results, you'll find that the high byte ends up the correct answer to the four individual multiplications you expected (mod 256); flip the order, and it won't work out that way.
The advantage to putting all the results in that high byte is that it allows you to use swap_bytes to move it cheaply to the low byte, and read the value directly (no need to even mask it on many architectures).

How do I order strings of length N and 2 bits?

I have a string of length N with 2 bits. I am trying to find a function to order these strings. For example:
F(110) = 1
F(101) = 2
F(011) = 3
The strategy I adopted was labeling the bits by their position, so that for the first case K=1 and L=2 and hence
F(1,2) = 1
F(1,3) = 2
F(2,3) = 3
Does anyone have an idea of what this function might be?
If you are dealing with actual strings, sort them alphabetically ascending. If you are dealing with integers, there are some workarounds:
Convert the integer into bit-strings and sort alphabetically ascending.
or
Reverse the bits in the integer (011 becomes 110) and sort numerically ascending.
However, these workarounds might be slow. The function F described by you turns out to be pretty simple (assuming you are given the positions of the 1-bits) and is therefore a good solution.
To come up with an implementation of F we first look at the sequence of all bit-strings with exactly two 1-bits. Here we don't care about the length of the bit-string. We simply increment the bit-strings from left to right (opposed to the usual Arabic interpretation of numbers where you increment from right to left).
Next to the actual bit-string I replaced all 0 by ., the left 1 by l, and the right 1 by r. This makes it easier to see the pattern.
1: 11 lr
2: 101 l.r
3: 011 .lr
4: 1001 l..r
5: 0101 .l.r
6: 0011 ..lr
7: 10001 l...r
8: 01001 .l..r
9: 00101 ..l.r
10: 00011 ...lr
11: 100001 l....r
… … …
The function F is supposed to count the steps needed to increment to a given bit-string.
In the following, L is the index of the left 1-bit and R is the index of the right 1-bit. As in your question, we use 1-based indices. That is, the leftmost character in a string has index 1
For the right 1-bit to move one position to the right, the left 1-bit has to "catch up". If the left 1-bit starts at L=1 then catching up takes R-1 steps (when counting the start step L=1 too). For the right 1-bit to reach a high position, the left 1-bit has to catch up multiple times, as it is returned to the start each time the right 1-bit moves one to the right. Each time, catching up takes a little bit longer as the right 1-bit is further away from the start. The first time it takes 1 step, then 2, then 3, and so on. Thus, For the right 1-bit to reach position R it takes 1+2+3+…+(R-1) steps = (R-1)·(R-2)/2 steps. After that, we only have to move the left 1-bit to its position, which takes L steps. Therefore the function is
F(L,R) := (R-1)·(R-2) / 2 + L
Note that this function only is easy to implement if you know L and R. If you have an integer and would need to determine L and R first, it might be easier and faster to reverse the integer instead and sort numerically ascending. Determining L and R might be slower than reversing the bits in the integer.

What is the bitwise negation for an integer?

I have homework assignment with a piece to compute the bitwise negation of integer value. It say 512 go into -513.
I have a solution that does x = 512 y = 512*(-1)+(-1).
Is that correct way?
I think you need to first negate and add 1.
-x = ~x + 1
consequently
~x= -x -1
This property is based on the way negative number are represented in two's complement. To represent a negative number $A$ on n bits, one uses the complement of |A| to 2n, i.e. the number 2n-|A|
It is easy to see that A+~A=111...11 as bits in the addition will always be 0 and 1 and 111...111 is the number just before 2n, or 2n-1.
As -|A| is coded by 2n-|A|, and A +~A=2n-1, we can say that -A=~A+1 or equivalently ~A=-A-1
This is true for any number, positive or negative. And ~512=-512-1=-513
val = 512
print (~val)
output:
-513
~ bitwise complement
Sets the 1 bits to 0 and 1 to 0.
For example ~2 would result in -3.
This is because the bit-wise operator would first represent the number in sign and magnitude which is 0000 0010 (8 bit operator) where the MSB is the sign bit.
Then later it would take the negative number of 2 which is -2.
-2 is represented as 1000 0010 (8 bit operator) in sign and magnitude.
Later it adds a 1 to the LSB (1000 0010 + 1) which gives you 1000 0011.
Which is -3.
Otherwise:
y = -(512+1)
print (y)
output:
-513

Python 3 - What is ">>"

This is the confusing line: x_next = (x_next + (a // x_prev)) >> 1
It is bit-wise shift. The next will give you some intuitions:
>>> 16 >> 1
8
>>> 16 >> 2
4
>>> 16 >> 3
2
>>> bin(16)
'0b10000'
>>> bin(16 >> 1)
'0b1000'
>>> bin(16 >> 2)
'0b100'
The >> operator is the same operator as it is in C and many other languages.
A bitshift to the right. If your number is like this in binary: 0100 than it will be 0010 after >> 1. With >> 2 it will be 0001.
So basically it's a nice way to divide your number by 2 (while flooring the remainder) ;)
It is the right shift operator.
Here it is being used to divide by 2. It would be far more clear to write this as
x_next = (x_next + (a // x_prev)) // 2
Sadly a lot of people try to be clever and use shift operators in place of multiplication and division. Typically this just leads to lots of confusion for the poor individuals who have to read the code at a later date.
Most newer/younger programmers do not worry about efficiency because the computers are so fast.
But if you are working on a 8-bit or 16-bit processor that may or may not have a hardware multiply and rarely has a hardware divide, then shifting integers takes one machine cycle while a multiply may take 16 or more and a divide may take 50-200 machine cycles. When your processor clock is in the GHz range you do not notice the difference, but if your instruction rate is 8 MHz or less it adds up very quickly.
So for efficiency people shift to multiply or divide for powers of two, especially in C which is the most common language for small processors and controllers.
I see it so often that I do not even think about it anymore.
Some of the things I do in C:
x = y >> 3; // is the same ad the floor of a divide by 8
if (y & 0x04) x++; // adds the rounding so the answer is rounded
For most microcontrollers, the compiler lets you see the resulting machine code generated and you can see what different statements generate. With that type of feedback, after awhile you just start writing more efficient code.
It means "right shift". It works the same as floor division by 2:
>>> a = 7
>>> a >> 1
3
>>> a // 2
3

Resources