Replace same character from python string but different replacement values [duplicate] - python-3.x

This question already has answers here:
Python Number Limit [duplicate]
(4 answers)
Closed 2 years ago.
Let's say I have this python string
>>> s = 'dog /superdog/ | cat /thundercat/'
Is there a way to like replace the character / (first one) with [ & second / with ].
I was thinking like an about like this.
Output:
'dog [superdog] | cat [thundercat]'
I tried doing like this but did not quite get that well.
>>> s = 'dog /superdog/ | cat /thundercat/'
>>> s.replace('/','[')
'dog [superdog[ | cat [thundercat['
I was thinking to know the best and pythonic way as possible. Thank you!

Python can handle arbitrarily large numbers because python has built-in arbitrary-precision integers. The limit is related to the amount of RAM memory Python can access. These built-in Long Integers arithmetic is implemented as an Integer object which is initially set to 32 bits for speed, and then start allocating memory on demand.
Integers are commonly stored using a word of memory, which is 4 bytes or 32 bits, so integers from 0 up to 4,294,967,295 (2e32 -1) can be stored.
But if your system has 1GB available to a python process, it will have 8589934592 bits to represent numbers, and you can use numbers like (2e8589934592 -1).

Computers can only handle numbers up to a certain size, but this is to be taken with some caveats.
2147483648 through 2147483647 are the limits of 32 bit numbers.
Most of todays computers can handle numbers of 64
bits, i.e. numbers from -9,223,372,036,854,775,808 to
9,223,372,036,854,775,807, or from −(2^63) to 2^63 − 1
It is possible to create a software that can handle arbitrary large numbers, as long as RAM or storage suffice. Those solutions are rather slow, but e.g. SSL encryption is based on numbers thousands of digits long.
As a side note, you are doubling your initial million in every iteration, not adding a million.

Related

How can I shorten a hexadecimal string further?

I am using MongoDB's build in id fields to label products and for ease of usage/typability, I would like to compress the _id field down from a hexadecimal string that looks like 5b69c35ac2cc78c8979a8a9b to something shorter and involving all letters of the alphabet (both uppercase and lowercase) and numbers. preferably it would involve no more than 10 or 12 characters. Are there any common methods of accomplishing this in Node.JS/MongoDB?
You could convert them to base64, that would make them 16 characters long.
Example:
Buffer.from('5b69c35ac2cc78c8979a8a9b', 'hex').toString('base64') // W2nDWsLMeMiXmoqb
It's better if you can directly access the Buffer - converting many ObjectIds from string could be costly.
The code 5b69c35ac2cc78c8979a8a9b is 24 bytes long (in hex), which means the absolute minimum number of bytes needed to represent this value without losing information is 12, ranging from 0-255 which is not what we want.
If we take a look at the ObjectId we could (maybe) eliminate some bytes:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
Removing machine identifier and process id (if all id's are generated by the same process) would leave us with 7 bytes (0-255), which is still not ideal to encode in base64 or even base32.
So it would probably be better to just use a 32 bit unsigned integer for the product codes and display it as hex using 8 bytes (the leading zeros could be removed).
Encoding those 4 bytes in base64 wouldn't help much (every 3 bytes become 4 bytes), and personally I would prefer case insensitive id's for use in url's which would leave us only with base32.
For better ease of usage/typability than hexadecimal, those 4 bytes could be encoded in z-base-32 and would fit in 7 bytes without padding (7 * 5 bits = 35 bits).

How can I calculate a block number of where a word is located on disk?

I want to find out what block number the word "word" is using. I know blocks start at 0, so I thought that adding 2114+1 would be my answer, but it's not...
user#host:~$ strings -td dump.dd|grep "word"
2114 __strtsuper your word is stored here
I know -td brings me back the offset in decimal, but how can I calculate the block number? What do I need to do with the 2114 number?
strings offsets are in bytes.
A disk block is composed by a group of bits, commonly 512 (and on bigger disks 4096).
So, you must know the block size of your disk, convert to bytes (1 byte = 8 bits), and divide the strings offset by that number.

Perl: string length limitations in real life

While, for example, perldata documents that scalar strings in Perl are limited only by available memory, I'm strongly suspecting in real life there would be some other limits.
I'm considering the following ideas:
I'm not sure how strings are implemented in Perl — is there some sort of byte/character counter? If there is, then probably it's implemented as a platform-dependent integer (i.e. 32-bit or 64-bit), so effectively it would limit strings to something like 2 ** 31, 2 ** 32, 2 ** 63 or 2 ** 64 bytes.
If Perl doesn't use a counter and instead uses some byte to terminate the string (which would be strange, as it's perfectly ok to have a string like "foo\0bar" in Perl), then all operations would inevitably get much slower as string length increases.
Most string functions that Perl deals with strings, such as length, for example, return normal scalar integer, and I strongly suspect that it would be platform-limited integer too.
So, what would be the other factors that limit Perl string length in real life? What should be considered an okay string length for practical purposes?
It keep track of the size of the buffer and the number of bytes therein.
$ perl -MDevel::Peek -e'$x="abcdefghij"; Dump($x);'
SV = PV(0x9222b00) at 0x9222678
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x9238220 "abcdefghij"\0
CUR = 10 <-- 10 bytes used
LEN = 12 <-- 12 bytes allocated
On a 32-bit build of Perl, it uses 32-bit unsigned integer for these values. This is (exactly) large enough to create a string that uses up your process's entire 4 GiB address space.
On a 64-bit build of Perl, it uses 64-bit unsigned integer for those values. This is (exactly) large enough to create a string that uses up your process's entire 16 EiB address space.
The docs are correct. The size of the string is limited only by available memory.

Why is string manipulation more expensive?

I've heard this so many times, that I have taken it for granted. But thinking back on it, can someone help me realize why string manipulation, say comparison etc, is more expensive than say an integer, or some other primitive?
8bit example:
1 bit can be 1 or 0. With 2 bits you can represent 0, 1, 2, and 3. And so on.
With a byte you have 2^8 possibilities, from 0 to 255.
In a string a single letter is stored in a byte, so "Hello world" is 11 bytes.
If I want to do 100 + 100, 100 is stored in 1 byte of memory, I need only two bytes to sum two numbers. The result will need again 1 byte.
Now let's try with strings, "100" + "100", this is 3 bytes plus 3 bytes and the result, "100100" needs 6 bytes to be stored.
This is over-simplified, but more or less it works in this way.
The int data type in C# was carefully selected to be a good match with processor design. Which can store an int in a cpu register, a storage location that's an easy factor of 3 faster than memory. And a single cpu instruction to compare values of type int. The CMP instruction runs in less than a single cpu cycle, a fraction of a nano-second.
That doesn't work nearly as well for a string, it is a variable length data type and every single char in the string must be compared to test for equality. So it is automatically proportionally slower by the size of the string. Furthermore, string comparison is afflicted by culture dependent comparison rules. The kind that make "ss" and "ß" equal in German and "Aa" and "Å" equal in Danish. Nothing subtle to deal with, taken care of by highly optimized table-driven code inside the CLR. It can't beat CMP.
I've always thought it was because of the immutability of strings. That is, every time you make a change to the string, it requires allocating memory for a whole new string (rather than modifying the original in place).
Probably a woefully naive understanding but perhaps someone else can expound further.
There are several things to consider when looking at the "cost" of manipulating strings.
There is the cost in terms of memory usage, there is the cost in terms of CPU cycles used, and there is a cost associated with the complexity of the code involved.
Integer manipulation (Add, Subtract, Multipy, Divide, Compare) is most often done by the CPU at the hardware level, in few (or even 1) instruction. When the manipulation is done, the answer fits back in the same size chunk of memory.
Strings are stored in blocks of memory, which have to be manipulated a byte or word at a time. Comparing two 100 character long strings may require 100 separate comparison operations.
Any manipulation that makes a string longer will require, either moving the string to a bigger block of memory, or moving other stuff around in memory to allow growing the existing block.
Any manipulation that leaves the string the same, or smaller, could be done in place, if the language allows for it. If not, then again, a new block of memory has to be allocated and contents moved.

Variable substitution faster than in-line integer in Vic-20 basic?

The following two (functionally equivalent) programs are taken from an old issue of Compute's Gazette. The primary difference is that program 1 puts the target base memory locations (7680 and 38400) in-line, whereas program 2 assigns them to a variable first.
Program 1 runs about 50% slower than program 2. Why? I would think that the extra variable retrieval would add time, not subtract it!
10 PRINT"[CLR]":A=0:TI$="000000"
20 POKE 7680+A,81:POKE 38400+A,6:IF A=505 THEN GOTO 40
30 A=A+1:GOTO 20
40 PRINT TI/60:END
Program 1
10 PRINT "[CLR]":A=0:B=7600:C=38400:TI$="000000"
20 POKE B+A,81:POKE C+A,6:IF A=505 THEN GOTO 40
30 A=A+1:GOTO 20
40 PRINT TI/60:END
Program 2
The reason is that BASIC is fully interpreted here, so the strings "7680" and "38400" need to be converted to binary integers EVERY TIME line 20 is reached (506 times in this program). In program 2, they're converted once and stored in B. So as long as the search-for-and-fetch of B is faster than convert-string-to-binary, program 2 will be faster.
If you were to use a BASIC compiler (not sure if one exists for VIC-20, but it would be a cool retro-programming project), then the programs would likely be the same speed, or perhaps 1 might be slightly faster, depending on what optimizations the compiler did.
It's from page 76 of this issue: http://www.scribd.com/doc/33728028/Compute-Gazette-Issue-01-1983-Jul
I used to love this magazine. It actually says a 30% improvement. Look at what's happening in program 2 and it becomes clear, because you are looping a lot using variables the program is doing all the memory allocation upfront to calculate memory addresses. When you do the slower approach each iteration has to allocate memory for the highlighted below as part of calculating out the memory address:
POKE 7680+A,81:POKE 38400+A
This is just the nature of the BASIC Interpreter on the VIC.
Accessing the first defined variable will be fast; the second will be a little slower, etc. Parsing multi-digit constants requires the interpreter to perform repeated multiplication by ten. I don't know what the exact tradeoffs are between variables and constants, but short variable names use less space than multi-digit constants. Incidentally, the constant zero may be parsed more quickly if written as a single decimal point (with no digits) than written as a digit zero.

Resources