Why are labels in BASIC increments of 10? - basic

In BASIC, tags are in increments of 10. For example, mandlebrot.bas from github/linguist:
10 REM Mandelbrot Set with ANSI Colors in BASIC
20 REM https://github.com/telnet23
30 REM 20 November 2020
40 CLS
50 MAXK = 32
60 MINRE = -2.5
70 MAXRE = 1.5
80 MINIM = -1.5
90 MAXIM = 1.5
100 FOR X = 1 TO WIDTH
110 FOR Y = 1 TO HEIGHT
120 LOCATE Y, X
130 REC = MINRE + (MAXRE - MINRE) / (WIDTH - 1) * (X - 1)
140 IMC = MINIM + (MAXIM - MINIM) / (HEIGHT - 1) * (Y - 1)
150 K = 0
160 REF = 0
170 IMF = 0
180 K = K + 1
190 REF = REC + REF * REF - IMF * IMF
200 IMF = IMC + REF * IMF + REF * IMF
210 IF REF * REF + IMF * IMF > 4 THEN GOTO 230
220 IF K < MAXK THEN GOTO 180
230 M = 40 + INT(8 / MAXK * (K - 1))
240 PRINT CHR$(27) + "[" + STR$(M) + "m";
250 PRINT " ";
260 PRINT CHR$(27) + "[49m";
270 NEXT Y
280 NEXT X
Why isn't it just increments in 1? That would make more sense.

The short answer is that BASIC numbering is in increments of one, but programmers can and do skip some of the increments. BASIC grew out of Fortran, which also used numeric labels, and often used increments of 10. Unlike Fortran, early BASIC required numbering all lines, so that they changed from labels to line numbers.
BASIC is numbered in increments greater than one to allow adding new lines between existing lines.
Most early home computer BASIC implementations did not have a built-in means of renumbering lines.
Code execution in BASIC implementations with line numbers happened in order of line number.
This meant that if you wanted to add new lines, you needed to leave numbers free between those lines. Even on computers with a RENUM implementation, renumbering could take time. So if you wanted standard increments you’d still usually only RENUM at the end of a session or when you thought you were mostly finished.
Speculation: Programmers use increments of 10 specifically for BASIC line numbers for at least two reasons. First, tradition. Fortran code from the era appears to use increments of 10 for its labels when it uses any standard increments at all. Second, appearance. On the smaller screens of the era it is easier to see where BASIC lines start if they all end in the same symbol, and zero is a very useful symbol for that purpose. Speaking from personal experience, I followed the spotty tradition of starting different routines on hundreds boundaries and thousands boundaries to take advantage of the multiple zeroes at the beginning of the line. This made it easier to recognize the starts of those routines later when reading through the code.
BASIC grew from Fortran, which also used numbers, but as labels. Fortran lines only required a label if they needed to be referred to, such as with a GO TO, to know where a loop can be exited, or as a FORMAT for a WRITE. Such lines were also often in increments greater than 1—and commonly also 10—so as to allow space to add more in between if necessary. This wasn’t technically necessary. Since they were labels and not line numbers, they didn’t need to be sequential. But most programmers made them sequential for readability.
In his commonly-used Fortran 77 tutorial, Erik Boman writes:
Typically, there will be many loops and other statements in a single program that require a statement label. The programmer is responsible for assigning a unique number to each label in each program (or subprogram). The numerical value of statement labels have no significance, so any integer numbers can be used. Typically, most programmers increment labels by 10 at a time.
BASIC required that all lines have numbers and that the line numbers be sequential; that was part of the purpose of having line numbers: a BASIC program could be entered out of order. This allowed for later edits. Thus, line 15 could be added after lines 10 and 20 had been added. This made leaving potential line numbers between existing line numbers even more useful.
If you look at magazines with BASIC program listings, such as Rainbow Magazine or Creative Computing, you’ll often see numbers sandwiched somewhat randomly between the tens. And depending on style, many people used one less than the line number at the start of a routine or subroutine to comment the routine. Routines and DATA sections might also start on even hundreds or even thousands.
Programmers who used conventions like this might not even want to renumber a program, as it would mess up their conventions. BASIC programs were often a mass of text; any convention that improved readability was savored.
Ten was a generally accepted spacing even before the home computer era. In his basic basic, second edition (1978, and expecting that the user would be using “a remote terminal”), James S. Coan writes (page 2):
It is conventional although not required to use intervals of 10 for the numbers of adjacent lines in a program. This is because any modification in the program must also have line numbers. So you can use the in-between numbers for that purpose. It should be comforting to know at this point that the line numbers do not have to be typed in order. No matter what order they are typed in, the computer will follow the numerical order in executing the program.
There are examples of similar patterns in Coan’s Basic Fortran. For example, page 46 has a simple program to “search for pythagorean triples”; while the first label is 12, the remaining labels are 20, 30, and 40, respectively.
He used similar patterns without increments of 10; for example, on page 132 of Basic Fortran, Coan uses increments of 2 for his labels, and keeps the calculation section of the program in the hundreds with the display section of the program in the two hundreds. The END statement uses label 9900.
Similarly, in their 1982 Elementary BASIC, Henry Ledgard and Andrew Singer write (page 27):
Depending on the version of Basic you are using, a line number can consist of 1 to 4 or 5 digits. Here, all line numbers will consist of 4 digits, a common practice accepted by almost every version of Basic. The line numbers must be in sequential order. Increasing line numbers are often given in increments of 10, a convention we will also follow. This convention allows you to make small changes to a program without changing all the line numbers.
And Jerald R. Brown’s 1982 Instant BASIC: 2nd Astounding Edition (p. 7):
You don’t have to enter or type in a program in line number order. That is, you don’t have to enter line 10 first, then line 20, and then line 30. If we type in a program out of line number order, the computer doesn’t care. It follows the line numbers not the order they were entered or typed in. This makes it easy to insert more statements in a program already stored in the computer’s memory. You may have noticed how we cleverly number the statements in our programs by 10's. This makes it easy to add more statements between the existing line numbers -- up to nine more statements between lines 10 and 20, for example.
Much of the choice of how to number lines in a BASIC program was based on tradition and a vague sense of what worked. This was especially true in the home computer era where most users didn’t take classes on how to use BASIC but rather learned by reading other people’s programs, typing them in from the many books and magazines that provided program listings. The tradition of incrementing by 10 and inserting new features between those increments was an obvious one.
You can see it scanning through old books of code, such as 101 BASIC Computer Games. The very first program, “Amazin” increments its line numbers by 10. But at some point, a user/coder decided they needed an extra space after the code prints out how many dollars the player has; so that extra naked PRINT is on line 195. And the display of the instructions for the game are all kept between lines 100 and 109, another common pattern.
The program listing on page 30 for Basket displays the common habit of starting separate routines at even hundreds and thousands. Line numbers within those routines continue to increment by 10. The pattern is fairly obvious even though new features (and possibly other patterns) have added several lines outside the pattern.
As BASIC implementations began to get RENUM commands, more BASIC code listings appeared with increments of one. This is partly because using an increment of one used less memory. While the line number itself used a fixed amount of RAM (with the result that the maximum line number was often somewhere around FFFF, or 65525), references to line numbers did not tend to use a fixed length. Thus, smaller line numbers used less RAM overall.
Depending on how large the program was, and how much branching it used, this could be significant compared to the amount of RAM the machine itself had.
For example, I recently typed in the SKETCH.BAS program from the October 1984 Rainbow Magazine, page 97. This is a magazine, and a program, for the TRS-80 Color Computer. This program uses increments of 1 for its line numbering. On CLOADing the program in, free memory stands at 17049. After using RENUM 10,1,10 to renumber it in increments of 10, free memory stands at 16,953.
A savings of 96 bytes may not sound like much, but this is a very small program; and it’s still half a percent of available RAM. The difference could be the difference between a program fitting into available RAM or not fitting. This computer only has 22823 bytes of RAM free even with no program in memory at all.

Related

How long does it take to crack a hash?

I want to calculate the time it will take to break a SHA-256 hash. So I research and found the following calculation. If I have a password in lower letter with a length of 6 chars, I would have 26^6passwords right?
To calculate the time I have to divide this number by a hashrate, I guess. So if I had one RTX 3090, the hashrate would be 120 MH/s (1.2*10^8 H/s) and than I need to calculate 26^6/(1.2*10^8) to get the time in seconds right?
Is this idea right or wrong?
Yes, but a lowercase-latin 6 character string is also short enough that you would expect to compute this one time and put it into a database so that you could look it up in O(1). It's only a bit over 300M entries. That said, given you're 50% likely to find the answer in the first half of your search, it's so fast to crack that you might not even bother unless you were doing this often. You don't even need a particularly fancy GPU for something on this scale.
Note that in many cases a 6 character string can also be a 5 character string, so you need to add 26^6 + 26^5 + 26^4 + ..., but all of these together only raises this to around 320M hashes. It's a tiny space.
Adding uppercase, numbers and the easily typed symbols gets you up to 96^6 ~ 780B. On the other hand, adding just 3 more lowercase-letters (9 total) gets you to 26^9 ~ 5.4T. For brute force on random strings, longer is much more powerful than complicated.
To your specific question, note that it does matter how you implement this. You won't get these kinds of hash rates if you don't write your code in a way to maximize the GPU. For example, writing simple code that sends one value to the GPU to hash at a time, and then compares the result on the CPU could be incredibly slow (in some cases slower than just doing all the work on a CPU). Setting up your memory efficiently and maximizing things the GPU can do in parallel are very important. If you're not familiar with this kind of programming, I recommend using or studying a tool like John the Ripper.

ZPL 2 GS1 128 barcodes - When to use/switch subsets / maximum characters

I have several questions according to the ZPL and GS1 128 Barcodes.
I thought using subset B is always possible but sometimes it extends the
width of the barcode more then subset C (if there are only numeric values).
So I started switching between the subsets. But when does it makes sense to switch? One example:
Plain Barcode: (02)12345678901234(10)00TestTest00
Could be:'>;>802123456789012311000>6TestTest00'
or
'>;>802123456789012311000>6TestTest>500'
What are the advantages of Subset A?
I also didn't find any information about the maximum of characters which can be part of a GS1 128 barcode for a specific label size (like DIN A5).
As a rule of thumb, I stick to Code128B with two exceptions:
I switch to Code128C when I know I am going to have at least 6 contiguous numbers embedded in a barcode.
I use Code128A when I can't get around embedding tabs or carriage returns in a single symbol (when I'm trying to simulate a user filling out multiple fields on a form with one scan), but seldom for access to other control codes..
Maximum characters for GS1 fields can be found here: https://en.wikipedia.org/wiki/GS1-128
It appears that most fields are limited, but many allow up to 30 characters. With one exception (Extended Packaging URL), which allows up to 70 characters.
As far as label size, that's all about bar density. My tightest scannable 70 character symbol is about 4 inches long, assuming the use of Code128B.

Bitwise operations Python

This is a first run-in with not only bitwise ops in python, but also strange (to me) syntax.
for i in range(2**len(set_)//2):
parts = [set(), set()]
for item in set_:
parts[i&1].add(item)
i >>= 1
For context, set_ is just a list of 4 letters.
There's a bit to unpack here. First, I've never seen [set(), set()]. I must be using the wrong keywords, as I couldn't find it in the docs. It looks like it creates a matrix in pythontutor, but I cannot say for certain. Second, while parts[i&1] is a slicing operation, I'm not entirely sure why a bitwise operation is required. For example, 0&1 should be 1 and 1&1 should be 0 (carry the one), so binary 10 (or 2 in decimal)? Finally, the last bitwise operation is completely bewildering. I believe a right shift is the same as dividing by two (I hope), but why i>>=1? I don't know how to interpret that. Any guidance would be sincerely appreciated.
[set(), set()] creates a list consisting of two empty sets.
0&1 is 0, 1&1 is 1. There is no carry in bitwise operations. parts[i&1] therefore refers to the first set when i is even, the second when i is odd.
i >>= 1 shifts right by one bit (which is indeed the same as dividing by two), then assigns the result back to i. It's the same basic concept as using i += 1 to increment a variable.
The effect of the inner loop is to partition the elements of _set into two subsets, based on the bits of i. If the limit in the outer loop had been simply 2 ** len(_set), the code would generate every possible such partitioning. But since that limit was divided by two, only half of the possible partitions get generated - I couldn't guess what the point of that might be, without more context.
I've never seen [set(), set()]
This isn't anything interesting, just a list with two new sets in it. So you have seen it, because it's not new syntax. Just a list and constructors.
parts[i&1]
This tests the least significant bit of i and selects either parts[0] (if the lsb was 0) or parts[1] (if the lsb was 1). Nothing fancy like slicing, just plain old indexing into a list. The thing you get out is a set, .add(item) does the obvious thing: adds something to whichever set was selected.
but why i>>=1? I don't know how to interpret that
Take the bits in i and move them one position to the right, dropping the old lsb, and keeping the sign. Sort of like this
Except of course that in Python you have arbitrary-precision integers, so it's however long it needs to be instead of 8 bits.
For positive numbers, the part about copying the sign is irrelevant.
You can think of right shift by 1 as a flooring division by 2 (this is different from truncation, negative numbers are rounded towards negative infinity, eg -1 >> 1 = -1), but that interpretation is usually more complicated to reason about.
Anyway, the way it is used here is just a way to loop through the bits of i, testing them one by one from low to high, but instead of changing which bit it tests it moves the bit it wants to test into the same position every time.

Security: longer keys versus more available characters

I apologize if this has been answered before, but I was not able to find anything. This question was inspired by a comment on another security-related question here on SO:
How to generate a random, long salt for use in hashing?
The specific comment is as follows (sixth comment of accepted answer):
...Second, and more importantly, this will only return hexadecimal
characters - i.e. 0-9 and A-F. It will never return a letter higher
than an F. You're reducing your output to just 16 possible characters
when there could be - and almost certainly are - many other valid
characters.
– AgentConundrum Oct 14 '12 at 17:19
This got me thinking. Say I had some arbitrary series of bytes, with each byte being randomly distributed over 2^(8). Let this key be A. Now suppose I transformed A into its hexadecimal string representation, key B (ex. 0xde 0xad 0xbe 0xef => "d e a d b e e f").
Some things are readily apparent:
len(B) = 2 len(A)
The symbols in B are limited to 2^(4) discrete values while the symbols in A range over 2^(8)
A and B represent the same 'quantities', just using different encoding.
My suspicion is that, in this example, the two keys will end up being equally as secure (otherwise every password cracking tool would just convert one representation to another for quicker attacks). External to this contrived example, however, I suspect there is an important security moral to take away from this; especially when selecting a source of randomness.
So, in short, which is more desirable from a security stand point: longer keys or keys whose values cover more discrete symbols?
I am really interested in the theory behind this, so an extra bonus gold star (or at least my undying admiration) to anyone who can also provide the math / proof behind their conclusion.
If the number of different symbols usable in your password is x, and the length is y, then the number of different possible passwords (and therefore the strength against brute-force attacks) is x ** y. So you want to maximize x ** y. Both adding to x or adding to y will do that, Which one makes the greater total depends on the actual numbers involved and what your practical limits are.
But generally, increasing x gives only polynomial growth while adding to y gives exponential growth. So in the long run, length wins.
Let's start with a binary string of length 8. The possible combinations are all permutations from 00000000 and 11111111. This gives us a keyspace of 2^8, or 256 possible keys. Now let's look at option A:
A: Adding one additional bit.
We now have a 9-bit string, so the possible values are between 000000000 and 111111111, which gives us a keyspace size of 2^9, or 512 keys. We also have option B, however.
B: Adding an additional value to the keyspace (NOT the keyspace size!):
Now let's pretend we have a trinary system, where the accepted numbers are 0, 1, and 2. Still assuming a string of length 8, we have 3^8, or 6561 keys...clearly much higher.
However! Trinary does not exist!
Let's look at your example. Please be aware I will be clarifying some of it, which you may have been confused about. Begin with a 4-BYTE (or 32-bit) bitstring:
11011110 10101101 10111110 11101111 (this is, btw, the bitstring equivalent to 0xDEADBEEF)
Since our possible values for each digit are 0 or 1, the base of our exponent is 2. Since there are 32 bits, we have 2^32 as the strength of this key. Now let's look at your second key, DEADBEEF. Each "digit" can be a value from 0-9, or A-F. This gives us 16 values. We have 8 "digits", so our exponent is 16^8...which also equals 2^32! So those keys are equal in strength (also, because they are the same thing).
But we're talking about REAL passwords, not just those silly little binary things. Consider an alphabetical password with only lowercase letters of length 8: we have 26 possible characters, and 8 of them, so the strength is 26^8, or 208.8 billion (takes about a minute to brute force). Adding one character to the length yields 26^9, or 5.4 trillion combinations: 20 minutes or so.
Let's go back to our 8-char string, but add a character: the space character. now we have 27^8, which is 282 billion....FAR LESS than adding an additional character!
The proper solution, of course, is to do both: for instance, 27^9 is 7.6 trillion combinations, or about half an hour of cracking. An 8-character password using upper case, lower case, numbers, special symbols, and the space character would take around 20 days to crack....still not nearly strong enough. Add another character, and it's 5 years.
As a reference, I usually make my passwords upwards of 16 characters, and they have at least one Cap, one space, one number, and one special character. Such a password at 16 characters would take several (hundred) trillion years to brute force.

Encoding name strings into an unique number

I have a large set of names (millions in number). Each of them has a first name, an optional middle name, and a lastname. I need to encode these names into a number that uniquely represents the names. The encoding should be one-one, that is a name should be associated with only one number, and a number should be associated with only one name.
What is a smart way of encoding this? I know it is easy to tag each alphabet of the name according to its position in the alphabet set (a-> 1, b->2.. and so on) and so a name like Deepa would get -> 455161, but again here I cannot make out if the '16' is really 16 or a combination of 1 and 6.
So, I am looking for a smart way of encoding the names.
Furthermore, the encoding should be such that the number of digits in the output numeral for any name should have fixed number of digits, i.e., it should be independent of the length. Is this possible?
Thanks
Abhishek S
To get the same width numbers, can't you just zero-pad on the left?
Some options:
Sort them. Count them. The 10th name is number 10.
Treat each character as a digit in a base 26 (case insensitive, no
digits) or 52 (case significant, no digits) or 36 (case insensitive
with digits) or 62 (case sensitive with digits) number. Compute the
value in an int. EG, for a name of "abc", you'd have 0 * 26^2 + 1 *
26^1 + 2 * 20^0. Sometimes Chinese names may use digits to indicate tonality.
Use a "perfect hashing" scheme: http://en.wikipedia.org/wiki/Perfect_hash_function
This one's mostly suggested in fun: use goedel numbering :). So
"abc" would be 2^0 * 3^1 * 5^2 - it's a product of powers of primes.
Factoring the number gives you back the characters. The numbers
could get quite large though.
Convert to ASCII, if you aren't already using it. Then treat each
ordinal of a character as a digit in a base-256 numbering system.
So "abc" is 0*256^2 + 1*256^1 + 2*256^0.
If you need to be able to update your list of names and numbers from time to time, #2, #4 and #5 should work. #1 and #3 would have problems. #5 is probably the most future-proofed, though you may find you need unicode at some point.
I believe you could do unicode as a variant of #5, using powers of 2^32 instead of 2^8 == 256.
What you are trying to do there is actually hashing (at least if you have a fixed number of digits). There are some good hashing algorithms with few collisions. Try out sha1 for example, that one is well tested and available for modern languages (see http://en.wikipedia.org/wiki/Sha1) -- it seems to be good enough for git, so it might work for you.
There is of course a small possibility for identical hash values for two different names, but that's always the case with hashing and can be taken care of. With sha1 and such you won't have any obvious connection between names and IDs, which can be a good or a bad thing, depending on your problem.
If you really want unique ids for sure, you will need to do something like NealB suggested, create IDs yourself and connect names and IDs in a Database (you could create them randomly and check for collisions or increment them, starting at 0000000000001 or so).
(improved answer after giving it some thought and reading the first comments)
You can use the BigInteger for encoding arbitrary strings like this:
BigInteger bi = new BigInteger("some string".getBytes());
And for getting the string back use:
String str = new String(bi.toByteArray());
I've been looking for a solution to a problem very similar to the one you proposed and this is what I came up with:
def hash_string(value):
score = 0
depth = 1
for char in value:
score += (ord(char)) * depth
depth /= 256.
return score
If you are unfamiliar with Python, here's what it does.
The score is initially 0 and the depth are set to 1
For every character add the ord value * the depth
The ord function returns the UTF-8 value (0-255) for each character
Then it's multiplied by the 'depth'.
Finally the depth is divided by 256.
Essentially, the way that it works is that the initial characters add more to the score while later characters contribute less and less. If you need an integer, multiply the end score by 2**64. Otherwise you will have a decimal value between 0-256. This encoding scheme works for binary data as well as there are only 256 possible values in a byte/char.
This method works great for smaller string values, however, for longer strings you will notice that the decimal value requires more precision than a regular double (64-bit) can provide. In Java, you can use the 'BigDecimal' and in Python use the 'decimal' module for added precision. A bonus to using this method is that the values returned are in sorted order so they can be searched 'efficiently'.
Take a look at https://en.wikipedia.org/wiki/Huffman_coding. That is the standard approach.
You can translate it, if every character (plus blank, at least) will occupy a position.
Therefore ABC, which is 1,2,3 has to be translated to
1*(2*26+1)² + 2*(53) + 3
This way, you could encode arbitrary strings, but if the length of the input isn't limited (and how should it?), you aren't guaranteed to have an upper limit for the length.

Resources