Why is y before x in ncurses functions and arguments? - naming

I've noticed that with most ncurses functions, both in function names and in their arguments, y comes before x (getmaxyx(), getparyx(), getyx(), etc.), but the opposite is true almost everywhere else, both in programming languages and in math in general.
Why is this?

That matches the order used for the terminal description capability for moving the cursor:
cursor_address   cup   cm   move to row #1 columns #2
For terminals that support cursor-address, you could in principle have row,col or col,row as well as other variations such as using a 1-based set of coordinates:
%i add 1 to first two parameters (for ANSI terminals)
An ANSI terminal uses the row,col ordering, as seen in this fragment used in many terminals:
ansi+cup,
cup=\E[%i%p1%d;%p2%dH, home=\E[H,
That is, the cup sequence is the escape character, followed by [, the row and column values (p1 and p2) separated by ;, and a terminating H.
Why that particular order was chosen for the standard (roughly 40 years ago) is obscure, but it certainly had an effect on the libraries written to use these escape sequences.
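To make the ordering concrete, here is a minimal Python sketch (not part of the original answer); it assumes a VT100/xterm-compatible terminal, and the function name is made up for the example:

import sys

# Move the cursor with the raw ANSI cup sequence: ESC [ row ; col H.
# The row comes first; ANSI terminals count from 1, which is what the %i
# in the terminfo entry above accounts for.
def move_to(row, col):
    sys.stdout.write("\033[%d;%dH" % (row, col))

move_to(5, 20)               # row 5, column 20
sys.stdout.write("here")
sys.stdout.flush()

# Python's curses module (a wrapper over ncurses) keeps the same y-before-x
# order, e.g. stdscr.move(y, x) and stdscr.getmaxyx().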
Further reading:
VT100.net
ECMA-48: Control Functions for Coded Character Sets


Is there a module or regex in Python to convert all fonts to a uniform font? (Text is coming from Twitter)

I'm working with some text from Twitter, using Tweepy. All that is fine, and at the moment I'm just looking to start with some basic frequency counts for words. However, I'm running into an issue where the ability of users to use different fonts for their tweets makes it look like some words are their own unique word, when in reality they're words that have already been encountered, just in a different font/font size, like in the picture below (those words were counted previously and appear earlier in the spreadsheet).
This messes up the accuracy of the counts. I'm wondering if there's a package or general solution to make all the words a uniform font/size - either while I'm tokenizing it (just by hand, not using a module) or while writing it to the csv (using the csv module). Or any other solutions for this that I may not be considering. Thanks!
You can (mostly) solve your problem by normalising your input, using unicodedata.normalize('NFKC', str).
The KC normalization form (the "NF" stands for Normalization Form) first does a "compatibility decomposition" on the text, which replaces Unicode characters which represent style variants with their unstyled equivalents, and then does a canonical composition on the result, so that ñ, which is converted to an n and a separate ~ diacritic by the decomposition, is turned back into an ñ, the canonical composite for that character. (If you don't want the recomposition step, use NFKD normalisation.) See Unicode Annex 15 for a more precise description, with examples.
Unicode contains a number of symbols, mostly used for mathematics, which are simply stylistic variations on some letter or digit, or, in some cases, on several letters or digits, such as ¼ or ℆. In particular, this includes commonly-used symbols written with font variants which have particular mathematical or other meanings, such as ℒ (the Laplace transform) and ℚ (the set of rational numbers). Compatibility decomposition will strip out the stylistic information, which reduces those four examples to '1/4', 'c/u', 'L' and 'Q', respectively.
The first published Unicode standard defined a Letterlike Symbols block in the Basic Multilingual Plane (BMP). (All of the above examples are drawn from that block.) In Unicode 3.1, complete Latin and Greek alphabets and digits were added in the Mathematical Alphanumeric Symbols block, which includes 13 different font variants of the 52 upper- and lower-case letters of the Roman alphabet, 58 Greek letters in five font variants (some of which could pass for Roman letters, such as 𝝪, which is an upsilon, not a capital Y), and the 10 digits in five variants (𝟎 𝟙 𝟤 𝟯 𝟺), plus a few loose characters which mathematicians apparently asked for.
None of these should be used outside of mathematical typography, but that's not a constraint which most users of social networks care about. So people compensate for the lack of styled text in Twitter (and elsewhere) by using these Unicode characters, despite the fact that they are not properly rendered on all devices, make life difficult for screen readers, cannot readily be searched, and have all the other disadvantages of hacked typography, such as the issue you are running into. (Some of the rendering problems are also visible in your screenshot.)
Compatibility decomposition can go a long way in resolving the problem, but it also tends to erase information which is really useful. For example, x² and H₂O become just x2 and H2O, which might or might not be what you wanted. But it's probably the best you can do.
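As a minimal sketch of the normalisation step (the styled sample string here is made up, not taken from your data):

import unicodedata

# A made-up "tweet" using Mathematical Alphanumeric Symbols as a fake font.
styled = "𝗛𝗲𝗹𝗹𝗼 𝕨𝕠𝕣𝕝𝕕 ℚ x²"

plain = unicodedata.normalize("NFKC", styled)
print(plain)   # -> Hello world Q x2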

Why do ANSI color escapes end in 'm' rather than ']'?

ANSI terminal color escapes can be done with \033[...m in most programming languages. (You may need to write the escape character as \e or \x1b in some languages.)
What has always seemed odd to me is how they start with \033[ but end in m. Is there some historical reason for this (perhaps ] was mapped to the slot that is now occupied by m in the ASCII table?) or is it an arbitrary character choice?
It's not completely arbitrary, but follows a scheme laid out by committees, and documented in ECMA-48 (the same as ISO 6429). Except for the initial Escape character, the succeeding characters are specified by ranges.
While the pair Escape [ is widely used (this is called the control sequence introducer, CSI), there are other control sequences (such as Escape ], the operating system command, OSC). These sequences may have parameters, and a final byte.
In the question, using CSI, the m is a final byte, which happens to tell the terminal what the sequence is supposed to do. The parameters, if given, are a list of numbers. On the other hand, with OSC, the command type is at the beginning, and the parameters are less constrained (they might be any string of printable characters).
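A small sketch to make that structure visible (assuming an xterm-compatible terminal; Python is used only to keep it short):

import sys

# CSI: ESC [ parameters final-byte. Parameters 1;31 select bold red text;
# the final byte m means "select graphic rendition", and 0 resets it.
sys.stdout.write("\033[1;31mwarning\033[0m\n")

# OSC: ESC ] command ; text, terminated by BEL (or ESC \). OSC 0 sets the
# window/icon title in xterm-compatible terminals.
sys.stdout.write("\033]0;my title\007")
sys.stdout.flush()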

Programming languages where indexing starts at 1? [duplicate]

The C programming language is known as a zero-indexed array language: the first item in an array is accessed with index 0. For example, given double arr[2] = {1.5, 2.5}, the first item of arr is at position 0, so arr[0] == 1.5. What programming languages use 1-based indexes?
I've heard that these languages start at 1 instead of 0 for array access: Algol, Matlab, Action!, Pascal, Fortran, Cobol. Is this list complete?
Specifically, a 1-based array accesses the first item with 1, not zero.
A list can be found on Wikipedia.
ALGOL 68
APL
AWK
CFML
COBOL
Fortran
FoxPro
Julia
Lua
Mathematica
MATLAB
PL/I
Ring
RPG
Sass
Smalltalk
Wolfram Language
XPath/XQuery
Fortran starts at 1. I know that because my Dad used to program Fortran before I was born (I am 33 now) and he really criticizes modern programming languages for starting at 0, saying it's unnatural, not how humans think, unlike maths, and so on.
However, I find things starting at 0 quite natural; my first real programming language was C and *(ptr+n) wouldn't have worked so nicely if n hadn't started at zero!
A pretty big list of languages is in the Wikipedia article Comparison of programming languages (array), in the "Array system cross-reference list" table (see the "Default base index" column).
This has a good discussion of 1-based vs. 0-based indexing and of subscripts in general.
To quote from the blog:
EWD831 by E.W. Dijkstra, 1982.
When dealing with a sequence of length N, the elements of which we wish to distinguish by subscript, the next vexing question is what subscript value to assign to its starting element. Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N. So let us let our ordinals start at zero: an element's ordinal (subscript) equals the number of elements preceding it in the sequence. And the moral of the story is that we had better regard —after all those centuries!— zero as a most natural number.

Remark: Many programming languages have been designed without due attention to this detail. In FORTRAN subscripts always start at 1; in ALGOL 60 and in PASCAL, convention c) has been adopted; the more recent SASL has fallen back on the FORTRAN convention: a sequence in SASL is at the same time a function on the positive integers. Pity! (End of Remark.)
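A tiny Python illustration of the two conventions from the quote (nothing beyond the standard library):

N = 5

# Dijkstra's preferred convention: 0 <= i < N, which range() expresses directly.
print(list(range(N)))          # [0, 1, 2, 3, 4]

# The 1-based alternative needs the clumsier bounds 1 <= i < N+1.
print(list(range(1, N + 1)))   # [1, 2, 3, 4, 5]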
Fortran, Matlab, Pascal, Algol, Smalltalk, and many many others.
You can do it in Perl
$[ = 1; # set the base array index to 1
You can also make it start with 42 if you feel like that. This also affects string indexes.
Actually using this feature is highly discouraged.
Also in Ada you can define your array indices as required:
A : array(-5..5) of Integer; -- defines an array with 11 elements
B : array(-1..1, -1..1) of Float; -- defines a 3x3 matrix
Someone might argue that user-defined array index ranges will lead to maintenance problems. However, it is normal to write Ada code in a way which does not depend on the array indices. For this purpose, the language provides array attributes, which are automatically defined for every array:
A'first -- this has the value -5
A'last -- this has the value +5
A'range -- returns the range -5..+5 which can be used e.g. in for loops
JDBC (not a language, but an API)
String x = resultSet.getString(1); // the first column
Erlang's tuples and lists index starting at 1.
Lua - disappointingly
Found one - Lua (programming language)
Check the Arrays section, which says:
"Lua arrays are 1-based: the first index is 1 rather than 0 as it is for many other programming languages (though an explicit index of 0 is allowed)"
VB Classic, at least through
Option Base 1
Strings in Delphi start at 1.
(Static arrays must have lower bound specified explicitly. Dynamic arrays always start at 0.)
ColdFusion - even though it is Java under the hood
Ada and Pascal.
PL/SQL. An upshot of this is that when you use a language that indexes from 0 and interact with Oracle, you need to handle the 0/1 conversion yourself for access by index. In practice, if you use a construct like foreach over rows, or access columns by name, it's not much of an issue, but you might want the leftmost column, for example, which will be column 1.
Indexes start at one in CFML.
The entire Wirthian line of languages including Pascal, Object Pascal, Modula-2, Modula-3, Oberon, Oberon-2 and Ada (plus a few others I've probably overlooked) allow arrays to be indexed from whatever point you like including, obviously, 1.
Erlang indexes tuples and arrays from 1.
I think—but am no longer positive—that Algol and PL/1 both index from 1. I'm also pretty sure that Cobol indexes from 1.
Basically most high level programming languages before C indexed from 1 (with assembly languages being a notable exception for obvious reasons – and the reason C indexes from 0) and many languages from outside of the C-dominated hegemony still do so to this day.
There is also Smalltalk
Visual FoxPro, FoxPro and Clipper all use arrays where element 1 is the first element of an array... I assume that is what you mean by 1-indexed.
I see that the knowledge of Fortran here is still stuck at the '66 version.
Fortran lets you choose both the lower and the upper bounds of an array.
Meaning, if you declare an array like:
real, dimension (90) :: x
then 1 will be the lower bound (by default).
If you declare it like
real, dimension(0:89) :: x
then however, it will have a lower bound of 0.
If on the other hand you declare it like
real, allocatable :: x(:,:)
then you can allocate it to whatever you like. For example
allocate(x(0:np,0:np))
means the array will have the elements
x(0, 0), x(0, 1), x(0, 2), ..., x(0, np)
x(1, 0), x(1, 1), ...
.
.
.
x(np, 0) ...
There are also some more interesting combinations possible:
real, dimension(:, :, 0:) :: d
real, dimension(9, 0:99, -99:99) :: iii
which are left as homework for the interested reader :)
These are just the ones I remembered off the top of my head. Since one of Fortran's main strengths is its array-handling capabilities, it is clear that there are a lot of other ins and outs not mentioned here.
Nobody mentioned XPath.
Mathematica and Maxima, besides other languages already mentioned.
Informix, besides other languages already mentioned.
Basic - not just VB, but all the old 1980s era line numbered versions.
FoxPro used arrays starting at index 1.
dBASE used arrays starting at index 1.
Arrays (Beginning) in dBASE
RPG, including modern RPGLE
Although C is by design 0-indexed, it is possible to arrange for an array in C to be accessed as if it were 1-indexed (or indexed from any other value). Strictly speaking, forming the pointer zero_based - 1 is undefined behaviour, although it works on typical implementations. Not something you would expect a normal C coder to do often, but it sometimes helps.
Example:
#include <stdio.h>

int main(void){
    int zero_based[10];
    int *one_based;
    int i;

    one_based = zero_based - 1;

    for (i = 1; i <= 10; i++) one_based[i] = i;
    for (i = 10; i >= 1; i--) printf("one_based[%d] = %d\n", i, one_based[i]);

    return 0;
}

Best strategies for reading J code

I've been using J for a few months now, and I find that reading unfamiliar code (e.g. code that I didn't write myself) is one of the most challenging aspects of the language, particularly when it's written in tacit form. After a while, I came up with this strategy:
1) Copy the code segment into a word document
2) Take each operator from (1) and place it on a separate line, so that it reads vertically
3) Replace each operator with its verbal description in the Vocabulary page
4) Do a rough translation from J syntax into English grammar
5) Use the translation to identify conceptually related components and separate them with line breaks
6) Write a description of what each component from (5) is supposed to do, in plain English prose
7) Write a description of what the whole program is supposed to do, based on (6)
8) Write an explanation of why the code from (1) can be said to represent the design concept from (7).
Although I learn a lot from this process, I find it to be rather arduous and time-consuming -- especially if someone designed their program using a concept I never encountered before. So I wonder: do other people in the J community have favorite ways to figure out obscure code? If so, what are the advantages and disadvantages of these methods?
EDIT:
An example of the sort of code I would need to break down is the following:
binconv =: +/@ ((|.@(2^i.@#@])) * ]) @ ((3&#.)^:_1)
I wrote this one myself, so I happen to know that it takes a numerical input, reinterprets it as a ternary array and interprets the result as the representation of a number in base-2 with at most one duplication. (e.g., binconv 5 = (3^1)+2*(3^0) -> 1 2 -> (2^1)+2*(2^0) = 4.) But if I had stumbled upon it without any prior history or documentation, figuring out that this is what it does would be a nontrivial exercise.
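For reference, here is a rough Python sketch of the arithmetic I just described (not how the J verb works internally, just the idea of re-reading base-3 digits with base-2 weights):

def binconv(n):
    # Base-3 digits of n, least significant first (like (3&#.)^:_1, reversed).
    digits = []
    while n:
        digits.append(n % 3)
        n //= 3
    # Re-read the same digits with base-2 weights and sum.
    return sum(d * 2 ** i for i, d in enumerate(digits))

print(binconv(5))   # ternary 1 2 read with binary weights -> 4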
Just wanted to add to Jordan's answer: if you don't have box display turned on, you can format things this way explicitly with 5!:2
f =. <.@-:@#{/:~
5!:2 < 'f'
┌───────────────┬─┬──────┐
│┌─────────┬─┬─┐│{│┌──┬─┐│
││┌──┬─┬──┐│@│#││ ││/:│~││
│││<.│@│-:││ │ ││ │└──┴─┘│
││└──┴─┴──┘│ │ ││ │      │
│└─────────┴─┴─┘│ │      │
└───────────────┴─┴──────┘
There's also a tree display:
5!:4 <'f'
              ┌─ <.
        ┌─ @ ─┴─ -:
  ┌─ @ ─┴─ #
──┼─ {
  └─ ~ ─── /:
See the vocabulary page for 5!: Representation and also 9!: Global Parameters for changing the default.
Also, for what it's worth, my own approach to reading J has been to retype the expression by hand, building it up from right to left, and looking up the pieces as I go, and using identity functions to form temporary trains when I need to.
So for example:
/:~ i.5
0 1 2 3 4
NB. That didn't tell me anything
/:~ 'hello'
ehllo
NB. Okay, so it sorts. Let's try it as a train:
[ { /:~ 'hello'
┌─────┐
│ehllo│
└─────┘
NB. Whoops. I meant a train:
([ { /:~) 'hello'
|domain error
| ([{/:~)'hello'
NB. Not helpful, but the dictionary says
NB. "{" ("From") wants a number on the left.
(0: { /:~) 'hello'
e
(1: { /:~) 'hello'
h
NB. Okay, it's selecting an item from the sorted list.
NB. So f is taking the ( <.@-:@# )th item, whatever that means...
<. -: # 'hello'
2
NB. ??!?....No idea. Let's look up the words in the dictionary.
NB. Okay, so it's the floor (<.) of half (-:) the length (#)
NB. So the whole phrase selects an item halfway through the list.
NB. Let's test to make sure.
f 'radar' NB. should return 'd'
d
NB. Yay!
addendum:
NB. just to be clear:
f 'drara' NB. should also return 'd' because it sorts first
d
Try breaking the verb up into its components first, and then see what they do. And rather than always referring to the vocab, you could simply try out a component on data to see what it does, and see if you can figure it out. To see the structure of the verb, it helps to know what parts of speech you're looking at, and how to identify basic constructions like forks (and of course, in larger tacit constructions, separate by parentheses). Simply typing the verb into the ijx window and pressing enter will break down the structure too, and probably help.
Consider the following simple example: <.@-:@#{/:~
I know that <. -: # { and /: are all verbs, ~ is an adverb, and @ is a conjunction (see the parts of speech link in the vocab). Therefore I can see that this is a fork structure with left verb <.@-:@# , right verb /:~ , and dyad { . This takes some practice to see, but there is an easier way: let J show you the structure by typing it into the ijx window and pressing enter:
<.@-:@#{/:~
+---------------+-+------+
|+---------+-+-+|{|+--+-+|
||+--+-+--+|@|#|| ||/:|~||
|||<.|@|-:|| | || |+--+-+|
||+--+-+--+| | || |      |
|+---------+-+-+| |      |
+---------------+-+------+
Here you can see the structure of the verb (or, you will be able to after you get used to looking at these). Then, if you can't identify the pieces, play with them to see what they do.
10?20
15 10 18 7 17 12 19 16 4 2
/:~ 10?20
1 4 6 7 8 10 11 15 17 19
<.@-:@# 10?20
5
You can break them down further and experiment as needed to figure them out (this little example is a median verb).
J packs a lot of code into a few characters and big tacit verbs can look very intimidating, even to experienced users. Experimenting will be quicker than your documenting method, and you can really learn a lot about J by trying to break down large complex verbs. I think I'd recommend focusing on trying to see the grammatical structure and then figure out the pieces, building it up step by step (since that's how you'll eventually be writing tacit verbs).
(I'm putting this in the answer section instead of editing the question because the question looks long enough as it is.)
I just found an excellent paper on the jsoftware website that works well in combination with Jordan's answer and the method I described in the question. The author makes some pertinent observations:
1) A verb modified by an adverb is a verb.
2) A train of more than three consecutive verbs is a series of forks, which may have a single verb or a hook at the far left-hand side depending on how many verbs there are.
This speeds up the process of translating a tacit expression into English, since it lets you group verbs and adverbs into conceptual units and then use the nested fork structure to quickly determine whether an instance of an operator is monadic or dyadic. Here's an example of a translation I did using the refined method:
d28=: [:+/\{.@],>:@[#(}.-}:)@]%>:@[
[: +/\
{.@] ,
>:@[ #
(}.-}:)@] %
>:@[
cap (plus infix prefix)
(head atop right argument) ravel
(increment atop left argument) tally
(behead minus curtail) atop right
argument
divided by
increment atop left argument
the partial sums of the sequence
defined by
the first item of the right argument,
raveled together with
(one plus the left argument) copies
of
(all but the first element) minus
(all but the last element)
of the right argument, divided by
(one plus the left argument).
the partial sums of the sequence
defined by
starting with the same initial point,
and appending consecutive copies of
points derived from the right argument by
subtracting each predecessor from its
successor
and dividing the result by the number
of copies to be made
Interpolating x-many values between
the items of y
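To check my reading, here is a rough Python equivalent of what I believe the translated description computes (the details are my interpretation, not part of the original verb's definition):

def d28(x, y):
    # Each successive difference of y, divided by x+1 and repeated x+1 times.
    steps = [(b - a) / (x + 1) for a, b in zip(y, y[1:]) for _ in range(x + 1)]
    # Partial sums of the head of y followed by those steps.
    out, total = [], 0.0
    for v in [y[0]] + steps:
        total += v
        out.append(total)
    return out

print(d28(1, [0, 10, 20]))   # -> [0.0, 5.0, 10.0, 15.0, 20.0]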
I just want to talk about how I read:
<.@-:@#{/:~
First off, I knew that to test it as a function from the command line, it had to be entered as
(<.@-:@#{/:~)
Now I looked at the stuff in the parentheses. I saw a /:~, which returns a sorted list of its arguments, { which selects an item from a list, # which returns the number of items in a list, -: half, and <., floor... and I started to think that it might be median - half of the number of items in the list, rounded down - but how did # get its arguments? I looked at the @ signs and realized that there were only three verbs there - so this is a fork. The list comes in at the right and is sorted; then at the left, the fork gives the list to # to get the number of items, and then we knew it took the floor of half of that. So now we have the execution sequence:
sort, and pass the output to the middle verb as the right argument.
Take the floor of half of the number of elements in the list, and that becomes the left argument of the middle verb.
Do the middle verb.
That is my approach. I agree that sometimes the phrases have too many odd things, and you need to look them up, but I am always figuring this stuff out at the J instant command line.
Personally, I think of J code in terms of what it does -- if I do not have any example arguments, I rapidly get lost. If I do have examples, it's usually easy for me to see what a sub-expression is doing.
And, when it gets hard, that means I need to look up a word in the dictionary, or possibly study its grammar.
Reading through the prescriptions here, I get the idea that this is not too different from how other people work with the language.
Maybe we should call this 'Test Driven Comprehension'?

Where did string escape codes (\n, \t...) originate?

Purely wondering... since they're still around and in use in C# today...
Where did the pattern of using string escape codes come from? What language did it first appear in? What languages, if any, have solved the problem in a different way?
I suspect that these escape codes originated in B, a high-level assembly programming language for the Honeywell 6000 GCOS operating system. This language was developed at Bell Labs based on a British language called BCPL. Because BCPL was rather wordy, the B developers simplified the syntax and added things like braces to replace BEGIN and END. That's where the name B came from, because it was an abbreviated form of BCPL.
Later on some people at Bell Labs created a language that was the successor to B, mainly by adding typing and a standard I/O library. Because it was B's successor, they named it C, the next letter in the name BCPL.
I do not recall seeing the backslash notation before B, and since C and UNIX inherited it from B, I think that B is the origin of this notation, or more specifically, that Bell Labs was the origin. It's entirely possible that this notation was used in other Bell Labs software before B, since they were a prolific producer of software, much of which was distributed freely to universities such as the one which I attended in the mid-1970s.
By the way, the idea of an escape sequence existed long before that, dating back to the 19th-century Baudot code, a fixed-length 5-bit binary code intended to replace variable-length Morse code. Baudot had shift codes, similar in spirit to ASCII's SI (Shift In) and SO (Shift Out), that switched the following characters between the letter and figure sets, much like the Shift key on a typewriter.
