Strings from a machine language perspective - string

From a low-level assembly and architecture perspective, how are strings handled in instructions differently from numbers? I am trying to understand how compilers work from online courses but ultimately don't quite understand how an architecture that has a word length of, say 64 bit, can understand and relate multiple characters of Unicode that comprise one string but that do not fit in one single instruction. Do strings have completely independent instructions for them based on architecture that seeks a null byte at the end of the string? Understanding this one thing would make compilers much simpler to understand, I believe. Thanks!

The low-level machine instruction languages have a few string instructions (like, CMPS,MOVS, etc.). These instructions, and the prefixes like REP, REPNZ, etc. are all predicated on the user knowing the length of the string a-priori, so effectively these execute the same instruction on each byte/word/dword/qword repeatedly. These instructions are faster than their manually encoded loops, simply because these instructions also trigger read-ahead of the memory.
These do not assume any NULL termination condition, or some other terminating condition. These also have absolutely no understanding of whether the data that is being treated as a string is a Unicode, or an ASCII, or any such specific format. These are all language specific conventions.
There are many programming techniques that can quickly determine the length of the string as long as it is terminated by a known sentinel (NULL, or a "full-stop") using the same instruction sets.
So, to summarize, the low-level architecture focuses on automating repetitiveness of the sequentially stored data handling process, but does not treat the strings "differently" from a number in any other manner.
Trust this helps.

Related

Strings and Strands in MoarVM

When running Raku code on Rakudo with the MoarVM backend, is there any way to print information about how a given Str is stored in memory from inside the running program? In particular, I am curious whether there's a way to see how many Strands currently make up the Str (whether via Raku introspection, NQP, or something that accesses the MoarVM level (does such a thing even exist at runtime?).
If there isn't any way to access this info at runtime, is there a way to get at it through output from one of Rakudo's command-line flags, such as --target, or --tracing? Or through a debugger?
Finally, does MoarVM manage the number of Strands in a given Str? I often hear (or say) that one of Raku's super powers is that is can index into Unicode strings in O(1) time, but I've been thinking about the pathological case, and it feels like it would be O(n). For example,
(^$n).map({~rand}).join
seems like it would create a Str with a length proportional to $n that consists of $n Strands – and, if I'm understanding the datastructure correctly, that means that into this Str would require checking the length of each Strand, for a time complexity of O(n). But I know that it's possible to flatten a Strand-ed Str; would MoarVM do something like that in this case? Or have I misunderstood something more basic?
When running Raku code on Rakudo with the MoarVM backend, is there any way to print information about how a given Str is stored in memory from inside the running program?
My educated guess is yes, as described below for App::MoarVM modules. That said, my education came from a degree I started at the Unseen University, and a wizard had me expelled for guessing too much, so...
In particular, I am curious whether there's a way to see how many Strands currently make up the Str (whether via Raku introspection, NQP, or something that accesses the MoarVM level (does such a thing even exist at runtime?).
I'm 99.99% sure strands are purely an implementation detail of the backend, and there'll be no Raku or NQP access to that information without MoarVM specific tricks. That said, read on.
If there isn't any way to access this info at runtime
I can see there is access at runtime via MoarVM.
is there a way to get at it through output from one of Rakudo's command-line flags, such as --target, or --tracing? Or through a debugger?
I'm 99.99% sure there are multiple ways.
For example, there's a bunch of strand debugging code in MoarVM's ops.c file starting with #define MVM_DEBUG_STRANDS ....
Perhaps more interesting are what appears to be a veritable goldmine of sophisticated debugging and profiling features built into MoarVM. Plus what appear to be Rakudo specific modules that drive those features, presumably via Raku code. For a dozen or so articles discussing some aspects of those features, I suggest reading timotimo's blog. Browsing github I see ongoing commits related to MoarVM's debugging features for years and on into 2021.
Finally, does MoarVM manage the number of Strands in a given Str?
Yes. I can see that the string handling code (some links are below), which was written by samcv (extremely smart and careful) and, I believe, reviewed by jnthn, has logic limiting the number of strands.
I often hear (or say) that one of Raku's super powers is that is can index into Unicode strings in O(1) time, but I've been thinking about the pathological case, and it feels like it would be O(n).
Yes, if a backend that supported strands did not manage the number of strands.
But for MoarVM I think the intent is to set an absolute upper bound with #define MVM_STRING_MAX_STRANDS 64 in MoarVM's MVMString.h file, and logic that checks against that (and other characteristics of strings; see this else if statement as an exemplar). But the logic is sufficiently complex, and my C chops sufficiently meagre, that I am nowhere near being able to express confidence in that, even if I can say that that appears to be the intent.
For example, (^$n).map({~rand}).join seems like it would create a Str with a length proportional to $n that consists of $n Strands
I'm 95% confident that the strings constructed by simple joins like that will be O(1).
This is based on me thinking that a Raku/NQP level string join operation is handled by MVM_string_join, and my attempts to understand what that code does.
But I know that it's possible to flatten a Strand-ed Str; would MoarVM do something like that in this case?
If you read the code you will find it's doing very sophisticated handling.
Or have I misunderstood something more basic?
I'm pretty sure I will have misunderstood something basic so I sure ain't gonna comment on whether you have. :)
As far as I understand it, the fact that MoarVM implements strands (aka, a concatenating two strings will only result in creation of a strand that consists of "references" to the original strings), is really that: an implementation detail.
You can implement the Raku Programming Language without needing to implement strands. Therefore there is no way to introspect this, at least to my knowledge.
There has been a PR to expose the nqp:: op that would actually concatenate strands into a single string, but that has been refused / closed: https://github.com/rakudo/rakudo/pull/3975

Is rbp/ebp(x86-64) register still used in conventional way?

I have been writing a small kernel lately based on x86-64 architecture. When taking care of some user space code, I realized I am virtually not using rbp. I then looked up at some other things and found out compilers are getting smarter these days and they really dont use rbp anymore. (I could be wrong here.)
I was wondering if conventional use of rbp/epb is not required anymore in many instances or am I wrong here. If that kind of usage is not required then can it be used like a general purpose register?
Thanks
It is only needed if you have variable-length arrays in your stack frames (recording the array length would require more memory and more computations). It is no longer needed for unwinding because there is now metadata for that.
It is still useful if you are hand-writing entire assembly functions, but who does that? Assembly should only be used as glue to jump into a C (or whatever) function.

Website with function equivalents for various languages

When learning (or relearning) a language, a significant amount of time goes into learning the functions for doing basic operations. For example, suppose I want to reverse a String. In one language, it may be simple as myString.reverse(). In Python, it is myString[::-1]. In other languages, you may have to create an array, iterate through the string and add all the characters in reverse order and then convert it back to a string. What would be extremely useful would be a reference so that if you know the name of the function in one language, then I could find the equivalent in another. Googling or searching StackOverflow don't seem to solve this problem very well at the moment, as you have to usually try a large number of different queries. I guess I am thinking of some kind of Wiki system. Are there any websites that do this?
It sounds like you're looking for Rosetta Code. There is in fact a page on reversing a string.

What is the worst programming language you ever worked with? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 13 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
If you have an interesting story to
share, please post an answer, but
do not abuse this question for bashing
a language.
We are programmers, and our primary tool is the programming language we use.
While there is a lot of discussion about the best one, I'd like to hear your stories about
the worst programming languages you ever worked with and I'd like to know exactly what annoyed you.
I'd like to collect this stories partly to avoid common pitfalls while designing a language (especially a DSL) and partly to avoid quirky languages in the future in general.
This question is not subjective. If a language supports only single character identifiers (see my own answer) this is bad in a non-debatable way.
EDIT
Some people have raised concerns that this question attracts trolls.
Wading through all your answers made one thing clear.
The large majority of answers is appropriate, useful and well written.
UPDATE 2009-07-01 19:15 GMT
The language overview is now complete, covering 103 different languages from 102 answers.
I decided to be lax about what counts as a programming language and included
anything reasonable. Thank you David for your comments on this.
Here are all programming languages covered so far
(alphabetical order, linked with answer, new entries in bold):
ABAP,
all 20th century languages,
all drag and drop languages,
all proprietary languages,
APF,
APL
(1),
AS400,
Authorware,
Autohotkey,
BancaStar,
BASIC,
Bourne Shell,
Brainfuck,
C++,
Centura Team Developer,
Cobol
(1),
Cold Fusion,
Coldfusion,
CRM114,
Crystal Syntax,
CSS,
Dataflex 2.3,
DB/c DX,
dbase II,
DCL,
Delphi IDE,
Doors DXL,
DOS batch
(1),
Excel Macro language,
FileMaker,
FOCUS,
Forth,
FORTRAN,
FORTRAN 77,
HTML,
Illustra web blade,
Informix 4th Generation Language,
Informix Universal Server web blade,
INTERCAL,
Java,
JavaScript
(1),
JCL
(1),
karol,
LabTalk,
Labview,
Lingo,
LISP,
Logo,
LOLCODE,
LotusScript,
m4,
Magic II,
Makefiles,
MapBasic,
MaxScript,
Meditech Magic,
MEL,
mIRC Script,
MS Access,
MUMPS,
Oberon,
object extensions to C,
Objective-C,
OPS5,
Oz,
Perl
(1),
PHP,
PL/SQL,
PowerDynamo,
PROGRESS 4GL,
prova,
PS-FOCUS,
Python,
Regular Expressions,
RPG,
RPG II,
Scheme,
ScriptMaker,
sendmail.conf,
Smalltalk,
Smalltalk ,
SNOBOL,
SpeedScript,
Sybase PowerBuilder,
Symbian C++,
System RPL,
TCL,
TECO,
The Visual Software Environment,
Tiny praat,
TransCAD,
troff,
uBasic,
VB6
(1),
VBScript
(1),
VDF4,
Vimscript,
Visual Basic
(1),
Visual C++,
Visual Foxpro,
VSE,
Webspeed,
XSLT
The answers covering 80386 assembler, VB6 and VBScript have been removed.
PHP (In no particular order)
Inconsistent function names and argument orders
Because there are a zillion functions, each one of which seems to use a different naming convention and argument order. "Lets see... is it foo_bar or foobar or fooBar... and is it needle, haystack or haystack, needle?" The PHP string functions are a perfect example of this. Half of them use str_foo and the other half use strfoo.
Non-standard date format characters
Take j for example
In UNIX (which, by the way, is what everyone else uses as a guide for date string formats) %j returns the day of the year with leading zeros.
In PHP's date function j returns the day of the month without leading zeros.
Still No Support for Apache 2.0 MPM
It's not recommended.
Why isn't this supported? "When you make the underlying framework more complex by not having completely separate execution threads, completely separate memory segments and a strong sandbox for each request to play in, feet of clay are introduced into PHP's system." Link So... it's not supported 'cause it makes things harder? 'Cause only the things that are easy are worth doing right? (To be fair, as Emil H pointed out, this is generally attributed to bad 3rd-party libs not being thread-safe, whereas the core of PHP is.)
No native Unicode support
Native Unicode support is slated for PHP6
I'm sure glad that we haven't lived in a global environment where we might have need to speak to people in other languages for the past, oh 18 years. Oh wait. (To be fair, the fact that everything doesn't use Unicode in this day and age really annoys me. My point is I shouldn't have to do any extra work to make Unicode happen. This isn't only a PHP problem.)
I have other beefs with the language. These are just some.
Jeff Atwood has an old post about why PHP sucks. He also says it doesn't matter. I don't agree but there we are.
XSLT.
XSLT is baffling, to begin with. The metaphor is completely different from anything else I know.
The thing was designed by a committee so deep in angle brackets that it comes off as a bizarre frankenstein.
The weird incantations required to specify the output format.
The built-in, invisible rules.
The odd bolt-on stuff, like scripts.
The dependency on XPath.
The tools support has been pretty slim, until lately. Debugging XSLT in the early days was an exercise in navigating in complete darkness. The tools change that but, still XSLT tops my list.
XSLT is weird enough that most people just ignore it. If you must use it, you need an XSLT Shaman to give you the magic incantations to make things go.
DOS Batch files. Not sure if this qualifies as programming language at all.
It's not that you can't solve your problems, but if you are used to bash...
Just my two cents.
Not sure if its a true language, but I hate Makefiles.
Makefiles have meaningful differences between space and TAB, so even if two lines appear identical, they do not run the same.
Make also relies on a complex set of implicit rules for many languages, which are difficult to learn, but then are frequently overridden by the make file.
A Makefile system is typically spread over many, many files, across many directories.
With virtually no scoping or abstraction, a change to a make file several directories away can prevent my source from building. Yet the error message is invariably a compliation error, not a meaningful error about make, or the makefiles.
Any environment I've worked in that uses makefiles successfully has a full-time Make expert. And all this to shave a few minutes off compilation??
The worse language I've ever seen come from the tool praat, which is a good audio analysis tool. It does a pretty good job until you use the script language. sigh bad memories.
Tiny praat script tutorial for beginners
Function call
We've listed at least 3 different function calling syntax :
The regular one
string = selected("Strings")
Nothing special here, you assign to the variable string the result of the selected function. Not really scary... yet.
The "I'm invoking some GUI command with parameters"
Create Strings as file list... liste 'path$'/'type$'
As you can see, the function name start at "Create" and finish with the "...". The command "Create Strings as file list" is the text displayed on a button or a menu (I'm to scared to check) on praat. This command take 2 parameters liste and an expression. I'm going to look deeper in the expression 'path$'/'type$'
Hmm. Yep. No spaces. If spaces were introduced, it would be separate arguments. As you can imagine, parenthesis don't work. At this point of the description I would like to point out the suffix of the variable names. I won't develop it in this paragraph, I'm just teasing.
The "Oh, but I want to get the result of the GUI command in my variable"
noliftt = Get number of strings
Yes we can see a pattern here, long and weird function name, it must be a GUI calling. But there's no '...' so no parameters. I don't want to see what the parser looks like.
The incredible type system (AKA Haskell and OCaml, praat is coming to you)
Simple natives types
windowname$ = left$(line$,length(line$)-4)
So, what's going on there?
It's now time to look at the convention and types of expression, so here we got :
left$ :: (String, Int) -> String
lenght :: (String) -> Int
windowname$ :: String
line$ :: String
As you can see, variable name and function names are suffixed with their type or return type. If their suffix is a '$', then it return a string or is a string. If there is nothing it's a number. I can see the point of prefixing the type to a variable to ease implementation, but to suffix, no sorry, I can't
Array type
To show the array type, let me introduce a 'tiny' loop :
for i from 1 to 4
Select... time time
bandwidth'i'$ = Get bandwidth... i
forhertz'i'$ = Get formant... i
endfor
We got i which is a number and... (no it's not a function)
bandwidth'i'$
What it does is create string variables : bandwidth1$, bandwidth2$, bandwidth3$, bandwidth4$ and give them values. As you can expect, you can't create two dimensional array this way, you must do something like that :
band2D__'i'__'j'$
The special string invocation
outline$ = "'time'#F'i':'forhertznum'Hz,'bandnum'Hz, 'spec''newline$'"
outline$ >> 'outfile$'
Strings are weirdly (at least) handled in the language. the '' is used to call the value of a variable inside the global "" string. This is _weird_. It goes against all the convention built into many languages from bash to PHP passing by the powershell. And look, it even got redirection. Don't be fooled, it doesn't work like in your beloved shell. No you have to get the variable value with the ''
Da Wonderderderfulful execution model
I'm going to put the final touch to this wonderderderfulful presentation by talking to you about the execution model. So as in every procedural languages you got instruction executed from top to bottom, there is the variables and the praat GUI. That is you code everything on the praat gui, you invoke commands written on menu/buttons.
The main window of praat contain a list of items which can be :
files
list of files (created by a function with a wonderderfulful long long name)
Spectrogramm
Strings (don't ask)
So if you want to perform operation on a given file, you must select the file in the list programmatically and then push the different buttons to take some actions. If you wanted to pass parameters to a GUI action, you have to follow the GUI layout of the form for your arguments, for example "To Spectrogram... 0.005 5000 0.002 20 Gaussian
" is like that because it follows this layout:
Needless to say, my nightmares are filled with praat scripts dancing around me and shouting "DEBUG MEEEE!!".
More information at the praat site, under the well-named section "easy programmable scripting language"
Well since this question refuses to die and since the OP did prod me into answering...
I humbly proffer for your consideration Authorware (AW) as the worst language it is possible to create. (n.b. I'm going off recollection here, it's been ~6 years since I used AW, which of course means there's a number of awful things I can't even remember)
the horror, the horror http://img.brothersoft.com/screenshots/softimage/a/adobe_authorware-67096-1.jpeg
Let's start with the fact that it's a Macromedia product (-10 points), a proprietary language (-50 more) primarily intended for creating e-learning software and moreover software that could be created by non-programmers and programmers alike implemented as an iconic language AND a text language (-100).
Now if that last statement didn't scare you then you haven't had to fix WYSIWYG generated code before (hello Dreamweaver and Frontpage devs!), but the salient point is that AW had a library of about 12 or so elements which could be dragged into a flow. Like "Page" elements, Animations, IFELSE, and GOTO (-100). Of course removing objects from the flow created any number of broken connections and artifacts which the IDE had variable levels of success coping with. Naturally the built in wizards (-10) were a major source of these.
Fortunately you could always step into a code view, and eventually you'd have to because with a limited set of iconic elements some things just weren't possible otherwise. The language itself was based on TUTOR (-50) - a candidate for worst language itself if only it had the ambition and scope to reach the depths AW would strive for - about which wikipedia says:
...the TUTOR language was not easy to
learn. In fact, it was even suggested
that several years of experience with
the language would be required before
programmers could build programs worth
keeping.
An excellent foundation then, which was built upon in the years before the rise of the internet with exactly nothing. Absolutely no form of data structure beyond an array (-100), certainly no sugar (real men don't use switch statements?) (-10), and a large splash of syntactic vinegar ("--" was the comment indicator so no decrement operator for you!) (-10). Language reference documentation was provided in paper or zip file formats (-100), but at least you had the support of the developer run usegroup and could quickly establish the solution to your problem was to use the DLL or SWF importing features of AW to enable you to do the actual coding in a real language.
AW was driven by a flow (with necessary PAUSE commands) and therefore has all the attendant problems of a linear rather than event based system (-50), and despite the outright marketing lies of the documentation it was not object oriented (-50) either. All code reuse was achieved through GOTO. No scope, lots of globals (-50).
It's not the language's fault directly, but obviously no source control integration was possible, and certainly no TDD, documentation generation or any other add-on tool you might like.
Of course Macromedia met the challenge of the internet head on with a stubborn refusal to engage for years, eventually producing the buggy, hard to use, security nightmare which is Shockwave (-100) to essentially serialise desktop versions of the software through a required plugin (-10). AS HTML rose so did AW stagnate, still persisting with it's shockwave delivery even in the face of IEEE SCORM javascript standards.
Ultimately after years of begging and promises Macromedia announced a radical new version of AW in development to address these issues, and a few years later offshored the development and then cancelled the project. Although of course Macromedia are still selling it (EVIL BONUS -500).
If anything else needs to be said, this is the language which allows spaces in variable names (-10000).
If you ever want to experience true pain, try reading somebody else's uncommented hungarian notation in a language which isn't case sensitive and allows variable name spaces.
Total Annakata Arbitrary Score (AAS): -11300
Adjusted for personal experience: OutOfRangeException
(apologies for length, but it was cathartic)
Seriously: Perl.
It's just a pain in the ass to code with for beginners and even for semi-professionals which work with perl on a daily basis. I can constantly see my colleagues struggle with the language, building the worst scripts, like 2000 lines with no regard of any well accepted coding standard. It's the worst mess i've ever seen in programming.
Now, you can always say, that those people are bad in coding (despite the fact that some of them have used perl for a lot of years, now), but the language just encourages all that freaking shit that makes me scream when i have to read a script by some other guy.
MS Access Visual Basic for Applications (VBA) was also pretty bad. Access was bad altogether in that it forced you down a weak paradigm and was deceptively simple to get started, but a nightmare to finish.
No answer about Cobol yet? :O
Old-skool BASICs with line numbers would be my choice. When you had no space between line numbers to add new lines, you had to run a renumber utility, which caused you to lose any mental anchors you had to what was where.
As a result, you ended up squeezing in too many statements on a single line (separated by colons), or you did a goto or gosub somewhere else to do the work you couldn't cram in.
MUMPS
I worked in it for a couple years, but have done a complete brain dump since then. All I can really remember was no documentation (at my location) and cryptic commands.
It was horrible. Horrible! HORRIBLE!!!
There are just two kinds of languages: the ones everybody complains about and the ones nobody uses.
Bjarne Stroustrup
I haven't yet worked with many languages and deal mostly with scripting languages; out of these VBScript is the one I like least. Although it has some handy features, some things really piss me off:
Object assignments are made using the Set keyword:
Set foo = Nothing
Omitting Set is one of the most common causes of run-time errors.
No such thing as structured exception handling. Error checking is like this:
On Error Resume Next
' Do something
If Err.Number <> 0
' Handle error
Err.Clear
End If
' And so on
Enclosing the procedure call parameters in parentheses requires using the Call keyword:
Call Foo (a, b)
Its English-like syntax is way too verbose. (I'm a fan of curly braces.)
Logical operators are long-circuit. If you need to test a compound condition where the subsequent condition relies on the success of the previous one, you need to put conditions into separate If statements.
Lack of parameterized class constructors.
To wrap a statement into several lines, you have to use an underscore:
str = "Hello, " & _
"world!"
Lack of multiline comments.
Edit: found this article: The Flangy Guide to Hating VBScript. The author sums up his complaints as "VBS isn't Python" :)
Objective-C.
The annotations are confusing, using brackets to call methods still does not compute in my brain, and what is worse is that all of the library functions from C are called using the standard operators in C, -> and ., and it seems like the only company that is driving this language is Apple.
I admit I have only used the language when programming for the iPhone (and looking into programming for OS X), but it feels as if C++ were merely forked, adding in annotations and forcing the implementation and the header files to be separate would make much more sense.
PROGRESS 4GL (apparently now known as "OpenEdge Advanced Business Language").
PROGRESS is both a language and a database system. The whole language is designed to make it easy to write crappy green-screen data-entry screens. (So start by imagining how well this translates to Windows.) Anything fancier than that, whether pretty screens, program logic, or batch processing... not so much.
I last used version 7, back in the late '90s, so it's vaguely possible that some of this is out-of-date, but I wouldn't bet on it.
It was originally designed for text-mode data-entry screens, so on Windows, all screen coordinates are in "character" units, which are some weird number of pixels wide and a different number of pixels high. But of course they default to a proportional font, so the number of "character units" doesn't correspond to the actual number of characters that will fit in a given space.
No classes or objects.
No language support for arrays or dynamic memory allocation. If you want something resembling an array, you create a temporary in-memory database table, define its schema, and then get a cursor on it. (I saw a bit of code from a later version, where they actually built and shipped a primitive object-oriented system on top of these in-memory tables. Scary.)
ISAM database access is built in. (But not SQL. Who needs it?) If you want to increment the Counter field in the current record in the State table, you just say State.Counter = State.Counter + 1. Which isn't so bad, except...
When you use a table directly in code, then behind the scenes, they create something resembling an invisible, magic local variable to hold the current cursor position in that table. They guess at which containing block this cursor will be scoped to. If you're not careful, your cursor will vanish when you exit a block, and reset itself later, with no warning. Or you'll start working with a table and find that you're not starting at the first record, because you're reusing the cursor from some other block (or even your own, because your scope was expanded when you didn't expect it).
Transactions operate on these wild-guess scopes. Are we having fun yet?
Everything can be abbreviated. For some of the offensively long keywords, this might not seem so bad at first. But if you have a variable named Index, you can refer to it as Index or as Ind or even as I. (Typos can have very interesting results.) And if you want to access a database field, not only can you abbreviate the field name, but you don't even have to qualify it with the table name; they'll guess the table too. For truly frightening results, combine this with:
Unless otherwise specified, they assume everything is a database access. If you access a variable you haven't declared yet (or, more likely, if you mistype the variable name), there's no compiler error: instead, it goes looking for a database field with that name... or a field that abbreviates to that name.
The guessing is the worst. Between the abbreviations and the field-by-default, you could get some nasty stuff if you weren't careful. (Forgot to declare I as a local variable before using it as a loop variable? No problem, we'll just randomly pick a table, grab its current record, and completely trash an arbitrarily-chosen field whose name starts with I!)
Then add in the fact that an accidental field-by-default access could change the scope it guessed for your tables, thus breaking some completely unrelated piece of code. Fun, yes?
They also have a reporting system built into the language, but I have apparently repressed all memories of it.
When I got another job working with Netscape LiveWire (an ill-fated attempt at server-side JavaScript) and classic ASP (VBScript), I was in heaven.
The worst language? BancStar, hands down.
3,000 predefined variables, all numbered, all global. No variable declaration, no initialization. Half of them, scattered over the range, reserved for system use, but you can use them at your peril. A hundred or so are automatically filled in as a result of various operations, and no list of which ones those are. They all fit in 38k bytes, and there is no protection whatsoever for buffer overflow. The system will cheerfully let users put 20 bytes in a ten byte field if you declared the length of an input field incorrectly. The effects are unpredictable, to say the least.
This is a language that will let you declare a calculated gosub or goto; due to its limitations, this is frequently necessary. Conditionals can be declared forward or reverse. Picture an "If" statement that terminates 20 lines before it begins.
The return stack is very shallow, (20 Gosubs or so) and since a user's press of any function key kicks off a different subroutine, you can overrun the stack easily. The designers thoughtfully included a "Clear Gosubs" command to nuke the stack completely in order to fix that problem and to make sure you would never know exactly what the program would do next.
There is much more. Tens of thousands of lines of this Lovecraftian horror.
The .bat files scripting language on DOS/Windows. God only knows how un-powerful is this one, specially if you compare it to the Unix shell languages (that aren't so powerful either, but way better nonetheless).
Just try to concatenate two strings or make a for loop. Nah.
VSE, The Visual Software Environment.
This is a language that a prof of mine (Dr. Henry Ledgard) tried to sell us on back in undergrad/grad school. (I don't feel bad about giving his name because, as far as I can tell, he's still a big proponent and would welcome the chance to convince some folks it's the best thing since sliced bread). When describing it to people, my best analogy is that it's sort of a bastard child of FORTRAN and COBOL, with some extra bad thrown in. From the only really accessible folder I've found with this material (there's lots more in there that I'm not going to link specifically here):
VSE Overview (pdf)
Chapter 3: The VSE Language (pdf) (Not really an overview of the language at all)
Appendix: On Strings and Characters (pdf)
The Software Survivors (pdf) (Fevered ramblings attempting to justify this turd)
VSE is built around what they call "The Separation Principle". The idea is that Data and Behavior must be completely segregated. Imagine C's requirement that all variables/data must be declared at the beginning of the function, except now move that declaration into a separate file that other functions can use as well. When other functions use it, they're using the same data, not a local copy of data with the same layout.
Why do things this way? We learn that from The Software Survivors that Variable Scope Rules Are Hard. I'd include a quote but, like most fools, it takes these guys forever to say anything. Search that PDF for "Quagmire Of Scope" and you'll discover some true enlightenment.
They go on to claim that this somehow makes it more suitable for multi-proc environments because it more closely models the underlying hardware implementation. Riiiight.
Another choice theme that comes up frequently:
INCREMENT DAY COUNT BY 7 (or DAY COUNT = DAY COUNT + 7)
DECREMENT TOTAL LOSS BY GROUND_LOSS
ADD 100.3 TO TOTAL LOSS(LINK_POINTER)
SET AIRCRAFT STATE TO ON_THE_GROUND
PERCENT BUSY = (TOTAL BUSY CALLS * 100)/TOTAL CALLS
Although not earthshaking, the style
of arithmetic reflects ordinary usage,
i.e., anyone can read and understand
it - without knowing a programming
language. In fact, VisiSoft arithmetic
is virtually identical to FORTRAN,
including embedded complex arithmetic.
This puts programmers concerned with
their professional status and
corresponding job security ill at
ease.
Ummm, not that concerned at all, really. One of the key selling points that Bill Cave uses to try to sell VSE is the democratization of programming so that business people don't need to indenture themselves to programmers who use crazy, arcane tools for the sole purpose of job security. He leverages this irrational fear to sell his tool. (And it works-- the federal gov't is his biggest customer). I counted 17 uses of the phrase "job security" in the document. Examples:
... and fit only for those desiring artificial job security.
More false job security?
Is job security dependent upon ensuring the other guy can't figure out what was done?
Is job security dependent upon complex code...?
One of the strongest forces affecting the acceptance of new technology is the perception of one's job security.
He uses this paranoia to drive wedge between the managers holding the purse strings and the technical people who have the knowledge to recognize VSE for the turd that it is. This is how he squeezes it into companies-- "Your technical people are only saying it sucks because they're afraid it will make them obsolete!"
A few additional choice quotes from the overview documentation:
Another consequence of this approach
is that data is mapped into memory
on a "What You See Is What You Get"
basis, and maintained throughout.
This allows users to move a complete
structure as a string of characters
into a template that descrives each
individual field. Multiple templates
can be redefined for a given storage
area. Unlike C and other languages,
substructures can be moved without the problems of misalignment due to
word boundary alignment standards.
Now, I don't know about you, but I know that a WYSIWYG approach to memory layout is at the top of my priority list when it comes to language choice! Basically, they ignore alignment issues because only old languages that were designed in the '60's and '70's care about word alignment. Or something like that. The reasoning is bogus. It made so little sense to me that I proceeded to forget it almost immediately.
There are no user-defined types in VSE. This is a far-reaching
decision that greatly simplifies the
language. The gain from a practical
point of view is also great. VSE
allows the designer and programmer to
organize a program along the same
lines as a physical system being
modeled. VSE allows structures to be
built in an easy-to-read, logical
attribute hierarchy.
Awesome! User-defined types are lame. Why would I want something like an InputMessage object when I can have:
LINKS_IN_USE INTEGER
INPUT_MESSAGE
1 ORIGIN INTEGER
1 DESTINATION INTEGER
1 MESSAGE
2 MESSAGE_HEADER CHAR 10
2 MESSAGE_BODY CHAR 24
2 MESSAGE_TRAILER CHAR 10
1 ARRIVAL_TIME INTEGER
1 DURATION INTEGER
1 TYPE CHAR 5
OUTPUT_MESSAGE CHARACTER 50
You might look at that and think, "Oh, that's pretty nicely formatted, if a bit old-school." Old-school is right. Whitespace is significant-- very significant. And redundant! The 1's must be in column 3. The 1 indicates that it's at the first level of the hierarchy. The Symbol name must be in column 5. You hierarchies are limited to a depth of 9.
Well, ok, but is that so awful? Just wait:
It is well known that for reading
text, use of conventional upper/lower
case is more readable. VSE uses all
upper case (except for comments). Why?
The literature in psychology is based
on prose. Programs, simply, are not
prose. Programs are more like math,
accounting, tables. Program fonts
(usually Courier) are almost
universally fixed-pitch, and for good
reason – vertical alignment among
related lines of code. Programs in
upper case are nicely readable, and,
after a time, much better in our
opinion
Nothing like enforcing your opinion at the language level! That's right, you cannot use any lower case in VSE unless it's in a comment. Just keep your CAPSLOCK on, it's gonna be stuck there for a while.
VSE subprocedures are called processes. This code sample contains three processes:
PROCESS_MUSIC
EXECUTE INITIALIZE_THE_SCENE
EXECUTE PROCESS_PANEL_WIDGET
INITIALIZE_THE_SCENE
SET TEST_BUTTON PANEL_BUTTON_STATUS TO ON
MOVE ' ' TO TEST_INPUT PANEL_INPUT_TEXT
DISPLAY PANEL PANEL_MUSIC
PROCESS_PANEL_WIDGET
ACCEPT PANEL PANEL_MUSIC
*** CHECK FOR BUTTON CLICK
IF RTG_PANEL_WIDGET_NAME IS EQUAL TO 'TEST_BUTTON'
MOVE 'I LIKE THE BEATLES!' TO TEST_INPUT PANEL_INPUT_TEXT.
DISPLAY PANEL PANEL_MUSIC
All caps as expected. After all, that's easier to read. Note the whitespace. It's significant again. All process names must start in column 0. The initial level of instructions must start on column 4. Deeper levels must be indented exactly 3 spaces. This isn't a big deal, though, because you aren't allowed to do things like nest conditionals. You want a nested conditional? Well just make another process and call it. And note the delicious COBOL-esque syntax!
You want loops? Easy:
EXECUTE NEXT_CALL
EXECUTE NEXT_CALL 5 TIMES
EXECUTE NEXT_CALL TOTAL CALL TIMES
EXECUTE NEXT_CALL UNTIL NO LINES ARE AVAILABLE
EXECUTE NEXT_CALL UNTIL CALLS_ANSWERED ARE EQUAL TO CALLS_WAITING
EXECUTE READ_MESSAGE UNTIL LEAD_CHARACTER IS A DELIMITER
Ugh.
Here is the contribution to my own question:
Origin LabTalk
My all-time favourite in this regard is Origin LabTalk.
In LabTalk the maximum length of a string variable identifier is one character.
That is, there are only 26 string variables at all. Even worse, some of them are used by Origin itself, and it is not clear which ones.
From the manual:
LabTalk uses the % notation to define
a string variable. A legal string
variable name must be a % character
followed by a single alphabetic
character (a letter from A to Z).
String variable names are
caseinsensitive. Of all the 26 string
variables that exist, Origin itself
uses 14.
Doors DXL
For me the second worst in my opinion is Doors DXL.
Programming languages can be divided into two groups:
Those with manual memory management (e.g. delete, free) and those with a garbage collector.
Some languages offer both, but DXL is probably the only language in the world that
supports neither. OK, to be honest this is only true for strings, but hey, strings aren't exactly
the most rarely used data type in requirements engineering software.
The consequence is that memory used by a string can never be reclaimed and
DOORS DXL leaks like sieve.
There are countless other quirks in DXL, just to name a few:
DXL function syntax
DXL arrays
Cold Fusion
I guess it's good for designers but as a programmer I always felt like one hand was tied behind my back.
The worst two languages I've worked with were APL, which is relatively well known for languages of its age, and TECO, the language in which the original Emacs was written. Both are notable for their terse, inscrutable syntax.
APL is an array processing language; it's extremely powerful, but nearly impossible to read, since every character is an operator, and many don't appear on standard keyboards.
TECO had a similar look, and for a similar reason. Most characters are operators, and this special purpose language was devoted to editing text files. It was a little better, since it used the standard character set. And it did have the ability to define functions, which was what gave life to emacs--people wrote macros, and only invoked those after a while. But figuring out what a program did or writing a new one was quite a challenge.
LOLCODE:
HAI
CAN HAS STDIO?
VISIBLE "HAI WORLD!"
KTHXBYE
Seriously, the worst programming language ever is that of Makefiles. Totally incomprehensible, tabs have a syntactic meaning and not even a debugger to find out what's going on.
I'm not sure if you meant to include scripting languages, but I've seen TCL (which is also annoying), but... the mIRC scripting language annoys me to no end.
Because of some oversight in the parsing, it's whitespace significant when it's not supposed to be. Conditional statements will sometimes be executed when they're supposed to be skipped because of this. Opening a block statement cannot be done on a separate line, etc.
Other than that it's just full of messy, inconsistent syntax that was probably designed that way to make very basic stuff easy, but at the same time makes anything a little more complex barely readable.
I lost most of my mIRC scripts, or I could have probably found some good examples of what a horrible mess it forces you to create :(
Regular expressions
It's a write only language, and it's hard to verify if it works correctly for the right inputs.
Visual Foxpro
I can't belive nobody has said this one:
LotusScript
I thinks is far worst than php at least.
Is not about the language itself which follows a syntax similar to Visual Basic, is the fact that it seem to have a lot of functions for extremely unuseful things that you will never (or one in a million times) use, but lack support for things you will use everyday.
I don't remember any concrete example but they were like:
"Ok, I have an event to check whether the mouse pointer is in the upper corner of the form and I don't have an double click event for the Form!!?? WTF??"
Twice I've had to work in 'languages' where you drag-n-dropped modules onto the page and linked them together with lines to show data flow. (One claimed to be a RDBMs, and the other a general purpose data acquisition and number crunching language.)
Just thinking of it makes me what to throttle someone. Or puke. Or both.
Worse, neither exposed a text language that you could hack directly.
I find myself avoid having to use VBScript/Visual Basic 6 the most.
I use primarily C++, javascript, Java for most tasks and dabble in ruby, scala, erlang, python, assembler, perl when the need arises.
I, like most other reasonably minded polyglots/programmers, strongly feel that you have to use the right tool for the job - this requires you to understand your domain and to understand your tools.
My issue with VBscript and VB6 is when I use them to script windows or office applications (the right domain for them) - i find myself struggling with the language (they fall short of being the right tool).
VBScript's lack of easy to use native data structures (such as associative containers/maps) and other quirks (such as set for assignment to objects) is a needless and frustrating annoyance, especially for a scripting language. Contrast it with Javascript (which i now use to program wscript/cscript windows and do activex automation scripts) which is far more expressive. While there are certain things that work better with vbscript (such as passing arrays back and forth from COM objects is slightly easier, although it is easier to pass event handlers into COM components with jscript), I am still surprised by the amount of coders that still use vbscript to script windows - I bet if they wrote the same program in both languages they would find that jscript works with you much more than vbscript, because of jscript's native hash data types and closures.
Vb6/VBA, though a little better than vbscript in general, still has many similar issues where (for their domain) they require much more boiler plate to do simple tasks than what I would like and have seen in other scripting languages.
In 25+ years of computer programming, by far the worst thing I've ever experienced was a derivative of MUMPS called Meditech Magic. It's much more evil than PHP could ever hope to be.
It doesn't even use '=' for assignment! 100^b assigns a value of 100 to b and is read as "100 goes to b". Basically, this language invented its own syntax from top to bottom. So no matter how many programming languages you know, Magic will be a complete mystery to you.
Here is 100 bottles of beer on the wall written in this abomination of a language:
BEERv1.1,
100^b,T("")^#,DO{b'<1 NN(b,"bottle"_IF{b=1 " ";"s "}_"of beer on the wall")^#,
N(b,"bottle"_IF{b=1 " ";"s "}_"of beer!")^#,
N("You take one down, pass it around,")^#,b-1^b,
N(b,"bottle"_IF{b=1 " ";"s "}_"of beer on the wall!")^#},
END;
TCL. It only compiles code right before it executes, so it's possible that if your code never went down branch A while testing, and one day, in the field it goes down branch A, it could have a SYNTAX ERROR!

Modifying C++ DLL to support unicode - common pitfalls to avoid?

I have a windows DLL that currently only supports ASCII and I need to update it to work with Unicode strings. This DLL currently uses char* strings in a number of places, along with making a number of ASCII Windows API calls (like GetWindowTextA, RegQueryValueExA, CreateFileA, etc).
I want to switch to using the unicode/ascii macros defined in VC++. So instead of char or CHAR I'd use TCHAR. For char* I'd use LPTSTR. And I think things like sprintf_s would be changed to _stprintf_s.
I've never really dealt with unicode before, so I'm wondering if there are any common pitfalls I should look out for while doing this. Should it just be as simple as replacing the types and method names with the proper macros be enough, or are there other complications to look out for?
First read this article by Joel Spolsky: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then run through these links on Stack Overflow: What do I need to know about Unicode?
Generally, you are looking for any code that assumes one character = one byte (memory/buffer allocation, etc). But the links above will give you a pretty good rundown of the details.
The biggest danger is likely to be buffer sizes. If your memory allocations are made in terms of sizeof(TCHAR) you'll probably be OK, but if there is code where the original programmer was assuming that characters were 1 byte each and they used integers in malloc statements, that's hard to do a global search for.

Resources