Wednesday, June 13, 2007

Program Language Hamming Space

A couple of computer language related events came together in the last few days to form a train of thought.

One was reading Justin's blog discussing strongly typed vs loosely typed variables with side trips into compiler-detected/assigned types for variables. Along the way Justin also discussed case sensitive languages (he's ag'in' 'em)

Another was a question Rich (who doesn't have a blog) asked me about a Python program.

The Python programmer, who obviously came from a C++ or Java background had was hoping for some function overloading. To paraphrase:


class Widget:
def tweak(filbert):
print "tweak with one argument\n"

def tweak(pecan, almond):
print "tweak with two arguments\n"


It didn't work. That makes sense because Python is a duck typed language that gives you incredibly flexible argument passing support so you really can't overload functions based on signature.

What concerned Rich is that it not only didn't work, it also didn't generate a "compiler" error. Python just blithely and silently replaced the first definition of tweak with the second one.

My first response to Rich was to explain why Python worked the way it did. That's fine, but a better discussion might be how can Python be improved to allow this type of error to be detected while not eliminating the possibility of replacing an class's methods at runtime.

Of course you might think (with some merit) that replacing an class's methods at runtime is an evil, bad, nasty, cruel thing to do and should be forbidden altogether, but I disagree. There are occasions, in which it's the right thing to do. One example is my "Aspect Oriented Python" demonstration program that I include in my "Power Python" talk. It shouldn't be impossible to replace a method; it should just be a little bit harder than it was in Rich's example: maybe a "redef" keyword.

But this is not about Python (or VB). It's about a programming language's Hamming space.

Hamming space, and Hamming distance is a concept that helps design error detecting and correcting codes. The goal is to detect and/or correct mistakes as long as they aren't too bad. A typical encoding scheme can correct any single bit error and detect all double bit errors. Errors involving three or more bits lead to unpredictable results -- the encoded value just might be "corrected" into to the wrong value.

Now consider the difference between a valid and correct program and a program that is wrong. One could imagine a programming language in which any statement containing a single error could be corrected and any statement containing more than one error could be detected as being wrong. You might notice, however, that I'm not defining "error" here. In a case sensitive language, an error might consist of a single character of the wrong case (widget when I meant Widget) or we might consider "case mismatch" to be a single error no matter how many letters are involved (newwidget when I mean NewWidget). Or how about this error: int pi = 3.14159;

I'm not advocating error correcting languages (Justin is!). I'd just as soon correct my own mistakes, thank you, but I would like a language that could detect many more of my errors than today's languages do. One of the reasons I don't especially like perl is that it has a very low Hamming distance between syntactically valid programs. I can put a perl statement together in all sorts of strange and mysterious ways and perl kindly tries to make sense of it. That's OK as long as it guesses right. What really annoys me, however, is that *you* can put together a perl statement in strange ways and if I want to understand and/or maintain the program I have to out guess the compiler to figure out what's really going on.

But this is not about perl. It's about a programming language's Hamming space.

One measure of "goodness" for a programming language should be the "distance" between valid programs. Making a minor coding mistake should produce a compiler/interpreter error message. It should not produce a syntactically correct, but semantically flawed running program. The only way to achieve this is to have a larger Hamming distance between syntactically valid programs.

Alas, none of the programming languages commonly in use lives up to this standard.

At least it's job security for programmers.

3 comments:

Anonymous said...

I feel I've been misunderstood, or misrepresented.

1. I do prefer a case-insensitive language like VB over a case-sensitive one like Java. However, I would like a case hyper-sensitive language even more.

void foo() {
int oneVar = 1;
int OneVar = 2;
}
I want the above to be a syntax error.

void foo() {
String string = "";
}
I'd also prefer use of the same word "string" to be an error.

2. I don't think I've ever said that I want a "language" to be error correcting. What I really want is for the IDE to be error correcting. For instance, if there is a variable in scope named "someVar", and I type "SOMEVAR = 1", then my IDE should automatically change it to "someVar = 1".

I also believe that if you have the perfect IDE (something better than our current state of the art) then you don't need most of the things currently know as a "programming language".

Dale Wilson said...

Justin says:
> I feel I've been misunderstood, or misrepresented.

OK, you're distinguishing between an error correcting language, and an error correcting IDE -- opposing the former and wanting the latter. A valid point that gets a little fuzzy when the boundary between the IDE and the language is blurred. I agree, that boundary should be blurred or even erased -- assuming a decent cross-platform IDE to support cross platform languages. Programs as text files are *so* 20th century ;-).


I'm just a little more conservative than you are. I despised it when VB corrected my capitalization. I like the squiggly underline approach with a specific action required to actually "correct" the problem.

For example, right clicking on the "misspelled" word should produce a menu that contains the proper spelling (along with other options that may apply.)

Michael Easter said...

Thank you for using a dynamically typed language that does not begin with 'R'.

If Perl has a low Hamming distance, which languages are closer to the 'goal' (i.e. a relatively high H distance)? C++ ? ML ? I vaguely remember from school that ML is a very strongly typed language. It was actually kinda neat, if only for academic exercises.

My intuition is that a high Hamming distance would conflict with my ideal, which is a language that stays out of my way. An intuitive, fluid syntax that let's me hit the Zen flow.

Such a language would inevitably have semantically incorrect programs, but IMHO the answer is unit-testing, not syntax.

ps. Actually, the closest language for that, in my experience, has in fact been Python, though I do have a date with 'R' really soon now...