Tuesday, May 23, 2006

The Software Crisis Solved (yet again)

Thirty-five years ago someone offered to pay me for writing a computer program. "What a great way to make a living," I thought.

When the program was working, I proudly showed the result to my customer. He had the sheer audacity to check the results by hand. When the numbers didn't add up, I was, shall we say, chagrined.

Since that time I have written a lot more programs for money, but I've never forgotten the lesson learned on that first assignment. Writing programs that are not only "done" but are also demonstrably correct has been an important goal in my professional life.

Thus when I saw the teaser "How to Prevent Software Disasters" on the front of the June 2006 "Scientific American," and an article "Dependable Software by Design" by Daniel Jackson, you can be sure it had my attention.

After the usual scare stories -- in this case the ones involving the Denver Airport, the IRS, the FBI, and the FAA -- along with cautionary scenarios of out-of-control computers running planes, trains, automobiles, financial markets, and machine tools (the author forgot medical equipment -- oh, never mind, that showed up in a sidebar a couple of pages later), the article got down to business.

My first warning that this was going to be yet another silver-bullet article came when I read, "Almost all grave software problems can be traced to conceptual mistakes made before programming started."

Um, excuse me, out here in the field it turns out that the overwhelming majority of mistakes have to do with the little fiddly bits, not the grand scheme of things. For every Denver Airport baggage system that couldn't possibly work due to faulty analysis of the problem to be solved, there are hundreds or thousands of "little problems" like the Mars Climate Orbiter missing the planet because someone used the wrong measurement system, or the software worms that were able to cripple the internet because some programmer neglected to include a simple check for the length of an incoming message.

"Ok," I thought. "We'll ignore that one and move on. There's often a pony to be found in this type of article." And then I came to the statement: "The idea is to simulate every state that the software can take to determine that none leads to a failure." I must admit I broke out laughing at that point.

Consider the climate control system in my car. It's not very complicated. A rough estimate of the "possible states" for the heater and air conditioning controls on the dashboard is around 2000. The estimate is rough because the temperature control is analog, so I'm estimating about ten possible settings. Factoring in the radio gets the number of possible states for the dashboard controls well into the millions, and we haven't even handled half of the dashboard. I daresay generating all possible states for my single automobile would take modern computers (all of them!) more time than has passed since the automobile was invented.

Yes, it turns out the author is aware of the state explosion problem, and he didn't really mean "simulate every state..." He really meant "use some heuristics to search for possible pathological states..." -- apparently only those states that the developer has identified as pathological (unreachable files, in his example). Once again, referring to my experience in "the real world": it's not the pathological cases that you can think up ahead of time that get you. It's the ones that come as a surprise. "Gosh, I never imagined that a trader would enter the price in the quantity field and vice versa. There goes the company."

Simulating a model is just another form of testing, in which a negative result (we found a failure case) is important and significant, but a positive result (no problems found here!) can be used, along with $5.00, to buy a fancy cup of coffee at Starbucks. The author is describing an automated testing tool that is driven from a model of the program behavior.

To make this system work, the author has invented a new language, Alloy, to describe the design of the program. Presumably Alloy expresses the design at a higher, more abstract level than the typical languages (Java, Ruby, C++, Python, ...) used by programmers, and therefore allows a higher level of validation (aka testing) to be applied to the design. The question not answered by the author is what relationship exists (if any) between the Alloy model of the program design and the actual program as executed by the computer.

Some possibilities include automatically generating the "low level" computer code from the high level model, automatically verifying the correspondence between the model and the actual implementation, or depending on the intelligence and understanding of the software analysts and developers involved in the project to be sure that the relationship holds as expected. Each of these possibilities has corresponding problems.

If the actual code is auto-generated, what you have is an executable model. In fact you've invented a new programming language. Now rather than programming in Java, the software developers will be programming in Alloy. This is not necessarily a bad thing, but it means the Alloy modeling language needs to be rich enough to address all the issues that software developers must worry about when writing programs. And, just as mistakes are often made by programmers writing in Java or Python, mistakes will be made in Alloy. If it turns out that Alloy is a good language and can hide more of the complexity of programming than traditional languages can, then programmers will make fewer mistakes. However, if Alloy is not seen as a programming language and analyzed as such, we'll never know how expressive and safe it is.

Suppose we try the approach of automatically verifying the equivalence between the Alloy model and the "real" program. Once again Alloy is a programming language, but this time it does not need to handle all the low level details. They can be addressed in the "actual" programming language (goodness, I hope the programmer gets the details right.) All that needs to happen is that the high level abstraction expressed in Alloy must be compared to the abstraction expressed in Java or Python. To put this another way, we need a tool that can examine a Java program and understand what it is supposed to do.

Making this comparison is an interesting, and quite difficult, problem to solve. One wonders, if we had a way to extract the higher-level meaning of a program from the lower-level C# or VB or Java source, why we couldn't just validate that representation of what the program means directly, using the same techniques that are used to validate an Alloy program (er... model).

This does, however, raise an interesting possibility. Rather than inventing a new higher-level language in which to express the program, maybe it would be more productive to have two teams writing the same program and an automated comparison of the results of the two teams' work. Surely it is easier to compare two C++ programs to each other than it is to compare a program written in an abstract modeling language to one written in a low-level implementation language.

As an aside, let me mention the stiff resistance such a plan will encounter in the "real world" of managers and business people who already think that it takes too long to develop software -- so don't even think of using two people to do one job! I happen to know this because many developers would like to spend half their time writing code that tests whether the other half of their code behaves the way we expect it to. We call it "unit testing," and it's mighty difficult to get the funding for it from some short-sighted organizations.

The remaining approach for verifying the correspondence between the Alloy model and the actual code is to depend on the native intelligence and experience of the software developers. This is a good thing to the extent that it helps the developers involved analyze and understand the program at a higher level than the one at which they normally work. In fact I speculate that a great deal of the benefit claimed for Alloy comes because of just this effect. In order to create an Alloy model the developers must come to a deeper understanding of the basic design of the program, and in doing so, they end up creating a better system.

So, I congratulate the author on addressing a difficult topic and bringing some new ideas to the table (or at least some interesting new variants of the old ideas that have been around for a while) but I do wish he had done a better job of understanding the meaning of this work. I cringe when he suggests that "governments may even establish inspection and licensing regulations that enforce high-quality program construction techniques" implying that modeling the program in Alloy is such a good thing that it should be imposed by fiat.

Thursday, January 05, 2006

Functional Programming Observation Continues...

A follow-up to my previous post on functional programming in C++.
Compare

size_t total = 0;
for (size_t i = 0; i < elements_.size(); ++i)
    total += elements_[i].bytes();
to

size_t total = 0;
std::for_each(elements_.begin(), elements_.end(), count_bytes(total));
return total;

Neither one is "right."

What I want is to hide the details of iterating through a collection while revealing what is actually being done. In fact, I want to say:

size_t total = 0;
for (each element in elements_)
    total += element.bytes();
(Python anyone?)

The "for-loop" approach fails on the "hide the iteration" criterion [although I've been using for loops for so long that for (size_t i = 0; i < elements_.size(); ++i) is a single conceptual chunk for me.]

The "for-each" approach also fails to hide the iteration [although familiarity with STL makes begin() ... end() into a single conceptual chunk, also.]

The "for_each" approach also hides what's actually being done -- it depends on a descriptive name (count_bytes is not bad) to provide a hint.

The "for-loop" approach shows the actual work (which is good) but clutters it up with a leftover detail from the indexing process ("elements_[i]" rather than simply "element").

I wonder if there's some way to convince the compiler to recognize:

size_t total = 0;
while (each(element, elements_))
    total += element.bytes();
for any arbitrary collection (elements_) of data type (element). This would of course require that the collection obey STL rules.

Thursday, December 29, 2005

An observation concerning functional programming in C++

The following code was extracted from a live program.

namespace
{
    class count_bytes
    {
    public:
        count_bytes(size_t& total)
            : total_(total)
        {
        }
        void operator()(
            const Element& element
            )
        {
            total_ += element.bytes();
        }
    private:
        size_t& total_;
    };
}
size_t
ElementSet::bytes() const
{
    size_t total = 0;
    std::for_each(elements_.begin(), elements_.end(), count_bytes(total));
    return total;
}

By my count that's 26 lines of code. To be fair I should admit that I tightened it up a bit. The original was well over 30 lines.

Compare that to:

size_t
ElementSet::bytes() const
{
    size_t total = 0;
    for (size_t i = 0; i < elements_.size(); ++i)
    {
        total += elements_[i].bytes();
    }
    return total;
}


Ten lines. (And yes, you could use an iterator rather than an index.)

Functional Programming Motto: You may have to type a whole lot more, but at least your code will be harder to understand.

Actually functional programming is just fine when you need it. The problem happens when it's the "Wrong Tool For The Job[TM]." There seems to be a lot of that goin' 'round [sigh].

Saturday, December 10, 2005

Thread return value/threadID/ handle/ join/detach vs C++

Two entries ago, I opined that in a C++ program there should be an object that represents every thread. One interesting consequence of this statement is that many of the "features" of the OS-supplied multithreading support become unnecessary and even counterproductive.

For example, on almost all platforms, start_thread (by whatever name) calls a function with a void* argument. That's ok. Almost all thread functions begin life with a cast. However the thread function is expected to return a value: usually an int but maybe a DWORD or even a void* or whatever depending on your platform. In C++ the proper return value from this function should be 0 -- always! (Actually it should be void, but it seems a shame to disappoint the OS that's eagerly awaiting the zero. (And besides, the compiler won't let me get away with it.))

Why?

Because if there is an object associated with the thread, the thread has a much richer channel through which to return information -- the members of the object.

And speaking of return values, many times no one cares what the thread has to say as it exits. [No death-bed epigrams for you, Thread; you're outta here.] The joinable vs. detached concept in many OSs accommodates this desire on the part of the thread to get the last word in.

However, since the thread now has a whole object available through which to return values, and since anyone who cares can keep a smart pointer to the object (remember, the purpose of smart pointers is to manage object lifetimes), the whole joinable vs. detached issue becomes moot.

Threads in C++ should ALWAYS run detached. Rather than joining a thread, you can wait on a condition in the thread's object (safely because you have a smart ptr to guarantee the condition will be around to be waited on.)

This approach might be much kinder to your system resources. Many OS's hang on to lots of information about a terminated thread -- possibly even its entire stack, register save area, open FDs, etc. -- waiting for someone to join in and tell them the resources can be freed. Using an object can be a considerable savings.

Which brings up the issue of thread IDs. If your interaction with the thread is via the object, you don't need a thread ID to join the thread -- and you certainly don't need a thread ID to kill the thread (see blog entry n-1) -- so the thread ID becomes much less valuable. It still has some value in identifying the thread in log messages (ever tried to follow a log that didn't include thread IDs in its messages? Me too, and I still regret it.) And the thread ID might also be involved in managing thread-specific storage -- although many uses of TSS could be handled better by storing the data (or pointers thereto) in the thread's object.

Speaking of TSS: think of it as FORTRAN COMMON for the thread-wielding crowd. There's usually a better way, but sometimes the better way requires some thought <insert cynical comment here>.

Foot shooting

In my previous post I criticized ACE for allowing the programmer to shoot himself in the foot. I was wrong. The problem is not that ACE allows you to do dangerous things. The problem is it doesn't provide enough incentive to convince a lot of programmers NOT to do the dangerous things.

If I work really hard I can imagine a situation in which the only way to save my life would be to shoot myself in the foot. Witness the hiker a couple of years ago who cut off his trapped arm so he could hike out of the wilderness with the rest of his body parts still functioning. That doesn't mean that everyone who ventures into the wilderness should pack an amputation kit just in case -- or if they do, it should be in a package that is clearly labeled: "For emergency use only." Once you open the package, you should find another package that says, "No, this is not an emergency, do it the right way." Only after opening THAT package should you find the Acme self-amputation, foot-targeting, and dangerous OS functions kit. [Pat. Pending]

So, if you want thread A to kill thread B, all you need to do is call the:
I_AM_A_BLINKING_IDIOT_FOR_USING_THIS::kill_thread() method.

Monday, December 05, 2005

Practical Threading

A while ago, I started to talk about multithreaded programs in this blog. Alas, I distracted myself into talking about "Why Thread" -- an important topic and one that is often misunderstood -- when I should have been talking about "How to thread," because "How" is done badly even more often than "Why".

My recent work with multithreading has been with ACE and boost threads, so I'll use them as examples (no offense, guys.)

So, "How to thread in C++"

C++ is an object-oriented language (or at least it can be used to write object-oriented programs, which is not quite the same thing but is close enough for now). An object-oriented program should have an object representing/corresponding to each entity the program is dealing with.

A thread is an entity that needs to be dealt with by a multi-threaded program.

Rule #1: There should be a one-to-one relationship between threads and thread-related objects in a C++ program.

This does not mean an application programmer (a programmer who is dealing with objects that represent "real-world" entities) should be thinking about thread objects. On the contrary the thread objects should be hidden so well that they do their job without distracting the application programmer from the real work to be done.

"But wait!" you say, "isn't that a lot of overhead? Especially," you add -- looking ahead in this entry -- "when you have to allocate the object on the heap."

"Don't be ridiculous," I calmly reply. "You're planning to start a thread with its own stack, register storage, and who knows what resources tied up in the OS, and you're worried about a simple malloc!"

But I digress (so what else is new)

Where was I? Oh, yeah. "There should be a one-to-one relationship between threads and thread-related objects." But many thread-support libraries (ACE) don't always do this. Instead they have objects like a thread group that represents some number of threads. As soon as you do this, you lose control of individual threads, and that's a problem.

I'm not saying that there shouldn't be objects like thread pools, just that a thread pool should never interact directly with an OS thread. Instead it should interact with the C++ object that represents the thread -- of which there will be as many as there are threads.

Oh, didn't I mention rule 2: All interaction with a thread should be through its object. I guess that seems obvious to me, but again it's often not obvious enough to the authors of thread libraries [ACE] that they actually do it that way (sigh). I guess they think the programmers would object to having a gun that wouldn't fire when pointed directly at your foot (or more vital parts of your anatomy.)

Ok, it's time for rule three: The lifetime of the thread-related object must be longer than the lifetime of the thread. It should exist (however briefly) before the thread is started, and should continue to exist (however briefly) after the thread exits.

Again, this is something that many existing libraries (ACE, boost, et al.) get wrong.

Oh, dear. We've entered the hazardous realm of object lifetime management [sometimes misrepresented as an issue of object ownership, but thinking in terms of ownership muddles the issue.]

Fortunately object lifetime management is an area in which the *SILVER* *BULLET* solution has emerged -- reference counted pointers! Unfortunately, C++ does not provide the tools to do reference counted pointers well, but it is possible to come close. boost::shared_ptr and ACE_Strong_Ptr are examples of refcount pointers done pretty darn well, if not perfectly.
[ACE_Refcounted_Autoptr on the other hand is a disaster waiting to happen -- please don't use it (at least not in an airplane I might fly in!)]

So, we're going to use boost::shared_ptr to manage the lifetime of the ThreadRelatedObject. Cool!

class ThreadRelatedObject;
// alias TRO

typedef boost::shared_ptr<ThreadRelatedObject> ThreadRelatedObjectPtr;
// alias TROPtr


That leads to the next rule (4 I think): The TRO must be the one to start the thread (deftly satisfying half of rule three by guaranteeing that the TRO exists before the thread does.) The other half of rule three is handled by rule 5: The TRO must have its own, private, TROPtr so that it can be involved in its own lifetime management. We'll call this the self pointer. The last thing the thread does before exiting will be to reset its self pointer -- allowing the TRO to be deleted if no one else remembers it. [So if you're thinking object ownership you might not understand why an object needs to own itself, but it seems perfectly reasonable to think that an object might want to manage its own lifetime.]

And then of course, there's rule 6: Any object outside the TRO that wishes to interact with the thread must do so via a TROPtr. Otherwise it can't guarantee that the TRO still exists.

A point of information: all of the boost::shared_ptrs to a given object must be copies of one another -- they must "touch" -- so that they share a single reference count. I.e., write
Widget * w = new Widget;
WidgetPtr p1(w);
WidgetPtr p2(w);
and you have a disaster (a double delete) because p1 didn't touch p2; each pointer believes it is the sole owner.
No one would code the above, but they might code:
WidgetPtr p3(new Widget);
which looks perfectly reasonable, and in fact is the preferred technique up until the point that the Widget does
class Widget
{
Widget()
:self_(this)
{
}
WidgetPtr self_;
};
Kaboom.

Anyway, it's time for a very-important-point-that-*everybody*-gets-wrong. Rule 7: The TRO must not -- repeat, must not -- start the thread in the constructor. Why?
Suppose the constructor:
1) creates the self pointer
2) starts the thread
3) returns the self pointer to the creator of the TRO (oops, see above!)

Never mind that point 3 doesn't work, because the constructor can't return anything other than this -- a raw pointer, not a TROPtr. That's just a deficiency in the C++ language, and there are ways [hack] around that problem -- like private constructors with static create() methods [hack] and my favorite: passing the constructor a reference to a TROPtr which the constructor initializes [hack].

The real reason for rule 7 shows up somewhere between step 2 and step 3 of the constructor, when the thread started by step 2 does what needs to be done and exits(!) before step 3 is executed. The static create method can't handle this. The pass-a-ref-to-ptr-to-the-constructor hack mentioned above can cope if done very carefully (a very careful hack, eh?) but it gets ugly -- especially when layers of inheritance happen. Actually it's kind of fun to get it wrong, then watch one of your coworkers try to figure out why the pointer returned from new points to an object that's already been deleted -- but only if you have a coworker who deserves it. Too bad at this job I don't have any candidates.

Why do ugly when there is a simple solution?

Rule 7a: There should be a start method on the TRO that actually starts the thread.

One of the fun things about programming is that when you find the right solution, stuff works! Separating object construction from thread initiation is one of those right solutions. The calling object gets the luxury of continuing to initialize the TRO after creating it and before starting it. Some things are best not done in a constructor.

Separating construction from thread initiation also makes it considerably easier to create a generic base class for handling thread-related issues. The ugly create and/or pass-a-pointer hacks aren't necessary (go ahead, try to figure out how to do a static create method in a base class). This means that we can get the solution right once and never worry about it again.

So that's what I did.

My solution that's actually being used is based on ACE thread support (ACE does provide good platform-independent thread support [as long as you don't use Thread Specific Storage (grin)] -- I just don't like the way it's packaged). Unfortunately ACE tends to be a bit intrusive -- it's a shame to take on all that baggage just to get a few platform-neutral thread-related functions.

That's why I went looking at boost threads. Most of boost is truly high-class work. Boost threads, alas, is not. Although it passes the "use conditions rather than events" test [an altogether different topic.], it fails the object-lifetime management test.

So I guess I won't be publishing my "universal thread support the right way" base class quite yet. Maybe I'll just publish the ACE-based version, and someone will point me to a platform-independent thread library that separates object construction from thread initiation, supports conditions rather than events, and does not come with tons of baggage.

Just remember, the multicores are coming. Do you know what your threads are doing?

Thursday, October 20, 2005

Strangelove

I bought a copy of Dr. Strangelove last weekend. I hadn't seen it in years, so I had the joy, once again, of discovering all the little gems buried in the movie.

"Gentlemen, you can't fight in here! This is the War Room!"

To really appreciate the movie, you have to understand it in the context of the early '60s with the commie scare, the bomb scare, and, yes, the fluoridated water scare.

So where is the moviemaker today who'll do a comedy about terrorism, al-Qaeda, homeland security -- with a side jab or two at intelligent design? Yes, we really do need to laugh while we watch the World Trade Center burn -- otherwise the terrorists win.

Remember at the end of Strangelove the doomsday device was triggered while George C. Scott was warning the president about the "mine shaft gap."


Dale

Thursday, October 13, 2005

We've come a long way...

I just stumbled over this code deep down in ACE -- a C++ library/framework that prides itself on its portability:

ACE_OS::sprintf (date_and_time,
ACE_LIB_TEXT ("%3s %3s %2d %04d %02d:%02d:%02d.%06d"),
day_of_week_name[local.wDayOfWeek],
month_name[local.wMonth - 1],
(int) local.wDay,
(int) local.wYear,
(int) local.wHour,
(int) local.wMinute,
(int) local.wSecond,
(int) (local.wMilliseconds * 1000));
return &date_and_time[15 + (return_pointer_to_first_digit != 0)];

A word of explanation. For a long time the ACE community didn't believe in bool, so "return_pointer_to_first_digit" is a bool-like substance that when equal to zero means false. Thus (return_pointer_to_first_digit != 0) converts the pseudobool to a genuine bool.

Question: What value does *your* favorite C++ compiler use to represent true?

Dale

Monday, August 29, 2005

Coming soon to a cell phone near you...

Just a reminder: In a month, cell phone numbers are being released to
telemarketing companies and you will start to receive sale calls.
YOU WILL BE CHARGED FOR THESE CALLS
To prevent this, call the following number from your cell phone:
888/382-1222. It is the National DO NOT CALL list. It will only take a
minute of your time. It blocks your number for five (5) years.

You can also use the following web link:

https://www.donotcall.gov/default.aspx

Friday, August 26, 2005

The opposite of in...

Quick, what's the opposite of login?

The answer, of course, is logout.

And the opposite of logon?

Logoff!

So why do some systems want you to login, then logoff; while others prefer that you logon and logout?

I must be developing another pet peeve.

Friday, August 19, 2005

Why I worry about Ruby

In the FAQ on the official Ruby site, Matz (author of Ruby) is quoted as saying:

Well, Ruby was born on February 24 1993. I was talking with my colleague about the possibility of an object-oriented scripting language. I knew Perl (Perl4, not Perl5), but I didn’t like it really, because it had smell of toy language (it still has). The object-oriented scripting language seemed very promising.

I knew Python then. But I didn’t like it, because I didn’t think it was a true object-oriented language—OO features appeared to be add-on to the language. As a language manic and OO fan for 15 years, I really wanted a genuine object-oriented, easy-to-use scripting language. I looked for, but couldn’t find one.

Let's see: In 1993 he had been an OO fan for fifteen years. He must have been using Simula in 1978. I'll give him the benefit of the doubt on that one, but...

Python is not object-oriented enough? OO features tacked on?

Apparently Matz doesn't quite get it. In Python *everything* is an object. Witness the following interactive session:

>>> type(1)
<type 'int'>
>>> dir(1)

['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__str__', '__sub__', '__truediv__', '__xor__']
As you can see, an integer, like all the other native data types, is an object with a value, a type, methods -- the whole shebang.

Not only that, but!

>>> def spam():
... """This is the spam function"""
... print "Peanut butter and Spam sandwich"
...
>>> spam()
Peanut butter and Spam sandwich
>>> type(spam)
<type 'function'>
>>> dir(spam)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__', '__get__',
'__getattribute__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__str__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc',
'func_globals', 'func_name']


A function is an object. It can be manipulated just like any other object in Python. It just happens to support the __call__ method. Yes, of course you can create your own class of object that supports the __call__ method and use it anywhere a "normal" function is expected.

Other types of objects in Python include modules (in Java they're called packages); chunks of compiled code (that's the func_code property of the function above); "None"; "NotImplemented"; "Ellipsis"; and more, all of which are available to be manipulated by the programmer as objects (if you're into that kind of thing), or to just quietly do their job if you'd rather concentrate on the important stuff.

Of course, Python supports user-defined classes from which instances (i.e. objects) can be created. Like the rest of Python, class definitions are syntactically and conceptually clean. And yes, the class itself is just as much an object as its instantiations are.

About the only OO feature I can think of that Python doesn't support is function overloading -- it's kind of hard to do in a dynamically typed language (grin).

Back in '93 Python was still a bit young (it was originally released in '91) but even then it was obvious that "Guido knows Objects."

I guess Matz was misled because Python's OO nature is not constantly in-your-face. You can code in Python without thinking about objects (unless, of course, you want to). Instead you get to think about the problem you're trying to solve. Hello world in Python is:
print "Hello, World!"
The objects are there doing their job, so you don't have to worry about them.

Maybe Matz did a better job of designing Ruby than he did of understanding Python.

Tuesday, June 21, 2005

Simulating a loom. UI vs creativity.

I bought another weaving design program last week at the Midwest Weavers' Conference. Both Tina and I use our "old" program regularly to design cloth. Why would I pay $100 for a new program when I have a perfectly good program that obviously works? The answer says something about the impact of user interface design on creativity.

Since I can't assume that everyone who reads this understands how a loom works, I have to digress. I'm going to describe the design issues faced by a weaver using a jack loom. There are many other types of looms that have their own design issues, but jack looms are very common among handweavers so it's a good place to start.

The purpose of a loom is to interleave two sets of threads that run at right angles to each other -- thereby creating cloth. (All you people with triangular looms, hush -- I'm trying to keep this simple.) One set of threads, the warp, is installed on the loom before the actual weaving begins in a process known as dressing the loom. The second set, the weft, is added to the cloth one thread at a time by running a shuttle containing a bobbin full of weft thread between the warp threads in a preplanned pattern. (and I'm not even going to *mention* how many details and variations I just omitted.)

When you ask a weaver to describe his or her loom, you can be sure that one of the first things they mention is how many shafts the loom has (unless, of course, they've been weaving for a long time in which case they'll tell you how many harnesses the loom has. I'm sure there's a really good reason for the terminology change -- other than to confuse the innocent.) That's because the number of harnesses (oops, I mean shafts) has a strong influence on the complexity of the cloth that can be produced. So what's a shaft?

Part of dressing the loom is threading. Each of several hundred threads in the warp goes through the eye of a heddle (imagine a large (12" long) needle with the eye in the middle rather than near one end). The heddle is attached to a shaft, so that when you lift the shaft, all of the heddles attached to that shaft, and therefore all the threads in the eyes of those heddles, are lifted. The remaining warp threads -- the ones attached to shafts that do not get lifted -- remain down, and a triangular space opens up between the two sets of threads. This space, called a shed, is where the shuttle is thrown -- trailing its weft thread behind it.

Once the shuttle is through the shed, the shed is closed, the weft thread is pressed into place at the edge of the newly woven cloth using part of the loom called the beater, and a different shed is opened for the next weft thread. Thus each weft thread goes under the set of lifted warp threads and over the remaining ones, and cloth happens.

Since each warp thread is associated with a single shaft, the warp is divided into independently controllable sets of threads. The number of shafts on the loom defines an upper limit on the number of sets of warp threads. More than one shaft can be lifted to produce any particular shed, so the number of potential sheds (aka lifts) goes up dramatically as the number of shafts increases. In fact, a loom that has n shafts can produce 2**n - 2 meaningful lifts. (The -2 is there because it doesn't make sense to lift 0 or n shafts.) Some of the common cases are:
2 shafts -> 2 lifts
4 shafts -> 14 lifts
8 shafts -> 254 lifts
16 shafts -> 65534 lifts
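To make the arithmetic concrete, here's a quick sketch (the function name is mine, for illustration only):

```python
def lifts(shafts):
    # Each shaft is either lifted or not (2**n patterns), minus the two
    # useless cases: no shafts lifted and all shafts lifted.
    return 2 ** shafts - 2

for n in (2, 4, 8, 16):
    print(n, "shafts ->", lifts(n), "lifts")
```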

Thus motivating a common malady among weavers: shaft envy [no questionable jokes allowed here] and its converse: shaft pride [note 1].

There is another limitation, however, that shows up when I describe a previously unmentioned part of the loom -- the treadles. In order to lift the shafts, the weaver presses down a foot treadle. Each treadle is tied to one or more shafts so that the shafts are lifted as the treadle is pressed. The number of treadles imposes an additional upper bound on the number of lifts. For example, most 8 shaft looms have 10 treadles, so part of the design process is selecting which of the 254 possible lifts will be used during the weaving process. Of course it is possible to press more than one treadle at the same time (two feet pressing any two of the 10 treadles add another 45 possible lifts), and of course the tie-up between treadles and shafts can be changed during the weaving process, but that's a slow and awkward proposition. Most handweavers using treadle-operated looms end up restricting the number of distinct lifts to the number of treadles.

...unless....

Unless the loom has a dobby instead of treadles and a tie-up. For computer history buffs, dobbies are the thing that Joseph Jacquard invented that led via Herman Hollerith to punched cards (which no one under 30 remembers, anyway). A mechanical dobby uses holes punched in a wooden board, or more commonly nowadays pegs screwed into a wooden board, to indicate which shafts should be lifted to form a shed. These cards are chained together so when the weaver is ready to move to the next weft thread, the chain is advanced to the card containing the next lift pattern. [note 2]

A dobby provides two benefits. First, the number of possible sheds is no longer limited by the number of treadles -- the weaver can design to the full capability of the loom. Second, the weaver no longer has to remember the treadling sequence. No longer is the complexity of the pattern limited by the capacity of the weaver's memory, or the speed of weaving limited by the need to carefully follow a treadling sequence.

An electronic dobby takes this one step further. Rather than pegs in a wooden card to select a lift pattern, an electronic dobby uses solenoids to select the shafts to be lifted. These solenoids can be computer activated, so the chain of dobby cards can be replaced with a lift plan stored in the computer. This removes yet another limitation, in that the length of a woven pattern is no longer limited by the number of dobby cards in a chain. Instead it is limited only by the capacity of the computer and the ability of the weaver to design the pattern. Suddenly those 65 thousand possible lift patterns are accessible -- if only the weaver can figure out how to actually use them.

Which brings us back, finally, to the issues of user interface design for the computer assisted design programs used by weavers -- a topic for the next entry since this has gotten way too long.

[note 1] Tina and I have looms with 4, 8, 16, and 24 shafts. The 16 and 24 shaft looms are computer controlled.

[note 2] The dobby-controlled jack loom described here is not the same as a modern Jacquard loom. A Jacquard loom provides individual control of each thread. It could be (but isn't) described as a loom with several hundred shafts. Jacquard looms typically cost 10 to 100 times as much as dobby-controlled jack looms, and wouldn't fit in a handweaver's studio anyway.

Thursday, May 19, 2005

When worlds collide

I love it when two separate parts of my life collide. Ruth Blau just posted this on the WeaveTech List

This past weekend, I attended a wonderful knitting workshop with Debbie New. One of her approaches to designing is based on the Sierpinski triangle fractal. You can do it with color (either two colors, or better yet, with dark & light and then use any colors you want), with stitches (knit vs. purl) or w/ broader design concepts, e.g., cables. She calls it "rule-based knitting." At any given time, the stitch (or color or whatever) you use is determined by the stitch's surroundings. Here's an example. Assume you're ready to knit a stitch. Look at the three stitches below it (one directly below and the two on each side of it). Your rule is this: if one (and only one) of the stitches below is purl, then you purl. Otherwise you knit. It could also be this: if one and only one of the stitches below is light then use light yarn, otherwise use dark. You make this decision for every stitch in the row (yes, it's slow going).

If you happen to have Debbie's book "Unexpected Knitting," this is in the section called Cellular Automaton Knitting.
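Debbie's rule is a one-dimensional cellular automaton, and it's easy to simulate. Here's a minimal sketch (function name and representation are mine: 1 is purl, 0 is knit):

```python
def next_row(row):
    """Apply the rule: purl (1) iff exactly one of the three stitches below is purl."""
    padded = [0, 0] + row + [0, 0]   # stitches off the edge count as knit
    return [1 if padded[i - 1] + padded[i] + padded[i + 1] == 1 else 0
            for i in range(1, len(padded) - 1)]

row = [1]                            # start from a single purl stitch
for _ in range(4):
    print(row)
    row = next_row(row)
```

Run for a few dozen rows and the familiar Sierpinski triangle emerges.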

Intel Exudes Confidence

Headline and lead-in from an InformationWeek article

Intel Business PCs Won't Include Dual-Core Processors
The business-PC platform includes only technologies that have been validated, and the chipmaker promises it will remain stable and unchanged for the next 12 months.
By Darrell Dunn

Are you getting a warm fuzzy feeling about the Intel dual core chips?

Monday, May 16, 2005

Pauli Exclusion Principle applied to Airlines

Pauli Exclusion Principle

No two people on any particular airplane shall have paid the same amount for their ticket.

Wednesday, April 27, 2005

Oops. My mistake...

In a previous post on threading I gave a high-level pseudocode description of a multithreaded MPEG decoder.

It was wrong.

A revised (and, I hope, more correct) version is:

The new high level design for video looks like:

  • Accept input, and separate into VOBs
  • Hold VOBs for processing [MT]
  • Pick a VOB and demux its content into substreams
  • Queue packets for decoder(s)[MT]
  • Decode stream into buffer in decoded VOB
  • Wait for VOB completion [MT]
  • Hold completely decoded VOBs[MT]
  • Get next VOB and deliver decoded substreams to presentation engine.
  • Hold decoded substreams for presentation [MT]
  • Mix and present decoded content.
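The skeleton of the design above can be sketched with the [MT] hold points as thread-safe queues. This is a toy: the "VOBs" are just tagged strings and "decoding" is upper-casing, since real MPEG decoding is far beyond a blog post:

```python
import queue
import threading

vob_queue = queue.Queue()        # Hold VOBs for processing [MT]
decoded_queue = queue.Queue()    # Hold completely decoded VOBs [MT]

def decoder():
    while True:
        vob = vob_queue.get()
        if vob is None:          # sentinel: no more work
            vob_queue.put(None)  # pass the sentinel on to the other workers
            break
        seq, payload = vob
        decoded_queue.put((seq, payload.upper()))   # toy "decode"

workers = [threading.Thread(target=decoder) for _ in range(4)]
for w in workers:
    w.start()

for seq, payload in enumerate(["vob-a", "vob-b", "vob-c", "vob-d"]):
    vob_queue.put((seq, payload))
vob_queue.put(None)
for w in workers:
    w.join()

# Deliver decoded VOBs in presentation order (decode time varies per VOB,
# so they may come off the queue out of sequence).
pending = {}
while not decoded_queue.empty():
    seq, data = decoded_queue.get()
    pending[seq] = data
presented = [pending[seq] for seq in sorted(pending)]
print(presented)
```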

* * *
My first instinct was to simply edit the original post and replace the incorrect "code" with the corrected version.

Then I remembered the programmer's diary I mentioned in another post.

One of the hardest parts of becoming a good programmer is learning how to deal with mistakes. First you have to accept that you and the people you work with are going to make mistakes. Then you have to train yourself to react positively to your own and other people's mistakes.

Reacting positively to your own mistakes means you fix your mistakes. You don't hide them. You don't defend them. You just fix them -- and clean up any consequences resulting from the mistake.

Reacting positively to other people's mistakes means you bring them to their attention in a non-threatening way. You don't fix their mistakes for them (at least not silently). You don't help them hide their mistakes. You don't gloat over their mistakes (although it's hard to avoid a certain level of "boy, I'm glad I didn't make that mistake"). What's important is that the person who made the mistake learns that it happened, and that the mistake gets fixed.

And finally, when someone brings one of your own mistakes to your attention, the only proper response is "Thank you." After saying that, you can proceed to analyze the report to see if it's correct, but first you must reward the person who respected you enough to tell you about your (possible) mistake.

A lot of this comes from another landmark book about software development: The Psychology of Computer Programming, by Gerald Weinberg.

* * *
I predict that we will never have a good programmer as president of the United States (and vice versa.)

* * *
So why did I make this mistake? Because I was thinking about multithreading on a frame-by-frame basis. Then when I switched to thinking about it on a VOB-by-VOB basis I didn't completely reset my mental model of the problem.

How can I avoid making this kind of mistake in the future? (Or how can I make it less likely to happen?) Tough question -- maybe awareness of the potential pitfall will help.

Thursday, April 21, 2005

Baby Geese and Parrots

The eggs in the goose nest right outside the door to our building here hatched the other day. Within hours after they hatched, they were cute bundles of fuzzy yellow feathers running around on their own -- much to their mother's dismay. A day later the parents marched their goslings over to a nearby lake.

How come baby geese are so competent and cute when baby parrots are totally helpless and look like something from a grade C SF flick?

For example

My theory is that baby parrots are too busy growing an intelligent brain to have any energy left over for cute. Geese seem to make do without benefit of brain.

Tuesday, April 19, 2005

Parallel by force

Suppose you've created the multithreaded MPEG decoder as outlined in the previous entry. Remember the good reason for multithreading the MPEG decoder was:
  • The task is inherently multithreaded so a multithreaded solution results in simpler code.

In fact the MPEG decoder almost begs to be multithreaded.

So one day your multithreaded MPEG decoder is happily zipping thru an MPEG stream that contains just video and one audio track. The following threads are running:

#1 Accept and demux input
#2 Decode video substream
#3 Decode audio substream
#4 Mix and present substreams.

Then your boss shows up and says, "I spent all this money on a 16 CPU superserver and your application is only keeping it 25% busy. I want you to increase the parallelism so all the CPUs will be kept busy. NOW!"

* * *
Now what do you do (other than looking for a new job with a new boss.)

You've already added the "natural" multithreading that is inherent in the problem. How can you increase parallelism even further?

It's time to try to apply the other good reason.

  • The task can be cleanly decomposed into multiple sub-tasks that are highly independent; the independent tasks can use resources in parallel; and the benefits of this parallel usage outweigh the overhead of multithreading. (All three conditions must be true.)

Hmmm....

A video stream is a series of frames. Maybe we can create multiple threads and have each thread decode a separate frame. So we add a component that separates the stream into a series of undecoded frames (yes, this is fairly easy to do without actually decoding the frames) and a pool of threads that processes these frames. Each thread from the pool picks up the next undecoded frame, decodes it, and adds the result to a collection of decoded frames. Since frame-decode time varies as a function of the complexity of the image, we also need a component to shuffle the decoded frames back into the correct order.
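The shuffle-back-into-order component can be sketched on its own, independent of any real decoding. A minimal illustration (class and names are mine, not from any decoder library):

```python
class ReorderBuffer:
    """Collects items tagged with sequence numbers; releases them in order."""
    def __init__(self):
        self.pending = {}
        self.next_seq = 0

    def add(self, seq, item):
        """Accept an out-of-order item; return whatever is now deliverable."""
        ready = []
        self.pending[seq] = item
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

buf = ReorderBuffer()
print(buf.add(1, "frame1"))   # [] -- frame 0 hasn't arrived yet
print(buf.add(0, "frame0"))   # ['frame0', 'frame1']
print(buf.add(2, "frame2"))   # ['frame2']
```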

Voila, we can keep as many CPUs busy as we want by looking ahead far enough. Makes sense, right?

Nice theory, anyway. When you start coding the frame decoder, you'll quickly run into a major stumbling block. One of the techniques MPEG uses to compress the video image is to send most frames as a diff from the previous frame. This is very effective -- especially when the movie is showing relatively static scenery (it doesn't work so well during explosions.) Thus as you decode frame #n you regularly have to refer back to frame #n-1 to apply the diff and thereby create the final result. Even more interesting, sometimes you have to look *forward* to frame #n+1! (Don't ask, the MPEG folks are a twisted bunch.)

So the thread-per-frame solution sounds plausible (you can probably sell it to your boss) but fails the "independence" test. Back to the drawing board.

Fortunately for DVDs there's another approach. In order to support fast forward, slow motion, jump to scene, etc., the video on a DVD is carved up into chunks called video objects (VOBs). A VOB contains about half a second worth of video, audio, subtitles, etc., and, what's more important, each VOB is independent of the VOBs that precede and follow it. So, although the thread-per-frame idea was a bust, a thread-per-VOB approach will work nicely. You may need a priority scheme to ensure that the thread decoding the VOB scheduled to show up next on the screen gets all the resources it needs, but other than that you've found a clean division of the main task into subtasks that can take advantage of the available CPUs by running in parallel.

The new high level design for video looks like:
  1. Accept and demux input
  2. Queue packets for decoder(s)[MT]
  3. Separate into VOBs
  4. Hold VOBs for processing[MT]
  5. Decode VOB
  6. Hold decoded VOBs for reordering[MT]
  7. Reorder decoded VOBs into decode stream
  8. Queue decoded streams for mixer.[MT]
  9. Mix and present substreams.

This approach has added some more synchronization spots -- one to hold the separated VOBs waiting to be decoded, and one to hold the decoded VOBs until they can be placed in the correct sequence and passed on to the mixer. It might be tempting to try to merge the demuxer with the VOB separator, or the decoded-VOB holder with the decoded-stream queue, but don't give in to temptation. Solve one problem at a time and let the inherent parallelism take care of improving performance. [Or at least get it working correctly and profile it before optimizing.]

The moral of the story:
  • Finding the right decomposition into independent subtasks needs to be done carefully, based on a detailed understanding of the domain. An obvious solution may not be the right solution.

Monday, April 18, 2005

Multithreading: Why bother?

So multithreading synchronization is hard and requires hardware support. How do all those existing multithreaded programs manage to work?

Answer #1: Someone got lucky. Doesn't it comfort you to know that the software flying your airplane might be working by accident?

Answer #2: To write thread-safe code you have to follow a different set of rules. Actually an additional set of rules, because all the old rules for writing good programs still apply.

Since single threaded code runs faster, is easier to write, and is easier to test than multithreaded code, why would anyone willingly go to all the effort necessary to write multithreaded code? Good question. The first decision that needs to be made when designing a multithreaded program is, "Is this necessary?" If you can't come up with a compelling benefit for multithreading, go for the simple solution.

There are lots of bad reasons for multithreading, and only a couple of good ones. The good reasons I know of:

  1. The task is inherently multithreaded so a multithreaded solution results in simpler code; or
  2. The task can be cleanly decomposed into multiple sub-tasks that are highly independent; the independent tasks can use resources in parallel; and the benefits of this parallel usage outweigh the overhead of multithreading. (All three conditions must be true.)

Let me provide an example of the first case.

MPEG is a standard for encoding audio-video information. A stream of MPEG encoded data can contain many substreams. For example: an MPEG encoded movie recorded on a DVD might contain a single stream of video; two or three streams of video overlay (the subtitles in various languages); several streams of audio (the main audio track in different languages, the director's comments, etc.); and DVD navigation information to support fast forward, fast reverse, etc.

These substreams are multiplexed at a packet level. The overall data stream consists of a set of fixed-size packets, and each packet is part of a particular substream. You could have a navigation packet, two video packets, an audio packet, another video packet, a subtitle packet, and so on.
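Routing packets to their substreams is the easy part of demultiplexing. A toy sketch (the stream ids and payloads here are invented for illustration):

```python
# Toy packets: (stream_id, payload). A real demuxer would parse the id
# out of the packet header; here it's just the first element of a tuple.
packets = [
    ("nav", b"n0"), ("video", b"v0"), ("video", b"v1"),
    ("audio", b"a0"), ("video", b"v2"), ("subtitle", b"s0"),
]

substreams = {}
for stream_id, payload in packets:
    substreams.setdefault(stream_id, []).append(payload)

print(sorted(substreams))            # the substreams found in the mux
print(substreams["video"])           # packets arrive in stream order
```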

The substreams themselves have a rich internal structure. For example, the video stream contains sequences of variable bit-length, Huffman-encoded data fields. Suppose the video stream decoder has extracted the first five bits of an eleven-bit field when it hits a packet boundary; it would be a nightmare to attempt to save the video-decoding state, including the partially extracted field, and switch to a completely different context in order to properly decode the audio packet that comes next.

Splitting the MPEG decoder into a main demultiplexing thread, independent decoding threads for each substream, and a mixing thread to manage the simultaneous presentation of the decoded streams dramatically simplifies the design.

It is interesting to note that there are two synchronization hot spots in the multithreaded version of the MPEG decoder. One is the point at which the demultiplexer passes a packet to the specific stream decoder for that type of packet, and the other is the point at which the mixer accepts the decoded substreams for integration and presentation. Everything between these two points can and should be coded as if the program were single threaded.


These synchronization hot spots should be separate components. A possible high level design would be:
  • Accept and demux input
  • Queue packets for decoder(s)[MT]
  • Decode substream
  • Queue decoded streams for mixer.[MT]
  • Mix and present substreams.

Multithreading issues should be addressed only in the two components marked [MT]. Everything else should be written as if it were single threaded (and protected accordingly.)
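The [MT] queue components are the only places that need synchronization. Python's standard library already provides exactly this component (queue.Queue), but as an illustration of what such a component does internally, here is a minimal hand-rolled version -- a sketch, not a replacement for the library class:

```python
import threading
from collections import deque

class BoundedQueue:
    """A minimal thread-safe bounded queue: the [MT] hand-off component."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.cond = threading.Condition()   # the mutex-plus-waiting part

    def put(self, item):
        with self.cond:
            while len(self.items) >= self.capacity:
                self.cond.wait()            # producer blocks while full
            self.items.append(item)
            self.cond.notify_all()

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()            # consumer blocks while empty
            item = self.items.popleft()
            self.cond.notify_all()
            return item

q = BoundedQueue(2)
results = []
consumer = threading.Thread(
    target=lambda: [results.append(q.get()) for _ in range(5)])
consumer.start()
for packet in range(5):                     # producer outruns capacity; put blocks
    q.put(packet)
consumer.join()
print(results)
```

All the locking lives inside this one component; the producer and consumer code on either side of it stays single threaded.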

Friday, April 15, 2005

The moral equivalent of a mutex

In yesterday's post I used the phrase "The moral equivalent of a mutex." I claimed that it was not possible to write code that shares data between threads safely without one.

This prompted an anonymous response which cited Dekker's algorithm as an example of a software-only synchronization mechanism. I appreciate the response (even though I immediately rebutted it) because it prompted a train of thought about what the "moral equivalent..." is and why multithreaded code is so falupin' hard.

Mutex equivalents on Win32 include: CriticalSection, Event, Mutex, Semaphore, InterlockedIncrement, InterlockedDecrement, InterlockedExchange, and so on... Other OSes support some of these and have their own unique variants with varying degrees of arcanity (SysV semaphores, for example). The point is that all of these objects are designed specifically to address thread synchronization.

Dekker's algorithm is interesting because it is an algorithm for implementing a critical section. I'd count it as the moral equivalent... with one caveat. It doesn't work unless there is an underlying hardware synchronization mechanism.

The algorithm contains the following code:
 
flags[i] = BUSY;
while (flags[j] == BUSY)
    <SNIP>
<if you get here you have access to the resource>


The problem shows up in the following sequence of events:

Thread 0: flags[0] = BUSY;
Thread 0: while (flags[1] == BUSY)  // false, so thread 0 has access
Thread 1: flags[1] = BUSY;
Thread 1: while (flags[0] == BUSY)  // flags[0] from cache is still FREE,
                                    // so the condition is false and thread 1
                                    // also has access to the resource


I'm not saying that Dekker's algorithm is wrong. I'm saying that it contains an underlying and invisible assumption about how things are executed in the computer. In particular it assumes that operations on shared memory are atomic and immediately visible to other threads. If that assumption is correct then the algorithm works. Thus the algorithm reduces the problem of providing CriticalSection behavior to the problem of implementing the shared property.
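For the curious, here is Dekker's algorithm transliterated into Python. It happens to work under CPython precisely because the interpreter's global lock makes the shared-memory operations sequentially consistent -- exactly the assumption the algorithm depends on. A literal C translation on a modern CPU would need memory barriers:

```python
import threading

flags = [False, False]   # flags[i]: thread i wants the critical section
turn = 0                 # tie-breaker when both threads want in
counter = 0              # the shared resource we're protecting

def worker(i, n):
    global turn, counter
    j = 1 - i
    for _ in range(n):
        # --- Dekker entry protocol ---
        flags[i] = True
        while flags[j]:
            if turn != i:
                flags[i] = False     # back off, it's the other thread's turn
                while turn != i:
                    pass             # busy-wait for our turn
                flags[i] = True
        # --- critical section ---
        counter += 1                 # not atomic in CPython, so it needs protection
        # --- exit protocol ---
        turn = j
        flags[i] = False

t0 = threading.Thread(target=worker, args=(0, 5000))
t1 = threading.Thread(target=worker, args=(1, 5000))
t0.start(); t1.start()
t0.join(); t1.join()
print(counter)   # 10000 -- mutual exclusion held
```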

* * *

A programmer reading code has a mental model of how the machine works. Most of the time we use a very simple model -- things in our mental model happen sequentially in the order that they appear in the source code we are reading. Having this simple model is A Good Thing[TM] because it allows us to concentrate on what the program is supposed to be achieving rather than how it goes about achieving it.

The problem with this simple model is performance. The code may say:

for (int i = 0; i < 10; ++i)
{
    someFunction(i * k);
}

but the compiler may generate code that looks more like:


int j = 0;
do
{
    someFunction(j);
    j += k;
} while (j < 10 * k);


On many processors a literal translation of the second version will be faster than a literal translation of the first -- so the language standards committees have given compiler writers the freedom to provide the second version as a legal compilation of the first.

If you observe the state of the system before this code executes, and after it completes, you can't tell the difference between the two versions. The only observable difference is that one version runs a bit faster.

The programmer gets to write code in a way that describes the algorithm most clearly (in his mind, anyway), and the processor gets to execute code that generates the desired result faster. Everybody is happy.
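You can demonstrate that the two versions are observably equivalent by recording the arguments each one passes. A toy harness (someFunction here is just a recorder, and k is fixed arbitrarily):

```python
def trace(version, k=7):
    """Record the arguments passed to someFunction by each loop form."""
    calls = []
    some_function = calls.append      # stand-in for someFunction
    if version == "source":
        for i in range(10):
            some_function(i * k)
    else:                             # the compiler's strength-reduced form
        j = 0
        while True:
            some_function(j)
            j += k
            if not (j < 10 * k):
                break
    return calls

print(trace("source") == trace("optimized"))   # True: same observable behavior
```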

* * *

Multithreading changes the rules. Rather than observing the before and after states of the system, you now have to be concerned about every intermediate state that might be visible to another thread. A lot of discussions of multithreading present C source code and discuss the implications of an interruption occurring between each statement. The discussion of the incorrect algorithms that precedes the presentation of Dekker's algorithm uses this technique to identify the points of failure. This is a step in the right direction, but it's still not good enough.

Consider the following statement:

volatile int i;
a[i] = b[i] + c[i];

and what happens if "i" is potentially changeable by an outside agency (another thread, a memory mapped I/O, etc.) For example, suppose that before executing this statement i has the value 0, but sometime during the execution of the statement i takes on a value of 1. How many possible outcomes are there for this statement?

The answer surprises many people. There are 8 possible outcomes because the compiler is free to evaluate the three instances of i in any order it chooses to. To analyze an algorithm containing the above statement in a multithreaded environment you must consider all eight of these cases.
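The eight outcomes are just the 2**3 combinations of old and new values for the three independent reads of i. Enumerating them (array contents are invented for illustration):

```python
import itertools

# Each of the three reads of "i" may independently see the old value (0)
# or the new value (1), depending on the order the compiler evaluates them
# relative to the external write.
b = [10, 20]
c = [100, 200]

outcomes = set()
for i_dest, i_b, i_c in itertools.product((0, 1), repeat=3):
    # (which element of "a" gets written, what value it receives)
    outcomes.add((i_dest, b[i_b] + c[i_c]))

print(len(outcomes))   # 8 distinct (destination, value) pairs
```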

So all we need to do is break each statement down into phrases that can occur in arbitrary order and analyze the effect of an interrupt between any two phrases. Are we there yet?

Well, it depends on how many phrases you see in the following statement:

++j;

Assuming int j;, this probably compiles into a single machine-language instruction: inc [j] -- hence one phrase, right?

Nope. At the microcode level, this statement says: retrieve the value of j from memory; add one to it; store the new value back into memory location j. That's two phrases (why not three? Because "add one to it" is internal to the processor and therefore invisible to other threads).

So, we've gotten to the microcode level. We must be at the right level of analysis by now.

Sorry -- to truly understand, you have to throw in instruction pipelining and cache (remember cache?). Once you take them into account, your model of what really happens in the machine is complete enough to analyze the thread-safety of the program.

Alas, pipelining and caching issues are truly beyond the control of the programmer, so the problem of ensuring thread-safety appears to be unsolvable.

Except!

Thank goodness there's a way to tell the hardware to switch momentarily from its default anything-for-speed mode into a safe-for-the-silly-programmer mode. Every processor has at least one synchronization operation that does things like flushing and/or updating cache, locking the bus for a read/alter/rewrite cycle, etc. These operations tend to produce a dramatic slowdown because they defeat all the work that went into designing a cache and a pipeline to speed things up. The other problem is that on many CPUs the hardware designers decided that these operations should be reserved for kernel mode, so enlisting the hardware's help may involve an OS call with the corresponding high-overhead context switch.

In any case, I think this justifies Rule #1: Multithreading is hard.