Warren Falk

Idea: Flow Control Functions

May 13th, 2009 by Warren Falk

Background: Flow Control Syntax

Your average conventional programming language can be logically divided into two parts, (1) the metadata part and (2) the instruction part.  These are usually mixed together in one unifying syntax, but it is easy to logically separate the two in your mind.  The instruction part is the body of the functions.  The metadata is everything else (or, well, usually.  C#, for example, lets you put expressions into initializers which is technically instructional syntax, albeit crippled syntactically - for the purposes of this post, this is irrelevant so we’ll ignore it).  We don’t care about the metadata for this post, just the instructional code.

You can then further logically divide the instructional syntax into the (1) flow control and (2) statements.  The statements are the expressions (including math like “a + b * 3” and function calls like “foo()“) and assignments (such as “variable = 0“).  This leaves the flow control statements, for, foreach, do, while, if/else, try/catch, etc…  Not every language names them identically, and not every language has every construct, but every language has flow control syntax.  Flow control is what this post is about.

Traditionally, these statements are built into the language at the language level and are available even if the standard libraries are unavailable.  There are some exceptions, such as in more functional languages like Lisp, and even Ruby.

This is precisely the basis for this post.  I want to explore what it would take to build all of these into the standard library, while simultaneously making it possible for developers to create their own.  What are the pros, and what are the cons?

“Functionizing” Control Flow

Ruby allows something like this.  What you might do with a for loop in one of the traditional languages, in Ruby, you do with a function that takes another function as a callback.  So instead of:

for (i = 0; i < 4; i++) { do(stuff); }

in ruby, you use the “times” method of integers:

4.times { do stuff }

There is a price to be paid in performance for this ability, in Ruby.  That block of code is an anonymous method in Ruby and requires a managed object which has to be allocated on the heap, then later collected for every usage.  The for loop, above is all inline.  The block of code that makes up the body of the loop is not used anywhere else, so there’s no point in setting up an object on the heap etc.  This makes the for loop above much faster than the Ruby alternative.

A smart enough compiler, on the other hand, could give you the convenience of the function-based control flow with the native speed of the implementations in the conventional procedural languages.

Additionally, I think that with a slick enough language, it should be possible to implement most, if not all, of the control flow models listed before.  (i.e. not just for, and while, but also if/else, and try/catch).  This way, things like while would be implemented as special “control flow” methods included in the standard library.  This detail would be unnoticeable to most developers at first, but it opens up a whole new world in terms of language extensibility.

The while implementation is straightforward (what the syntax should look like, I am not yet sure), but basically it defines a way to test a condition which would act like an input parameter, and then based on that condition, run a block of code, after which it would test the condition again and so on.  It’s simple.  A foreach would work similarly in that it would take a block of code, but instead of a condition would take an enumerable object.  It may also, for example, initialize a variable before every loop to represent the current object.

Those are straightforward ideas, but the best part about this is that it should allow you to invent your own control flows, such as a switch-like statement which worked by hashtable, or something like a custom synchronization lock which allows you to do something within a block of code after which the lock is released (like C#’s “lock” keyword, but customizable).  You could also create a with-like statement that I’ve described in my page about reinventing the development environment.

There are probably other uses for this I haven’t yet thought of.  These are the pros.  There are potential cons, however.  The one I worry about the most is that, as a general rule, if you design a tool for good, bad developers will find a way to use it for evil in ways it was never designed.  (Sometimes they seem to do these things for no other reason than to accomplish tasks in the most obscure way possible).  Extra precaution should be taken to curb that where possible, and also make it intuitive for developers to still be able to read, and navigate someone else’s code.

ANTLR

May 13th, 2009 by Warren Falk

I’m trying to write a parser to recognize Rich Text Format (RTF) and parse it into a meaningful syntax tree.  I’ve decided to use ANTLR which is a tool which is used to write a lexer/parser/tree parser combination.  It is supposed to make this task easier.  So far it has not been.  Either this is because I don’t know how to use the tool, or the tool is just not very useful.  I am still presuming the former, at this point.

ANTLR makes a distinction between “lexing” and “parsing”.  One uses a lexer to break the character stream of a file into “tokens” which are groups of one (actually zero) or more characters.  A parser, then, should be able to take the new stream of tokens and create a tree out of them.  This tree is known as the “abstract syntax tree” or AST.  Now the hard part here, if you’re new to this, as I am, is trying to figure out just what the difference is between a token and a node in the AST, because you have to create “lexer” rules to tease tokens out of the character stream, and then later use “parser” rules to further group/assemble those tokens into nodes in an AST.

The distinction is not obvious, and probably partially subjective, which makes figuring out how to accomplish this quite difficult using the standard try/fail/adjust/retry loop.

For example, here’s a very simple rtf document:

{\rtf\asc\lang1024 {\par This is a test}{\par of the \{emergency\} broadcast system}}

…which would be rendered, roughly, as follows:

This is a test
of the {emergency} broadcast system

Now the “\rtf” and “\lang1024″ are called “controls”.  The “This is a test” and “of the {emergency} broadcast system” are the regular text, and the whol document is in a hierarchy of “groups” delimited by curly braces.  This is a bit of an over simplification, but not by much.  The question is now, when the lexing is complete, where should the lines be drawn between tokens?

Eventually I began to think the tokens should be separated into something like the following:

{, \, rtf, \, asc, \, lang, 1024 , {, \, par, This is a test, }, {, \, par, of the \{emergency\} broadcast system, }

But then I finally got it to work in a reasonably useful manner by breaking it up like this:

{, \rtf, \asc, \lang1024 , {, \par, This is a test, }, {, \par, of the , \{, emergency, \}, broadcast system, }

But this was not obvious, and I’m not even sure whether this is the best way to break this up into tokens, but it ended up working the best.  Some interesting things arose here.  Such as that I expected to break up the “controls” into their constituent parts (the ‘\’ the name and the optional number) but it turns out that the whole control construct works out, nicely, as a single token.  Unfortunately, I’m not quite sure why.  Also, I tried to get the text to be one big token, but had to break it up into strings of text without control sequences and strings of text with control sequences.  I’m not sure why, in this case, either.

Language Support for Concurrency

May 12th, 2009 by Warren Falk

I posted yesterday about an idea for language support for automated thread pool usage.  (I also have published an entire page about reinventing the whole development environment which remains a draft, but is published on the site nevertheless and viewable here.  It doesn’t yet contain my half-hatched thread pool idea, yet).

On the same day I published my post about the thread pool idea, I read on Slashdot that Microsoft have an “incubator” project called, “Axum“, which is their concurrency-friendly language.  I took a quick glance - much too quick to form an opinion.  Their idea is not the same as my thread pool idea, but there are some conceptual overlaps.  Interestingly, the new language does automatically take advantage of Microsoft’s APM (Asynchronous Programming Model) functions, also known as “Begin/End”.  Perhaps worth checking out.

Idea: Automatic Thread Pooling

May 11th, 2009 by Warren Falk

I had an idea for integrating automatic thread pool usage into a programming language.

First, if you don’t understand thread pools and how they are best used, and how they can be superior to manual thread management, then you should read up on how and why thread pools work - but basically keep in mind that thread pools, if used optimally, can get you closer to linear scalability than any other model.

Also, it’s worth reminding yourself that your application spends almost no time on the processor.  Unless you’re doing 3D rendering, or video encoding, the processor probably sees brief flashes of your application separated by what are relatively large stretches (a virtual eternity from a processor’s perspective) where every thread in your application is in a wait state.  Most importantly, this is usually true even during the heaviest use of your application, because the processor is simply not the bottleneck these days (except in the rare cases already noted).

If you aren’t using a thread pool, or you are using a thread pool but doing so suboptimally, your application may be in a wait state when work actually needs to be done.  This is what destroys scalability.  In a thread pooled environment, this happens when all the threads in your pool are stuck in a wait state, waiting for one of the things threads wait for (disk access, network access, synchronization lock, etc).  More advanced pool management can detect this situation and increase the number of available threads so that work can get done while those other threads sleep.  This helps with scalability, but comes with its own expense thereby making the scalability nonlinear (i.e. the processing power you were able to tap didn’t go as far as the processing power you already had available).

This situation is avoidable.  You can avoid it by breaking your application up into segments of code that never ever go into a kernel wait state.  You do that by using the async requests available for those requests.  Instead of calling Read() on a stream, you call BeginRead() at the end of one segment, and then EndRead() at the beginning of another.  When your segment calls BeginRead() and quits, the thread in the thread pool can then move on to the next work item without delay, and when the read completes, the next segment can be queued to handle it.

The problem is that coding like this is difficult.  It’s difficult because breaking up your code into linear functions is natural, but dividing a function into multiple functions along every blocking call is just not intuitive, and it becomes quite difficult to do things like loop.  How do you start a for loop in one function that terminates in another?  You can achieve this result by passing a state variable between calls, but this becomes extremely complicated.

With version 2.0 of C#, we’ve seen the compiler rig up some interesting code for us to support automatic enumerators.  We can create a function that returns IEnumerable<string> and then yield return strings from our procedure as many times as we want.  This is much easier than the alternative which is to create a new class that implements IEnumerable<string>, and another class that implements IEnumerator<string> which has our member variables, and responding to Next() requests instead of just calling yield return.  That’s trouble enough, but try doing a for loop in the middle of that and you’ll see just what the compiler is doing for you automatically when you use yield return.

If the compiler can break up the developer’s enumerating function into a couple of class definitions and multiple functions, then the compiler should be able to do the same thing for our function’s blocking calls.

Here is an example function which takes an array of HTTP URLs, fetches them, and writes them all out to a file, “output.txt”.

void MyFunction(string[] requests)
{
    using (Stream output = File.OpenWrite("output.txt"))
    {
        byte[] buffer = new byte[1024];
        foreach (string request in requests)
        {
            HttpWebRequest hreq = (HttpWebRequest)WebRequest.Create(request);
            HttpWebResponse hresp = (HttpWebResponse)hreq.GetResponse();
            using (Stream input = hresp.GetResponseStream())
            {
                int len;
                while (0 != (len = input.Read(buffer, 0, buffer.Length)))
                {
                    output.Write(buffer, 0, len);
                }
            }
        }
    }
}

I’ve highlighted in red the functions which are blockers.  Calls to these functions will put the thread to sleep for what could be a very long time in processor terms.  For this reason, these functions have asynchronous, Begin/End versions which can be used instead.  However, try to do that with the above code and you are left with the following abomination:

class MyFunctionState
{
    internal Stream output;
    internal Stream input;
    internal byte[] buffer;
    internal string[] requests;
    internal IEnumerator request_enumerator;
    internal HttpWebRequest hreq;
    internal HttpWebResponse hresp;
    internal string request;
    internal int len;
}

void MyFunction(string[] requests)
{
    MyFunctionState state = new MyFunctionState();
    state.requests = requests;
    state.output = File.OpenWrite("output.txt");
    state.buffer = new byte[1024];
    state.request_enumerator = requests.GetEnumerator(); // for loop
    MyFunction_Segment1(state);
}

void MyFunction_Segment1(MyFunctionState state)
{
    if (!state.request_enumerator.MoveNext()) // for loop
    {
        // after for loop
        state.output.Dispose();
        return;
    }
    state.request = (string)state.request_enumerator.Current;
    state.hreq = (HttpWebRequest)WebRequest.Create(state.request);
    state.hreq.BeginGetResponse(new System.AsyncCallback(MyFunction_Segment2), state);
}

void MyFunction_Segment2(IAsyncResult result)
{
    MyFunctionState state = (MyFunctionState)result.AsyncState;
    state.hresp = (HttpWebResponse)state.hreq.EndGetResponse(result);
    state.input = state.hresp.GetResponseStream();
    MyFunction_Segment3(state);
}

void MyFunction_Segment3(MyFunctionState state)
{
    state.input.BeginRead(state.buffer, 0, state.buffer.Length, new AsyncCallback(MyFunction_Segment4), state);
}

void MyFunction_Segment4(IAsyncResult result)
{
    MyFunctionState state = (MyFunctionState)result.AsyncState;
    state.len = state.input.EndRead(result);
    if (state.len == 0) // while condition
    {
        // after while loop
        state.input.Dispose();
        MyFunction_Segment3(state);
        return;
    }
    state.output.BeginWrite(state.buffer, 0, state.len, new AsyncCallback(MyFunction_Segment5), state);
}

void MyFunction_Segment5(IAsyncResult result)
{
    MyFunctionState state = (MyFunctionState)result.AsyncState;
    state.output.EndWrite(result);
    MyFunction_Segment3(state); // top of while
}

If this made you cringe, then you’re not alone.  There are several things to notice here:

  1. The result is extremely ugly and hard to follow, but does achieve scalability.
  2. You have an extra state class and several extra functions which have no purpose except to implement what would conceptually be part of the original function.
  3. The using, while, and foreach all had to be manually implemented, and the using was actually not implemented as a try/catch as it should be (it probably could, but that would have been even more confusing).

I propose that the language provide a keyword by which you could have all this done for you; much like yield return does so for you with automatic enumerators.  Just what the keyword should look like, and how this should be implemented, I’m not 100% sure yet, but I wanted to get this idea down before I forgot it.

In this example, by the way, the MyFunction() method returns immediately before it is complete.  This may not be ideally how this would work (i.e. you might even want to return a value… I’ve left off this detail for now.  Adding this should be fairly trivial).

Singletons by delegate

February 2nd, 2008 by Warren Falk

Singletons are a common programming model, wherein a program ever only uses a single instance of a certain class, and that instance is shared program-wide.

The typical implementation of this is an if-null/lock/if-null setup that looks something like this:

class OnlyOne
{
  private OnlyOne _instance;
  public OnlyOne Instance
  {
    get
    {
      if (_instance == null)
      {
        lock (typeof(OnlyOne))
        {
          if (_instance == null)
          {
            _instance = new OnlyOne();
          }
        }
      }
      return _instance;
    }
  }
}

The reason for the if-null/lock/if-null is one for a different discussion, but it is basically the fastest portable way to allow thread-safe access to the single instance. However, it requires that the first if-null be executed every time a request for the instance is made. If your instance never changes, this is needless overhead.

I developed a quick and dirty solution using delegates. The basic idea is that when you create the first instance and you know it can no longer be null, you change the “GetInstance” routine to point to a routine that does not check for null. I created a generic for this. Here’s the result:

public class Singleton<T> where T: new()
{
  public delegate T InstanceGetter();

  private static T _instance;
  public static InstanceGetter GetInstance = new InstanceGetter(_GetNewInstance);
  private static T _GetNewInstance()
  {
    if (_instance == null)
    {
      lock (typeof(T))
      {
        if (_instance == null)
        {
          _instance = new T();
          GetInstance = new InstanceGetter(_GetInstance);
        }
      }
    }
    return _instance;
  }
  private static T _GetInstance() { return _instance; }
}

And then reusing it is quite easy:

class OnlyOne : Singleton<OnlyOne>
{
}

Now only the first call to OnlyOne.GetInstance() will actually check for null in a single threaded environment. And in the worst-case multithreaded environment, only the first simultaneous calls will check, the rest will just return the instance immediately.

My tests showed about a 25% performance gain. This is actually of questionable usefulness because you should not be calling the GetInstance method in a tight loop, you should call it once and assign it to a local. But I thought I’d share the basic idea because using delegates in this fashion might be useful in other scenarios.

Monodevelop

December 11th, 2007 by Warren Falk

I’m still relatively new to Mono, which is a cross platform, open-source implementation of .Net. The implementation has proved surprisingly gap free for the most part.

But, what Mono suffers from is its lack of Visual Studio. Much of what makes .Net development so effortless is what is a really good implementation of an IDE (refactoring, a really good debugger, smart tags, and several other nice things). An IDE is beyond the scope of the Mono project, and another project, “Monodevelop” is filling in that gap.

Unfortunately, there are still gaping holes which makes it frustrating to anyone who has been spoiled by Visual Studio. I miss the refactoring, the smart tags which prompt you to effortlessly add a “Using” if necessary, but most of all, a debugger.

It is not hard to write complex code, it isn’t hard to write code quickly, but if you need to do both, you really need a debugger to get it right. And this is where Monodevelop (and also Mono) are way short. It’s relatively young still and free open source; and most importantly the developers are planning to implement all these things. I may see about getting in on that myself. But for now, Monodevelop is slightly refreshing but mostly frustrating.

Latest mono on Ubuntu

December 5th, 2007 by Warren Falk

The current version of mono on Ubuntu 7.10 is 1.2.4.2. There are fixes in 1.2.5.2 which I need. Getting 1.2.5.2 built is no problem, but getting it installed without causing all kinds of problems appears to be. This is something I miss about Gentoo.

I’ve tried building my own package, but the official packages for mono are split up into many smaller packages which made that impossible.

I think what I’ll have to do in order to upgrade is to get the source package using apt-get source, unpack it all, and see if I can just replace the 1.4.2.2 code with the 1.2.5.2 code.

I have already pulled the source package down successfully and unpacked it. It looks like the Ubuntu guys applied many patches to it. I don’t know if those patches will work with the new version of code, or if the new version of code contains those patches. It’s probably some combination of the two. This will take some time. I’ll post a follow up if I ever get this working.

Back to Linux

December 4th, 2007 by Warren Falk

Ubuntu IconOver two years ago I shed Windows from my desktop and built a Gentoo desktop. At the time, I was working from home only and had lots of extra time to tweak, compile and troubleshoot. When I changed jobs again, the lack of time, and compatibility issues with being back in a Microsoft environment at work forced me to abandon my Gentoo and go back to Windows.

Well, I’m still in a Microsoft environment at work, and I still don’t have any time, but now I’m trying to make a go of it again. This time, however, noting my lack of available tinker time, I’m going with the flow and have installed Ubuntu. (I love Gentoo but just don’t have the time to deal with its idiosyncrasies right now).

PHP Class Autoload

August 31st, 2006 by Warren Falk

It’s been a long time since I posted.  I just wanted to write up a quick post today to say how much I like the new __autoload() function in PHP.  For those not familiar with it, the function gets called when a class is used in PHP that has not been defined.  This gives the developer a chance to allow the script to find the include file it needs and include it to define the class.

The function’s not all that new, but I’m just getting around to using it and it’s changing the way I design my sites.

So now I don’t use ever need to manually include a file anymore.  I don’t use any global functions.  I make all my otherwise-global functions static functions of some class and call them that way.  I never have to remember to include the file at the top of the page, I never have to worry about including it twice because of the quirks of include_once(), and I don’t have a bunch of orphaned include()s at the top of the script anymore from removing a function but leaving it’s include.

I’m sure there’s someone out there that thinks this isn’t a good idea for whatever reason.  I’m all ears, but for now, this has made my PHP development much easier.

Stupid Standards

June 1st, 2006 by Warren Falk

I’m all for standards in web browsers. But it really bugs me when the standards-compliant behavior is worse than the non-compliant browsers’ behavior. So here we have made a standard protocol, and to follow the protocol means making your browser behave poorly. This isn’t always the case. Quite often the standard is better, and for the most part a standard is almost always better than no standard.

It’s almost strange how little agreement there is between browser makers on the best way to implement what’s known as the box model. The box model refers to the dimensions and spacing given to html tags. For instance, a box in html has the main body where the text goes, surrounded by the padding which is filled with the object’s background color but is otherwise empty of text. That is surrounded by the border, which is surrounded by the margin, which is more empty space which is transparent and collapses against (is allowed to overlap) margins of other elements. The confusion comes when it’s time to set the width of the element. To which part of the box model does/should the width refer?

The standard apparently says that the width refers to the body part, not the padding. Internet Explorer, on the other hand, says that the width refers to the body plus the padding plus the border. Oddly, in standards compliant mode, IE still says that the width refers to the body plus the padding. IE’s “quirks” way makes the most sense to me. The standard way does not. Why? Because the standard way makes certain things impossible, whereas, IE’s way does not.

Take for instance a table of data. In one column of the table, you have an input field in each row so that the user can edit information. You want the input field to expand horizontally in size to fill the cell it’s in. So you set its width to 100% and that works perfectly, except you notice that the text seems to butt up against the left edge too closely, so you add padding to the left (about five pixels (5px) or so). Ah, that seems to work fine in IE, but in Firefox, now, the right edge of your input field has encroached five pixels into the neighboring cell on the right. Why? Because the 100% width you set doesn’t apply to the padding. So how do you take the padding into account? You can’t, because you can’t do math in CSS. Otherwise 100% - 5px would be nice. Or it might be nice to be able to actually specify which part of the box model you want width to affect.

Now IE gets a lot of flack when it doesn’t adhere to the standard, and for the most part it should. And I like that IE can be switched into “compliance” mode with a DOCTYPE declaration (even though it is only a little more compliant). However, I often find that it is the standard which is most annoying.

.
Entries (RSS) and Comments (RSS).