XST 5: A Niterator. An Iterator. Something.

I found the easy way to build an iterator in Lua, so we’ll do that and see whether it improves the code as much as I think it will.

As I was browsing the internet reading about how to write iterators in Lua, it came to me that a coroutine would be just the thing. So I searched for lua iterator coroutine and there it was.

The trick is easy. Imagine that you write a loop to print everything in whatever collection you’re iterating. For us it would look like this:

    for scope,elements in pairs(self.contents) do
        for element, _true in pairs(elements) do
            print(element, scope)
        end
    end

You think to yourself, you think “self, if that was a coroutine it would be just the thing”. So you rewrite it:

    for scope,elements in pairs(self.contents) do
        for element, _true in pairs(elements) do
            coroutine.yield(element,scope)
        end
    end

So then you think to yourself, you think “self, how can we wrap that up to be a coroutine on the inside and a function like pairs on the outside?” And you find out that coroutine.wrap is made just for that purpose.

So you write your test, planning just to print at the beginning.

        _:test("elements", function()
            local ron = XSet()
            ron:addAt("Jeffries","last")
            ron:addAt("Ron","first")
            ron:addAt("Pinckney","city")
            local chet = XSet()
            chet:addAt("Hendrickson","last")
            chet:addAt("Chet","first")
            chet:addAt("Atlanta","city")
            local ricia = XSet()
            ricia:addAt("Hughes","last")
            ricia:addAt("Ricia","first")
            ricia:addAt("Pinckney","city")
            local A = XSet():addAt(ron,NULL):addAt(chet,NULL):addAt(ricia,NULL) 
            for e,s in A:elements() do
                print(e.."@"..s)
            end
        end)

And then you code:

function XSet:elements()
    -- return an iterator for contents
    return coroutine.wrap(function() self:element_co() end)
end

function XSet:element_co()
    for scope,elements in pairs(self.contents) do
        for element, _true in pairs(elements) do
            coroutine.yield(element,scope)
        end
    end
end

And then you improve the test:

        _:test("elements", function()
            local ron = XSet()
            ron:addAt("Jeffries","last")
            ron:addAt("Ron","first")
            ron:addAt("Pinckney","city")
            local chet = XSet()
            chet:addAt("Hendrickson","last")
            chet:addAt("Chet","first")
            chet:addAt("Atlanta","city")
            local ricia = XSet()
            ricia:addAt("Hughes","last")
            ricia:addAt("Ricia","first")
            ricia:addAt("Pinckney","city")
            local A = XSet():addAt(ron,NULL):addAt(chet,NULL):addAt(ricia,NULL) 
            local result = "{\n"
            for rec,_null in A:elements() do
                result = result.."    {\n"
                for field,scope in rec:elements() do
                    result = result.."        "..field.."@"..scope.."\n"
                end
                result = result.."    }\n"
            end
            result = result.."}\n\n"
            print(result)
        end)

This should pretty print all the records of the set A. And if you’ll look at the left hand column here, you’ll see that it does.

pretty print of all three records

Since that test actually calls elements twice, an outer loop and an inner one, we can feel pretty sure it’s working. We need to figure out a test that isn’t printing. But let’s commit: XSet:elements() returns an iterator you can loop with.

If we were to run this test multiple times, we’d find that the fields of the records occur in different orders:

city first

And it turns out that the records themselves can occur in any order:

chet first

That’s because our storage scheme is hash tables, so the order of things is whatever order Codea happens to get things in. (A deeper explanation is possible, but I don’t have it at my mental fingertips.)

This is actually OK. Extended sets do not have an inherent notion of sort order, although we do have the notion of an ordered n-tuple <a,b,c>, which is {a@1, b@2, c@3 }. We are not presently making any use of that mathematical fact.

So how can we reasonably test our elements function? Well, how about if we were to make another set while we loop, and check the two for equality? That might be interesting.

An issue is that our implementation of equal is weak:

function XSet:equals(B)
    return self:isSubset(B) and B:isSubset(self)
end

function XSet:isSubset(other)
    local flag = true
    local function find(e,s)
        if other:hasntAt(e,s) then
            flag = false
        end
    end
    self:iterate(find)
    return flag
end

function XSet:hasAt(element, scope)
    local tab = self.contents[scope]
    if not tab then return false end
    return tab[element] == true
end

function XSet:hasntAt(e,s)
    return not self:hasAt(e,s)
end

The hasAt function is looking in the set’s table to find a specific element. Since we’re comparing the two outer sets, A and B, and since we have created new records in our loop, the records may have the same contents but they are not identical, so the hashing won’t find the A record in the B set.

As written, our hasAt can’t do what we need.

I’m torn here. I a sure that elements is working correctly, but we don’t have a test for it that checks values automatically. But to really do that test we need some things we don’t have.

Let’s step back.

Stepping Back

What do we need here? I’m not sure that’s an easy question. Our storage scheme, a contents table containing a hash from scope to a table of element to true has help up well until now. However, if we have a set in hand, and we want to know if that set is an element of some other set, we will have to ask, whether there is, under the corresponding scope, a set that equals our set. That’ll do a recursive call to equals, which will do recursive calls to isSubset, which ought to work, but it seems we should be able to do something less horribly inefficient.

We should at least consider the math.

A set A has a given element e at a given scope s iff
    there exists an element a@s in A 
    such that a equals e.

XST has the notion of “rank”, I think it’s called. The rank of an atom is 0. The rank of A = {a,b,c} where a,b,c are atoms is 1. The rank of a set, in general, is one more than the highest rank of the elements of the set.

If we had rank, we could use it to decide what to do in our hasAt. But we cannot easily cause strings and numbers to respond to rank. (Strings are probably possible. Numbers, I think, are not.) We could cheat and ask whether the thing responds to rank. That’s messy. We could ask whether the element is an XSet, also messy and troublesome if we ever have sets that are not subsets of XSet.

I wonder if there’s some way to make equal sets hash to the same value. Meh. Shouldn’t be worrying about efficiency at this point, should be worrying about getting it formally correct.

Let’s write it out so that it’ll do the recursive thing and see if we can make it work. If we can, our test will start to run, because we’re sure that we have in fact correctly iterated the set and created a copy of it.

We have a function exists. Let’s look at that:

function XSet:exists(booleanFunction)
    for scope,elements in pairs(self.contents) do
        for element, _true in pairs(elements) do
            if booleanFunction(element, scope) then
                return true
            end
        end
    end
    return false
end

We can improve that now, I think. Let’s try. What’s our commit status? Only change is our new elements test. I commit: elements test fails. If I believed in branches, I’d be one now, probably.

Improve exists:

function XSet:exists(booleanFunction)
    for element,scope in self:elements() do
        if booleanFunction(element, scope) then
            return true
        end
    end
    return false
end

Now we can enhance hasAt. I’ve discovered that Lua has a function type that returns the type of a variable. We can use that for now.

Meh. Can’t really use exists. The thing I have isn’t an XSet. Write it out:

function XSet:hasAt(element, scope)
    local tab = self.contents[scope]
    if not tab then return false end
    if type(element) ~= "table" then
        return tab[element] == true
    else
        for e,_true in pairs(tab) do
            if type(e) == "table" and e:equals(element) then
                return true
            end
        end
        return false
    end
end

We’re assuming here that if it’s a table, it’s one of our XSets and therefore responds to equals.

Would you believe that the tests are now passing? They are.

I’m going to make the test fail by putting something additional into B:

...
            local A = XSet():addAt(ron,NULL):addAt(chet,NULL):addAt(ricia,NULL) 
            local B = XSet()
            for rec,_null in A:elements() do
                local temp = XSet()
                for field,scope in rec:elements() do
                    temp:addAt(field,scope)
                end
                B:addAt(temp,NULL)
            end
            B:addAt("hello","greeting")
            _:expect(B:card()).is(3)
            _:expect(B:equals(A)).is(true)
        end)

Tests fail:

10: elements  -- Actual: 4, Expected: 3
10: elements  -- Actual: false, Expected: true

Remove that flaw and commit: XSet:equals robust enough to handle sets containing sets.

I think it’s a decent time to do a quick retrospective and decide whether we’re done for the morning.

Retro

So. That went well. The coroutine iterator technique that I found worked perfectly. It has been shown to work to improve the code in the exists method, and it will improve any code where we iterate over elements.

Are there other instances of that?

We have this function:

function XSet:iterate(f)
    for scope,elements in pairs(self.contents) do
        for element, _true in pairs(elements) do
            f(element, scope)
        end
    end
end

We use that in intersect, isSubset and in restrict. I think we can perhaps recast those methods, but that requires thought. We can improve iterate directly:

function XSet:iterate(f)
    for element, scope in self:elements() do
        f(element, scope)
    end
end

Tests run. Commit: use elements in iterate.

So that improves the code a bit, converging down to fewer bottom-level methods that know how things work.

The Math

Let’s talk about the math. Formally, it all comes down to a single question, whether the element e is found at scope s in the set X. We call that hasAt just now. (Informally, we need to be able to iterate the elements of a set to implement hasAt, and we need a way of setting elements, which we call addAt. Both of these are outside the math, which just considers sets as if they all just existed.)

So that means that at the bottom of things, so long as we can implement an operator in terms of iterated hasAt, we can make any operation work on any set no matter what internal format it has.

Let me make that concrete. Suppose we want to define sets in comma-separated-values format (CSV). A CSV table’s first row is generally supposed to be its field names. So if we can implement an elements operation on a CSV table, and a hasAt, we should be able to find values in CSV records right now.

The implication of this is, I think, the justification for the XST approach to information management. If we can express a data structure so that we can ask hasAt and we can iterate it, our entire system, all our set operations, will work on that data. We might also want to implement writing that data format, but just the ability to read and process any data structure is incredibly powerful.

In future articles, we’ll be doing just that. We have a story that says:

Provide at least two alternate storage structures for the restrict operation.

And we will in fact implement that story. (Or, I suppose, we’ll fail utterly, but at the moment I feel quite good about at least that much accomplishment.

So, mathematically, if we implement hasAt and elements on all our data structures, and if we make sure that all our high-power operations all come down to hasAt and iteration, we can do great things.

Of course, Codea can hardly even read files, so our great things will be about the size of an iPad, but we could always port the program to the Mac if we wanted to.

Curly Sets

Years ago, last century actually, I created the term “curly sets”, by which I meant sets with lots curly braces in them, that is, sets containing sets and sets and sets. Our personnel table in the tests is just a little bit curly. We could imagine a similar table where city scope had an element consisting of a set of all the names of all the cities where the person had ever lived. Since our personnel table is rank 2, that ew kind of table would be rank 3.

A question comes up as to what constraints we should put on our atoms and our sets. Should we allow scopes to be sets? Formally they can be, though our current storage model will almost certainly not support them.

Ideally, I think we’d settle on our atoms, probably strings and numbers, maybe just strings, and then we’d allow elements and scopes to be atoms or sets, and make it all work.

In practice, that may not be what we want to do. But there are reasons that argue for it. For example, XML and JSON can both represent nested information that would turn out to be curly sets if we wanted to support them. If we can figure out a base format for XSet that will support sets as scopes, that would add to the power of what we have here, and add to the credibility of its ability to process wide ranges of data structures.

I haven’t decided whether that’s interesting to explore or not. There comes a time when you get down to decoding XML or JSON into your set format and it’s more of a tedious work task than it is interesting. So I might draw the line at CSV. Then again, I might not.

I honestly don’t know.

Current Bottom Line

We have a rather nice system taking shape. It’s not quite as general as we might want, and I think the iterate function, while it got us in the air, might not be what we want to retain.

We’ll explore further as the days wear on. To cheer yourself up, read this:

Flowers of Evil: Ask Charles Beaudelaire

See you next time!