Encoding messages has a long history. Ric Parkin looks back at how this affected computing.
A couple of weeks ago, I attended the new ACCU autumn conference held at Bletchley Park - home of the Enigma code crackers. I won't go into much detail - there should be some write-ups in the next CVu - but suffice it to say it was a splendid day, full of fascinating details of the history of computing, held at what could be described as its birthplace.
One remarkable part of the day was going around the National Museum of Computing, which is based in some of the maze of sprawling huts that housed much of the original code breaking activity. You can imagine that wandering around with some 90 fellow computer geeks made for some interesting anecdotes, especially once you got to the rooms holding the earliest home and personal computers - coos and sighs of "Ah, I had one of them" abounded. But one thing that really stuck in the mind was how limited those computers were compared to now, and yet how ingenious the designers and users were at getting around those limitations. My personal favourite was on the WITCH [WITCH] - it could use punched paper tape to load a program into its central store, but as that store was very small they also used loops of tape to hold subroutines that could be run directly from the paper (this had one side effect: you had to write your program so that a subroutine wasn't called too many times - after about 500 calls the paper would break!)
But the main thrust of the day was all about codes and code breaking, and it is significant that the first computers were created to make and break codes. At the simplest level, this is because of the sheer number of repetitive operations involved, all of which have to be performed accurately. But I think there's something deeper going on here: codes and cyphers are all about the manipulation of symbolic representations, which, when you think about it, is all that a computer ever does - even the simplest number type is actually stored as a pattern of bits, and it's the interpretation of that pattern that makes the pattern 'be' the number. When you tell the computer to add two 'numbers', it actually manipulates the symbols, such that the new pattern, when interpreted, is as if it had added the two interpreted input numbers. (This idea is not exactly new - see the sidebar.) How it achieves this can be very simple or very complex. Take 'multiply by two' as an example - a compiler could use the knowledge that numbers are stored as binary to turn that into a bitshift operation, or it could invoke a large multiplication routine.
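As a minimal sketch of this (the values and comments are mine, purely for illustration), both spellings of 'multiply by two' are just manipulations of the same bit pattern, and a compiler is free to pick whichever is cheaper:

    #include <cstdio>

    int main() {
        unsigned n = 21;               // stored as the bit pattern ...00010101

        // Two ways to 'multiply by two' - both just manipulate the pattern.
        unsigned shifted    = n << 1;  // shift every bit left:    ...00101010
        unsigned multiplied = n * 2;   // a compiler may well emit the same shift

        std::printf("%u %u\n", shifted, multiplied);  // prints "42 42"
    }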
This sort of thinking can be very useful at times. For example, when I was trying to get my head around the various flavours of Unicode, what really made things fall into place was realising that a 'Character' is really just an interpretation of one or more 'Values' - years of using little but 7-bit US-ASCII had lulled me into conflating the two. Realising that the interpretation is just as important as the actual values involved suddenly made everything make sense. This then also led me to 'get' a lot of the 8-bit extended-ASCII issues.
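To illustrate (my example, not from the museum): the same two byte values can be one character or two, depending entirely on the encoding you interpret them under:

    #include <cstdio>

    int main() {
        // Two raw byte values - with no inherent 'character' meaning at all.
        unsigned char bytes[] = { 0xC3, 0xA9 };

        std::printf("Values: 0x%02X 0x%02X\n", bytes[0], bytes[1]);

        // Interpreted as Latin-1 these are two characters, 'Ã' and '©';
        // interpreted as UTF-8 they are one character, 'é' (U+00E9).
        std::printf("%c%c\n", bytes[0], bytes[1]);  // one 'é' on a UTF-8 terminal
    }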
Symbolic Computing and Alan Turing
Precursors go all the way back to Babbage, but the purest form is surely that of Alan Turing (who is of course the most famous of the Bletchley code breakers), in his paper 'On Computable Numbers, With an Application to the Entscheidungsproblem' [Turing]. In it he proposed a thought experiment of a machine that worked on an infinitely long paper tape divided into cells, each of which could either be empty or filled with a symbol. It had a read/write head that would be positioned on a single cell, read the contents, and could write or erase a symbol. It could also move the tape left or right. A program was a simple table of rules: each rule consisted of what actions to take depending on the symbol at the current position. Each action would erase or write a symbol at the current location, then move the tape left or right or stay still, and then change which rule to apply next. Despite its extreme simplicity, Turing and Alonzo Church suggested that any computable algorithm could be written for a Turing machine (while not completely proven, this is now pretty much accepted as true). Turing went further and wrote a set of rules whose first act would be to read in from the tape the description of a second set of rules, and then process the rest of the tape as if it were the machine described by that input. In other words, a programmable computer.
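The rule table described in the sidebar maps quite directly onto code. Here is a minimal sketch of such a machine (the states, symbols and sample tape are mine, chosen purely for illustration - this is not from Turing's paper):

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>

    enum Move { Left, Right, Stay };

    struct Action {
        char write;        // symbol to write at the current cell
        Move move;         // which way to move the head
        std::string next;  // which rule (state) to apply next
    };

    // The rule table: (current state, symbol read) -> action
    using Rules = std::map<std::pair<std::string, char>, Action>;

    int main() {
        // A three-rule program that flips As and Bs until it hits a blank.
        Rules rules = {
            { { "flip", 'A' }, { 'B', Right, "flip" } },
            { { "flip", 'B' }, { 'A', Right, "flip" } },
            { { "flip", ' ' }, { ' ', Stay,  "halt" } },
        };

        std::string tape = "ABBA ";  // a finite slice of the 'infinite' tape
        std::size_t head = 0;
        std::string state = "flip";

        while (state != "halt") {
            const Action& a = rules.at({ state, tape[head] });
            tape[head] = a.write;
            if (a.move == Right) ++head;
            else if (a.move == Left) --head;
            state = a.next;
        }
        std::cout << tape << '\n';  // prints the flipped tape: "BAAB "
    }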
And it goes further. How we choose to represent information can have a significant effect on what we can do with it. Choose the right data structure and a program can run quickly in a small memory footprint. Conversely, a poor choice can mean it runs slowly, consumes excessive resources, and perhaps doesn't run at all - a totally unsuitable choice may make implementing an algorithm so difficult that it'll either never be completed or be too buggy to use. Take as an example the problem of organising 24-hour support cover. Everyone in your team volunteers the hours they can cover, and you need to check that someone will be on call at any given time. One approach would be to make a list for each person of the blocks of hours they can do, then iterate over them and remove each hour from a list of uncovered hours (taking into account that the hour might already be covered). That's rather complex, will probably involve lots of memory allocations and deallocations, and will have some slow lookups. It might even contain some bugs. An alternative is to represent the hours a person can cover as a bitmask; then checking that all hours are covered is just a matter of ORing together all the masks and checking that all the bits are set.
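A minimal sketch of the bitmask version (the function name and sample rota are made up for illustration): bit N set in a person's mask means they can cover hour N.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Returns true if the union of everyone's hours covers all 24.
    bool fullyCovered(const std::vector<uint32_t>& volunteerMasks) {
        uint32_t covered = 0;
        for (uint32_t mask : volunteerMasks)
            covered |= mask;                     // OR in this person's hours
        const uint32_t allDay = (1u << 24) - 1;  // bits 0..23 all set
        return (covered & allDay) == allDay;
    }

    int main() {
        std::vector<uint32_t> team = {
            0x0000FFu,   // covers hours 0-7
            0x00FF00u,   // covers hours 8-15
            0xFF0000u,   // covers hours 16-23
        };
        std::cout << (fullyCovered(team) ? "covered" : "gaps!") << '\n';
    }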
Figure 1: One of the few pictures of Alan Turing as an adult, running in 1946
So in a sense, everything in a computer is already in code; it's just that we know how to interpret it. Encryption is only slightly different in that the author is now actively trying to hide the method of interpretation. (Although technically what's usually being hidden is the key used to drive an encryption algorithm, as it's easy to create a new key but very, very hard to create a new algorithm.) How hard they try to hide it depends on how much it costs to do the encryption, and on how valuable the information is. Let's quickly look at those two variables.
The cost of encryption usually comes from the difficulty of knowing what to do, the inconvenience of doing it, and the expense of actually doing so, in both time and opportunity cost (that is, the value of whatever else you could have been doing instead). Many of these costs have plummeted over time, as encryption algorithms have been publicised and automated, and computing power has increased to the point where it doesn't take long to encrypt to a fairly strong standard.
The value of the information can vary widely and depends on many things. Simple monetary value is obvious. From Bletchley we learn that government and military information can be very valuable, as knowing where a U-boat pack is can mean the difference between a convoy getting through with vital equipment, or being sunk. This also hints at the temporal aspect of information's value - it's not much use knowing where those subs were a week after they've sunk your ships. So the value of information often decreases over time, sometimes very rapidly. This is a very useful guide for deciding how strongly to encrypt something - you want the information to be worthless by the time it is cracked. There's also the rather subtle value of the mere existence of a message. This is exploited by traffic analysis, which tries to gather information from the patterns of messages over time and where they are sent, so you don't even need to decrypt the messages to learn something. For example, mobile phone records can be used to track a gang planning a robbery - from who calls whom you can often infer the command structure, and from the number and timing of the calls you can spot that something is about to happen. You can defeat traffic analysis by hiding the patterns in noise, such as always sending a message at the same time each day even if it only contains the equivalent of 'This page left intentionally blank', but this increases the cost of hiding your information.
Putting cost and value together, you should encrypt your messages such that the cost to you of the encryption is less than the expected cost of that information being disclosed at the time it could be deciphered.
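This rule of thumb can be written down directly (the notation here is mine, not the author's). With C(s) the cost of encrypting at strength s, V(t) the information's value at time t, and t_crack(s) the earliest time encryption at strength s could plausibly be broken, it is worth encrypting when

    C(s) \;<\; V\bigl(t_{\mathrm{crack}}(s)\bigr)

and, as noted above, you'd pick s large enough that V(t_crack(s)) is close to zero - the information is worthless by the time it can be read.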
So if the costs are so low, why aren't we all encrypting as a matter of course? Well, for several decades governments have been trying to control access to cryptography, on the grounds that they need to be able to read your messages in the name of law enforcement. In the 90s the US government tried to impose the Clipper chip [Clipper] to give themselves a backdoor into voice communications. There was also the classification of encryption as a munition, and so liable to strict export controls, which led to the absurdity of a T-shirt that could land you in prison for wearing it as you left the US, because it had a four-line implementation of the RSA algorithm printed on it [Back]. And more recently, the full Regulation of Investigatory Powers Act came into force in the UK, making it a serious offence not to disclose your encryption keys. And which dangerous criminal or terrorist was the first to be jailed under it? A rocket hobbyist who distrusted the police [RIPA]. Best not forget those passwords.
References
[Back] http://www.cypherspace.org/adam/shirt/
[Clipper] http://en.wikipedia.org/wiki/Clipper_chip
[RIPA] http://www.theregister.co.uk/2009/11/24/ripa_jfl/
[Turing] http://www.thocp.net/biographies/papers/turing_oncomputablenumbers_1936.pdf