We live in strange times. Frances Buontempo asks if everything’s OK.
As 2021 draws to a close, a suitable editorial topic would be reflection on the year or making predictions for the following year, but I am in denial that 2022 is nearly upon us, so we’ll have none of that. Such an approach may be traditional, but why would I let what’s perceived to be normal drive my actions? My Dad always used to remind me that normal people don’t have two legs. Why? No-one has more than two, some have fewer, and so the ‘average’ or normal number isn’t two. If you were normal in every way, you would be very different indeed. Besides, aiming for mediocrity is a strange goal. Trying to be better than average seems more aspirational. However, if you are no good at something, in my case some exercises in a gym class, then simply managing to get better than you were last time you tried is great. Improvement matters: comparing yourself against the average when you are far worse than it may tempt you to give up. The measurements you choose can impact the outcome, so in order to gamify any activity, you need to pick the right game.
Now, there are circumstances where knowing an average is helpful. We know the average human’s body temperature, within a range. If yours is outside that range, you might be in serious trouble. Knowing what’s normal can help in some circumstances, so we need to know how to spot what’s normal. Frequently this requires data gathering, involving measuring or counting, though sometimes we notice things are amiss before gathering numbers to analyse. If my PC starts making an odd noise, something may be wrong. I suspect we unconsciously register what is ordinary, so that even without numbers or metrics, you can spot when something is off. This spidey sense of something being up comes from noticing anything out of the ordinary. Experience can help you spot reasons for apparently weird behaviour, like fractions in code giving what are often termed “floating point ‘errors’” or a recursive function causing a stack overflow. In contrast, if you are trying something completely different, like a new gym exercise or a new baking recipe, you have no prior experience to go on. Even without having previous attempts to remember, a bone going crack or a burning smell usually indicates a problem. You can still hazard a guess about what to expect based on different, but related, experiences. Trying to vocalise your concerns can be difficult, since learning precise language to describe a novel situation takes practice. Having an experienced person on hand to say “Looks good!” or “No, stop!” is useful. The feedback may not tell you how to improve, but can be enough to nudge you somewhere better. Now, supervised machine learning and AI use feedback functions as similar nudges: we tell the machine if the generated solution is good or not. The algorithm uses this feedback to inform further attempts, so that the AI may appear to learn something. I suspect this is similar to the way many of us learn, but we have the advantage of being able to question instructions and ask for help.
Whether AI can ever think like a human is an unanswered question. Turing invented the so-called Turing test to avoid needing a definition of thinking, and AI is a big topic, so let’s change our focus. I talked about measuring and counting to decide what’s normal. Some basic arithmetic, for example a back-of-the-envelope calculation, will indicate ballpark figures. If you want to know how long a program might run for, then knowing your clock speed and the number of calculations to perform gives a good enough guesstimate to determine whether your program has got stuck or not. Of course, you could debug and find out what it’s actually up to, unless it’s running on a machine you can’t access. If your program runs regularly as a batch job, keeping stats on how long it usually takes is useful. Knowing the average time taken is one thing, but the variance is useful too. Running the same program with the same inputs is unlikely to take exactly the same amount of time, or even RAM, or whatever else you want to track. If anyone tells you an average, ask them for the variance or standard deviation too. And also ask how many data points were used. If I measure once, and find my code takes 120 seconds to run, the average is 120s, the variance is 0, and though this is a data point, it is only one data point. Ask how many samples were used. If you learn enough statistics, you can decide if the sample size is large enough using maths, but even without the detailed knowledge, very few data points might not be enough to go on. It has been said “There are lies, damned lies and statistics.” Whether Mark Twain, Benjamin Disraeli or someone else first said this, we’ll probably never know; however, ensure you are told the variance and sample size and you might spot untruths.
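To make that concrete, here is a minimal sketch in C++ of how you might compute the mean, sample variance and standard deviation of a handful of run-time measurements. The timings are invented purely for illustration.

#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    // Hypothetical run times of a batch job, in seconds.
    std::vector<double> run_times{118.0, 120.0, 125.0, 119.0, 131.0};
    const auto n = run_times.size();

    // Mean: sum divided by the number of samples.
    const double mean =
        std::accumulate(run_times.begin(), run_times.end(), 0.0) / n;

    // Sample variance: divide by n - 1, since this is a sample,
    // not the whole population.
    double sum_sq = 0.0;
    for (double t : run_times)
        sum_sq += (t - mean) * (t - mean);
    const double variance = sum_sq / (n - 1);

    std::cout << "samples:  " << n << '\n'
              << "mean:     " << mean << "s\n"
              << "variance: " << variance << '\n'
              << "std dev:  " << std::sqrt(variance) << "s\n";
}

With five samples rather than one, the spread tells you far more than the average alone.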
What kind of things do you measure? Aside from the performance of the programs you create, you might be tracking compile and link times as you code. Perhaps these seem relatively stable, but after a year or so, a tiny but creeping slow-down may have moved from almost imperceptible to totally unacceptable. We all have different levels of patience, but I once worked on a test suite which ran in about five seconds. Five seconds is just under the limit of my attention span, before I get distracted and forget what I was doing. Initially, there were no tests, so they took no time at all. Gradually, we added more tests and they took a little time, but not too much. I was keeping an eye on coverage to check I’d hit a lot of the code, but, to my mind, the time taken mattered too. If a test suite takes a long time, people might not bother running the tests before committing code, and no one wants a broken build. Because I was tracking the time, even though it did gradually creep up, I spotted when a test went from running in microseconds to taking a second or so. Many might mock, but the implementation of a function had been changed, and if this had run in a tight loop in production, our batch would have missed service level agreements and much trouble would have ensued. Catching the small change early saved the day.
A dynamic system, such as a growing test suite, might have a ‘moving average’. As you measure, things change. Though I said earlier to check the variance and sample size as well as the average, life is, as ever, more complicated. Ask yourself if the average is changing over time. Is it trending up? Then your build times, test suite or batch job might breach some limits in the long run. Maybe the average cycles, going up then down, following what is known as a seasonal pattern [Kenton20]. Do you know what’s driving the change? Maybe many people commit code on a Friday, forming a backlog on the build servers, so things slow down at the end of the week. A batch processing job may take longer at the weekend or at the end of the month, because more reports are generated then. Alternatively, you may have a mystery slow-down once in a while. Keeping stats and plotting graphs can give you new viewpoints on problems. An initial gut feeling or back-of-the-envelope calculation is OK as a starting point, but numbers give you so much more. This may help you track down mystery abnormalities and find the root cause.
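As an illustration, a simple sliding-window moving average smooths out the noise and makes a slow upward drift easier to see than the raw figures do. The build times and the window size below are arbitrary choices, made up for the example.

#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Average over a sliding window: each output value is the mean of
// 'window' consecutive measurements, smoothing short-term noise.
std::vector<double> moving_average(const std::vector<double>& data,
                                   std::size_t window)
{
    std::vector<double> averages;
    if (window == 0 || data.size() < window)
        return averages;
    for (std::size_t i = 0; i + window <= data.size(); ++i)
    {
        const double sum = std::accumulate(data.begin() + i,
                                           data.begin() + i + window, 0.0);
        averages.push_back(sum / window);
    }
    return averages;
}

int main()
{
    // Invented weekly build times in minutes: each looks much like
    // its neighbours, but the moving average drifts upwards.
    const std::vector<double> build_times{
        5.0, 5.1, 5.0, 5.3, 5.2, 5.4, 5.6, 5.5, 5.8, 6.0};
    for (double average : moving_average(build_times, 4))
        std::cout << average << '\n';
}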
As averages can change in cycles or follow trend lines, data in general can fall into one of many statistical distributions. The so-called normal distribution is commonly assumed. This is the famous bell curve, also known as the Gaussian distribution. Gauss used the word normal to describe the distribution, but in the sense of orthogonal or at right angles. As a blog puts it, “The term comes from a detail in a proof by Gauss … where he showed that two things were perpendicular in a sense.” [Cook08] If you want to know which two things and in what sense, you’ll need to do the research yourself. Wikipedia tells me,
by the end of the 19th century some authors had started using the name normal distribution, where the word ‘normal’ was used as an adjective – the term now being seen as a reflection of the fact that this distribution was seen as typical, common – and thus ‘normal’. [Wikipedia]
Collecting data and fitting regression lines to spot patterns and trends has a long history. Recently we have turned this up to the max. It’s very difficult to go anywhere near the internet without leaving ripples or a digital footprint. You will gradually collect cookies, unless you are very careful. The claim is that this enables targeted marketing. Given a few of the items certain social media sites try to sell me, I’m not certain of the reasoning behind the targeting, to be honest. Maybe the internet has got out of hand. I noticed a recent MIT Press book claiming,
this has not always been the case: In the mid-to-late 1990s, when the web was still in its infancy, ‘cyberspace’ was largely celebrated as public, non-tracked space which afforded users freedom of anonymity. How then did the individual tracking of users come to dominate the web as a market practice? [Kant21]
How did this happen? By coders writing code. Why did it happen? Because some people think collecting as much data as possible might help make money. Don’t get me wrong, the internet is a tool that can be used for good or harm, like many things. Being tracked online has become normalized now, though you can maintain some anonymity, sometimes.
This brings us to another use of the word ‘normal’. In order to do data science or statistics, you often need to scale data so it lies in the same approximate range. We call this normalization. If you have two features that differ by orders of magnitude, maybe height and shoe size, the larger quantities can overshadow the smaller ones. Scaling so that our numerical values are similar levels the playing field, as it were, and makes it easier to spot trends. Being programmers, we also normalize strings to deal with various diacritic marks and similar. Furthermore, we can do this in several different ways: canonical decomposition, compatibility decomposition, or either of these followed by canonical composition [MDN]. This means we also define canonical equivalence, which is not to be confused with normalization. I thought numbers could be hard work until I discovered strings.
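As a small sketch of the numeric case, min-max scaling squashes each feature into the range [0, 1] so that heights and shoe sizes can sit side by side. The values are invented, and the string case is another story, needing a Unicode-aware library such as ICU in C++, so I’ll stick to numbers here.

#include <algorithm>
#include <iostream>
#include <vector>

// Min-max normalization: rescale a feature to [0, 1] so that
// features measured on very different scales can be compared
// or combined without one swamping the other.
std::vector<double> normalize(const std::vector<double>& feature)
{
    const auto [lo, hi] =
        std::minmax_element(feature.begin(), feature.end());
    std::vector<double> scaled;
    for (double value : feature)
        scaled.push_back((*hi == *lo) ? 0.0
                                      : (value - *lo) / (*hi - *lo));
    return scaled;
}

int main()
{
    // Invented heights in centimetres and shoe sizes: very
    // different magnitudes until both are scaled to [0, 1].
    for (double h : normalize({152.0, 170.0, 183.0, 165.0}))
        std::cout << h << ' ';
    std::cout << '\n';
    for (double s : normalize({36.0, 42.0, 46.0, 40.0}))
        std::cout << s << ' ';
    std::cout << '\n';
}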
So, what have we learnt so far? Normal doesn’t mean normal and we are still none the wiser as to what normal really means. We do talk about conventions as normal, like running tests before a commit, warming up before exercising, and so on. Many habits we are encouraged into, like brushing our teeth before bed, do have obvious benefits. Running tests locally as you write code, or even checking it compiles, can save you a world of pain in the long run. However, conventions do vary, and when this happens, accusing someone of not being normal because they do things differently is unacceptable. I tend to use a bookmark as I read down a page if I’m using a real-life book or paper printout. Some people mock me for this. It helps me concentrate and stops the letters moving about all over the place. Some may say I have dyslexic tendencies, so first, I apologise if my writing is littered with typos, and second, using a bookmark helps me, so don’t judge.
We are all different, and that’s a good thing. Doing things differently can lead to innovations. Some programmers are told they are not normal – accused of being geeks or nerds [Buontempo21]. I say hooray for geeks, people who myopically collect details, measure and figure out what’s going on. Without us, the world would be very hard to navigate. Don’t strive to be normal, it’s far too mediocre. As I draw to a close, I notice a relevant tweet to end on [Kazum93]:
Normal people on their weekend: Chill, Netflix
Simon: Let’s create a memory allocator in C++
Thank you @kazum93, I couldn’t have put it better myself.
References
[Buontempo21] Frances Buontempo (2021) ‘Geek, Nerd or Neither?’ Overload 163, June 2021, available at https://accu.org/journals/overload/overload163
[Cook08] John D. Cook, ‘Four characterizations of the normal distribution’, published 13 March 2008 on https://www.johndcook.com/blog/2008/03/13/four-characterizations-of-the-normal-distribution/
[Kant21] Tanya Kant, ‘A history of the data-tracked user’, published 8 October 2021 at https://thereader.mitpress.mit.edu/a-history-of-the-data-tracked-user
[Kazum93] Kazum93 on Twitter, available at https://twitter.com/kazum93/status/1454387344031817732
[Kenton20] Will Kenton, ‘Seasonality’, updated 30 November 2020, available from https://www.investopedia.com/terms/s/seasonality.asp
[MDN] ‘String.prototype.normalize()’, available at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
[Wikipedia] ‘Normal distribution’, section 7.2 (‘Naming’), available from https://en.wikipedia.org/wiki/Normal_distribution#Naming
Frances Buontempo has a BA in Maths + Philosophy, an MSc in Pure Maths and a PhD technically in Chemical Engineering, but mainly programming and learning about AI and data mining. She has been a programmer since the 90s, and learnt to program by reading the manual for her Dad’s BBC model B machine.