Category: Discoveries

Emacs editing everywhere in Gnome

Quick tip: If you do the majority of your typing in Emacs, it can feel gratingly clumsy to type anywhere else, like in browser text fields and IM windows.

I discovered today that GTK has an option to enable common Emacs editing keys in text fields everywhere on your system. This is amazing: it means no browser extensions or hacks are needed to type more than four lines without cursing.

The setting is buried at /desktop/gnome/interface/gtk_key_theme in gconf-editor; change its value to "Emacs", or paste this in a terminal:

gconftool-2 --set /desktop/gnome/interface/gtk_key_theme Emacs --type string
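If you're on a newer GNOME release where GConf has given way to GSettings (GNOME 3 and later), the same option lives under org.gnome.desktop.interface. This equivalent is my assumption, not part of the original tip:

```shell
# Set the GTK key theme via GSettings (GNOME 3+).
# Setting it back to "Default" reverts the change.
gsettings set org.gnome.desktop.interface gtk-key-theme "Emacs"
```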

Most common navigation and editing keys work just as they do in terminal emulators: C-{n,p,f,b}, M-{f,b,d}, C-{a,e}, C-k and C-y, C-{w,h,d}.

The CUA keys C-x, C-c and C-v continue to cut, copy and paste respectively.

Evolutionary divergence

Suppose we start with two sexes that have none of the particular attributes of males and females. Call them by their neutral names A and B. All we need specify is that every mating has to be between an A and a B. Now, any animal, whether an A or a B, faces a trade-off. Time and effort devoted to fighting with rivals cannot be spent on rearing existing offspring, and vice versa. Any animal can be expected to balance its effort between these two rival claims. The point I am about to come to is that the As may settle at a different balance from the Bs and that, once they do, there is likely to be an escalating disparity between them.

To see this, suppose that the two sexes, the As and the Bs, differ from one another, right from the start, in whether they can most influence their success by investing in children or by investing in fighting (I’ll use fighting to stand for all kinds of direct competition within one sex). Initially the difference between the sexes can be very slight, since my point will be that there is an inherent tendency for it to grow. Say the As start out with fighting making a greater contribution to their reproductive success than parental behaviour does; the Bs, on the other hand, start out with parental behaviour contributing slightly more than fighting to variation in their reproductive success. This means, for example, that although an A of course benefits from parental care, the difference between a successful carer and an unsuccessful carer among the As is smaller than the difference between a successful fighter and an unsuccessful fighter among the As. Among the Bs, just the reverse is true. So, for a given amount of effort, an A can do itself good by fighting, whereas a B is more likely to do itself good by shifting its effort away from fighting and towards parental care.

In subsequent generations, therefore, the As will fight a bit more and care a bit less than their parents, while the Bs will fight a bit less and care a bit more than theirs. Now the difference between the best A and the worst A with respect to fighting will be even greater, and the difference between the best A and the worst A with respect to caring will be even smaller. Therefore an A has even more to gain by putting its effort into fighting, and even less to gain by putting it into caring. Exactly the opposite will be true of the Bs as the generations go by. The key idea here is that a small initial difference between the sexes can be self-enhancing: selection can start with an initial, slight difference and make it grow larger and larger, until the As become what we now call males, the Bs what we now call females. The initial difference can be small enough to arise at random. After all, the starting conditions of the two sexes are unlikely to be exactly identical.

From The Selfish Gene. A fantastic explanation of the fundamental difference between the sexes, and why you would expect the asymmetry to arise¹. This explains the evolution of the male and female gametes, which in turn explains the disparate strategies adopted by male and female members of a species when it comes to mating.

Evolution is amazing.

Footnotes:

1. The only problem with arguments without numbers, such as this one, is I can never be sure if I’m missing a flaw well ensconced in the smooth, convincing wording. I’ve fallen for them too many times!

On the incorrigible character of Saybolt Universal Seconds (and CLI calculators thereof)

A considerable chunk of my time as an undergraduate was spent designing hypothetical steam power cycles (entirely on paper, the defining characteristic of an Indian education) and mixing up data on hydrodynamic bearings and lubrication oil viscosities in conveniently provided combinations to yield still more data.

It was as dry then as it sounds now.

The cherry on top of this gargantuan mound of frustration was this: An hour long calculation, meticulously carried out to the right significant figure, an end to a week’s worth of futile drudgery, a manuscript of unbendable proportions and a palimpsest of scrawls, margins, sub-margins, crosses and cross-references that would terminate in a number that would fail a sanity check. Or worse, it would not.

Somewhere in the mess of Kilojoules per kilogram per second per Kelvin and Saybolt Universal Seconds* would nest a skipped exponential, or would manifest a quirk of the almighty tabulators** who would choose, in their earnest and masterful attempts at obfuscation, a system of units that differs from SI by a sliver- enough to evade a perfunctory gaze while puncturing all attempts at veracity.

It Need Not Be So. Alerted via a mention on XKCD, bolstered by the acumen of dozens of dimensionally aware readers***, I discovered that a tool to deal with headaches of the sort exists, has existed for years now, and is available nearly everywhere.

They call it the Google Calculator.

Alas, access to the tool is buried behind web browsers and HTML forms, which some people will attest to be pox-infested heathen land. Of course, necessity is the mother of most scripting (the scant remainder being General Frippery of the Insufferably Needless Kind), and alternatives have surfaced aplenty. (Here are five of them.)

What does it do? Let BASH do the talking:


$ gcalc "1.2 gallons in ml"
1.2 Imperial gallons = 5,455.31025 millilitres


$ gcalc "1.4198 petabyte in MB"
1.4198 petabyte = 1.52449864 x 10^9 megabytes


$ gcalc "131 USD in INR"
131 U.S. dollars = 6,497.69357 Indian rupees

But you knew that.


$ gcalc "234.089 * sqrt(7) * ln(12)"
234.08900 * sqrt(7) * ln(12) = 1,539.00526

Now we’re warming up.


$ gcalc "0.21 tesla in gauss"
0.21 tesla = 2100 gauss


$ gcalc "28 MPa in psi"
28 megapascals = 4,061.05666 pounds per square inch

Still, not a life saver.


$ gcalc "radius of earth * pi * 2 in feet"
radius of Earth * pi * 2 = 131,478,951 feet


$ gcalc "radius of sun^3 * 4 * pi / 3 in km^3"
((radius of the sun^3) * 4 * pi) / 3 = 1.40922394 x 10^18 km^3

We’re getting there!


$ gcalc "12 km/h * 22 minutes in miles"
(12 km/h) * 22 minutes = 2.73403325 miles


$ gcalc "1 / (4 milliohms * 14 microfarads)"
1 / ((4 milliohms) * 14 microfarads) = 17.8571429 megahertz

Wow.


$ gcalc "3.54 kilojoules / (kg * kelvin) * 0.0021 kg / s * 24 kelvin"
((3.54 kilojoules) / (kg * kelvin)) * (0.0021 (kg / s)) * (24 kelvin) = 178.41600 watts

We’ve arrived. Saybolt Universal Seconds didn’t make it. Serves ’em right. We ought to FITGD, them Saybolt viscometers.
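That last one even survives an offline sanity check; the arithmetic is just specific heat times mass flow times temperature difference, and a line of awk reproduces it without a network in sight:

```shell
# 3.54 kJ/(kg K) * 0.0021 kg/s * 24 K, converted to watts (x 1000)
watts=$(awk 'BEGIN { printf "%.3f", 3.54 * 0.0021 * 24 * 1000 }')
echo "$watts W"   # prints 178.416 W, matching the result above
```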

Here’s how you get this running.

My calculation troubles have been deflated, the palimpsests discarded. (Life is good again.)

Google Calculator is amazing. But does it pass the sanity check?


$ gcalc "the answer to life the universe and everything"
the answer to life the universe and everything = 42

Aye, it does.

* An oxymoron of sorts, that.
** These are the Powers That Be, the Catalogue Lords, the spawn of Napier himself- minus his engineering ingenuity at laying waste to bovines from two miles away.
*** This footnote shall not discuss the pun it was referenced by.

A regular expression for primeness

Lazy Saturday afternoons spent on the Internets lead to the niftiest discoveries.

A Regular Expression that tests a number for (un)primeness:

/^1?$|^(11+?)\1+$/

The input string that the above regex is matched with is a sequence of N ones, where N is the number tested for primeness.

The confounding terseness of regular expressions isn't surprising after a year of daily use, but this one led to record amounts of head-scratching.

Let’s expand that out, shall we?

qr/
^1?$        # 1 or nothing: not prime
|           # OR
^(11+?)     # 2 or more 1s, matched minimally
\1+$        # followed by one or more instances of the captured pattern
/x

When matched against a sequence of N ones (111...), the special cases of zero or one ones are handled by the first part of the regex; the latter part is the workhorse.

The string is tested against an integer number of repeats of the captured pattern. 111111 is matched as three occurrences of 11: once during the capture and twice during the backreference \1+, so 111111 is not prime.

But here's the trick: since (11+?) matches minimally, failure to match nine ones (111111111) as repeats of 11 causes the regex engine to backtrack and capture 111; nine is then matched as three occurrences of 111! So the regular expression engine tries to match the input against repeats of a successively longer run of ones- which is plain old trial division, the usual way of testing for primality.

This piece of regex magic was picked off StackOverflow, where there’s a full explanation with examples.

“Why are you producing so few red blood cells today?”

A while ago, it was thought that the trick to making a machine play chess well was to extend how far down the branching network of possible moves it could examine. Irrespective of how far they can look ahead, though, skilled human chess players can confidently confound (or at least match) most chess programs of today.

Why?

I found a fascinating account of this puzzler involving AI and human thinking in (where else?) GEB. Apparently, the reason for this has been known since the 1940s; if you've ever played chess- or play regularly but with skill befitting a two-year-old- you'll come to appreciate the reason immensely.

Chess novices and chess masters perceive a chess situation in completely different terms. The results of the Dutch psychologist Adriaan de Groot's study (from the 1940s) imply that chess masters perceive the distribution of pieces in chunks.

There is a higher-level description of the board than the straightforward "white pawn on K5, black rook on Q6" type of description, and the master somehow produces such a mental image of the board. This was proven by the high speed with which a master could reproduce an actual position taken from a game, compared with the novice's plodding reconstruction of the position, after both of them had five-second glances at the board. Highly revealing was the fact that masters' mistakes involved placing whole groups of pieces in the wrong place, which left the game strategically almost the same, but to a novice's eyes, not at all the same. The clincher was to do the same experiment but with pieces randomly assigned to the squares on the board, instead of copied from actual games. The masters were found to be simply no better than the novices in reconstructing such random boards.

The conclusion is that in normal chess play, certain types of situation recur- certain patterns- and it is on these high-level patterns that the master is sensitive. He thinks on a different level from the novice; his set of concepts is different. Nearly everyone is surprised to find out that in actual play, a master rarely looks ahead any further than a novice does- and moreover, a master usually examines only a handful of possible moves! The trick is that his mode of perceiving the board is like a filter: he literally does not see bad moves when he looks at a chess situation- no more than chess amateurs see illegal moves when they look at a chess situation. Anyone who has played even a little chess has organized his perception so that diagonal rook-moves, forward capture by pawns, and so forth, are never brought to mind. Similarly, master-level players have built up higher levels of organization in the way they see the board; consequently, to them, bad moves are as unlikely as illegal moves are, to most people. This might be called implicit pruning of the giant branching tree of possibilities. By contrast, explicit pruning would involve thinking of a move, and after superficial examination, deciding not to pursue examining it any further.

If you pause to think about this, it comes across as an utterly spellbinding revelation. Like the proverbial frog in the well, mental models and levels of perception above what we are used to are very hard to digest- but they’re there, as the excerpt explains.

The “chunking into levels” is a predominant theme in all complex systems we see*, at least in the way we seek to understand and analyze them- from computer systems (Hardwired-code->machine language->Assembly language->Interpreters and Compilers) to DNA (Specifying each nucleotide atom-by-atom->Describing codons with symbols for nucleotides->Macromolecules->Cells), and even human thinking.

The last bit requires a little exposition, but first, we note the analogy between the nightmare of writing complex useful computer code in machine language and the terror of reading a virus DNA atom by atom. In both cases, we would miss out on the higher level structures that embody computer programs and virus DNA with their attributes- complex systems possess meaning on multiple levels.

The multi-level description extends to virtually every complex phenomenon. Weather systems, for instance, possess "hardware" (the earth's atmosphere) which has certain properties hardwired into it (hardwired code) in the form of the laws that flitting air molecules obey, and "software", which is the weather itself. Looking at the motions of individual molecules is akin to reading a huge, complicated program on the machine language level. We chunk higher level patterns into storms and clouds, pressures and winds- large scale coherent trends that emerge from the motion of astronomical numbers of molecules.

As for multi-level human thinking, it is illuminating to first appreciate that a higher level perception of a system does not necessarily mean an understanding of the lower levels too. One does not need to know machine language to write complex computer programs, nor is one required to be aware of individual molecule trajectories to describe or predict the weather.** In fact, a higher level of a system may itself not be “aware” of the levels it is composed of, such as AI programs that are ignorant of the operating system they are running on. The higher level descriptions are “sealed off” from the levels below them, although there is some “leakage” between the hierarchical levels of science. (This is necessarily a good thing- or people could not obtain an approximate understanding of other people without first figuring out how quarks interact.)

The title of the post is another excerpt from GEB, derived as a somewhat whimsical analogy:

The idea that “you” know all about “yourself” is so familiar from interaction with people that it is natural to extend it to the computer- after all, AI programs are intelligent enough that they can “talk” to you in English!  Asking an AI program (which is compiled code) about the underlying operating system is not unlike asking a person “Why are you producing so few red blood cells today?” People do not know about that level- the “operating system level”- of their bodies.

This post owes its existence in entirety to GEB, the Big Book of Big Ideas.

Continue reading

So Sed Perl

Perl is supposedly shell scripting on steroids- a Swiss Army chainsaw of sorts, they say.

All very exciting, yes, but despite having written a couple of reasonably large text-processing scripts in Perl (~50 lines), I could never get it to emulate the brevity of Sed, the non-interactive text editor that is known to cause searing pain in the temples (and assault you with garbled line noise). Sed, incorporating the power of regular expressions, allows for fantastically obscure edits to text files with just a few characters of input. The “conditional disemvowelment” Sed command

$ sed -e '/^[a-z]/ s_[aeiouAEIOU]__g' testfile.txt

can be emulated with Perl, but it’s a near ten line script that’s nearly impossible to type correctly at the command line in one attempt:

$ perl -e ' open TEST, "testfile.txt"; while (<TEST>) { if (m/^[a-z]/) { s/[aeiouAEIOU]//g; } print; }'

Painful. A few more hacks bring the number of lines down to five (while keeping the script readable), but it still can’t hold a candle to Sed. Which is somewhat disappointing considering the touted claims of Perl-makes-sed/awk/sh-obsolete. And then, after six months of using Perl, I read the manpage.

Apparently, there is more than one way to do this.

  • The -p and -n flags wrap the Perl command given on the command line in a loop over the lines of the file passed as its argument:

$ perl -n -e '#yourscript' testfile.txt

is equivalent to

$ perl -e ' open TEST,"testfile.txt"; while (<TEST>){ #yourscript }'

  • Even better,

$ perl -p -e '#yourscript' testfile.txt

expands to

$ perl -e ' open TEST, "testfile.txt"; while (<TEST>){ #yourscript } continue { print; }'

  • There’s more; the -a flag (which requires -n or -p) splits $_ on whitespace, -F/regexp/ makes that split use /regexp/ instead, and both store the result of the split in @F. perl -an -F/regexp/ is equivalent to

$ perl -e ' while (<>){ @F = split(/regexp/); #yourscript }'

And so Perl does emulate Sed after all. The conditional disemvowelment reduces to:

$ perl -p -e 's/[aeiouAEIOU]//g if (m/^[a-z]/);' testfile.txt
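To convince yourself that the sed original and the Perl one-liner really agree, here's a throwaway comparison (the file contents are made up for the demo):

```shell
# Run both disemvowelers on the same input and compare
tmp=$(mktemp)
printf 'hello world\nKeep Me\nanother line\n' > "$tmp"
sed_out=$(sed -e '/^[a-z]/ s_[aeiouAEIOU]__g' "$tmp")
perl_out=$(perl -p -e 's/[aeiouAEIOU]//g if (m/^[a-z]/);' "$tmp")
echo "$sed_out"   # only lines starting with a lowercase letter lose their vowels
rm -f "$tmp"
```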

A few more neat constructions:

  • Mass renaming: Rename files matching in-regexp to out-regexp

$ ls | perl -p -e 's/in-regexp/mv "$&" "out-regexp"/' | sh

  • Emulating grep:

$ perl -n -e 'print if (m/regexp/);' testfile.txt

More info here and here.