Thursday, December 13, 2007

Geeking Out

Though I’m not really a geek or nerd myself, I have spent a great deal of the last 10 years living and working amongst the amusing creatures, and I find a few of their habits have crept into my general behavior. And so it was that I found myself today quite distracted by the various terms software developers use to refer to the things that they put into data structures (like vectors and arrays). Please note that this is a linguistics inquiry, not a programming one. There may be prescriptive uses of these terms, but as a linguist, I’m interested in the descriptive facts of how people actually use them.

Programming tutorials will often refer to these things as members, elements, or items, but they are not consistent with their terms. For example, one Java author uses both “objects” and “elements” here:

The main key difference is that this one doesn't actually remove objects at the end; we just leave them inside. [clip] Printing is accomplished using an Enumerator; which we use to march through every element printing as we move along. (emphasis added)

Here’s the creator of Python, Guido van Rossum, using both “item” and “element”:

insert(i, x)
Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x). (emphasis added)
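For readers who want to see the behavior the quoted passage describes, here is a minimal sketch. The list contents and the names `a`, `b`, `front`, and `same` are made up for illustration; only `insert`, `append`, and `len` come from the quoted documentation.

```python
# Illustrative values only; `a`, `b`, `front`, and `same` are not from the quoted docs.
a = ["y", "z"]
a.insert(0, "x")          # the new item goes before the element at index 0
front = list(a)           # snapshot of the list after inserting at the front

a.insert(len(a), "end")   # index len(a) sits just past the last element

b = ["x", "y", "z"]
b.append("end")           # per the quote, this should match the insert above
same = (a == b)
```

Note that even in this tiny example it is hard to write the comments without choosing between "item" and "element".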

The folks at use “element” for lists, vectors, & Double-ended Queues and “item” for sets, multisets, multimaps and maps here:

insert (Vectors) inserts elements into the container
insert (Double-ended Queues) inserts elements into the container
insert (Lists) inserts elements into the container
insert (Sets) insert items into a container
insert (Multisets) inserts items into a container
insert (Multimaps) inserts items into a container
insert (Maps) insert items into a container
(emphasis added; modified from a table)

But in another place, they switch from elements to members:

Individual elements of a vector can be examined with the [] operator.
Two vectors are equal if:
1. Their size is the same, and
2. Each member in location i in one vector is equal to the member in location i in the other vector.

There are two things at play here: 1) lexical preferences and 2) discourse preferences. Though we may have a default preference for a particular term, in certain contexts we may choose another term (e.g., to avoid repetition). Exactly what the relevant context is, and what function the choice serves, is not clear to me. I suspect that one factor is whether the author wants to foreground the content of the container or the structure of the container.

In classic empirical fashion, I performed a lightweight Kitchen Experiment to collect some facts about usage. I Googled the constructions “X in a vector” and “X in an array”, where “X” was replaced systematically by a series of possible “item” words. The results below are ordered by number of hits (in its infinite wisdom, Blogger kindly removed my formatted tables and replaced them with tabbed lines).

"X in a vector"

"X in an array"

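For anyone who wants to try a similar count on their own text, here is a rough sketch of the same idea. It is not what I did (I used Google hit counts, not a script), and the candidate list, the sample sentences, and the function name `count_frames` are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical candidate "item" words; not the list used in the post.
CANDIDATES = ["elements", "items", "members", "objects", "things"]

def count_frames(texts, container, candidates=CANDIDATES):
    """Count occurrences of '<candidate> in a/an <container>' across texts."""
    counts = Counter()
    for word in candidates:
        pattern = re.compile(
            rf"\b{re.escape(word)} in an? {re.escape(container)}\b",
            re.IGNORECASE,
        )
        counts[word] = sum(len(pattern.findall(t)) for t in texts)
    # Highest count first, mirroring "ordered by number of hits".
    return counts.most_common()

# Made-up sample sentences standing in for a real corpus.
sample = [
    "Count the elements in a vector before resizing.",
    "The items in an array are stored contiguously.",
    "Swap two elements in an array.",
]
ranked = count_frames(sample, "array")
```

Counting over a local corpus sidesteps some of the noise in search-engine hit counts, though the numbers then depend entirely on which texts you feed it.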
Of course, and as always, I continue my use of the term Kitchen Experiment to avoid being sued by Mark Liberman for trademark infringement.

1 comment:

Jason M. Adams said...

Interesting that "things in a vector" doesn't show up in the google results. I wonder if it's because it sounds too informal paired against the very "mathy" vector, whereas array is also used in non-geeky settings and has less mathy mojo.

Just googled "things in an array of *" and got a bunch of things (129k+) very unrelated to programming, so could just be an alternative usage pattern corrupting the numbers.
