Tuesday, 25 September 2012

The Parsimonious Crow?

Recently, a study was published by Alex H. Taylor, Rachael Miller, and Russell D. Gray in Proceedings of the National Academy of Sciences (PNAS) claiming that New Caledonian crows are capable of reasoning about "hidden causal agents".

The study has generated considerable press interest (e.g., on the BBC website, and see here for a list of links to press coverage of the article), several blog posts (for example, here and here), and also a wide-ranging Reddit discussion (here). As far as I can tell, absent from all of this media attention has been an informed critique of the study's procedures and the explanation provided by the authors for their findings. Until now.

Thanks to the encouraging words of Anthony McGregor, Mark Haselgrove, and Graham Davey on Twitter, I wrote this blog to try and offer a critical evaluation of the study and to take issue with the authors' explanation. As you will see, there are several potential methodological limitations that may detract from the study's findings and which suggest a simpler, more parsimonious explanation of the crows' behaviour. Before we get to what that explanation might be, let's first consider the procedures in detail.

The authors sought to develop an ecologically valid task that resembled the following scenario:

 "Imagine a bird looking down on a monkey moving through
a forest canopy. Generally, the bird will be able to observe both
the monkey moving and the canopy shaking at the same time.
Sometimes, however, when the canopy is thick overhead, the bird
may only observe that, against a background of stationary leaves,
there are waves of moving leaves that can start and stop abruptly.
Humans are able to imagine why the leaves are moving when
the monkey is out of sight. They can hypothesize that there is
a hidden causal agent that must be moving the leaves because
when the wind is not blowing, the canopy does not shake on its own."

Essentially, the authors investigated what would happen when crows were presented with two conditions similar to those described above: one, where a human agent is observed to enter a setting, (presumably) engage in a behaviour, and then leave the setting; and, two, where the same behaviour occurs but without observing a human exit the setting. Would crows' food-seeking behaviour be influenced by the assumed absence of the human agent in the second condition?


Eight crows (five adults), wild caught in New Caledonia (a series of islands in the south west Pacific), were tested in an outdoor aviary containing a table, a food source (obscured by what looked to be a hollowed-out brick), a small stick tool, and a hide through which a long stick protruded. Here's a diagram from the study:

In the condition on the left, called the human causal agent (HCA) condition, the crows "observed two humans walk into the aviary. One, the agent, walked into the hide and so became hidden from the crows. A wooden stick was then probed in and out of a hole in the hide wall 15 times toward the baited hole. The agent then exited the hide and left the room. At this point the second human, who had stood 1.5 m from the hide in the corner of the room with closed eyes and hands held in front of the body, also left the room. The crow was now free to come down to the table, pick up a tool, and use it to extract the food from the box."

In the condition on the right, called the unknown causal agent (UCA) condition, the crows observed one human enter the cage and stand "with closed eyes and hands held in front of the body as before. The tool was then probed through the hole in the hide 15 times. The human then left, and the crows again were free to come down to extract food."

So, the question was: would crows respond differently under the two conditions? In other words, would they show less caution during the HCA condition, in which they had seen the human enter the hide, poke the stick and leave the hide? Could they reason that they were unlikely to get poked in the back of the head by the stick? (The stick was, in fact, moved by a researcher in both conditions who was out of sight of the birds.) Or would they be less likely to try and extract the food in the UCA condition when that pesky human might be hidden, waiting to pounce? Remember, only 1 human entered and left during the UCA condition, whereas 2 entered and left in the HCA condition.

Now, before the authors could undertake these tests they needed to ensure that the crows could extract food using the tool and do so in the presence of the hide. "Habituation" sessions were conducted in which the crows were taught to extract food at progressively decreasing distances from the hide, starting when the hide was 100 cm away and ending when it was 20 cm away.

By now, the crows could reliably extract food within 20 cm of the hide so the researchers were ready to begin testing the HCA and UCA conditions. Recall that the two conditions were intended to test whether crows could infer what had moved an inanimate object and whether it was likely to occur again. This was measured by the (mean) number of hide inspections & the number of abandoned searches (when crows stopped trying to use the tool to extract food). 


The authors contrasted the predictions of two hypotheses. First, the "habituation hypothesis" predicts that, because the movement of the stick is a novel, sudden stimulus event, the crows should seek to minimise chances of being struck by it while they tried to extract the food. If this hypothesis is correct, then the "level of caution would then be progressively reduced each time the crows used a tool in the box and the stick did not appear." Their cautious behaviour would, therefore, habituate. So, we should see a decrease in the number of inspections during the UCA condition relative to HCA. The second hypothesis, the "causal reasoning hypothesis", predicts that crows should "predict the stick’s movement by reasoning about why the stick was moving." In other words, if the crows are capable of attributing causal agency to a hidden human they should demonstrate a high level of caution (i.e., number of inspections) during the UCA condition. The two hypotheses therefore make opposite predictions.


The number of inspections was significantly higher during the UCA condition than the HCA condition (see the diagram above). While there was an upwards trend evident between the second and third HCA trials, the first UCA trial resulted in the highest mean number of inspections which, by the third and final trial, resembled that of the first condition. 

The crows also abandoned more searches in the UCA condition (and none at all during HCA), but this trend decreased from the first trial and stabilised during the second and third trials (see diagram below). They gave up more often because they remained highly cautious: any moment now, that pesky stick could start moving again and get in the way of their food.

Recall that the "causal reasoning hypothesis" makes opposite predictions to a simple account based on habituation - but the data showed that across the 3 UCA trials, the number of inspections decreased, not increased. Thus, the authors' predicted "high level of caution" does, it seems, revolve around data from the first UCA trial only.

At this stage, it may help to view a short movie showing the first author introducing the study and with some examples of trials taken from the experiment (another video is available here):

Analysis & Alternatives

Right, so is this definitive evidence for "causal reasoning" then? Consider the following methodological issues and theoretical challenges.

 1. Absence of counterbalancing.

This is perhaps the most startling feature of the procedure: the conditions were not counterbalanced. All eight crows received the HCA trials followed by the UCA trials.

It is a standard requirement in research design to balance the order of administration of two or more independent variables. This is done to prevent the effects of one condition carrying over and influencing subsequent conditions. Counterbalancing ensures that potential carryover effects are balanced across the different orders (for instance, half of the crows could have been administered HCA followed by UCA, and the other half administered UCA followed by HCA).

For some unexplained reason, the authors chose not to counterbalance the conditions. Both the HCA and UCA conditions are relatively distinct and easily operationalised, so it's unclear why counterbalancing was not employed.
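To make the idea concrete, here is a minimal sketch in Python of how eight birds could have been split evenly between the two possible orders. The crow labels, the `counterbalance` function and the fixed seed are my own illustrative assumptions, not anything taken from the study.

```python
import random


def counterbalance(subjects, orders=(("HCA", "UCA"), ("UCA", "HCA")), seed=0):
    """Randomly assign subjects so that each condition order gets an equal share."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out to the orders in turn, like dealing cards.
    return {subject: orders[i % len(orders)] for i, subject in enumerate(shuffled)}


# Hypothetical labels for the eight birds.
crows = [f"crow_{n}" for n in range(1, 9)]
assignment = counterbalance(crows)

for crow in crows:
    print(crow, "->", " then ".join(assignment[crow]))
```

With eight subjects and two orders, four birds end up in each order, so any carryover from the first condition to the second is spread equally across HCA and UCA.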

 2. "Blind" observers?

Let's consider another troublesome aspect of the procedure. The Methods section reveals that the data were scored by two observers who had an inter-observer reliability score of 91%.

Presumably, the two observers were not blind as to which condition was in effect (i.e., if it's the first half of the study, then it must be HCA; if not, it must be UCA), but that does not mean that blind observation was impossible. For instance, the observers could have been presented with video-recorded trials starting with when the crows left their perch and landed on the table. That would avoid having to show the human(s) entering and exiting the aviary and biasing the resulting observations.

Regardless of whether the observers were blind or not, the 91% inter-observer reliability score could refer to one or both of the dependent measures employed. Without this information and details of what inter-observer agreement procedure and calculation method were used, it is difficult to evaluate the reliability score. Moreover, we must assume from the list of author contributions and the absence of acknowledged assistance with data collection that the "two observers" were also those who carried out the study.
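To illustrate why the calculation method matters, here is a small Python sketch comparing two common ways of computing agreement on count data. The function names and the inspection counts are made up for illustration; they are not the study's data, and I do not know which method the authors actually used.

```python
def total_count_agreement(obs_a, obs_b):
    """Smaller overall total divided by the larger total, as a percentage."""
    total_a, total_b = sum(obs_a), sum(obs_b)
    if max(total_a, total_b) == 0:
        return 100.0  # neither observer recorded anything
    return 100.0 * min(total_a, total_b) / max(total_a, total_b)


def exact_agreement(obs_a, obs_b):
    """Percentage of trials on which both observers recorded the same count."""
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must score the same number of trials")
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * matches / len(obs_a)


# Made-up inspection counts for six trials; NOT data from the study.
observer_1 = [2, 0, 1, 4, 3, 1]
observer_2 = [1, 1, 1, 4, 2, 2]

print(total_count_agreement(observer_1, observer_2))      # 100.0
print(round(exact_agreement(observer_1, observer_2), 1))  # 33.3
```

The same two sets of scores yield perfect agreement by one method and poor agreement by the other, which is why a bare percentage is hard to interpret without knowing how it was calculated.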

3. "Habituation" training.

Although no data are presented on the number of trials needed to complete this training, these sessions amounted to what is commonly called "shaping by differential reinforcement". In other words, successive approximations to the goal or target behaviour - extracting food in the presence of the hide - were reinforced (with food), while those behaviours that did not approximate the goal were not reinforced.

Shaping is required to get the behaviour of interest to occur before you can introduce variables of interest (such as response requirements, etc.). Rats are not born with the ability to lever-press, you know! Shaping may also be used to increase desirable behaviour, as in a study conducted by Charlotte Slater and me on trailer loading in horses.

Presumably, the crows did not need to be taught how to use the tool to extract the food, but without any data we are left wondering about the nature and extent of their tool-using behaviour prior to the critical test trials.

4. Stimulus control analysis of each condition. 

From the perspective of the crows, the HCA condition consisted of the following sequence of events: two humans enter the aviary, one disappears from sight, the stick moves, one human reappears and exits, and, finally, the second human leaves the aviary. From the perspective of the crows, the UCA condition consisted of the following sequence of changes: one human enters the aviary, the stick moves, and the human leaves.

Both conditions were, therefore, readily discriminable from one another (which, of course, was the whole idea). Both were followed by food; presumably, the crows successfully used the tool to extract food, although no data are provided on the consumption per trial. Strictly speaking, then, the two conditions, and the order in which they were administered, amounted to a discrimination learning task that was always followed by food.

In terms of stimulus control, there are two main types of discrimination learning procedures: simple and conditional. Simple discrimination involves a history of differential reinforcement for selecting one stimulus over another (e.g., given X and Y, selecting X is reinforced with food, whereas selecting Y is not) and involves three terms of analysis: discriminative stimulus (i.e., X) - response (i.e., pointing or pecking) - consequence (i.e., food).

Conditional discrimination involves a history of differential reinforcement in which selecting one stimulus is conditional upon the presence or absence of another stimulus. For instance, given X, selecting Y might be followed by food (S+), but given Z, selecting Y would not be followed by food (S-). Conditional discriminations add an extra term to the analysis: conditional stimulus (i.e., X) - discriminative stimulus (i.e., Y) - response - consequence.
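The difference between the two contingencies can be written out as a pair of toy Python functions. This is only a sketch of the X/Y/Z examples given above; the function names are hypothetical and the stimuli are placeholders.

```python
def simple_discrimination(selected):
    """Three-term contingency: selecting X is reinforced, selecting Y is not."""
    return selected == "X"


def conditional_discrimination(context, selected):
    """Four-term contingency: selecting Y is reinforced only in the presence of X."""
    return context == "X" and selected == "Y"


# Simple: the consequence depends only on which stimulus is selected.
print(simple_discrimination("X"))            # True  (food)
print(simple_discrimination("Y"))            # False (no food)

# Conditional: the same selection has different consequences in different contexts.
print(conditional_discrimination("X", "Y"))  # True  (food, S+)
print(conditional_discrimination("Z", "Y"))  # False (no food, S-)
```

Note that the second function needs an extra argument: the consequence of the very same response changes with the conditional stimulus.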

Do the HCA and UCA conditions resemble simple or conditional discriminations? The response of tool-use was the same and was always followed by food; the stick moved in both conditions and was always followed by one of the humans leaving the aviary. No humans were present prior to the crows leaving their perch and landing on the table, but the sequence of events that led up to this differed in each condition, as I have already outlined. The conditions could have resembled a form of conditional discrimination (i.e., two humans enter, stick moves, both leave [HCA] and one human enters, stick moves, and human leaves [UCA]), but the response (which, remember, was very well established in the crows' repertoires by this point) was the same and was always consequated by food.

Moreover, because HCA was followed by UCA, generalization may have occurred in which the effects of the (directly reinforced) first three trials could have influenced the three UCA trials. Crows may have discriminated the partial similarity of the sequence of events and generalized their tool-using behaviour. The fact that caution decreased over trials supports this interpretation.

I offer this preliminary analysis, not as an exhaustive explanation of what was going on in the study, but to suggest that a parsimonious account is possible and that such an account need only refer to a few behavioural principles (in this case, discrimination and reinforcement).

This analysis does, however, lead to testable predictions. For instance, if the crows were engaging in conditional discrimination, then it should be possible to differentially reinforce greater numbers of inspections in one condition over another by following HCA (or UCA) with food and withholding food during the other. We would expect the number of inspections to decrease on trials followed by food but not in its absence. Indeed, you would probably see an increase in novelty-seeking behaviour (extinction induced) in no-food conditions.

5. Parsimony

The word "parsimony" derives from a root meaning "to spare", or to be "stingy". This is a great way of remembering that, in any scientific analysis, it is incumbent on the scientist to pitch an explanation of the phenomena under study with as few terms as possible. Complex terms, such as "causal reasoning", require further definition, and thus lower level, simple, parsimonious explanations that adhere to Occam's razor should be sought at all times.

Essentially, it is non-parsimonious to propose that the difference between observed data obtained from a total of six trials from two conditions administered in a fixed order represents evidence "that crows can reason about a hidden causal agent." As I have suggested above, a more parsimonious account of the data is possible.

6. Experimental control: Why more trials are better than some.

In non-human research, it is not always possible to collect data in the natural environment because of the myriad of potential disrupting influences that exist there. Taylor and colleagues sought to minimise many of these influences by designing an outdoor aviary and a hide. Another way of conducting research like this might involve adapting an operant conditioning chamber, a type of apparatus widely used in behavioural, neuroscience and pharmacology research with pigeons, rats, and even crows. Using an operant chamber permits precise experimental control over the animal's history with the stimulus and response dimensions of interest, and the inclusion of automated recording means that the data are unlikely to be influenced by experimenter cuing effects or observer bias.

Operant chambers also allow a greater number of training and test trials to be presented. Why should that matter? Well, I think this represents the main theoretical difference between Taylor et al.'s approach and mine. The authors were not necessarily interested in the experience that might have led crows to show evidence of "causal reasoning" (as the absence of details concerning the nature and extent of "habituation" training testifies). Instead, they were concerned with what might happen when the crows were suddenly presented with the HCA and UCA test trials.

This approach, in which the role played by learning histories is either completely ignored or just downplayed, is indicative of the mechanism world view where, literally, science is a process of discovery. "Testing" is intended to reveal already-existing responses and capabilities of the participant that just required the actions of the scientist to uncover ("the truth is 'out there'"). Mechanistic viewpoints may be contrasted with contextualist or pragmatic perspectives, which emphasise that findings merely reflect the current and historical context, including the scientist's history and experience, and are always tentative and open to disconfirmation.

Behavioural psychology or behaviour analysis shares many features with contextualism and also emphasises the importance of steady-state or stable responding. Ensuring data paths are stable, and the effects not just transient, usually requires presenting multiple training and testing trials and employing a predetermined stability criterion. So, more trials could have been conducted using an adapted operant chamber, the training and testing details fully described, and greater emphasis placed on the steady state demonstration of complex behaviours deemed to illustrate 'causal reasoning'.
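As an illustration of what a predetermined stability criterion might look like, here is a minimal Python sketch. The window of five trials, the 10% tolerance and the numbers in the example are arbitrary assumptions of mine; real criteria vary from laboratory to laboratory.

```python
def is_stable(values, window=5, tolerance=0.10):
    """Return True if each of the last `window` values lies within
    `tolerance` (as a proportion) of the mean of those values."""
    if len(values) < window:
        return False  # too few trials to judge stability at all
    recent = values[-window:]
    mean = sum(recent) / window
    if mean == 0:
        return all(v == 0 for v in recent)
    return all(abs(v - mean) <= tolerance * mean for v in recent)


# Made-up inspection counts; NOT data from the study.
long_series = [9, 4, 6, 5, 5, 5, 5, 5]
three_trials = [3, 1, 2]

print(is_stable(long_series))   # True: the last five values are identical
print(is_stable(three_trials))  # False: fewer trials than the window
```

Under a criterion of this kind, three trials per condition could never demonstrate steady-state responding, simply because there are too few data points to apply the rule.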

Some ingenious examples of adapting operant chambers and incorporating automated presentation of trials may be found here: for instance, follow the link to this video of baboons performing a matching to sample task in a chamber housed within their living quarters, while Wright and Delius (1994) had pigeons "scratch and match" different coloured gravel as a means of training conditional discriminations (the pigeons learned by 11 trials and met criterion within 27 trials, whereas it often takes thousands of trials in a standard operant chamber). In 1996, I also attempted something similar with rats.

7. Further questions.

Hide inspections were only counted as such if the crows had first looked at the hole containing the food. This is a complex observation protocol requiring extensive coding and observer training, but what were the rates of other behaviours engaged in by the crows? How many times, for instance, did they inspect the hide without looking at the hole?

Abandoned probes were "defined as a crow inserting the tool into the hole and then leaving the testing area without extracting the food." Were trials in which a crow returned to the testing area included in the analysis? Or was the next trial initiated if/when the crows returned to the testing area? What was the intertrial interval and how was it recorded?

Final comments

Notwithstanding the absence of counterbalancing and the non-parsimonious basis of the explanation, the study's findings are based on a total of three trials per condition. For me, this amounts to a rather weak basis on which to attach claims about the evolution of comparative cognition and scientific and religious thought...

Hopefully, someone who reads this blog will undertake a systematic replication of the study.

UPDATE (13/12/2012): 
A commentary, written with Mark Haselgrove and Anthony McGregor, which summarised a few key points made in this blog has just been accepted for publication in Proceedings of the National Academy of Sciences of the USA.

More soon!



  1. Nice analysis. Can't help feeling that this procedure basically represents an overshadowing paradigm in which the HCA condition cue is a compound moving stick followed by exiting human. Being the latter element, one would assume that the exiting human would become the predominant discriminative stimulus. So in the UCA condition when the discriminative stimulus is absent, we could assume there would be a significant lack of stimulus control over the operant - hence more abandoned probes. Perhaps a good control condition for this would be a first condition in which stick movement is paired with another element that could not obviously be a 'stick moving' agent, and then in the second part only the stick movement occurs again. If crows have the ability to discern between 'causal agents' and non-causal agents then they should show significantly fewer abandoned probes on the first trial in the second stage than crows that had a compound stick movement + human in the first stage.

  2. Great comment, Graham. It shows that a simpler level of explanation is possible when describing the findings. I think an account based on overshadowing and generalization has a lot of merit, tested against the novel control condition you describe. So, who's up for a field trip to New Caledonia then?!

  3. Clever Hands?

    Further to the counterbalancing issue, it is worth noting that it seems unlikely that the experimenter tasked with making the stick move was blind to the experimental condition. The authors state:

    "In both the HCA and UCA conditions, the stick moved in the same way because it was actually moved by an experimenter pulling on a string that could be pulled either from within the hide or from outside the testing room".

    Thus it would be pretty obvious to the experimenter which condition was being run. Of course, the experimenter may be convinced that he or she is pulling the string in exactly the same way in the two conditions, but let us not forget that Clever Hans's trainer was convinced that his horse had arithmetic abilities until it was pointed out to him that this was not the case (at which point the trainer became depressed and soon died; Pfungst, 1965).

  4. Thanks for your comment, Mark. You make an interesting point.

    It remains possible that subtle, inadvertent experimenter cuing effects could have resulted in the two conditions, which, you will recall, were administered in a fixed order, being readily discriminated from one another by the crows.

    I'm not familiar with what the literature on the discrimination abilities of New Caledonian crows says, but I suspect they are rather good at identifying subtle environmental changes that signal the availability of food.

    A double blind procedure, in which both the stick-mover and observer were blind as to what condition was in effect, and with conditions counterbalanced, remains the only parsimonious way of concluding that crows can reason about hidden causal agents.

  5. I had a tutorial on this paper last week. The other issue that we came up with related to the dependent variables that they used: number of inspections of the hide, and number of crows that "stopped probing and left the table at least once". Why these measures rather than others? Why not number of crows successfully getting the food, or latency to retrieve the food? Or any one of a hundred other measures that you could derive from video of the crows' behaviour. A cynic might wonder if the reason these measures were used was because they "worked". You'd certainly want to see it replicated using the same measures (and all the other fixes mentioned above).

  6. Thanks, Mike. I, too, wondered about the dependent measures chosen, which seemed rather arbitrary. For instance, were two trials scored if a crow stopped probing for food, left the table, and then immediately returned? Or did that constitute just one trial?

    Success in accessing food would be a great measure, as would examining effects of non-reinforcement (i.e., would "caution" extinguish more quickly in the HCA condition compared to the UCA condition?). In short, this study needs to be systematically replicated.

  7. Great analysis!

    Watching the video, I must say that I was most annoyed by the claim that "sticks don't move on their own". Apparently this researcher needs to spend more time in the woods. There is a tree full of branches moving dramatically in the wind right outside my window. Why don't cognitive types get more exposure to the actual ecology of their organisms?

  8. Thanks for the comment, Eric!

    I, too, can see trees from my window with branches that appear to be moving of their own accord ...

  9. Whilst reading this again (for a tutorial) I was struck by the question of why on earth a wild crow that lives on a remote Pacific island would enter this experiment with the knowledge that humans can cause sticks to move.

  10. And of course, the whole effect could be just due to forgetting. As you point out very clearly in your description of the experiment from the point of view of the crow, in the HCA condition the crow sees 2 humans before AND after the stick moves, whereas in the UCA condition the crow sees only one human before and after the stick moves. The contribution of both proactive and retroactive interference is therefore greater in the HCA condition than the UCA condition. Thus the memory of the scary stick would be predicted to be less in the HCA condition.