One of the key ways in which psychologists study human behaviour is through experiments. Experiments have the advantage that the researcher can control extraneous variables to make sure that they do not influence the behaviour of the animal or person in the study.  In this way, a stronger cause and effect relationship can be established since it is assumed that it is the manipulation of the independent variable that leads to the change in the dependent variable.

But is there a cost to this level of control in a laboratory? One of the ways that we can evaluate a study is by examining its level of ecological validity.

Orne (1970) defined ecological validity as "the extent to which the setting in which research takes place is capable of producing results that are valid." Since 1974, the term ecological validity has come to be used by some authors to refer to the degree to which results obtained in the psychological laboratory "predict" circumstances outside the laboratory.

This brings us to the question: how can we determine a study's level of ecological validity? A study has high ecological validity when there is:

  • High internal validity - that is, we can establish that the manipulation of the IV does, in fact, lead to changes in the DV and that any changes observed are not the result of extraneous (confounding) variables.
  • High external validity - that is, when the results obtained in the study can be applied to situations outside of the laboratory. Notice, this is not the same as "generalizability" to a larger sample; the focus is on similar situations outside of the experimental setting.

When considering the level of ecological validity, researchers must consider the artificiality of the experimental setting. A study high in artificiality has low mundane realism. For example, memorization of a random list of words is low in mundane realism, but retelling a story that you have heard is relatively high in mundane realism. Ecological validity is not the same as mundane realism. Sometimes a study with low mundane realism still accurately predicts what will happen outside of the lab, although a study with high mundane realism is more likely to do so.

Some experiments done under highly controlled conditions have high ecological validity; some naturalistic studies have low ecological validity. We cannot assume that any study has or doesn’t have ecological validity. We must find evidence to support this. That is where critical thinking plays a key role in discussing research.

How does research stack up?

According to Anderson, Lindsay & Bushman (1999), “the psychological laboratory is doing quite well in terms of external validity; it has been discovering truth, not triviality” (p. 8).

Mitchell (2012) wanted to test Anderson’s claims. He looked at 217 comparisons of lab and field experiments from 82 meta-analyses in such areas as industrial, social and developmental psychology. He found a high degree of correspondence between findings observed in the lab and those found in the field, with an overall correlation of r = .73. However, there were major variations across the different areas of psychology.

Correlation of Lab-Field Effects by Research Classification

Research classification                                  Correlation (r)
Social psychology
Industrial psychology
Clinical psychology
Consumer (personality)
                                                         -.82
Psychometrics (personality and intelligence research)

It is also important to note that in 30 of the 217 comparisons, the results in the field were the opposite of those in the lab. 21 of those 30 comparisons were in social psychology.
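To see what these figures mean in practice, here is a minimal sketch of how a lab-field correlation and a count of reversed results could be computed. The paired effect sizes below are invented for illustration and are not Mitchell's data; the function names are also hypothetical. Only the counts 30, 217 and 21 come from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def count_sign_reversals(lab, field):
    """Count pairs where the lab and field effects point in opposite directions."""
    return sum(1 for l, f in zip(lab, field) if l * f < 0)


# Hypothetical paired effect sizes (illustration only, NOT Mitchell's data).
lab_effects = [0.40, 0.25, 0.60, 0.10, 0.35, 0.50]
field_effects = [0.35, 0.30, 0.45, -0.05, 0.20, 0.55]

r = pearson_r(lab_effects, field_effects)
reversals = count_sign_reversals(lab_effects, field_effects)

# Proportions based on the counts reported in the text.
share_reversed = 30 / 217   # share of all comparisons that reversed
share_social = 21 / 30      # share of the reversals found in social psychology

print(f"lab-field correlation (toy data): {r:.2f}")
print(f"sign reversals (toy data): {reversals}")
print(f"reversed comparisons: {share_reversed:.1%}")
print(f"reversals in social psychology: {share_social:.0%}")
```

Notice that a high overall correlation and a noticeable number of reversals can exist side by side: the correlation summarizes the general trend, while the reversals show where that trend breaks down.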

Mitchell argues that to determine ecological validity we need to look at the effect size, that is, the difference between the means of the control and treatment groups expressed in standard deviation units, together with the sample size. If the difference is small and the sample is small, then the study has low ecological validity. If the sample is large, then the standard error will be smaller; therefore, the study is more likely to predict what will happen “in the real world.”

Thus, a large sample size, or several replications of the same experiment that increase its reliability, is important in determining the ecological validity of a study.
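The two quantities in this argument can be sketched as follows. This is an illustration under stated assumptions, not Mitchell's procedure: the effect size is taken to be Cohen's d (the mean difference divided by the pooled standard deviation), the standard error formula assumes two independent groups of equal size with the same standard deviation, and the scores are made up.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def standard_error_of_difference(sd, n_per_group):
    """Standard error of the difference between two group means.

    Assumes two independent groups with the same standard deviation `sd`
    and the same number of participants `n_per_group`.
    """
    return sd * sqrt(2 / n_per_group)


# Hypothetical scores (illustration only).
treatment_scores = [12, 14, 15, 13, 16]
control_scores = [10, 12, 11, 13, 9]

d = cohens_d(treatment_scores, control_scores)
print(f"Cohen's d for the toy data: {d:.2f}")

# The same spread of scores gives a smaller standard error as the sample grows.
for n in (10, 50, 1000):
    se = standard_error_of_difference(1.0, n)
    print(f"n per group = {n:4d}: standard error = {se:.3f}")
```

The effect size itself does not depend on how many participants there are, but the standard error does. That is why a large sample, or several replications pooled together, gives a more trustworthy estimate of the effect.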

Ecological validity in research

Many students believe that they are demonstrating critical thinking by writing that “because this is a laboratory experiment, it lacks ecological validity.” However, this is not always correct. In fact, many of the classic studies that we do in IB psychology are done in labs and have high ecological validity. 

If you think about it, if the research we study has no ecological validity, why should we offer an IB psychology course to study it?

Let's take, for example, the classic “line test” by Asch (1951). Here we can see just how tricky the question of ecological validity can be – and why a discussion of it requires critical thinking.

The first question we have to ask is: Is the study highly artificial? At first glance, it is easy to say “yes.” It is not normal to be asked to judge the lengths of lines in front of a group of strangers. But the participant knew that this was a psychological experiment. The study is representative of a psychological experiment.

In addition, Asch's study has been replicated in many situations and has high reliability. Every replication adds to the sample size and thus there is a greater chance of ecological validity. Finally, in field experiments in which individuals sit in a room where smoke is coming in under the door, the naive participant tends not to respond when no one in a group of confederates responds. It appears that these results can be generalized to other settings.

Another assumption students make is that a field experiment automatically has high ecological validity. Once again, we have to be careful. Let's take, for example, Piliavin's classic study of bystander behaviour on the New York City subway, in which he wanted to study what would happen if a well-dressed man with a crutch fell down vs. a man who was holding a bottle of alcohol.

The study definitely has high mundane realism. The event could, in fact, happen. However, a problem with field experiments is that they are more difficult to control than laboratory experiments and therefore also difficult to replicate. This limits the overall sample size. In Piliavin's study, most of the time people helped. However, the participants were also trapped inside a moving metro (subway) car. This finding does not match many studies of helping behaviour in which people do not help someone who is in need. Although the study may be "naturalistic," it does not automatically have a high level of ecological validity.

The concept of ecological validity is tricky. In many cases, we cannot definitively say that a study is "high" or "low" in ecological validity. You have to make a clear argument for your claim. Doing so demonstrates an understanding of the terminology as well as sound critical thinking.

ATL: Critical thinking

Take a look at the following list of classic studies that are often discussed in IB psychology. For each of the studies, decide whether you think that the study has high or low ecological validity. Remember, the responses that I have given are simply examples of how you could make an argument. There is not really a perfect answer to this question - it all depends on your justification. In order to justify your response, consider the following strategies:

  • Discuss the internal validity of the study.
  • Discuss how the testing environment itself may have affected the results of the study.
  • Discuss the mundane realism of the study.
  • Discuss the extent to which the study predicts behaviour in the world outside the laboratory.
  • Discuss the reliability of the study with regard to other settings.

1. Loftus & Palmer's study on reconstructive memory in a car crash.

This study has high reliability in that when it is replicated, we often get the same results. However, a study conducted by Yuille & Cutshall (1986) contradicts Loftus and Palmer's findings. They found that misleading information did not alter the memory of people who had witnessed a real armed robbery. In addition, there is the problem that the study lacks mundane realism - that is, the situation in the lab does not represent the actual experience in the real world. When we witness a car accident in real life, there would be an element of surprise and an emotional response, depending on the level of damage or harm. This study would be relatively low in ecological validity.

2. Milgram's study of obedience.

Coolican & Flanagan have argued that the Milgram experiment is relatively high in ecological validity. They argue that although the situation was artificial, the ecological validity has been shown through various replications of the study in a variety of settings (Coolican & Flanagan p 25). In addition, there is real life support from interviews carried out on former German soldiers from the Second World War (Dicks 1972).

3. Maguire et al.'s taxi cab driver study on plasticity

MRIs or any other scanning technique can introduce confounding variables into psychological research. For example, if you are looking at a person's level of fear when exposed to certain stimuli, this could be confounded by the fact that the participant is feeling rather claustrophobic inside the tunnel of the fMRI. Also, an fMRI is an incredibly artificial environment - and because the participant cannot move, the tasks designed to see how the brain responds are rather artificial and overly simplistic.

So, does this apply to Maguire's study? No. Neither of the problems described above is relevant to Maguire's study. The study was simply trying to draw a correlation between length of time as a taxi driver and the volume of the hippocampus. Demand characteristics or the artificiality of the environment do not affect the volume of the hippocampus. The study itself has relatively high ecological validity.

4. Bandura's Bashing Bobo study

This study may have higher ecological validity than you might think. Some research findings show that children who have been exposed to domestic violence are more likely to be abusive as adults than non-exposed children (Wallace, 2002), experience more dating violence as adults (Maker et al., 1998), or express views that justify the use of violence (Lichter and McClosky, 2004). However, other research findings question this connection. Wolf and Foshee (2003), in their study on dating violence among adolescents, found no association between growing up with domestic violence and dating violence among boys.

There is also the question of the mundane realism of the situation - that is, the aggression was against a Bobo doll, which would have been unfamiliar to the children and was not another human being. Because of the ethical considerations of the study, it is also difficult to replicate the study to determine its reliability across settings. You could make the argument that the study has moderate ecological validity because of its application to the real world, in spite of some of the methodological concerns.

5. The St Helena study on television and aggression

This was a natural experiment because the researcher did not manipulate the IV; the IV changed naturally with the introduction of television to the island. There was little chance of demand characteristics as the children were being videotaped for a significant period of time both before and after the introduction of television to the island. There was also a triangulation of data, making the results internally reliable. However, there is very little control over the study. One cannot know exactly how much television each child was exposed to, the levels of aggression on the television shows that were watched and to what extent the family played a role in interpreting what was being watched. The situation has high mundane realism. It is also, however, impossible to replicate, so it is difficult to know if the results would really predict behaviour in other situations. This is a study with relatively high ecological validity, but the case could be made that there are concerns about ecological validity in the study.

You may be noticing a trend here. The question is not whether there is ecological validity or not, but the extent to which ecological validity can be supported or challenged.

6. The HM case study

Milner's study is a case study. This means that triangulation of methods and data was applied in the study of one individual. Data were gathered through psychometric tests, interviews and brain imaging. Because of the nature of case studies, they are difficult to replicate. However, this study has high ecological validity for several reasons. First, the study has high internal validity. In spite of the fact that there were few controls in the research, the study demonstrates rich and consistent data. It is true that we are not sure to what extent HM's behaviour may have been caused by the seizures, the medication or the surgery, but we do see that the key issue is related to the role of the hippocampus. Although the study cannot be replicated, there are several studies of people with hippocampal impairment as a result of many different factors, and these studies demonstrate consistent results.
