Psychodiagnostic Testing



The main purpose of psychodiagnostic testing is to quantify differences and similarities among individuals, or between the performances of the same individual at different times. As might be expected, most psychodiagnostic tests are conducted by schools, industry, and the armed forces. However, psychodiagnostic testing was first developed by psychologists, and it remains to this day a tool for all mental health professionals.


When studying psychodiagnostic testing, there are four topics we must consider. First, we must understand the underlying theories behind the design of the test. Second, we must know what behavior we are testing. Third, we must be knowledgeable about the tests that are available, and about sources of information on these tests. And fourth, we must be aware of the social and ethical issues raised by psychodiagnostic testing.


It was in the nineteenth century that interest in psychological testing first arose, due to the increasing concern with humane treatment for the mentally retarded and the mentally ill. With this new thinking came the need to develop uniform methods for evaluating and classifying different types of disorders. The two primary categories to be classified at that time were mental retardation and insanity. The difference between these two groups was thought to be that the mentally retarded were brought into life as such, physiologically impaired at birth or infancy, while the insane were thought to have deteriorated from a normal condition.


The first psychodiagnostic tests dealt with simple measurements of various types of behavior. The first recorded work in this area was done by French physician Esquirol (1838). He studied the varying levels of retardation and focused on their classification. Esquirol used language as his tool in the standardization of categories. According to Esquirol, language usage was consistently accurate in indicating intellectual competence. Today, verbal capacity is still integral in tests of intellectual evaluation.

Seguin, another French doctor, doubted the prevailing notion of his time (1866) that the mentally retarded were incurable. From this conviction arose Seguin’s sense-training and muscle-training techniques, which use exercise to develop sensory discrimination and motor control. Some of these techniques are currently used in non-verbal intelligence tests.

In 1904, French psychologist Alfred Binet began his work with school children, building on the intelligence tests developed in the United States by Chaille in 1887. Binet urged the Ministry of Public Instruction to examine children who did not do well in normal schooling before expelling them. Binet’s goal was to establish a ministerial commission to ensure that all possible considerations were taken to improve the condition of the intellectually handicapped.


Many thousands of tests have been developed, but there are three basic methods of gathering information from the subject. The objective method involves watching the subject and recording the information. The self-report method involves asking the subject about himself. The projective method involves interpreting the meanings of the subject’s response to a relatively unstructured task.

Types of Tests

Ability test

Refers to the individual’s cumulative learning (as opposed to the achievement test, which tests relatively recent learning). Two examples of ability tests are the Henmon-Nelson Tests of Mental Ability and the Kuhlmann-Anderson Measure of Academic Potential.

Achievement test

Refers to tests measuring an individual’s accomplishments in one or more areas. There are general achievement tests such as the California Achievement Test, and more specialized tests, such as the Content Evaluation Series.

Aptitude test

Refers to tests measuring relatively homogeneous and clearly defined segments of ability. Special aptitude tests typically measure a single aptitude. Multiple aptitude batteries measure a number of aptitudes but provide a profile of scores, one for each aptitude. These are often used in vocational and academic counseling. One example of an aptitude test is the Strong-Campbell Interest Inventory.

Intelligence test

Refers to more heterogeneous tests yielding a single global score such as an IQ. The Stanford-Binet Intelligence Scale is undoubtedly the most widely known of all the psychological tests.

Neuropsychological test

Refers to tests that either detect cognitive dysfunction or assess neuropsychological impairment. These include the Benton Visual Retention Test and the Bender Visual Motor Gestalt Test (when used as an objective test). The Bender-Gestalt test can also be used as a projective test as a nonverbal measure of personality.

Personality test

Refers to measures of such characteristics as emotional adjustment, interpersonal relations, motivation, interests, and attitudes, as distinguished from abilities. One example of a self-report personality test is the California Psychological Inventory. An example of a projective personality test is the word association test.

Specific Tests

Although MFCCs are able to administer all tests, I will be focusing on intelligence and personality tests for the purposes of this paper.

Intelligence Tests

Stanford-Binet Intelligence Scale

In 1916 Terman, working at Stanford University, translated and revised the 1911 version of the Binet-Simon scale to make it more appropriate for American children. Tests are divided by age group, and scoring is based on the pass-fail system. The individual’s “basal age” is determined by finding the testing level at which all items are answered correctly. The “ceiling age” is determined by finding the testing level at which all items are failed. Calculations are then performed to determine the “mental age” of the individual.
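The calculation described above can be sketched roughly as follows. This is only an illustration: the layout of the levels, the credit of two months per passed item, and the function name are my assumptions, not the published scoring tables.

```python
def mental_age_months(levels, months_per_item=2):
    """Estimate mental age in months from pass/fail results.

    levels: list of (age_in_years, [True/False per item]), in ascending age order.
    months_per_item: assumed credit for each item passed above the basal level.
    """
    # Basal age: the highest level at which every item is passed.
    basal = max(
        (i for i, (_, items) in enumerate(levels) if all(items)),
        default=None,
    )
    if basal is None:
        raise ValueError("no basal level found; testing must start lower")

    total = levels[basal][0] * 12  # full credit up to the basal age
    for _, items in levels[basal + 1:]:
        if not any(items):
            break  # ceiling age: every item at this level is failed
        total += months_per_item * sum(items)
    return total


# Hypothetical record: basal at year 6, ceiling at year 9.
record = [
    (6, [True] * 6),
    (7, [True, True, True, True, False, False]),
    (8, [True, False, False, False, False, False]),
    (9, [False] * 6),
]
months = mental_age_months(record)
print(divmod(months, 12))  # (years, months)
```

In this made-up record the subject earns 72 months for the basal age, plus credit for the five items passed between the basal and ceiling levels.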

The maximum mental age is 22 years 10 months, although in actual use the Stanford-Binet is not accurate for children over the age of 13 because at that point the mental age begins to lag behind the chronological age. Also, because of the low ceiling age of the test, the Stanford-Binet is not accurate for normal or superior adults, or even very superior children.

Over the years, the Stanford-Binet Scale has continued to be revised. In 1960, for instance, the old ratio IQ was replaced with the deviation IQ, to give each age group the same standard deviation. The most recent version of the Stanford-Binet is the 1974 restandardization. With these revisions, IQ scores derived from the test have changed somewhat. For researchers interested in archival research on IQ test scores, tables have been developed to compare IQ scores obtained between 1937 and 1960 with those from 1960 and later. Although its primary use is as an intelligence test, some clinicians use it as part of the clinical interview. As with the Binet-Simon scale, the Stanford-Binet requires a highly trained examiner.
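The difference between the ratio IQ and the deviation IQ mentioned above can be illustrated with a short sketch. The ratio IQ is the familiar 100 x (mental age / chronological age). The deviation IQ is a standard score computed within the subject's own age group. The function names, the sample scores, and the default standard deviation of 16 are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return 100 * mental_age_months / chronological_age_months


def deviation_iq(raw_score, age_group_scores, mean_iq=100, sd_iq=16):
    """Deviation IQ: a standard score relative to the subject's age group.

    Every age group is rescaled to the same mean and standard deviation,
    which is why deviation IQs are comparable across ages.
    """
    mu = mean(age_group_scores)
    sigma = pstdev(age_group_scores)
    return mean_iq + sd_iq * (raw_score - mu) / sigma


# Hypothetical values for illustration only.
print(ratio_iq(96, 96))                    # mental age equals chronological age
print(deviation_iq(50, [40, 50, 60]))      # raw score at the age-group mean
```

A score exactly at the age-group mean yields a deviation IQ of 100 regardless of how widely raw scores are spread in that age group, which is not guaranteed for the ratio IQ.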

Wechsler Adult Intelligence Scale

Descended from the Wechsler-Bellevue test of 1939, the WAIS first appeared in 1955. The latest version, the WAIS-R (revised), appeared in 1981. The impetus behind the development of the WAIS was the need for an accurate test instrument for adults. Further differences between the WAIS and the Stanford-Binet are (1) the WAIS uses a point scale instead of the mental age measurement, and (2) items on the WAIS are arranged in subtests by item type. There are six verbal and five performance subtests.

Compared to the Stanford-Binet, the WAIS has less floor and ceiling, which means that measurement of extremes in intelligence is inaccurate. Further comparison to the Stanford-Binet shows that bright, normal, or younger subjects score higher on the Stanford-Binet; older, less intelligent, or retarded subjects score higher on the Wechsler. Although norms have been established for populations over 60 years old, sampling was done by cross-section rather than longitudinally (see below).

Wechsler Intelligence Scale for Children

The WISC was derived in 1949 from the Wechsler-Bellevue, the predecessor of the WAIS. The current version, the WISC-R, was released in 1974. While the WISC is more suited to children, it remains principally an extension of the adult scale. And, as the adult scale was intended to measure adult populations more accurately than upward extensions of the Stanford-Binet, some have questioned the validity of a downward extension of it for children.

There are five verbal and five performance scales, with one supplement on the verbal side and one alternate on the performance side bringing the total number of scales to 12. In spite of differences in test construction, the results from the WISC-R are very similar to those gained via the Stanford-Binet.

Wechsler Preschool and Primary Scale of Intelligence

The WPPSI (1967) was the last of the three tests introduced by Wechsler, designed to cover ages 4 through 6½. Eight of the subtests are downward extensions of the WISC, to which three additional subtests were added. Only ten of the eleven subtests are used in measuring intelligence. The last is used as a supplementary test on the verbal side. Each scale (verbal and performance) has five subtests. As with the WAIS and WISC, the WPPSI shows good correlation on retests and above-average validity. The scores also correspond well with those attained using the Stanford-Binet.

Goodenough-Harris Drawing Test

This test is the 1963 version of the 1926 Goodenough Draw-A-Man test. Although originally designed to test special populations (including those of other cultures), Goodenough and Harris have gone on record as saying that there is probably no such thing as a culture-fair test. This is supported by studies that show performance correlations with socioeconomic status. However, retest results and scorer reliability are both high, and the effect of the examiner on the test is low, which gives the Drawing Test high marks for repeatability.

Self-Report Personality Tests

Minnesota Multiphasic Personality Inventory

In 1939 the MMPI was developed in response to the need for a comprehensive test for the assessment of psychiatric patients. The MMPI is a pencil-and-paper questionnaire that demands considerable psychological sophistication from the examiner. The test consists of 504 personality descriptions to which subjects respond with “true,” “false,” or “cannot say.” The results are divided into ten personality scales and three validating scores. The expected outcome of the MMPI was that subjects would score highly on the scale appropriate to their dysfunction (hypochondriasis, depression, hysteria, paranoia, schizophrenia, mania, etc.). However, this result was not found. Mixed groups of subjects tended to score highly on several scales, and as it turned out, several of the scales intercorrelate, making differential analysis impossible. One result of this is that the original scale names have been replaced by numerals. Much research continues on possible uses of the MMPI, exploring everything from vocational problems and juvenile delinquency to brain damage.

California Psychological Inventory

The CPI is a direct outgrowth of the MMPI, from which nearly half of the 480 items were taken. It differs from the MMPI in that the subjects are instructed to answer either “true” or “false”; “cannot say” is not a valid response. The CPI reveals 15 personality traits and has three validity scales. The construct validity of the CPI appears to be very good, as test results can be used to predict certain types of behavior, a goal that has eluded the MMPI.

Edwards Personal Preference Schedule

The EPPS was one of the first attempts to incorporate Murray’s manifest need system into a psychodiagnostic test. The EPPS employs the forced-choice system, which means that the subject must choose one of two items presented to him. How the forced-choice system differs from other either-or systems is that the two items are not related. There are 210 pairs of statements (one positive and one negative), each half of which is paired with the other half of another question (for example, the positive half of statement 5 would be presented with the negative half of statement 7). Because the subject is in essence presented with each statement twice, it is easy to calculate the consistency of the subject’s answers. Another major difference with the EPPS is that it produces ipsative scores. This means that the score relates only to itself, making it exceedingly difficult to compare the EPPS scores of various subjects.
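A small sketch may make the idea of ipsative scoring and answer consistency more concrete. It assumes, for illustration only, that each forced choice is recorded as the name of the need whose statement was picked. The need labels, the sample answers, and the function names are hypothetical and do not reproduce the actual EPPS scoring key.

```python
from collections import Counter


def ipsative_scores(choices, needs):
    """Count how often each need was chosen across all forced-choice pairs."""
    counts = Counter(choices)
    return {need: counts.get(need, 0) for need in needs}


def consistency(first_pass, second_pass):
    """Proportion of repeated pairs answered the same way both times."""
    same = sum(a == b for a, b in zip(first_pass, second_pass))
    return same / len(first_pass)


needs = ["achievement", "order", "affiliation"]

# Two hypothetical subjects answering the same four pairs.
subject_a = ["achievement", "order", "achievement", "affiliation"]
subject_b = ["affiliation", "affiliation", "order", "affiliation"]

scores_a = ipsative_scores(subject_a, needs)
scores_b = ipsative_scores(subject_b, needs)

# Both totals equal the number of pairs, whatever the subjects chose.
print(scores_a, sum(scores_a.values()))
print(scores_b, sum(scores_b.values()))
```

Because every subject's scores add up to the same total, a high score on one need can only be obtained at the expense of another. The scores therefore rank needs within one person rather than measuring how strong a need is compared with other people.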

Personality Research Form

The PRF features technical advances that were made possible by the power of the computer to perform complex statistical computations. There are two versions of this personality test, one short and one long. Both versions have matching parallel forms. The shorter version can discriminate 14 personality traits, while the longer version has 20. Both forms have an Infrequency Scale to test result validity, and the longer form adds a Desirability Scale to detect bias.

Projective Personality Tests

Projective tests set forth a relatively unstructured task for the subject. It is thought that the subject’s responses to these tasks (underlying hypotheses, perception, and interpretation) will reveal his personality. According to Anastasi, “(p)rojective techniques present a curious discrepancy between research and practice. When evaluated as psychometric instruments, the large majority make a poor showing.” (1976) Yet Rapaport, Gill, and Schafer (1968) and others remain enthusiastic about projective techniques. I tend to agree with Anastasi that “(p)rojective tests are not really tests at all, but rather tools for the clinician” (1981).

The Rorschach

This is no doubt the most famous of the visual projective personality tests. In it, the subject is shown ten cards (on which there is a design that is symmetrical around the vertical axis) and asked to give as many responses as he can. Five of the cards are gray, black, and white, two add spots of bright red, and the remaining three are pastel shades. These ten cards were chosen from among a large number of cards by Rorschach, based on his experience using the cards with different psychiatric groups. Elements that are looked for in the response are location (what part of the inkblot is responded to), determinants (form, color, shading, and “movement”), content (human and animal figures, plants, maps, clouds, etc.), and popularity (frequency of similar responses from other people). Although the Rorschach test enjoyed great popularity when first released in 1921, subsequent research has pointed up many problems with validity and repeatability, work by Exner (1974) notwithstanding. At present, the ultimate fate of the Rorschach test is undecided, although it continues to be useful in a decidedly contrary fashion: it excels in revealing aspects of personality and motivation that do not fit into neat categories.

The Holtzman Inkblot Technique

This test is the result of an attempt to eliminate all the problems found in the Rorschach test. Such extensive changes had to be made, however, that the test was named after one of its designers. The cards themselves show both monochrome and color designs, with most of the designs exhibiting symmetry. There are two sets of cards, each set with 45 designs. These two sets are parallel in nature, allowing the examiner to retest the subject using different cards. Correlation between scores for the two sets of cards is high. The designs on the cards were chosen for their ability to discriminate between normal and abnormal subjects, and responses are scored on 22 variables. Unlike the Rorschach test, the subject is allowed only one response to each card, although, as in the Rorschach, the length of the response is unregulated.

Thematic Apperception Test

In the TAT, the subject interprets the meaning of drawings that contain the faces of two people. This is in contrast to the Rorschach and Holtzman tests, which use non-specific shapes. The subject is asked to tell the examiner what is going on in the drawing, what the characters are thinking, and what the outcome of the situation will be. There are different sets of drawings for different populations, but each set contains 20 cards (nineteen drawings and one blank card). As the cards are meant to be presented in two sessions of ten cards each, the more unusual of the two sets is presented in the second session. For the blank card, the subject is to make up a story.

Although there is a fair amount of research into typical (and atypical) responses to the cards, the TAT is far from a standard test instrument. Many examiners do not show all 20 cards, or show cards not included in the standard set. As a result, the TAT must be presented by a highly skilled examiner. On top of this, results from the TAT often do not correlate with results from other types of tests.

Bender Visual Motor Gestalt Test

This is a drawing test that involves duplicating each of nine pictures while viewing the picture itself. The pictures used in the current version of the Bender-Gestalt test are a subset of a much larger group of pictures developed by Wertheimer (who was instrumental in the development of the Gestalt movement). These designs were chosen by Bender to illustrate certain Gestalt principles. However, the designs are not well drawn, and the test procedure does not follow Gestalt principles. Nevertheless, because of the work of Pascal, Suttell, and Koppitz, it is now possible to evaluate Bender-Gestalt responses on a more objective basis, as opposed to the purely intuitive basis used by the originators of the test.

Word Association

These tests are among the oldest of the projective techniques. In the most common word association tests, the subject is presented with individual words and asked to respond with the first thought that comes to mind. The stimulus words can either have psychoanalytical meaning, or they can be chosen because the examiner knows the normal response, and can thus judge abnormal responses. Another form of word association is used in the Rotter Incomplete Sentences Blank, in which the subject is presented with the first few words of a sentence and asked to supply the remainder.

Rosenzweig Picture-Frustration Study

The subject is presented with a series of cartoon drawings in which there is an aggressor and another person. The aggressor is shown telling something to the other person. The subject writes in what he thinks the other’s response to this aggression will be. There are two versions of this test, one for children 4 through 13 years of age and one for adults (14 years and older).

Individual and Group Testing

Early tests were conducted with one examiner and one subject. These early tests often demanded that the examiner be highly skilled in the testing process, as when making observations or evaluations of the subject or the subject’s behavior. With the onset of World War I, it became necessary for the Army to be able to test the intellectual level of large groups of people in a very short period of time. Thus were born the group tests. A by-product of this was that examiners no longer needed to be highly skilled, as multiple choice and objective type tests were designed to be quickly and easily scored. The usefulness of the group test survived its wartime application, and there are now many tests available for the group setting. The fact that tests can be given simultaneously to large groups of people has meant that it is easy to develop a huge body of standard results.

Group testing is a double-edged sword, however. The reduced amount of examiner involvement means that there is less opportunity for the examiner to notice when the subject is tired, worried, scared, or in any other condition that would affect the test results. In addition, everyone in the group is tested on the same material, as opposed to tests such as the Stanford-Binet, in which test items are selected according to the subject’s responses to previous items. However, this last disadvantage can be overcome with the use of pyramidal and multilevel testing methods.

Testing change over time

In many instances, psychologists need to measure a population over a period of time, such as from youth to old age. There are two methods of making this measurement: longitudinal and cross-sectional. Longitudinal testing measures the same sample at different times, in essence following that sample as it ages. Cross-sectional testing measures different samples, one for each required age bracket. Unfortunately, cross-sectional results can be very difficult to interpret because of the large number of variables that must be considered. In the WAIS standardization sample, which was gained through cross-sectional testing, scores tended to drop at the upper age levels. Later researchers discovered that this was due in part to the fact that past generations had received less education than the current generation. A longitudinal study has since shown that declining scores in older people are due more to physical well-being than to age.
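The education effect described above can be illustrated with a toy model. Everything in it is invented for the purpose of the example: the scoring formula, the schooling figures, and the function names are hypothetical and are not taken from the WAIS data. The model deliberately contains no effect of age at all, yet the cross-sectional comparison still shows an apparent decline.

```python
def years_of_education(birth_year):
    # Hypothetical: each later decade of birth adds one year of schooling.
    return 8 + (birth_year - 1900) / 10


def score(birth_year, age):
    # Hypothetical model: the score depends on education only, never on age.
    return 80 + 2 * years_of_education(birth_year)


def cross_sectional(test_year, ages):
    """Test different people of different ages in a single year."""
    return {age: score(test_year - age, age) for age in ages}


def longitudinal(birth_year, ages):
    """Follow one birth cohort and test it at several ages."""
    return {age: score(birth_year, age) for age in ages}


ages = [20, 40, 60]
print(cross_sectional(1980, ages))  # older groups score lower
print(longitudinal(1920, ages))     # the same cohort stays level
```

In the cross-sectional result the older groups score lower only because they were born earlier and received less schooling. Following a single cohort removes that difference, which is why the two designs can lead to different conclusions about aging.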


As Rapaport (1968) points out, “No single test proves to yield a diagnosis in all cases, or to be in all cases correct in the diagnosis it indicates.” This in no way invalidates the concept of psychodiagnostic testing, but it does remind us that for accurate diagnosis, we must not depend on only one source of information … not even the diagnostic evaluation.

Yes, it is true that some of the tests discussed in this paper are flawed. Yes, it is true that some rely on the subjective opinion of the examiner. And yes, it is true that even the experts disagree on the underlying theories, methodology, and ultimate meaning of these tests.

I believe that these are not arguments for the elimination of psychodiagnostic tests, but rather realistic appraisals of areas in which we need to improve. Flawed tests can be fixed, or can serve as examples of what not to do in future tests. Some tests rely on the opinion of the clinician, but the clinician is an indispensable component of the therapeutic process, whether or not tests are used in diagnosis. Finally, debate concerning psychometrics has led to better and better tests. It would be foolish to cut off debate at the expense of the positive changes it encourages.

At the very least, we are no worse off with the tests than without them. At best, we have at our disposal some powerful tools. And as our knowledge of ourselves evolves, so will our tests. Until the day comes when we have developed foolproof tests for every conceivable situation, we must heed Rapaport’s warning, and proceed with the utmost professionalism with the materials and abilities we have at hand. The bottom line is that we must serve the client, and psychodiagnostic testing brings to the client the benefit of the experience of many other researchers and clinicians in addition to our own experience.

Professional Insights

I feel there are three important areas in which psychodiagnostic testing is important for the MFCC:

  1. Diagnosing the client
  2. Clarifying clinical questions as you are doing treatment, such as the severity of a client’s depression or the severity of a child’s emotional disorder
  3. Validating or invalidating the clinical assumptions about your client.

It is most common to perform diagnostic evaluations through case studies or interviews, but I feel these have limitations because of their subjective nature (including the therapist’s bias). In my opinion, testing is an attempt to objectify our more subjective clinical assumptions. For this reason, I feel we should exercise caution when using tests such as the Rorschach or the TAT, as they are no less subjective than an interview. While they can be helpful, they show only one view of a diagnosis.

I feel much more comfortable with tests that are more objective in nature. The CPI, for example, is very comprehensive in that it has a wide range of psychiatric questions that give the potential of a broader evaluation of the client. Also, well-constructed objective tests have built-in methods of determining the validity of the test results.

I would like to describe one experience I had that showed me the value of psychodiagnostic testing. In this case, a six-year-old with a high IQ was referred to us because of a behavioral problem he was having in school. After an initial evaluation, Dr. Fleming suggested further evaluation because of a number of issues that were noticed in the session:

  1. This little boy rarely smiled, laughed, or showed any joy, which is unusual for a child this age.
  2. He demonstrated difficulty in handwriting. He had letter reversals and seemed to have a limited attention span.

The diagnostic and educational evaluation confirmed Dr. Fleming’s suspicions that this child had a learning difficulty and was depressed. Treatment was recommended to remediate the learning disability and family therapy was instituted to better understand the source and cause of his depression. After about seven months this little boy was functioning at a much higher level educationally and emotionally. I was extremely impressed by how important it can be to have further evaluations to double-check your diagnostic suspicions, even for experienced therapists.

Sources of Tests

Tests and Publishers

Test                          Publisher
Bender-Gestalt Test           Am. Orthopsychiatric Assoc.
Benton Visual Retention Test  Psychological Corp.
CPI                           Consulting Psychologists
Content Evaluation Series     Riverside Publishing
EPPS                          Psychological Corp.
G-H Drawing Test              Psychological Corp.
Henmon-Nelson Test            Riverside Publishing
Holtzman Inkblot              Psychological Corp.
Kuhlmann-Anderson Test        Scholastic Testing Service
Machover D-A-P Test           Charles C. Thomas
MMPI                          Psychological Corp.
Picture-Frustration Study     Saul Rosenzweig
PRF                           Research Psychologists
Rorschach                     Grune & Stratton, Inc.
Rotter Sentences Blank        Psychological Corp.
Stanford-Binet                Riverside Publishing
Strong-Campbell               Stanford Univ. Press
TAT                           Harvard Univ. Press
Wechsler Series               Psychological Corp.


Allport, G. Personality: A Psychological Interpretation. New York: Holt, 1937.

Anastasi, A. Psychological Testing. Second Edition. New York: Collier MacMillan, 1976.

Anastasi, A. Psychological Testing. Third Edition. New York: Collier MacMillan, 1981.

Bender, L. “A Visual Motor Gestalt Test and Its Clinical Use.” In American Orthopsychiatric Association Research Monographs. 1938.

Benton, A. Revised Visual Retention Test: Manual. New York: Psychological Corp., 1974.

Campbell, D. Manual for the Strong-Campbell Interest Inventory. Second Edition. Stanford, California: Stanford University Press, 1977.

Exner, J. Jr. The Rorschach: A Comprehensive System. New York: Wiley, 1974.

Goodenough, F. and Harris, D. “Studies in the Psychology of Children’s Drawings: II.” In Psychological Bulletin, 1950.

Holtzman, W., Thorpe, J., Swartz, J. and Herron, E. Inkblot Perception and Personality - Holtzman Inkblot Technique. Austin: University of Texas Press, 1961.

Hopkin, K. and Stanley, J. Education and Psychological Measurement and Evaluation. New York: Prentice-Hall, 1972.

Machover, K. Personality Projection in the Drawing of a Human Figure: A Method of Personality Investigation. Springfield, Ill.: Charles C. Thomas, 1949.

McReynolds, P. (Ed.) Advances in Psychological Assessment, Volume 1. San Francisco: Jossey-Bass, 1981.

Rapaport, D., Gill, M., and Schafer, R. Diagnostic Psychological Testing, Second Edition. New York: International Universities Press, 1968.

Rosenzweig, S. The Rosenzweig Picture-Frustration Study: Basic Manual. St. Louis: Rana House, 1978.

Terman, L. and Merrill, M. Stanford-Binet Intelligence Scale. Boston: Houghton Mifflin, 1973.

Wechsler, D. The Measurement and Appraisal of Adult Intelligence. Baltimore: Williams and Wilkins, 1958.