Class 5 - Mischel 1982

7/28/2019 Class 5 - Mischel 1982


Psychological Review, 1982, Vol. 89, No. 6, 730-755
Copyright 1982 by the American Psychological Association, Inc. 0033-295X/82/8906-0730$00.75

Beyond Déjà Vu in the Search for Cross-Situational Consistency

Walter Mischel and Philip K. Peake
Stanford University

Recent efforts to resolve the debate regarding the consistency of social behavior are critically analyzed and reviewed in the light of new data. Even with reliable measures, based on multiple behavior observations aggregated over occasions, mean cross-situational consistency coefficients were of modest magnitude; in contrast, impressive temporal stability was found. Although aggregation of measures over occasions is a useful step in establishing reliability, aggregation of measures over situations bypasses rather than resolves the problem of cross-situational consistency. The Bem-Funder (1978) template-matching approach did not enhance the search for cross-situational consistency either in their original data or in an extended replication presented here. The Bem-Allen (1974) moderator-variable approach also was not found to yield greater cross-situational consistency in the behavior of "some of the people some of the time" either in their original data or in the present study of conscientiousness. Congruent with a cognitive prototype approach, it was proposed and demonstrated that the judgment of trait consistency is strongly related to the temporal stability of highly prototypic behaviors. In contrast, the global impression of consistency may not be strongly related to highly generalized cross-situational consistency, even in prototypic behaviors. Thus, the perception and organization of personality consistencies seems to depend more on the temporal stability of key features than on the observation of cross-situational behavioral consistency, and the former may be easily interpreted as if it were the latter.

Preparation of this paper and the research by the authors were supported in part by Grant HD MH-0914 from the National Institute of Child Health and Human Development and Grant MH-36953 from the National Institute of Mental Health. We benefited much from the thoughtful comments of more colleagues than we can thank here: We are grateful for their generous help. Requests for reprints should be sent to Walter Mischel, Jordan Hall, Building 420, Department of Psychology, Stanford University, Stanford, California 94305.

The relative specificity versus consistency of social behavior, and the nature and breadth of the dispositions underlying such behavior, probed by Thorndike (1906), Hartshorne and May (1928), Allport (1937, 1966), Fiske (1961), and many others, still remains the focus of controversy in contemporary personality theory. The position one takes on these issues profoundly affects one's view of personality and the strategies worth pursuing in the search for its nature and implications. Few assumptions are simultaneously more self-evident, yet more hotly disputed, than that an individual's behavior is characterized by pervasive cross-situational consistencies.

Reviewers of this dispute have repeatedly noted a curious paradox (e.g., Bem & Allen, 1974; Mischel, 1973). On the one hand, compelling intuitive evidence supports the enduring conviction that people are characterized by broad dispositions revealed in extensive cross-situational consistency. On the other hand, the history of research in the area has yielded persistently perplexing results, suggesting much less consistency than our intuitions predict.

Reactions to this consistency paradox have taken two directions. One response has been to challenge the underlying assumptions of traditional trait theories (Mischel, 1968; Peterson, 1968; Vernon, 1964) and search for alternative ways of conceptualizing person variables (e.g., Bandura, 1978; Cantor & Mischel, 1979; Mischel, 1973, 1979) and of studying person-situation interactions through more fine-grained analyses (e.g., Magnusson & Endler, 1977; Moos & Fuhr, [year illegible in source]; Patterson, 1976; Patterson & Moore, 1979; Raush, 1977). Alternatively, it has been argued that the problems raised reflect

not the inadequacy of traditional conceptualizations of broad traits that yield cross-situationally consistent behaviors but rather the inadequacy of earlier searches for such traits (e.g., Block, 1977; Olweus, 1977; Rushton, Jackson, & Paunonen, 1981). Of the many efforts to pursue this "better methods" approach to the consistency problem, the ones that appear to be most dramatic, and that currently have been greeted as most promising, are the "reliability" solution (Epstein, 1979) and the "template-matching" and "idiographic" solutions proposed by Bem and his colleagues (Bem & Allen, 1974; Bem & Funder, 1978). They are the focus of the present paper both because they deserve serious attention in their own right and because they are excellent exemplars of the methodological solutions proposed for the consistency problem, and their analysis may teach some highly instructive lessons both for research and theory building. In spite of the attention these approaches are attracting currently, the solutions they propose do not resolve the basic issues raised by the consistency paradox.

The present analysis documents this claim, offers data that show the limitations of the approaches of both Epstein and Bem and his associates in the search for behavioral consistency, and argues for a resolution of the consistency paradox based on a theoretical reconceptualization. New data on the temporal stability and cross-situational consistency of conscientiousness in college students are presented and analyzed, guided by a cognitive prototype approach (Cantor & Mischel, 1979; Cantor, Mischel, & Schwartz, 1982a, 1982b). These analyses suggest that the widely shared intuitions of consistency are based on valid observations of behavioral regularities but of a type different from those usually pursued in the effort to resolve the consistency paradox.

On Predicting Most of the People Much of the Time: The Reliability Solution

In Epstein's (1979) view, the consistency debate in psychology, rather than meriting deep and enduring discussion, should never have happened. He sees the consistency issue in personality as ready (indeed, as overdue) for "a solution . . . so obvious that, once pointed out, it reminds one of the fairy tale of The Emperor's New Clothing" (p. 1097). The solution is to realize that studies of behavioral consistency rarely sample the behaviors of interest on more than a single occasion. The consistency issue "can be resolved by recognizing that most single items of behavior have a high component of error of measurement and a narrow range of generality" (p. 1097). In other words, the problems of demonstrating behavioral consistency to support global traits are simply the result of unreliable measurement in past research.

Remembering Reliability

In support of his arguments, Epstein (1979) recently demonstrated that coefficients of temporal stability (e.g., of self-reported emotions and experiences recorded daily and of observer judgments) become much larger when based on averages over many days. Epstein computed split-half reliabilities for samples of behavior varying from 2 days up to about 28 days. He found that as the number of observations included in the composite increased, the split-half reliability also increased. Of course, this phenomenon is a fundamental premise of classical reliability theory (Gulliksen, 1950; Lord & Novick, 1968; Thurstone, 1932), and the increase in reliability through use of aggregated composites is exactly what the Spearman-Brown formula has been used to estimate for years (also see Horowitz, Inouye, & Siegelman, 1979).

The recognition that reliability is important and increases with the number of items aggregated is hardly new to the consistency debate. Even introductory statistics texts routinely intone that we cannot have either validity or utility without reliability, and it is remarkable to suggest that the consistency debate has suffered from mass amnesia for the reliability construct and for the Spearman-Brown prophecy. Far from overlooking reliability, virtually all of the classic, large-scale investigations of cross-situational consistency (e.g., Dudycha, 1936; Hartshorne & May, 1928; Newcomb, 1929) routinely employed behavioral measures aggregated over repeated occasions yet reported findings

that triggered the enduring debate (see Peake, Note 1, for review). Moreover, the Office of Strategic Services (OSS) project in World War II, the Michigan Veterans Administration project, the Harvard personologists, and the Peace Corps projects (all large-scale applied assessment projects of the 1940s, 1950s, and 1960s) used aggregated measures, pooled judgments, assessment boards, and multiple-item criteria and nevertheless yielded overall results that raised basic questions about the usefulness and limitations of the traditional personality-assessment enterprise (e.g., Peterson, 1968; Vernon, 1964; Wiggins, 1973). Although those who critically evaluated the state of the field in the 1960s did not overlook the issue of reliability, they nevertheless concluded, as Vernon (1964) did, that "The real trouble (with the trait approach) is that it has not worked well enough, and despite the huge volume of research it has stimulated, it seems to lead to a dead end" (p. 239).

As tempting as simple solutions might be, the problems raised by the consistency debate cannot be dismissed as the result of forgetfulness for the basic concepts of measurement error. How, then, can Epstein conclude that the consistency debate should never have occurred because it is resolved when reliability is taken into account? Put another way, How can the use of aggregated measures resolve a debate in the 1980s that it was unable to resolve throughout the 1940s, 1950s, and 1960s? The answer to this seemingly puzzling shift becomes quite simple once one recognizes that the discrepancy is one of interpretation, not effect. Reliability is doing nothing more (or less) for Epstein than it did for any of the earlier large-scale assessment projects that sought and obtained it.

Distinguishing Temporal Stability and Cross-Situational Consistency

Adequate reliability coefficients can surely be found, as Epstein has demonstrated and as earlier work amply attests. But we have to discriminate clearly between demonstrations of impressive temporal stability on the one hand and cross-situational generality or consistency in behavior on the other. By collecting specific observations over a series of days, and then computing split-half "stability" coefficients, Epstein has accumulated data (Tables 1, 2, 3, of Epstein, 1979) that are relevant to the temporal stability of behavior. But temporal stability has never been a central issue in this debate. As noted by Mischel in 1968, "Considerable stability over time has been demonstrated" (p. 36), and "although behavior patterns may often be stable, they are usually not highly generalized across situations" (p. 282).

Although temporal stability is a fundamentally important phenomenon and merits careful empirical attention, it is not the basic issue of the consistency debate. In our view, the crux of the classic debate is the cross-situational consistency or discriminativeness of social behavior and the utility of inferring traits for the prediction of an individual's actions in particular contexts. The operational distinction between temporal stability and cross-situational consistency is important conceptually because it allows one to postpone (though only momentarily) the problems of psychological similarity by simply looking for the same behaviors over multiple occasions in time. The moment the behavior measures are not identical, the problem of psychological similarity surfaces, and that is the problem that has continually bedeviled the search for cross-situational consistency in a nomothetic trait framework. Although Epstein purports to resolve the consistency debate by using aggregated measures, most of his data are relevant only to an issue that has never been seriously raised.

Epstein presents some data (in Tables 4 and 5 of his 1979 article) that do go beyond the demonstration that aggregation increases reliability coefficients and enhances temporal stability. He presents (in his Table 4) intercorrelations of objective events with each other and with self-rated emotions for a 12-day sample. Because the consistency of self-ratings is also not controversial (Mischel, 1968), only the data intercorrelating aggregated objective events are directly relevant to the issue of behavioral consistency. Table 1, adapted from Epstein's Table 4, shows those coefficients that reached statistical significance. First, note that each of the measures listed in Table 1 is the aggregate of a 12-day sample. As such, the corresponding

reliability coefficients (first column) are as substantial as they are noncontroversial.

Evidence for Cross-Situational Consistency?

Now, consider the more interesting question, What is the evidence for cross-situational consistency when highly reliable measures of objectively observed behavior are intercorrelated? Among the 105 relevant correlations computed, the 7 that reached significance (at the p < .01 level) were letters written with letters received, calls made with calls received, stomachaches with headaches, heart rate mean with heart rate range, errors with heart rate range, erasures with letters written, and absences with erasures. Significant intercorrelations among such items hardly suggest that the solution to the consistency issue has arrived. Most of the obtained interrelations seem virtually automatic; the more doors I open, the more I tend to close. And we also predict a significant correlation between how often you say "hello" to people and how often they say "hello" to you. Demonstrating some links between bits of behavior (calls made, calls received) that may cohere (often almost by definition and because they are functionally related, virtually demanding each other) does not provide impressive evidence for cross-situational consistencies. Similarly, demonstrating links between bits of behavior that have no apparent conceptual relation (errors with heart rate range) does not provide evidence for the breadth of personality traits. Finally, demonstrating some links between any two bits of behavior that do not even come from the same person seems an odd way to prove consistency within a given person's behavior. It is puzzling how links between the number of letters I write to you, for example, and the number of letters you write to me can speak to my consistency.

Epstein also provides coefficients between aggregated objective events and trait-inventory scores (his Table 5). Most of the significant associations he found are between self-reported headaches, self-reported stomachaches, and other self-reported physical complaints and troubles (e.g., muscle tension, autonomic arousal on the Epstein-Fenz Anxiety Scale). It is well known to personality assessors that some people do complain more than others about their well-being (e.g., Byrne, 1964; Mischel, 1981; Mischel, Ebbesen, & Zeiss, 1973). The remaining coefficients are extremely unimpressive. Such objective events as calls made, calls received, letters written, and letters received (the items of behavior that do interrelate), even when aggregated and highly reliable, turn out to correlate significantly with none of the inventories. Epstein's consistency data, far from offering a solution to the problems raised by the consistency debate, demonstrate and illustrate those problems vividly.

Table 2
Effects of Aggregation Over Occasions on Temporal Stability and Cross-Situational Consistency

Measure                          Single behaviors    Aggregates
Temporal stability               .29                 .65
Cross-situational consistency    .08                 .13

Note. The coefficients reported are mean correlation coefficients across all possible comparisons of the designated type.

The Carleton Behavior Study

An adequate empirical search for behavioral consistency requires data aggregated to achieve reliability. But it also needs to go beyond reliability, beyond temporal stability, and beyond scattered self-report and behavior correlations. It needs to explore cross-situational consistency in behavior with appropriate and reliable behavior measures sampled across a range of presumably similar situations. Furthermore, such a search needs to be informed by some conceptualization, even if rudimentary, of how behavior is organized or should be categorized for particular goals (e.g., Cantor & Mischel, 1979; Mischel, 1973, 1979). That is exactly what we have been trying to do over the last 4 years, studying behavioral consistency among college students at Carleton College in Northfield, Minnesota.

In collaboration with Neil Lutsky, we initiated the Carleton study in an effort to replicate the work of Bem and Allen (1974), extending greatly the behavioral referents and battery of measures employed. In this research, 63 Carleton College volunteers participated in extensive self-assessments relevant to friendliness and conscientiousness. They were assessed by their parents and a close friend and were observed systematically in a large number of situations relevant to the traits of interest. To illustrate the gist of the results as they bear on the issues addressed in the present article, we will focus on the domain of conscientiousness/studiousness (henceforth called simply conscientiousness).¹ The behavioral referents of conscientiousness in this work consist of 19 different measures. For example, the behavioral assessment of conscientiousness included such measures as class attendance, study-session attendance, assignment neatness, assignment punctuality, reserve-reading punctuality for course sessions, room neatness, and personal-appearance neatness. Note that the specific behaviors selected as relevant to each trait were supplied by the subjects themselves as part of the pretesting at Carleton College to obtain referents for the trait constructs as perceived by the subjects; this is in contrast to many studies in which these referents are selected exclusively by the assessors. For each different measure, repeated observations (ranging in number from 2 to 12) were obtained. Thus, for example, we obtained three observations of assignment punctuality and nine observations of appointment punctuality.

This design allowed us to assess both the temporal stability and the cross-situational consistency of behavior using single observations and using measures aggregated over occasions. We were thus able to systematically assess the gains accrued on both temporal stability and cross-situational consistency when we employed the aggregation solution espoused by Epstein. The results of this analysis are summarized in Table 2.

¹ Although the focus here is on the conscientiousness data, results that essentially parallel those reported here have emerged from analyses in the friendliness domain and are described in Peake (Note 1).

Temporal Stability: Cross-Situational Discriminativeness

First we computed the percentage of significant coefficients among all the possible coefficients of temporal stability. To qualify as a coefficient of temporal stability, the correlation had to consist of two observations of the same type of measure. For example, lecture attendance on Day 1 correlated with lecture attendance on Day 6 is a correlation of temporal stability. This analysis revealed that nearly half of the single-observation temporal-stability coefficients (specifically, 46%) were statistically significant. Note that this is prior to any aggregation to enhance reliability. Here, then, is another clear indication that even at the single-observation level, temporal stability can be demonstrated readily. Given the moderate to high levels of temporal stability among the single observations, it is not surprising either conceptually or empirically that when these single observations are aggregated into composite measures, all of the resulting reliability coefficients are significant (with the mean coefficient being .65).²

We performed a similar analysis for all the correlations relevant to cross-situational consistency. At the single-observation level, cross-situational consistency coefficients consist of such correlations as a single observation of appointment punctuality with a single observation of lecture punctuality, or with a single observation of class-note neatness (i.e., with any other single observation except another observation of appointment punctuality). As is now typical for research findings of this type, although the percentage of significant correlations (11%) exceeded chance, the obtained correlations were highly erratic, with a mean coefficient of .08.
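The operational distinction drawn above (a temporal-stability coefficient pairs two observations of the same measure; a cross-situational coefficient pairs observations of different measures) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout, and the toy scores are hypothetical and are not taken from the Carleton data or from the authors' analysis procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either list has zero variance


def mean_coefficients(observations):
    """Return (mean temporal stability, mean cross-situational consistency).

    `observations` maps (measure, occasion) to a list of scores,
    one score per subject, in the same subject order throughout.
    """
    temporal, cross = [], []
    for (key_a, x), (key_b, y) in combinations(observations.items(), 2):
        r = pearson(x, y)
        if key_a[0] == key_b[0]:
            temporal.append(r)  # same measure, different occasion
        else:
            cross.append(r)     # different measures
    return sum(temporal) / len(temporal), sum(cross) / len(cross)


# Hypothetical toy scores for four subjects (NOT data from the study).
toy = {
    ("lecture_attendance", 1): [1, 2, 3, 4],
    ("lecture_attendance", 6): [2, 4, 6, 8],
    ("appointment_punctuality", 1): [1, -1, -1, 1],
}
stability, consistency = mean_coefficients(toy)
```

In this contrived example the two lecture-attendance observations correlate perfectly while neither correlates with appointment punctuality, so the two means fall at the extremes; real data would of course fall in between.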
The critical question then becomes, What gains in cross-situational consistency are evidenced when our more reliable aggregates are intercorrelated?

To address this question, we must examine such correlations as aggregated lecture attendance with aggregated appointment punctuality or aggregated lecture attendance with aggregated appointment attendance. For this purpose, 171 cross-situational consistency coefficients were computed by intercorrelating the 19 different aggregated measures of conscientiousness. Of the 171 coefficients, 20% (35 coefficients) reached significance, a number considerably above chance. Some of these coefficients reached substantial levels. For instance, aggregated class attendance correlates highly with aggregated appointment attendance (r = .67, p

gation. To apply a more stringent test of the effects of increased reliability for demonstrating cross-situational consistencies, we focused our analysis on those measures whose estimated reliability exceeded an arbitrarily selected level of .65. In all, 14 of the 19 original measures met this break-off point, and the mean reliability estimate of these measures is .74. Intercorrelating these 14 measures results in 91 consistency coefficients with a mean level of .14. Thus, restricting our attention to our most reliable measures yields a minimal gain in the mean cross-situational consistency coefficient, which increases from .13 to .14.

Although restricting our attention to our most reliable measures does not seem to have a substantial effect on the cross-situational consistency evidenced in the data, one might argue that a mean reliability coefficient of .74 is still too low to evaluate the reliability solution. What would happen if all our mean reliability coefficients were .95? More impressively, what kinds of cross-situational consistency would be evidenced if we were to use perfectly reliable measures? Of course, it is to answer just this type of question that classical test theory tells us to apply the correction for attenuation. By correcting each of the obtained correlations in our matrix for attenuation due to low reliability, we can estimate the maximal ("true") level of association between each of our measures. In this sense, correcting for attenuation is the logical extreme, or ultimate test, of the "reliability solution" as proposed by Epstein. When we apply the correction as described above, the mean consistency coefficient increases to .20. Thus, if we were to collect a large number of observations for each measure such that the reliability of each of the composites (aggregations) of these observations approached 1.0, the mean level of cross-situational consistency evidenced between these measures still would be unlikely to exceed .20.
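The correction for attenuation referred to above is the standard classical-test-theory formula: the observed correlation divided by the square root of the product of the two reliabilities. The following is a minimal sketch, with a hypothetical function name, that applies the formula to the mean values reported in the text. Applying it to means is only a rough approximation, on the assumption that it stands in for correcting each of the 91 coefficients separately as the authors describe, so the result should be expected to land near, not exactly on, the reported .20.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores from an observed
    correlation r_xy and the reliabilities r_xx and r_yy of the two measures."""
    return r_xy / sqrt(r_xx * r_yy)


# Mean values reported in the text for the 14 most reliable measures.
mean_observed_r = 0.14
mean_reliability = 0.74

# Rough approximation: correct the mean coefficient using the mean reliability.
approx = correct_for_attenuation(mean_observed_r, mean_reliability, mean_reliability)
print(round(approx, 2))  # 0.19
```

Even with the denominator pushed toward perfect reliability, the corrected value stays close to .20, which is the point the passage makes about the limits of the reliability solution.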

In light of the various results summarized to this point, what can one conclude about the promise of the reliability solution for the consistency debate? First of all, it is clear that aggregation of repeated observations in order to obtain adequately reliable measures yields, as expected (and hardly to anyone's surprise), gains in the mean levels of correlations both for measures of temporal stability and for cross-situational consistency in behavior. These gains are most impressive for measures of temporal stability (mean r = .65) and document that aggregation over occasions is a useful method for increasing the reliability of a measure. The results also indicate, however, that aggregating observations over occasions does not necessarily lead to high cross-situational consistency (mean r = .13 or about .20 if perfect reliability is assumed).⁴ Although aggregation over occasions has the desirable effect of enhancing reliability, it does not provide a simple solution to the consistency paradox.

⁴ The summary coefficients of mean temporal stability and mean cross-situational consistency are not strictly comparable because the former always reflect the same response forms in the same situation, whereas the latter often reflect different response forms in different situations. To allow a direct comparison, cross-situational consistency coefficients were computed separately when the response forms in the different situations were the same (mean r = .28) and when they were different (mean r = .12). Thus, whereas the mean correlation for the same response forms in the same situations over time was .65 (temporal stability), it was .28 for the same response forms in different situations.

The results of the Carleton project are not unique. A survey of the early studies of the cross-situational consistency of behavior (Peake, Note 1) indicates that the obtained results are quite consistent with past findings. These early studies, including the studies by Hartshorne and May (1928) of honesty, by Newcomb (1929) of introversion-extroversion, by Allport and Vernon (1933) of expressive movements, and by Dudycha (1936) of punctuality, routinely employed repeated observations of each behavior to increase the reliability of their measures. Each of these investigators reported substantial reliability coefficients, not as a solution to the consistency problem, but as one index of the adequacy (reliability) of their aggregate measures. More importantly, each of these studies obtained cross-situational correlation coefficients of around .20 when using the reliably aggregated measures. Although Epstein proposes aggregation as a potential cure-all for the consistency problem, this cure has been employed routinely for years, and its use only serves to document more clearly the pervasiveness of the phenomena of behavioral discriminativeness.

The Carleton data do not imply, however, that there is little coherence among the behaviors studied. Although the overall patterning of correlations is erratic and on the average low level, the results do not suggest that behavior is random and unorganized. As was noted above, some impressive coefficients emerge, and coherent patterns of correlations are apparent among some of the variables. In addition, 78% (133 of 171) of the obtained correlations are positive, and of the 38 negative correlations, only 2 reach statistical significance. Thus, we obtained considerably more positive significant correlations and also obtained considerably fewer negative significant correlations than would be expected by chance. So, whereas the data reflect behavioral discriminativeness, there is also a positive trend, a coherence or gist among the behaviors sampled.

Aggregation of the Data Without Aggregation of the Issues

Cross-situational consistency coefficients of the sort we are finding can be construed as evidence either for the relative discriminativeness of behavior or for its coherence, and as evidence either for a stable thread of individual differences or for the need to take account of situations seriously. How one reads the results depends on the particular purposes of the research or assessment task. Recently, however, it has become increasingly common to interpret mean coefficients of the sort obtained at Carleton as ample evidence for the consistency of behavior. By aggregating measures of behavior in particular situations into a single composite, or "multiple-act criterion," substantial internal reliability coefficients are readily obtained (Fishbein & Ajzen, 1975). The problem here is not whether reliability will increase by using various types of data aggregation but how to select the appropriate type and level of data aggregation for particular research problems.
Automatic aggregation into overall composites simultaneously risks aggregation of the conceptual issues.

Aggregation of observations over occasions, Epstein's (1979) basic admonition, certainly is a requisite for adequate reliability. No one would contend that a person's attendance at today's psychology lecture at 9 a.m. is an adequate index of that person's tendency to attend psychology lectures. By measuring lecture attendance on repeated occasions, a more reliable index of lecture attendance will result. Of course, aggregation per se need not stop at aggregation over occasions (Epstein, 1980). The investigator who wants to amass high correlation coefficients could forge ahead and aggregate behavior across different response forms and even across situations, arriving at the now popular multiple-act criterion (Fishbein & Ajzen, 1975; Jaccard, 1974; McGowan & Gormly, 1976; Rushton et al., 1981). Here, again, these aggregations will be appropriate and useful for some purposes but not for others.

Consider the case for aggregating across response forms. In many situations a particular dimension of interest may be assessed through several behavioral manifestations or types of responses. In our data at Carleton College, for instance, one could assess cross-situational consistency not just through the intercorrelation of the 19 measures discussed so far but by treating particular settings (e.g., classrooms) as the situations and aggregating the multiple response forms sampled within them. In that case, conscientiousness in classroom situations may be indexed by such measures as lecture attendance, lecture punctuality, note thoroughness, and so forth. The investigator who is more interested in "conscientiousness in classroom situations" than in the variations between measures within this situation could appropriately aggregate across the response forms. The composite measure of conscientiousness in the classroom could then be compared with measures of conscientiousness, aggregated across response forms, in other situations (e.g., at appointments).
By aggregating across occasionsand response forms in the Carleton data, themean number of specific observations persituation became 20. The mean cross-situa-tional consistency coefficient for these aggre-gated measures was .18. Aggregation acrossresponse forms within situations provides auseful increment, but hardly a resolution ofthe paradox.After measures have been aggregated over

7/28/2019 Class 5 - Mischel 1982

9/26

738 W A L TE R MISCHEL AND PHILIP K. PEAKEoccasions, and across response forms withinsituations, it is possible to aggregate even fur-ther, combining measures across situations.One can treat the specific situations simplyas "error" and aggregate across them to forma single composite score and, as the Spear-man-Brown formula predicts, convert ouraverage .13 cross-situational consistencycoefficient into an internal reliability esti-mate of .74. But aggregating across situationsin this way cancels the variance and speci-ficity due to situations, thereby bypassing theproblem of crbss-situational consistency in-stead of solving it; that "solution" merelytreats situations as errors to be averaged outrather than as psychologicalunits to be takeninto account. Although achieving reliablemeasures by sampling over occasions is ofself-evident value, further aggregation, acrossresponse forms or across situations, must bedictated by the assessor's goals and by a prioritheoretical considerations about psychologi-cal similarity (equivalence groupings) andabout the level of generalitythe unitsatwhich assessment should occur fo r particularpurposes (e.g., Cantor & Mischel, 1979; Mis-chel, 1977). Although such aggregation isuseful fo r making statements about meanlevels of behavior across a range of contexts,cross-situational aggregation also often hasthe undesirable effect of canceling out someof the most valuable data about a person. Itmisses the point completely for the psychol-ogist interested in the unique patterning ofthe individual by treating within-person vari-ance, and indeed the context itself, as if itwere "error."The persistent pursuit of consistency byattempting to treat situations as error is acurious paradox in a field committed to afocus on individuality and the pursuit of per-sonology (e.g., Carlson, 1971). 
On the one hand, the importance of attention to the within-person patterning of attributes and behavior (the crux of the idiographic approach and the uniqueness of the person) has long been recognized (e.g., Allport, 1937). On the other hand, this patterning is aggregated out and treated as if it undermined the basic phenomena of personality psychology. As a personologist (or clinician) I may be less interested in aggregating a child's total aggressiveness across situations than in noting that she is aggressive with her sister, but not with her brother, or is aggressive only when teased in a particular way, but never when in the presence of her father. In sum, aggregation may be ideal for canceling out many influences and describing mean level differences between individuals, but such lumping is obtained at the cost of much valuable information (often the most interesting information) about the individual in the particular contexts in which he or she lives. For the clinician as well as for the personologist, aggregation is often the route to weak generalizations about people in general but bypasses the uniqueness, specificity, and predictability of individuality to which a science of personology is ostensibly devoted.

The conceptual problems that must be addressed when aggregating various types of data are essentially problems of psychological similarity. Evidence for higher average coefficients for temporal versus cross-situational consistency suggests that when the situations are as close as possible to identical (i.e., changed only by time), there is impressive average stability. But when situations become even somewhat dissimilar, the patterns become more complex and uneven, average coefficients become much lower, and consistency can no longer be assumed. The need to search for ways to identify similarity then becomes manifest (see Lord, 1982, and Magnusson & Ekehammar, 1973, for interesting examples). Few problems in psychology seem more basic than that of psychological similarity (e.g., Tversky, 1977), and its resolution ultimately should have much to say to the study of situational equivalences and the categorization of behavior.
Theory-guided aggregation requires identifying psychological equivalences and the psychological similarity among situations, not just averaging everything that can be summed.

In our view, the challenging problems in the consistency debate require more than searching for significant correlations and recognizing coherence in the obtained results. We need to understand why the obtained coherences emerge and when and why expected coherences do not. The technologies of psychometrics supply us with ample methods for distilling the coherence among our measures, for accentuating the mean levels of individual differences we have identified, and for focusing on their gist. As our analysis
continues, we intend to employ these various technologies in hope of fully illuminating the psychological significance of these coherences. However, we simultaneously intend to pursue the oft-neglected alternative path of attempting to understand the discriminativeness that also clearly exists in our data and that we believe demands an interactional perspective that treats situations as sources of meaningful variance. For instance, in our research at Carleton College, we plan to search for consistency at different levels of abstraction-generality in the data, from the most "subordinate" or molecular to increasingly broad "superordinate" molar levels, guided by a cognitive prototype and hierarchical-levels analysis of the sort proposed by Cantor and Mischel (1979) and Cantor et al. (1982a). We also plan to explore the comparative usefulness of measures specifically designed to tap such cognitive social-learning person variables as the individual's relevant competencies, encodings, expectancies, values, and plans (Mischel, 1973, 1979).

Such analyses should help illuminate processes that underlie both the significant and the nonsignificant coefficients yielded by personality research: the "uneven and erratic patterns" of behavior that characterize person-situation interactions. Aggregation of repeated observations over occasions will aid in these analyses by providing a more accurate picture of the significant and nonsignificant links that characterize the data. The reliability solution, rather than providing a simple answer to the issues raised regarding the cross-situational consistency of behavior, highlights their complexity. Rather than resolving the consistency debate, the reliability solution underlines the need to seek alternative conceptualizations of personality that might lead us to a better assessment, understanding, and appreciation of both the coherence and the discriminativeness of human behavior.
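The Spearman-Brown step discussed earlier in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the choice of k = 19 components is an assumption (it is the number of conscientiousness measures mentioned later in the article), not a figure the text ties explicitly to the .74 estimate.

```python
def spearman_brown(mean_r: float, k: int) -> float:
    """Reliability of a composite of k components whose average
    intercorrelation is mean_r (standard Spearman-Brown formula)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (k * mean_r) / (1.0 + (k - 1) * mean_r)


# Assumption: k = 19 components; .13 is the mean cross-situational
# consistency coefficient reported in the text.
composite_reliability = spearman_brown(0.13, 19)
print(round(composite_reliability, 2))  # prints 0.74
```

Under these assumptions the composite reaches about .74 while the pairwise coefficient remains .13. The formula describes the reliability of the aggregate score; it says nothing about how well behavior in one situation predicts behavior in another, which is the point the text makes about bypassing rather than solving the problem.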

On Predicting More of the People More of the Time: The Template-Matching Solution

Another solution for the consistency problem is offered by Bem and Funder (1978). Their aim was to utilize Block's California Child Q-Set to find a common language for the description of both persons and situations and to develop a template-matching technique that allows one to examine the interface of person and situation characteristics. Because of the great promise it offers, and the substantial attention it already has attracted, Bem and Funder's contribution merits careful analysis and a thorough examination of its findings and implications. In this analysis the focus will be on the content area of delay of gratification, the domain Bem and Funder chose to illustrate their approach, as it speaks to the consistency problem, and thus will deal only with their first study.

The Bem-Funder Study

Bem and Funder note the failure to obtain more impressive cross-situational consistency coefficients in personality research generally and for delay of gratification in particular. They suggest that the inconsistencies obtained are due to the erroneous equation of situations that are superficially similar but functionally different. To identify situations that are functionally dissimilar though apparently similar, they propose examining the rated personality characteristics associated with the behavior in these situations. Their ultimate hope is to increase the predictive power of trait information by matching the individual's personality characteristics with the "personality of the situation," defined by Q-sort ratings that supply "portraits" of both the individual and the setting. They see their work as relevant to the consistency issue by showing that situations that seem alike may actually be dissimilar, as evidenced by discrepant patterns of Q-sort correlates. We therefore should not expect behavior in such situations to be highly intercorrelated. Moreover, they argue that only when situations are characterized by similar Q-sort portraits should we expect, and find, high intercorrelations among the behaviors displayed in them.

Their procedure and results require close consideration. They exposed 29 preschool children to a version of the traditional delay-of-gratification situation. Although Bem and Funder retained the basic features of the delay paradigm (e.g., Mischel & Ebbesen, 1970), there was a possibly important difference: The experimenter remained in the room with
the child during the delay period rather than leaving the room as in the traditional paradigm. Correlations were computed between a child's delay time and each of the 100 items in the Q-set (i.e., the parents' trait ratings of this child).

Some of the correlations they obtained were highly consistent with previous findings on the rated characteristics of children who are high versus low in their ability to wait in the standard delay situations (see Table 3, reproduced from Table 1 of Bem & Funder, 1978, p. 490). Bem and Funder note the existence of these expected correlations (e.g.,

Table 3
Q-Item Correlates With Delay Scores

Item                                                              r

Positively correlated
  Has high standards of performance for self                      .48***
  Tends to imitate and take over the characteristic manners
    and behavior of those he or she admires                       .39**
  Is protective of others                                         .39**
  Is helpful and cooperative                                      .36*
  Shows a recognition of the feelings of others (empathic)        .35*
  Is considerate and thoughtful of other children                 .34*
  Develops genuine and close relationships                        .31*

Negatively correlated
  Appears to have high intellectual capacity                     -.62***
  Is emotionally expressive                                      -.56***
  Is verbally fluent, can express ideas well in language         -.50***
  Is curious and exploring, eager to learn, open to new
    experiences                                                  -.49***
  Is self-assertive                                              -.47**
  Is cheerful                                                    -.43**
  Is an interesting, arresting child                             -.43**
  Is creative in perception, thought, work, or play              -.40**
  Attempts to transfer blame to others                           -.37**
  Behaves in a dominating way with others                        -.34*
  Is restless and fidgety                                        -.31*
  Seeks physical contact with others                             -.31*
  Is unable to delay gratification                               -.31*

Note. From "Predicting More of the People More of the Time: Assessing the Personality of Situations" by Daryl J. Bem and David C. Funder, Psychological Review, 1978, 85, 485-501. Copyright 1978 by the American Psychological Association. Reprinted by permission.
* p < .10 (two-tailed). ** p < .05 (two-tailed). ***p


Q correlates of delay behavior in two different delay situations, for two different sets of children, where samples were not even matched in age. Their paper thus raises the prospect of improving evidence for cross-situational consistency in behavior, but it does not take the essential step of doing so. That is why the present authors proceeded to try to fill this void and tested the Bem-Funder promise empirically with an appropriate design for assessing cross-situational consistency.

The Mischel-Peake Study

We attempted to replicate the Bem-Funder study, with some important additions. We exposed children to the Bem-Funder paradigm using identical procedures, the same immediate and delayed rewards, and subjects of the same age drawn from the same population (Stanford University's Bing Nursery School), whom we tested at about the same time of year as those in the Bem-Funder study. These children also participated in the standard delay paradigm (described in Mischel & Ebbesen, 1970), which differs from the Bem-Funder procedure only in that the experimenter is absent during the delay period. This design made it possible for us to assess the consistency of the children's delay behavior across the two situations (experimenter present versus absent) and to compare the Q correlates for the two situations systematically. The children participated in the two situations in random order and within a period of 3 weeks. To enhance reliability, the sample size was nearly twice that in the Bem-Funder study.

Delay behavior in the two situations correlated at a modest level (r = .22, ns). According to Bem and Funder's reasoning, this low level of association provides an ideal test of their proposition on two counts. First, the two situations appear to be similar (indeed, the two vary only in the experimenter's presence versus absence, an aspect apparently so trivial that Bem and Funder employed it as a substitute for the standard Mischel delay paradigm). Second, behavior in the two situations proves to be not strongly intercorrelated empirically. Applying the Bem-Funder Q-sort approach to the two situations should reveal the "distinctive features" that make these seemingly similar situations psychologically different, showing their "unique personalities" and revealing why behavior in the two situations is not more closely intercorrelated.

Our investigation yielded results that surprised us greatly. In spite of our considerable efforts to conduct an exact replication of the Bem-Funder delay study, the data we obtained did not support their basic findings; they reversed them. The major distinctive Q correlates obtained by Bem and Funder are shown in the first column of Table 4; the Q correlates yielded by our replication are given in the second column. The resulting portrait, far from replicating the Bem-Funder distinctive features, is the opposite of the one they drew. For example, on the identical measure (i.e., with experimenter present), the high-delay child, rather than being intellectually dull and uneager to learn, appears to have high intellectual capacity (r = .23, p


Table 4
Comparative Correlates for Significant "Distinctive" Items of the Bem and Funder (1978) Delay Situation

                                                   Bem and Funder    Mischel and Peake
Item                                               (1978) r          replication r

Appears to have high intellectual capacity         -.62***            .23*
Is verbally fluent, can express ideas well
  in language                                      -.50***            .06
Is curious and exploring, eager to learn,
  open to new experiences                          -.49***            .27*
Is self-assertive                                  -.47**             .09
Is cheerful                                        -.43**            -.19
Is an interesting, arresting child                 -.43**            -.17
Is creative in perception, thought, work,
  or play                                          -.40**             .00
N                                                   29                 52

* p < .10 (two-tailed). ** p < .05 (two-tailed). ***p
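One way to gauge how far the two columns of Table 4 diverge is a Fisher r-to-z comparison of two independent correlations. The sketch below is not an analysis reported in the article; it is a standard textbook check applied, as an illustration, to the first item only (r = -.62 with N = 29 versus r = .23 with N = 52). The function name is hypothetical.

```python
import math


def fisher_z_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """z statistic for the difference between two correlations
    obtained in independent samples (Fisher r-to-z transformation)."""
    z1 = math.atanh(r1)  # Fisher transformation of the first correlation
    z2 = math.atanh(r2)  # Fisher transformation of the second correlation
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error


# Values taken from the first row of Table 4.
z = fisher_z_difference(-0.62, 29, 0.23, 52)
print(round(z, 2))
```

A z statistic near -4 would indicate that the two coefficients differ by much more than sampling error alone would suggest, which is in line with the authors' description of the replication as a reversal rather than a mere shrinkage of the original correlate.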


Old Cookbooks in New Templates?

In our view, the significance of the results discussed so far is not that they document the empirical limits of a particular study but rather that they illustrate the boundaries of a more general approach with a formidable history and tradition. To understand the failure of the Bem-Funder strategy for solution of the consistency problem, we must begin by examining more closely the strategy itself. They hoped their strategy would be a new route to uncover the "personality of the situation" (in this case, the delay-of-gratification situation). But though they conceptualized and described their efforts in the language of person-situation interaction and hoped to capture the personality of the delay situation, their method only described the correlates of performance in that situation (i.e., the ratings associated with duration of waiting time). To justify describing the Bem-Funder approach as an assessment of the personality of situations would require that one at least also Q sort the situations independently, not just the people who perform in them (Hoffman & Bem, Note 2). No set of correlates for performance, no matter how described, can do more than illuminate some of the specific ways particular kinds of people are likely to perform or to seem different from other kinds on particular measures. The search to characterize what kinds of people are likely to do well on particular tasks, situations, or measures is exactly what traditional personality assessment, both clinical and actuarial, has been about for decades (e.g., the classic work of Murray and the Harvard personologists).
delay situation, there also have been attempts to replicate their effort to predict attitude change in the forced-compliance paradigm. These attempts at extended replication so far also seem to have failed (Funder, 1979; Hoffman & Bem, Note 2). A more situation-specific or "contextual" procedure now has been substituted for the global Q-sort descriptions (Hoffman & Bem, Note 2) and seems promising, but it might be wise to await replication of these new efforts before judging them.

Such a search for performance correlates to characterize the meaning of a response pattern and to predict individual differences is the essence and staple of the trait approach (Mischel, 1968). This search can be undertaken through a strategy of construct validity (Cronbach & Meehl, 1955), in which the focus is on the development of a theory about what accounts for the behavior of interest. Or, like Bem and Funder's approach to revealing consistency in delay of gratification, it can be undertaken in an entirely empirical fashion, as in the actuarial and empirical keying strategies pioneered by Paul Meehl (e.g., 1956) and favored for many years in personnel and psychodiagnostic assessment. The "cookbooks" and "atlases" devised two decades ago in this approach, for example, tried to assess the degree of fit between an individual and an ideal personality type (defined by a distinctive Minnesota Multiphasic Personality Inventory [MMPI] test configuration) in order to predict a pattern of criterion performance and characteristics associated with that ideal type in a particular setting (e.g., a Veterans Administration mental hygiene clinic). It is worth remembering in detail Meehl's (1956) cookbook method for the prediction of personality descriptions and/or other data in particular situations. The logic (and fate) of his approach bears directly on the solutions currently proposed by Bem and his associates:

In the cookbook method, any given configuration (holists please note: I said "configuration," not "sum"!) of psychometric data is associated with each facet (or configuration) of a personality description, and the closeness of this association is explicitly indicated by a number. This number need not be a correlation coefficient; its form will depend upon what is most appropriate to the circumstances. It may be a correlation, or merely an ordinary probability of attribution, or (as in the empirical study I shall report upon later) an average Q-sort placement. (p. 264)

Twenty-two years later, Bem and Funder (1978) wrote,

What is being proposed here, then, is that situations be characterized as sets of template-behavior pairs, each template being a personality description of an idealized "type" of person expected to behave in a specified way in that setting. The probability that a particular person will behave in a particular way in a particular situation is then postulated to be a monotonically increasing function of the match or similarity between his or her characteristics and the template associated with the corresponding behavior. (p. 486)
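The proposal quoted above can be illustrated with a toy template-matching score. The sketch below assumes that "match or similarity" is measured by the Pearson correlation between a person's Q-sort item placements and each template's placements; Bem and Funder's actual weighting procedure is not reproduced here, and every name and number in the example is invented for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_matching_behavior(person, templates):
    """Score the person against each template and return the behavior
    whose template is most similar, together with all scores."""
    scores = {behavior: pearson(person, template)
              for behavior, template in templates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 5-item Q-sort placements (1 = least, 9 = most characteristic).
templates = {
    "long delay": [9, 8, 7, 2, 1],
    "short delay": [1, 3, 2, 8, 9],
}
child = [8, 9, 6, 3, 2]

behavior, scores = best_matching_behavior(child, templates)
print(behavior)  # prints long delay
```

Note that both templates here are built from the characteristics of people who perform a given way in one situation, so the score is a correlate of performance rather than an independent description of the situation itself.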

Note again that although their intent is to characterize situations, the operation is actually to seek the correlates of performance in a specific situation. Both Bem and Funder and Meehl assert that the probability that a subject will do a specified act in a particular
situation (or be judged to have particular characteristics) depends on the degree of match or similarity between that subject's scores and a given configuration. The configuration is a "template" for Bem and Funder, a "profile" for Meehl. The behaviors (or outcomes) for Bem and Funder are such things as delay time or attitude change or adjustment to Stanford University. For Meehl, they were Q-sort patterns, other personality descriptions, or any other data (e.g., duration of remission or adjustment to the Minnesota Veterans Administration hospital). The method for determining the degree of match between the subject and the configuration involves a particular weighting procedure for Bem and Funder; it was also achieved quantitatively by Meehl but with the recognition that the particular form (simple correlation, regression, probability attribution) will depend on its appropriateness for the particular purpose at hand. And, like Bem and Funder, Meehl (1956, p. 267) searched for ideal types whose distinctive profiles (recipes) would be related to such criterion data as Q sorts in the cookbooks.

Although the language in the descriptions of the two methods is somewhat different, the Bem-Funder "template-behavior pair" parallels the Meehl "configuration-personality description association." The two approaches seem to overlap a good deal, including a heavy reliance on the same technique: the Q sort. Indeed, Meehl (1956) illustrated his pioneering article on the cookbook method with a study that shows its value for predicting therapists' Q sorts of patients from their match to the cookbook's MMPI curve types (substitute "templates"). For example, for each patient who best fits a given MMPI code, "we simply assign the Q-sort recipe found in the cookbooks as the best available description. How accurate this description is can be estimated (in the sense of construct validity) by Q correlating it with the criterion therapist's description" (p. 268). Empirically, the initial yield was highly encouraging, with a median validity coefficient of .69.

Once the close parallels are recognized, it becomes clear that the Bem-Funder approach shares both the strengths and the weaknesses of its predecessor. The strengths
are the bypassing of weak (poor, messy) theory-based predictions for the sake of neat, mechanically assisted empiricism.7 The weakness is the opposite side of this same coin, that is, the cost of blunt empiricism: Atheoretical approaches yield results that tend to seem promising at first but are notoriously difficult to replicate. Assessors excited by the first actuarial cookbooks and their strong predictions were soon sobered by the frequent failure to replicate the configuration-outcome patterns that at first seemed so useful. Particularly when the sample size is small, and when the associations are purely empirical (with no theoretical hypotheses a priori), correlations with Q-sort items may be extremely unstable and can vanish rapidly, as we saw in the failure to replicate the distinctive Bem-Funder portraits. If personality assessment has led to any firm conclusions, it is that it is generally not worth offering conclusions about actuarially obtained performance correlates unless they are based on careful cross-validation.

7 Our analysis and comments in the present article are restricted to Bem and Funder's (1978) first study, the only one that addressed the consistency issue, not to their efforts to use social-psychological theories to generate predictors about individual differences in the relevant social situations.

Another truism emerging repeatedly from Meehl's once-exciting cookbook approach and from the history of personality assessment more generally is that simple linear combinations of single scale scores often turn out to be more accurate than complex, sophisticated configural models, including the Meehl-Dahlstrom Rules (e.g., Goldberg, 1965). Simple cooking with simple recipes generally works better than more esoteric methods in actuarial assessment. At a minimum, it seems wise to compare the increments that fancier (thus costlier) methods (like template matching) provide when tested against old-fashioned, basic fare (like linear regression to predict a particular behavior in a given situation from any cross-validated items shown to have predicted it before). It may be less exciting, but often it works better (e.g., Tellegen, Kamp, & Watson, 1982). Complex weighting procedures, as in the Bem-Funder template-matching technique, may inadvertently serve to compound error (by weighting chance findings) and thus hurt rather than help the enterprise. For example, reviewing efforts to compare a number of regression equations that varied in complexity from linear to higher order, Wiggins (1981) noted,

As one might expect, there was a tendency for increased predictiveness to be associated with increased complexity of prediction models, in the sample from which predictor weights were derived. When these same equations were applied to an independent sample, however, there was a tendency for decreased predictiveness to be associated with increased complexity of models. In other words, cross-validation appears to wipe out any predictive gains that are apparent in a derivation sample. Perhaps there is a general principle here that should discourage psychologists from overfitting their data with complex equations. (p. 6)
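The instability of purely empirical correlates, and the reason cross-validation is needed, can be illustrated with a small simulation. The sketch below is not drawn from any study cited here. It assumes the extreme case in which all 100 items are pure noise, picks the item with the largest correlation in a derivation sample of 29 (the item count and sample size echo the Bem-Funder design), and then looks at a fresh sample. All names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def noise(n, rng):
    """n independent standard-normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
N_ITEMS, N_DERIVE, N_CROSS = 100, 29, 29

# Derivation sample: criterion (e.g., delay time) and 100 unrelated items.
criterion = noise(N_DERIVE, rng)
items = [noise(N_DERIVE, rng) for _ in range(N_ITEMS)]
derivation_rs = [pearson(item, criterion) for item in items]

# Select the "most distinctive" item purely empirically.
best = max(range(N_ITEMS), key=lambda i: abs(derivation_rs[i]))

# Cross-validation sample: because every item is noise, a fresh draw
# stands in for the selected item measured on new subjects.
cross_r = pearson(noise(N_CROSS, rng), noise(N_CROSS, rng))

print("largest |r| in derivation sample:", round(abs(derivation_rs[best]), 2))
print("same item in a fresh sample:     ", round(cross_r, 2))
```

Because the best of 100 chance correlations is selected, the derivation value is inflated by construction, while the fresh-sample value is just one more draw around zero. No specific output is shown because the exact figures depend on the random draws; the contrast between the two lines is the point of the exercise.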

In sum, the Bem-Funder attempt to resolve the consistency issue held out an exciting prospect. We have analyzed it in detail to try to untangle the promise from the outcome. The prospect of a parallel language and methodology for the study of persons and situations remains attractive, but Bem and Funder's efforts toward a solution of the consistency issue in the delay domain proved disappointing both in its method and in its results. When the sample is adequate, one finds the typical, replicable, low-level, theory-consistent Q-sort correlates of delay behavior. No advance is made by use of their methods to show replicable distinctive portraits, to demonstrate improved cross-situational consistency in the domain, or to explain its phenotypic absence. The promise that the Bem-Funder technique would allow demonstrations of impressive cross-situational consistency by identifying measures that have similar rather than distinctive Q-sort correlates still awaits realization. We are pessimistic about the prospect for this effort in its present form not only because of the data we presented but because of the theoretical considerations discussed and the fate of relevant efforts attempted repeatedly in the past.

On Predicting Some of the People Some of the Time: Bem and Allen's (1974) "Idiographic" Solution

In the third approach to the consistency issue to be discussed here, Bem and Allen (1974) argued that the low consistency coefficients so typically found in research reflect the nomothetic fallacy of assuming that all traits are relevant to all people. Instead, they urged adopting an "idiographic stance," studying only the subset of people for whom a given trait is relevant.8 For this purpose, Bem and Allen separated subjects into high- and low-variability groups, using subjects' self-reports about their variability and behavior. They then tried to document that low-variability subjects are in fact more consistent than high-variability subjects on two traits: friendliness and conscientiousness.

8 Bem and Allen presented their work as "idiographic," and it is widely cited as exemplifying the power of idiographic methodologies (e.g., Kenrick & Stringfield, 1980). There is a confusion of terminology here, however. Idiographic usually refers to the unique organization of traits within individuals (Allport, 1937), not to the fact that not all characteristics may be relevant to all people. Because Bem and Allen's approach does not speak to within-person trait organization, it seems a misnomer to label it idiographic. Rather, their approach rests on the assumption that a given trait dimension may simply not be relevant for some people, and such irrelevance may be identified by selecting those subjects who rate themselves as highly variable on that dimension. Instead of bearing on the idiographic-nomothetic distinction, this approach, as noted elsewhere, is an instance of the classic moderator-variable strategy (Tellegen, Kamp, & Watson, 1982).

To assess cross-situational consistency, Bem and Allen used measures consisting mostly of global ratings of the subjects made by the subject, a peer, and the subject's parents. In addition, on each trait dimension, several composited behavior measures were employed. Careful inspection of their correlational matrices reveals good support for their predictions on the rating data. For example, for subjects classified as low rather than high variability, raters agreed much more about the subject's level of conscientiousness. But, whereas the technique nicely identifies people for whom the correlations among the global ratings will be substantial (i.e., people about whose trait level raters agree), the results are tenuous when behavior measures are intercorrelated. In fact, on those few measures directly relevant to the issue of cross-situational consistency of behavior, only one correlation was higher for subjects rated a priori as "low variability." Thus, in their analysis of friendliness, the
Table 5
Interrater Agreement About Trait Standing for Low- and High-Variability Groups Determined by Self-Report (Below Diagonal) and Ipsatized Variance (Above Diagonal) for Conscientiousness

Measure            Self-report    Mother's report    Father's report    Peer's report

Self-report                       .47 (.29)              (.52)          .54 (.50)
Mother's report    .56 (.20)                             (.45)          .75 (.35)
Father's report    .74 (.28)      .76 (.29)                             .44 (.24)
Peer's report      .66 (.39)      .71 (.29)          .66 (-.12)

Note. Agreement ratings for high-variability subjects are in parentheses.

correlation between "spontaneous friendliness" and "group discussion friendliness" was .73 for the low-variability group, whereas the same correlation for the high-variability group was .30. However, none of the three low-variability versus high-variability comparisons among behavior measures of conscientiousness fell in the predicted direction. It would seem premature to offer conclusions about the efficacy of the Bem and Allen approach as it speaks to the issue of behavioral consistency on the basis of a single confirming comparison (see Lutsky, Peake, & Wray, Note 3).

In view of the weight given to the Bem and Allen (1974) data in current theorizing about the consistency issue, it seemed important to try to replicate their work as carefully as possible, extending the number and types of behavior measures obtained. For this purpose, their Cross-Situational Behavior Survey (CSBS) was administered to the 63 subjects at Carleton College. Similar ratings of the students were obtained from their parents and from a close friend, using a modified CSBS. Subjects were divided into high- and low-variability groups for both traits on the basis of their self-reported variability and on the basis of the ipsatized variance index, the two techniques proposed by Bem and Allen. For convenience, as well as continuity with the rest of our presentation, we will summarize the findings only for the conscientiousness domain. (Detailed analyses and discussion of these data for both traits are in Peake & Lutsky, Note 4.)

As already noted, the bulk of Bem and Allen's data consisted of interrater agreement across CSBS trait ratings. Table 5 shows that on these measures, their results are nicely replicated regardless of the classification procedure used. Raters agreed more about subjects classified as low variability by either of the Bem-Allen techniques. Using the self-reported variability procedure, the mean intercorrelation for low-variability subjects was .68 compared to .22 for high-variability subjects. The comparable mean coefficients using the ipsatized variance index were .56 and .39 for the low- and high-variability groups, respectively.

These findings replicate Bem and Allen's for those measures that worked best for them, providing support that their techniques for classifying low- versus high-variability people allow one to select those individuals for whom raters will tend to agree when making global personality judgments. But more relevant to the issues of behavioral consistency are the cross-situational behavior data from the Carleton project summarized in previous sections on the applications of the reliability solution to the conscientiousness data. Noting the .13 mean consistency coefficient obtained in the Carleton College data after aggregating over occasions, Bem and Allen might reasonably argue that even perfect reliability will be of little value as long as researchers proceed with the nomothetic assumption that all traits belong to all individuals. Rather, now that adequate reliability is established, the search for consistency must adopt the idiographic stance and must select for study only that subset of individuals for whom the particular trait is relevant: Greater consistency should be obtained, but only for those people who are identified as low variability on the trait.

To explore this possibility, separate correlation matrices were generated for the 19
measures of conscientiousness for both the high- and low-variability subgroups identified with the Bem and Allen classification procedures. The summary coefficients in Table 6 suggest that the Bem-Allen classification procedures and approach provide no appreciable gain over the traditional yield when one turns from data based on interrater agreements about the subject's conscientiousness to more direct measures of cross-situational consistency in the referent behaviors. Thus, the mean cross-situational consistency coefficients for the low- versus high-variability subgroups are .11 versus .14 using the self-reported variability index and .15 versus .10 using the ipsatized index. Although Bem and Allen subtitled their article "The Search for Cross-Situational Consistencies in Behavior," it is precisely in this search that their technique fails to meet its promise.

What do these data imply? First, note that the current results parallel Bem and Allen's quite closely. In both studies, interrater agreement about the subjects' conscientiousness was substantial for those students classified as low variability, but this agreement was not reflected in substantially higher cross-situational consistency in their observed behavior. We are left then with a replicated paradox: Subjects classified as low variability (consistent) tend to show high levels of interrater agreement when rated on personality indexes by relevant others (intuitively implying some consistency), yet they do not show appreciably higher levels of cross-situational consistency in behavior, the very data on which their variability judgments were presumably based. We believe that this "replicable paradox" is a key component of the puzzle that needs to be solved to help untangle the consistency problem.
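The interrater means cited for the self-report classification (.68 for low-variability and .22 for high-variability subjects) can be recomputed from the below-diagonal entries of Table 5. The sketch below assumes a simple arithmetic mean of the six coefficients; the article does not state how the averages were formed, and the dictionary layout is only an illustrative way of holding the table values.

```python
from statistics import mean

# Below-diagonal entries of Table 5 (self-report classification):
# (rater A, rater B) -> (low-variability r, high-variability r)
self_report_pairs = {
    ("mother", "self"): (0.56, 0.20),
    ("father", "self"): (0.74, 0.28),
    ("peer", "self"): (0.66, 0.39),
    ("father", "mother"): (0.76, 0.29),
    ("peer", "mother"): (0.71, 0.29),
    ("peer", "father"): (0.66, -0.12),
}

# Assumption: unweighted arithmetic mean of the raw coefficients.
low_mean = mean(low for low, _ in self_report_pairs.values())
high_mean = mean(high for _, high in self_report_pairs.values())

print(round(low_mean, 2), round(high_mean, 2))  # prints 0.68 0.22
```

The recomputed values agree with the reported interrater means, which makes the contrast with the behavioral coefficients (.11 versus .14 for the same classification) easier to see: the classification separates people on rater agreement but not on cross-situational consistency of behavior.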

Toward a Resolution of the Consistency Paradox

Reviewing the hoary history of the consistency paradox, Bem and Allen (1974) "appreciate the sense of déjà vu that must currently be affecting psychology's elder statesmen now that the 'consistency problem' has suddenly been rediscovered" (p. 507). Déjà vu may be experienced not only at the rediscovery of the consistency paradox

Table 6
Overall Mean Correlations by Variability Classification for Correlations Reflecting Interrater Agreements and Cross-Situational Consistency

                                        Type of correlations
Type of classification                  Interrater agreement    Cross-situational consistency
Nomothetic (all subjects)               .52                     .13
Low variability (high variability)
  Self-reported                         .68 (.22)               .11 (.14)
  Ipsative                              .56 (.39)               .15 (.10)

Note. Correlations for high-variability groups are in parentheses.

but also at the contemporary proposals for its solution. In one sense, this feeling of familiarity has a comfortable side: It is reassuring to see continuity in the fundamental questions and struggles for progress in the field of personality. Each of the "better methods" proposals has a long history but also a major lesson to teach, one that is too often forgotten. But if the study of personality is to be a cumulative enterprise in which knowledge and insight are not merely recycled, then these lessons must be distinguished from premature conclusions about their theoretical implications. In our view, the consistency paradox may now be at the brink of a resolution, if the relevant lessons of our field are properly read and integrated with a theoretical reconceptualization for which the outlines are already available.

The State of the Paradox

Consider again the proposed solutions reviewed here. First, aggregation over occasions, as advocated by Epstein, is a necessary step that will surely increase the reliability of measures, as the Spearman-Brown formula has long predicted. Further aggregation, across response forms and especially across situations, will serve to highlight stable mean levels of behavior by eliminating the variability due to contexts. We do not doubt the occurrence of stable means, but we are equally impressed by the occurrence of substantial variance around those means. Sampling behavior extensively in a domain often allows useful predictions of individuals' aggregated mean levels of behavior in that domain. The fact that relevant past behavior is often the best predictor of future behavior is not in doubt (Mischel, 1968). Just as the occurrence of stable mean levels of behavior in a domain does not deny within-person variability, so should the appreciation of such variability not be mistakenly read to preempt the existence of stable personal qualities.

But as numerous analyses (including ours) have shown, such increases in reliability of behavior measures within specific situations, rather than assuring impressive increases in the associations between them, suggest again that the discriminativeness of behavior is a valid phenomenon rather than a reflection of poor methodology. It is old wisdom that the prediction of single acts is as difficult and generally as unlikely as it is challenging. (Yet it would hardly be of trivial interest to be able to predict single acts like suicide, coronary death, and homicide.) Usually we must settle for trying to predict an average of repeated observations, to infer a tendency within a given situation. With that goal, one samples the behavior of interest multiply in the situation, as we did in the reported Carleton project analyses. Such sampling of repeated observations within a given setting should not be confused, however, with aggregation across situations in the search for cross-situational consistency. To aggregate cross situationally is to circumvent the issue rather than to demonstrate its resolution.
The risk is that such aggregation may tempt one to substitute self-evident psychometrics (which magnify even trivial consistencies by eliminating situational variance) for more complex (and less obvious) psychological analyses of the nature of perceived similarities and appropriate equivalence groupings in the organization of behavior and the construction of personality.
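The reliability gain from aggregating over occasions follows the Spearman-Brown prophecy formula, r_kk = k r / (1 + (k - 1) r), where r is the reliability of a single observation and k is the number of observations combined. The sketch below illustrates it; the single-occasion reliability of .20 and the numbers of occasions are hypothetical values chosen only for illustration.

```python
def spearman_brown(r_single, k):
    """Predicted reliability of a composite of k parallel observations.

    r_single: reliability (correlation) of one observation with another.
    k: number of observations aggregated into the composite.
    """
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical single-occasion reliability of .20, aggregated over 1 to 12 occasions.
for k in (1, 2, 6, 12):
    print(k, round(spearman_brown(0.20, k), 2))
```

The formula describes the reliability of each aggregated measure. By itself it implies nothing about the correlation between reliable measures taken in different situations, which is the distinction the passage draws.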

We share Bem and Funder's (1978) desire for a language that can be used to describe both persons and situations commensurately. But we doubt the viability of their approach to this goal as they applied it to their search for consistency in the domain of delay of gratification. Searching to identify the personality of situations and unraveling the structure of person-situation interactions is surely a fundamental problem shared by our field. The long and rich history in the Meehl cookbook tradition shows that understanding situations will require more than identifying the correlates of performance within them. In our view, an adequate resolution of the many issues raised in the pursuit of person-situation interaction will require a theoretical reconceptualization of both personality and situation constructs themselves, not just more clever methods for applying everyday trait terms to people's behavior in particular contexts. We believe that such a reconceptualization will unify the analysis of person characteristics with the analysis of cognitive-learning processes and requires that the person and the situation be analyzed in light of the same psychological principles and not merely described with the same trait terms (e.g., Mischel, 1973). It also requires a deeper analysis of the nature of person and situation categories (Cantor et al., 1982a, 1982b; Cantor & Mischel, 1979). In the absence of an appropriate theoretical framework, the search for consistencies can become an ultimately uninteresting hunt for statistically significant coefficients that neglects their psychological significance and their links to psychologically interesting processes.

Finally, our attempts to identify a subset of individuals for whom conscientiousness is relevant, and who will thus show appreciably more consistency across contexts, guided by Bem and Allen (1974), led to a problematic conclusion. Like Bem and Allen, we found clear support that raters agree well with each other about people who see themselves as generally consistent with regard to the particular dimension. Conversely, raters agree much less about the attributes of people who view themselves as highly variable on the relevant dimension.
Less obvious, but more challenging theoretically, is the finding that people's global perceptions of their own overall consistency or variability on a dimension do not appear to be closely related to the observed cross-situational consistency of their behavior. Although interjudge agreement was greater for people who see themselves overall as consistent in conscientiousness, cross-situational consistency in their behavior was not significantly greater than it was for those who see themselves as variable or for the entire group as a whole. This pattern occurred in the Bem-Allen data as well as in the Carleton data.

From Paradox to Paradox?

Our pursuit of cross-situational consistency in behavior through the use of the better methods proposals has brought us full circle. We began by noting the paradox that exists between our intuitions of consistency in behavior on the one hand and research that documents specificity on the other. We reviewed and examined the utility of three of the most popular methodological refinements that have been proposed as possible solutions to the consistency paradox. The end result of our conceptual and empirical endeavors is another paradox that closely parallels the one we set out to resolve. The results of the replication of the Bem and Allen work suggest that raters agree substantially more about persons who identify themselves as cross-situationally consistent. However, these individuals do not show substantially greater cross-situational consistency in behavior than people identified as more variable. Here, again, shared intuitions about persons do not agree with the data.

We might account for these results in a variety of alternative ways. On the one hand, one might suggest that the replicable paradox results from methodological problems commonly associated with the use of behavioral data and dismiss the behavioral results, resting the case for personality structure on the impressive findings among the rating data (e.g., Block, 1977). Alternatively, one might argue that the behavioral data accurately reflect the complex structure of behavior and that the substantial interrater agreements reflect shared theories about persons and other heuristics that bias our judgments about the coherences that actually exist in the behavior of others (e.g., Chapman & Chapman, 1969; Mischel, 1968, 1979; Nisbett & Ross, 1980; Ross, 1977; Schneider, 1973; Shweder & D'Andrade, 1980). Both of these interpretations have some merit. Nevertheless, granting both the methodological problems of behavioral data and the existence of cognitive economics, our perceptions of others are still unlikely to be entirely illusory. They may derive rather directly from the behavior of the individual but not from those aspects that we expect or to which the consistency debate has pointed so far.

Components for a Reconceptualization

Our approach to understanding the consistency paradox is guided simultaneously by a cognitive social-learning conceptualization of behavior organization (e.g., Mischel, 1973) and a cognitive prototype view of person categorization (e.g., Cantor & Mischel, 1979). First, consider the nature of the regularities revealed from the study of behavioral consistency. We read these data as repeatedly showing temporal stability more impressively than cross-situational consistency. Greater temporal than cross-situational consistency seems sensible from the perspective of cognitive social-learning theory. Because the contingencies in a given situation often remain unchanged over time, stability over time is expected and predicted in much social behavior (e.g., Bandura, 1969; Mischel, 1968). Moreover, from this perspective, temporal stability would be expected to the degree that such qualities as the person's competencies, encodings, expectancies, values, and plans endure (Mischel, 1973). The pursuit of durable values and goals with stable skills and expectations for long periods of time surely involves coherent and meaningful patternings among the individual's efforts and enterprises. The degree of cross-situational consistency, however, might be high, low, or intermediate, depending on many considerations, including the structure of the perceived cross-situational contingencies and the subjective equivalences among the diverse situations sampled. Distinctive contingencies may be expected to occur even in slightly different situations, producing high discriminativeness cross situationally. If so, if cross-context behavioral discriminativeness is the rule rather than the exception (the phenomenon rather than the error), then the search for consistency across situations will continue to yield slim results.

But then how can we understand the other


side of the consistency paradox, the intuitive conviction of consistency? Our attempts to employ Bem and Allen's idiographic solution to the Carleton College data showed that people who see themselves as consistent on a dimension are indeed rated with greater interjudge agreement by others even though (and this is the key point) their behavior does not necessarily show appreciably greater overall cross-situational consistency. What are the bases (the ingredients) of the seemingly pervasive and shared perception of consistency in a personality disposition if the perception is not related to the level of cross-situational consistency in the reliably observed referent behaviors? To try to answer this question, we turn to the cognitive prototype approach (e.g., Cantor & Mischel, 1979). Guided by cognitive theories of the categorization of everyday objects (e.g., Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976), this prototype approach to the categorization problem appreciates the reality of individual differences but seeks to reconceptualize the nature of the consistencies they reflect in an interactional framework (Cantor et al., 1982a, 1982b; Cantor & Mischel, 1979). The prototype approach recognizes the especially fuzzy nature of natural categories and, along the lines first traced by Wittgenstein (1953), searches not for any single set of features shared by all members of a category but rather, for a family-resemblance structure, a pattern of overlapping similarities.
The recognition of fuzzy sets also suggests that categorization decisions will be probabilistic, with many ambiguous borderline cases that produce overlapping, fuzzy boundaries between the categories, and that members of a category will vary in degree of membership (prototypicality).

The cognitive prototype approach applied to the consistency paradox suggests that consistency judgments with respect to a category are made not by seeking the average of all the observable features of a category but by noting the reliable occurrence of some features that are central to the category, or more prototypic. That is, we suggest that consistency judgments rely heavily on the observation of central (prototypic) features so that the impression of consistency will derive not from average levels of consistency across all the possible features of the category but rather from the observation that some central features are reliably (stably) present. From this perspective, extensive cross-situational consistency may not be a basic ingredient for either the organization of personal consistency in a domain or for its perception.

The Construction of Consistency

In accord with the cognitive prototype view, we propose that the shared global impression of trait consistency (in self and in others) arises not primarily from the observation of cross-situational consistency in relevant behaviors. Rather, we propose that to assess variability (versus consistency) with regard to a category of behavior people scan the temporal stability of a limited number of behaviors that are most relevant (central or prototypic) to that category for them. Thus, the impression of consistency is based substantially on the observation of temporal stability in those behaviors that are highly relevant (central) to the prototype but is independent of the temporal stability of behaviors that are not highly relevant to the prototype. Conversely, the perception of variability arises from the observation of temporal instability in highly relevant features.

To explore this hypothesis in the Carleton College data on conscientiousness, we predicted that people who judge themselves overall as consistent across situations (low variability on Bem and Allen's, 1974, global self-report measure) will show greater temporal stability but not greater cross-situational consistency than those who view themselves as less consistent.9 In addition, because we believe that the judgment of consistency is independent of the temporal stability of behaviors that are not highly prototypical, we predicted this difference in temporal stability to be more pronounced on the more prototypic features of conscientiousness than on the less prototypic features.
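The shape of this prediction can be made concrete with a small sketch. Everything in it is hypothetical: the measure names, prototypicality ratings, and stability coefficients are invented for illustration, and assigning a measure that falls exactly at the median to the "less prototypic" half is an assumption, since the text does not specify how such cases were handled.

```python
from statistics import mean, median


def split_at_median(ratings):
    """Split measure names into (more, less) prototypic at the median rating."""
    cut = median(ratings.values())
    more = [name for name, rating in ratings.items() if rating > cut]
    less = [name for name, rating in ratings.items() if rating <= cut]
    return more, less


def stability_gap(stability_consistent, stability_variable, names):
    """Mean temporal stability of self-rated consistent minus variable subjects."""
    return mean(stability_consistent[n] for n in names) - mean(
        stability_variable[n] for n in names
    )


# Invented example values, not the Carleton data.
ratings = {"attendance": 6.1, "punctuality": 5.4, "neat room": 3.0, "neat notes": 2.2}
consistent = {"attendance": 0.70, "punctuality": 0.72, "neat room": 0.50, "neat notes": 0.52}
variable = {"attendance": 0.45, "punctuality": 0.49, "neat room": 0.49, "neat notes": 0.51}

more, less = split_at_median(ratings)
print("more prototypic gap:", round(stability_gap(consistent, variable, more), 2))
print("less prototypic gap:", round(stability_gap(consistent, variable, less), 2))
```

The hypothesis predicts a clearly positive gap for the more prototypic measures and a gap near zero for the less prototypic ones.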

9 The measure of global self-perceived consistency was the subject's answer to the question (on a 0-6 scale), "How much do you vary from situation to situation in how conscientious you are about daily matters and responsibilities?"


To test these hypotheses, temporal-stability coefficients were obtained separately on each of the behavioral measures employed at Carleton for subjects who rated themselves as high (versus low) in variability. Subjects who perceived themselves as highly consistent rather than as more variable across situations had somewhat higher temporal stability across all the behavior measures. The mean temporal-stability coefficients were .68 and .55 for those high versus low in self-perceived consistency, respectively, t(32) = 1.82, p < .10 (two-tailed). There were no appreciable differences in the behavioral cross-situational consistency of those who saw themselves as high (r = .11) or low (r = .14) in consistency.

Most interesting, and central to our hypothesis, is the linkage between the global self-perception of consistency and the temporal stability of more prototypic behaviors. Ratings of prototypicality were available for 17 of the 19 Carleton behavior measures and allowed us to divide these measures into the "more" and the "less" prototypical (at the median of the total ratings for all items).10 Table 7 presents the links between the global self-perception of consistency and the behavioral data, divided into more prototypic versus less prototypic behaviors. The pattern of results was exactly as expected by the hypothesis. First, consider the more prototypic behaviors. Students who saw themselves as highly consistent in conscientiousness were significantly more temporally stable on these prototypic behaviors than those who viewed themselves as more variable (low variability, mean r = .71; high variability, mean r = .47), t(15) = 2.97, p


These results support the view that the impression of consistency in behavior may be rooted in temporally stable prototypic behaviors rather than in pervasive overall cross-situational consistencies. The findings suggest that individuals judge their degree of consistency from the temporal stability of the relevant, more prototypic behaviors. Interestingly, the two groups do not differ in temporal stability on the less prototypic behaviors, suggesting that those behaviors do not enter into the judgment of one's variability. It seems then that the locus of the perception of variability may be in the temporal stability of highly prototypic behaviors, regardless of cross-situational consistency. A tendency to overgeneralize from the observation of temporal stability in prototypic features to an impression of overall consistency would certainly be congruent with other tendencies to go well beyond observations in social inferences and attributions (e.g., Mischel, 1979; Nisbett, 1980; Nisbett & Ross, 1980; Ross, 1977; Tversky & Kahneman, 1974).

The consistency debate has been aptly characterized as reflecting a continuous conflict between the findings of research and the convictions of our intuitions (Bem & Allen, 1974, p. 508). After reviewing the issues and data on the debate, Bem and Allen concluded that "in terms of the underlying logic and fidelity to reality, we believe that our intuitions are right; the research, wrong" (p. 510). We hope that our present analysis helps to identify some of the roots of the conflict and the routes toward its resolution.
We believe that both the intuitions and the research have validity, but they are based on different data. The intuitions of cross-situational consistency are grounded in data, but these data, we suggest, are not highly generalized cross-situational consistencies in behavior: Rather, we propose the intuitions about a person's consistency arise from the observation of temporal stability in prototypical behaviors. The error is to confuse the temporal stability of key behaviors or central features with pervasive cross-situational consistency and then to overestimate the latter, a common tendency hardly confined to the layperson. Our compelling intuitions are based on consistencies in behavior, but perhaps not on the consistencies that the debate pursued for so many years.

The consistency paradox has been a puzzling and persistent barrier in the search for personality structure. But a theory of personality structure does not require everyone to be characterized by high levels of pervasive cross-situational consistency in behavior. It does require a structure for behavior: The Carleton data suggest to us that such structure may be rooted in the occurrence of temporally stable but cross-situationally discriminative features that are prototypic for the particular behavior category as perceived by the particular person. A close analysis of the patterning and organization of such features within individuals should be most interesting, and we plan such an analysis. We expect that the most consistent and prototypic exemplars of a category like conscientiousness will be those individuals who stably exhibit a number (but not necessarily many) of its prototypic features, as they themselves define that prototypicality. We expect that the particular constellations of features will be idiographically patterned so that no individuals necessarily share the identical configuration, although considerable between-person overlap occurs. Although the exact pattern that defines conscientiousness may not be identical for any two persons, each individual who is characterized as consistently conscientious will display some of its features with temporal stability, albeit with a distinctive cross-situational constellation.12 If so, the route may be open not only for seeing the uniqueness of each personality (which personologists have long appreciated) but also for ultimately understanding its common structure.

Conclusions

It is tempting to tire of the consistency debate, to trivialize it by focusing on its ob-

12 If these hypotheses prove valid, they also would help one understand why the issue of cross-situational consistency is a more serious problem for the nomothetic trait psychologist than for the laype
