
Created by Educational Testing Service (ETS) to forward a larger social mission, the Center for K–12 Assessment & Performance Management has been given the directive to serve as an independent catalyst and resource for the improvement of measurement and data systems to enhance student achievement.

Copyright 2010 Wireless Generation, Inc. and Institute for Learning. All rights reserved. No reproduction, use, or distribution of any part of this material without the specific authorization of Educational Testing Service.

An American Examination System

Lauren B. Resnick and Larry Berger



National Conference on Next Generation Assessment Systems
Content Provided by the Center for K–12 Assessment & Performance Management


The Problem

Over the past two decades, our country has been trying to build a standards-based accountability system as a foundation for a more equitable and higher-achieving education system. In practice, however, we have created a test-based accountability system that does not reflect the standards we aimed for at the beginning of the 1990s, much less today's fewer, clearer, higher Common Core Standards.

Several studies, using several different methodologies, have shown that the state tests do not measure the higher-order thinking, problem solving, and creativity needed for students to succeed in the 21st century. These tests, with only a few exceptions, systematically overrepresent basic skills and knowledge and omit the complex knowledge and reasoning we are seeking for college and career readiness.²

The misrepresentation of standards by most current accountability tests has had negative effects on teaching and learning, especially for poor and minority students. The tests carry consequences, and many educators serving poor students aim to raise test scores in the most direct (and, in some cases, the only) way they know: They provide practice on exercises that substantially match the format and content of their state's end-of-year accountability tests. These exercises often depart substantially from best instructional practice. Some studies have documented a systematic decline from fall to spring in the quality of instruction. In reading, for example, the complexity of texts that students engage with is lower in the same classrooms with the same children in March than in October. And there is less discussion of text and word meaning as teachers direct children through workbook exercises that mimic state test items (Anagnostopoulos, 2003; Koretz & Hamilton, 2006; McNeill, 2002). Principals and district administrators encourage this practice. They introduce interim assessments that largely mirror the end-of-year tests rather than model the kinds of performance intended by the standards. They do this because the tests count, and they are afraid that without practice, students will not do well enough to meet adequate yearly progress (AYP) requirements.

Calls now abound for even more frequent testing and for focusing teachers' attention early and often on which items their students are having difficulty answering on the interim assessments. But unless the process is guided by a fundamental understanding of what kind of teaching helps children acquire robust competence, we should not be surprised when the most frequent response to weak early test scores is to practice the test. Though no one intended to do so, we have created a testing bind that, as it tightens, drives attention away from the intended standards. The effects are greatest in the poorest schools. The nation's current approach to raising achievement and increasing equity in the education system is having an effect opposite from the intended one. It is trapping poor children in a basic skills teaching program that gives them little chance to acquire the deeper knowledge and abilities we seek for everyone. And it may be lowering the learning opportunities even for many more privileged children as schools turn their energies to the test-based basic skills program.

² The problem cannot be fixed by changing cut scores so that states no longer deem as proficient test performances that barely meet NAEP standards for basic levels of achievement. The tests are fundamentally misaligned with 21st-century expectations. For an analysis, see Resnick, Stein, and Coon (2008).



Many educators, parents, and citizens have responded by clamoring for an end to test-based accountability. Witness the one-sided reaction to a recent editorial in the New York Times written by Susan Engel (2010) calling for less testing and more play (and, by implication, less direct instruction) for children. A stream of supportive commentary by readers ensued, but none expressing concern about how to educate poor children, minority children, or English language learners to college-ready levels of achievement. Most of the children of responders to Engel's article would not be harmed by (and might even benefit from) a weakened accountability system. But the others, the ones no one spoke for in the New York Times exchange, could lose even the slender chances we now offer them.

A Solution

Testing and accountability should remain at the heart of national education policy. Equity and national prosperity depend on a system that will stretch educators, the education system, and communities to work toward high achievement and that will enable clear accountability when achievement goals are missed. But there should be new forms of assessment, functioning in new ways within the education system, to meet these needs. As early as 1992, scholars showed how, in many countries of the world, tightly linked examination and curriculum systems kept aspirations high, guided teachers in their work, and sometimes created pathways for young people who did not come from privileged families (Resnick & Resnick, 1992). The secret lay in charging teachers to prepare their students for exams and making sure that the exams were worth studying for. For the system to work, teachers and students needed to have a rough idea of the kinds of questions that would be posed on the exams, although not the specific questions that would appear. The systems also required trust that exam grades would be fair; that is, students would likely receive the same grade no matter who scored their written work (written essays predominated over short-answer and multiple-choice items because the countries valued the kinds of thinking that were displayed in such essays). Systems for checking on grade fairness (and allowing challenges in a few cases) varied among the countries studied, but all found ways of maintaining public trust in the system.

In this paper, we outline an American Examination System, one that reflects key aspects of the substantive, cognitively demanding European systems while maintaining the standards of psychometric rigor necessary to support America's accountability, comparability, and equity agendas.

The American Examination System we have in mind:

- models the kinds of instruction that are valued, so that preparing students for assessment works for rather than against high-cognitive-demand instruction;
- situates exams within the stream of ongoing instruction so that assessments support teaching rather than distract from it;
- ensures content and instructional validity of all assessments so that the alignment problems that have plagued state testing systems can be resolved;
- provides reliable and valid accountability measures for student, school, and educator performance;
- includes diagnostic tools for instruction to meet individual student needs;
- leverages advanced data collection and computational resources to mass-personalize the formative assessments, improving their precision and usefulness.

The American Examination System we outline would be educative for those who use it. It would not just tell us how well students, teachers, and schools are performing, but also teach teachers how to teach, teach students how to learn, and teach education organizations how to develop teaching expertise. It would meet this educative goal through a system that combines distributed accountability exams linked to specific topics for instruction with diagnostic, formative assessments designed for teacher use during instruction.

An online platform will make it possible to deploy and manage all of these elements at scale in a cost-effective way while minimizing additional burdens for teachers, students, and administrators. This online platform would be much more than a system for administering, scoring, and reporting on assessments. It can surround the "what" of assessment outcomes with useful representations of "so what?" (professional development) and "now what?" (more targeted instructional resources), so that everyone focuses on the consequential and instructional validity of assessment results and not just the accountability pressure.

Distributed Accountability Exams (DAEs)

Accountability data in this system would be derived from exams that are administered at intervals throughout the school year, occurring after students have completed a unit of study on particular content and skills as identified in the Common Core Standards and state standards. Accountability data would be reported on the basis of individual student, subgroup, class, school, and district, as well as across classes, schools, and districts. The types of tasks on the exams would be largely familiar to students, who would have worked on similar tasks in the course of instruction. But neither teachers nor students would know prior to the DAE exactly what questions would appear. Based on what is required by the new Common Core Standards, we expect three to five DAEs per year in mathematics and literacy at each grade, with each exam assessing material covered through 3–7 weeks of instruction, but the specifics of number and timing would need to be worked out with states.

The DAEs would model the kind of high-cognitive-demand performances intended by the Common Core Standards and rigorous state standards, as well as test basic procedural skills. In literacy, they would include extended written work and other open-ended expressions of student reasoning and thinking; in mathematics, they would include drawings, graphs, mathematical expressions, and explanations. They would assess basic knowledge both within these constructed performances and, where appropriate, in clusters of multiple-choice items. In addition to modeling high-cognitive-demand instruction, the DAEs would reflect what should be taught (specific topics determined by state and Common Core Standards).

The Common Core Standards provide a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories, or sequences of learning goals.³ They are specified at a granular size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments.

Tasks or items for the DAEs would be pretested and calibrated using standard classical and multidimensional item response theory (IRT) frameworks. In addition, each DAE would undergo a rigorous process of establishing content validity and instructional validity, processes that test theory often calls for but that are not part of standard procedure in most instances of education test design. As the project matures, tasks would be collected into item banks for use in future construction of DAEs. Information on student performance data, instructional targets, and the forms of instruction that reliably result in student learning would be shared with stakeholders, including parents and students, teachers, schools, testing administrators, and those responsible for preparing and selecting teachers.

Ideally, every student would take each DAE when he or she is ready, and not before. So the long-term goal should be to have enough alternate exams that students have more than one chance to take an exam (as they do for the New York State Regents).

At the outset, a more limited set of equivalent exams (two versions of each DAE) would be developed. The two versions, one administered before instruction and one afterwards, would be used by the assessment developers to establish the instructional validity of the exams. Availability of multiple forms of the DAEs would allow states and districts to use the content-based exams to plot student growth, along with teacher and school effectiveness. In addition, pre-instruction results could be used by teachers as part of the formative data they use to plan an instructional unit.

Figure 1, a diagram of how the DAEs might progress through the school year, shows how DAEs interact with formative assessments (described in subsequent sections) that are also integrated into the system.

Figure 1. Example of How Distributed Accountability Exams (DAEs) Might Progress in a School Year.

³ Some of the learning sequences in the standards are based on research conducted by multiple scholars over three decades. Others are based on well-honed intuitive judgments by expert scholars and practitioners. All will require further validation in use over the coming years. What is new and important in the current core standards effort is that the standards are organized into multidimensional sequences of learning that can inform both assessment and instruction.



The sixth- and seventh-grade Common Core Standards for mathematics specify five content areas:

- Ratios and Proportional Relationships
- The Number System
- Expressions and Equations
- Geometry (in sixth grade, Properties of Area, Surface Area, and Volume are explicitly named)
- Statistics and Probability

The Ratios and Proportional Relationships section for sixth-grade mathematics (see Appendix A) includes two parallel sets of standards, one for Mathematical Understanding and one for Mathematical Skill. In addition, there is a set of standards for Mathematical Practice that the standards writers intend to apply at all grade levels, although it is understood that the student performances representing good mathematical practice will look substantially different at different age/grade levels. Our DAEs would provide a valid and reliable picture of how students are progressing on the Mathematical Practice standards as well as on the specific content standards.

Figure 2 displays the sixth-grade standards in a visualization we call the honeycomb, which specifies our hypotheses about the interdependencies among them. The honeycomb, which we describe more fully below, serves as a visual representation (interactive map) of the instructional and assessment space that needs to be traversed in all grades, including the sixth, and also as a frame for assembling data on student performance in a manner that will support inferences about the progress of individual students, classes of students, schools, and school districts.

Taken together, the Mathematical Understanding, Mathematical Skill, and Mathematical Practice standards inform and constrain the assessments that would be built for the Distributed Accountability Exams. Assume, for purposes of developing an example, that sixth- and seventh-grade mathematics teaching will be divided into five units of instruction, one unit for each of the five content areas. One would thus need five content-specific exams in mathematics each year for sixth and seventh grades. The exams (like the instructional units they reference) might not be of equal length, because some of the standards cover more material than others. But we envision exams of 40 to 75 minutes in length, each geared to a teaching unit of 3 to 7 weeks.

An example of an exam covering the sixth-grade unit on Ratios and Proportional Relationships is included in Appendix A.



Figure 2. Visualization of Sixth-Grade Standards as a Honeycomb.

An English Language Arts Example

The English language arts/literacy standards can similarly be used to specify sequences of instructional units and assessments. The core standards are organized in grade bands rather than grade by grade. As in mathematics, skills and understandings are expected to develop over multiple years. In addition, there are guidelines for choosing texts; these guidelines use modern quantitative methods to characterize the cognitive and linguistic complexity of writing in several different genres.

Using all of these resources of the Common Core Standards, we have sketched distributed exams for English language arts; one example of an exam is in Appendix B.

Validity and Reliability in Distributed Examinations

The DAEs would be built to strong criteria of content and instructional validity. Each exam would provide a reliable estimate of student knowledge on the content of an instructional unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam scores for a year (e.g., five mathematics exams in each of Grades 6 and 7) would provide a valid estimate of the extent to which a student (class, school) has mastered the content specified by the standards for that year.



Content Validity

The exams would match closely, in both content and form, the content that is expected to be taught in each of the instructional units. New instructional units, explicitly linked to the Core standards, would be created to anchor the content validity of the units. Teams of independent content and instructional experts would review the model instructional units to ensure they match the standards and are of high instructional quality. The same teams would judge the alignment of exams to the model instructional units. This process would largely overcome the problem of weak alignment to standards that now troubles many state assessments. (States would not, however, be required to use the model instructional units in their actual classrooms.)

Instructional Validity

Assessments are considered instructionally valid when student performance improves after quality instruction on the content of the assessment. Although instructional validity is part of the gold standard for educational testing, it is almost never established in current assessment practice. We can do better.

We will apply strategies of in vivo (live classroom) research developed by the Pittsburgh Science of Learning Center. These scientific (experiment-based) research strategies can be used to establish whether each particular DAE, in fact, responds to good teaching. States and school districts using the DAE system would be able to validate DAEs against best-practice instruction developed by their most effective teachers.
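One simple way to operationalize this definition is to give the two versions of a DAE before and after a unit and check whether scores rise. The Python sketch below assumes paired scores on equated forms for the same students; the function name, the score scale, and the data are hypothetical, and a real validation study (such as the in vivo designs mentioned above) would also need comparison groups and controls. It computes the mean gain, a paired t statistic, and a standardized effect size.

```python
import math
import statistics


def prepost_gain(pre: list[float], post: list[float]) -> dict[str, float]:
    """Summarize paired pre/post scores on one DAE.

    pre[i] and post[i] must belong to the same student. A positive mean
    gain with a large t statistic is evidence (not proof) that the exam
    responds to instruction on its content.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired scores")
    gains = [b - a for a, b in zip(pre, post)]
    mean_gain = statistics.mean(gains)
    sd_gain = statistics.stdev(gains)
    if sd_gain == 0:
        raise ValueError("gains have zero variance")
    t_stat = mean_gain / (sd_gain / math.sqrt(len(gains)))
    return {
        "mean_gain": mean_gain,
        "t": t_stat,
        "effect_size": mean_gain / sd_gain,  # Cohen's d_z for paired data
    }


# Hypothetical scores (0-20 scale) for six students on the two DAE versions.
pre = [6, 8, 5, 9, 7, 10]
post = [11, 12, 9, 13, 10, 15]
result = prepost_gain(pre, post)
print(result)
```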

Reliability

DAEs would contain a mix of short constructed-response items and more extended written responses, along with sets of multiple-choice items as appropriate to the standard being examined. Short and long constructed-response components would require human scoring. Research has established that when constructed-response tasks are well targeted, scoring rubrics are specific, and graders are trained, a high level of interrater reliability can be attained (Mariano & Junker, 2007; Patz, Junker, Johnson, & Mariano, 2002; Ryan & Shepard, 2008).

Student responses on constructed-response items could be graded locally (within the same school but not by the student's own teacher) or by geographically and socially remote scorers (including teachers elsewhere in the district or state). These grades could be validated using one of a number of methods that have been used in European countries (e.g., cross-school or cross-state grading exercises; regrading of a sample of student papers at the state level). Teacher participation in grading exams and the related validation exercises (some of which could be face to face) creates a good process for professional learning, one that many countries use.
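When a sample of papers is regraded at the state level, agreement between the original and the regrade is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The paper does not specify which agreement statistic would be used; the sketch below is one common choice, and the rubric scores are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Cohen's kappa for two raters assigning categorical rubric scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same category
    # if each rated independently at their own marginal rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical 0-4 rubric scores: local grader vs. state regrade of the same papers.
local = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2]
state = [3, 2, 3, 1, 3, 2, 0, 4, 2, 2]
print(round(cohens_kappa(local, state), 2))  # 0.73
```

Raw agreement here is 0.80, but kappa is lower because some agreement would occur by chance. For ordered rubric levels, a weighted kappa, which gives partial credit for near-misses, is often preferred.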

DAEs open the possibility for increased use of constructed responses because they are distributed over the course of the year, yielding several times more opportunity to collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability.⁴ Yet to obtain these more reliable results, students would not have to sit for a 5-hour exam or even take an end-of-year exam, depending on how a particular state system is designed. They would just have to take unit exams as they normally would in the course of teaching, but now with each unit exam contributing to an overall accountability score.
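The reliability arithmetic in footnote 4 follows the Spearman-Brown formula, which projects the reliability of a test lengthened by a factor k from the reliability r of one part: k·r / (1 + (k - 1)·r). A minimal Python sketch (the function name is ours, for illustration) reproduces both figures in that footnote, under the formula's usual assumption that the combined parts are parallel measures of the same proficiency.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Projected reliability when a test is lengthened by a factor k.

    Assumes the combined parts are parallel (same true score, equal error
    variance), which is the standard Spearman-Brown assumption.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * reliability / (1.0 + (k - 1.0) * reliability)


# Five hour-long DAEs, each with reliability 0.7.
five_full = spearman_brown(0.7, 5)
# Half of each DAE's time spent on pretests or item calibration:
# the scored time is equivalent to 2.5 full DAEs.
five_half = spearman_brown(0.7, 2.5)

print(f"{five_full:.2f}")  # 0.92
print(f"{five_half:.2f}")  # 0.85
```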

In addition, we propose to use earlier assessment data to help produce more precise proficiency estimates for each DAE. This approach, similar to what is used in some online tutoring systems and adaptive testing systems, could make it possible to shorten many of the assessments with no loss in measurement precision (see Appendix C).
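The model in Appendix C is not shown in this section. As a stand-in, the general idea can be illustrated with a normal-normal (precision-weighted) Bayesian update, a standard technique we substitute here: the proficiency estimate from earlier DAEs acts as a prior, and the current DAE's estimate is combined with it. This assumes both estimates are on the same scale, their errors are independent and roughly normal, and the student's proficiency is consistent across units. The function name and all numbers are hypothetical.

```python
import math


def combine_estimates(prior_mean: float, prior_se: float,
                      exam_mean: float, exam_se: float) -> tuple[float, float]:
    """Precision-weighted (normal-normal) combination of two estimates.

    Returns the posterior mean and posterior standard error.
    """
    w_prior = 1.0 / prior_se**2
    w_exam = 1.0 / exam_se**2
    post_var = 1.0 / (w_prior + w_exam)
    post_mean = post_var * (w_prior * prior_mean + w_exam * exam_mean)
    return post_mean, math.sqrt(post_var)


# Suppose a full-length DAE alone would give SE 0.30 on a logit scale.
# Halving its length raises the SE by roughly sqrt(2), to about 0.42.
# Combined with a prior from earlier units (SE 0.45), the shorter exam
# yields a posterior SE close to the full-length exam's.
mean, se = combine_estimates(prior_mean=0.40, prior_se=0.45,
                             exam_mean=0.55, exam_se=0.42)
print(round(mean, 3), round(se, 3))
```

The design choice to weight by precision means a noisy short exam leans more on the prior; if the student's performance shifts sharply between units, the prior becomes misleading, which is why the consistency assumption matters.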

Distributed, content-validated, and instructionally validated exams are a next logical step in ending the testing bind and developing an assessment system that will detect and reward high-quality, effective teaching. Instead of supporting the use of practice materials that mimic the old end-of-year tests, states can provide high-quality instructional tools that help teachers prepare students for DAE examinations.⁵ There will be no need for the current crop of interim tests that simply mirror the end-of-year test, since DAEs and related formative assessments will occur throughout the school year at times that make instructional sense. With this system, we gain the ability to measure a set of higher-order skills that are not otherwise easily tested, including skills essential to college- and career-ready performance in reading, writing, and mathematics, without adding an enormous testing burden.

Educative Formative Assessments

The American Examination System will foster a rich environment of formative assessments that are educative in ways that directly resemble the summative system, but with more direct application to daily and weekly instruction.

- They would be aligned with the learning trajectories derived from the Common Core Standards, and thus aligned with what teachers need to teach.
- They would model approaches to how to teach and would, at the request of educators, provide teachers structured opportunities for gaining experience in using those teaching methods.
- Teachers would make these assessments part of their instructional routine, rather than an addition to it. Data entry and record-keeping burdens will be minimal, and teachers will have easy and quick access to student- and class-level reporting as well as tools to understand the instructional significance of that data. By tracking fidelity in the use of these diagnostic tools, the system will help teachers to use them appropriately.

⁴ For instance, if the reliability of each single (hour-long) DAE were 0.7, the reliability of five DAEs taken together would be 5 × 0.7 / (1 + 4 × 0.7) = 0.92. If, instead, half of each DAE's testing time were used for a pretest on the next instructional unit or simply for calibrating future test items, the combined reliability would be 2.5 × 0.7 / (1 + 1.5 × 0.7) = 0.85, still a high rate of reliability.

⁵ For a description of approaches to providing this kind of instructional guidance in forms that do not suppress teacher ingenuity and judgment, see Resnick (in press) and McConachie and Petrosky (2009).

Formative assessment tasks that cannot be machine scored will be accompanied by simple rubrics for quickly analyzing the student work. Teachers will be able to use digital devices to record these analyses. Through those devices, the teachers will also be provided with samples of answers that correspond to each level on the rubric, to help them calibrate their own analyses. As a form of professional development and to improve the reliability of analyses, teachers could also upload the student work into the system, along with their analyses, to get feedback from other teachers or subject matter experts.

A Mathematics Example

Educative formative assessments in mathematics will be designed recognizing that cognitively demanding tasks can typically be solved in many different ways. From a diagnostic perspective, it can be as important to know how a student is attempting to solve a problem as it is to know his or her answer. The problem-solving technique is, in many cases, part of what is specified in the standards. The sequence in which these techniques are used over time will often indicate a student's progress in understanding concepts and moving along a learning trajectory. So the Educative Formative Assessments would include items that capture this information and empower teachers to learn to recognize the different approaches that students take and their significance for differentiated instruction.

An example of this approach is the Ongoing Assessment Project (OGAP)⁶ in mathematics, a framework and system for analyzing the mathematical reasoning of elementary and middle school students as they solve problems. Teachers analyze written student work, looking for evidence of mathematical reasoning and increasing levels of sophistication as students progress along learning trajectories. The diagnostic and instructional utility of the items is enhanced by examining the thinking and strategies that went into solving them. Fewer items can be used to produce far richer results because the underlying thinking is surfaced and made apparent to the teacher. Figure 3 illustrates how teacher-facing software enables quick analysis and recording of meaningful attributes of student work: correctness of response, sophistication of the reasoning (along a trajectory from additive to transitional to multiplicative strategies), and any errors or misconceptions that emerge. These tools and interfaces can also support remote analysis when student work is digitized and routed.
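The attributes recorded by the teacher-facing software can be pictured as a small structured record per piece of student work. The sketch below is a hypothetical data model, not OGAP's actual schema: the strategy levels follow the additive, transitional, and multiplicative trajectory named above, and the summary function shows one way such records could be aggregated for a teacher.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import IntEnum


class Strategy(IntEnum):
    """Sophistication of reasoning, ordered along the trajectory."""
    ADDITIVE = 1
    TRANSITIONAL = 2
    MULTIPLICATIVE = 3


@dataclass
class WorkAnalysis:
    """One teacher analysis of one student's response to one item."""
    student_id: str
    item_id: str
    correct: bool
    strategy: Strategy
    misconceptions: list[str] = field(default_factory=list)


def class_summary(records: list[WorkAnalysis]) -> dict[str, object]:
    """Counts a teacher might scan before planning the next lesson."""
    if not records:
        raise ValueError("no analyses recorded")
    return {
        "percent_correct": 100 * sum(r.correct for r in records) / len(records),
        "strategies": Counter(r.strategy.name for r in records),
        "misconceptions": Counter(m for r in records for m in r.misconceptions),
    }


# Hypothetical analyses of three students' work on one ratio item.
records = [
    WorkAnalysis("s1", "ratio-7", True, Strategy.MULTIPLICATIVE),
    WorkAnalysis("s2", "ratio-7", False, Strategy.ADDITIVE, ["additive comparison"]),
    WorkAnalysis("s3", "ratio-7", True, Strategy.TRANSITIONAL),
]
print(class_summary(records))
```

Keeping strategy as an ordered enum, separate from correctness, reflects the point above: a correct answer reached with an additive strategy tells the teacher something different from one reached multiplicatively.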

In general, it will be essential to ensure that formative assessment results are not included in accountability reporting, to eliminate the incentives for misuse. We envision that the student-, class-, and school-level results would be available to teachers, coaches, and perhaps principals (to inform professional development as well as instruction), but not to district/state administrators.

⁶ OGAP was developed as a part of the Vermont Mathematics Partnership, funded by the U.S. Department of Education (Award number S366A020002) and the National Science Foundation (Award number HER0227057).



Figure 3. Teacher-Facing Software.

However, metrics of fidelity in implementing the formative assessments (and their associated instructional recommendations) could be used as part of teacher/school performance management and accountability. For instance, are teachers doing progress monitoring with the frequency appropriate for each student, given the longitudinal data about that student? Principals and district/state officials should have access to this type of information in real time, so they can spot where there may be weak instructional capacity and provide timely interventions (including targeted professional development). They will want to see whether teachers are using the formative system in the way in which, and as often as, it should be used. (DC Public Schools is an example of a school system that is already using these types of formative assessment metrics as part of its SchoolStat approach to continuous, districtwide performance management.)
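The division of access described here (formative results visible to teachers, coaches, and perhaps principals, while fidelity metrics are visible to district and state officials as well) can be expressed as a small policy table. The roles, data categories, and the fidelity measure below are hypothetical illustrations, not specifications from the paper; the per-student required counts would in practice come from each student's longitudinal data.

```python
from enum import Enum


class Role(Enum):
    TEACHER = "teacher"
    COACH = "coach"
    PRINCIPAL = "principal"
    DISTRICT = "district"
    STATE = "state"


# Hypothetical policy: which roles may see each category of data.
VISIBILITY = {
    "formative_results": {Role.TEACHER, Role.COACH, Role.PRINCIPAL},
    "fidelity_metrics": set(Role),
    "dae_results": set(Role),
}


def can_view(role: Role, category: str) -> bool:
    """Unknown categories are hidden from everyone by default."""
    return role in VISIBILITY.get(category, set())


def monitoring_fidelity(checks_done: dict[str, int],
                        checks_required: dict[str, int]) -> float:
    """Share of students who received at least their required number of
    progress-monitoring checks in a period."""
    if not checks_required:
        raise ValueError("no students to evaluate")
    met = sum(checks_done.get(s, 0) >= need for s, need in checks_required.items())
    return met / len(checks_required)


print(can_view(Role.DISTRICT, "formative_results"))  # False
print(can_view(Role.DISTRICT, "fidelity_metrics"))   # True
print(monitoring_fidelity({"a": 4, "b": 1, "c": 2}, {"a": 4, "b": 2, "c": 1}))
```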

In addition, the American Examination System platform would provide researchers with longitudinal data, including formative assessment data, organized by student/teacher/school/subgroup.⁷ In particular, these data would be used as part of the research to support continuous improvement of the system: to fine-tune the learning trajectories, the measures of proficiency for each standard, and the algorithms for mass customization of assessments.

⁷ All data would be anonymous to protect privacy (and prevent the formative data from being used for accountability). Researchers will be able to see that Student A had Teacher X in School Y and see data available for A, X, and Y, but not the identity of those individuals and institutions.
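One common way to give researchers linked but non-identifying records of the kind footnote 7 describes is keyed hashing: each real identifier is replaced by an HMAC computed with a secret key held only by the platform operator, so the same student always maps to the same pseudonym, but the mapping cannot be reversed without the key. This is a sketch of one technique, not a description of the proposed platform; the key, prefixes, and record fields are hypothetical. Note that pseudonymization alone is not full anonymization, since other fields can still allow re-identification.

```python
import hashlib
import hmac


def pseudonym(secret_key: bytes, kind: str, real_id: str) -> str:
    """Stable, non-reversible pseudonym for a student, teacher, or school ID."""
    digest = hmac.new(secret_key, f"{kind}:{real_id}".encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return f"{kind}-{digest[:12]}"


def research_view(record: dict[str, str], secret_key: bytes) -> dict[str, str]:
    """Replace identifying fields with pseudonyms; keep assessment data."""
    out = dict(record)
    for kind in ("student", "teacher", "school"):
        out[kind] = pseudonym(secret_key, kind, record[kind])
    return out


KEY = b"held-only-by-the-platform-operator"  # hypothetical; store securely in practice
row = {"student": "S-1042", "teacher": "T-77", "school": "SCH-3",
       "skill": "unit_rate", "score": "3"}
print(research_view(row, KEY))
```

Because the pseudonyms are stable, a researcher can still see that the same (unnamed) student had the same (unnamed) teacher across records.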



We expect that formative assessment fidelity data will be especially useful to researchers. Many instructional innovations, when tested under real classroom circumstances, fail to show impact, and researchers wonder whether the lack of results was because of poor design or simply because the teachers did not implement the innovation correctly. In the field of learning research, scholars are pointing to the need for researchers to distinguish between poor design and poor implementation. They make the comparison with pharmaceutical trials, where a prerequisite for testing medical efficacy is knowing which of the trial patients took the correct dosage (Rowan, Correnti, Miller, & Camburn, 2009).

A New Paradigm for Educational Measurement: Adaptive Mass Personalization

We believe that an advanced model of educational measurement can be built on a foundation of gathering an order of magnitude more data (both informal and formal) about each student in the course of the year, so that each test merely enhances the resolution of a picture that is substantially complete before the test begins. Moreover, by applying the tools of mass personalization already so prevalent in Internet-based commerce and social networking, we will eventually be able to personalize each assessment at the individual level, so that the enhanced resolution it provides is targeted to an individual student's current learning level as well as to appropriate standards of reliability and validity. That is, the system can keep asking questions until it knows enough to be instructionally helpful to the student and the teacher, and until it knows enough to support relevant policy and accountability decisions.
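"Keep asking questions until it knows enough" corresponds to computerized adaptive testing with a precision-based stopping rule. The sketch below is one standard version, not the paper's design: it assumes a Rasch (one-parameter IRT) model, picks the unused item with maximum Fisher information at the current estimate, updates an expected a posteriori (EAP) estimate on a grid with a standard normal prior, and stops once the posterior standard deviation falls below a target. The item bank, the simulated student, and the 0.4 threshold are hypothetical.

```python
import math
import random


def p_correct(theta: float, b: float) -> float:
    """Rasch probability that a student at theta answers an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item; largest when b is near theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses, grid):
    """Posterior mean and SD of theta over a grid, using a N(0, 1) prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for b, x in responses:
            p = p_correct(t, b)
            w *= p if x else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def adaptive_test(bank, answer, se_target=0.4, max_items=40):
    """Give items until the posterior SD falls below se_target or limits are hit.

    bank: item difficulties; answer: callable returning 1 (correct) or 0.
    Returns (theta estimate, posterior SD, number of items given).
    """
    grid = [-4.0 + 0.05 * i for i in range(161)]  # theta from -4 to 4
    remaining = list(bank)
    responses = []
    theta, sd = 0.0, 1.0  # prior mean and SD before any item
    while remaining and len(responses) < max_items and sd > se_target:
        b = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, sd = eap_estimate(responses, grid)
    return theta, sd, len(responses)


rng = random.Random(0)
TRUE_THETA = 0.8  # hypothetical student


def simulated_student(b):
    return int(rng.random() < p_correct(TRUE_THETA, b))


bank = [-3.0 + 0.1 * i for i in range(61)]  # hypothetical difficulties, -3 to 3
theta_hat, sd_hat, n_items = adaptive_test(bank, simulated_student)
print(f"theta={theta_hat:.2f}, sd={sd_hat:.2f}, items={n_items}")
```

In the paper's vision the prior would come from the student's earlier data rather than a generic N(0, 1), which would let the test stop sooner.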

Standardization Versus Personalization

Standardization was the engine of the factory model that drove the economy of the 19th and 20th centuries (Resnick & Resnick, 1977, 1980). Now the powerful drivers of the economy are personalization and customization, often applied in direct contradiction to a previously valued standardized offering. Amazon.com, for example, learns what you like to read and offers an increasingly personalized bookstore just for you that becomes more precise over time. The video rental chain Netflix has now hosted several international competitions for improving its personalization engine.

The statistical engines underlying personalization on the World Wide Web are distinct from those underlying standardized testing, but they are now entirely robust and proven; indeed, they are tested and refined on a daily basis in large-scale commerce, large-scale medical research, and financial market predictions. It is time to bring these ideas to education in ways that will dramatically improve the precision with which our formerly standardized tests fulfill their standard purposes, while simultaneously expanding their usefulness to inform daily instruction, to diagnose individual patterns in student learning, and to surround students with supports that are personalized to their needs.

Because the American Examination System aims to administer all types of assessments for a very large number of students over a period of multiple years, across multiple states, and can take account of various other education data, it should be able to serve as an engine for mass personalization of these assessments. Attributes that could be the basis of personalization include past student performance on assessments, teacher and school characteristics, aggregated assessment performance of students in a school, previous effectiveness of the teacher, which curriculum was used, and which assessments have been used. This technology is scalable: computing power is such that there is no practical limit on the amount of education data that could be included, and the more states and types of data are included, the more precise (and useful) the customization becomes.

The initial goal for mass personalization would be to apply it to formative assessment. There are already many modalities of formative assessment in use (diagnostic, progress monitoring, screening), each including a mix of assessment types (multiple choice, constructed response, observation). Some of these are best delivered as part of group activities and some one-on-one between a single student and teacher. Many teachers and districts use a blend of these formative assessments, which makes sense given the diverse needs of particular students at different moments of their academic development; but many other teachers who are not themselves experts in formative assessment methodologies struggle to decide how best to integrate all of these choices into their teaching routines for their particular students.

So, in addition to providing new educative formative assessments, the American Examination System would mass-customize a much wider range of formative assessments at the student and class level. This is adaptive assessment at the level above individual items: it figures out which formative assessment to give and when, enabling teachers to get just the right next piece of information they need about their students without wasting a lot of classroom or other school time. With this platform, teachers will be blending modes of assessment in individualized ways, varying what data they collect and how, based on what is known so far about each student. To support this, the system will host a bank of formative assessment materials, from open-source or commercial sources, to cover the full range of diagnostic options a state or school district wishes to use.
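Adaptivity at the level above individual items can be sketched as choosing, from a bank of whole formative assessments, the one expected to reduce the teacher's uncertainty most per minute of class time, among those that fit the time available. Everything here (the bank entries, the per-skill uncertainty values, the "information" field, and the scoring rule) is a hypothetical illustration of the idea, not a design from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormativeAssessment:
    name: str
    skill: str
    minutes: int
    information: float  # hypothetical expected reduction in uncertainty


def next_assessment(bank: list[FormativeAssessment],
                    uncertainty: dict[str, float],
                    minutes_available: int) -> Optional[FormativeAssessment]:
    """Pick the assessment with the best uncertainty-weighted information per
    minute among those that fit the time available (None if nothing fits)."""
    candidates = [a for a in bank if a.minutes <= minutes_available]
    if not candidates:
        return None
    return max(candidates,
               key=lambda a: uncertainty.get(a.skill, 0.0) * a.information / a.minutes)


bank = [
    FormativeAssessment("unit-rate exit ticket", "unit_rate", 5, 0.5),
    FormativeAssessment("ratio-table interview", "ratio_tables", 15, 2.0),
    FormativeAssessment("percent screener", "percent", 20, 2.5),
]
# How unsure the system currently is about this student's standing on each skill.
uncertainty = {"unit_rate": 0.2, "ratio_tables": 0.9, "percent": 0.6}
print(next_assessment(bank, uncertainty, 15).name)  # ratio-table interview
```

With only 10 minutes available, the same rule falls back to the exit ticket; with 3 minutes, nothing fits and the function returns None.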

The mass personalization process can also add to the reliability and efficiency of DAEs. Appendix C shows how a standard statistical model can use data from previous DAEs to make the next DAE more efficient, as long as the student is behaving consistently from one unit to the next. If the student seems to be performing unusually well (or poorly), then the model can detect this and suggest a customization of the DAE to further explore what the student knows and can do.
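A simple version of that consistency check compares a student's score on the new DAE with the score predicted from earlier units and flags results that fall far outside the expected range. In this sketch, which is not the Appendix C model, the prediction is just the mean of the student's earlier DAE scores, the uncertainty combines the spread of those scores with a hypothetical measurement SE, and the 2.0 threshold is an arbitrary illustration.

```python
import math
import statistics


def flag_unusual(previous: list[float], current: float,
                 measurement_se: float, threshold: float = 2.0) -> str:
    """Label a new DAE score as consistent, unusually high, or unusually low.

    previous: the student's scores on earlier DAEs (same scale as current).
    measurement_se: assumed standard error of a single DAE score.
    """
    if len(previous) < 2:
        return "insufficient history"
    predicted = statistics.mean(previous)
    spread = statistics.stdev(previous)
    # Conservative: the observed spread already includes some measurement error.
    sd = math.sqrt(spread**2 + measurement_se**2)
    z = (current - predicted) / sd
    if z > threshold:
        return "unusually high"
    if z < -threshold:
        return "unusually low"
    return "consistent"


history = [0.2, 0.35, 0.25, 0.3]  # hypothetical proficiency estimates
print(flag_unusual(history, 0.32, measurement_se=0.15))  # consistent
print(flag_unusual(history, 1.10, measurement_se=0.15))  # unusually high
```

A flag would not change the accountability score by itself; in the paper's framing it would prompt a customized follow-up to explore what the student knows and can do.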

The Assessment Platform

The assessment platform manages both parts of the system (the DAEs and the educative formative assessments) to enable assessment delivery, scoring, reporting, and analysis. Based on widespread classroom experience with existing products and on current designs⁸ (some of which have been funded by the Gates Foundation), it will be able to handle all of these elements at scale in a cost-effective way, while minimizing additional burdens for teachers, students, and administrators.

⁸ The authors wish to acknowledge the support of the Gates Foundation in conceptualizing a next-generation assessment platform and for more generally advancing the field of aligned units of curriculum and assessment.



Honeycomb

The American Examination System will provide a honeycomb: an interactive map of learning trajectories and our hypotheses about the dependencies among them. The honeycomb offers a visual representation of the instructional and assessment space that needs to be traversed in each grade as well as across grades, all the way from pre-K through Grade 12. It provides a frame for assembling data on student performance in a manner that will support inferences about the progress of individual students, classes of students, schools, and school districts. It will also support research to validate or refine the hypotheses about dependencies among the skills (within and across trajectories) in the Common Core Standards and similar state standards, for instance, identifying what level of which specific literacy skills are needed to achieve mastery of which mathematics skills.
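The honeycomb's hypothesized dependencies can be pictured as a directed graph. In this picture, each hexagon (a skill step or standard) lists the hexagons believed to be its prerequisites, possibly in another trajectory. The sketch below is a minimal illustration under that assumption. The hexagon identifiers are hypothetical and are not drawn from the Common Core or from the platform's actual data model. It uses the standard-library `graphlib` module to produce one valid teaching order. It also lists the unmastered prerequisites that might explain a student's difficulty with a given hexagon.

```python
from graphlib import TopologicalSorter

# Hypothetical hexagons: each key is a skill step; its value is the set of
# skill steps hypothesized to be prerequisites (possibly in another subject).
PREREQS: dict[str, set[str]] = {
    "lit.read_multistep_text": set(),
    "math.ratio_concepts": set(),
    "math.unit_rates": {"math.ratio_concepts"},
    "math.ratio_word_problems": {"math.unit_rates", "lit.read_multistep_text"},
}


def teaching_order(prereqs: dict[str, set[str]]) -> list[str]:
    """Return one ordering in which every hexagon follows its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())


def unmet_prerequisites(hexagon: str, prereqs: dict[str, set[str]],
                        mastered: set[str]) -> set[str]:
    """Return every direct or indirect prerequisite not yet mastered."""
    missing: set[str] = set()
    seen: set[str] = set()
    stack = list(prereqs.get(hexagon, ()))
    while stack:
        skill = stack.pop()
        if skill in seen:
            continue
        seen.add(skill)
        if skill not in mastered:
            missing.add(skill)
        stack.extend(prereqs.get(skill, ()))
    return missing
```

Consider a student who has mastered only `math.ratio_concepts`. For that student, `unmet_prerequisites("math.ratio_word_problems", PREREQS, {"math.ratio_concepts"})` returns the unit-rates step and the literacy step. This is the kind of cross-trajectory dependency the honeycomb is meant to surface.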

The American Examination System would give educators summative and formative assessments for each skill step along each learning trajectory, starting with mathematics and literacy for Grades 3–10. Other assessment data (for instance, existing formative assessments for pre-K through Grade 3 students or high school exams) can also be mapped onto the learning trajectories. All of this data can be included in the honeycomb so that teachers, parents, and the students themselves can track individual student progress (and the extent to which students are on track) toward college and career readiness.

The honeycomb builds on one of the intrinsic advantages of the American Examination System, which is that it offers a highly coherent and integrated package of summative and formative assessments. In particular, the system's rapid scoring workflow and reporting interface would enable educators to use the DAE results for diagnostic purposes at the individual student and class level. For example, where students have written an essay, teachers would be able to see whether students can write the sort of complex sentences and can make arguments out of ideas that are appropriate for the grade's learning trajectory. The pre-tests for each exam would be especially useful in this regard because the pre-tests assess the topics and standards that the teacher is about to teach.

Each hexagon of the honeycomb could also link to instructional resources (including video exemplars and social networking/collaboration). See Figures 4 and 5.

This tool can be adapted for use in any state whose standards include learning trajectories comparable to those in the Common Core Standards. We envision that there would be two measures of proficiency indicated for each skill/hexagon: the first based on formative (no-stakes) data and the second based on summative (high-stakes) data.

Putting Power and Choice in the Hands of Teachers

The platform will include an assignment builder, so that educators can select formative assessment items as tasks for use by the students in the classroom or as homework. This allows teachers to focus student work on the particular concepts and skills that the students need to develop. So, for instance, a teacher could drill down from a specific honeycomb hexagon (Common Core Standard) to build an assignment for a subset of her students. See Figures 6 and 7.
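One way to picture the drill-down is as a filter. It takes the items tagged to the chosen hexagon's standard and pairs them with the students whose formative (no-stakes) estimate on that standard is below a cut-off. This is a minimal sketch under assumptions. The `Item` record, the `build_assignment` function, the 0.6 cut-off, and the item and student identifiers in the usage note are all hypothetical, not the platform's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    standard: str    # Common Core code the item is tagged to
    item_type: str   # e.g. "multiple_choice" or "constructed_response"


def build_assignment(standard: str, items: list[Item],
                     formative_scores: dict[str, dict[str, float]],
                     threshold: float = 0.6, max_items: int = 5) -> dict:
    """Select items for one hexagon and the students who most need them.

    formative_scores maps student id -> {standard: proportion correct, 0-1}.
    Students with no formative data on the standard are treated as 0.0,
    so they are included rather than silently skipped.
    """
    chosen = [it for it in items if it.standard == standard][:max_items]
    students = sorted(
        sid for sid, by_standard in formative_scores.items()
        if by_standard.get(standard, 0.0) < threshold
    )
    return {"standard": standard, "items": chosen, "students": students}
```

For example, `build_assignment("6.RP.A.3", items, scores)` returns up to five items tagged to that standard. It also returns, sorted, the IDs of students scoring below 0.6 on it or lacking data for it.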



Figure 4. Honeycomb for Mathematics, Sixth Grade.

Figure 5. Each Hexagon of the Honeycomb Could Also Link to Instructional Resources.



Figure 6. Assignment builder.

Figure 7. Individual assignment.



Other Platform Tools

In addition to providing the honeycomb and tools to support mass customization, the American Examination System platform will:

• Enable students to take the assessments online or on paper;
• Enable teachers/schools to scan and upload paper-based assessments and other student work;
• Manage the remote scoring workflow and provide a scoring interface for remote raters;
• Provide teachers with a scoring interface (which could include the ability to mark up student work and record notes) and a reporting (gradebook) interface;
• Provide dashboard tools for tracking and analyzing the progress of particular students and groups of students;
• Provide principals and district/state administrators with a reporting interface that includes aggregate analysis (cross-class, cross-teacher, cross-school, cross-district, cross-state, and demographic comparisons, each with a longitudinal dimension that includes value-added on end-of-year high-stakes tests);
• Allow users to generate custom reports in real time on demand, with both teacher and principal/administrator interfaces;
• Allow teachers to share formative assessments with each other and with experts to gain instructional advice and create opportunities for professional development; and
• Provide role-based access rights (including to protect student privacy).⁹

Thus, the system will gather and provide ready access to accountability information. It will also help teachers and schools improve learning, as measured against rigorous standards, through good instructional practices. It would cover the full trajectory from pre-K through Grade 12.
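Role-based access rights are commonly implemented as a table that maps roles to permissions. That table is combined with checks that depend on the relationship between the user and the record. The sketch below assumes four illustrative roles and four permissions; all are hypothetical, not the platform's actual policy. Under this policy, remote raters may see and score only anonymized work, and identified student work is visible only to the student's own teacher.

```python
from enum import Enum, auto


class Permission(Enum):
    VIEW_IDENTIFIED_WORK = auto()    # student work with the name attached
    VIEW_ANONYMIZED_WORK = auto()    # work as routed to remote raters
    SCORE_WORK = auto()
    VIEW_AGGREGATE_REPORTS = auto()


# Hypothetical policy table; real roles and rights would be set by the state.
ROLE_PERMISSIONS: dict[str, frozenset[Permission]] = {
    "teacher": frozenset({Permission.VIEW_IDENTIFIED_WORK, Permission.SCORE_WORK}),
    "remote_rater": frozenset({Permission.VIEW_ANONYMIZED_WORK, Permission.SCORE_WORK}),
    "principal": frozenset({Permission.VIEW_AGGREGATE_REPORTS}),
    "state_admin": frozenset({Permission.VIEW_AGGREGATE_REPORTS}),
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Check the role table; unknown roles get no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


def can_view_identified_work(role: str, user_id: str, owning_teacher_id: str) -> bool:
    """Named student work is visible only to that student's own teacher."""
    return (is_allowed(role, Permission.VIEW_IDENTIFIED_WORK)
            and user_id == owning_teacher_id)
```

Keeping the role table as data, separate from the checks, lets a state change who sees what without touching the checking code.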

The American Examination System would not assume that all assessments will always be conducted with students sitting at computers. Given current school infrastructure, and given the challenge of showing mathematics work via keyboard, it may be more efficient to continue to rely to some extent on paper-and-pencil inputs to an otherwise digital system. The continued value of these primitive recording tools seems especially compelling when one considers that much of the value of the new generation of assessment tasks depends on soliciting open-ended expressions of student reasoning and thinking. In the case of mathematics, these include drawings, graphs, and explanations.

⁹ To ensure protection of student privacy rights, the system has the capacity to make digitized student work anonymous before routing it to remote scorers.



So the American Examination System would include a process to enable scanning/digital photographing, uploading, and archiving of very large volumes of paper-based student work, including for Distributed Accountability Exams, to enable remote scoring as well as online student portfolios. The scanning/photographing process, which has already been tested in North Carolina classrooms, puts minimal burdens on teachers or other school staff and does not require large per-school investments in hardware or network infrastructure.

For the foreseeable future, assessment of open-ended expressions of student reasoning and thinking will require at least some element of human scoring. Doing this rigorously and reliably requires finding a cost-effective and time-effective workflow for directing the work to remote scorers, especially in a summative context where there are stakes for teachers and schools as well as for students. That workflow includes cross-school or cross-state grading/validation exercises and regrading of a sample of student papers at the state level.

The American Examination System platform enables this workflow. It automates delivery of digitized student work (including paper-and-pencil work) to raters and to those validating the ratings. Student identity is kept private (the raters do not know whose work it is). The online interface for remote raters presents them with the student work alongside scoring forms based on the rubric appropriate for that type of work. See Figure 8.
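The routing step can be sketched as three operations:

• replace the student's identity with a keyed pseudonym;
• assign each paper to a rater from a different school;
• set aside a random sample of papers for validation regrading.

This is a minimal sketch under stated assumptions. The `Paper` record, the `route` function, the 10% validation rate, and the out-of-school rule are hypothetical. HMAC-SHA256 from the standard library stands in for whatever pseudonymization the platform actually uses. The sketch also assumes that names have already been removed from the scanned images.

```python
import hashlib
import hmac
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    student_id: str
    school: str
    image_uri: str   # scanned page, assumed already stripped of the name


def pseudonym(student_id: str, key: bytes) -> str:
    """Keyed hash: stable for a student, not reversible without the key."""
    return hmac.new(key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def route(papers, raters, key: bytes, validation_rate: float = 0.1, seed: int = 0):
    """Assign each anonymized paper to a rater from a different school.

    raters is a list of (rater_id, school) pairs. Returns the assignments
    and a random subset of them to be re-graded for validation.
    """
    rng = random.Random(seed)
    assignments = []
    for paper in papers:
        eligible = [rid for rid, school in raters if school != paper.school]
        if not eligible:
            raise ValueError(f"no out-of-school rater for school {paper.school}")
        assignments.append({
            "token": pseudonym(paper.student_id, key),  # no name, no student id
            "image_uri": paper.image_uri,
            "rater": rng.choice(eligible),
        })
    if not assignments:
        return [], []
    # Always validate at least one paper, never more than exist.
    k = min(len(assignments), max(1, round(validation_rate * len(assignments))))
    return assignments, rng.sample(assignments, k)
```

Because the pseudonym is keyed and stable, scores can later be joined back to the student by whoever holds the key. The raters themselves never see the identity.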

The platform will allow teachers, principals, districts, and potentially parents and the students themselves to generate custom reports in real time on demand. These reports would aggregate longitudinal data from different Distributed Accountability Exams and formative assessments to provide a more complete picture of each student, class, and school.

Figure 8. The Online Interface for Remote Raters for the American Examination System.



Development and Costs

Our vision for the American Examination System is ambitious. What makes it realistic is the substantial amount of work that has already been done in developing the content and tools needed to make it work.

For instance, IFL has extensive experience in developing model instructional units such as the ones that will be part of the system, and in working with school systems to tailor the units to local needs and preferences (McConachie & Petrosky, 2010). IFL units (and accompanying assessments) are built into the curriculum guidance system of several urban school districts. They have been shown to produce high levels of teacher engagement and improved instruction when accompanied by appropriate forms of professional training (David & Greene, 2008; Resnick, in press; Talbert & David, 2008).

Many of the required technologies are already in use in existing assessment and data management applications. Others are now being developed through Wireless Generation and through various initiatives of the Bill and Melinda Gates Foundation to create aligned systems of curriculum and formative assessment. Thus, for instance, much of the platform for authoring and administering cognitively demanding assessment items at scale will be available for online use in December 2010.

The system we describe is one that will operate fully about 3 years from the beginning of the process, with mass personalization of summative assessment playing a larger role at the end of that time frame. Much of the system, including the Distributed Accountability Exams, Educative Formative Assessments, and other aspects of the technology platform, will be operational in 2 years.

Platform

Based on the direct experience of Wireless Generation in building a system of comparable complexity (ARIS, the education information system for the country's largest public school system), we estimate the following timeline:

• a secure and scalable version of the initial platform can be available for use 6 months after work on the project formally begins;
• additional functionality would be available after 12 months;
• a comprehensive system would be in use at scale in 18 months.

Additional development, related to the research and rollout of the mass-personalized aspect of the assessments, would take place within a longer time frame (36 months).

Assessments

The platform could be used to develop and roll out assessments in three phases:

• establishment of content validity (for both the DAEs and the Educative Formative Assessments);
• establishment of instructional validity (for the DAEs);
• use of the system for summative accountability purposes.

States should be consulted to determine which grades and what subjects to prioritize.

We anticipate that, after 12 months, the Distributed Accountability Exams, including the units for model instruction, for use in Grades 3–10, will receive sign-off on content validity by State Departments. The Educative Formative Assessments could begin to be used during this first year after the content is validated.



During months 13–24, or sooner if possible, we will conduct the experiments to establish instructional validity, beginning as soon as content validity is established.

After this 24-month period, the Distributed Accountability Exams would be used for summative accountability purposes.

Operational Costs

We estimate that, for a typical state, the ongoing costs of the system will be about the same as for current NCLB tests. Current expenditures to cover reading and mathematics are typically $20–$30 per student, and in some cases higher than $80 (U.S. Department of Education, 2010).

The Distributed Accountability Exams (including the pre-instruction version) will cost more to develop, administer, and score than the current high-stakes tests, if for no other reason than their frequency. But the current interim exams, and the expenses associated with them (typically $15–$20 or more per student per year), could be eliminated.

Teachers within the school district could score the exams of each other's students, but a significant portion of the ongoing cost would come from validating samples of teacher scoring.

Apart from the provision of tablet/handheld and scanning devices for teachers (off-the-shelf, industry-standard technologies that are coming down in price each year), costs associated with maintaining the technology platform would be minimal on a per-student basis.

Key System Characteristics

Rigorous Standards and Good Instructional Practices

The new Common Core Standards provide a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals. They are specified at a grain size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments to judge progress toward college and career readiness.


The American Examination System includes Distributed Accountability Exams, for use over the course of the school year. These exams measure the specific higher-order skills articulated in the Common Core Standards and state standards, as well as basic knowledge. The Distributed Accountability Exams will include extended written work and other open-ended expressions of student reasoning and thinking; in mathematics, these would include drawings, graphs, and explanations. They will assess basic knowledge both within these constructed performances and, where appropriate, in clusters of multiple-choice items. After 24 months, these tests will begin to replace current summative tests for accountability purposes.

The DAEs will reflect what should be taught (specific topics determined by state and Common Core Standards). Distributed Accountability Exams will address each of the skills/topics articulated for each year of the state and common standards. In the first wave, there will be Distributed Accountability Exams for mathematics and literacy for Grades 3–10. After that, sample items would be published, and invitations extended for participatory authorship of assessment items that relate to the standards being tested and to the particular item and assessment types.

The DAEs would be built to strong criteria of content and instructional validity. Each exam would provide a reliable estimate of student knowledge on the content of an instructional unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam scores for a year (e.g., five mathematics exams in each of Grades 6 and 7) would provide a valid estimate of the extent to which a student (class, school) has mastered the content specified by the standards for that year.


The exams would match closely, in both content and form, the content that is expected to be taught in each of the instructional units. New instructional units, explicitly linked to the Core standards, would be created to anchor the content validity of the exams. Teams of independent content and instructional experts would review the model instructional units to ensure that they match the standards and are of high instructional quality. The same teams would judge the alignment of the exams to the model instructional units. This process would largely overcome the problem of weak alignment to standards that now troubles many state assessments.



instruction on the content of the assessment. Our development process would include tests of instructional validity, similar to the experiment-based ones used by the Pittsburgh Science of Learning Center. These tests would involve panels of teachers with good knowledge of an instructional unit's content as well as demonstrably good pedagogical skills (as judged by an expert panel). These teachers would be put into four groups:

• Group 1 teaches the instructional unit that corresponds to the Distributed Accountability Exam. Its students take Pretest A before the unit is taught and Test B afterward.
• Group 2 also teaches the unit, but the tests are flipped: Test B is the pretest and Test A is given after the unit is taught.
• Group 3 does not teach the particular instructional unit at that time, but its students still take A as the pretest and B as the posttest.
• Group 4 likewise does not teach the unit; its students take B as the pretest and A as the posttest.

Only tests that, through these experiments, systematically register improvements in student performance as a result of corresponding instruction (and demonstrate equivalence through the pre- and post-test swaps) will be included in our Distributed Accountability Exams. Both pretest and end-of-unit exams in any given year will be drawn from a bank of tasks that will be developed as part of this validation process.
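The four-group design above amounts to a difference-in-differences comparison, and the form swaps serve as a check that Test A and Test B are equivalent. The sketch below computes both quantities from group mean scores. It assumes teachers are randomly assigned to groups, so pretest means are comparable across groups. The `GroupResult` record, the function names, and the acceptance thresholds are hypothetical. The numbers in the test are made up for illustration and are not results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GroupResult:
    taught: bool       # did this group's teachers teach the unit?
    pre_form: str      # "A" or "B"
    post_form: str     # the other form
    pre_mean: float    # group mean on the pretest
    post_mean: float   # group mean on the posttest


def instruction_effect(groups: list[GroupResult]) -> float:
    """Difference-in-differences: mean gain when taught minus mean gain when not."""
    taught = mean(g.post_mean - g.pre_mean for g in groups if g.taught)
    untaught = mean(g.post_mean - g.pre_mean for g in groups if not g.taught)
    return taught - untaught


def form_gap(groups: list[GroupResult]) -> float:
    """Mean Form A minus mean Form B, using pretests only (before any teaching).

    Relies on random assignment so that the groups are comparable.
    """
    a = mean(g.pre_mean for g in groups if g.pre_form == "A")
    b = mean(g.pre_mean for g in groups if g.pre_form == "B")
    return a - b


def passes(groups: list[GroupResult], min_effect: float, max_form_gap: float) -> bool:
    """Keep a test pair only if it is instruction-sensitive and the forms match."""
    return (instruction_effect(groups) >= min_effect
            and abs(form_gap(groups)) <= max_form_gap)
```

Consider made-up means in which the taught groups gain 30 and 29 points and each untaught group gains 4. There, `instruction_effect` is 25.5 and `form_gap` is -0.5, so the pair passes any thresholds up to those values.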

Items or tasks for the DAEs will also be pretested and calibrated using standard classical and multidimensional IRT frameworks. Availability of multiple forms of the DAEs will allow states and districts to use the content-based exams to plot student growth, along with teacher and school effectiveness.¹⁰ In addition, pre-instruction results can be used by teachers as part of the formative data they use to plan an instructional unit.
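For a single proficiency dimension, the simplest of these IRT frameworks is the two-parameter logistic (2PL) model. It is used here only as an illustrative stand-in for the classical and multidimensional calibration described above. An item with discrimination a and difficulty b has response probability P(θ) = 1 / (1 + exp(−a(θ − b))). Its Fisher information is a²P(θ)(1 − P(θ)), which peaks at a²/4 when θ = b. Information adds across items, and the standard error of measurement is 1/√(total information). A minimal sketch:

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of one 2PL item at proficiency theta: a^2 P (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta: float, items: list[tuple[float, float]]) -> float:
    """Information adds across conditionally independent items (a, b)."""
    return sum(item_information(theta, a, b) for a, b in items)


def sem(theta: float, items: list[tuple[float, float]]) -> float:
    """Standard error of measurement: 1 / sqrt(total information)."""
    return 1.0 / math.sqrt(total_information(theta, items))
```

An item with a = 2 contributes exactly one unit of information at θ = b, so 100 such items give an SEM of 0.1 there. Those are the round figures used in the reliability discussion.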

Reliability

Distributed Accountability Exams would contain a mix of short constructed-response items and more extended written responses, along with sets of multiple-choice items as appropriate to the standard being examined. Short and long constructed-response components would require human scoring. Research has established that a high level of inter-rater reliability can be attained when constructed-response tasks are well targeted, scoring rubrics are specific, and graders are trained (Mariano & Junker, 2007; Patz et al., 2002; Ryan & Shepard, 2008).
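Inter-rater reliability can be summarized in several ways, and the studies cited above use model-based approaches to rater effects. As a simpler, swapped-in illustration (not the cited method), Cohen's kappa measures how often two raters give the same rubric score to the same papers. It then corrects that agreement for the agreement expected by chance. A minimal sketch, assuming integer rubric scores:

```python
from collections import Counter


def cohens_kappa(rater1: list[int], rater2: list[int]) -> float:
    """Chance-corrected agreement between two raters scoring the same papers."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater1)
    observed = sum(x == y for x, y in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Agreement expected if each rater assigned scores independently at
    # his or her own observed rates.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1.0 - expected)
```

For two raters who agree on five of six papers scored 0–3, kappa can come out noticeably lower than the raw 83% agreement (7/9 ≈ 0.78 for the scores in the test). That is why chance correction matters when rubric categories are few.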


not by the student's own teacher), or by geographically and socially remote scorers (including teachers


that have been used in European countries (e.g., cross-school or cross-state grading exercises; re-grading of a sample of student papers at the state level). Teacher participation in grading the exams and in the related validation exercises (some of which could be face to face) is a good process for professional learning and is used in most countries. Though the process is more costly in dollars than machine scoring, it is an educative process worth building into our Examination System. Grade validation at scale would be supported by the American Examination System platform, which can enable rapid, cost-effective remote scanning, transmission, grading, validation, and reporting. To ensure protection of student privacy rights, the system has the capacity to anonymize the digitized student work before routing it to the remote scorers and validators. For limited purposes, it can also apply automatic essay scoring technologies.

The Distributed Accountability Exams open the possibility for increased use of constructed responses because they are distributed over the course of the year, yielding several times more opportunity to collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability. For instance, suppose the reliability of each single hour-long DAE were 0.7. By the Spearman–Brown formula, the reliability of five DAEs taken together would be 5 × 0.7 / (1 + 4 × 0.7) ≈ 0.92. If, instead, half of each DAE's testing time were used for a pretest on the next instructional unit or simply for calibrating future test items, the combined reliability would be 2.5 × 0.7 / (1 + 1.5 × 0.7) ≈ 0.85, still a high level of reliability.
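Both figures above follow the Spearman–Brown prophecy formula, ρ_k = kρ / (1 + (k − 1)ρ). Here ρ is the reliability of one hour-long DAE and k is how many hours of parallel testing are combined. A minimal sketch, assuming the DAEs behave as parallel forms of a single unidimensional test (an assumption the next paragraph relaxes):

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumes the added parts are parallel to the original (same true score,
    equal error variance).
    """
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Five full hour-long DAEs, then five half-length ones (2.5 hours in total).
print(round(spearman_brown(0.7, 5), 2))    # 0.92
print(round(spearman_brown(0.7, 2.5), 2))  # 0.85
```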

Yet to obtain these more reliable results, students would not have to sit for a 5-hour exam, or even take an end-of-year exam. They would just have to take unit exams as they normally would in the course of teaching, but now with each unit exam contributing to an overall accountability score. Another advantage is that students would be tested on recently learned material at all times. Nuisance effects of delayed recall would therefore not influence measures of how well students were learning what the teachers taught, which would probably increase reliability even more.

¹⁰ If the pre-instruction versions are not long enough to be reliable for estimating instructional effects on individual students, then those effects will be estimated at some aggregate level.



Although the illustration above is useful, in reality it likely will not be possible to string together DAEs into a single unidimensional measurement to which classical reliability calculations apply. Instead, we believe the DAEs within a subject will be at least mildly multidimensional. If we consider each DAE within a subject within a year as a measure of one proficiency, five DAEs would be measuring five different but substantively related proficiencies. These proficiencies are likely to be statistically related as well. For example, in NAEP, proficiency subscales within the same subject area are typically correlated 0.8 or higher, and seldom lower than 0.5–0.6. We can exploit these correlations by building a multidimensional Bayesian latent variable model. Such a model would use proficiency estimates from old DAEs to produce more precise proficiency estimates for the next DAE, or indeed to shorten the next DAE with no loss in measurement precision.
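The simplest case of such a model has two proficiencies with a bivariate normal distribution. Under that assumption (and ignoring measurement error in the old estimate), a correlation ρ between the old and new DAE proficiencies allows a prediction of the new one. In standardized units, the prediction has mean ρ × (old score) and standard deviation √(1 − ρ²). The function below is a hypothetical sketch of that one step, not the authors' model; a full model would pool several past DAEs and their measurement errors.

```python
import math


def predict_next(old_theta: float, rho: float,
                 mean_old: float = 0.0, mean_new: float = 0.0,
                 sd_old: float = 1.0, sd_new: float = 1.0) -> tuple[float, float]:
    """Conditional mean and SD of the new proficiency given the old one,
    for a bivariate normal with correlation rho (old value taken as known)."""
    mu = mean_new + rho * (sd_new / sd_old) * (old_theta - mean_old)
    sd = sd_new * math.sqrt(1.0 - rho * rho)
    return mu, sd


print(predict_next(1.0, 0.8))  # approximately (0.8, 0.6)
```

With ρ = 0.8, the NAEP-like figure above, the predictive standard deviation falls from 1.0 to 0.6. Past performance thus removes 64% of the prior variance before any item on the new DAE is answered.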

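For example, suppose¹¹ we wish to estimate a student's proficiency with a margin of error of 0.2 (SEM = 0.1). Suppose also that each item contributes roughly one unit of Fisher information to proficiency estimation (here we are borrowing an IRT formulation for specificity). Because the SEM is 1/√(total information), the student would then need to answer roughly 100 items. Now suppose we could already predict the proficiency on this DAE from past DAE performance with a margin of error of 0.4. That prediction has SEM = 0.2, which is worth about 1/0.2² = 25 items of information. We would then need only roughly 75 more items, rather than 100, to obtain a margin of error of 0.2 on this DAE.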
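The arithmetic above follows from treating precision (1/SEM²) as additive: information carried over from past DAEs plus about one unit per new item. A minimal sketch, assuming a normal approximation, a margin of error equal to 2 × SEM as in the text, and constant information per item (simplifications in the same spirit as footnote 11):

```python
import math
from typing import Optional


def items_needed(target_moe: float, info_per_item: float = 1.0,
                 prior_moe: Optional[float] = None) -> int:
    """New items needed to reach a target margin of error (MOE = 2 * SEM).

    Precision (1 / SEM**2) is treated as additive: precision carried over
    from past DAEs plus `info_per_item` Fisher information per new item.
    """
    target_precision = 1.0 / (target_moe / 2.0) ** 2
    prior_precision = 0.0 if prior_moe is None else 1.0 / (prior_moe / 2.0) ** 2
    remaining = max(0.0, target_precision - prior_precision)
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(remaining / info_per_item, 9))


print(items_needed(0.2))                 # 100
print(items_needed(0.2, prior_moe=0.4))  # 75
```

The prior cannot make the test shorter than zero items. Once past DAEs alone reach the target precision, `items_needed` returns 0.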

This calculation depends on the student's performance on the new DAE being consistent with his or her performance on past DAEs, in a way that can be made precise using Bayesian modeling. If the student's responses on the next DAE are inconsistent with his or her older DAE results, we would need to do follow-up testing to get a more precise estimate of the student's proficiency. Thus, for students who learn consistently from one unit to the next, we can exploit past performance to help estimate proficiency on the current unit of instruction. For a student who performs unusually well (or poorly) on the current unit, however, we can use the Bayesian machinery to detect the inconsistency. We can then offer another block of items in order to assess that student's learning more precisely.
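One simple way to make "consistent" precise is to compare the new-DAE estimate with the prediction from past DAEs. Their difference is scaled by the combined uncertainty of the two. If the scaled discrepancy is small, the two are pooled by precision weighting; otherwise, a follow-up block is triggered. This is a hypothetical sketch, not the authors' model. The z > 2 threshold, the function names, and the action labels are assumptions.

```python
import math


def is_inconsistent(predicted: float, prior_sd: float, observed: float,
                    observed_sem: float, z_crit: float = 2.0) -> bool:
    """Flag a new-DAE estimate that is surprisingly far from the prediction.

    The discrepancy is scaled by the combined uncertainty of the prediction
    (prior_sd) and of the new estimate (observed_sem).
    """
    z = (observed - predicted) / math.sqrt(prior_sd ** 2 + observed_sem ** 2)
    return abs(z) > z_crit


def combine(predicted, prior_sd, observed, observed_sem):
    """Precision-weighted (normal-normal) posterior mean and SD."""
    w_prior, w_obs = 1.0 / prior_sd ** 2, 1.0 / observed_sem ** 2
    mean = (w_prior * predicted + w_obs * observed) / (w_prior + w_obs)
    return mean, 1.0 / math.sqrt(w_prior + w_obs)


def next_step(predicted, prior_sd, observed, observed_sem):
    """Hypothetical rule: consistent students get a pooled estimate;
    inconsistent ones (unusually high or low) get another block of items."""
    if is_inconsistent(predicted, prior_sd, observed, observed_sem):
        return "administer follow-up item block", None
    return "report pooled estimate", combine(predicted, prior_sd, observed, observed_sem)
```

Flagging works in both directions, so a student who does unexpectedly well is probed just as one who does unexpectedly poorly.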

A similar process is used in online tutoring systems and adaptive testing systems. It illustrates the kind of useful customization that is discussed below.

Distributed exams, validated for both content and instruction, are a logical next step in ending the testing bind and developing an assessment system that will detect and reward high-quality, effective teaching. Instead of supporting the use of practice materials that mimic the old end-of-year tests, states can provide high-quality instructional tools that help teachers prepare students for DAE examinations.¹² There will be no need for interim tests, since DAEs and related formative assessments will occur throughout the school year at times that make instructional sense. With this system, we gain the ability to measure a set of higher-order skills that are not easily tested otherwise, including ones essential to college- and career-ready performance in reading, writing, and mathematics, without adding an enormous burden of testing.

¹¹ The numbers are chosen here mostly for computational convenience, and may not reflect the actual values obtained from item precalibration, etc.

¹² For a description of approaches to providing this kind of instructional guidance in forms that do not suppress teacher ingenuity and judgment, see Resnick (in press).



Another way of validating the Distributed Accountability Exams' scores would be to compare them to NAEP scores. States might expand the use of the NAEP test by administering it every year and/or by increasing the percentage of students tested.

The American Examination System will also foster a rich environment of formative assessments. These assessments are educative in ways that directly resemble the summative system, but with more direct application to daily and weekly instruction.

• They would be aligned with the learning trajectories derived from the Common Core Standards, and thus aligned with what teachers need to teach.
• They would model approaches to how to teach, and would, at the request of educators, provide teachers structured opportunities for gaining experience in using those teaching methods.
• Teachers would make these assessments part of their instructional routine, rather than an addition to it. Data entry/record-keeping burdens will be minimal, and teachers will have easy and quick access to student- and class-level reporting as well as tools to understand the instructional significance of that data. By tracking fidelity in the use of these diagnostic tools, the system will help teachers use them appropriately.

Formative assessment tasks that cannot be machine-scored will be accompanied by simple rubrics for quickly analyzing the student work. Teachers will be able to use digital devices to record these analyses. Through those devices, the teachers will also be provided with samples of answers that correspond to each level on the rubric, to help them calibrate their own analyses. As a form of professional development and to improve the reliability of analyses, teachers could also upload the student work into the system, along with their analyses, to get feedback from other teachers or subject matter experts. The formative assessments would in general not be used for summative purposes. However, metrics of teacher fidelity in implementing the formative assessments (and their associated instructional recommendations) could be used as part of teacher/school performance management/accountability.

To enable teachers to make the best use of all of these, the system will provide an online platform that includes:

• the honeycomb (to track student progress on learning trajectories towards college and career readiness, and to access diagnostic and instructional support for each stage of each trajectory);
• other dashboard tools for tracking and analyzing the progress of particular students and groups of students;
• interfaces for uploading, sharing, scoring, reporting, and analyzing student work.

Because the system will administer both types of assessments (Distributed Accountability Exams and formative) for a very large number of students over a period of multiple years and potentially across multiple states, and can take account of various other student, teacher, and school data, it would also eventually be able to serve as an engine for the mass personalization of assessments. Mass personalization for formative assessment could be done across many dimensions, including:

• past student performance on assessments;
• teacher and school characteristics, including aggregated assessment performance of students and other measures of previous effectiveness;
• which curriculum was used.

This adaptive or personalized approach to assessment will enable greater precision in the data, closer alignment to the taught curriculum, and less testing.

The initial goal for mass personalization would be to apply it to formative assessment. Many modalities of formative assessment are already in use (diagnostic, progress monitoring, screening), each including a mix of assessment types (multiple choice, constructed response, observation). Some of these are best delivered as part of group activities and some one-on-one between a single student and teacher. Many teachers/districts use a blend of these formative assessments, which makes sense given the diverse needs of particular students at different moments of their academic development. But some teachers who are not themselves experts in formative assessment methodologies struggle to decide how best to integrate all of these choices into their teaching routines for their particular students.

So, in addition to providing new Educative Formative Assessments, the American Examination System would mass-customize a much wider range of formative assessments at the student and class level. This is adaptive assessment at the level above individual items: it figures out which formative assessment to give and when. Teachers thus get just the right next piece of information they need about their students without wasting a lot of classroom or other school time. With this platform, teachers will be blending modes of assessment in individualized ways (varying what data they collect and how) based on what is known so far about each student. To support this, the system will host a bank of formative assessment materials to cover the full range of diagnostic options a state or school district wishes to use, from open source or commercial sources.
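Choosing "which formative assessment to give and when" can be framed as picking the bank entry that tells the teacher the most per minute of class time. The sketch below uses one plausible heuristic, not the system's actual rule. It scores each assessment by the uncertainty (binary entropy) about whether its skill is mastered, divided by the minutes the assessment takes. The `FormativeAssessment` record, the mastery probabilities, and the heuristic itself are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormativeAssessment:
    name: str
    skill: str
    minutes: int
    mode: str   # e.g. "group", "one_on_one", "online"


def entropy(p: float) -> float:
    """Uncertainty, in bits, about whether a skill has been mastered."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def choose_next(bank: list[FormativeAssessment], mastery_prob: dict[str, float],
                minutes_available: int) -> Optional[FormativeAssessment]:
    """Pick the assessment that resolves the most uncertainty per minute.

    Skills with no estimate yet are treated as maximally uncertain (p = 0.5).
    """
    candidates = [a for a in bank if a.minutes <= minutes_available]
    if not candidates:
        return None
    return max(candidates,
               key=lambda a: entropy(mastery_prob.get(a.skill, 0.5)) / a.minutes)
```

Skills the system is already confident about (p near 0 or 1) score near zero, so class time goes to the students and skills where the next observation is most informative.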

The mass personalization process can also add to the reliability and efficiency of the Distributed Accountability Exams. Above, we showed how a Bayesian model can use data from previous DAEs to make the next DAE more efficient, as long as the student is behaving consistently from one unit to the next. If the student seems to be performing unusually well (or poorly), then the Bayesian machinery can detect this and suggest a customization of the DAE to further explore what the student knows and can do.

Technology

Delivery

Integrated online delivery of all assessments. Both summative (Distributed Accountability) and formative assessments are delivered to teachers and/or students across and within states through a single software platform. The system enables a coherent use of multiple types of assessments (including types that will be administered on paper and then scanned) as part of efforts to have students meet the standards and move along the skill trajectories towards college readiness and career readiness.

The honeycomb offers an interactive online map of learning trajectories based on the Common Core Standards. It provides an intuitive and accessible way for educators to understand and make use of these trajectories all the way from pre-K through Grade 12. It will also enable them to grasp the dependencies among and within the trajectories, for instance, identifying what level of which specific literacy skills are needed to achieve mastery of which mathematics skills. This tool can be adapted for use in any state whose standards include learning trajectories comparable to those that will be in the Common Core.



The American Examination System will deliver Distributed Accountability Exams, formative assessments, and available instructional options for each step along each learning trajectory, starting with mathematics and literacy for Grades 3–10. The honeycomb allows educators to visualize the sequence of assessments and instructional options aligned with the learning trajectories. These will be displayed for educators at intervals along scales that include the entire range of skills to be taught in pre-K–12. Other (non-DAE) formative assessments and instructional options, including for pre-K–2 and Grades 11–12, can also be aligned and delivered through the same interface. This helps educators use them coherently to identify and address the particular learning needs of each student as students move along the paths towards college and career readiness.

Mass customization of assessments. Because the System will administer all types of assessments for a very large number of students over a period of multiple years and potentially across multiple states, and can take account of various other education data, it will be able to serve as an engine for the mass personalization of assessments. (Dimensions and benefits of mass customization are discussed in the Rigorous Standards and Good Instructional Practices section.) This technology is scalable: computing power is such that there is no practical limit on the amount of education data that could be included. As more states and more types of data are included, the customization becomes more precise (and useful).

Scoring

Enable teachers/schools to scan and upload student work. The American Examination System does not assume that all assessments will always be conducted with students sitting at computers. Given current school infrastructure, and given the challenge of showing mathematics work via a keyboard, it may be more efficient to continue to rely to some extent on paper-and-pencil inputs to an otherwise digital system. The continued value of these primitive recording tools seems especially compelling when one considers that much of the value of the new generation of assessment tasks depends on soliciting open-ended expressions of student reasoning and thinking. In the case of mathematics, these include drawings, graphs, and explanations.

So the American Examination System includes a process to enable scanning/digital photographing, uploading, and archiving of very large volumes of paper-based student work, including for Distributed Accountability Exams, to enable remote scoring as well as online student portfolios. The scanning/photographing process, which has already been tested in North Carolina classrooms, puts minimal burdens on teachers or other school staff and does not require large per-school investments in hardware or network infrastructure.

Remote scoring workflow and interface. For the foreseeable future, assessment of open-ended expressions of student reasoning and thinking will require at least some element of human scoring. Doing this rigorously and reliably requires finding cost-effective and time-effective workflows for directing the work to remote scorers, especially in a summative context where there are stakes for teachers and schools as well as for students. These workflows include cross-school or cross-state grading/validation exercises and regrading of a sample of student papers at the state level.



The American Examination System platform enables this workflow. It automates delivery of digitized student work (including paper-and-pencil work) to raters and to those validating the ratings. Student identity is kept private (the raters don't know whose work it is). The online interface for remote raters presents them with the student work alongside scoring forms based on the rubric appropriate for that type of work.

Formative assessment interface. For formative assessment, the platform provides a scoring interface for teachers similar to the one for remote scoring of Distributed Accountability Exams. This interface includes tools to mark up student work and record notes. Teachers can also easily email the marked-up work to students and their parents, so they get feedback on the same day that the assessment was delivered. Where they are used, electronic essay scoring technologies can add precision and/or help teachers manage the triage associated with knowing which papers might require special attention. Similar to Wireless Generation's mClass platform, the American Examination System platform could also include mobile tools that enable teachers to digitally record what they are observing while they are actively involved with the class. Because formative assessment is part of each teacher's day-to-day instruction, capturing the resulting data provides a way to track instructional fidelity (whether the teachers are using the recommended good instructional practices).

Reporting

The platform provides the reports and reporting interfaces described in the Reporting section below.

Summative Assessments That Measure Growth and That Project Readiness

The Common Core provides a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals.¹³ They are specified at a grain size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments to judge progress toward college and career readiness.

Tasks or items for the DAEs would be pretested and calibrated using standard classical and multidimensional IRT frameworks. At the outset, two versions of each DAE would be developed. The two versions, one administered before instruction and one afterwards, would be used by the assessment developers to establish the instructional validity of the exams. Availability of multiple forms of the DAEs would allow states and districts to use the content-based exams to plot student growth, along with teacher and school effectiveness.¹⁴

¹³ Some of the learning sequences in the standards are based on research conducted by multiple scholars over three decades. Others are based on well-honed intuitive judgments by expert scholars and practitioners. All will require further validation in use over the coming years. What is new and important in the current core standards effort is that the standards are organized into multidimensional sequences of learning that can inform both assessment and instruction.

¹⁴ If the pre-instruction versions are not long enough to reliably estimate instructional effects on individual students, those effects would be estimated at some aggregate level.



Student growth, for purposes of assessing progress toward college and career readiness, can be defined as progress along the Common Core learning trajectories. In this way, the American Examination System measures both student growth and the extent to which students are on track, all the way from pre-K through grade 12.

This approach measures not just whether students are on track but also which specific skill deficits are holding each of them back. It allows teachers to answer the question: What should the instructional focus be right now to move this particular student or group of students forward toward college and career readiness? It also identifies where instructional practices and/or curriculum may need to be reworked (where the measures show that the majority of students have not gained a skill).
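To make the idea concrete, the following Python sketch reduces each student's results to a hypothetical mastered/not-mastered flag per skill on an ordered trajectory. `first_gap` returns the earliest unmastered skill, a candidate instructional focus. `skills_needing_rework` flags skills that a majority of a class has not gained. These names, the skill labels, and the boolean representation are assumptions for illustration only; actual DAE scores would be graded rather than binary.

```python
from typing import Dict, List, Optional

# Hypothetical representation: an ordered trajectory of skill labels, and for
# each student a mastered/not-mastered flag per skill derived from DAE results.
Trajectory = List[str]
Mastery = Dict[str, bool]


def first_gap(trajectory: Trajectory, mastery: Mastery) -> Optional[str]:
    """Earliest skill on the trajectory the student has not yet mastered.

    Returns None when every skill on the trajectory is mastered (on track).
    A skill missing from the student's record counts as not mastered.
    """
    for skill in trajectory:
        if not mastery.get(skill, False):
            return skill
    return None


def skills_needing_rework(trajectory: Trajectory,
                          roster: Dict[str, Mastery]) -> List[str]:
    """Skills, in trajectory order, that a majority of the roster has not gained."""
    flagged = []
    for skill in trajectory:
        missing = sum(1 for m in roster.values() if not m.get(skill, False))
        if missing > len(roster) / 2:
            flagged.append(skill)
    return flagged


if __name__ == "__main__":
    traj = ["ratios", "proportions", "linear functions"]  # hypothetical labels
    roster = {
        "s1": {"ratios": True},
        "s2": {"ratios": True, "proportions": True},
        "s3": {},
    }
    print(first_gap(traj, roster["s1"]))         # proportions
    print(skills_needing_rework(traj, roster))   # ['proportions', 'linear functions']
```

Taking the earliest gap reflects the idea that the trajectories are sequences, so a later skill is usually not the right focus while an earlier one is still missing.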

The honeycomb is a valuable way to display these measures of student growth to students and their parents, because it offers an easily comprehensible map of each student's progress relative to time, to the standards for each grade, and to the ultimate goals of college and career readiness.

Accessibility

All parts of our system incorporate the principles of Universal Design for Learning.

The exams can and should remove barriers for non-native English speakers and for students with special learning needs. For non-native English speakers, the tests should be designed so that language does not unnecessarily obscure the meaning of the questions; these students must be able to understand the exams in order to be measured fairly.

The DAEs would mirror the instruction that students will receive in the classroom; we would carefully design and validate accessibility for students with low-incidence disabilities. Some students may deviate from the learning trajectories, but they should remain focused on academic content. The system should maintain expectations for all students and guide teachers on how all students can master concepts and skills. Assessments would be designed for all students, modifications would allow as many students as possible to be validly assessed within the system, and there would be flexibility in the modality of test administration and in item type.

Technical Quality

The new Common Core Standards provide a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals. They are specified at a grain size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments to judge progress toward college and career readiness.


The American Examination System includes Distributed Accountability Exams, for use over the course of the school year, which measure the specific higher-order skills articulated in the Common Core Standards and state standards, as well as basic knowledge. The Distributed Accountability Exams will include extended written work and other open-ended expressions of student reasoning and thinking; in mathematics, these would include drawings, graphs, and explanations. They will assess basic knowledge both within these constructed performances and, where appropriate, in clusters of multiple-choice items.

Distributed Accountability Exams will address each of the skills and topics articulated for each year of the state and common standards. They would be built to strong criteria of content and instructional validity. Each exam would provide a reliable estimate of student knowledge of the content of an instructional unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam scores for a year would provide a valid estimate of the extent to which a student (class, school) has mastered the content specified by the standards for that year.


The exams would closely match, in both content and form, the material that is expected to be taught in each of the instructional units. New instructional units, explicitly linked to the Core standards, would be created to anchor the content validity of the exams. Teams of independent content and instructional experts would review the model instructional units to ensure that they match the standards and meet criteria of instructional quality. The same teams would judge the alignment of the exams to the model instructional units. This process would largely overcome the problem of weak alignment to standards that now troubles many state assessments.



instruction on the content of the assessment. Our development process would include tests of instructional validity, similar to the experiment-based ones used by the Pittsburgh Science of Learning Center. These tests would involve panels of teachers with good knowledge of an instructional unit's content as well as demonstrably good pedagogical skills (as judged by an expert panel). These teachers would be put into four groups. Two of the groups would teach the instructional unit that corresponds to the Distributed Accountability Exam. In one of these groups, teachers would give Test A to their students as a pretest before the unit is taught, and the students would then take Test B. In the second of these groups, the tests are flipped: Test B is the pretest, and Test A is given to students after the unit is taught. In the third and fourth groups, students would not be taught the particular instructional unit at that time but would still be given the pretests and posttests (one group with A as the pretest and B as the posttest, the other with B as the pretest and A as the posttest). Only tests that, through these experiments, systematically register improvements in student performance as a result of the corresponding instruction (and demonstrate equivalence through the pre- and posttest swaps) will be included in our Distributed Accountability Exams.
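The four-group design can be summarized in code. This Python sketch is a hypothetical illustration, not the Pittsburgh Science of Learning Center's procedure. `instructional_effect` compares the mean pre-to-post gain of the taught groups with that of the untaught groups. `form_difference` compares average scores on Forms A and B among untaught students, where a value near zero suggests the forms are equivalent. The thresholds passed to `keep_test` are placeholders, and a real analysis would use significance tests or IRT-based comparisons rather than raw mean differences.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class GroupResult:
    taught: bool             # did this group receive the instructional unit?
    pretest_form: str        # "A" or "B"; the posttest uses the other form
    pre_scores: List[float]
    post_scores: List[float]

    @property
    def posttest_form(self) -> str:
        return "B" if self.pretest_form == "A" else "A"

    @property
    def mean_gain(self) -> float:
        return mean(self.post_scores) - mean(self.pre_scores)


def instructional_effect(groups: List[GroupResult]) -> float:
    """Mean gain of taught groups minus mean gain of untaught groups."""
    taught = [g.mean_gain for g in groups if g.taught]
    untaught = [g.mean_gain for g in groups if not g.taught]
    return mean(taught) - mean(untaught)


def form_difference(groups: List[GroupResult]) -> float:
    """Mean Form A score minus mean Form B score among untaught students.

    Each form appears once as pretest and once as posttest in the untaught
    groups, so practice effects roughly cancel when group sizes are similar.
    """
    by_form = {"A": [], "B": []}
    for g in groups:
        if not g.taught:
            by_form[g.pretest_form].extend(g.pre_scores)
            by_form[g.posttest_form].extend(g.post_scores)
    return mean(by_form["A"]) - mean(by_form["B"])


def keep_test(groups: List[GroupResult], min_effect: float,
              max_form_gap: float) -> bool:
    """Retain the A/B pair only if it registers instruction and the forms match.

    min_effect and max_form_gap are hypothetical thresholds, not source values.
    """
    return (instructional_effect(groups) >= min_effect
            and abs(form_difference(groups)) <= max_form_gap)
```

Subtracting the untaught groups' gain separates improvement due to the unit from improvement due to retesting or general maturation over the same period.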

Both pretests and end-of-unit exams in any given year will be drawn from a bank of tasks developed as part of this validation process. Items or tasks for the DAEs will also be pretested and calibrated using standard classical and multidimensional IRT frameworks.



Reliability

Distributed Accountability Exams would contain a mix of short constructed-response items and more extended written responses, along with sets of multiple-choice items as appropriate to the standard being examined. Short and long constructed-response components would require human scoring.

Research has established that when constructed-response tasks are well targeted, scoring rubrics are specific, and graders are trained, a high level of inter-rater reliability can be attained (Mariano & Junker, 2007; Patz et al., 2002; Ryan & Shepard, 2008).


not by the student's own teacher), or by geographically and socially remote scorers (including teachers

that have been used in European countries (e.g., cross-school or cross-state grading exercises; regrading of a sample of student papers at the state level). Teacher participation in grading the exams and in the related validation exercises (some of which could be face-to-face) is a good process for professional learning and is used in most countries. Though the process is more costly in dollars than machine scoring, it is an educative process worth building into our Examination System. Grade validation at scale would be supported by the American Examination System platform, which can enable rapid, cost-effective remote scanning, transmission, grading, validation, and reporting. To protect student privacy rights, the system can anonymize the digitized student work before routing it to the remote scorers and validators and, for limited purposes, to automatic essay-scoring technologies.

The Distributed Accountability Exams open the possibility of increased use of constructed responses because they are distributed over the course of the year, yielding several times more opportunity to collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability.

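For instance, if the reliability of each single hour-long DAE were 0.7, then by the Spearman-Brown formula, $\rho_n = n\rho / (1 + (n-1)\rho)$, the reliability of five DAEs taken together would be

$$\rho_5 = \frac{5(0.7)}{1 + 4(0.7)} \approx 0.92.$$

If, instead, half of each DAE's testing time were used for a pretest on the next instructional unit or simply for calibrating future test items, the effective test length would be 2.5 hours rather than 5, and the reliability would be

$$\rho_{2.5} = \frac{2.5(0.7)}{1 + 1.5(0.7)} \approx 0.85,$$

still a high level of reliability.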
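The arithmetic above follows from the Spearman-Brown prophecy formula, sketched here in Python. Like the illustration, the sketch assumes that the combined DAEs behave like parallel parts of a single unidimensional test. The function name `spearman_brown` is ours.

```python
def spearman_brown(rho: float, n: float) -> float:
    """Predicted reliability of a test whose length is multiplied by n.

    Classical test theory; assumes the added parts are parallel to the
    original, i.e. they measure one unidimensional construct.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if n <= 0:
        raise ValueError("length factor must be positive")
    return n * rho / (1 + (n - 1) * rho)


# Five hour-long DAEs at reliability 0.7 each, versus 2.5 effective hours
# when half of each DAE is spent on pretesting or calibrating new items.
for n in (5, 2.5):
    print(n, round(spearman_brown(0.7, n), 2))
# 5 0.92
# 2.5 0.85
```

As the following discussion of multidimensionality cautions, real DAEs will likely not be strictly parallel, so these figures are a rough illustration rather than a prediction.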

Yet to obtain these more reliable results, students would not have to sit for a five-hour exam, or even take an end-of-year exam. They would just have to take unit exams as they normally would in the course of instruction, but now with each unit exam contributing to an overall accountability score. Another advantage is that students would always be tested on recently learned material, so nuisance effects of delayed recall would not influence measures of how well students were learning what their teachers taught; this would probably increase reliability even more.

Although the illustration above is useful, in reality it likely will not be possible to string DAEs together into a single unidimensional measurement to which classical reliability calculations apply. Instead, we believe the DAEs within a subject will be at least mildly multidimensional; if we consider each DAE within a subject within a year as a measure of one proficiency, five DAEs would be measuring five different but substantively related proficiencies. These proficiencies are likely to be statistically related as well. For example, in NAEP, proficiency subscales within the same subject area are typically correlated 0.8 or



Produce Results That Can Be Aggregated at the Classroom, School, District, and State Levels

Yes.

Produce Repor
