Created by Educational Testing Service (ETS) to forward a larger social mission, the Center for K-12 Assessment & Performance Management has been given the directive to serve as an independent catalyst and resource for the improvement of measurement and data systems to enhance student achievement.
Copyright 2010 Wireless Generation, Inc. and Institute for Learning. All rights reserved. No reproduction, use or distribution of any part of this material without the specific authorization of Educational Testing Service.
An American Examination System
Lauren B. Resnick and Larry Berger
National Conference on Next Generation Assessment Systems
Content Provided by the Center for K-12 Assessment & Performance Management
The Problem
Over the past two decades, our country has been trying to build a standards-based accountability system as a foundation for a more equitable and higher achieving education system. In practice, however, we have created a test-based accountability system that does not reflect the standards we aimed for at the beginning of the 1990s, much less today's fewer, clearer, higher Common Core Standards.
Several studies, using several different methodologies, have shown that the state tests do not measure the higher order thinking, problem solving, and creativity needed for students to succeed in the 21st century. These tests, with only a few exceptions, systematically overrepresent basic skills and knowledge and omit the complex knowledge and reasoning we are seeking for college and career readiness.²
The misrepresentation of standards by most current accountability tests has had negative effects on teaching and learning, especially for poor and minority students. The tests carry consequences, and many educators serving poor students aim to raise test scores in the most direct (in some cases, the only) way they know: They provide practice on exercises that substantially match the format and content of their state's end-of-year accountability tests. These exercises often depart substantially from best instructional practice. Some studies have documented a systematic decline from fall to spring in the quality of instruction. In reading, for example, the complexity of texts that students engage with is lower in the same classrooms with the same children in March than in October. And there is less discussion of text and word meaning as teachers direct children through workbook exercises that mimic state test items (Anagnostopoulos, 2003; Koretz & Hamilton, 2006; McNeill, 2002). Principals and district administrators encourage this practice. They introduce interim assessments that largely mirror the end-of-year tests rather than model the kinds of performance intended by the standards. They do this because the tests count, and they are afraid that without practice, students will not do well enough to meet adequate yearly progress (AYP) requirements.
Calls now abound for even more frequent testing and for focusing teachers' attention early and often on which items their students are having difficulty answering on the interim assessments. But unless the process is guided by a fundamental understanding of what kind of teaching helps children acquire robust competence, we should not be surprised when the most frequent response to weak early test scores is to practice the test. Though no one intended to do so, we have created a testing bind that, as it tightens, drives attention away from the intended standards. The effects are greatest in the poorest schools. The nation's current approach to raising achievement and increasing equity in the education system is having an effect opposite from the intended one. It is trapping poor children in a basic skills teaching program that gives them little chance to acquire the deeper knowledge and abilities we seek for everyone. And it may be lowering the learning opportunities even for many more privileged children as schools turn their energies to the test-based basic skills program.
² The problem cannot be fixed by changing cut scores so that states no longer deem as being proficient test performances that barely meet NAEP standards for basic levels of achievement. The tests are fundamentally misaligned with 21st century expectations. For an analysis, see Resnick, Stein, and Coon (2008).
Many educators, parents, and citizens have responded by clamoring for an end to test-based accountability. Witness the one-sided reaction to a recent editorial in the New York Times written by Susan Engel (2010) calling for less testing and more play (and by implication, less direct instruction) for children. A stream of supportive commentary by readers ensued, but none expressing concern about how to educate poor children, minority children, or English language learners to college-ready levels of achievement. Most of the children of responders to Engel's article would not be harmed (might even benefit) by a weakened accountability system. But the others, the ones no one spoke for in the New York Times exchange, could lose even the slender chances we now offer them.
A Solution
Testing and accountability should remain at the heart of national education policy. Equity and national prosperity depend on a system that will stretch educators, the education system, and communities to work toward high achievement and that will enable clear accountability when achievement goals are missed. But there should be new forms of assessment, functioning in new ways within the education system, to meet the needs. As early as 1992, scholars showed how in many countries of the world, tightly linked examination and curriculum systems kept aspirations high, guided teachers in their work, and sometimes created pathways for young people who did not come from privileged families (Resnick & Resnick, 1992). The secret lay in charging teachers to prepare their students for exams and making sure that the exams were worth studying for. For the system to work, teachers and students needed to have a rough idea of the kinds of questions that would be posed on the exams, although not the specific questions that would appear. The systems also required trust that exam grades would be fair; that is, students would likely receive the same grade no matter who scored their written work (written essays predominated over short-answer and multiple-choice items because the countries valued the kinds of thinking that were displayed in such essays). Systems for checking on grade fairness (and allowing challenges in a few cases) varied among the countries studied, but all found ways of maintaining public trust in the system.
In this paper, we outline an American Examination System, one that reflects key aspects of the substantive, cognitively demanding European systems, while maintaining standards of psychometric rigor necessary to support America's accountability, comparability, and equity agendas.
The American Examination System we have in mind:
• models the kinds of instruction that are valued so that preparing students for assessment works for rather than against high cognitive demand instruction;
• situates exams within the stream of ongoing instruction so that assessments support teaching rather than distract from it;
• ensures content and instructional validity of all assessments so that the alignment problems that have plagued state testing systems can be resolved;
• provides reliable and valid accountability measures for student, school, and educator performance;
• includes diagnostic tools for instruction to meet individual student needs;
• leverages advanced data collection and computational resources to mass personalize the formative assessments, improving their precision and usefulness.
The American Examination System we outline would be educative for those who use it. It would not just tell us how well students, teachers, and schools are performing, but also teach teachers how to teach, teach students how to learn, and teach education organizations how to develop teaching expertise. It would meet this educative goal through a system that combines distributed accountability exams linked to specific topics for instruction with diagnostic, formative assessments designed for teacher use during instruction.
An online platform will make it possible to deploy and manage all of these elements at scale in a cost-effective way while minimizing additional burdens for teachers, students, and administrators. This online platform would be much more than a system for administering, scoring, and reporting on assessments. It can surround the "what" of assessment outcomes with useful representations of "so what?" (professional development) and "now what?" (more targeted instructional resources) so that everyone focuses on the consequential and instructional validity of assessment results and not just the accountability pressure.
Distributed Accountability Exams (DAEs)
Accountability data in this system would be derived from exams that are administered at intervals throughout the school year, occurring after students have completed a unit of study on particular content and skills as identified in the Common Core Standards and state standards. Accountability data would be reported on the basis of individual student, subgroup, class, school, and district, as well as across classes, schools, and districts. The types of tasks on the exams would be largely familiar to students, who would have worked on similar tasks in the course of instruction. But neither teachers nor students would know prior to the DAE exactly what questions would appear. Based on what is required from the new Common Core Standards, we expect three to five DAEs per year in mathematics and literacy at each grade, with each exam assessing material covered through 3 to 7 weeks of instruction, but the specifics of number and timing would need to be worked out with states.
The DAEs would model the kind of high cognitive demand performances intended by the Common Core Standards and rigorous state standards, as well as test basic procedural skills. In literacy, they would include extended written work and other open-ended expressions of student reasoning and thinking; in mathematics, they would include drawings, graphs, mathematical expressions, and explanations. They would assess basic knowledge both within these constructed performances and, where appropriate, in clusters of multiple-choice items. In addition to modeling high cognitive demand instruction, the DAEs would reflect what should be taught (specific topics determined by state and Common Core Standards).
The Common Core Standards provide a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals.³ They are specified at a granular size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments.
Tasks or items for the DAEs would be pre-tested and calibrated using standard classical and multi-dimensional item response theory (IRT) frameworks. In addition, each DAE would undergo a rigorous process of establishing content validity and instructional validity, processes that test theory often calls for but are not part of standard procedure in most instances of education test design. As the project matures, tasks would be collected into item banks for use in future construction of DAEs. Information on student performance data, instructional targets, and the forms of instruction that result reliably in student learning would be shared with stakeholders including parents and students, teachers, schools, testing administrators, and those responsible for preparing and selecting teachers.
Ideally, every student would take each DAE when he or she is ready and not before. So the long-term goal should be to have sufficient alternate exams that students have more than one chance to take an exam (as they do for New York State Regents).
At the outset, a more limited set of equivalent exams (two versions of each DAE) would be developed. The two versions, one administered before instruction and one afterwards, would be used by the assessment developers to establish instructional validity of the exams. Availability of multiple forms of the DAEs would allow states and districts to use the content-based exams to plot student growth, along with teacher and school effectiveness. In addition, pre-instruction results could be used by teachers as part of the formative data they use to plan an instructional unit.
Figure 1, a diagram of how the DAEs might progress through the school year, shows how DAEs interact with formative assessments (described in subsequent sections) that are also integrated into the system.
Figure 1. Example of How Distributed Accountability Exams (DAEs) Might Progress in a School Year.
³ Some of the learning sequences in the standards are based on research conducted by multiple scholars over three decades. Others are based on well-honed intuitive judgments by expert scholars and practitioners. All will require further validation in use over the coming years. What is new and important in the current core standards effort is that the standards are organized into multidimensional sequences of learning that can inform both assessment and instruction.
The sixth- and seventh-grade Common Core Standards for mathematics specify five content areas:
• Ratios and Proportional Relationships
• The Number System
• Expressions and Equations
• Geometry (in sixth grade, Properties of Area, Surface Area, and Volume are explicitly named)
• Statistics and Probability
The Ratios and Proportional Relationships section for sixth-grade mathematics (see Appendix A) includes two parallel sets of standards, one for Mathematical Understanding and one for Mathematical Skill. In addition, there is a set of standards for Mathematical Practice that the standards writers intend to apply at all grade levels, although it is understood that the student performances representing good mathematical practice will look substantially different at different age/grade levels. Our DAEs would provide a valid and reliable picture of how students are progressing on the Mathematical Practice standards as well as on the specific content standards.
Figure 2 displays the sixth-grade standards in a visualization we call the honeycomb that specifies our hypotheses about the interdependencies among them. The honeycomb, which we describe more fully below, serves as a visual representation (interactive map) of the instructional and assessment space that needs to be traversed in all grades, including the sixth, and also as a frame for assembling data on student performance in a manner that will support inferences about the progress of individual students, classes of students, schools, and school districts.
Taken together, the Mathematical Understanding, Mathematical Skill, and Mathematical Practice standards inform and constrain the assessments that would be built for the Distributed Accountability Exams. Assume, for purposes of developing an example, that the sixth- and seventh-grade mathematics teaching will be divided into five units of instruction, one unit for each of the five content areas. One would thus need five content-specific exams in mathematics each year for sixth and seventh grades. The exams (like the instructional units they reference) might not be of equal length, because some of the standards cover more material than others. But we envision exams of 40 to 75 minutes in length, each geared to a teaching unit of 3 to 7 weeks.
An example of an exam covering the sixth-grade unit on Ratios and Proportional Relationships is included in Appendix A.
Figure 2. Visualization of Sixth-Grade Standards as a Honeycomb.
An English Language Arts Example
The English language arts/literacy standards can similarly be used to specify sequences of instructional units and assessments. The core standards are organized in grade bands rather than grade by grade. As in mathematics, skills and understandings are expected to develop over multiple years. In addition, guidelines exist for choosing texts that use modern quantitative methods to characterize the cognitive and linguistic complexity of writing in several different genres.
Using all of these resources of the Common Core Standards, we have sketched distributed exams for English language arts; one example of an exam is in Appendix B.
Validity and Reliability in Distributed Examinations
The DAEs would be built to strong criteria of content and instructional validity. Each exam would provide a reliable estimate of student knowledge on the content of an instructional unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam scores for a year (e.g., five mathematics exams in each of Grades 6 and 7) would provide a valid estimate of the extent to which a student (class, school) has mastered the content specified by the standards for that year.
Content Validity
The exams would match closely, in both content and form, the content that is expected to be taught in each of the instructional units. New instructional units, explicitly linked to the Core standards, would be created to anchor the content validity of the units. Teams of independent content and instructional experts would review the model instructional units to ensure they match with the standards and are of high instructional quality. The same teams would judge the alignment of exams to the model instructional units. This process would largely overcome the problem of weak alignment to standards that now troubles many state assessments. (States would not, however, be required to use the model instructional units in their actual classrooms.)
Instructional Validity
Assessments are considered instructionally valid when student performance improves after quality instruction on the content of the assessment. Although instructional validity is part of the gold standard for educational testing, it is almost never established in current assessment practice. We can do better.
We will apply strategies of in vivo (live classroom) research developed by the Pittsburgh Science of Learning Center. These scientific (experiment-based) research strategies can be used to establish whether each particular DAE, in fact, responds to good teaching. States and school districts using the DAE system would be able to validate DAEs against best-practice instruction developed by their most effective teachers.
Reliability
DAEs would contain a mix of short constructed-response items and more extended written responses, along with sets of multiple-choice items as appropriate to the standard being examined. Short and long constructed-response components would require human scoring. Research has established that when constructed-response tasks are well targeted, scoring rubrics are specific, and graders are trained, a high level of inter-rater reliability can be attained (Mariano & Junker, 2007; Patz, Junker, Johnson, & Mariano, 2002; Rayn & Shepard, 2008).
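One way to check inter-rater reliability in practice is to have two graders score the same papers and compute a chance-corrected agreement statistic. The sketch below uses Cohen's kappa. The paper does not name a specific statistic, so the choice of kappa, the function name `cohens_kappa`, and the rubric scores shown are illustrative assumptions rather than the authors' method.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same papers.

    rater_a and rater_b are equal-length sequences of rubric levels.
    This is an illustrative sketch, not a procedure taken from the paper.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of papers")

    n = len(rater_a)
    # Proportion of papers on which the two raters gave the same level.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical level for every paper.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical rubric levels (1 to 4) assigned by two graders to eight papers.
local_grader = [3, 2, 4, 1, 3, 2, 4, 3]
remote_grader = [3, 2, 4, 2, 3, 2, 3, 3]
kappa = cohens_kappa(local_grader, remote_grader)
```

A value near 1 indicates that graders agree far more often than chance would predict; values well below that would suggest the rubric or the grader training needs attention.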
Student responses on constructed-response items could be graded locally (within the same school but not by the student's own teacher) or by geographically and socially remote scorers (including teachers elsewhere in the district or state). These grades could be validated using one of a number of methods that have been used in European countries (e.g., cross-school or cross-state grading exercises; re-grading of a sample of student papers at the state level). Teacher participation in grading exams and the related validation exercises (some of which could be face to face) creates a good process for professional learning, one that many countries use.
DAEs open the possibility for increased use of constructed responses because they are distributed over the course of the year, yielding several times more opportunity to collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability.⁴ Yet to obtain these more reliable results, students would not have to sit for a 5-hour exam or even take an end-of-year exam, depending on how a particular state system is designed. They just would have to take unit exams as they normally would in the course of teaching, but now with the unit exam contributing to an overall accountability score.
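The reliability gain from pooling several unit exams can be illustrated numerically. The arithmetic in the paper's reliability footnote has the form of the Spearman-Brown formula, in which a test lengthened by a factor k with single-test reliability r has projected reliability k·r / (1 + (k − 1)·r). The sketch below is a minimal illustration under that reading; the function name `spearman_brown` is our own label, and the formula assumes the added exams behave like parallel forms of the first.

```python
def spearman_brown(single_reliability: float, length_factor: float) -> float:
    """Projected reliability when testing time is multiplied by length_factor.

    Assumes the added testing time behaves like parallel forms of the
    original exam (the standard Spearman-Brown assumption).
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    r = single_reliability
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


# Five hour-long DAEs, each with reliability 0.7, pooled into one score.
five_full_exams = spearman_brown(0.7, 5)

# Same five DAEs, but only half of each exam counts toward the score.
five_half_exams = spearman_brown(0.7, 2.5)

print(round(five_full_exams, 2), round(five_half_exams, 2))  # prints: 0.92 0.85
```

Both figures match the values quoted in the footnote, which suggests that even a system devoting half of each exam to pretesting or item calibration would retain a high pooled reliability.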
In addition, we propose to use earlier assessment data to help produce more precise proficiency estimates for each DAE. This approach, similar to what is used in some online tutoring systems and adaptive testing systems, could make it possible to shorten many of the assessments with no loss in measurement precision (see Appendix C).
Distributed content and instructionally validated exams are a next logical step in ending the testing bind and developing an assessment system that will detect and reward high-quality, effective teaching. Instead of supporting the use of practice materials that mimic the old end-of-year tests, states can provide high-quality instructional tools that help teachers prepare students for DAE examinations.⁵ There will be no need for the current crop of interim tests that simply mirror the end-of-year test, since DAEs and related formative assessments will occur throughout the school year at times that make instructional sense. With this system, we gain ability to measure a set of higher order skills that are not otherwise easily tested, including skills essential to college- and career-ready performance in reading, writing, and mathematics, without adding enormous burden of testing.
Educative Formative Assessments
The American Examination System will foster a rich environment of formative assessments that are educative in ways that directly resemble the summative system, but with more direct application to daily and weekly instruction.
• They would be aligned with the learning trajectories derived from the Common Core Standards, and thus aligned with what teachers need to teach.
• They would model approaches to how to teach, and would, at the request of educators, provide teachers structured opportunities for gaining experience in using those teaching methods.
• Teachers would make these assessments part of their instructional routine, rather than an addition to it. Data entry/record keeping burdens will be minimal, and teachers will have easy and quick access to student- and class-level reporting as well as tools to understand the instructional significance of that data. By tracking fidelity in the use of these diagnostic tools, the system will help teachers to use them appropriately.
⁴ For instance, if the reliability of each single DAE hour-long exam were 0.7, the reliability of five DAEs taken together would be 5*(0.7)/(1+4*0.7) = 0.92. If, instead, half of each DAE's testing time were used for a pretest on the next instructional unit or simply for calibrating future test items, the improvement would be 2.5*(0.7)/(1+1.5*0.7) = 0.85, still a high rate of reliability.
⁵ For a description of approaches to providing this kind of instructional guidance in forms that do not suppress teacher ingenuity and judgment, see Resnick (in press) and McConachie and Petrosky (2009).
Formative assessment tasks that cannot be machine scored will be accompanied by simple rubrics for quickly analyzing the student work. Teachers will be able to use digital devices to record these analyses. Through those devices, the teachers will also be provided with samples of answers that correspond to each level on the rubric, to help them calibrate their own analyses. As a form of professional development and to improve the reliability of analyses, teachers could also upload the student work into the system, along with their analyses, to get feedback from other teachers or subject matter experts.
A Mathematics Example
Educative formative assessments in mathematics will be designed recognizing that cognitively demanding tasks can typically be solved in many different ways. From a diagnostic perspective, it can be as important to know how a student is attempting to solve a problem as it is to know his or her answer. The problem-solving technique is, in many cases, part of what is specified in the standards. The sequence of how these techniques are used over time will often indicate a student's progress in understanding concepts and moving along a learning trajectory. So the Educative Formative Assessments would include items that capture this information and empower teachers to learn to recognize the different approaches that students take and their significance for differentiated instruction.
An example of this approach is the Ongoing Assessment Project (OGAP)⁶ in mathematics, a framework and system for analyzing mathematical reasoning of elementary and middle school students as they solve problems. Teachers analyze written student work looking for evidence of mathematical reasoning and increasing levels of sophistication as students progress along learning trajectories. The diagnostic and instructional utility of the items is enhanced by examining the thinking and strategies that went into solving them. Fewer items can be used to produce far richer results because the underlying thinking is surfaced and made apparent to the teacher. Figure 3 illustrates how teacher-facing software enables quick analysis and recording of meaningful attributes of student work: correctness of response, sophistication of the reasoning (along a trajectory from additive to transitional to multiplicative strategies), and any errors or misconceptions that emerge; these tools and interfaces can also support remote analysis when student work is digitized and routed.
In general, it will be essential to ensure that formative assessment results are not included in accountability reporting to eliminate the incentives for misuse. We envision that the student-, class-, and school-level results would be available to teachers, coaches, and perhaps principals (to inform professional development as well as instruction), but not to district/state administrators.
⁶ OGAP was developed as a part of the Vermont Mathematics Partnership funded by the U.S. Department of Education (Award number S366A020002) and the National Science Foundation (Award number HER0227057).
Figure 3. Teacher-Facing Software.
However, metrics of fidelity in implementing the formative assessments (and their associated instructional recommendations) could be used as part of teacher/school performance management/accountability. For instance, are teachers doing progress monitoring with the frequency appropriate for each student, given the longitudinal data about that student? Principals and district/state officials should have access to this type of information in real time, so they can spot where there may be weak instructional capacity and provide timely interventions (including targeted professional development). They will want to spot if teachers are using the formative system the way in which, and as often as, it should be used. (DC public schools is an example of a school system that is already using these types of formative assessment metrics as part of its SchoolStat approach to continuous, districtwide performance management.)
In addition, the American Examination System platform would provide to researchers longitudinal data including formative assessment data, organized by student/teacher/school/subgroup.⁷ In particular, this data would be used as part of the research to support continuous improvement of the system: to fine-tune the learning trajectories, measures of proficiency for each standard, and algorithms for mass customization of assessments.
⁷ All data would be anonymous to protect privacy (and prevent the formative data from being used for accountability). Researchers will be able to see that Student A had Teacher X in School Y and see data available for A, X, and Y, but not the identity of those individuals and institutions.
We expect that formative assessment fidelity data will be especially useful to researchers. Many instructional innovations, when tested under real classroom circumstances, fail to show impact: researchers wonder whether the lack of results was because of poor design or simply because the teachers did not implement it correctly. In the field of learning research, scholars are pointing to the need for researchers to distinguish between poor design and poor implementation. They make the comparison with pharmaceutical trials, where a prerequisite for testing medical efficacy is knowing which of the trial patients took the correct dosage (Rowan, Correnti, Miller, & Camburn, 2009).
A New Paradigm for Educational Measurement: Adaptive Mass
Personalization
We believe that an advanced model of educational measurement can be built on a foundation of gathering an order of magnitude more data (both informal and formal) about each student in the course of the year so that each test merely enhances the resolution of a picture that is substantially complete before each test begins. Moreover, by applying the tools of mass personalization already so prevalent in Internet-based commerce and social networking, we will eventually be able to personalize each assessment at the individual level so that the enhanced resolution it provides is targeted to an individual student's current learning level as well as to appropriate standards of reliability and validity. That is, the system can keep asking questions until it knows enough to be instructionally helpful to the student and the teacher and until it knows enough to support relevant policy and accountability decisions.
Standardization Versus Personalization
Standardization was the engine of the factory model that drove the economy of the 19th and 20th centuries (Resnick & Resnick, 1977, 1980). Now the powerful drivers of the economy are personalization and customization, often applied in direct contradiction to a previously valued standardized offering. Amazon.com, for example, learns what you like to read and offers an increasingly personalized bookstore just for you that becomes more precise over time. The video rental chain Netflix has now hosted several international competitions for improving their personalization engine.
The statistical engines underlying personalization on the World Wide Web are distinct from those underlying standardized testing, but they are now entirely robust and proven; indeed, they are tested and refined on a daily basis in large-scale commerce, large-scale medical research, and financial market predictions.
It is time to bring these ideas to education in ways that will dramatically improve the precision with which our formerly standardized tests fulfilled their standard purposes, while simultaneously expanding their usefulness to inform daily instruction, to diagnose individual patterns in student learning, and to surround students with supports that are personalized to their needs.
Because the American Examination System aims to administer all types of assessments for a very large number of students over a period of multiple years, across multiple states, and can take account of various other education data, it should be able to serve as an engine for mass personalization of these assessments. Attributes that could be the basis of personalization include past student performance on assessments, teacher and school characteristics, aggregated assessment performance of students in a school, previous effectiveness of teacher, which curriculum was used, and which assessments have been used. This technology is scalable (computing power is such that there is no practical limit on the amount of education data that could be included), so that as more states and more types of data are included, the more precise (and useful) the customization becomes.
The initial goal for mass personalization would be to apply it to formative assessment. There are, already in use, many modalities of formative assessment (diagnostic, progress monitoring, screening), each including a mix of assessment types (multiple choice, constructed response, observation). Some of these are best delivered as part of group activities and some one-on-one between a single student and teacher. Many teachers/districts use a blend of these formative assessments, which makes sense given the diverse needs of particular students at different moments of their academic development; but many other teachers who are not themselves experts in formative assessment methodologies struggle to decide how best to integrate all of these choices into their teaching routines for their particular students.
So, in addition to providing new educative formative assessments, the American Examination System would mass customize a much wider range of formative assessments at the student and class level. This is adaptive assessment at the level above individual items: it figures out which formative assessment to give and when, enabling teachers to get just the right next piece of information they need about their students, without wasting a lot of classroom or other school time. With this platform, teachers will be blending modes of assessment in individualized ways, varying what data they collect and how, based on what is known so far about each student. To support this, the system will host a bank of formative assessment materials, to cover the full range of diagnostic options a state or school district wishes to use, from open source or commercial sources.
The mass personalization process can also add to the reliability and efficiency of DAEs. Appendix C shows how a standard statistical model can use data from previous DAEs to make the next DAE more efficient, as long as the student is behaving consistently from one unit to the next. If the student seems to be performing unusually well (or poorly), then the model can detect this and suggest a customization of the DAE to further explore what the student knows and can do.
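The idea of borrowing strength from earlier exams can be sketched with a simple precision-weighted update. Appendix C is not reproduced here, so the normal-normal model below is our own assumption and not necessarily the authors' model; the names `Estimate`, `combine`, and `is_surprising`, the threshold of 2.0, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A proficiency estimate with its error variance (hypothetical scale)."""
    mean: float
    variance: float


def combine(prior: Estimate, observed: Estimate) -> Estimate:
    """Precision-weighted average of a prior estimate and a new exam score.

    Assumes both are normally distributed around the same true proficiency,
    i.e. the student behaves consistently from one unit to the next.
    """
    prior_precision = 1.0 / prior.variance
    observed_precision = 1.0 / observed.variance
    total_precision = prior_precision + observed_precision
    mean = (prior.mean * prior_precision + observed.mean * observed_precision) / total_precision
    return Estimate(mean=mean, variance=1.0 / total_precision)


def is_surprising(prior: Estimate, observed: Estimate, threshold: float = 2.0) -> bool:
    """Flag a new score that lies far from what earlier exams predicted."""
    spread = (prior.variance + observed.variance) ** 0.5
    return abs(observed.mean - prior.mean) / spread > threshold


# Estimate carried forward from earlier DAEs, and a new, equally precise score.
from_earlier_daes = Estimate(mean=0.0, variance=0.25)
new_dae = Estimate(mean=1.0, variance=0.25)

updated = combine(from_earlier_daes, new_dae)
flagged = is_surprising(from_earlier_daes, new_dae)
```

In this sketch the combined estimate has half the error variance of either source alone, which is the sense in which a shorter exam could reach the same precision. A flagged score would be the cue to lengthen or customize the next exam rather than trust the pooled estimate.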
The Assessment Platform
The Assessment Platform manages both parts of the system (the DAEs and the educative formative assessments) to enable assessment delivery, scoring, reporting, and analysis. Based on widespread classroom experience with existing products and on current designs[8] (some of which have been funded by the Gates Foundation), it will be able to handle all of these elements at scale in a cost-effective way, while minimizing additional burdens for teachers, students, and administrators.
[8] The authors wish to acknowledge the support of the Gates Foundation in conceptualizing a next-generation assessment platform and for more generally advancing the field of aligned units of curriculum and assessment.
Honeycomb
The American Examination System will provide a honeycomb: an interactive map of learning trajectories and our hypotheses about the dependencies among them. The honeycomb offers a visual representation of the instructional and assessment space that needs to be traversed in each grade as well as across grades, all the way from PreK through Grade 12. It provides a frame for assembling data on student performance in a manner that will support inferences about the progress of individual students, classes of students, schools, and school districts. It will also support research to validate/refine the hypotheses about dependencies among the skills (within and across trajectories) in the Common Core Standards and similar state standards, for instance, identifying what level of which specific literacy skills are needed to achieve mastery of which mathematics skills.
The American Examination System would give educators summative and formative assessments for each skill step along each learning trajectory, starting with mathematics and literacy for Grades 3-10. Other assessment data (for instance, existing formative assessments for PreK through Grade 3 students or high school exams) can also be mapped onto the learning trajectories. All of this data can be included in the honeycomb so that teachers, parents, and the students themselves can track individual student progress (and the extent to which students are on track) toward college and career readiness.
The honeycomb builds on one of the intrinsic advantages of the American Examination System, which is that it offers a highly coherent and integrated package of summative and formative assessments. In particular, the system's rapid scoring workflow and reporting interface would enable educators to use the DAE results for diagnostic purposes at the individual student and class level. For example, where students have written an essay, teachers would be able to see whether students can write the sort of complex sentences and can make arguments out of ideas that are appropriate for the grade's learning trajectory. The pre-tests for each exam would be especially useful in this regard because the pre-tests assess the topics and standards that the teacher is about to teach.
Each hexagon of the honeycomb could also link to instructional resources (including video exemplars and social networking/collaboration). See Figures 4 and 5.
This tool can be adapted for use in any state whose standards include learning trajectories comparable to those in the Common Core Standards. We envision that there would be two measures of proficiency indicated for each skill/hexagon: the first based on formative (no-stakes) data and the second based on summative (high-stakes) data.
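One way to picture the honeycomb as a data structure is a dependency graph in which each hexagon carries the two proficiency measures. The sketch below is illustrative only: the skill identifiers, the 0-1 proficiency scale, and the `Hexagon` class are assumptions introduced here, not part of the system as described.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import List, Optional


@dataclass
class Hexagon:
    """One skill on a learning trajectory (hypothetical structure)."""
    skill_id: str
    prerequisites: List[str] = field(default_factory=list)
    formative: Optional[float] = None  # no-stakes proficiency, assumed 0-1 scale
    summative: Optional[float] = None  # high-stakes proficiency, assumed 0-1 scale


def teaching_order(hexagons):
    """Return skill ids ordered so every prerequisite precedes its dependents."""
    graph = {h.skill_id: set(h.prerequisites) for h in hexagons}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical skills: a literacy skill feeding a mathematics skill.
honeycomb = [
    Hexagon("math.ratio_word_problems",
            ["lit.read_informational_text", "math.ratios"]),
    Hexagon("math.ratios", ["math.fractions"], formative=0.6),
    Hexagon("math.fractions", formative=0.9, summative=0.85),
    Hexagon("lit.read_informational_text", formative=0.7),
]

order = teaching_order(honeycomb)
print(order)
```

Storing the hypothesized dependencies explicitly is what would let later research confirm or revise them against observed student performance.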
Putting Power and Choice in the Hands of Teachers
The platform will include an assignment builder, so that educators can select formative assessment items as tasks for use by the students in the classroom or as homework. This allows the teachers to focus student work on the particular concepts and skills that they need to develop. So, for instance, a teacher could drill down from a specific honeycomb hexagon (Common Core Standard) to build an assignment for a subset of her students. See Figures 6 and 7.
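The drill-down described above can be sketched as a simple filter over student proficiency data and an item bank. Everything named here (the function `build_assignment`, the record layouts, the 0.7 threshold, and the sample data) is a hypothetical illustration rather than the platform's actual interface.

```python
def build_assignment(skill_id, proficiency, item_bank, threshold=0.7, max_items=5):
    """Select students below `threshold` on a skill, plus items tagged with it.

    proficiency: {student: {skill_id: score on an assumed 0-1 scale}}
    item_bank:   list of {"item_id": str, "skill_id": str} records
    """
    students = sorted(
        name for name, scores in proficiency.items()
        if scores.get(skill_id, 0.0) < threshold  # unassessed counts as 0.0
    )
    items = [it["item_id"] for it in item_bank if it["skill_id"] == skill_id]
    return {"skill_id": skill_id, "students": students, "items": items[:max_items]}


# Hypothetical class data and item bank.
proficiency = {
    "Ana": {"math.ratios": 0.9},
    "Ben": {"math.ratios": 0.4},
    "Cal": {},
}
item_bank = [
    {"item_id": "R-01", "skill_id": "math.ratios"},
    {"item_id": "F-07", "skill_id": "math.fractions"},
    {"item_id": "R-02", "skill_id": "math.ratios"},
]

assignment = build_assignment("math.ratios", proficiency, item_bank)
print(assignment)
```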
Figure 4. Honeycomb for Mathematics Sixth Grade.
Figure 5. Each Hexagon of the Honeycomb Could Also Link to Instructional Resources.
Figure 6. Assignment builder.
Figure 7. Individual assignment.
Other Platform Tools
In addition to providing the honeycomb and tools to support mass customization, the American Examination System platform will:
Enable students to take the assessments online or on paper;
Enable teachers/schools to scan and upload paper-based assessments and other student work;
Manage remote scoring workflow and provide a scoring interface for remote raters;
Provide teachers with a scoring interface (could include the ability to mark up student work and record notes) and a reporting (gradebook) interface;
Provide dashboard tools for tracking and analyzing the progress of particular students and groups of students;
Provide principals and district/state administrators with a reporting interface that includes aggregate analysis (including cross-class, cross-teacher, cross-school, cross-district, cross-state, and demographic comparisons, with the longitudinal dimensions, including value-added on end-of-year high-stakes, included in each);
Allow users to generate custom reports in real time on demand with both teacher and principal/administrator interfaces;
Allow teachers to share formative assessments with each other and experts to gain instructional advice and create opportunities for professional development; and
Provide role-based access rights (including to protect student privacy).[9]
Thus, the system will gather and provide ready access to accountability information, and also help teachers and schools to improve learning measured by rigorous standards and good instructional practices. It would cover the full trajectory from PreK through Grade 12.
The American Examination System would not assume that all assessments will always be conducted with students sitting at computers. Given current school infrastructure, and given the challenge of showing mathematics work via keyboard, it may be more efficient to continue to rely to some extent on paper-and-pencil inputs to an otherwise digital system. The continued value of these primitive recording tools seems especially compelling when one considers that much of the value of the new generation of assessment tasks depends on soliciting open-ended expressions of student reasoning and thinking, and in the case of mathematics this includes drawings, graphs and explanations.
[9] To ensure protection of student privacy rights, the system has the capacity to make digitized student work anonymous before routing it to remote scorers.
So the American Examination System would include a process to enable scanning/digital photographing, uploading and archiving of very large volumes of paper-based student work, including for Distributed Accountability Exams, to enable remote scoring as well as online student portfolios. The scanning/photographing process, which has already been tested in North Carolina classrooms, puts minimal burdens on teachers or other school staff and does not require large per-school investments in hardware or network infrastructure.
For the foreseeable future, assessment of open-ended expressions of student reasoning and thinking will require at least some element of human scoring. Doing this rigorously and reliably, especially in a summative context where there are stakes for teachers and schools as well as for students, requires finding a cost-effective and time-effective workflow for directing the work to remote scorers (including cross-school or cross-state grading/validation exercises; regrading of a sample of student papers at the state level).
The American Examination System platform enables this workflow. It automates delivery of digitized student work (including paper-and-pencil work) to raters and those validating the ratings. Student identity is kept private (the raters do not know whose work it is). The online interface for remote raters presents them with the student work alongside scoring forms based on the rubric appropriate for that type of work. See Figure 8.
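A minimal sketch of how identity might be withheld from raters follows. It assumes a keyed hash (HMAC) is used to derive an opaque token from the student identifier; the field names, the 16-character token length, and the functions `anonymous_token` and `package_for_rater` are hypothetical, and removing names written on the scanned page itself is outside the scope of the sketch.

```python
import hashlib
import hmac


def anonymous_token(student_id: str, secret_key: bytes) -> str:
    """Derive a stable opaque token; the key stays with the platform, not the rater."""
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def package_for_rater(submission: dict, secret_key: bytes) -> dict:
    """Copy only the fields a rater needs, replacing identity with a token."""
    return {
        "work_token": anonymous_token(submission["student_id"], secret_key),
        "rubric_id": submission["rubric_id"],
        "scan_path": submission["scan_path"],
    }


# Hypothetical submission record.
key = b"platform-held-secret"
submission = {
    "student_id": "S-1001",
    "student_name": "Example Student",
    "rubric_id": "essay-grade6",
    "scan_path": "scans/0001.png",
}
packet = package_for_rater(submission, key)
print(packet)
```

Because the token is deterministic for a given key, the platform can later reattach scores to the right student without the rater ever seeing who that student is.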
The platform will allow teachers, principals, districts and potentially parents and the students themselves to generate custom reports in real time on demand. These reports would aggregate longitudinal data from different Distributed Accountability Exams and formative assessments to provide a more complete picture of each student, class, and school.
Figure 8. The Online Interface for Remote Raters for the American Examination System.
Development and Costs
Our vision for the American Examination System is ambitious. What makes it realistic is the substantial amount of work that has already been done in developing the content and tools needed to make it work.
For instance, IFL has extensive experience in developing model instructional units such as the ones that will be part of the system and in working with school systems to tailor the units to local needs and preferences (McConachie & Petrosky, 2010). IFL units (and accompanying assessments) are built into the curriculum guidance system of several urban school districts and have been shown to produce high levels of teacher engagement and improved instruction when accompanied by appropriate forms of professional training (David & Greene, 2008; Resnick, in press; Talbert & David, 2008).
Many of the required technologies are already in use in existing assessment and data management applications or are now being developed through Wireless Generation and through various initiatives of the Bill and Melinda Gates Foundation to create aligned systems of curriculum and formative assessment. Thus, for instance, much of the platform for authoring and administering cognitively demanding assessment items at scale will be available for online use in December 2010.
The system we describe is one that will operate fully about 3 years from the beginning of the process, with mass personalization of summative assessment playing a larger role at the end of that time frame. Much of the system, including the Distributed Accountability Exams, Educative Formative Assessments, and other aspects of the technology platform, will be operational in 2 years.
Platform
Based on the direct experience of Wireless Generation in building a system of comparable complexity (ARIS, the education information system for the country's largest public school system), we estimate that a secure and scalable version of the initial platform can be available for use 6 months after work on the project formally begins; additional functionality would be available after 12 months; and a comprehensive system in use at scale in 18 months. Additional development, related to the research and rollout of the mass-personalized aspect of the assessments, would take place within a longer time frame (36 months).
Assessments
The platform could be used to develop and roll out assessments according to the following three phases: establishment of content validity (for both the DAEs and Educative Formative Assessments); establishment of instructional validity (for the DAEs); and then the use of the system for summative accountability purposes. States should be consulted to determine which grades and what subjects to prioritize.
We anticipate that, after 12 months, the Distributed Accountability Exams, including the units for model instruction, for use for Grades 3-10, will receive sign-off on content validity by State Departments. The Educative Formative Assessments could begin to be used during this first year after the content is validated.
During months 13-24, or sooner if possible, we will do the experiments to capture instructional validity, beginning as soon as content validity is established.
After this 24-month period, the Distributed Accountability Exams would be used for summative accountability purposes.
Operational Costs
We estimate that, for a typical state, the ongoing costs of the system will be about the same as for current NCLB tests. Current expenditures are typically $20-$30 per student, and in some cases higher than $80, to cover reading and mathematics (U.S. Department of Education, 2010).
The Distributed Exams (including the pre-instruction version) will cost more to develop and score than the current high-stakes tests, if for no other reason than their frequency. But the current interim exams, and expenses associated with those (typically $15-$20 or more per student per year), could be eliminated.
Teachers within the school district could score the exams from each other's students, but a significant portion of the ongoing cost would be from validation of samples of teacher scoring.
Apart from provision of tablet/handheld and scanning devices for teachers (off-the-shelf, industry-standard technologies that are coming down in price each year), costs associated with the maintenance of the technology platform would be minimal when considered on a per-student basis.
Key System Characteristics
Rigorous Standards and Good Instructional Practices
The new Common Core Standards provide a foundation for a criterion-referenced examination system that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level standards are organized as a set of trajectories or sequences of learning goals. They are specified at a grain size that can be used to organize meaningful units of instruction and correspondingly meaningful assessments.
The American Examination System includes Distributed Accountability Exams, for use over the course of the school year, which measure the specific higher-order skills that are articulated in the Common Core Standards and state standards, as well as basic knowledge. The Distributed Accountability Exams will include extended written work and other open-ended expressions of student reasoning and thinking; in mathematics, these would include drawings, graphs, and explanations. They will assess basic knowledge both within these constructed performances and, where appropriate, in clusters of multiple-choice items. After 24 months, these tests will begin to replace current summative tests for accountability purposes.
The DAEs will reflect what should be taught (specific topics determined by state and Common Core Standards). Distributed Accountability Exams will address each of the skills/topics articulated for each year of the state and common standards. In the first wave, there will be Distributed Accountability Exams for mathematics and literacy for Grades 3-10. After that, sample items would be published and invitations extended for participatory authorship of assessment items that relate to the standards that are being tested and the particular item and assessment types.
The DAEs would be built to strong criteria of content and instructional validity. Each exam would provide a reliable estimate of student knowledge on the content of an instructional unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam scores for a year (e.g., five mathematics exams in each of Grades 6 and 7) would provide a valid estimate of the extent to which a student (class, school) has mastered the content specified by the standards for that year.
Content Validity
The exams would match closely, in both content and form, the content that is expected to be taught in each of the instructional units. New instructional units, explicitly linked to the Core standards, would be created to anchor the content validity of the units. Teams of independent content and instructional experts would review the model instructional units to ensure they match with the standards and are of high instructional quality. The same teams would judge the alignment of exams to the model instructional units. This process would largely overcome the problem of weak alignment to standards that now troubles many state assessments.
Instructional Validity
Assessments are considered instructionally valid when student performance improves after quality instruction on the content of the assessment. Our development process would include tests of instructional validity, similar to the experiment-based ones used by the Pittsburgh Science of Learning Center. These tests would involve panels of teachers with good knowledge of an instructional unit's content as well as demonstrably good pedagogical skills (as judged by an expert panel). These teachers would be put into four groups. Two of the groups would teach the instructional unit that corresponds to the Distributed Accountability Exam. In one of these groups, they would get Pretest A for their students before the unit is taught and then the students would take Test B. In the second of these groups, the tests are flipped: Test B is the pretest and Test A is given to students after the unit is taught. In the third and fourth groups, students would not be taught the particular instructional unit at that time, but would still be given the pretests and posttests (one group with A as the pretest and B as the posttest, the other with B as the pretest and A as the posttest). Only tests that, through these experiments, systematically register improvements in student performance as a result of corresponding instruction (and demonstrate equivalence through the pre- and post-test swaps) will be included in our Distributed Accountability Exams. Both pretest and end-of-unit exams in any given year will be drawn from a bank of tasks that will be developed as part of this validation process.
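The four-group design described above can be sketched as follows. Two details are assumptions added for illustration and are not stated in the text: that teachers are assigned to groups at random, and that the instructional effect is summarized as the mean gain in the taught groups minus the mean gain in the untaught groups. The function names and sample scores are hypothetical.

```python
import random
from itertools import cycle
from statistics import mean

# Two taught and two untaught groups, each pair using the two test forms
# in opposite orders.
CONDITIONS = [
    {"group": 1, "taught": True, "pretest": "A", "posttest": "B"},
    {"group": 2, "taught": True, "pretest": "B", "posttest": "A"},
    {"group": 3, "taught": False, "pretest": "A", "posttest": "B"},
    {"group": 4, "taught": False, "pretest": "B", "posttest": "A"},
]


def assign_groups(teachers, seed=0):
    """Shuffle teachers (assumed random assignment) and deal them into the groups."""
    shuffled = list(teachers)
    random.Random(seed).shuffle(shuffled)
    return {teacher: cond for teacher, cond in zip(shuffled, cycle(CONDITIONS))}


def mean_gain(pairs):
    """Average posttest-minus-pretest gain over (pre, post) score pairs."""
    return mean(post - pre for pre, post in pairs)


def instructional_effect(taught_pairs, untaught_pairs):
    """Gain in the taught groups beyond the gain seen without instruction."""
    return mean_gain(taught_pairs) - mean_gain(untaught_pairs)


assignment = assign_groups([f"teacher_{i}" for i in range(8)])
effect = instructional_effect([(4, 10), (6, 12)], [(4, 5), (6, 7)])
print(effect)
```

Comparing gains between groups 1 and 2 (and between 3 and 4) would be the natural check on whether forms A and B behave equivalently.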
Items or tasks for the DAEs will also be pretested and calibrated using standard classical and multidimensional IRT frameworks. Availability of multiple forms of the DAEs will allow states and districts to use the content-based exams to plot student growth, along with teacher and school effectiveness.[10] In addition, pre-instruction results can be used by teachers as part of the formative data they use to plan an instructional unit.
Reliability
Distributed Accountability Exams would contain a mix of short constructed-response items and more extended written responses, along with sets of multiple-choice items as appropriate to the standard being examined. Short and long constructed-response components would require human scoring. Research has established that when constructed-response tasks are well targeted, scoring rubrics are specific and graders are trained, a high level of interrater reliability can be attained (Mariano & Junker, 2007; Patz et al., 2002; Rayn & Shepard, 2008).
Student responses on constructed-response items could be graded locally (within the same school but not by the student's own teacher), or by geographically and socially remote scorers (including teachers elsewhere in the district or state). These grades could be validated using one of a number of methods that have been used in European countries (e.g., cross-school or cross-state grading exercises; regrading of a sample of student papers at the state level). Teacher participation in grading the exams and in the related validation exercises (some of which could be face-to-face) is a good process for professional learning and is used in most countries. Though the process is more costly in dollars than machine scoring, it is an educative process worth building into our Examination System. Grade validation at scale would be supported by the American Examination System platform, which can enable rapid, cost-effective remote scanning, transmission, grading, validation, and reporting. To ensure protection of student privacy rights, the system has the capacity to anonymize the digitized student work before routing it to the remote scorers and validators, as well as, for limited purposes, automatic essay scoring technologies.
The Distributed Accountability Exams open the possibility for increased use of constructed responses because they are distributed over the course of the year, yielding several times more opportunity to collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability. For instance, if the reliability of each single DAE hour-long exam were 0.7, the reliability of five DAEs taken together would be 5*(0.7)/(1+4*0.7) = 0.92. If instead, half of each DAE's testing time were used for a pretest on the next instructional unit or simply for calibrating future test items, the improvement would be 2.5*(0.7)/(1+1.5*0.7) = 0.85, still a high rate of reliability.
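The two figures above follow the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k from the reliability r of a single unit: k*r / (1 + (k-1)*r). A short sketch reproducing the arithmetic (the function name is ours):

```python
def spearman_brown(single_reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by `length_factor`."""
    r, k = single_reliability, length_factor
    return k * r / (1 + (k - 1) * r)


# Five full hour-long DAEs, each with reliability 0.7.
print(round(spearman_brown(0.7, 5), 2))    # 0.92

# Five DAEs with only half of each hour counted (2.5 hours in total).
print(round(spearman_brown(0.7, 2.5), 2))  # 0.85
```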
Yet to obtain these more reliable results, students would not have to sit for a 5-hour exam, or even take an end-of-year exam. They just would have to take unit exams as they normally would in the course of teaching, but now with the unit exam contributing to an overall accountability score. Another advantage is that students would be tested on recently learned material at all times, so that nuisance effects of delayed recall would not influence measures of how well students were learning what the teachers taught; this would probably increase reliability even more.
[10] If the pre-instruction versions are not long enough to be reliable to estimate instructional effects on individual students, then those effects will be estimated on some aggregate level.
Although the illustration above is useful, in reality it likely will not be possible to string together DAEs into a single unidimensional measurement to which classical reliability calculations apply. Instead we believe the DAEs within a subject will be at least mildly multidimensional; if we consider each DAE within a subject within a year as a measure of one proficiency, five DAEs would be measuring five different but substantively related proficiencies. These proficiencies are likely to be statistically related as well. For example, in NAEP, proficiency subscales within the same subject area are typically correlated 0.8 or higher, and seldom lower than 0.5-0.6. We can exploit these correlations by building a multidimensional Bayesian latent variable model to take advantage of proficiency estimates from old DAEs to help produce more precise proficiency estimates for the next DAE, or indeed to shorten the next DAE with no loss in measurement precision.
For example, suppose[11] we wish to estimate a student's proficiency with a margin of error of 0.2 (SEM = 0.1), and each item contributes roughly one unit of Fisher information to proficiency estimation (here we are borrowing an IRT formulation for specificity), then the student would need to answer roughly 100 items. However, if we could already predict the proficiency on this DAE with a margin of error of 0.4 using past DAE performance, we would need only roughly 20 more items to obtain a margin of error of 0.2 on this DAE.
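The item counts can be explored with a small calculation. The sketch assumes a simple normal approximation in which the margin of error is twice the standard error, information is the reciprocal of the squared standard error, and information from past DAEs simply adds to information from new items. These are simplifying assumptions made here, not the authors' full Bayesian model. Under them the first figure of 100 items is reproduced, while the second comes out at about 75 additional items rather than the rough figure of 20 quoted above, so the original estimate presumably rests on modelling details not given in the text.

```python
import math


def information_from_margin(margin_of_error: float) -> float:
    """Information implied by a margin of error, taking margin = 2 * SEM."""
    sem = margin_of_error / 2.0
    return 1.0 / sem ** 2


def additional_items(target_margin, prior_margin=None, info_per_item=1.0):
    """Items still needed to reach `target_margin`, given optional prior knowledge."""
    target_info = information_from_margin(target_margin)
    prior_info = 0.0 if prior_margin is None else information_from_margin(prior_margin)
    remaining = max(target_info - prior_info, 0.0)
    # The small epsilon guards against floating-point noise before rounding up.
    return math.ceil(remaining / info_per_item - 1e-9)


print(additional_items(0.2))                    # no prior information
print(additional_items(0.2, prior_margin=0.4))  # prediction from past DAEs
```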
This calculation depends on the student's performance on the new DAE being consistent, in a way that can be made precise using Bayesian modeling, with his/her performance on past DAEs. If the student's responses on the next DAE are inconsistent with his or her older DAE results, we would need to do follow-up testing to get a more precise estimate of the student's proficiency. Thus for students who learn consistently from one unit to the next, we can exploit past performance to help estimate proficiency on the current unit of instruction. However, for example, for the student who performs unusually well (or poorly) on the current unit, we can use the Bayesian machinery to see the inconsistency, and offer another block of items in order to more precisely assess that student's learning. A similar process is used in online tutoring systems and adaptive testing systems, and is an illustration of the kind of useful customization that is discussed below.
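A deliberately simplified sketch of such a consistency check follows. It assumes normal distributions for both the prediction from past DAEs and the estimate from the current DAE, and it uses a cutoff of two standard deviations; the cutoff, the function names, and the returned action labels are hypothetical choices, and a full multidimensional Bayesian model would be considerably richer.

```python
import math


def consistency_z(predicted, predicted_sd, observed, observed_se):
    """Standardized gap between the current estimate and the prediction."""
    return (observed - predicted) / math.sqrt(predicted_sd ** 2 + observed_se ** 2)


def update_or_flag(predicted, predicted_sd, observed, observed_se, z_cutoff=2.0):
    """Combine consistent evidence, or flag the student for a further block of items."""
    z = consistency_z(predicted, predicted_sd, observed, observed_se)
    if abs(z) > z_cutoff:
        return {"action": "offer_additional_items", "z": z}
    # Precision-weighted average of the prediction and the new estimate.
    w_prior = 1.0 / predicted_sd ** 2
    w_obs = 1.0 / observed_se ** 2
    estimate = (w_prior * predicted + w_obs * observed) / (w_prior + w_obs)
    sd = math.sqrt(1.0 / (w_prior + w_obs))
    return {"action": "combine", "z": z, "estimate": estimate, "sd": sd}


consistent = update_or_flag(0.0, 0.2, 0.1, 0.2)
surprising = update_or_flag(0.0, 0.2, 1.0, 0.2)
print(consistent["action"], surprising["action"])
```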
Distributed content- and instructionally validated exams are a next logical step in ending the testing bind and developing an assessment system that will detect and reward high-quality, effective teaching. Instead of supporting the use of practice materials that mimic the old end-of-year tests, states can provide high-quality instructional tools that help teachers prepare students for DAE examinations.[12] There will be no need for interim tests, since DAEs and related formative assessments will occur throughout the school year at times that make instructional sense. With this system, we gain the ability to measure a set of higher-order skills that are not easily otherwise tested, including ones essential to college- and career-ready performance in reading, writing and mathematics, without adding an enormous burden of testing.
[11] The numbers are chosen here mostly for computational convenience, and may not reflect the actual values obtained from item precalibration, etc.
[12] For a description of approaches to providing this kind of instructional guidance in forms that do not suppress teacher ingenuity and judgment, see Resnick (in press).
Another way of validating the Distributed Accountability Exams scores would be to compare them to NAEP scores. States might expand the use of the NAEP test (administering it every year and/or to a larger percentage of students).
The American Examination System will also foster a rich environment of formative assessments that are educative in ways that directly resemble the summative system, but with more direct application to daily and weekly instruction.
They would be aligned with the learning trajectories derived from the Common Core Standards, and thus aligned with what teachers need to teach.
They would model approaches to how to teach, and would, at the request of educators, provide teachers structured opportunities for gaining experience in using those teaching methods.
Teachers would make these assessments part of their instructional routine, rather than an addition to it. Data entry/recordkeeping burdens will be minimal, and teachers will have easy and quick access to student- and class-level reporting as well as tools to understand the instructional significance of that data. By tracking fidelity in the use of these diagnostic tools, the system will help teachers to use them appropriately.
Formative assessment tasks that cannot be machine scored will be accompanied by simple rubrics for quickly analyzing the student work. Teachers will be able to use digital devices to record these analyses. Through those devices, the teachers will also be provided with samples of answers that correspond to each level on the rubric, to help them calibrate their own analyses. As a form of professional development and to improve the reliability of analyses, teachers could also upload the student work into the system, along with their analyses, to get feedback from other teachers or subject-matter experts. The formative assessments would in general not be used for summative purposes, but metrics of teacher fidelity in implementing the formative assessments (and their associated instructional recommendations) could be used as part of teacher/school performance management/accountability.
To enable teachers to make best use of all of these, the system will provide an online platform which includes: the honeycomb (to track student progress on learning trajectories towards college and career readiness, and to access diagnostic and instructional support for each stage of each trajectory); other dashboard tools for tracking and analyzing the progress of particular students and groups of students; and interfaces for uploading, sharing, scoring, reporting and analyzing student work.
Because the system will administer both types of assessments (Distributed Accountability Exams and formative), for a very large number of students over a period of multiple years and potentially across multiple states, and can take account of various other student, teacher and school data, it would also eventually be able to serve as an engine for the mass personalization of assessments. Mass personalization for formative assessment could be done across many dimensions to include: past student performance on assessments; teacher and school characteristics including aggregated assessment performance of students and other measures of previous effectiveness; and which curriculum was used. This adaptive or personalized approach to assessment will enable greater precision in the data; closer alignment to the taught curriculum; and less testing.
The initial goal for mass personalization would be to apply it to formative assessment. There are, already in use, many modalities of formative assessment (diagnostic, progress monitoring, screening), each including a mix of assessment types (multiple choice, constructed response, observation). Some of these are best delivered as part of group activities and some one-on-one between a single student and teacher. Many teachers/districts use a blend of these formative assessments, which makes sense given the diverse needs of particular students at different moments of their academic development; but some teachers who are not themselves experts in formative assessment methodologies struggle to decide how best to integrate all of these choices into their teaching routines for their particular students.
So, in addition to providing new Educative Formative Assessments, the American Examination System would mass customize a much wider range of formative assessments at the student and class level. This is adaptive assessment at the level above individual items: it figures out which formative assessment to give and when, enabling teachers to get just the right next piece of information they need about their students, without wasting a lot of classroom or other school time. With this platform, teachers will be blending modes of assessment in individualized ways, varying what data they collect and how, based on what is known so far about each student. To support this, the system will host a bank of formative assessment materials, to cover the full range of diagnostic options a state or school district wishes to use, from open source or commercial sources.
The mass personalization process can also add to the reliability and efficiency of the Distributed Accountability Exams. Above, we showed how a Bayesian model can use data from previous DAEs to make the next DAE more efficient, as long as the student is behaving consistently from one unit to the next. If the student seems to be performing unusually well (or poorly), then the Bayesian machinery can detect this and suggest a customization of the DAE to further explore what the student knows and can do.
Technology
Delivery
Integrated online delivery of all assessments. Both summative (Distributed Accountability) and formative assessments are delivered to teachers and/or students across and within states through a single software platform. The system enables a coherent use of multiple types of assessments (including types that will be administered on paper and then scanned) as part of efforts to have students meet the standards and move along the skill trajectories towards college readiness and career readiness.
The honeycomb offers an interactive online map of learning trajectories based on the Common Core Standards. It provides an intuitive and accessible way for educators to understand and make use of these trajectories all the way from PreK through 12. It will also enable them to grasp the dependencies among and within the trajectories, for instance, identifying what level of which specific literacy skills are needed to achieve mastery of which mathematics skills. This tool can be adapted for use in any state whose standards include learning trajectories comparable to those that will be in the Common Core.
The American Examination System will deliver Distributed Accountability Exams, formative assessments and available instructional options for each step along each learning trajectory, starting with mathematics and literacy for Grades 3-10. The honeycomb allows educators to visualize the sequence of assessments and instructional options aligned with the learning trajectories; they will be displayed for educators at intervals along scales that include the entire range of skills to be taught in PreK-12. Other (non-DAE) formative assessments and instructional options, including for PreK-2 and 11-12, can also be aligned and delivered through the same interface to help educators use them in a coherent way to identify and address the particular learning needs of each student as they move on the paths towards college and career readiness.
Mass customization of assessments. Because the System will administer all types of assessments for a very large number of students over a period of multiple years and potentially across multiple states, and can take account of various other education data, it will be able to serve as an engine for the mass personalization of assessments. (Dimensions and benefits of mass customization are discussed in the Rigorous Standards and Good Instructional Practices section.) This technology is scalable: computing power is such that there is no practical limit on the amount of education data that could be included, so that as more states and more types of data are included, the more precise (and useful) the customization becomes.
Scoring
Enable teachers/schools to scan and upload student work. The American Examination System does not assume that all assessments will always be conducted with students sitting at computers. Given current school infrastructure, and given the challenge of showing mathematics work via a keyboard, it may be more efficient to continue to rely to some extent on paper-and-pencil inputs to an otherwise digital system. The continued value of these primitive recording tools seems especially compelling when one considers that much of the value of the new generation of assessment tasks depends on soliciting open-ended expressions of student reasoning and thinking, and in the case of mathematics this includes drawings, graphs, and explanations.
So the American Examination System includes a process to enable scanning/digital photographing, uploading, and archiving of very large volumes of paper-based student work, including for Distributed Accountability Exams, to enable remote scoring as well as online student portfolios. The scanning/photographing process, which has already been tested in North Carolina classrooms, puts minimal burdens on teachers or other school staff and does not require large per-school investments in hardware or network infrastructure.
Remote scoring workflow and interface. For the foreseeable future, assessment of open-ended
expressions of student reasoning and thinking will require at least some element of human scoring.
Doing this rigorously and reliably, especially in a summative context where there are stakes for teachers
and schools as well as for students, requires finding cost-effective and time-effective workflows for
directing the work to remote scorers (including cross-school or cross-state grading/validation exercises;
regrading of a sample of student papers at the state level).
The American Examination System platform enables this workflow. It automates delivery of digitized
student work (including paper-and-pencil work) to raters and those validating the ratings. Student
identity is kept private (the raters don't know whose work it is). The online interface for remote raters
presents them with the student work alongside scoring forms based on the rubric appropriate for that
type of work.
Formative assessment interface. For formative assessment, the platform provides a scoring interface
for teachers similar to the one for remote scoring of Distributed Accountability Exams. This interface
includes tools to mark up student work and record notes. Teachers can also easily email the marked-up
work to students and their parents (so they get feedback on the same day that the assessment was
delivered). Electronic essay-scoring technologies may also be used to add precision and/or to help
teachers manage the triage associated with knowing which papers might require special attention.
Similar to Wireless Generation's mClass platform, the American Examination System platform could also
include mobile tools that enable teachers to digitally record what they are observing while they are
actively involved with the class. Because formative assessment is part of each teacher's day-to-day
instruction, capturing the resulting data provides a way to track instructional fidelity (whether the
teachers are using the recommended good instructional practices).
Reporting
The platform provides the reports and reporting interfaces described in the Reporting section below.
Summative Assessments That Measure Growth and That Project Readiness
The Common Core provides a foundation for a criterion-referenced examination system that is closely
tied to instruction yet meets crucial criteria of technical quality of assessment. The core grade-level
standards are organized as a set of trajectories or sequences of learning goals.13 They are specified at a
grain size that can be used to organize meaningful units of instruction and correspondingly meaningful
assessments to judge progress toward college and career readiness.
Tasks or items for the DAEs would be pretested and calibrated using standard classical and
multidimensional IRT frameworks. At the outset, two versions of each DAE would be developed. The two
versions, one administered before instruction and one afterwards, would be used by the assessment
developers to establish instructional validity of the exams. Availability of multiple forms of the DAEs
would allow states and districts to use the content-based exams to plot student growth, along with
teacher and school effectiveness.14
13 Some of the learning sequences in the standards are based on research conducted by multiple scholars over three decades.
Others are based on well-honed intuitive judgments by expert scholars and practitioners. All will require further validation in
use over the coming years. What is new and important in the current core standards effort is that the standards are organized
into multidimensional sequences of learning that can inform both assessment and instruction.
14 If the pre-instruction versions are not long enough to be reliable to estimate instructional effects on individual students, then
those effects would be estimated on some aggregate level.
Student growth for purposes of assessing progress toward college and career readiness can be
defined as progress along the Common Core learning trajectories. In this way, the American
Examination System measures the extent to which students are on track (and student growth) all the
way from PreK through 12.
This approach not only measures whether students are on track but also identifies which
specific skill deficits are holding each of them back. It allows teachers to answer the question: what
should the instructional focus be right now, to move this particular student or group of students
forward towards college and career readiness? It also identifies where instructional practices and/or
curriculum may need to be reworked (where the measures show that the majority of students have not
gained a skill).
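The two questions in the paragraph above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the representation of a trajectory as an ordered list of skill labels, and the 0.5 cutoff standing in for "the majority of students." It is not a description of how the platform is built.

```python
from collections import Counter

def first_gap(trajectory, mastered):
    """Return the earliest skill on an ordered trajectory that a student
    has not yet gained, or None if every skill has been gained."""
    for skill in trajectory:
        if skill not in mastered:
            return skill
    return None

def skills_needing_rework(mastery_by_student, trajectory, threshold=0.5):
    """Return skills gained by fewer than `threshold` of the students.

    mastery_by_student maps a (hypothetical) student label to the set of
    skills that student has gained.
    """
    if not mastery_by_student:
        return []
    gained = Counter()
    for mastered in mastery_by_student.values():
        gained.update(set(mastered) & set(trajectory))
    total = len(mastery_by_student)
    return [skill for skill in trajectory if gained[skill] / total < threshold]
```

The first function answers the teacher's question for one student; the second flags points on the trajectory where instruction or curriculum may deserve a second look.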
The honeycomb serves as a valuable way to display these measures of student growth for the students
and their parents, because it offers an easily comprehensible map of that student's progress, relative to
time, and to the standards for each grade as well as to the ultimate goals of college and career
readiness.
Accessibility
All parts of our system incorporate the principles of Universal Design for Learning.
The exams can and should remove barriers for non-native English speakers and for students with special
learning needs. For non-native English speakers, the tests should be designed so that language does not
unnecessarily obscure the meaning of the questions; these students can then understand the
exams and be measured fairly.
The DAEs would mirror the instruction that students will receive in the classroom; we would carefully
design and validate accessibility for students with low-incidence disabilities. Some students may deviate
from the learning trajectories, but they should remain focused on academic content. The system should
maintain expectations for all students and guide teachers on how all students can master concepts and
skills. Assessments would be designed for all students, modifications would allow as many students as
possible to be validly assessed within the system, and there would be flexibility in terms of modality of
test administration and item type.
Technical Quality
The new Common Core Standards provide a foundation for a criterion-referenced examination system
that is closely tied to instruction yet meets crucial criteria of technical quality of assessment. The core
grade-level standards are organized as a set of trajectories or sequences of learning goals. They are
specified at a grain size that can be used to organize meaningful units of instruction and correspondingly
meaningful assessments.
The American Examination System includes Distributed Accountability Exams, for use over the course of
the school year, which measure the specific higher-order skills that are articulated in the Common Core
Standards and state standards, as well as basic knowledge. The Distributed Accountability Exams will
include extended written work and other open-ended expressions of student reasoning and thinking; in
mathematics, these would include drawings, graphs, and explanations. They will assess basic knowledge
both within these constructed performances and, where appropriate, in clusters of multiple-choice
items.
Distributed Accountability Exams will address each of the skills/topics articulated for each year of the
state and common standards. They would be built to strong criteria of content and instructional validity.
Each exam would provide a reliable estimate of student knowledge on the content of an instructional
unit that is explicitly targeted to a standard, or set of standards, in the Core. The collection of exam
scores for a year would provide a valid estimate of the extent to which a student (class, school) has
mastered the content specified by the standards for that year.
Content Validity
The exams would match closely in both content and form the content that is expected to be taught in
each of the instructional units. New instructional units, explicitly linked to the Core standards, would be
created to anchor the content validity of the units. Teams of independent content and instructional
experts would review the model instructional units to ensure they match with the standards and
instructional quality. The same teams would judge the alignment of exams to the model instructional
units. This process would largely overcome the problem of weak alignment to standards that now
troubles many state assessments.
Instructional Validity
Assessments are considered instructionally valid when student performance improves after quality
instruction on the content of the assessment. Our development process would include tests of
instructional validity, similar to the experiment-based ones used by the Pittsburgh Science of Learning
Center. These tests would involve panels of teachers with good knowledge of an instructional unit's
content as well as demonstrably good pedagogical skills (as judged by an expert panel). These teachers
would be put into four groups. Two of the groups would teach the instructional unit that corresponds to
the Distributed Accountability Exam. In one of these groups, they would get pretest A for their students
before the unit is taught and then the students would take Test B. In the second of these groups, the
tests are flipped: Test B is the pretest and Test A is given to students after the unit is taught. In the third
and fourth groups, students would not be taught the particular instructional unit at that time, but would
still be given the pretests and posttests (one group with A as the pretest and B as the posttest, the
other with B as the pretest and A as the posttest). Only tests that, through these experiments,
systematically register improvements in student performance as a result of corresponding instruction
(and demonstrate equivalence through the pre- and posttest swaps) will be included in our Distributed
Accountability Exams.
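The four-group design described above can be laid out as a small sketch. The class and function names are hypothetical, the round-robin assignment is only a placeholder (a real study would more likely assign teachers at random), and the two summary statistics are simple differences of mean gains rather than the full statistical analysis such an experiment would require.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class Group:
    taught: bool    # was the unit taught between the two tests?
    pretest: str    # form given first ("A" or "B")
    posttest: str   # form given second

# The four groups named in the text: taught A->B, taught B->A,
# not taught A->B, not taught B->A.
DESIGN = (
    Group(True, "A", "B"),
    Group(True, "B", "A"),
    Group(False, "A", "B"),
    Group(False, "B", "A"),
)

def assign_groups(teachers):
    """Deal teachers into the four groups in turn (placeholder assignment)."""
    return {teacher: DESIGN[i % len(DESIGN)] for i, teacher in enumerate(teachers)}

def mean_gain(score_pairs):
    """Average posttest-minus-pretest gain over (pre, post) score pairs."""
    return mean(post - pre for pre, post in score_pairs)

def instruction_effect(gain_by_group):
    """Mean gain in the taught groups minus mean gain in the untaught groups."""
    taught = mean(g for grp, g in gain_by_group.items() if grp.taught)
    untaught = mean(g for grp, g in gain_by_group.items() if not grp.taught)
    return taught - untaught

def form_order_gap(gain_by_group):
    """Mean gain when form A comes first minus mean gain when form B comes first.
    A value near zero is consistent with the two forms being equivalent."""
    a_first = mean(g for grp, g in gain_by_group.items() if grp.pretest == "A")
    b_first = mean(g for grp, g in gain_by_group.items() if grp.pretest == "B")
    return a_first - b_first
```

Under this reading, an exam pair is a candidate for inclusion when the instruction effect is clearly positive and the form-order gap is small.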
Both pretest and end-of-unit exams in any given year will be drawn from a bank of tasks that will be
developed as part of this validation process. Items or tasks for the DAEs will also be pretested and
calibrated using standard classical and multidimensional IRT frameworks.
Reliability
Distributed Accountability Exams would contain a mix of short constructed-response items and more
extended written responses, along with sets of multiple-choice items as appropriate to the standard
being examined. Short and long constructed-response components would require human scoring.
Research has established that when constructed-response tasks are well targeted, scoring rubrics are
specific, and graders are trained, a high level of inter-rater reliability can be attained (Mariano & Junker,
2007; Patz et al., 2002; Rayn & Shepard, 2008).
Student responses on constructed-response items could be graded locally (within the same school but
not by the student's own teacher), or by geographically and socially remote scorers (including teachers
elsewhere in the district or state). These grades could be validated using one of a number of methods
that have been used in European countries (e.g., cross-school or cross-state grading exercises;
re-grading of a sample of student papers at the state level). Teacher participation in grading the exams and
the related validation exercises (some of which could be face-to-face) is a good process for professional
learning and is used in most countries. Though the process is more costly in dollars than machine
scoring, it is an educative process worth building into our Examination System. Grade validation at scale
would be supported by the American Examination System platform, which can enable rapid,
cost-effective remote scanning, transmission, grading, validation, and reporting. To ensure protection of
student privacy rights, the system has the capacity to anonymize the digitized student work before
routing it to the remote scorers and validators, as well as, for limited purposes, automatic essay scoring
technologies.
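A minimal sketch of the anonymize-then-route step follows. It is an illustration under stated assumptions, not the platform's design: the record fields, the keyed-hash pseudonym, and the round-robin choice of scorer are all hypothetical. The one rule taken directly from the text is that a student's work is never sent to that student's own teacher.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class Submission:
    student_id: str   # never leaves the originating system
    teacher_id: str   # the student's own teacher
    image_ref: str    # pointer to the scanned or photographed work

def pseudonym(student_id: str, secret: bytes) -> str:
    """Derive a stable work identifier that does not reveal the student.
    Uses a keyed hash so the mapping cannot be rebuilt without the secret."""
    digest = hmac.new(secret, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def route(submissions, scorers, secret):
    """Pair each submission with a scorer other than the student's own teacher.
    Returns (scorer_id, packet) tuples; packets carry no student identity."""
    assignments = []
    for i, sub in enumerate(submissions):
        eligible = [s for s in scorers if s != sub.teacher_id]
        if not eligible:
            raise ValueError("no eligible scorer for this submission")
        scorer = eligible[i % len(eligible)]
        packet = {"work_id": pseudonym(sub.student_id, secret),
                  "image_ref": sub.image_ref}
        assignments.append((scorer, packet))
    return assignments
```

Because the identifier is stable, scores returned by remote raters and validators can be matched back to the student by the originating system without the raters ever seeing a name.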
The Distributed Accountability Exams open the possibility for increased use of constructed responses
because they are distributed over the course of the year, yielding several times more opportunity to
collect data than current end-of-year tests. This also brings benefits in terms of increased test reliability.
For instance, if the reliability of each single hour-long DAE were 0.7, the reliability of five DAEs
taken together would be 5*(0.7) / (1 + 4*0.7) = 0.92. If instead half of each DAE's testing time were used
for a pretest on the next instructional unit or simply for calibrating future test items, the combined
reliability would be 2.5*(0.7) / (1 + 1.5*0.7) = 0.85, still a high rate of reliability.
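The arithmetic above has the form of the Spearman-Brown formula from classical test theory, in which a test lengthened by a factor k from a unit with reliability r has reliability k*r / (1 + (k - 1)*r). The sketch below reproduces the two figures in the text; the function names are ours, and the second function (solving for the length needed to reach a target) is an added convenience rather than something the paper computes.

```python
def spearman_brown(unit_reliability: float, length_factor: float) -> float:
    """Reliability of a test `length_factor` times as long as a unit test
    whose reliability is `unit_reliability` (classical test theory)."""
    if not 0.0 <= unit_reliability <= 1.0:
        raise ValueError("unit_reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    r, k = unit_reliability, length_factor
    return k * r / (1.0 + (k - 1.0) * r)

def length_needed(unit_reliability: float, target: float) -> float:
    """Length factor required to reach `target` reliability (inverse formula)."""
    if not 0.0 < unit_reliability < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("reliabilities must be strictly between 0 and 1")
    r = unit_reliability
    return target * (1.0 - r) / (r * (1.0 - target))

# Five full hour-long exams, then the half-time scenario (2.5 exam-hours).
five_exams = round(spearman_brown(0.7, 5), 2)
half_time = round(spearman_brown(0.7, 2.5), 2)
```

Here `five_exams` and `half_time` correspond to the 0.92 and 0.85 quoted in the text.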
Yet to obtain these more reliable results, students would not have to sit for a 5-hour exam, or even take
an end-of-year exam. They just would have to take unit exams as they normally would in the course of
teaching, but now with the unit exam contributing to an overall accountability score. Another advantage
is that students would be tested on recently learned material at all times, so that nuisance effects of
delayed recall would not influence measures of how well students were learning what the teachers
taught; this would probably increase reliability even more.
Although the illustration above is useful, in reality it likely will not be possible to string together DAEs
into a single unidimensional measurement to which classical reliability calculations apply. Instead we
believe the DAEs within a subject will be at least mildly multidimensional; if we consider each DAE within
a subject within a year as a measure of one proficiency, five DAEs would be measuring five different but
substantively related proficiencies. These proficiencies are likely to be statistically related as well. For
example in NAEP, proficiency subscales within the same subject area are typically correlated 0.8 or
Produce Results That Can Be Aggregated at the Classroom, School, District, and State Levels
Yes.
Produce Repor