A Kendama Learning Robot Based on Bi-directional Theory

22
@ Pergamon NeuralNetworks, Vol.9, No.8, pp. 1281–1302, 1996 Copyright 01996 E S L rights r Printedin GreatBritain 089H080/96$15.00+ .00 PII: S0893-60S0(96)000433 1 S P I A Kendama Learning Robot Based on Bi-directional Theory HIROYUKIMIYAMOTO,l STEFANSCHAAL,1~2 FRANCESCAGANDOLF0,3HIROAKIGOMI,4 YASUHARUKOIKE,5 RIEKO OSU,1J6 ERI NAKANO, 1,7yAsuHIRo WADA,8 AND MITSUOKAWATO1 ‘ATR H I P R L K J I T 3 I T 4 B R L K J 5 M C 6 U 7 U S C (Received28September1995;revisedandaccepted28 March 1996) Abstract-A general theory of movement-pattern perception based on bi-directional theory for sensory-motor integration can be usedfor motion capture and learning by watching in robotics. We demonstrate our methods using the game of Kenalrma, executed by the SARCOS Dextrous Slave Arm, whichhas a very similar kinematic structure to the human arm. Three ingredients have to be integrated for the successful execution of this task. The ingredients are (1) to extract via-pointsfrom a human movement trajectory using a forward-inverse relaxation moa%l,(2) to treat via-points as a control variable while reconstructing the &sired trajectory from all the via-points, and (3) to modify the via-pointsfor successful execution. In order to test the validity of the via-point representation, we utilized a numerical model of the SARCOS arm, and examined the behavior of the system under several conditions. Copyright @ 1996 Elsevier Science Ltd. KeywordeTeachingbyshowing, Task-level learning, Dynamicoptimization. 1. INTRODUCTION Severalinterestingfindingsin neuroscienceindicate the close relationshipbetweentheactiveexecutionof a movement and the passive monitoring of a movement when performed by somebody else. For example,Perrettet al. (1989) found a varietyof cell typesin the superiortemporalSUICUS of the monkey brainthatare selectivefor thesightof handand body actions. di Pellegrino et al. (1992) reported that neurons of the rostral part of the inferior premotor cortex of the monkey dischargeduringgoal-directed hand movements, and also discharge when the monkey observes specific meaningful hand move- mentsperformed by the experimenters. Deeetyet al. (1994) found that during the observation of hand movements, without actual movement, and during motor imagery,brain activationwas mainlyfound in A c w l t Y T H I P R L a c e w a l t P A m D 3 H I P R L t u d R r s s H M H 2 S S K 6 J m r n visualcortical areas,and also in areasrelatedto the motor behavior of normal human subjects with positron emission tomography. These neurophysio- logical data suggest that observations and mental images of motor behaviorsmight play an important role in motor learning.Indeed,when we learn novel motor skills,we often turnour attentionto a teacher who demonstratesnewmotor behaviors. The idea of making use of a demonstrationhas beenpickedup by researchersin robotics for higher- level task learning (e.g., Kang & Ikeuchi, 1993; Kuniyoshi et al., 1994),a researchtrend away from simple trajectoryfollowing. When we make a robot perform a task,it is difficultto overcome fluctuations and environmental uncertainty with conventional . teaching methods such as “teaching play back”. In this paper we propose a higher level of teaching, “teaching by showing” (or “learning by watching”), in order to overcome some of the problems inherent in the play back approach. The approach we will pursueis based on the bi- directionaltheory (Kawato, 1992, 1995),which gives a general computational framework for sensory- motor integration, and derives generic representa- tions for a wide varietyof motor behaviors(Section 2). One of theessentialingredientsof thistheoryisto representbehaviors in terms of a sparse via-point 1281

Transcript of A Kendama Learning Robot Based on Bi-directional Theory

Page 1: A Kendama Learning Robot Based on Bi-directional Theory

@Pergamon

NeuralNetworks,Vol.9,No.8,pp. 1281–1302,1996Copyright01996 E S L rightsr

Printedin GreatBritain089H080/96$15.00+.00

PII:S0893-60S0(96)000433

1 S PI

A Kendama Learning Robot Based on Bi-directional Theory

HIROYUKIMIYAMOTO,l STEFANSCHAAL,1~2FRANCESCAGANDOLF0,3 HIROAKI GOMI,4YASUHARUKOIKE,5 RIEKOOSU,1J6ERI NAKANO,1,7yAsuHIRo WADA,8 AND MITSUOKAWATO1

‘ATR H I P R L K J ‘ I T 3 IT 4 B R L K J 5 M C 6 U 7 U

‘ S C

(Received28September1995;revisedandaccepted28 March1996)

Abstract-A general theory of movement-pattern perception based on bi-directional theory for sensory-motorintegration can be usedfor motion capture and learning by watching in robotics. We demonstrate our methods usingthe game of Kenalrma, executed by the SARCOS Dextrous Slave Arm, whichhas a very similar kinematic structureto the human arm. Three ingredients have to be integrated for the successful execution of this task. The ingredientsare (1) to extract via-pointsfrom a human movement trajectory using a forward-inverse relaxation moa%l, (2) totreat via-points as a control variable while reconstructing the &sired trajectory from all the via-points, and (3) tomodify the via-points for successful execution. In order to test the validity of the via-point representation, we utilizeda numerical model of the SARCOS arm, and examined the behavior of the system under several conditions.Copyright @ 1996 Elsevier Science Ltd.

KeywordeTeachingby showing,Task-levellearning,Dynamicoptimization.

1. INTRODUCTION

Severalinterestingfindingsin neuroscienceindicatethe close relationshipbetweentheactiveexecutionofa movement and the passive monitoring of amovement when performed by somebody else. Forexample,Perrettet al. (1989) found a varietyof celltypesin the superiortemporalSUICUSof the monkeybrainthatareselectivefor thesightof handandbodyactions. di Pellegrino et al. (1992) reported thatneuronsof the rostral part of the inferior premotorcortex of the monkey dischargeduringgoal-directedhand movements, and also discharge when themonkey observes specific meaningful hand move-mentsperformed by the experimenters.Deeetyet al.(1994) found that during the observation of handmovements, without actual movement, and duringmotor imagery,brain activationwasmainlyfound in

A cw l t YT H I P RL ac e nw al t P A m D 3

H I P R Lt u d i

R r s s H MH 2 S S K 6 Jm r n i

visualcortical areas,and also in areasrelatedto themotor behavior of normal human subjects withpositron emissiontomography. These neurophysio-logical data suggest that observations and mentalimagesof motor behaviorsmight play an importantrole in motor learning.Indeed,when we learnnovelmotor skills,we often turnour attentionto a teacherwho demonstratesnew motor behaviors.

The idea of making use of a demonstrationhasbeen pickedup by researchersin robotics for higher-level task learning (e.g., Kang & Ikeuchi, 1993;Kuniyoshi et al., 1994),a researchtrend away fromsimple trajectoryfollowing. When we make a robotperform a task,it is difficultto overcome fluctuationsand environmental uncertainty with conventional .teaching methods such as “teaching play back”. Inthis paper we propose a higher level of teaching,“teaching by showing” (or “learning by watching”),in order to overcome some of the problems inherentin the play back approach.

The approach we will pursueis based on the bi-directionaltheory (Kawato, 1992,1995),which givesa general computational framework for sensory-motor integration, and derives generic representa-tions for a wide varietyof motor behaviors(Section2). One of theessentialingredientsof thistheory is torepresentbehaviors in terms of a sparse via-point

1281

Page 2: A Kendama Learning Robot Based on Bi-directional Theory

1282

representation(Section3). In thispaper,wewill showhow this approach can be applied to learningtheJapanesegame of Kendama by first extractingvia-points from a human demonstration, and thentransferringthis knowledge to an anthropomorphicrobot arm which has to learn to perform the sametask (Section 4). In Section 5 we will examineanddiscussthevalidityof thevia-pointrepresentationbymeansof severalnumericalsimulations.

2. BI-DIRECTIONAL THEORY

Fast and coordinated arm movements should beexecutedunder feedforwardcontrol since biologicalfeedbackloops, in particularthosevia the periphery,areslowandhavesmallgains.Recentestimatesshowdynamic stiffnessis not very high during movement(Bennett et al., 1992; Bennett, 1993; Gomi et al.,1992). Thus, internal neural models, such as aninversedynamicsmodel, are necessary(Katayama&Kawato, 1993;Kawato et al., 1993a).Analysisof theactivityof singlePurkinjecellssuggeststhe existenceof an inverse dynamics model in the cerebellum(Shidaraet al., 1993).Trajectoriesof point-to-pointarm movementsusing multi-jointsare characterizedby roughlystraighthandpathsandbell-shapedspeedprofiles (Morasso, 1981). Kinematic and dynamicoptimization principles have been proposed toaccount for theseinvariantfeaturesso far (Flash &Hogan, 1985; Uno et al., 1989a; Kawato, 1992).Experimentaldata supportthedynamicoptimizationtheory, which requires both forward and inversemodels of the motor apparatus and the externalworld (Kawato, 1995).

The problem of controlling goal-directed limbmovementscan be partitionedconceptuallyinto a setof information-processing subprocesses: trajectoryplanning,coordinate transformationfrom extracor-poral spaceto intrinsicbody coordinates,and motorcommand generation. These subprocesses are re-quired to translatethe spatialcharacteristicsof thetargetor goal of the movementinto an appropriatepatternof muscle activation. Over the past decade,computationalstudiesof motor control havebecomemuch more advanced while concentratingon thesethreecomputationalproblems.

Many of the models can be broadly classifiedintoone of two alternativeclassesof theories:unidirec-tional and bi-directional (see Figure 1). Both typesassume a hierarchical arrangement of the threecomputational problems to be solved for visually-guidedreachingand a correspondinghierarchyin theneuralrepresentationsof theseproblems: thatis, thetrajectory planning problem, the coordinate trans-formation problem and the motor command genera-tion problem on the one side and the desiredtrajectoryin extrinsicspace, the desiredtrajectoryin

Unidirectionaltheory

oor mT

e TI S

Controller

5?M C

H. Miyamoto et al.

Bi-directionaltheory

ETrajectoryPlannin

e TE S

IKM FKMae TI S

IDM FDM

otorC

Smoothness I

F C u bt g m t ct s B a a h ac p i rv m I s a a bd u t r a a bd b tf i k mf i d m

intrinsic space, and the motor commands on theother.

In the unidirectional theory framework, theinformation flow is only downward, that is,unidirectional. On the other hand, in the bi-directional theory framework, both downward andupwardinformationflow is allowed.

The downward information flow is mediated bythe inverse kinematics model and the inversedynamics model of the controlled object and theenvironment. The upward information flow ismediatedby the forward kinematicsmodel and theforward dynamicsmodel.

2.1. Unidirectionalversus Bidirectional Theories

In unidirectional theories, information flows onlydownwardfrom thehigherlevelto thelower level.Asa result, the higher level computationalproblem issolved without any reference to the lower levelcomputational problems. For example, trajectoryplanning is solved without using any knowledgeabout coordinatetransformationor motor commandgeneration. Thus, the three problems are solvedsequentiallystepby step (Table 1). That is, first thetrajectoryplanningproblemis solvedto compute thedesiredtrajectoryin theextrinsicspace(in manycasestask-orientedvisual coordinates).Then, the coordi-nate transformationproblem is solved to obtain thedesiredtrajectoryin the intrinsicspace (joint angles,

Page 3: A Kendama Learning Robot Based on Bi-directional Theory

A Ken&ma karnirrg Robot 1283

T 1C U B T G M

U B

c S S

t (c

O p ( ( (t

ae

muscle lengths, etc.) from the trajectory in theextrinsic space. Finally, the necessarymotor com-mandsfor thedesiredtrajectoryin theintrinsicspaceare calculatedby a controller.

On the other hand, in bi-directional theories,upward information flow as well as downward flowis allowed. In fact, the former actuallyis essentialinorder to solve ill-posednessin the three computa-tional problems in a reasonably short time. As aresult,the higherlevelcomputationalproblemcan besolved while taking account of eventswhich happenat the lower levels.For example,trajectoryplanningis influencedby the requirementfor smooth motorcommands. Thus, the three problems are solvedsimultaneouslyratherthansequentially.

One of the most fundamentaldifferencesbetweenthe two theories concerns the spaces where thetrajectoryis planned.Consequently,at presenttheiris a controversyconcerningwhichcoordinatesystem,extrinsic (kinematic) or intrinsic (dynamic), is usedfor trajectoryplanning.

In the unidirectional theory, the t

assumedto be planned solely in the extrinsicspace(usually task-orientedvisual coordinates), while allthekinematicand dynamicfactors at thelower levelsareneglected.On theotherhand,in thebi-directionaltheory,trajectoryplanninginvolvesboth theintrinsic(body coordinates) and extrinsic space. Goals ofmovements such as the end point of reaching aregiven in the extrinsic space while necessary con-straintsto select a unique trajectory(i.e., to resolvethe ill-posedness) are given in the intrinsic space.Thus, the two spaces are used simultaneouslyfortrajectoryplanning.

Almost inseparably coupled to the spaces fortrajectory planning, optimization principles fortrajectory planning developed by the two theoriesare markedly different.In the unidirectionaltheory,becausethe planningprocessdoes not takethe lowerlevelsinto account, the optimizationprinciplehas tobe kinematic and all of the dynamic factors arethereforeneglected.One representativeexampleis theminimum-jerkmodel (Flash& Hogan, 1985).On theother hand, in the bi-directionaltheory, it is possible

to use the dynamic optimization principle whichtakes lower levels into account. One representativeexampleis the minimum-torque-changemodel (Unoet al., 1989a).Minimum-muscle-tension-changemod-el (Uno et al., 1989b) and minimum-motor-com-mand-change model (Kawato, 1992) o

examplesin the bi-directionaltheory. Others couldfurtherbe proposedin thebi-directionaltheoryclass.

In the purestexampleof a unidirectionaltheory,that which combines kinematicpath planning withthe virtual trajectory control hypothesis (Flash,1987),the braindoes not need to utilizeany internalmodel of the motor apparatus or environment intrajectoryplanningand control. On the other hand,both the forward dynamics model and the inversedynamicsmodel are necessaryfor fast computationof trajectoryplanningunderthebi-directionaltheory(Kawato, 1992;Wada & Kawato, 1993).In general,inversemodels are necessaryfor fast computation,while forward models are necessary to resolveredundancyproblems,or in more intuitiveterms,toimprove the adaptabilityof behaviors. These twokinds of models correspond to downward andupwardinformationflows respectively.

These internal models should be learned andstored somewherein the brain. We believethat thisacquisitionof internalmodelsof themotor apparatusand the environmentforms a major part of earlymotor learning. A biologically plausible learningscheme to acquire the inversedynamics model waspreviouslyproposed (Kawato et al., 1987;Kawato &Gomi, 1992). We have already obtained someexperimentalevidencethat internalmodels resideinthe cerebellum(Shidaraet al., 1993).

2.2. Forward-InverseRelaxation Neural NetworkModel and SequentialMovements

We have developed severalneural network modelswhich can generatethe dynamicallyoptimal trajec-tory and control a motor apparatusto move along it(Kawato et al., 1989).Recently,we developeda newneural network model which can generate thetrajectory within a few iterations (Kawato, 1992;

Page 4: A Kendama Learning Robot Based on Bi-directional Theory

1284 H. Miyamoto et al.

oV P v cI S v c

vR et pv o

M uS

F F or n n m( t f

Wada & Kawato, 1993). It is called the rorward-inverse relaxation neural network model (FIRM)becauseit containsboth theforwarddynamicsmodeland the inverse dynamics model of the controlledobject (Figure 2). Note that the FIRM can generateand control the optimal trajectory in real timeaccording to any one of the dynamic optimizationprinciples (e.g., minimum-torque-change model,minimum-muscle-tension-changemodel, minimum-motor-command-changemodel).

In reachingmovements,the location of the startand end points and the specifiedmovementdurationprovide the two-point boundary conditions for theoptimizationproblem.The nonlineardynamicsof thearm govern the relationship between the controlvariable (joint torques) and the trajectory. Thesubject can pass through specified via-points.According to Marr’s three-levelsof understandingbrain function (Marr, 1982)we can summarizeourstudy of reaching movements as follows: (1) Thecomputationaltheoryof reachingis theminimizationof one of the intrinsic dynamic variables. (2) Thealgorithm involves relaxation in a distributednet-work and thetaskis representedby thestart,via- andend points and the desiredmovement duration. (3)The hardware is the FIRM network and theassociatedcontrollersand actuators.

We hypothesizedthat this computationalframe-work mightbe extendedto otherclassesof voluntarymovementssuch as handwritingor speech.For thisextensionof the theory, the firstand third levelsareeasilytransferred,but the representationlevelneedscareful consideration. For speech, we believe thateach phoneme determinesa via-point location. Forhandwriting,we developed an algorithm to extractthe minimum number of via-points from a given

trajectorywithsomelevelof errorthreshold(Wada&Kawato, 1995,seealso Section3.1). If a ilxednumberof via-points are given and the arm dynamics areknown, we can calculate the optimal trajectorypassing through these via-points. Our via-pointextraction algorithm uses FIRM again and thissuggests a duality between movement patternformation and movement pattern perception (seeSection2.3).

We succeeded in reconstructinga cursive hand-writingtrajectoryquiteaccuratelyusingabout 10via-points for each character(Wada & Kawato, 1995).The extractedvia-pointsincluded not only kinema-tically-definablepointswithmaximumcurvatureandlowest velocity but also other points not easilyextracted by any purely kinematic method whichdoes not takeaccount of the dynamicsof the arm orthe dynamicoptimizationprinciple.

2.3. Bidirectional Motor Theory for MovementPattern Perception

A simplesystemfor recognizingwords from cursivehandwriting,based on the above via-point represen-tation,worked withouta word dictionary.When thesame algorithm was applied to speech articulatormotion during natural speech, the extracted via-pointscorrespondedfairlywell to the phonemesthatwere determined from a simultaneouslyrecordedacoustic signal. Natural speech movements werereconstructed well from those phoneme-like via-points (Wada et al., 1995).

Thus, we have accumulatedevidencethat the bi-directionaltheoryapproachis a promisingcomputa-tional model for both generationand perceptionofseveraltypes of movements.We have proposed thesamebi-directionalarchitecturefor fast computationin early vision, and fast and reliable integrationofdifferentvision modules in middle vision problems,including the integrationof surface normal estima-tion, boundarydetectionand light source estimationin discriminatingshapefrom shading(Kawato et al.,1991,1993b;Hayakawaet al., 1994).

The mathematicalstructuresof these models arealmostidentical.Thus, althoughwe do not haveanystrong psychological, anatomical, or physiologicaldata, it is very temptingto propose thatthe samebi-directional architecture might be applicable tointegration of different modules ranging fromsensory informationprocessingto motor control asshown in Figure 3.

The basic assumptionin this proposal is that thebrainconsistsof a numberof moduleswhich roughlycorrespond to different cortical areas and thesemodules are organizedin a hierarchicalyet parallelmanner. We do not assume a single gigantic mapwhich plays a centralrole in integration.Rather,we

Page 5: A Kendama Learning Robot Based on Bi-directional Theory

A Kenabna tiarning Robot 1285

zmI

Upward(forward

model)

t

,#!%i3modulec moduleg

Motor otorApparatus1 Apparatus2

F A g b lh sm i b a m ip r

are proposing a paralleldistributedway of integrat-ing differentmodules. It is well known in neuro-anatomythatif one areais connectedto anotherareaby a feedforwardconnection, backwardor feedbackconnections alwaysexist. Our main proposal is thatthe downwardinformationflow implementedby thefeedforwardconnection constitutesan approximateinversemodel of some physical process outside thebrain such as the kinematic transformations,thedynamic transformationsand the optics, that is theimage generationprocess. On the other hand, theupward information flow implementedby the feed-back connections provides a forward model of thecorrespondingphysicalprocess.

What are the advantages of this bi-directionalarchitecture?The fist advantageis its fast computa-tion. The cascade of inverse models gives afeedforward and one-shot calculation. This canexecute reflexesand fixed action patternstriggeredby specific stimuli. However, if we do not haveforward models, and if we must rely solely oncomputational machinery provided by the uni-directional theory, our behavior repertoire shouldbe very limited. Equifinality,optimality, or adapt-ability to differentenvironmentalsituationscan berealizedonly if the brain uses some kind of internalforward models, in other words an emulator orpredictorof externalevents.The networkrelaxationin the circle of the inverse and forward modelsconvergesvery rapidly, withina few iterations,to asuboptimal solution. This is the second computa-tional advantage. Third, integration of parallel

modules can be done rapidly. Thus, bi-directionaltheories may provide an understandingabout howthelargenumberof differentvisualcortical areascanneverthelessproduce a coherentperceptof the visualworld. Finally,the bi-directionaltheorymight give aconcrete computational algorithm for the motortheory of movementpatternperception.

In the motor-theory of speechperception (Liber-man et al., 1967; Liberrnan& Mattingly, 1985), aneuralnetworkfor motor control is hypothesizedtoplay an essentialrole in theperceptionof speech.Ouralgorithm gives one specific computational realiza-tion of thispsychologicaltheory.Data resultingfromhuman movement, either visual data (e.g., hand-writing, biological motion) or auditory data (e.g.,speech)areveryseverelyconstrainedby thedynamicsof the controlled objects and interactionswith theexternalworld as well as the motor control strategyadopted by the central nervous system. Thus, anyefficientmotor-patternperceptionschememusteitherimplicitlyor explicitlytakeaccount of thesephysicaland physiological constraints.Here, we advocate aratherradical stand-pointby statingthat the move-ment-pattern generation network actively partici-pates in movement-patternperceptionin a dualisticway. The proposed theory of movement-patternperception based on dynamic optimization can beused for movementpatternrecognition(handwritingor speech),telecommunicationandteleoperation,andlearning-by-imitationin robotics.

In the bi-directionaltheory framework, we havetwo opposite flows in the hardware,thus the samealgorithmcan be appliedequallyto motor control aswellas to perceptionof movementpatterns,as shownin Figure4.

mI 41

I I

I via-pointrepresentation I

Fmotor command

I I

1I movement

trajectory I

movementpatternperception

F A c a r m tm p b b

a v r

Page 6: A Kendama Learning Robot Based on Bi-directional Theory

1286 H. A4iyanrotoet al.

T 2D S L W

Strategy of Representations for Intelligence of Differencem

U iA cT rI ni

3. LEARNING BY WATCHINGTable 2 describesdifferentstrategiesfor learningbywatching. If the learnerhad perfectintelligence,s/hewould be able to understand the will or motorintentionof the teacherfrom perceivingtheteacher’sdemonstration.Subsequently,thisinformationcouldbe translated into a stream of actual motorcommands by taking into account the laws ofphysics describing the properties and environmentof the teacherand learner.

This strategycan be conceivedas thehighestlevel,themost “cognitive” way of learningby watching.Incontrast, if one wanted to avoid this high level ofintelligence,one could employ a strategysolelybasedon imitating the position and force trajectorydemonstratedby the teacherwith accuracy. Indeed,such an indiscriminateimitationcould be an efficientstrategy if the following conditions, at least, weresatisfied.(1) The teacherand the learnerhaveexactlythe same kinematicand dynamicproperties.(2) Theteacher and the learner have exactly the sameenvironment. (3) The learnerhas extremelyprecisemeasurement and control ability. However, theserequirementsare not satisfiedin almost all realisticsituations.Hence, a more abstractunderstandingofthe teacher’s motor behaviors at a higher level isessential.

Learning algorithms, such as reinforcementlearning and genetic algorithms, can be efficientlyused for task-levellearningif adequaterepresenta-tions of the taskare selected.However,thisselectionis one of themost difficultand criticalpartsof motorlearning. Assuming the pre-existence of properrepresentationsamounts to having solved a majorpart of the problem a priori and failsto dealwith thefull problem of motor learning(Schaalet al., 1992).Good representationshave to fulfill a variety ofprerequisites.The representationsshould take intoaccount thedynamicsof thecontrolledobject andtheexternal world as well as computational principlesadopted by the central nervous system in motorcontrol.

In this paper we want to demonstratethat via-point representations, extracted from movementtrajectories based on a dynamic optimizationprinciple, are an attractivecandidate for a genericrepresentationin motor control.

3.1. Via-pointsExtraction and Trajectory Formation

Using FIRM, Wada and Kawato (1995) developedan algorithmto extractapproximatelythe minimumnumber of via-points S = {Pl, P2,. . . ,PN} from agiven trajectory ~d.~ with some level of errorthreshold6. The via-point extraction problem is tofind the minimum number N such that thereconstructedtrajectoryXOPtdoes not deviatefromthe originaltrajectorymore than the threshold:

Note that this problem is a nonlinear optimizationproblem. If the error threshold6 is satisfiedfor theminimumN, thentheproblemis to find thevia-pointset Y that gives the minimum error level[l~d,~-XOP,(Y) II. If a fixed number of via-pointsis givenand the arm dynamicsis known as

: =j(x,M),

whereM is themotor command, FIRM can calculatethe optimal trajectoryXOPt(S)passingthroughthesevia-points in a given time tf and minimizing thefollowingequation

Ill II‘fgg2dt-~dt(3)

Figure 5 illustrates how the algorithm forextracting via-points works using an example ofKendama.Basedon the FIRM model, thevia-pointsare extracted sequentially as follows: (1) FIRMgeneratesan optimal trajectory between the startpoint and theend point usingtheminimumprinciplefor the approximatedlinear dynamics (point massmodel) (left column). In this case, the generatedtrajectoryis equivalentto the minimum-jerktrajec-tory (Iifth order polynomial). (2) FIRM selectsthefirstvia-pointat the point of maximalsquarederrorbetween the given trajectory and the generatedtrajectory. Then FIRM generatesa new trajectorypassingthrough this via-point (middle column). (3)This procedure continues (right column) until the

Page 7: A Kendama Learning Robot Based on Bi-directional Theory

A Ken&ma Learning Robo~ 1287

x 0.102(meter)l

+

. .

-0,0s1 mev 0.483(meter)

-0.121(meterz o.141(meter)

*

d -34. deTHETA 155.65(d

-56.54(deg~? I

+ 5t 0.102(meter) x 0,102(meter)

3

-0.061 me2

-0.lxlor!ef 0.483(meter) ~ 0,483(meter)

3 @ @

-0.121(meter/z 0.141(meter)

q

- d -34.65(derHETA 155.65( THETA 155.65(d

-38.91 d -38.91(d

‘k#=@ “226.17(d

-56.54{deg) -56.54(deI I I I

-1 0 (sec.) 1 -“1 O(sec.) i -’1 0 (sec.) i

F A s i a a v T a m a s eb g t ( c g t ( c w e v ( Go t w c a ( ( p ( E f v( ( E n v ( u t p s h p p r yp a z p u l t p a h o ( E a md E r p a a w a h F r a z a P t ra Y a T t r a a [ C ( f t rn e r r o a t w o a w m

error betweenthe giventrajectoryand the generatedtrajectorybecomes sufficientlysmall.

3.2. Applicationto Robot Learning

We want to demonstratethe usefulnessof the via-point representationfor learningby watching in arobot learningexample.The robot is the SARCOSdextrousslave arm (Figure 6) which has almost thesamekinematicstructure(sevendegrees-of-freedom)as a human arm. However, there are two majordifferencesin comparison to human subjects.First,the kinematic parameters of the robot arm aredifferent, particularly due to differences in limblength. Second, the robot’s dynamic parameters,such as mass and inertia,are very differentfrom thehumanarm’s (about 4-5 timesheavier).Its mechan-ical joints and hydraulic actuators result in quitedifferent compliance properties for the robotcompared with those for humanarms. Thus, even ifthe robot arm were able to follow exactly the sametrajectoryin Cartesianteacher, it could not

space as shown by a humanexert the same force when

interacting with the environment, and, therefore,could not perform the task successfully. Formanipulationtasks,force control is one of the mostcriticalfactors.Thisoffersaninterestingchallengeforlearningby watchingsince forces, of course, cannotbe extractedfrom recordingkinematicdata.

The SARCOS arm is controlled by joint anglecommand. Becauseof the redundancyinherentin aseven degree-of-freedomkinematic structure, therearean infinitenumberof joint anglesevenif thehandposition and orientation in Cartesian space aredetermined.

The above-mentioned difficultiescan be solvedwith task-levellearningusing via-points as follows.Let ~d.ti and t?d~~denote the trajectory of humanmovement (hand position and orientationin Carte-sianspace,andjoint angles).Let X~,(i = 1,2,. .., N)denote the six-dimensionalvectors (hand positionand orientationin Cartesianspace) that expressvia-points extracted from human movement and6~ia(i= 1,2,... ,N) denote the seven dimensionalvectors that express the same via-points in jointangle space. Let %!a(i= 1,2,.. ., IV) and

Page 8: A Kendama Learning Robot Based on Bi-directional Theory

1288 H. Miyarnoto et al.

J2 shoulder abduction-adduction

J1 shoulder

J3 humeralrotation

J4 elbow

Y

J5 wristrotation

J6 wrist flexion-extension

F S d e ( k e ( h K e rf f r w e e v b p m c e p e n

k s pA v e a e c l ( m l J r m ca e j w f a a

~~a(i= 1,2,... ,N) denote the via-points of theSARCOS arm. In general, variables without tildedenotethosefor thehumandemonstrator,whilewithtilde for the SARCOS robot. We requirethat robotvia-pointsrepresentedin Cartesianspaceare exactlyequal to humans(~~a = X!!~), and robot via-pointsrepresentedin joint angle coordinates are closest tohumanswhilesatisfyingtheabove conditionsstrictly.Let us mathematicallyformulatethisbelow.

Let ~ denote the forward kinematicsequation ofthe SARCOS arm. There are an infinitenumber ofjoint angles of the robot arm which satisfies~~a = x(~:,) = x~a because of redundancyin theSARCOS arm. As a solution, we can choose 0~,which satisfiesthe above condition and minimizesl[o~, -ti~all. By meansofthis optimizationprinciplethe robot joint angles are determinedas close aspossible to those of the human subject.The desiredtrajectory in the joint angle coordinate 0[, theSARCOS arm passingthrough the-via-pointst%, isobtained as the optimal trajectoryOOPtbased on thesmoothnessprinciple.

Let us express the sets of the via-points of thehuman arm and SARCOS arm, respectively, asfollows:

Let and J denote via-point extractionalgorithm, the via-point determination algorithm,and optimal trajectorycalculation, respectively.Weget the following equations

&lamer= H(S’’mCh,), (7)

iopt=J(@l-er). (8)

The realizedtrajectoryexecutedby theSARCOS armis expressedas follows:

& =G(?opt). (9)

HereG is nearlyequalto an identitymap if adequatefeedforwardand feedbackcontrollerare used. Fromthe above equations,we obtain

t$wl= G(iopt)

= G(J(~hme,)) = G(J(H(E(od.ta))))

= e(e&tJ. (lo)

The compositefunction0 is not so differentfrom the

Page 9: A Kendama Learning Robot Based on Bi-directional Theory

A Kenalzma Learning Robot

identity map, if the following two conditions aresatisfied:(1) the optimizationprincipleadoptedhereis not so different from the human’s and (2)representation of the via-points is appropriate.Eventually, the realized trajectory ~malis smoothand parameterizedby a small number of controlvariables @l.,,n.,. Resultant realizedmovement bytherobot arm is close to thedemonstratedmovementof the human subject. From this, it is only a steptothe successfulexecutionof the task.

Let T be the task target which estimatesthesuccess or failure of the task directly. For exam-ple, in the case of the Kendama task, the taskvariablesare horizontal discrepancies(along the xand y axes; see the caption of Figure 5 for thecoordinate system) between the ball and the cupwhen the ball falls to the same height as the cup(detailed description will be given in the nextsection).Becausewe can determineT uniquelyfrom~~,1, the task variable is represented asT = F(@leamm) by meansof eqn (10).

In orderto maketherealizedtaskTagree withthe~esiredtask Tde~ir~,by meansof the representation91..,”., we can use various learningschemessuch asfeedback-error-learning, forward-inverse-modeling,direct-inverse-modeling,genetic algorithm, and re-inforcementlearning.

We can summarize our teaching method gener-ally as follows. (1) Teaching information via ahuman demonstration, (2) perceiving movementpatterns based on the dynamics of the controlledobject and an optimization principle, (3) extractingthe via-points which reconstruct the movementpattern by means of a neural network model, (4)treatingthe via-points as the control variable withfixed trajectory planning and a control scheme,and (5) modifying spatialand temporalpositions ofthe via-points so that the desired task will berealizedsuccessfully.

4. ROBOT EXPERIMENT

In this section, we demonstratethe performanceofthe Kendamalearningrobot in order to examinethepotentialof our method.

We chose the game of Kendamaas a task whichallows for a simple task-level representation.TheJapanese toy Kendama consists of two partsconnected by a thin string: a ball and a handleequippedwith threecups of differentsizes.Thereareseveral way to play Kendama. For simplicity, weconsider the second-easiest one. In the initialcondition, a player holds the handle with the ballhanging down. The player swings the handle up,yanking the ball to fly over the handle.After a fewhundredmillisecondsof flight, the ball is caught inthemiddle-sizecup. In orderto playKendama,quick

1289

and dynamic motion is required. A robotic armwhichhascomplex,uncompensatedlinkinterferencescould not perform the highly precisefollowing.

4.1. Overviewof KendamaLearningRobot

If the SARCOS arm yanks up the ballthe ball will fall onto the cup. However,

trajectory

properly,the robot

arm cannot reproduce the human Kendama taskexactly, because of the difference in dynamic andkinematic properties between the human objectsand the robot. Moreover, as shown in Figure 8(later), some error remains even if the inverse-dynamics model is used for feedforward controlwith a conventional feedback control. Althoughthe human subject grasps the handle of theKendama softly, we attachedthe handle rigidly tothe fixed forefinger of the SARCOS arm with asmall vise. Due to the difference in the graspingof the handle, the ball will show a behavior quiteunlike that of the human demonstration even ifthe SARCOS arm performs perfect trajectoryfollowing. The diametersof the ball and the cup areabout 58 and about 33 mm, respectively. Thelengthof the stringis about 395 mm.

Figure 7 illustratesa schematic diagram of theKendama learningexperimentbased on the mathe-matical and abstract formulation given in Section3.2. We made the Kendama learning according tothe following procedure: (1) We measured thehuman Kendama execution using a three-dimen-sional vision system (OPTOTRAK). We extractedCartesianvia-pointsX~~ from a human demonstra-tion. We relied on the joint angle via-points 13~,of the human demonstration for the coordinatetransformation from the Cartesian to the jointspace of the SARCOS arm via-points (~Vi~+ ~~i~).XVi, were used as the initial state of the robot’sCartesianvia-pointsY;,. (2) The robot’s Cartesianvia-points YVi~were transformed into the robot’sconfiguration space (joint angle ~~~) (see Sections3.2 and 4.3). (3) The optimized trajectory ~~Ptwas generated as the desired trajectory of theSARCOS arm. (4) The Kendama task was executedby the SARCOS arm. The ball and the cupposition trajectories (~t)~lland X.UP, respectively)were measured by a three-dimensionalvisual-sen-sing system (QUICKMAG). (5) To make theSARCOS arm yang the ball up properly and dropit onto the cup, X!tiawere modified. Steps (2)-(5)were repeated until the Kendama task was exe-cuted successfully. Detailed descriptions of thisprocedure will be given in the following subsections[Section 4.2 for (l), Section 4.3 for (2) and (3),Section4.4 for (4), and Section4.5 for (5)].

Page 10: A Kendama Learning Robot Based on Bi-directional Theory

I / b dQUICKMAG m

+

F A e d K l rt d d

4.2.Measurementof HumanDemonstration

While a subjectwas playingKendama,we measuredthe positionsof the shoulder,elbow, wrist,and backof the human right hand by a three-dimensionalvision system(OPTOTRAK: infrared light-emittingdiode markersandthreecameras,theprecisionof theposition sensing is about a few millimetersin thecurrentexperimentalsetting).We alignedthepositiondata so that the time at the peak hand height(maximum of z) was in the middle of a totalmovementtime (2 s) (see also the caption of Figure5). From these position data, we calculated theCartesian hand positions and orientations ~.jat~which is a six-dimensionalvector, and the trajec-tories of the sevenjoints’ angular positions edah, aseven-dimensionalvector. We extracted Cartesianvia-points Fia(i = 1,..., N) frOIIt ~dam US@ the

FIRM algorithm.For simplicity,wefixedthenumberof via-pointsNat sevenfor therealrobot experiment.Then we extracted joint angular via-points

H. iUiyamoto et a[.

e~ia(i = 1,... ,ZV)from ed.~ at the same temporalpositionsas Xi,.

Figure 8 shows the via-pointsand trajectoriesofthe humanand SARCOS arm. In the figure,dashed,solid, and dotted lines show the trajectoriesof thehuman movement, desiredmovement of the SAR-COS arm, and realizedmovement of the SARCOSarm, respectively.A cross and a circle indicate thevia-point of the human arm and the SARCOS arm,respectively.Becauseof the differencein kinematicproperties (e.g., link length) between the humansubjects and the robot, the joint trajectoriesof theSARCOS arm are much differentfrom that of thehumansubject.As shownin Figure 8, the via-pointsextracted using the FIRM algorithm seem to beassigned at the point that representsthe essentialpointsof themovement,e.g., aroundthebeginningofthe upwardmotion of the hand (around the second,third, and fourth via-points.See thirdpanel of rightsidein Figure 8).

In Figure 8, joint trajectoriesof the human andSARCOS arm show largedifferences,especiallyneart = O. If we give the humanjoint trajectory to theSARCOS arm, the hand trajectory of the robot isdifferentfrom the human one. It is caused by thedifferences in the kinematic parameter (e.g., linklength).Conversely,if we adjustthe hand trajectoryin Cartesianspace, the joint trajectoryof the robotcan be chosen arbitrarily because the robot andhuman arm have the redundancy(sevendegreesoffreedom).To determinetheproperjoint trajectoryofthe SARCOS arm, we describe the method fortransformingthe humanmovementto the robot’s inthe following section.

Transformingthe HumanMovementto that forthe Robot

As alreadyexplainedin Section3.2, whentransform-ing the Cartesian via-points to joint space, theCartesiancoordinates are given as hard constraintswhich must be strictly satisfied, while the softconstraint is that the robot joint angleshave to beas close as possible to those of the human subject.This can be accomplished using the followingNewton-likemethod:

where“~~i, is the ith via-pointof the SARCOS armrepresentedas the joint angleat the nth iterationofthe Newton-like method. We adopted the “singu-larity low-sensitive motion resolution matrix”Jt(d) = (~Z(@/dO)t proposed by Nakamura andHanafusa (1984) as a generalizedinversematrix ofthe Jacobian. This matrix minimizes

Page 11: A Kendama Learning Robot Based on Bi-directional Theory

A Kenakma Learning Robot 1291

~1ahowlderflexim-extensimr 90.0

-90.00

-180.00J3bumral rotatimr 90.00

4

-90.00J4 elbow 135.00

4,

-45.00,J5wrist rotatim 90.00

44

-90.00J6v e x90.00

4

-90.00J7 wrist abductiom-adduction 45.00

●<

0.400

4<

-0.4001.000

4

-1.00(m 180.0[

44

4

0.0(KETA 180.01

0.01SY 360.0

-“1 O(sec.) 1 -’1 O(sec.) i

F V t h S a ( J a t v (d ( C p t p r y p a z p u m m

o t [ C ( Z E a m d h A c a c s a vp h S a r D i s t h a S d i ad r t S a r

1 l+ k The first term of this concrete formula of .lt is given as follows.performanceindexrequiresthatthe inversetransfor-mation is mathematicallyexact,and the second term Jt = (JTJ+ kI)-’JT. (12)

requiresthatthejoint anglesdo not move too much.k is a weightingratio of thesetwo requirements.The Here,.lTdenotesthetransposeof matrixJ. k isvaried

Page 12: A Kendama Learning Robot Based on Bi-directional Theory

1292 H. Miyamoto et al.

according to the following equation:

{

~ = ko(l – w w <

o otherwise

w= Jw. (13)

Here, w representsthe distance from the singularposture(e.g., straightelbow posture)of theSARCOSarm, W. determinestheneighborhood of thesingularposition. When the posture of the SARCOS arm isthe singular position, k is set to k. >0. As wincreases,k approaches O in the neighborhood ofthesingularposition.When w is largeenough,k is setto O.Sincethismatrixhasa minimumnormproperty,we selectthe initialjoint angle of the robot arm aso;;. = o~~, where 6~i~is the via-point of the jointan~!eof the human subject. The joint angle~~~aisupdatedaccordingto eqn (11) so thatthejoint angledoes not move too much. In a few iterations,theSARCOS hand position and orientation~(”~~a) isclose to the human hand position and orientationXVia.Therefore,~~i~satisfiesthe above hardand softconstraints.

The desiredjoint angulartrajectorywasgeneratedby FIRM. When the arm dynamicsis approximatedlinearly as in the dynamics (14), there are twomethods which generate the optimal trajectory(Wada & Kawato, 1995).

where TJ is the torque generatedby thejth actuatorIJ and @ are the inertia of the link and theaccelerationof thejth joint angle, respectively.Thefirstmethod is based on the spline function. In thisexperiment, we adopted the second method (seeWada & Kawato, 1995 for details). Using thesecond method, the trajectory passing through avia-point is produced sequentially.Using the mini-mum principle for the approximated linear dy-namics such as eqn (14), FIRM can calculate theoptimal trajectory@oPt.

Definitionof Taak Variable

By using the desired joint trajectory obtained inSubsection 4.3, that is, by simply imitating thehumanKendamamotion, the SARCOS arm fails toperform the Kendama task as shown in Figure 9.To make the SARCOS arm successfully executethe task, we adopted the concept of task-levellearning. Aboaf et al. (1988) demonstrated thatlearningcan proceed at the task-level,even thoughlower level modules do not perform perfectly. Inour case, the control variables are via-points. We

want to adjust the ball throwing so that the ballfallsexactlyonto the cup.

Let usfirstexplainthemore intuitivelystraightfor-ward definitionof the task-variable,then propose amore practicalmodificationof that. We may choosethe realizedand the desiredtask representationsasfollows.

‘in’=(vTT’id=(3‘“)Here, (x~,y~,z:)= and (x~,y~,,@T representthe balland the cup positions respectively in Cartesiancoordinates when the ball falls to the same heightas theCUp(z: = &h),(X positiverightward,y positiveanterior,and z positiveupward).Figure 10illustratesthe definition of the variables used for the taskrepresentation.However, this task representationcannot determine the height of the ball flight.Moreover, there is a possibility that the visionsystem cannot measure the ball position correctlywhen the robot arm or the handleof the Kendamaoccludes the view of the ball. Then we choose thefollowing representationas a more practicalchoice,whichis essentiallythe sameas the one above.

The task variablesare the verticaland horizontaldistancebetweenthe initialand thepeak positionsofthe ball:

b b b)= and (X~,y~,Z~)Trepresent thewhere (xS,y~,z~initialand the peak positionsof the ball in Cartesiancoordinates, respectively.Thus, representsthedeviation from the initial position to the peakposition of the ball. Since our aim is that the ballfalls onto the cup, we have to control the peak ballposition so that the horizontalball position exactlymatches the horizontal cup position when the ballfalls to the same height as the cup. After the ballbeginsto fall freely, the x and y componentsof theball velocity are constant.Usinglinearinterpolation,the x and y coordinates of the desired peak ballposition are derived from the cup position when theballfallsto thesameheightasthecup (seeFigure10).Therefore,we set the desiredtask representationasfollows:

()~b—XbPsTprac

&gi,ed= ~; – Y: 1Const.

(17)

(18)

Page 13: A Kendama Learning Robot Based on Bi-directional Theory

A Kenabma tiarning Robot 1293

x 0.400

.~m■mm

■ ●m

-0.400Y 1.000

II

0.200z 0.300 +

. . .● m

mmm ● m ■ ● u ■ a m m. ■....,.. ■¤.. ■

mm■ 8 ■ ■ ■ ■ . m mm n

-1.ooo~I I

0 1F f t K t e S a ( T c ( l b (l t rp r y p a z p u m m Q ( Td v s f d p S i ( e f ( s mt a f m f r j a S m f ei c d p b m p t d s dc b p t l d c e p i i r p

p p r s t e h K t

j; = Y + (Y;– Jtvt,

tp– tf. —

– ‘

(19)

(20)

Where#h and y; representthe cup positionwhen theball falls to the same height as the cup, x?, and y:representthehorizontalball positionwhentheball isjust beginningto fall freely,tf,tp,and threpresent thetime when the ball begins to fall freely, reachesthepeak position,andfallsto thesameheightas thecup,respectively.Sincetheball showsstraightlocus in thex and y coordinateswhiletheball is fallingfreely,wecan use the linear interpolation. We set the zcomponent of 7’~~~~~at a constant value so thatthe ball risesto a proper height(in this experiment,we set the desiredpeak height of the ball to rangefrom about 0.5 to 0.8 m).

4.5. Modificationof Via-points throughVisuo-motorLearning

To improve the robot’s performance, the via-pointlocationsweremodifiedby thefollowingNewton-like

tf tp th

timeF A a i d at r t c ( i

b ( l t d va z x s C c pu x p r t yc a s c b d

b r s h dm s t f ( f p

h b t s hb r

Page 14: A Kendama Learning Robot Based on Bi-directional Theory

1294 H. A4iyamoto et a[.

method. This learningschemecan be regardedas anextension of the task-level learning algorithmproposed by Aboaf et al. (1988), in the sensethatthe control variables are not necessarilythe tasktargetbut are as abstractas the task target.

3’.+,=3’.= u 2?(T&.d-7’n).(2,)w’.

Here, ~. denotes the Cartesian via-points of theSARCOS arm in nth iteration of the Newton-likemethod, Tde~iNd,and are the desiredand realizedtaskrepresentations,respectively.We usedthematrix

= # = where AT de-notes the transposeof matrix A. This matrix is thesimply regularizedg-inverse matrix (Rao & Mitra,1971; Okamoto & Musha, 1992) which minimizesIll– AA#112+kllA#112.The first term of this perfor-manceindexrequiresthatthe inversetransformationis mathematicallyexact,and thesecondtermrequiresthat the via-points do not move too much. k is aweighting ratio of these two requirements.In thisexperiment,we setk = 1,becausewe wantedto makethe SARCOS arm accomplish the task as close aspossible to the human demonstration and also tomodify the via-point as small as possible for safeexecutionof the SARCOS arm movement.

Becauseit is difficultto analyticallycompute thematrix usedthematrixATreal/AX$~for

To estimatethe matrix, we observed thechangesin the behaviorof the Kendamataskwhen the perturbationA%!~ is added on the via-points. In thisexperiment,we added +~ri, (0.01min magnitude)changesonly to the three-dimensionalposition (x, y, and z) of the via-points. Thus,observation of the perturbed behaviors of theKendamatask totalled3 x 2 x 7 = 42.

Since the amplitudeof the Kendama movementalong the verticaldirectionis largerthan that alongthe horizontaldirection,a weightingconstanthad tobe introduced to balance the importance of theindividualcomponentsin the updateequations.Thisis accomplished by the diagonal matrixB = diag(l, 1,0.2). Becausethe sixthand the seventhvia-pointsare aroundtheperiod of catchingtheball,themodificationof thesevia-pointsmadethelearningtoo hard. Hence we did not modify the sixthand theseventhvia-points.

Reaults

During the robot’s task execution, the ball and thecup positionsweremeasuredby a three-dimensionalvisualsensingsystem(QUICKMAG: trackingcolorblobs with two cameras).This systemcan samplethecenter of a blob of a specific color at the sampling

rate of 60 Hz, and the precision of the positionsensingis about 5-10 mm in thecurrentexperimentalsetting.The positionsof theballand thecup werefedto the SARCOS controller via an 8-bit parallelportat the 500 Hz samplingrate. After the executionofthe task, the recorded ball and cup trajectoriesweresmoothed using a Butterworthsecond order digitallow-passfilterwith a low-passcut-off of 5 Hz. Fromthesetrajectoriesthe distancebetweenthe initialandthe peak position of the ball, was calculated.

After sevenlearningcycles,as shownin Figure11,the SARCOS arm executed Kendama successfully.We tried several executions with this desiredtrajectory,and got 2–3 successfulexecutionsout often.

5. NUMERICAL SIMULATION

As mentionedin Subsection4.2, theFIRM algorithmextractsvia-points at a time which corresponds tosegmentationof theKendamamovementinto severalessentialmotor primitives. These motor primitivesand correspondingsegmentationshould be commonfeaturesirrespectiveof differencesamong individualhuman demonstrators.To confirm this expectation,we controlled the simulatedKendama robot usingseveral trials from differenthuman demonstrators.Furthermore,in order to explore the validityof thevia-point extraction of the FIRM algorithm, wealtered the conditions of the via-point extraction.Sincethe realrobot shows dangerousbehaviorat aneccentric trajectory, and since the real robotexperimenttakes much time, we used a numericalmodel of the robot to examine the Kendama taskwith severalvia-point extractionmethods.

Figure 12 shows a schematic diagram of theKendama learningsimulation.The systemis almostthe same as the Kendama learning experiment(Figure 7), except for the robot arm and theKendamaequipment.

5.1. Analytical Model

In order to simulatethe robot arm, we make use ofanalyticalmodels of the forward dynamics, inversedynamics, and forward kinematicsof the SARCOSdextrous slave arm. The dynamics models werederivedin a recursiveNewton Euler formulationbyadopting the spatial vector arithmeticapproach ofFeatherstone (1987). The forward kinematics areobtainedas a by-product of thesecalculations.In thesimulation, the forward dynamics are numericallyintegratedby means of Euler’s method with a timestep of 0.5 ms. The inversedynamicsequationsareused for feedforward control together with aconventionalPD controller. It should be noted thatdespite the fact that we have perfect models for

Page 15: A Kendama Learning Robot Based on Bi-directional Theory

A Kenahna Learning Robot 1295

[ 0.400

0.200~ 0.300

-1.ooq1 I

R

-1 O(sec.) 1

F a t K t e S a F 9 I d

simulation, the coarse Euler integration of theforward model as well as the finiteservo rate of ourcontroller (500 Hz) still introduce some differencesbetween the desired trajectories and the actuallyrealizedones, particularlyfor fast movements,suchas those as necessaryfor Kendama.

In order to simulatetheball behavior,we usedthefollowing simplifiedKendamadynamicsmodel:

@&

I

+ \ /

F A e d K ls O t p d f F 7 a

{K(l – 1.) if 1>10IIFII= ; =0

&o

f = ~ –X)2+(~h– Y)2+ (Zh– Z)2,

MX= ‘f, = ‘l[~l(xh – X

= ‘ll~l(~h – Y)/L

m,?= ‘f= – mg = ‘ll~l(zh – z)// – mg,

(22)

(23)

(24)

(25)

(26)

whereK representstheelasticconstantof thestring,Frepresentstheforce vectorwhichactsupon thestring,1and 10representthe currentlengthand the naturalIength of the string, respectively, Xh, ~h and .Zhrepresentthe Cartesianposition of the origin of thestringat thehandle,m andg representthemassof theball and accelerationby gravity, respectively,x, y,andz representtheCartesianpositionof theball.Thedouble dots represent the second order timederivative.These equationswere integratednumeri-cally by meansof Euler’smethod with a time stepof2 ms. For simplicity,we consideredtheball asa pointmass and the cup as a hemisphericalinner surface(the equationof the interactionbetweenthe cup andthe ball is not shown).Although the behaviorof thissimplifiedKendama model is slightlydifferentfrom

Page 16: A Kendama Learning Robot Based on Bi-directional Theory

1296

the behavior of the real Kendama, and betterapproximate models were developed (e.g., Fujii etal., 1993), the qualitativebehavior of our simulatedKendama is good enough for our purpose. Thebehaviorof theball is qualitativelysimilarto therealone as shown in the next subsection.

5.2. Usinga FewInstancesof HumanDemonstration

In this simulation, we used three sets of humandemonstration data. The measurementwas per-formed as described in Subsection4.2. Two right-handed subjects (subjects A and C) and one left-handed subject (subject B) participated in thismeasurement. We asked subject B to performKendama with the right hand, since our robot hasonly a rightarm.

The dataof subjectA wereexactlythesameas thedata used in Section 4. Figures 9 and 11 show theexecutions by the SARCOS arm before and afterlearning, respectively.Figures 13 and 14 show thesimulated robot’s performance based on datademonstrated by subject A. Comparing the twopairs of figures(Figures 9, 13 and Figures 11, 14),thesimulatedKendamashowsquitesimilarbehaviorto the realKendamaexecutedby the realrobot.

Figures 15 and 16 show the simulatedexecutionafterfive learningcycles usingdatademonstratedby

H. iUiyarnoto et al.

subject B and subject C, respectively. The taskperformancewas successfulwhen using the data ofsubjectB, aswellaswhenusingthedataof subjectA.The simulatedrobot, however,failedwhenusingthedata of subjectC.

In Figures 14, 15, and 16, it can be seenthat thevia-points extracted by the FIRM algorithm areassignedat similar locations among the individualdemonstrators.This suggeststhat the FIRM algo-rithm extracts a characteristiccommon to all thedemonstrated movements as a representation ofsegmentationof movementsinto the essentialparts.As shown in Figure 16, usingthe data of subject C,althoughthetasktarget(theball fallinginto thecup)was accomplishedalmostperfectly,the ball droppedfrom thecup. Note that,as shownin thefigure,therewas a considerably large error betweenthe desiredtrajectoryand therealizedtrajectoryespeciallyin thelatter half of the movement. In order to examinewhether the imperfection of the control mainlycausedthe failure,we usedan ideal robot insteadofthe dynamics model. Using the ideal robot whoserealizedjoint trajectoryis identicalto thedesiredone,the Kendama was executed successfully.Thus, weconcluded that the movementof subject C was tooquick for the simulatedSARCOS arm (also for therealrobot) to follow thedesiredtrajectorywithsmallerror. This coincideswith the observationthat when

-0.20[0.80(

!%-

-1.10[I I

*Y -1

-’1 i iF S t K t b l c s ( T c d( l r ( l b ( l t r A c a c s vh S a r ( F 9 l d

Page 17: A Kendama Learning Robot Based on Bi-directional Theory

A Kenakma Learning Robot 1297

x

-0.201Y 0.80[

0.00z 0.301

v MEVflni

1

-1.10[I I

0 1

F S t K t a f I c c a F ld a

0.601

-0.20{0.801

0.00~ 0.30

-1.101 I

-’1 6 i

ZoNr V

F S t K t a f l c c a F ld e

Page 18: A Kendama Learning Robot Based on Bi-directional Theory

1298 Miyanroto et al.

x 0.601

Y

zo.tfoao.3oa

-1.100I I

0 1

F S t K t o f I o c a F id a

usingthedatademonstratedby subjectC, joint limitswerehitand torquecommandsweresaturated.Usingdata of a trial in which subject C was asked toperform Kendama as gently as possible, theKendama could be executed successfully by thesimulatedSARCOS arm.

Changingthe Numberof Via-points

In Section4, we fixed the numberof via-pointsat 7for simplicity. In this subsection, we changed thenumber of via-points and investigated how thenumberof via-pointsaffectsthe task.

Figure 17(left) showsthecaseof 4 via-points.Therisingball was interceptedby the hand, becausetheavoiding motion of the hand was not reproducedsufficiently.In the case of the number of via-pointsranging from 5 to 9, the Kendama task wassuccessfullyexecutedwithin 5 learningcycles. In thecase of 10 via-points, the Kendama task wassuccessfullyexecuted, too, but ten learning cycleswereneeded.

Figure 17 (right) shows the case of 11 via-points.As shown in the figure, the desired trajectory wascorrugated and the error of the realizedtrajectorywas considerablylarge.

An adequatenumberof via-pointsis theminimumnumberof via-pointsneededto representthetask.Asmallernumber of via-pointswill increasethe error

betweenthe humantrajectoryand the reconstructedtrajectory, because of the insufficiency of theinformation for the movement. In contrast, anexeessivenumber of via-points overfits the trajec-tory: as the number of via-pointsincreasesover theoptimal, trivialpoints which are nonessentialfor thetaskwillbe selected.Suchunnecessaryvia-pointswillintroduce undesiredeffects and cause difficultiesincontrol.

In the caseof 11via-points[Figure17(right)],thedesiredtrajectorywascorrugatedandtheerrorof therealizedtrajectoryis considerablylarge. We suspectthat this notched desiredtrajectoryis caused by theFIRM algorithm.The intervaltime between a via-point and itsadjoiningvia-pointbecomestoo shortifthe FIRM algorithmextractsan excessivenumberofvia-points.Consequentlythe desiredjoint trajectorytends to be bumpy since the trajectoryformation isbased on the fifth order polynomial (see Subsection3.1). If the desired trajectory is not smooth,accelerationchangesrapidlyand quitea largetorqueis required to follow the desiredtrajectory. As thelearningcyclesarerepeated,thedegreeof notchingofthe desired trajectory increases. This undesirablemodification is probably a result of the impreciseestimationof the matrixAT,.,l/AX~, for ineqn (21). It shouldbe noted thatusingan idealrobot(whose realizedjoint trajectoryis identicalwith thedesired one), in the case of 11 via-points, the

Page 19: A Kendama Learning Robot Based on Bi-directional Theory

A Ken&ma Learning Robot 1299

r 0.60

r 0.80

0.001~ 0.301

-1.1OII I-“1 6 i

K

-0.20[f

“..’...

0.001if 0.301

B

-1.10[I I-’1 6 i

F S r a n v a f l c ( c 4 v (c v F l d

Kendama was executed successfullyeven when thedesiredtrajectorywas rugged.

5.4. SelectingVia-pointsat Equi-timeIntervals

When the FIRM algorithm extracts the via-points,FIRM selectsthe via-point at the point of maximalsquared error betweenthe given trajectoryand thegeneratedtrajectory,as describedin Subsection3.1.In this subsection, via-points were distributeduniformly at equal temporal intervals in order toexaminethe validityof the FIRM algorithm.

Figure 18 (left) shows the case of 7 via-pointsatequi-timeintervals.The hand cannot avoid therisingball. Figure 18 (right)shows the case of 9 via-pointsat equi-timeintervals.In this case the Kendamataskwas successfullyexecuted.Note thatmore via-pointsarerequiredwhenthevia-pointsare selectedat equi-time intervals.

We alignedthepositiondataso thatthetimeatthepeak hand height(maximumof z) was in the middleof the total movementtime (2 s). When we chose anodd number of via-points, the middle via-point isassigned at a time when the hand is just in thepositionat peakheight.In thepresentway of playingKendama,themajor constituentsof handmovementare an upward motion to yank up the ball, arightward motion to avoid the rising ball, and aleftwardmotion to catch the fallingball. In the case

of 9 via-points at equi-timeintervals,the via-pointshappento be located around thesepoints.

An Index of Manipulationof Task Learning

We compared an indexof abilityof manipulationoftask learning under various conditions. Here, weconsider the following scalar value as the index ofability

w = ~ AAT, (27)

whereA = AT,eJAXvia. denotesthetransposeofmatrixA. Figure 19 shows the abilityundervariousconditions.

In the case of 3 via-points with the FIRMalgorithm,the abilityis very low. The reconstructedtrajectory cannot reproduce the trajectory demon-strated by the human; even if the via-points aremodified,it is very difficultto control thebehaviorofthe ball. Moreover, an excessivelysmallvalue of wcauses inaccuracy of the inverse-transformationineqn (11). In the case of 4 via-pointswith the FIRMalgorithm, the ability is sufficientlyhigh. Althoughthe task target (yanking the ball up properly wasaccomplished, the motion to avoid the rising ball(rightward:x positivedirection)was not reproducedand the simulatedrobot failedKendama[seeFigure

Page 20: A Kendama Learning Robot Based on Bi-directional Theory

1300 H. Miyamoto et al.

K 0.600

-o.2oar o.8oa

O.oocz o.3oa

-1.loa, I

-’1 6(sec.) i

c 0.600

-0.20[f 0.80[

0.001z 0.30{

-1.1OII 1-’1 6(sec.) i

F S r e I v ( c 7 v ( c 9 vF l d

17 (left)]. In the case of 5–10 via-points with theFIRM algorithm, the ability is sufficientlyhigh,resultingin successfulexecution. In the case of 11via-pointswiththeFIRM algorithm,eventhoughtheabilitywassui%cientlyhigh, thedesiredtrajectorywascorrugatedas learningprogressed.

When we chose an even number of via-pointsatequi-timeintervals,the abilitywas too smallbecauseof thereasonmentionedin Subswtion5.4.In thecaseof 5 and 7 via-points at equi-time intervals, themotion to avoid the rising ball was not reproduced

120=

100

I [40

20--

345

and the simulatedrobot failed Kendama, althoughthe abilitywas sufficientlyhigh. In the case of 9 and11 via-points (11 case is not shown) at equi-timeintervals, the ability was sufficientlyhigh and thesimulated robot executed Kendama successfully,because, as mentioned previously, the via-pointshappento be located around the major constituentsof hand movement.

Generallyspeaking,the abilityis greaterwith theFIRM algorithmthanwiththeequi-timeintervals,asshownin Figure 19.It isconceivablethatan adequate

+

.time intervals

v

F i a m t l u v c

Page 21: A Kendama Learning Robot Based on Bi-directional Theory

A Kenalwna Learning Robot 1301

number of via-points extracted by the FIRMalgorithm is a good representationfor the humanmovementpatternin the case of Kendama.

CONCLUSION

We demonstratedthata generaltheoryof movementpatternperceptionbasedon a dynamicoptimizationtheory can be used for motion captureand learningby watching in robotics. In the presentexperiment,we chose the toy task Kendama to demonstratethepotential of our methods. In the future, we areplanning to extend our studies to even morecomplicatedtasks,such as a tennisserve.

So far, we have not made full use of visualfeedback to improve the performanceof our robot.Visual feedback was only employed to obtain anerror measurementafter an open-loop execution ofthe task. Clearly, when humansperform Kendama,they modify theirhand positionsduringthe flightoftheball basedon visualinformation,whichimprovestheir performance significantly.How does biologycoordinate such modifications?Reachingmovementsof monkeysandhumanshavebeenstudiedin thepastdecade (e.g., Geogopoulos et al., 1981). Severaltheoreticalmodels wereproposed to account for themodification of aimed movement in response to adisplacementof the target (e.g., Massone & Bizzi,1989; Flash & Henis, 1991; Flanagan et al., 1993;Hoff & Arbib, 1993).Before adding visualfeedbackcontrol to our system,more work willbe necessarytogain a better understandingof the mechanismsatwork in visuo-motor coordination.

REFERENCES

A A & R (T r l ProceedingsIEEE InternationalConferenceon RoboticsandAutomation.

B ( T g h e jr c p e i d

v m ExperimentalBrainResearch,95, 488–498.

B H & H (T is h e j d cv m E B R4

C I r m & c( e R A d

D P J B TW M & F ( Mm r ew p e tN 3 6

P F F G & R( U nm e a n e

s E xB R 1F e( R d a B

KF O & F ( C

t m ot ar JB 2 1

F ( c h e tm m B C 22

F & H ( cm e c m m

J N 1F & H ( t m d

r t v t J C Nr 3 2

Fujii, T., Yasuda, T., Yokoi, S., & T ( A vp m s a g wP I I W RH C

G K & M (S t r t a mE p u c t lJ N 4 7

G K & K ( H h sd d p m m Pi I E M B S (1

H N W & K ( Ac m s e is e i N N 1

H & A ( M t ft i r g J M

B 2 1K & I K. ( T a r i

t f p a g f oI T R A 4

K & K ( V ts e d m m pn i m B C 3

K ( O l n nf c c m

M & K ( A a Xs e p a ic n s j ( 821–849). Cam-b P

K ( B t a iI M ( A a

X C PK & G ( A c m f

r c b fB C 9

K F & S ( A hn m c l vm B C 1

K M U & S ( Tf m c n n mb m t c B C

2K I H & H (

C t n n m ib v c a ( T R T0 K A

K G K & K (S l c m cB ( C l c ( 1S F S P S IA M

K H & I ( A fo m r c b v aN C N S 4

Kurriyoshi, Y., Inaba, M., & Inoue, H. (1994). Learning bywatching: extracting reusable task knowledge from visualobservation of human performance. I TR A 1 7

Page 22: A Kendama Learning Robot Based on Bi-directional Theory

1302 l

L & M ( m ts P r C 1

L C S & SK ( P s c Pg R 4

M ( V Y FM & B ( A n n m l

t f B C 6 4M ( S c m E

m B R 2N & H ( S l

m r a r a TS I C E 2 4

O & M ( I p sO J

P H B T BM C H & O( F a n ra o a J E B1 8

R & M ( G i ma pY J W

S A & B ( w s

l P S Y W AL S ( 1

S K G & K (I e m P c

c N 3 5U K & S ( F

c o t h mm t m B Cn 8

U S & K ( M mt m w r h mP S B Pg E ( 2 J

W & K ( A n n mt f u f i dm N N 6 9

W & K ( A t c hb m p B C7 3

W K V & K ( Ac t m p r b

o m p g B Cn 7 1