Monica Nicolescu, Odest Chadwicke Jenkins, and Adam Olenderski
Abstract
A critical challenge in robot learning from demonstration is the ability to map the behavior of the trainer onto the robot's existing repertoire of basic/primitive capabilities. Following a behavior-based approach, we aim to express a teacher's demonstration as a linear combination (or fusion) of the robot's primitives. We treat this problem as a state estimation problem over the space of possible linear fusion weights. We consider this fusion state to be a model of the teacher's control policy expressed with respect to the robot's capabilities. Once estimated under various sensory preconditions, fusion state estimates are used as a coordination policy for online robot control to imitate the teacher's decision making. A particle filter is used to infer fusion state from control commands demonstrated by the teacher and predicted by each primitive. The particle filter allows for inference under the ambiguity over a large space of likely fusion combinations and dynamic changes to the teacher's policy over time. We present results of our approach in simulated and real-world environments with a Pioneer 3DX mobile robot.
I. INTRODUCTION
The problem of estimating the state of the user lies at the core of human-robot interaction. Such state information can vary over several modalities, such as affective state and shared state. Human state in these problems is estimated and used to guide a robot's actions or a system design process to improve a user's experience or performance. In the case of learning from demonstration (LFD) [1], the objective is to estimate the human's decision state, that is, the single control policy, out of all possible control policies, that is utilized by the teacher during a demonstration.
M. Nicolescu and A. Olenderski are with the University of Nevada, Reno, 16 N. Virginia St. MS 171, Reno, NV, 523
monica@cse.unr.edu, olenders@cse.unr.edu
O. C. Jenkins is with Brown University, 115 Waterman St., 4th Floor, Providence, RI 02912-1910, cjenkins@cs.brown.edu
This work has been supported by the National Science Foundation under contract number IIS-06876 and by a UNR Junior Faculty Award to Monica Nicolescu.
Once estimated, the learned policy can be used as the robot's control policy, assuming an appropriate transform between embodiments. LFD strives to open up the programming of robot control to broader populations in society by learning implicitly from human guidance, as opposed to explicit programming in a computer language. In addition to LFD, knowledge of a user's decision state can be used to inform online human-robot interaction, such as for adjustable autonomy [2].
At a high level, estimating a user's decision state is finding the most likely control policy given sensory-motor observations from demonstration. In the context of a Markov Decision Process, a control policy $\mu(S) \rightarrow A$ is defined by a mapping of observed world state $S$ to control outputs $A$. This control policy is governed by a value function, $Q(S, A)$, that specifies the benefit of taking action $A$ in world state $S$. This value function defines the decision state over the space of all state-action $(S, A)$ pairs. Atkeson and Schaal [3] define optimality (as a reward function) for this problem as minimizing the divergence in performance between the learned policy and observations from a demonstration. Specifically, given a world state occurring during demonstration, the motor output predicted by the learned policy should result in changes to world state similar to those in the demonstration. By phrasing optimality based on demonstration, policies can be learned for arbitrary tasks without (or with) bias towards specific tasks.
However, learning such policies is subject to issues of partial observability in world state and generalization to new situations. For instance, the approach of Atkeson and Schaal, and subsequent work [4], is geared toward top-down planning over the space of control policies given fully observable and relevant aspects of world state. However, such methods are suited for limited or one-time generalization that varies or refines existing behavior (e.g., correcting a demonstrated locomotion gait or variations of a tennis swing).
To address these issues, we propose a behavior-based approach to learning from demonstration that uses behavior fusion to provide bottom-up generalization to new situations. Assuming a set of preexisting robot behaviors expressed as schemas [5] or potential fields, our aim is to learn a coordination policy that linearly fuses their combined output in a manner that matches the teacher's demonstration. We phrase the learning of this coordination as a fusion estimation problem, i.e., state estimation in the space of linear combinations of primitive behaviors. For domains such as mobile robotics, fusion estimation is often subject to ambiguous changes in world state that are attributable to a large space of solutions.
In this paper, we present behavior fusion estimation as a method to learn, from demonstration, fusion weights for coordinating concurrently executing primitive behaviors. To account for this ambiguity and dynamic changes to the user's fusion policy, a particle filter is used to infer fusion estimates from robot sensory observations and motor commands. We focus on the limited case in which fusion is assumed to be unimodal for each discrete combination of behavior preconditions. Results are presented demonstrating fusion policies learned from data collected during simulated and real-world navigation demonstrations of a teleoperated Pioneer 3DX robot.
II. RELATED WORK
Our work falls into the category of learning by experienced demonstrations. This approach implies the robot actively participates in the demonstration provided by the teacher and experiences the task through its own sensors. Successful first-person approaches have demonstrated learning of reactive policies [6], trajectories [7], or high-level representations of sequential tasks [8]. These approaches employ a teacher-following strategy, where the robot learner follows a human or a robot teacher. Such sequential (or arbitrated) representations for coordination can be considered a subset of our approach. In contrast, our aim is to avoid hard binary decisions in coordinating and fusing control commands from different behaviors. This results in a general-purpose policy, which allows the robot to perform the demonstrated task in new environments, from any initial position. We do not attempt to reproduce exact trajectories, but rather learn the underlying policy for executing the task. In making binary decisions, sequential methods represent all possible coordinations as a discrete set of possibilities along each axis of fusion space. Platt et al. [9] proposed null-space composition as a fusion coordination mechanism limited to control states where behaviors do not affect each other. This coordination allows for off-axis (but discrete) combinations in fusion space. Although the work of Platt et al. is applied to dexterous manipulation, we focus only on their coordination mechanism and not platform specifics.

The choice of the particle filter [10] is only one of several methods available to infer behavior fusion. The most straightforward choice is a linear least squares optimization. While least squares worked well when little ambiguity was present in the fusion likelihood, our prior testing showed that it did not reliably find functional fusion policies as the number of primitives increased and introduced greater ambiguity. Nonlinear methods, such as Levenberg-Marquardt, could yield better results. However, we chose the particle filter to account for ambiguity explicitly. LFD in the form of reinforcement learning methods, such as Q-Learning [11], is a viable option for fully observable states, but is nontrivial to extend to partial observability.

A significant challenge for all robotic systems that learn from a teacher's demonstration is the ability to map the perceived behavior of the trainer to their own behavior repertoire. We focus on the specific problem of learning behavior fusion from demonstration, which could be cast into more holistic approaches to human-robot interaction, such as work by Breazeal et al. [12]. One successful approach to this problem has been to match observations to robot behaviors based on forward models [13], [14], in which multiple behavior models compete for prediction of the teacher's behavior [15], [16], and the behavior with the most accurate prediction is said to match the observed action.
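For concreteness, the least-squares alternative mentioned above could be sketched as below. This is a minimal illustration under our own assumptions about data layout; it stacks the linear fusion constraint of Eq. 1 (Section IV) over all time steps and returns a single point estimate, with no representation of ambiguity.

```python
import numpy as np

def least_squares_fusion(predictions, demonstrated):
    """Baseline fusion-weight estimate by linear least squares.

    predictions:  (T, N, 2) prediction vectors from N primitives over T steps
    demonstrated: (T, 2) demonstrated command vectors
    Returns the (N,) weight vector minimizing ||A s - b||^2.
    """
    T, N, _ = predictions.shape
    A = predictions.transpose(0, 2, 1).reshape(T * 2, N)  # one row per x/y component
    b = demonstrated.reshape(T * 2)
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s  # a single point estimate; ambiguity in the likelihood is lost
```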
III. BEHAVIOR REPRESENTATION
Behavior-Based Control (BBC) has become one of the most popular approaches to embedded and robotic system control, both in research and in practical applications. We utilize a schema-based representation in the context of BBC, similar to approaches in [5]. This choice is essential for the purpose of our work because schemas with BBC provide a continuous encoding of behavioral responses and a uniform output in the form of vectors generated using a potential fields approach.
In our system, a controller consists of a set of concurrently running behaviors. Thus, for a given task, each behavior brings its own contribution to the overall motor command. These contributions are weighted such that, for example, an obstacle avoidance behavior could have a higher impact than reaching a target if the obstacles in the field are significantly dangerous to the robot. Alternatively, in a time-constrained task, the robot could give a higher contribution to getting to the destination than to obstacles along the way. These weights affect the magnitude of the individual vectors coming from each behavior, thus generating different modalities of execution for the task.
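To make this weighted, schema-style fusion concrete, the sketch below superposes the vector outputs of two hypothetical potential-field behaviors. The behavior definitions and weight values are illustrative assumptions, not the behaviors used in our experiments.

```python
import numpy as np

def avoid_obstacle(robot_pos, obstacle_pos):
    """Repulsive field: points away from the obstacle, stronger when closer."""
    away = robot_pos - obstacle_pos
    dist = np.linalg.norm(away)
    return away / (dist ** 2 + 1e-6)

def reach_target(robot_pos, target_pos):
    """Attractive field: unit vector toward the target."""
    toward = target_pos - robot_pos
    return toward / (np.linalg.norm(toward) + 1e-6)

# Weighted superposition of concurrently running behaviors. Raising the
# avoidance weight yields a cautious modality; raising the target weight
# favors reaching the destination quickly.
robot = np.array([0.0, 0.0])
obstacle = np.array([1.0, 0.5])
target = np.array([5.0, 0.0])
weights = {"avoid": 0.7, "reach": 0.3}  # hypothetical weights
command = (weights["avoid"] * avoid_obstacle(robot, obstacle)
           + weights["reach"] * reach_target(robot, target))
```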
IV. BEHAVIOR FUSION ESTIMATION
The primary function in behavior fusion estimation is to infer, from a teacher-provided demonstration, the contribution (or weight) of each primitive in the robot's repertoire such that their combination matches the observed outcome. These weights modulate the magnitude of the control vector output by the individual primitives, thus influencing the resulting command from fusion and, consequently, the way the robot interacts with the world. However, choosing these weights is a non-trivial problem. To save time and resources (such as robot power), we automatically estimate appropriate weights for fusing behaviors according to the desired navigation style as demonstrated.
For a set of N primitives, behavior fusion estimation is accomplished by estimating the joint probability distribution of the fusion space (i.e., across weighting combinations) over the demonstration duration. For this work, demonstrations consisted of guiding the robot through a navigation task, using a joystick, while the robot's behaviors continuously provide predictions of what their outputs would be (in the form of a 2D speed and heading vector in the robot's coordinate system) for the current sensory readings. However, instead of being translated into motor commands, these predictions are recorded along with the turning rate of the robot at that moment of time. Thus, for each time step $t$, we are provided with a set of prediction vectors $V_p^t = [v_1^t \cdots v_N^t]$ from each primitive and a demonstration vector $V_r^t$ expressing the realized control output of the robot. It is known that the resulting vector $V_r^t$ is a linear combination of the prediction vectors $[v_1^t \cdots v_N^t]$ according to some unknown superposition weights $S^t = [s_1^t \cdots s_N^t]$:

$$V_r^t = \sum_{i=1}^{N} s_i^t v_i^t \qquad (1)$$
We consider heading to be the most important consideration for behavior fusion in 2D navigation. Consequently, we normalize command vectors to unit length.
The goal of the algorithm is to infer the weights $S^t$ over time or, more precisely, the relative proportions among the weights that could produce the demonstration vector $V_r^t$.
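Read directly, Eq. 1 together with the unit-length normalization gives the following forward model; this is a brief sketch under assumed array shapes, mapping one weight hypothesis and the primitives' current predictions to a fused heading command.

```python
import numpy as np

def unit(v):
    """Normalize to unit length: only heading matters for 2D navigation."""
    return v / (np.linalg.norm(v) + 1e-9)

def fused_command(weights, predictions):
    """Eq. 1: V_r^t = sum_i s_i^t v_i^t, then normalized to unit length.

    weights:     (N,) fusion weights s^t
    predictions: (N, 2) unit prediction vectors v_i^t from the primitives
    """
    return unit(np.sum(weights[:, None] * predictions, axis=0))
```

Under this normalization only the relative proportions of the weights matter, which is why the algorithm infers proportions rather than absolute magnitudes.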
A. Incorporating Behavior Preconditions
At any given time during the demonstration, multiple behaviors could be active, depending on whether their preconditions are met or not. We segment the demonstration into intervals based on the binary decisions set by the preconditions of each behavior. A segmentation of the demonstration trace is performed at the moments in time when the status of any of the behaviors' preconditions changes between met and not-met. The resulting segments represent different environmental situations, since different behaviors become "applicable" at the transition points. The weights of behaviors within each segment encode the mode of performing the current task given the situation and, thus, within each segment the weights of the applicable behaviors are constant. For example, for a target reaching task, the robot could behave under the influence of the corridor-follow, target-follow, and avoid-obstacle behaviors if in the presence of obstacles, but would behave only under the influence of target-follow if in an open space.
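One way to realize this segmentation is to cut the demonstration trace wherever the tuple of precondition truth values changes, as in the sketch below; the trace record format is an assumption for illustration.

```python
def segment_by_preconditions(trace):
    """Split a demonstration trace into segments of constant preconditions.

    trace: list of time-step records, each a dict containing a
           'preconditions' tuple of booleans (one per behavior).
    Within each returned segment the applicable behaviors do not change,
    so a single fusion-weight vector can be estimated per segment.
    """
    segments, current = [], []
    for step in trace:
        if current and step["preconditions"] != current[-1]["preconditions"]:
            segments.append(current)  # a precondition flipped: start a new segment
            current = []
        current.append(step)
    if current:
        segments.append(current)
    return segments
```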
B. Fusion Parameter Estimation
Similar to Monte Carlo robot localization, a particle filter is used to recursively estimate the joint density in the parameter space of fusion weights $S^t$ over time $t = 1 \cdots T$. Particle filters [17] have been used for state and parameter estimation in several different domains (such as robot localization [10], pose estimation [18], and insect tracking [19]). Restating these methods, mostly following [19], we use the standard form of the Bayes filter to estimate the posterior probability density $p(S^t | V_r^{1:t}, V_p^{1:t})$ in the space of fusion parameters given prediction and result vectors:

$$p(S^t | V_r^{1:t}, V_p^{1:t}) = k \, p(V_r^t, V_p^t | S^t) \int p(S^t | S^{t-1}) \, p(S^{t-1} | V_r^{1:t-1}, V_p^{1:t-1}) \, dS^{t-1} \qquad (2)$$

where $p(V_r^t, V_p^t | S^t)$ is the likelihood of observing prediction and result vectors given a vector of fusion parameters, $p(S^t | S^{t-1})$ is the motion model describing the expected displacement of parameter weights over a time step, $p(S^{t-1} | V_r^{1:t-1}, V_p^{1:t-1})$ is the prior probability distribution from the previous time step, and $k$ is a normalization constant to enforce that the distribution sums to one. We simplify the likelihood using the chain rule of probability and the domain knowledge (Eq. 1) that prediction vectors are not dependent on the fusion weights:

$$p(V_r^t, V_p^t | S^t) = p(V_r^t | V_p^t, S^t) \, p(V_p^t | S^t) = p(V_r^t | V_p^t, S^t) \qquad (3)$$

The resulting Bayes filter:

$$p(S^t | V_r^{1:t}, V_p^{1:t}) = k \, p(V_r^t | V_p^t, S^t) \int p(S^t | S^{t-1}) \, p(S^{t-1} | V_r^{1:t-1}, V_p^{1:t-1}) \, dS^{t-1} \qquad (4)$$

has a Monte Carlo approximation that represents the posterior as a particle distribution of $M$ weighted samples $\{S^t_{(j)}, \pi^t_{(j)}\}_{j=1}^{M}$, where $S^t_{(j)}$ is a particle representing a specific hypothesis for the fusion weights and $\pi^t_{(j)}$ is the weight of the particle, proportional to its posterior probability:

$$p(S^t | V_r^{1:t}, V_p^{1:t}) \propto k \, p(V_r^t | V_p^t, S^t) \sum_j \pi^{t-1}_{(j)} \, p(S^t | S^{t-1}_{(j)}) \qquad (5)$$

The estimation of the posterior at time $t$ is performed by 1) importance sampling to draw new particle hypotheses $S^t_{(j)}$ from the posterior at time $t-1$ and 2) computing weights $\pi^t_{(j)}$ for each particle from the likelihood. Importance sampling is performed by randomly assigning particles $S^t_{(i)}$ to particles $S^{t-1}_{(j)}$ based on weights $\pi^{t-1}_{(j)}$ and adding Gaussian noise. This process effectively samples the following proposal distribution:

$$S^t_{(i)} \sim q(S^t) = \sum_j \pi^{t-1}_{(j)} \, p(S^t | S^{t-1}_{(j)}) \qquad (6)$$

and weights each particle by the following likelihood, a function of the distance between the actual and predicted displacement directions:

$$\pi^t_{(i)} = p(V_r^t | V_p^t, S^t_{(i)}) = 2^{-D(V_r^t, \hat{V}^t_{(i)})/2} \qquad (7)$$

where $D(a, b)$ is the Euclidean distance between $a$ and $b$, and:

$$\hat{V}^t_{(i)} = \sum_{k=1}^{N} s^t_{(i),k} \, v^t_k$$
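Putting Eqs. 5-8 together, a single filter update might be sketched as follows; the particle count, the Gaussian noise scale, and our exponential reading of the likelihood in Eq. 7 are assumptions for illustration.

```python
import numpy as np

def particle_filter_step(particles, weights, V_r, V_p, noise=0.05, rng=None):
    """One recursive update of the fusion-weight posterior (Eqs. 5-8).

    particles: (M, N) fusion-weight hypotheses S_(j)
    weights:   (M,) importance weights pi from the previous step
    V_r:       (2,) demonstrated unit command vector
    V_p:       (N, 2) unit prediction vectors from the N primitives
    """
    rng = rng or np.random.default_rng()
    M, N = particles.shape
    # Importance sampling (Eq. 6): resample by weight, add Gaussian noise.
    idx = rng.choice(M, size=M, p=weights / weights.sum())
    particles = particles[idx] + rng.normal(0.0, noise, size=(M, N))
    # Fused prediction per particle (Eq. 8), normalized to compare headings.
    V_hat = particles @ V_p
    V_hat /= np.linalg.norm(V_hat, axis=1, keepdims=True) + 1e-9
    # Likelihood weighting (Eq. 7): closer to the demonstration -> heavier.
    D = np.linalg.norm(V_hat - V_r, axis=1)
    new_weights = 2.0 ** (-D / 2.0)
    return particles, new_weights / new_weights.sum()
```

At each time step the weighted particle set approximates the posterior over fusion weights; a point estimate such as the weighted mean of the particles can then serve as the coordination policy for the current precondition segment.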
Fig. 3. Results from Scenario 3. The robot used learned fusion weights from the left, center, and right side demonstrations, respectively.
learns the proper weights for dealing with this situation. In this image, as in the following plots, the robot path is shown in red, the laser range readings in blue, and the detection of the goal in green.
B. Scenario 2
During the second set of experiments, the robot was equipped with the same set of behaviors as in Scenario 1, with the only difference that we replaced the general wallAttract behavior with wallAttract-left and wallAttract-right. The left and right wall attract behaviors respond only to the walls on their corresponding side, as opposed to the combined approach of the regular wallAttract. The experiments were performed in a simple simulated environment, shown in Figure 1, bottom right inset. The four demonstrations consisted of taking the following paths through the environment: one through the upper corridor and one through the bottom (wide) corridor, in both cases keeping first to the left, then to the right. With the weights learned in this environment we tested the behavior of the robot in a simulated SEM building map.
Results. As opposed to the first set of experiments, due to the split of the wallAttract behavior, the robot now learned a preference for which side to favor when navigating a corridor. However, a more significant difference in behavior is apparent when the robot approaches a T-junction. When using the weights learned from the right-following demonstrations (for both corridor demonstrations), the robot turns to the right when it approaches an intersection. That is, when it is given the choice of either "right or left" or "right or straight," it will go right, because of how strongly it is attracted to the right wall. When it is given the choice "straight or left," it will go straight. With the weights learned from the left-following demonstrations, the robot exhibits a similar behavior for the left side. These results demonstrate the robustness of our approach, in that controllers learned in one environment translate to different environments as well.
C. Scenario 3
In the third set of experiments, the set of behaviors was the same as in Scenario 2, with the only difference that instead of the wander behavior we used its split versions, wander-left and wander-right. These two behaviors seek left and right open spaces, respectively, as opposed to the regular wander, which looks for open space in any direction. We performed three demonstrations in the SEM simulated environment, each consisting of taking a tour of the building, as follows: 1) keep to the right, 2) keep to the center, and 3) keep to the left.
Results. As expected, in these experiments the robot learned preferences similar to those in the second scenario. Figure 3 (left) shows the trajectory of the robot using the controller learned from the left-follow demonstration. The robot starts at the top of the left corridor, chooses a left turn at the T-junction, then stops at the goal. During the entire run the robot keeps closer to the wall on the left. Figure 3 (right) shows a similar preference, this time for following walls on the right and choosing a right turn at T-junctions. In this experiment, the robot starts at the top of the right corridor and moves down and left. While for the left and right preferences the robot makes a clear turn in a particular direction when reaching the T-junction, for the center experiment the robot shows that it does not have a preferred direction, as indicated by how far it goes into the T-junction before deciding to turn, due to its wander behavior (Figure 3 (center)). The robot navigates closer to the left due to the wandering behavior, which attracts the robot to the open spaces through the doors. We only show goal-reaching capability for the left experiment: we stopped the center and right experiments before the robot made the full turn in the environment to reach the goal, located in a different area of the environment. When allowed to run longer, the robot was able to reach the goal in all situations.
In addition to learning the left and right preference, our results demonstrate that the additional refinement of the underlying behavior set into wander-left and wander-right behaviors allowed the robot to capture additional aspects of the demonstration. In particular, whenever the robot found itself in the middle of a T-junction, with open space on both sides, the robot would choose to go in the direction given by the preference expressed during the demonstration: right
Fig. 4. The robot learns preferences of wandering in directions specified during the demonstration (left and right, respectively).
for the right weights and left for the left weights. This preference was demonstrated even in cases in which the robot had more open space in the opposite direction. Under equal weighting of left and right wandering, the robot would normally follow the larger open space. Figure 4 shows this preference through the robot's trajectory. In the left image, the robot is using the weights learned from the left-follow demonstration. While the robot starts oriented slightly toward the right in the middle of the T-junction, as shown by its laser profile, the higher weight of the wander-left behavior pulls the robot in the left direction. Similarly, in the right image, the robot uses the weights from the right-follow demonstration. Even when oriented slightly to the left, where there is more open space, the robot chooses to go right due to the higher weight of wander-right.
The approach we presented demonstrates the importance of considering concurrently running behaviors as underlying mechanisms for achieving a task. Our method allows for learning both of the goals involved in the task (e.g., reaching a target) and of the particular ways in which the same task can be performed. In addition, our results demonstrate the importance of choosing the primitive behavior set, an important and still open issue in behavior-based research. Our learned controllers are not restricted to a particular path or execution sequence and thus are general enough to exhibit meaningful behavior even in environments different from the one in which the demonstration took place.
VI. SUMMARY
We presented a method for robot task learning from demonstration that addresses the problem of mapping observations to robot behaviors from a novel perspective. Our claim is that motor behavior is typically expressed in terms of concurrent control of multiple different activities. To this end, we developed a learning by demonstration approach that allows a robot to map the demonstrator's actions onto multiple behavior primitives from its repertoire. This method has been shown to capture not only the overall goals of the task, but also the specifics of the user's demonstration, thus enabling additional capabilities through learning by demonstration.
REFERENCES
[1] S. Schaal, "Is imitation learning the route to humanoid robots?," Trends in Cognitive Sciences, vol. 3, no. 6, pp. 233-242, 1999.
[2] M. Goodrich, D. Olsen, J. Crandall, and T. Palmer, "Experiments in adjustable autonomy," 2001.
[3] C. G. Atkeson and S. Schaal, "Robot learning from demonstration," in ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, pp. 12-20.
[4] S. Schaal, J. Peters, J. Nakanishi, and A. Ijspeert, "Learning movement primitives," in International Symposium on Robotics Research, 2004.
[5] R. C. Arkin, "Motor schema based navigation for a mobile robot: An approach to programming by behavior," in IEEE Conference on Robotics and Automation, 1987, pp. 264-271.
[6] G. Hayes and J. Demiris, "A robot controller using learning by imitation," in Proc. of the Intl. Symp. on Intelligent Robotic Systems, Grenoble, France, 1994, pp. 198-204.
[7] P. Gaussier, S. Moga, J. Banquet, and M. Quoy, "From perception-action loops to imitation processes: A bottom-up approach of learning by imitation," Applied Artificial Intelligence Journal, vol. 12, no. 7-8, pp. 701-729, 1998.
[8] M. N. Nicolescu and M. J. Matarić, "Natural methods for robot task learning: Instructive demonstration, generalization and practice," in Proc., Second Intl. Joint Conf. on Autonomous Agents and Multi-Agent Systems, Melbourne, Australia, July 2003.
[9] R. Platt, A. H. Fagg, and R. R. Grupen, "Manipulation gaits: Sequences of grasp control tasks," in IEEE Conference on Robotics and Automation, New Orleans, LA, USA, 2004, pp. 801-806.
[10] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
[11] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2002), vol. 4, May 2002, pp. 3404-3410.
[12] C. Breazeal, A. Brooks, J. Gray, G. Hoffman, C. Kidd, H. Lee, J. Lieberman, A. Lockerd, and D. Mulanda, "Tutelage and collaboration for humanoid robots," International Journal of Humanoid Robotics, vol. 1, no. 2, 2004.
[13] S. Schaal, "Learning from demonstration," Advances in Neural Information Processing Systems, vol. 9, pp. 1040-1046, 1997.
[14] O. C. Jenkins and M. J. Matarić, "Performance-derived behavior vocabularies: Data-driven acquisition of skills from motion," International Journal of Humanoid Robotics, vol. 1, no. 2, pp. 237-288, Jun 2004.
[15] D. Wolpert and M. Kawato, "Multiple paired forward and inverse models for motor control," Neural Networks, vol. 11, pp. 1317-1329, 1998.
[16] O. C. Jenkins, "Data-driven derivation of skills for autonomous humanoid agents," Ph.D. dissertation, The University of Southern California, 2003.
[17] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174-188, Feb. 2002.
[18] M. Isard and A. Blake, "Condensation - conditional density propagation for visual tracking," International Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[19] Z. Khan, T. R. Balch, and F. Dellaert, "A Rao-Blackwellized particle filter for EigenTracking," in IEEE Computer Vision and Pattern Recognition, vol. 2, 2004, pp. 980-986.
[20] B. Gerkey, R. T. Vaughan, and A. Howard, "The Player/Stage project: Tools for multi-robot and distributed sensor systems," in Proc. of the 11th International Conference on Advanced Robotics, 2003, pp. 317-323.