Monica Nicolescu, Odest Chadwicke Jenkins, and Adam Olenderski
Abstract
A critical challenge in robot learning from demonstration is the ability to map the behavior of the trainer onto the robot's existing repertoire of basic/primitive capabilities. Following a behavior-based approach, we aim to express a teacher's demonstration as a linear combination (or fusion) of the robot's primitives. We treat this problem as a state estimation problem over the space of possible linear fusion weights. We consider this fusion state to be a model of the teacher's control policy expressed with respect to the robot's capabilities. Once estimated under various sensory preconditions, fusion state estimates are used as a coordination policy for online robot control to imitate the teacher's decision making. A particle filter is used to infer fusion state from control commands demonstrated by the teacher and predicted by each primitive. The particle filter allows for inference under the ambiguity over a large space of likely fusion combinations and dynamic changes to the teacher's policy over time. We present results of our approach in simulated and real-world environments with a Pioneer 3DX mobile robot.
I. INTRODUCTION
The problem of estimating the state of the user lies at the core of human-robot interaction. Such state information can vary over several modalities, such as affective state and shared state. Human state in these problems is estimated and used to guide a robot's actions or a system design process to improve a user's experience or performance. In the case of learning from demonstration (LFD) [1], the objective is to estimate the human's decision state, that is, the single control policy, out of all possible control policies, that is utilized by the teacher during a demonstration.
M. Nicolescu and A. Olenderski are with the University of Nevada, Reno, 16 N. Virginia St. MS 171, Reno, NV, 523
monica@cse.unr.edu, olenders@cse.unr.edu
O. C. Jenkins is with Brown University, 115 Waterman St., 4th Floor, Providence, RI 02912-1910, cjenkins@cs.brown.edu
This work has been supported by the National Science Foundation under contract number IIS-06876 and by a UNR Junior Faculty Award to Monica Nicolescu.
Once estimated, the learned policy can be used as the robot's control policy, assuming an appropriate transform between embodiments. LFD strives to open up the programming of robot control to broader populations in society by learning implicitly from human guidance, as opposed to explicit programming in a computer language. In addition to LFD, knowledge of a user's decision state can be used to inform online human-robot interaction, such as for adjustable autonomy [2].
At a high level, estimating a user's decision state is finding the most likely control policy given sensory-motor observations from demonstration. In the context of a Markov Decision Process, a control policy $\mu(S) \rightarrow A$ is defined by a mapping of observed world state $S$ to control outputs $A$. This control policy is governed by a value function, $Q(S, A)$, that specifies the benefit of taking action $A$ in world state $S$. This value function defines the decision state over the space of all state-action $(S, A)$ pairs. Atkeson and Schaal [3] define optimality (as a reward function) for this problem as minimizing the divergence in performance between the learned policy and observations from a demonstration. Specifically, given a world state occurring during demonstration, the motor output predicted by the learned policy should result in changes to world state similar to those in the demonstration. By phrasing optimality based on demonstration, policies can be learned for arbitrary tasks without (or with) bias towards specific tasks.
However, learning such policies is subject to issues of partial observability in world state and generalization to new situations. For instance, the approach of Atkeson and Schaal, and subsequent work [4], is geared toward top-down planning over the space of control policies given fully observable and relevant aspects of world state. However, such methods are suited for limited or one-time generalization that varies or refines existing behavior (e.g., correcting a demonstrated locomotion gait or variations of a tennis swing).
To address these issues, we propose a behavior-based approach to learning from demonstration that uses behavior fusion to provide bottom-up generalization to new situations. Assuming a set of preexisting robot behaviors expressed as schemas [5] or potential fields, our aim is to learn a coordination policy that linearly fuses their combined output in a manner that matches the teacher's demonstration. We phrase the learning of this coordination as a fusion estimation problem, i.e., state estimation in the space of linear combinations of primitive behaviors. For domains such as mobile robotics, fusion estimation is often subject to ambiguous changes in world state that are attributable to a large space of solutions.
In this paper, we present behavior fusion estimation as a method to learn, from demonstration, fusion weights for coordinating concurrently executing primitive behaviors. To account for this ambiguity and dynamic changes to the user's fusion policy, a particle filter is used to infer fusion estimates from robot sensory observations and motor commands. We focus on the limited case in which fusion is assumed to be unimodal for each discrete combination of behavior preconditions. Results are presented demonstrating fusion policies learned from data collected during simulated and real-world navigation demonstrations of a teleoperated Pioneer 3DX robot.
II. RELATED WORK
Our work falls into the category of learning by experienced demonstrations. This approach implies the robot actively participates in the demonstration provided by the teacher and experiences the task through its own sensors. Successful first-person approaches have demonstrated learning of reactive policies [6], trajectories [7], or high-level representations of sequential tasks [8]. These approaches employ a teacher-following strategy, where the robot learner follows a human or a robot teacher. Such sequential (or arbitrated) representations for coordination can be considered a subset of our approach. In contrast, our aim is to avoid hard binary decisions in coordinating and fusing control commands from different behaviors. This results in a general-purpose policy, which allows the robot to perform the demonstrated task in new environments, from any initial position. We do not attempt to reproduce exact trajectories, but rather learn the underlying policy for executing the task. In making binary decisions, sequential methods represent all possible coordinations as a discrete set of possibilities along each axis of fusion space. Platt et al. [9] proposed null-space composition as a fusion coordination mechanism limited to control states where behaviors do not affect each other. This coordination allows for off-axis (but discrete) combinations in fusion space. Although the work of Platt et al. is applied to dexterous manipulation, we focus only on their coordination mechanism and not platform specifics.

The choice of the particle filter [10] is only one of several methods available to infer behavior fusion. The most straightforward choice is a linear least squares optimization. While least squares worked well when little ambiguity was present in the fusion likelihood, our prior testing showed that it did not reliably find functional fusion policies as the number of primitives increased and introduced greater ambiguity. Nonlinear methods, such as Levenberg-Marquardt, could yield better results. However, we chose the particle filter to account for ambiguity explicitly. LFD in the form of reinforcement learning methods, such as Q-Learning [11], is a viable option for fully observable states, but is nontrivial to extend to partial observability.

A significant challenge for all robotic systems that learn from a teacher's demonstration is the ability to map the perceived behavior of the trainer to their own behavior repertoire. We focus on the specific problem of learning behavior fusion from demonstration, which could be cast into more holistic approaches to human-robot interaction, such as work by Breazeal et al. [12]. One successful approach to this problem has been to match observations to robot behaviors based on forward models [13], [14], in which multiple behavior models compete for prediction of the teacher's behavior [15], [16], and the behavior with the most accurate prediction is said to match the observed action.
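For concreteness, the least-squares alternative mentioned above could be sketched as below. This is a minimal illustration under our own assumptions about data layout; it stacks the linear fusion constraint of Eq. 1 (Section IV) over all time steps and returns a single point estimate, with no representation of ambiguity.

```python
import numpy as np

def least_squares_fusion(predictions, demonstrated):
    """Baseline fusion-weight estimate by linear least squares.

    predictions:  (T, N, 2) prediction vectors from N primitives over T steps
    demonstrated: (T, 2) demonstrated command vectors
    Returns the (N,) weight vector minimizing ||A s - b||^2.
    """
    T, N, _ = predictions.shape
    A = predictions.transpose(0, 2, 1).reshape(T * 2, N)  # one row per x/y component
    b = demonstrated.reshape(T * 2)
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s  # a single point estimate; ambiguity in the likelihood is lost
```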
III. BEHAVIOR REPRESENTATION
Behavior-Based Control (BBC) has become one of the most popular approaches to embedded and robotic system control, both in research and in practical applications. We utilize a schema-based representation in the context of BBC, similar to approaches in [5]. This choice is essential for the purpose of our work because schemas with BBC provide a continuous encoding of behavioral responses and a uniform output in the form of vectors generated using a potential fields approach.
In our system, a controller consists of a set of concurrently running behaviors. Thus, for a given task, each behavior brings its own contribution to the overall motor command. These contributions are weighted such that, for example, an obstacle avoidance behavior could have a higher impact than reaching a target if the obstacles in the field are significantly dangerous to the robot. Alternatively, in a time-constrained task, the robot could give a higher contribution to getting to the destination than to obstacles along the way. These weights affect the magnitude of the individual vectors coming from each behavior, thus generating different modalities of execution for the task.
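To make this weighted, schema-style fusion concrete, the sketch below superposes the vector outputs of two hypothetical potential-field behaviors. The behavior definitions and weight values are illustrative assumptions, not the behaviors used in our experiments.

```python
import numpy as np

def avoid_obstacle(robot_pos, obstacle_pos):
    """Repulsive field: points away from the obstacle, stronger when closer."""
    away = robot_pos - obstacle_pos
    dist = np.linalg.norm(away)
    return away / (dist ** 2 + 1e-6)

def reach_target(robot_pos, target_pos):
    """Attractive field: unit vector toward the target."""
    toward = target_pos - robot_pos
    return toward / (np.linalg.norm(toward) + 1e-6)

# Weighted superposition of concurrently running behaviors. Raising the
# avoidance weight yields a cautious modality; raising the target weight
# favors reaching the destination quickly.
robot = np.array([0.0, 0.0])
obstacle = np.array([1.0, 0.5])
target = np.array([5.0, 0.0])
weights = {"avoid": 0.7, "reach": 0.3}  # hypothetical weights
command = (weights["avoid"] * avoid_obstacle(robot, obstacle)
           + weights["reach"] * reach_target(robot, target))
```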
IV. BEHAVIOR FUSION ESTIMATION
The primary function in behavior fusion estimation is to infer, from a teacher-provided demonstration, the contribution (or weight) of each primitive in the robot's repertoire such that their combination matches the observed outcome. These weights modulate the magnitude of the control vector output by the individual primitives, thus influencing the resulting command from fusion and, consequently, the way the robot interacts with the world. However, choosing these weights is a non-trivial problem. To save time and resources (such as robot power), we automatically estimate appropriate weights for fusing behaviors according to the desired navigation style as demonstrated.
For a set of N primitives, behavior fusion estimation is accomplished by estimating the joint probability distribution of the fusion space (i.e., across weighting combinations) over the demonstration duration. For this work, demonstrations consisted of guiding the robot through a navigation task, using a joystick, while the robot's behaviors continuously provide predictions of what their outputs would be (in the form of a 2D speed and heading vector in the robot's coordinate system) for the current sensory readings. However, instead of being translated into motor commands, these predictions are recorded along with the turning rate of the robot at that moment of time. Thus, for each time step $t$, we are provided with a set of prediction vectors $V_p^t = [v_1^t \cdots v_N^t]$ from each primitive and a demonstration vector $V_r^t$ expressing the realized control output of the robot. It is known that the resulting vector $V_r^t$ is a linear combination of the prediction vectors $[v_1^t \cdots v_N^t]$ according to some unknown superposition weights $S^t = [s_1^t \cdots s_N^t]$:

$$V_r^t = \sum_{i=1}^{N} s_i^t v_i^t \qquad (1)$$
We consider heading to be the most important consideration for behavior fusion in 2D navigation. Consequently, we normalize command vectors to unit length.
The goal of the algorithm is to infer the weights $S^t$ over time or, more precisely, the relative proportions among the weights that could produce the demonstration vector $V_r^t$.
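Read directly, Eq. 1 together with the unit-length normalization gives the following forward model; this is a brief sketch under assumed array shapes, mapping one weight hypothesis and the primitives' current predictions to a fused heading command.

```python
import numpy as np

def unit(v):
    """Normalize to unit length: only heading matters for 2D navigation."""
    return v / (np.linalg.norm(v) + 1e-9)

def fused_command(weights, predictions):
    """Eq. 1: V_r^t = sum_i s_i^t v_i^t, then normalized to unit length.

    weights:     (N,) fusion weights s^t
    predictions: (N, 2) unit prediction vectors v_i^t from the primitives
    """
    return unit(np.sum(weights[:, None] * predictions, axis=0))
```

Under this normalization only the relative proportions of the weights matter, which is why the algorithm infers proportions rather than absolute magnitudes.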
A. Incorporating Behavior Preconditions
At any given time during the demonstration, multiple behaviors could be active, depending on whether their preconditions are met or not. We segment the demonstration into intervals based on the binary decisions set by the preconditions of each behavior. A segmentation of the demonstration trace is performed at the moments in time when the status of any of the behaviors' preconditions changes between met and not-met. The resulting segments represent different environmental situations, since different behaviors become "applicable" at the transition points. The weights of behaviors within each segment encode the mode of performing the current task given the situation and, thus, within each segment the weights of the applicable behaviors are constant. For example, for a target reaching task, the robot could behave under the influence of the corridor-follow, target-follow, and avoid-obstacle behaviors if in the presence of obstacles, but would behave only under the influence of target-follow if in an open space.
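One way to realize this segmentation is to cut the demonstration trace wherever the tuple of precondition truth values changes, as in the sketch below; the trace record format is an assumption for illustration.

```python
def segment_by_preconditions(trace):
    """Split a demonstration trace into segments of constant preconditions.

    trace: list of time-step records, each a dict containing a
           'preconditions' tuple of booleans (one per behavior).
    Within each returned segment the applicable behaviors do not change,
    so a single fusion-weight vector can be estimated per segment.
    """
    segments, current = [], []
    for step in trace:
        if current and step["preconditions"] != current[-1]["preconditions"]:
            segments.append(current)  # a precondition flipped: start a new segment
            current = []
        current.append(step)
    if current:
        segments.append(current)
    return segments
```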
B. Fusion Parameter Estimation
Similar to Monte Carlo robot localization, a particle filter is used to recursively estimate the joint density in the parameter space of fusion weights $S^t$ over time $t = 1 \cdots T$. Particle filters [17] have been used for state and parameter estimation in several different domains (such as robot localization [10], pose estimation [18], and insect tracking [19]). Restating these methods, mostly following [19], we use the standard form of the Bayes filter to estimate the posterior probability density $p(S^t | V_r^{1:t}, V_p^{1:t})$ in the space of fusion parameters given prediction and result vectors:

$$p(S^t | V_r^{1:t}, V_p^{1:t}) = k \, p(V_r^t, V_p^t | S^t) \int p(S^t | S^{t-1}) \, p(S^{t-1} | V_r^{1:t-1}, V_p^{1:t-1}) \, dS^{t-1} \qquad (2)$$

where $p(V_r^t, V_p^t | S^t)$ is the likelihood of observing prediction and result vectors given a vector of fusion parameters, $p(S^t | S^{t-1})$ is the motion model describing the expected displacement of parameter weights over a time step, $p(S^{t-1} | V_r^{1:t-1}, V_p^{1:t-1})$ is the prior probability distribution from the previous time step, and $k$ is a normalization constant to enforce that the distribution sums to one. We simplify the likelihood using the chain rule of probability and the domain knowledge (Eq. 1) that prediction vectors are not dependent on the fusion weights:

$$p(V_r^t, V_p^t | S^t) = p(V_r^t | V_p^t, S^t) \, p(V_p^t | S^t) = p(V_r^t | V_p^t, S^t) \qquad (3)$$

The resulting Bayes filter:

$$p(S^t | V_r^{1:t}, V_p^{1:t}) = k \, p(V_r^t | V_p^t, S^t) \int p(S^t | S^{t-1}) \, p(S^{t-1} | V_r^{1:t-1}, V_p^{1:t-1}) \, dS^{t-1} \qquad (4)$$

has a Monte Carlo approximation that represents the posterior as a particle distribution of $M$ weighted samples $\{S^t_{(j)}, \pi^t_{(j)}\}_{j=1}^{M}$, where $S^t_{(j)}$ is a particle representing a specific hypothesis for the fusion weights and $\pi^t_{(j)}$ is the weight of the particle, proportional to its posterior probability:

$$p(S^t | V_r^{1:t}, V_p^{1:t}) \propto k \, p(V_r^t | V_p^t, S^t) \sum_j \pi^{t-1}_{(j)} \, p(S^t | S^{t-1}_{(j)}) \qquad (5)$$

The estimation of the posterior at time $t$ is performed by 1) importance sampling to draw new particle hypotheses $S^t_{(j)}$ from the posterior at time $t-1$ and 2) computing weights $\pi^t_{(j)}$ for each particle from the likelihood. Importance sampling is performed by randomly assigning particles $S^t_{(i)}$ to particles $S^{t-1}_{(j)}$ based on weights $\pi^{t-1}_{(j)}$ and adding Gaussian noise. This process effectively samples the following proposal distribution:

$$S^t_{(i)} \sim q(S^t) = \sum_j \pi^{t-1}_{(j)} \, p(S^t | S^{t-1}_{(j)}) \qquad (6)$$

and weights each particle by the following likelihood, a function of the distance between the actual and predicted displacement directions:

$$\pi^t_{(i)} = p(V_r^t | V_p^t, S^t_{(i)}) = 2^{-D(V_r^t, \hat{V}^t_{(i)})/2} \qquad (7)$$

where $D(a, b)$ is the Euclidean distance between $a$ and $b$, and:

$$\hat{V}^t_{(i)} = \sum_{k=1}^{N} s^t_{(i),k} \, v^t_k$$
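Putting Eqs. 5-8 together, a single filter update might be sketched as follows; the particle count, the Gaussian noise scale, and our exponential reading of the likelihood in Eq. 7 are assumptions for illustration.

```python
import numpy as np

def particle_filter_step(particles, weights, V_r, V_p, noise=0.05, rng=None):
    """One recursive update of the fusion-weight posterior (Eqs. 5-8).

    particles: (M, N) fusion-weight hypotheses S_(j)
    weights:   (M,) importance weights pi from the previous step
    V_r:       (2,) demonstrated unit command vector
    V_p:       (N, 2) unit prediction vectors from the N primitives
    """
    rng = rng or np.random.default_rng()
    M, N = particles.shape
    # Importance sampling (Eq. 6): resample by weight, add Gaussian noise.
    idx = rng.choice(M, size=M, p=weights / weights.sum())
    particles = particles[idx] + rng.normal(0.0, noise, size=(M, N))
    # Fused prediction per particle (Eq. 8), normalized to compare headings.
    V_hat = particles @ V_p
    V_hat /= np.linalg.norm(V_hat, axis=1, keepdims=True) + 1e-9
    # Likelihood weighting (Eq. 7): closer to the demonstration -> heavier.
    D = np.linalg.norm(V_hat - V_r, axis=1)
    new_weights = 2.0 ** (-D / 2.0)
    return particles, new_weights / new_weights.sum()
```

At each time step the weighted particle set approximates the posterior over fusion weights; a point estimate such as the weighted mean of the particles can then serve as the coordination policy for the current precondition segment.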
Fig. 3. Results from Scenario 3. The robot used learned fusion weights from the left, center, and right side demonstrations, respectively.
learns the proper weights for dealing with this situation. In this image, as in the following plots, the robot path is shown in red, the laser range readings in blue, and the detection of the goal in green.
B. Scenario 2
During the second set of experiments, the robot was equipped with the same set of behaviors as in Scenario 1, with the only difference that we replaced the general wallAttract behavior with wallAttract-left and wallAttract-right. The left and right wall attract behaviors respond only to the walls on their corresponding side, as opposed to the combined approach of the regular wallAttract. The experiments were performed in a simple simulated environment, shown in Figure 1, bottom right inset. The four demonstrations consisted of taking the following paths through the environment: one through the upper corridor and one through the bottom (wide) corridor, in both cases keeping first to the left, then to the right. With the weights learned in this environment we tested the behavior of the robot in a simulated SEM building map.
Results. As opposed to the first set of experiments, due to the split of the wallAttract behavior, the robot now learned a preference for which side to favor when navigating a corridor. However, a more significant difference in behavior is apparent when the robot approaches a T-junction. When using the weights learned from the right-following demonstrations (for both corridor demonstrations), the robot turns to the right when it approaches an intersection. That is, when it is given the choice of either "right or left" or "right or straight," it will go right, because of how strongly it is attracted to the right wall. When it is given the choice "straight or left," it will go straight. With the weights learned from the left-following demonstrations, the robot exhibits a similar behavior for the left side. These results demonstrate the robustness of our approach, in that controllers learned in one environment translate to different environments as well.
C. Scenario 3
In the third set of experiments, the set of behaviors was the same as in Scenario 2, with the only difference that instead of the wander behavior we used its split versions, wander-left and wander-right. These two behaviors seek left and right open spaces, respectively, as opposed to the regular wander, which looks for open space in any direction. We performed three demonstrations in the SEM simulated environment, each consisting of taking a tour of the building, as follows: 1) keep to the right, 2) keep to the center, and 3) keep to the left.
Results. As expected, in these experiments the robot learned preferences similar to those in the second scenario. Figure 3 (left) shows the trajectory of the robot using the controller learned from the left-follow demonstration. The robot starts at the top of the left corridor, chooses a left turn at the T-junction, then stops at the goal. During the entire run the robot keeps closer to the wall on the left. Figure 3 (right) shows a similar preference, this time for following walls on the right and choosing a right turn at T-junctions. In this experiment, the robot starts at the top of the right corridor and moves down and left. While for the left and right preferences the robot makes a clear turn in a particular direction when reaching the T-junction, for the center experiment the robot shows that it does not have a preferred direction, as indicated by how far it goes into the T-junction before deciding to turn, due to its wander behavior (Figure 3 (center)). The robot navigates closer to the left due to the wandering behavior, which attracts the robot to the open spaces through the doors. We only show goal-reaching capability for the left experiment: we stopped the center and right experiments before the robot made the full turn in the environment to reach the goal, located in a different area of the environment. When allowed to run longer, the robot was able to reach the goal in all situations.
In addition to learning the left and right preference, our results demonstrate that the additional refinement of the underlying behavior set into wander-left and wander-right behaviors allowed the robot to capture additional aspects of the demonstration. In particular, whenever the robot found itself in the middle of a T-junction, with open space on both sides, the robot would choose to go in the direction given by the preference expressed during the demonstration: right
Fig. 4. The robot learns preferences of wandering in directions specified during the demonstration (left and right, respectively).
for the right weights and left for the left weights. This preference was demonstrated even in cases in which the robot had more open space in the opposite direction. Under equal weighting of left and right wandering, the robot would normally follow the larger open space. Figure 4 shows this preference through the robot's trajectory. In the left image, the robot is using the weights learned from the left-follow demonstration. While the robot starts oriented slightly toward the right in the middle of the T-junction, as shown by its laser profile, the higher weight of the wander-left behavior pulls the robot in the left direction. Similarly, in the right image, the robot uses the weights from the right-follow demonstration. Even when oriented slightly to the left, where there is more open space, the robot chooses to go right due to the higher weight of wander-right.
The approach we presented demonstrates the importance of considering concurrently running behaviors as underlying mechanisms for achieving a task. Our method allows for learning both of the goals involved in the task (e.g., reaching a target) and of the particular ways in which the same task can be performed. In addition, our results demonstrate the importance of choosing the primitive behavior set, an important and still open issue in behavior-based research. Our learned controllers are not restricted to a particular path or execution sequence and thus are general enough to exhibit meaningful behavior even in environments different from the one in which the demonstration took place.
VI. SUMMARY
We presented a method for robot task learning from demonstration that addresses the problem of mapping observations to robot behaviors from a novel perspective. Our claim is that motor behavior is typically expressed in terms of concurrent control of multiple different activities. To this end, we developed a learning by demonstration approach that allows a robot to map the demonstrator's actions onto multiple behavior primitives from its repertoire. This method has been shown to capture not only the overall goals of the task, but also the specifics of the user's demonstration, thus enabling additional capabilities through learning by demonstration.
REFERENCES
[1] S. Schaal, "Is imitation learning the route to humanoid robots?," Trends in Cognitive Sciences, vol. 3, no. 6, pp. 233-242, 1999.
[2] M. Goodrich, D. Olsen, J. Crandall, and T. Palmer, "Experiments in adjustable autonomy," 2001.
[3] C. G. Atkeson and S. Schaal, "Robot learning from demonstration," in ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, pp. 12-20.
[4] S. Schaal, J. Peters, J. Nakanishi, and A. Ijspeert, "Learning movement primitives," in International Symposium on Robotics Research, 2004.
[5] R. C. Arkin, "Motor schema based navigation for a mobile robot: An approach to programming by behavior," in IEEE Conference on Robotics and Automation, 1987, pp. 264-271.
[6] G. Hayes and J. Demiris, "A robot controller using learning by imitation," in Proc. of the Intl. Symp. on Intelligent Robotic Systems, Grenoble, France, 1994, pp. 198-204.
[7] P. Gaussier, S. Moga, J. Banquet, and M. Quoy, "From perception-action loops to imitation processes: A bottom-up approach of learning by imitation," Applied Artificial Intelligence Journal, vol. 12, no. 7-8, pp. 701-729, 1998.
[8] M. N. Nicolescu and M. J. Matarić, "Natural methods for robot task learning: Instructive demonstration, generalization and practice," in Proc., Second Intl. Joint Conf. on Autonomous Agents and Multi-Agent Systems, Melbourne, Australia, July 2003.
[9] R. Platt, A. H. Fagg, and R. R. Grupen, "Manipulation gaits: Sequences of grasp control tasks," in IEEE Conference on Robotics and Automation, New Orleans, LA, USA, 2004, pp. 801-806.
[10] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
[11] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2002), vol. 4, May 2002, pp. 3404-3410.
[12] C. Breazeal, A. Brooks, J. Gray, G. Hoffman, C. Kidd, H. Lee, J. Lieberman, A. Lockerd, and D. Mulanda, "Tutelage and collaboration for humanoid robots," International Journal of Humanoid Robotics, vol. 1, no. 2, 2004.
[13] S. Schaal, "Learning from demonstration," Advances in Neural Information Processing Systems, vol. 9, pp. 1040-1046, 1997.
[14] O. C. Jenkins and M. J. Matarić, "Performance-derived behavior vocabularies: Data-driven acquisition of skills from motion," International Journal of Humanoid Robotics, vol. 1, no. 2, pp. 237-288, Jun 2004.
[15] D. Wolpert and M. Kawato, "Multiple paired forward and inverse models for motor control," Neural Networks, vol. 11, pp. 1317-1329, 1998.
[16] O. C. Jenkins, "Data-driven derivation of skills for autonomous humanoid agents," Ph.D. dissertation, The University of Southern California, 2003.
[17] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174-188, Feb. 2002.
[18] M. Isard and A. Blake, "Condensation - conditional density propagation for visual tracking," International Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[19] Z. Khan, T. R. Balch, and F. Dellaert, "A Rao-Blackwellized particle filter for EigenTracking," in IEEE Computer Vision and Pattern Recognition, vol. 2, 2004, pp. 980-986.
[20] B. Gerkey, R. T. Vaughan, and A. Howard, "The Player/Stage project: Tools for multi-robot and distributed sensor systems," in Proc. of the 11th International Conference on Advanced Robotics, 2003, pp. 317-323.