Pattern recognition and machine learning (Christopher M. Bishop 著) Springe 101页

872.96 KB
2022-04-22 11:24:53 发布

Pattern recognition and machine learning (Christopher M. Bishop 著) Springe

关闭预览

101页
当前文档由用户上传发布，收益归属用户

下载提示
文本预览

1、本文档共5页，可阅读全部内容。
2、本文档内容版权归属内容提供方，所产生的收益全部归内容提供方所有。如果您对本文有版权争议，可选择认领，认领后既往收益都归您。
3、本文档由用户上传，本站不保证质量和数量令人满意，可能有诸多瑕疵，付费之前，请仔细先通过免费阅读内容等途径辨别内容交易风险。如存在严重挂羊头卖狗肉之情形，可联系本站下载客服投诉处理。
文档侵权举报电话：19940600175。

'课后答案网您最真诚的朋友www.hackshp.cn网团队竭诚为学生服务，免费提供各门课后答案，不用积分，甚至不用注册，旨在为广大学生提供自主学习的平台！课后答案网：www.hackshp.cn视频教程网：www.efanjy.comPPT课件网：www.ppthouse.com课后答案网www.hackshp.cn khdaw.com课后答案网www.hackshp.cnkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com PatternRecognitionandMachineLearningSolutionstotheExercises:Web-Editionkhdaw.comMarkusSvensenandChristopherM.Bishop´Copyrightc2002–2007Thisisthesolutionsmanual(web-edition)forthebookPatternRecognitionandMachineLearning(PRML;publishedbySpringerin2006).Itcontainssolutionstothe课后答案网wwwexercises.ThisreleasewascreatedOctober5,2007.FuturereleaseswithcorrectionstoerrorswillbepublishedonthePRMLweb-site(seebelow).Theauthorswouldliketoexpresstheirgratitudetothevariouspeoplewhohaveprovidedfeedbackonpre-releasesofthisdocument.Inparticular,the“BishopReadingGroup”,heldintheVisualGeometryGroupattheUniversityofOxfordprovidedvaluablecommentsandsuggestions.Theauthorswelcomeallcomments,questionsandsuggestionsaboutthesolutionsaswellasreportsonwww.hackshp.cn(potential)errorsintextorformulaeinthisdocument;pleasesendanysuchfeedbacktoprml-fb@microsoft.comFurtherinformationaboutPRMLisavailablefromhttp://research.microsoft.com/∼cmbishop/PRMLkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com khdaw.com课后答案网www.hackshp.cnkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com khdaw.comContentsContents5Chapter1:PatternRecognition.......................7Chapter2:DensityEstimation.......................19Chapter3:LinearModelsforRegression..................34Chapter4:LinearModelsforClassiﬁcation................41Chapter5:NeuralNetworks........................46Chapter6:KernelMethods.........................54Chapter7:SparseKernelMachines.....................59Chapter8:ProbabilisticGraphicalModels.................63课后答案网Chapter9:MixtureModels.........................68Chapter10:VariationalInferenceandEM.................72Chapter11:SamplingMethods.......................83Chapter12:LatentVariables........................85Chapter13:SequentialData........................92www.hackshp.cnChapter14:CombiningModels.......................965khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 6CONTENTSkhdaw.com课后答案网www.hackshp.cnkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions1.1–1.47Chapter1PatternRecognition1.1Substituting(1.1)into(1.2)andthendifferentiatingwithrespecttowiweobtain!XNXMwxj−txi=0.(1)jnnnn=1j=0khdaw.comRe-arrangingtermsthengivestherequiredresult.1.4Weareofteninterestedinﬁndingthemostprobablevalueforsomequantity.Inthecaseofprobabilitydistributionsoverdiscretevariablesthisposeslittleproblem.However,forcontinuousvariablesthereisasubtletyarisingfromthenatureofprob-abilitydensitiesandthewaytheytransformundernon-linearchangesofvariable.Considerﬁrstthewayafunctionf(x)behaveswhenwechangetoanewvariableywherethetwovariablesarerelatedbyx=g(y).Thisdeﬁnesanewfunctionofygivenbyfe(y)=f(g(y)).(2)Supposef(x)hasamode(i.e.amaximum)atbxsothatf0(bx)=0.Thecorrespond-ingmodeoffe(y)willoccurforavaluebyobtainedbydifferentiatingbothsidesof(2)withrespecttoyfe0(by)=f0(g(by))g0(by)=0.(3)Assumingg0(by)6=0atthemode,thenf0(g(by))=0.However,weknowthatf0课后答案网(bx)=0,andsoweseethatthelocationsofthemodeexpressedintermsofeachofthevariablesxandyarerelatedbybx=g(by),asonewouldexpect.Thus,ﬁndingamodewithrespecttothevariablexiscompletelyequivalenttoﬁrsttransformingtothevariabley,thenﬁndingamodewithrespecttoy,andthentransformingbacktox.Nowconsiderthebehaviourofaprobabilitydensitypx(x)underthechangeofvari-ableswww.hackshp.cnx=g(y),wherethedensitywithrespecttothenewvariableispy(y)andisgivenby((1.27)).Letuswriteg0(y)=s|g0(y)|wheres∈{−1,+1}.Then((1.27))canbewrittenp(y)=p(g(y))sg0(y).yxDifferentiatingbothsideswithrespecttoythengivesp0(y)=sp0(g(y)){g0(y)}2+sp(g(y))g00(y).(4)yxxDuetothepresenceofthesecondtermontherighthandsideof(4)therelationshipbx=g(by)nolongerholds.Thusthevalueofxobtainedbymaximizingpx(x)willnotbethevalueobtainedbytransformingtopy(y)thenmaximizingwithrespecttoyandthentransformingbacktox.Thiscausesmodesofdensitiestobedependentonthechoiceofvariables.Inthecaseoflineartransformation,thesecondtermonkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 8Solution1.7Figure1Exampleofthetransformationofthemodeofadensityunderanon-1linearchangeofvariables,illus-py(y)g−1(x)tratingthedifferentbehaviourcom-paredtoasimplefunction.Seetheytextfordetails.0.5px(x)khdaw.com005x10therighthandsideof(4)vanishes,andsothelocationofthemaximumtransformsaccordingtobx=g(by).Thiseffectcanbeillustratedwithasimpleexample,asshowninFigure1.WebeginbyconsideringaGaussiandistributionpx(x)overxwithmeanµ=6andstandarddeviationσ=1,shownbytheredcurveinFigure1.NextwedrawasampleofN=50,000pointsfromthisdistributionandplotahistogramoftheirvalues,whichasexpectedagreeswiththedistributionpx(x).Nowconsideranon-linearchangeofvariablesfromxtoygivenbyx=g(y)=ln(y)−ln(1−y)+5.(5)Theinverseofthisfunctionisgivenby课后答案网−11y=g(x)=(6)1+exp(−x+5)whichisalogisticsigmoidfunction,andisshowninFigure1bythebluecurve.Ifwesimplytransformwww.hackshp.cnpx(x)asafunctionofxweobtainthegreencurvepx(g(y))showninFigure1,andweseethatthemodeofthedensitypx(x)istransformedviathesigmoidfunctiontothemodeofthiscurve.However,thedensityoverytransformsinsteadaccordingto(1.27)andisshownbythemagentacurveontheleftsideofthediagram.Notethatthishasitsmodeshiftedrelativetothemodeofthegreencurve.Toconﬁrmthisresultwetakeoursampleof50,000valuesofx,evaluatethecorre-spondingvaluesofyusing(6),andthenplotahistogramoftheirvalues.WeseethatthishistogrammatchesthemagentacurveinFigure1andnotthegreencurve!1.7ThetransformationfromCartesiantopolarcoordinatesisdeﬁnedbyx=rcosθ(7)y=rsinθ(8)khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution1.89andhencewehavex2+y2=r2wherewehaveusedthewell-knowntrigonometricresult(2.177).AlsotheJacobianofthechangeofvariablesiseasilyseentobe∂x∂x∂(x,y)∂r∂θ=∂(r,θ)∂y∂y∂r∂θcosθ−rsinθ=sinθrcosθ=rkhdaw.comwhereagainwehaveused(2.177).Thusthedoubleintegralin(1.125)becomesZ2πZ∞22rI=exp−rdrdθ(9)2σ200Z∞u1=2πexp−du(10)2σ220hui∞2=πexp−−2σ(11)2σ202=2πσ(12)wherewehaveusedthechangeofvariablesr2=u.Thus21/2I=2πσ.Finally,usingthetransformationy=x−µ,theintegraloftheGaussiandistributionbecomes课后答案网Z∞Z∞221yNx|µ,σdx=exp−dy(2πσ2)1/22σ2−∞−∞I==1www.hackshp.cn(2πσ2)1/2asrequired.1.8Fromthedeﬁnition(1.46)oftheunivariateGaussiandistribution,wehaveZ∞1/2112E[x]=exp−(x−µ)xdx.(13)2πσ22σ2−∞Nowchangevariablesusingy=x−µtogiveZ∞1/2112E[x]=exp−y(y+µ)dy.(14)2πσ22σ2−∞Wenownotethatinthefactor(y+µ)theﬁrstterminycorrespondstoanoddintegrandandsothisintegralmustvanish(toshowthisexplicitly,writetheintegralkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 10Solutions1.9–1.10asthesumoftwointegrals,onefrom−∞to0andtheotherfrom0to∞andthenshowthatthesetwointegralscancel).Inthesecondterm,µisaconstantandpullsoutsidetheintegral,leavinganormalizedGaussiandistributionwhichintegratesto1,andsoweobtain(1.49).Toderive(1.50)weﬁrstsubstitutetheexpression(1.46)forthenormaldistributionintothenormalizationresult(1.48)andre-arrangetoobtainZ∞1221/2exp−(x−µ)dx=2πσ.(15)2σ2−∞Wenowdifferentiatebothsidesof(15)withrespecttoσ2andthenre-arrangetokhdaw.comobtain1/2Z∞11222exp−(x−µ)(x−µ)dx=σ(16)2πσ22σ2−∞whichdirectlyshowsthat22E[(x−µ)]=var[x]=σ.(17)Nowweexpandthesquareontheleft-handsidegiving222E[x]−2µE[x]+µ=σ.Makinguseof(1.49)thengives(1.50)asrequired.Finally,(1.51)followsdirectlyfrom(1.49)and(1.50)222222E[x]−E[x]=µ+σ−µ=σ.1.9Fortheunivariatecase,wesimplydifferentiate(1.46)withrespectto课后答案网xtoobtaindx−µ22Nx|µ,σ=−Nx|µ,σ.dxσ2Settingthistozeroweobtainx=µ.Similarly,forthemultivariatecasewedifferentiate(1.52)withrespecttoxtoobtain∂1T−1www.hackshp.cnN(x|µ,Σ)=−N(x|µ,Σ)∇x(x−µ)Σ(x−µ)∂x2−1=−N(x|µ,Σ)Σ(x−µ),−1wherewehaveused(C.19),(C.20)andthefactthatΣissymmetric.Settingthisderivativeequalto0,andleft-multiplyingbyΣ,leadstothesolutionx=µ.1.10Sincexandzareindependent,theirjointdistributionfactorizesp(x,z)=p(x)p(z),andsoZZE[x+z]=(x+z)p(x)p(z)dxdz(18)ZZ=xp(x)dx+zp(z)dz(19)=E[x]+E[z].(20)khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions1.12–1.1511Similarlyforthevariances,weﬁrstnotethat222(x+z−E[x+z])=(x−E[x])+(z−E[z])+2(x−E[x])(z−E[z])(21)wheretheﬁnaltermwillintegratetozerowithrespecttothefactorizeddistributionp(x)p(z).HenceZZ2var[x+z]=(x+z−E[x+z])p(x)p(z)dxdzZZ22khdaw.com=(x−E[x])p(x)dx+(z−E[z])p(z)dz=var(x)+var(z).(22)Fordiscretevariablestheintegralsarereplacedbysummations,andthesameresultsareagainobtained.1.12Ifm=nthenxx=x2andusing(1.50)weobtainE[x2]=µ2+σ2,whereasifnmnnn6=mthenthetwodatapointsxnandxmareindependentandhenceE[xnxm]=E[x]E[x]=µ2wherewehaveused(1.49).Combiningthesetworesultswenmobtain(1.130).NextwehaveXN1E[µML]=E[xn]=µ(23)Nn=1using(1.49).Finally,considerE[σ2].From(1.55)and(1.56),andmakinguseof(1.130),we课后答案网MLhave!2XNXN211E[σML]=Exn−xmNNn=1m=1"#1XN2XN1XNXN2www.hackshp.cn=Exn−xnxm+2xmxlNNNn=1m=1m=1l=122212212=µ+σ−2µ+σ+µ+σNNN−12=σ(24)Nasrequired.1.15Theredundancyinthecoefﬁcientsin(1.133)arisesfrominterchangesymmetriesbetweentheindicesik.Suchsymmetriescanthereforeberemovedbyenforcinganorderingontheindices,asin(1.134),sothatonlyonememberineachgroupofequivalentconﬁgurationsoccursinthesummation.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 12Solution1.15Toderive(1.135)wenotethatthenumberofindependentparametersn(D,M)whichappearatorderMcanbewrittenasXDXi1iXM−1n(D,M)=···1(25)i1=1i2=1iM=1whichhasMterms.Thiscanclearlyalsobewrittenas()XDXi1iXM−1n(D,M)=···1(26)khdaw.comi1=1i2=1iM=1wheretheterminbraceshasM−1termswhich,from(25),mustequaln(i1,M−1).ThuswecanwriteXDn(D,M)=n(i1,M−1)(27)i1=1whichisequivalentto(1.135).Toprove(1.136)weﬁrstsetD=1onbothsidesoftheequation,andmakeuseof0!=1,whichgivesthevalue1onbothsides,thusshowingtheequationisvalidforD=1.NowweassumethatitistrueforaspeciﬁcvalueofdimensionalityDandthenshowthatitmustbetruefordimensionalityD+1.Thusconsidertheleft-handsideof(1.136)evaluatedforD+1whichgivesDX+1(i+M−2)!(D+M−1)!(D+M−1)!=+课后答案网(i−1)!(M−1)!(D−1)!M!D!(M−1)!i=1(D+M−1)!D+(D+M−1)!M=D!M!(D+M)!=(28)D!M!whichequalsthewww.hackshp.cnrighthandsideof(1.136)fordimensionalityD+1.Thus,byinduction,(1.136)mustholdtrueforallvaluesofD.Finallyweuseinductiontoprove(1.137).ForM=2weﬁndobtainthestandardresultn(D,2)=1D(D+1),whichisalsoprovedinExercise1.14.Nowassume2that(1.137)iscorrectforaspeciﬁcorderM−1sothat(D+M−2)!n(D,M−1)=.(29)(D−1)!(M−1)!Substitutingthisintotherighthandsideof(1.135)weobtainXD(i+M−2)!n(D,M)=(30)(i−1)!(M−1)!i=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions1.17–1.2013which,makinguseof(1.136),gives(D+M−1)!n(D,M)=(31)(D−1)!M!andhenceshowsthat(1.137)istrueforpolynomialsoforderM.Thusbyinduction(1.137)mustbetrueforallvaluesofM.1.17UsingintegrationbypartswehaveZ∞Γ(x+1)=uxe−udukhdaw.com0Z∞∞=−e−uux+xux−1e−udu=0+xΓ(x).(32)00Forx=1wehaveZ∞∞Γ(1)=e−udu=−e−u=1.(33)00Ifxisanintegerwecanapplyproofbyinductiontorelatethegammafunctiontothefactorialfunction.SupposethatΓ(x+1)=x!holds.Thenfromtheresult(32)wehaveΓ(x+2)=(x+1)Γ(x+1)=(x+1)!.Finally,Γ(1)=1=0!,whichcompletestheproofbyinduction.1.18Ontheright-handsideof(1.142)wemakethechangeofvariablesu=r2togiveZ∞1−uD/2−11课后答案网SDeudu=SDΓ(D/2)(34)220wherewehaveusedthedeﬁnition(1.141)oftheGammafunction.Onthelefthandsideof(1.142)wecanuse(1.126)toobtainπD/2.Equatingtheseweobtainthedesiredresult(1.143).Thevolumeofasphereofradiuswww.hackshp.cn1inD-dimensionsisobtainedbyintegrationZ1D−1SDVD=SDrdr=.(35)D0ForD=2andD=3weobtainthefollowingresults243S2=2π,S3=4π,V2=πa,V3=πa.(36)31.20Sincep(x)isradiallysymmetricitwillberoughlyconstantovertheshellofradiusrandthickness.ThisshellhasvolumeSrD−1andsincekxk2=r2wehaveDZp(x)dx"p(r)SrD−1(37)Dshellkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 14Solutions1.22–1.24fromwhichweobtain(1.148).Wecanﬁndthestationarypointsofp(r)bydifferen-tiationhi2dD−2D−1rrp(r)∝(D−1)r+r−exp−=0.(38)drσ22σ2√Solvingforr,andusingD1,weobtainbr"Dσ.Nextwenotethat(br+)2p(br+)∝(br+)D−1exp−2σ2khdaw.com(br+)2=exp−+(D−1)ln(br+).(39)2σ2Wenowexpandp(r)aroundthepointbr.Sincethisisastationarypointofp(r)wemustkeeptermsuptosecondorder.Makinguseoftheexpansionln(1+x)=x−x2/2+O(x3),togetherwithD1,weobtain(1.149).Finally,from(1.147)weseethattheprobabilitydensityattheoriginisgivenby1p(x=0)=(2πσ2)1/2whilethedensityatkxk=brisgivenfrom(1.147)by1br21Dp(kxk=br)=exp−=exp−课后答案网(2πσ2)1/22σ2(2πσ2)1/22√wherewehaveusedbr"Dσ.Thustheratioofdensitiesisgivenbyexp(D/2).1.22SubstitutingLkj=1−δkjinto(1.81),andusingthefactthattheposteriorproba-bilitiessumtoone,weﬁndthat,foreachxweshouldchoosetheclassjforwhich1−p(Cj|x)isaminimum,whichisequivalenttochoosingthejforwhichthepos-teriorprobabilitywww.hackshp.cnp(Cj|x)isamaximum.Thislossmatrixassignsalossofoneiftheexampleismisclassiﬁed,andalossofzeroifitiscorrectlyclassiﬁed,andhenceminimizingtheexpectedlosswillminimizethemisclassiﬁcationrate.1.24AvectorxbelongstoclassCkwithprobabilityPp(Ck|x).IfwedecidetoassignxtoclassCjwewillincuranexpectedlossofkLkjp(Ck|x),whereasifweselecttherejectoptionwewillincuralossofλ.Thus,ifXj=argminLklp(Ck|x)(40)lkthenweminimizetheexpectedlossifwetakethefollowingactionPchooseclassj,ifminlkLklp(Ck|x)<λ;(41)reject,otherwise.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions1.25–1.2715PForalossmatrixLkj=1−IkjwehavekLklp(Ck|x)=1−p(Cl|x)andsowerejectunlessthesmallestvalueof1−p(Cl|x)islessthanλ,orequivalentlyifthelargestvalueofp(Cl|x)islessthan1−λ.Inthestandardrejectcriterionwerejectifthelargestposteriorprobabilityislessthanθ.Thusthesetwocriteriaforrejectionareequivalentprovidedθ=1−λ.1.25TheexpectedsquaredlossforavectorialtargetvariableisgivenbyZZ2E[L]=ky(x)−tkp(t,x)dxdt.khdaw.comOurgoalistochoosey(x)soastominimizeE[L].WecandothisformallyusingthecalculusofvariationstogiveZδE[L]=2(y(x)−t)p(t,x)dt=0.δy(x)Solvingfory(x),andusingthesumandproductrulesofprobability,weobtainZtp(t,x)dtZy(x)=Z=tp(t|x)dtp(t,x)dtwhichistheconditionalaverageoftconditionedonx.ForthecaseofascalartargetvariablewehaveZy(x)=tp(t|x)dtwhichisequivalentto(1.89).课后答案网1.27Sincewecanchoosey(x)independentlyforeachvalueofx,theminimumoftheexpectedLqlosscanbefoundbyminimizingtheintegrandgivenbyZwww.hackshp.cn|y(x)−t|qp(t|x)dt(42)foreachvalueofx.Settingthederivativeof(42)withrespecttoy(x)tozerogivesthestationarityconditionZq|y(x)−t|q−1sign(y(x)−t)p(t|x)dtZy(x)Z∞=q|y(x)−t|q−1p(t|x)dt−q|y(x)−t|q−1p(t|x)dt=0−∞y(x)whichcanalsobeobtaineddirectlybysettingthefunctionalderivativeof(1.91)withrespecttoy(x)equaltozero.Itfollowsthaty(x)mustsatisfyZy(x)Z∞|y(x)−t|q−1p(t|x)dt=|y(x)−t|q−1p(t|x)dt.(43)−∞y(x)khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 16Solutions1.29–1.31Forthecaseofq=1thisreducestoZy(x)Z∞p(t|x)dt=p(t|x)dt.(44)−∞y(x)whichsaysthaty(x)mustbetheconditionalmedianoft.Forq→0wenotethat,asafunctionoft,thequantity|y(x)−t|qiscloseto1everywhereexceptinasmallneighbourhoodaroundt=y(x)whereitfallstozero.Thevalueof(42)willthereforebecloseto1,sincethedensityp(t)isnormalized,butreducedslightlybythe`notch"closetot=y(x).Weobtainthebiggestreductioninkhdaw.com(42)bychoosingthelocationofthenotchtocoincidewiththelargestvalueofp(t),i.e.withthe(conditional)mode.1.29TheentropyofanM-statediscretevariablexcanbewrittenintheformXMXM1H(x)=−p(xi)lnp(xi)=p(xi)ln.(45)p(xi)i=1i=1Thefunctionln(x)isconcave_andsowecanapplyJensen"sinequalityintheform(1.115)butwiththeinequalityreversed,sothat!XM1H(x)6lnp(xi)=lnM.(46)p(xi)i=11.31WeﬁrstmakeuseoftherelationI(x;y)=H(y)−H(y|x)whichweobtainedin(1.121),andnotethatthemutualinformationsatisﬁesI(x;y)>0sinceitisaformofKullback-Leiblerdivergence.Finallywemakeuseoftherelation(1.112)toobtain课后答案网thedesiredresult(1.152).Toshowthatstatisticalindependenceisasufﬁcientconditionfortheequalitytobesatisﬁed,wesubstitutep(x,y)=p(x)p(y)intothedeﬁnitionoftheentropy,givingZZwww.hackshp.cnH(x,y)=p(x,y)lnp(x,y)dxdyZZ=p(x)p(y){lnp(x)+lnp(y)}dxdyZZ=p(x)lnp(x)dx+p(y)lnp(y)dy=H(x)+H(y).Toshowthatstatisticalindependenceisanecessarycondition,wecombinetheequal-ityconditionH(x,y)=H(x)+H(y)withtheresult(1.112)togiveH(y|x)=H(y).khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution1.3417Wenownotethattheright-handsideisindependentofxandhencetheleft-handsidemustalsobeconstantwithrespecttox.Using(1.121)itthenfollowsthatthemutualinformationI[x,y]=0.Finally,using(1.120)weseethatthemutualinformationisaformofKLdivergence,andthisvanishesonlyifthetwodistributionsareequal,sothatp(x,y)=p(x)p(y)asrequired.1.34Obtainingtherequiredfunctionalderivativecanbedonesimplybyinspection.How-ever,ifamoreformalapproachisrequiredwecanproceedasfollowsusingthetechniquessetoutinAppendixD.ConsiderﬁrstthefunctionalZkhdaw.comI[p(x)]=p(x)f(x)dx.Underasmallvariationp(x)→p(x)+η(x)wehaveZZI[p(x)+η(x)]=p(x)f(x)dx+η(x)f(x)dxandhencefrom(D.3)wededucethatthefunctionalderivativeisgivenbyδI=f(x).δp(x)Similarly,ifwedeﬁneZJ[p(x)]=p(x)lnp(x)dxthenunderasmallvariationp(x)→p(x)+η(x)wehaveZJ课后答案网[p(x)+η(x)]=p(x)lnp(x)dxZZ12+η(x)lnp(x)dx+p(x)η(x)dx+O()p(x)andhenceδJ=p(x)+1.www.hackshp.cnδp(x)Usingthesetworesultsweobtainthefollowingresultforthefunctionalderivative2−lnp(x)−1+λ1+λ2x+λ3(x−µ).Re-arrangingthengives(1.108).ToeliminatetheLagrangemultiplierswesubstitute(1.108)intoeachofthethreeconstraints(1.105),(1.106)and(1.107)inturn.ThesolutionismosteasilyobtainedbycomparisonwiththestandardformoftheGaussian,andnotingthattheresults12λ1=1−ln2πσ(47)2λ2=0(48)1λ3=(49)2σ2khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 18Solutions1.35–1.38doindeedsatisfythethreeconstraints.Notethatthereisatypographicalerrorinthequestion,whichshouldread”Usecalculusofvariationstoshowthatthestationarypointofthefunctionalshownjustbefore(1.108)isgivenby(1.108)”.Forthemultivariateversionofthisderivation,seeExercise2.14.1.35NOTE:InPRML,thereisaminussign("−")missingonthel.h.s.of(1.103).Substitutingtherighthandsideof(1.109)intheargumentofthelogarithmontherighthandsideof(1.103),weobtainkhdaw.comZH[x]=−p(x)lnp(x)dxZ1(x−µ)22=−p(x)−ln(2πσ)−dx22σ2Z1212=ln(2πσ)+p(x)(x−µ)dx2σ212=ln(2πσ)+1,2whereinthelaststepweused(1.107).1.38From(1.114)weknowthattheresult(1.115)holdsforM=1.WenowsupposethatitholdsforsomegeneralvalueMandshowthatitmustthereforeholdforM+1.Considerthelefthandsideof(1.115)!!课后答案网MX+1XMfλixi=fλM+1xM+1+λixi(50)i=1i=1!XM=fλM+1xM+1+(1−λM+1)ηixi(51)www.hackshp.cni=1wherewehavedeﬁnedλiηi=.(52)1−λM+1Wenowapply(1.114)togive!!MX+1XMfλixi6λM+1f(xM+1)+(1−λM+1)fηixi.(53)i=1i=1WenownotethatthequantitiesλibydeﬁnitionsatisfyMX+1λi=1(54)i=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions1.41–2.119andhencewehaveXMλi=1−λM+1(55)i=1Thenusing(52)weseethatthequantitiesηisatisfythepropertyXMXM1ηi=λi=1.(56)1−λM+1i=1i=1khdaw.comThuswecanapplytheresult(1.115)atorderMandso(53)becomes!MX+1XMMX+1fλixi6λM+1f(xM+1)+(1−λM+1)ηif(xi)=λif(xi)(57)i=1i=1i=1wherewehavemadeuseof(52).1.41Fromtheproductrulewehavep(x,y)=p(y|x)p(x),andso(1.120)canbewrittenasZZZZI(x;y)=−p(x,y)lnp(y)dxdy+p(x,y)lnp(y|x)dxdyZZZ=−p(y)lnp(y)dy+p(x,y)lnp(y|x)dxdy课后答案网=H(y)−H(y|x).(58)Chapter2DensityEstimationwww.hackshp.cn2.1Fromthedeﬁnition(2.2)oftheBernoullidistributionwehaveXp(x|µ)=p(x=0|µ)+p(x=1|µ)x∈{0,1}=(1−µ)+µ=1Xxp(x|µ)=0.p(x=0|µ)+1.p(x=1|µ)=µx∈{0,1}X222(x−µ)p(x|µ)=µp(x=0|µ)+(1−µ)p(x=1|µ)x∈{0,1}22=µ(1−µ)+(1−µ)µ=µ(1−µ).khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 20Solution2.3TheentropyisgivenbyXH[x]=−p(x|µ)lnp(x|µ)x∈{0,1}X=−µx(1−µ)1−x{xlnµ+(1−x)ln(1−µ)}x∈{0,1}=−(1−µ)ln(1−µ)−µlnµ.2.3Usingthedeﬁnition(2.10)wehavekhdaw.comNNN!N!+=+nn−1n!(N−n)!(n−1)!(N+1−n)!(N+1−n)N!+nN!(N+1)!==n!(N+1−n)!n!(N+1−n)!N+1=.(59)nToprovethebinomialtheorem(2.263)wenotethatthetheoremistriviallytrueforN=0.WenowassumethatitholdsforsomegeneralvalueNandproveitscorrectnessforN+1,whichcanbedoneasfollowsNXN(1+x)N+1=(1+x)xnnn=0XNNX+1NnNn课后答案网=x+xnn−1n=0n=1NNXNNN=x0++xn+xN+10nn−1Nn=1NN+1XN+1N+10nN+1www.hackshp.cn=x+x+x0nN+1n=1NX+1N+1n=x(60)nn=0whichcompletestheinductiveproof.Finally,usingthebinomialtheorem,thenor-malizationcondition(2.264)forthebinomialdistributiongivesXNXNnNnN−nNNµµ(1−µ)=(1−µ)nn1−µn=0n=0NNµ=(1−µ)1+=1(61)1−µkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions2.5–2.921Figure2Plotoftheregionofintegrationof(62)in(x,t)space.t=xtkhdaw.comxasrequired.2.5Makingthechangeofvariablet=y+xin(2.266)weobtainZ∞Z∞Γ(a)Γ(b)=xa−1exp(−t)(t−x)b−1dtdx.(62)0xWenowexchangetheorderofintegration,takingcareoverthelimitsofintegrationZ∞ZtΓ(a)Γ(b)=xa−1exp(−t)(t−x)b−1dxdt.(63)课后答案网00Thechangeinthelimitsofintegrationingoingfrom(62)to(63)canbeunderstoodbyreferencetoFigure2.Finallywechangevariablesinthexintegralusingx=tµtogiveZ∞Z1Γ(a)Γ(b)=exp(−t)ta−1tb−1tdtµa−1(1−µ)b−1dµwww.hackshp.cn00Z1=Γ(a+b)µa−1(1−µ)b−1dµ.(64)02.9WhenweintegrateoverµM−1thelowerlimitofintegrationis0,whiletheupperPM−2limitis1−j=1µjsincetheremainingprobabilitiesmustsumtoone(seeFig-ure2.4).ThuswehavePZ1−M−2µjj=1pM−1(µ1,...,µM−2)=pM(µ1,...,µM−1)dµM−10"MY−2#Z1−PM−2µjMX−1!αM−1j=1=Cµαk−1µαM−1−11−µdµ.MkM−1jM−1k=10j=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 22Solution2.11Inordertomakethelimitsofintegrationequalto0and1wechangeintegrationvariablefromµM−1totusing!MX−2µM−1=t1−µjj=1whichgivespM−1(µ1,...,µM−2)"MY−2#MX−2!αM−1+αM−1Z1=Cµαk−11−µtαM−1−1(1−t)αM−1dtkhdaw.comMkjk=1j=10"MY−2#MX−2!αM−1+αM−1αk−1Γ(αM−1)Γ(αM)=CMµk1−µj(65)Γ(αM−1+αM)k=1j=1wherewehaveused(2.265).Therighthandsideof(65)isseentobeanormalizedDirichletdistributionoverM−1variables,withcoefﬁcientsα1,...,αM−2,αM−1+αM,(notethatwehaveeffectivelycombinedtheﬁnaltwocategories)andwecanidentifyitsnormalizationcoefﬁcientusing(2.38).ThusΓ(α1+...+αM)Γ(αM−1+αM)CM=·Γ(α1)...Γ(αM−2)Γ(αM−1+αM)Γ(αM−1)Γ(αM)Γ(α1+...+αM)=(66)Γ(α1)...Γ(αM)asrequired.课后答案网2.11WeﬁrstofallwritetheDirichletdistribution(2.38)intheformYMDir(µ|α)=K(α)µαk−1kk=1wherewww.hackshp.cnΓ(α0)K(α)=.Γ(α1)···Γ(αM)NextwenotethefollowingrelationYMYM∂αk−1∂µk=exp((αk−1)lnµk)∂αj∂αjk=1k=1YM=lnµjexp{(αk−1)lnµk}k=1YM=lnµµαk−1jkk=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution2.1423fromwhichweobtainZ1Z1YME[lnµ]=K(α)···lnµµαk−1dµ...dµjjk1M00k=1Z1Z1YM∂αk−1=K(α)···µkdµ1...dµM∂αj00k=1∂1=K(α)∂µjK(α)khdaw.com∂=−lnK(α).∂µjFinally,usingtheexpressionforK(α),togetherwiththedeﬁnitionofthedigammafunctionψ(·),wehaveE[lnµj]=ψ(αj)−ψ(α0).2.14AsfortheunivariateGaussianconsideredinSection1.6,wecanmakeuseofLa-grangemultiplierstoenforcetheconstraintsonthemaximumentropysolution.NotethatweneedasingleLagrangemultiplierforthenormalizationconstraint(2.280),aD-dimensionalvectormofLagrangemultipliersfortheDconstraintsgivenby(2.281),andaD×DmatrixLofLagrangemultiplierstoenforcetheD2constraintsrepresentedby(2.282).ThuswemaximizeZZH[ep]=−p(x)lnp(x)dx+λp(x)dx−1课后答案网ZT+mp(x)xdx−µZT+TrLp(x)(x−µ)(x−µ)dx−Σ.(67)Byfunctionaldifferentiation(AppendixD)themaximumofthisfunctionalwithwww.hackshp.cnrespecttop(x)occurswhenTT0=−1−lnp(x)+λ+mx+Tr{L(x−µ)(x−µ)}.Solvingforp(x)weobtainTTp(x)=expλ−1+mx+(x−µ)L(x−µ).(68)WenowﬁndthevaluesoftheLagrangemultipliersbyapplyingtheconstraints.Firstwecompletethesquareinsidetheexponential,whichbecomesT1−11−1T1T−1λ−1+x−µ+LmLx−µ+Lm+µm−mLm.224khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 24Solution2.16Wenowmakethechangeofvariable1−1y=x−µ+Lm.2Theconstraint(2.281)thenbecomesZTT1T−11−1expλ−1+yLy+µm−mLmy+µ−Lmdy=µ.42Intheﬁnalparentheses,theterminyvanishesbysymmetry,whiletheterminµsimplyintegratestoµbyvirtueofthenormalizationconstraint(2.280)whichnowkhdaw.comtakestheformZTT1T−1expλ−1+yLy+µm−mLmdy=1.4andhencewehave1−1−Lm=02whereagainwehavemadeuseoftheconstraint(2.280).Thusm=0andsothedensitybecomesTp(x)=expλ−1+(x−µ)L(x−µ).Substitutingthisintotheﬁnalconstraint(2.282),andmakingthechangeofvariablex−µ=zweobtainZTT课后答案网expλ−1+zLzzzdx=Σ.Applyingananalogousargumenttothatusedtoderive(2.64)weobtainL=−1Σ.2Finally,thevalueofλissimplythatvalueneededtoensurethattheGaussiandistri-butioniscorrectlynormalized,asderivedinSection2.3,andhenceisgivenby11www.hackshp.cnλ−1=ln.(2π)D/2|Σ|1/2−1−12.16Wehavep(x1)=N(x1|µ1,τ1)andp(x2)=N(x2|µ2,τ2).Sincex=x1+x2−1wealsohavep(x|x2)=N(x|µ1+x2,τ1).Wenowevaluatetheconvolutionintegralgivenby(2.284)whichtakestheformτ1/2τ1/2Z∞nττop(x)=12exp−1(x−µ−x)2−2(x−µ)2dx.122222π2π−∞22(69)SincetheﬁnalresultwillbeaGaussiandistributionforp(x)weneedonlyevaluateitsprecision,since,from(1.110),theentropyisdeterminedbythevarianceorequiv-alentlytheprecision,andisindependentofthemean.Thisallowsustosimplifythecalculationbyignoringsuchthingsasnormalizationconstants.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions2.17–2.2025Webeginbyconsideringthetermsintheexponentof(69)whichdependonx2whicharegivenby12−x2(τ1+τ2)+x2{τ1(x−µ1)+τ2µ2}2221τ1(x−µ1)+τ2µ2{τ1(x−µ1)+τ2µ2}=−(τ1+τ2)x2−+2τ1+τ22(τ1+τ2)wherewehavecompletedthesquareoverx2.Whenweintegrateoutx2,theﬁrsttermontherighthandsidewillsimplygiverisetoaconstantfactorindependentofx.Thesecondterm,whenexpandedout,willinvolveaterminx2.Sincethekhdaw.comprecisionofxisgivendirectlyintermsofthecoefﬁcientofx2intheexponent,itisonlysuchtermsthatweneedtoconsider.Thereisoneotherterminx2arisingfromtheoriginalexponentin(69).Combiningthesewehaveττ21ττ−1x2+1x2=−12x222(τ1+τ2)2τ1+τ2fromwhichweseethatxhasprecisionτ1τ2/(τ1+τ2).Wecanalsoobtainthisresultfortheprecisiondirectlybyappealingtothegeneralresult(2.115)fortheconvolutionoftwolinear-Gaussiandistributions.Theentropyofxisthengiven,from(1.110),by12π(τ1+τ2)H[x]=ln.2τ1τ22.17WecanuseananalogousargumenttothatusedinthesolutionofExercise1.14.课后答案网ConsiderageneralsquarematrixΛwithelementsΛij.ThenwecanalwayswriteASΛ=Λ+ΛwhereSΛij+ΛjiAΛij−ΛjiΛij=,Λij=(70)22anditiseasilyveriﬁedthatΛSissymmetricsothatΛS=ΛS,andΛAisantisym-www.hackshp.cnijjimetricsothatΛA=−ΛS.ThequadraticformintheexponentofaD-dimensionalijjimultivariateGaussiandistributioncanbewrittenXDXD1(xi−µi)Λij(xj−µj)(71)2i=1j=1−1ASwhereΛ=Σistheprecisionmatrix.WhenwesubstituteΛ=Λ+ΛintoA(71)weseethattheterminvolvingΛvanishessinceforeverypositivetermthereisanequalandoppositenegativeterm.ThuswecanalwaystakeΛtobesymmetric.2.20Sinceu,...,uconstituteabasisforRD,wecanwrite1Da=ˆa1u1+ˆa2u2+...+ˆaDuD,khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 26Solutions2.22–2.24whereaˆ1,...,aˆDarecoefﬁcientsobtainedbyprojectingaonu1,...,uD.Notethattheytypicallydonotequaltheelementsofa.UsingthiswecanwriteTTTaΣa=aˆ1u1+...+ˆaDuDΣ(ˆa1u1+...+ˆaDuD)andcombiningthisresultwith(2.45)wegetTTaˆ1u1+...+ˆaDuD(ˆa1λ1u1+...+ˆaDλDuD).Now,sinceuTu=1onlyifi=j,and0otherwise,thisbecomesij22khdaw.comaˆ1λ1+...+ˆaDλDandsinceaisreal,weseethatthisexpressionwillbestrictlypositiveforanynon-zeroa,ifalleigenvaluesarestrictlypositive.Itisalsoclearthatifaneigenvalue,λi,iszeroornegative,thereexistavectora(e.g.a=ui),forwhichthisexpressionwillbelessthanorequaltozero.Thus,thatamatrixhaseigenvectorswhichareallstrictlypositiveisasufﬁcientandnecessaryconditionforthematrixtobepositivedeﬁnite.2.22ConsideramatrixMwhichissymmetric,sothatMT=M.TheinversematrixM−1satisﬁesMM−1=I.Takingthetransposeofbothsidesofthisequation,andusingtherelation(C.1),weobtain−1TTTMM=I=Isincetheidentitymatrixissymmetric.Makinguseofthesymmetryconditionfor课后答案网Mwethenhave−1TMM=Iandhence,fromthedeﬁnitionofthematrixinverse,TM−1=M−1andsoMwww.hackshp.cn−1isalsoasymmetricmatrix.2.24Multiplyingthelefthandsideof(2.76)bythematrix(2.287)triviallygivestheiden-titymatrix.Ontherighthandsideconsiderthefourblocksoftheresultingparti-tionedmatrix:upperleftAM−BD−1CM=(A−BD−1C)(A−BD−1C)−1=Iupperright−AMBD−1+BD−1+BD−1CMBD−1=−(A−BD−1C)(A−BD−1C)−1BD−1+BD−1=−BD−1+BD−1=0khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions2.28–2.3227lowerleftCM−DD−1CM=CM−CM=0lowerright−CMBD−1+DD−1+DD−1CMBD−1=DD−1=I.Thustherighthandsidealsoequalstheidentitymatrix.2.28Forthemarginaldistributionp(x)weseefrom(2.92)thatthemeanisgivenbytheupperpartitionof(2.108)whichissimplyµ.Similarlyfrom(2.93)weseethatthe−1khdaw.comcovarianceisgivenbythetopleftpartitionof(2.105)andisthereforegivenbyΛ.Nowconsidertheconditionaldistributionp(y|x).Applyingtheresult(2.81)fortheconditionalmeanweobtain−1µy|x=Aµ+b+AΛΛ(x−µ)=Ax+b.Similarlyapplyingtheresult(2.82)forthecovarianceoftheconditionaldistributionwehavecov[y|x]=L−1+AΛ−1AT−AΛ−1ΛΛ−1AT=L−1asrequired.2.32Thequadraticformintheexponentialofthejointdistributionisgivenby1T1T−(x−µ)Λ(x−µ)−(y−Ax−b)L(y−Ax−b).(72)22Wenowextractallofthosetermsinvolving课后答案网xandassemblethemintoastandardGaussianquadraticformbycompletingthesquare1TTTT=−x(Λ+ALA)x+xΛµ+AL(y−b)+const21TT=−(x−m)(Λ+ALA)(x−m)www.hackshp.cn21TT+m(Λ+ALA)m+const(73)2whereT−1Tm=(Λ+ALA)Λµ+AL(y−b).Wecannowperformtheintegrationoverxwhicheliminatestheﬁrsttermin(73).Thenweextractthetermsinyfromtheﬁnaltermin(73)andcombinethesewiththeremainingtermsfromthequadraticform(72)whichdependonytogive1TT−1T=−yL−LA(Λ+ALA)ALy2TT−1T+yL−LA(Λ+ALA)ALbT−1+LA(Λ+ALA)Λµ.(74)khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 28Solution2.34Wecanidentifytheprecisionofthemarginaldistributionp(y)fromthesecondorderterminy.Toﬁndthecorrespondingcovariance,wetaketheinverseoftheprecisionandapplytheWoodburyinversionformula(2.289)togiveT−1T−1−1−1TL−LA(Λ+ALA)AL=L+AΛA(75)whichcorrespondsto(2.110).Nextweidentifythemeanνofthemarginaldistribution.Todothiswemakeuseof(75)in(74)andthencompletethesquaretogive1T−1−1T−1khdaw.com−(y−ν)L+AΛA(y−ν)+const2whereν=L−1+AΛ−1AT(L−1+AΛ−1AT)−1b+LA(Λ+ATLA)−1Λµ.Nowconsiderthetwotermsinthesquarebrackets,theﬁrstoneinvolvingbandthesecondinvolvingµ.Theﬁrstofthesecontributionsimplygivesb,whiletheterminµcanbewritten=L−1+AΛ−1ATLA(Λ+ATLA)−1Λµ=A(I+Λ−1ATLA)(I+Λ−1ATLA)−1Λ−1Λµ=Aµwherewehaveusedthegeneralresult(BC)−1=C−1B−1.Henceweobtain(2.109).课后答案网2.34Differentiating(2.118)withrespecttoΣweobtaintwoterms:XNN∂1∂T−1−ln|Σ|−(xn−µ)Σ(xn−µ).2∂Σ2∂Σn=1Fortheﬁrstterm,wecanapply(C.28)directlytogetwww.hackshp.cnN∂N−1TN−1−ln|Σ|=−Σ=−Σ.2∂Σ22Forthesecondterm,weﬁrstre-writethesumXN(x−µ)TΣ−1(x−µ)=NTrΣ−1S,nnn=1whereXN1TS=(xn−µ)(xn−µ).Nn=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution2.3629Usingthistogetherwith(C.21),inwhichx=Σij(element(i,j)inΣ),andproper-tiesofthetracewegetXN∂T−1∂−1(xn−µ)Σ(xn−µ)=NTrΣS∂Σij∂Σijn=1∂−1=NTrΣS∂Σij−1∂Σ−1=−NTrΣΣS∂Σijkhdaw.com∂Σ−1−1=−NTrΣSΣ∂Σij−1−1=−NΣSΣijwherewehaveused(C.26).NotethatinthelaststepwehaveignoredthefactthatΣij=Σji,sothat∂Σ/∂Σijhasa1inposition(i,j)onlyand0everywhereelse.Treatingthisresultasvalidnevertheless,wegetXN1∂T−1N−1−1−(xn−µ)Σ(xn−µ)=ΣSΣ.2∂Σ2n=1Combiningthederivativesofthetwotermsandsettingtheresulttozero,weobtain课后答案网N−1N−1−1Σ=ΣSΣ.22Re-arrangementthenyieldsΣ=Sasrequired.www.hackshp.cn2.36NOTE:InPRML,therearemistakesthataffectthissolution.Thesignin(2.129)isincorrect,andthisequationshouldread(N)(N−1)(N−1)θ=θ−aN−1z(θ).Then,inordertobeconsistentwiththeassumptionthatf(θ)>0forθ>θ?andf(θ)<0forθ<θ?inFigure2.10,weshouldﬁndtherootoftheexpectednegativeloglikelihood.Thisleadtosignchangesin(2.133)and(2.134),butin(2.135),thesearecancelledagainstthechangeofsignin(2.129),soineffect,(2.135)remainsunchanged.Also,xnshouldbexnonthel.h.s.of(2.133).Finally,thelabelsµandµMLinFigure2.11shouldbeinterchangedandtherearecorrespondingchangestothecaption(seeerrataonthePRMLwebsitefordetails).khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 30Solution2.40Considertheexpressionforσ2andseparateoutthecontributionfromobservation(N)xNtogiveXN212σ(N)=(xn−µ)Nn=1NX−1212(xN−µ)=(xn−µ)+NNn=1N−1(x−µ)22N=σ(N−1)+khdaw.comNN1(x−µ)2=σ2−σ2+N(N−1)(N−1)NN1222=σ(N−1)+(xN−µ)−σ(N−1).(76)NIfwesubstitutetheexpressionforaGaussiandistributionintotheresult(2.135)fortheRobbins-Monroprocedureappliedtomaximizinglikelihood,weobtain()∂1(x−µ)2σ2=σ2+a−lnσ2−N(N)(N−1)N−1∂σ22(N−1)2σ2(N−1)(N−1)()1(x−µ)22N=σ(N−1)+aN−1−2σ2+2σ4(N−1)(N−1)2aN−122=σ(N−1)+4(xN−µ)−σ(N−1).(77)2σ课后答案网(N−1)Comparisonof(77)with(76)allowsustoidentify2σ4(N−1)aN−1=.N2.40Theposteriordistributionisproportionaltotheproductofthepriorandthewww.hackshp.cnlikelihoodfunctionYNp(µ|X)∝p(µ)p(xn|µ,Σ).n=1ThustheposteriorisproportionaltoanexponentialofaquadraticforminµgivenbyXN1T−11T−1−(µ−µ0)Σ0(µ−µ0)−(xn−µ)Σ(xn−µ)22n=1!XN1T−1−1T−1−1=−µΣ0+NΣµ+µΣ0µ0+Σxn+const2n=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions2.46–2.4731where`const."denotestermsindependentofµ.Usingthediscussionfollowing(2.71)weseethatthemeanandcovarianceoftheposteriordistributionaregivenby−1−1−1−1−1µN=Σ0+NΣΣ0µ0+ΣNµML(78)−1−1−1ΣN=Σ0+NΣ(79)whereµMListhemaximumlikelihoodsolutionforthemeangivenbyXN1µML=xn.Nkhdaw.comn=12.46From(2.158),wehaveZ∞bae(−bτ)τa−1τ1/2nτo2exp−(x−µ)dτΓ(a)2π20a1/2Z∞2b1a−1/2(x−µ)=τexp−τb+dτ.Γ(a)2π20Wenowmaketheproposedchangeofvariablez=τ∆,where∆=b+(x−µ)2/2,yieldinga1/2Z∞b1−a−1/2a−1/2∆zexp(−z)dzΓ(a)2π0课后答案网a1/2b1−a−1/2=∆Γ(a+1/2)Γ(a)2πwherewehaveusedthedeﬁnitionoftheGammafunction(1.141).Finally,wesub-stituteb+(x−µ)2/2for∆,ν/2foraandν/2λforb:www.hackshp.cn1/2Γ(−a+1/2)a1a−1/2b∆Γ(a)2πν/21/22−(ν+1)/2Γ((ν+1)/2)ν1ν(x−µ)=+Γ(ν/2)2λ2π2λ2ν/21/2−(ν+1)/22−(ν+1)/2Γ((ν+1)/2)ν1νλ(x−µ)=1+Γ(ν/2)2λ2π2λν1/22−(ν+1)/2Γ((ν+1)/2)λλ(x−µ)=1+Γ(ν/2)νπνkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 32Solution2.512.47Ignoringthenormalizationconstant,wewrite(2.159)as2−(ν−1)/2λ(x−µ)St(x|µ,λ,ν)∝1+νν−1λ(x−µ)2=exp−ln1+.(80)2νForlargeν,wemakeuseoftheTaylorexpansionforthelogarithmintheform2khdaw.comln(1+)=+O()(81)tore-write(80)asν−1λ(x−µ)2exp−ln1+2νν−1λ(x−µ)2=exp−+O(ν−2)2νλ(x−µ)2=exp−+O(ν−1).2Weseethatinthelimitν→∞thisbecomes,uptoanoverallconstant,thesameasaGaussiandistributionwithmeanµandprecisionλ.SincetheStudentdistributionisnormalizedtounityforallvaluesofνitfollowsthatitmustremainnormalizedinthislimit.Thenormalizationcoefﬁcientisgivenbythestandardexpression(2.42)foraunivariateGaussian.课后答案网2.51Usingtherelation(2.296)wehave221=exp(iA)exp(−iA)=(cosA+isinA)(cosA−isinA)=cosA+sinA.Similarly,wehavewww.hackshp.cncos(A−B)=0andminimize(3.29).Denotingtheresultingvalueofwbyw?(λ),andusingtheKKTcondition(E.11),weseethatthevalueof课后答案网ηisgivenbyXMη=|w?(λ)|q.jj=13.6Weﬁrstwritedowntheloglikelihoodfunctionwhichisgivenbywww.hackshp.cnXNN1TT−1TlnL(W,Σ)=−ln|Σ|−(tn−Wφ(xn))Σ(tn−Wφ(xn)).22n=1FirstofallwesetthederivativewithrespecttoWequaltozero,givingXN0=−Σ−1(t−WTφ(x))φ(x)T.nnnn=1MultiplyingthroughbyΣandintroducingthedesignmatrixΦandthetargetdatamatrixTwehaveTTΦΦW=ΦTSolvingforWthengives(3.15)asrequired.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions3.8–3.1037ThemaximumlikelihoodsolutionforΣiseasilyfoundbyappealingtothestandardresultfromChapter2givingXN1TTTΣ=(tn−WMLφ(xn))(tn−WMLφ(xn)).Nn=1asrequired.SinceweareﬁndingajointmaximumwithrespecttobothWandΣweseethatitisWMLwhichappearsinthisexpression,asinthestandardresultforanunconditionalGaussiandistribution.3.8Combiningthepriorkhdaw.comp(w)=N(w|mN,SN)andthelikelihood1/2ββT2p(tN+1|xN+1,w)=exp−(tN+1−wφN+1)(94)2π2whereφN+1=φ(xN+1),weobtainaposterioroftheformp(w|tN+1,xN+1,mN,SN)1T−11T2∝exp−(w−mN)SN(w−mN)−β(tN+1−wφN+1).22Wecanexpandtheargumentoftheexponential,omittingthe−1/2factors,asfol-lows(w−m)TS−1(w−m)+β(t−wTφ)2NNNN+1N+1课后答案网=wTS−1w−2wTS−1mNNNTTT+βwφN+1φN+1w−2βwφN+1tN+1+const=wT(S−1+βφφT)w−2wT(S−1m+βφt)+const,NN+1N+1NNN+1N+1whereconstdenotesremainingtermsindependentofw.Fromthiswecanreadoffthedesiredresultdirectly,www.hackshp.cnp(w|tN+1,xN+1,mN,SN)=N(w|mN+1,SN+1),with−1−1TSN+1=SN+βφN+1φN+1.(95)and−1mN+1=SN+1(SNmN+βφN+1tN+1).(96)3.10Using(3.3),(3.8)and(3.49),wecanre-write(3.57)asZT−1p(t|x,t,α,β)=N(t|φ(x)w,β)N(w|mN,SN)dw.Bymatchingtheﬁrstfactoroftheintegrandwith(2.114)andthesecondfactorwith(2.113),weobtainthedesiredresultdirectlyfrom(2.115).khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 38Solutions3.15–3.203.15Thisiseasilyshownbysubstitutingthere-estimationformulae(3.92)and(3.95)into(3.82),givingβ2αTE(mN)=kt−ΦmNk+mNmN22N−γγN=+=.2223.18Wecanrewrite(3.79)β2αTkt−Φwk+wwkhdaw.com22βαTTTTT=tt−2tΦw+wΦΦw+ww221TTT=βtt−2βtΦw+wAw2where,inthelastline,wehaveused(3.81).Wenowusethetricksofadding0=mTAm−mTAmandusingI=A−1A,combinedwith(3.84),asfollows:NNNN1TTTβtt−2βtΦw+wAw21TT−1T=βtt−2βtΦAAw+wAw21TTTTT=βtt−2mNAw+wAw+mNAmN−mNAmN211TTT课后答案网=βtt−mNAmN+(w−mN)A(w−mN).22Herethelasttermequalstermthelasttermof(3.80)andsoitremainstoshowthattheﬁrsttermequalsther.h.s.of(3.82).Todothis,weusethesametricksagain:11TTTTTβtt−mNAmN=βtt−2mNAmN+mNAmN2www.hackshp.cn21=βtTt−2mTAA−1ΦTtβ+mTαI+βΦTΦmNNN21TTTTTT=βtt−2mNΦtβ+βmNΦΦmN+αmNmN21TT=β(t−ΦmN)(t−ΦmN)+αmNmN2β2αT=kt−ΦmNk+mNmN22asrequired.3.20Weonlyneedtoconsiderthetermsof(3.86)thatdependonα,whicharetheﬁrst,thirdandfourthterms.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution3.2339FollowingthesequenceofstepsinSection3.5.2,westartwiththelastoftheseterms,1−ln|A|.2From(3.81),(3.87)andthefactthatthateigenvectorsuiareorthonormal(seealsokhdaw.comAppendixC),weﬁndthattheeigenvectorsofAtobeα+λi.Wecanthenuse(C.47)andthepropertiesofthelogarithmtotakeusfromthelefttotherightsideof(3.88).Thederivativesfortheﬁrstandthirdtermof(3.86)aremoreeasilyobtainedusingstandardderivativesand(3.82),yielding1MT+mNmN.2α课后答案网Wecombinetheseresultsinto(3.89),fromwhichweget(3.92)via(3.90).Theexpressionforγin(3.91)isobtainedfrom(3.90)bysubstitutingXMwww.hackshp.cnλi+αλi+αiforMandre-arranging.3.23From(3.10),(3.112)andthepropertiesoftheGaussianandGammadistributions(seeAppendixB),wegetkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 40Solution3.23ZZp(t)=p(t|w,β)p(w|β)dwp(β)dβZZN/2ββT=exp−(t−Φw)(t−Φw)2π2M/2β−1/2βT−1|S0|exp−(w−m0)S0(w−m0)dw2π2Γ(a)−1ba0βa0−1exp(−bβ)dβkhdaw.com000ZZba0β0T=exp−(t−Φw)(t−Φw)((2π)M+N|S|)1/220βT−1exp−(w−m0)S0(w−m0)dw2βa0−1βN/2βM/2exp(−bβ)dβ0ZZba0β=0exp−(w−m)TS−1(w−m)dwM+N1/22NNN((2π)|S0|)βTT−1T−1exp−tt+m0S0m0−mNSNmN2βaN−1βM/2exp(−bβ)dβ课后答案网0wherewehavecompletedthesquareforthequadraticforminw,using−1TmN=SNS0m0+Φt−1−1TSN=βS0+ΦΦNwww.hackshp.cnaN=a0+2!XN1T−1T−12bN=b0+m0S0m0−mNSNmN+tn.2n=1Nowwearereadytodotheintegration,ﬁrstoverwandthenβ,andre-arrangethetermstoobtainthedesiredresultZba0p(t)=0(2π)M/2|S|1/2βaN−1exp(−bβ)dβNN((2π)M+N|S|)1/201|S|1/2ba0Γ(a)N0N=a.(2π)N/2|S|1/2bNΓ(a)0N0khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution4.241Chapter4LinearModelsforClassiﬁcation4.2Forthepurposeofthisexercise,wemakethecontributionofthebiasweightsexplicitin(4.15),giving1TTTED(Wf)=Tr(XW+1w0−T)(XW+1w0−T),(97)2wherew0isthecolumnvectorofbiasweights(thetoprowofWftransposed)and1khdaw.comisacolumnvectorofNones.Wecantakethederivativeof(97)w.r.t.w0,givingT2Nw0+2(XW−T)1.Settingthistozero,andsolvingforw0,weobtainTx¯(98)w0=¯t−Wwhere¯t=1T1TT1andx¯=X1.NNIfwesubsitute(98)into(97),weget1TED(W)=Tr(XW+T−XW−T)(XW+T−XW−T),课后答案网2whereTTT=1¯tandX=1x¯.Settingthederivativeofthisw.r.t.Wtozerowegetwww.hackshp.cnW=(XbTXb)−1XbTTb=Xb†Tb,wherewehavedeﬁnedXb=X−XandTb=T−T.Nowconsiderthepredictionforanewinputvectorx?,y(x?)=WTx?+w0T?Tx¯=Wx+¯t−WT=¯t−TbTXb†(x?−x¯).(99)Ifweapply(4.157)to¯t,wegetT¯t=1TTaaT1=−b.Nkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 42Solutions4.4–4.9Therefore,applying(4.157)to(99),weobtainTaTy(x?)=aT¯t+aTTbTXb†(x?−x¯)=aT¯t=−b,sinceaTTbT=aT(T−T)T=b(1−1)T=0T.4.4NOTE:InPRML,thetextoftheexerciserefersequation(4.23)whereitshouldreferto(4.22).khdaw.comFrom(4.22)wecanconstructtheLagrangianfunctionTTL=w(m2−m1)+λww−1.TakingthegradientofLweobtain∇L=m2−m1+2λw(100)andsettingthisgradienttozerogives1w=−(m2−m1)2λformwhichitfollowsthatw∝m2−m1.4.7From(4.59)wehave11+e−a−11−σ(a)=1−=课后答案网1+e−a1+e−ae−a1===σ(−a).1+e−aea+1Theinverseofthelogisticsigmoidiseasilyfoundasfollows1y=σ(a)=www.hackshp.cn1+e−a1−a⇒−1=ey1−y⇒ln=−ayy−1⇒ln=a=σ(y).1−y4.9ThelikelihoodfunctionisgivenbyYNYKp({φ,t}|{π})={p(φ|C)π}tnknnknkkn=1k=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions4.12–4.1343andtakingthelogarithm,weobtainXNXKlnp({φn,tn}|{πk})=tnk{lnp(φn|Ck)+lnπk}.(101)n=1k=1InordertomaximizetheloglikelihoodwithrespecttoPπkweneedtopreservetheconstraintkπk=1.ThiscanbedonebyintroducingaLagrangemultiplierλandmaximizing!XKlnp({φn,tn}|{πk})+λπk−1.khdaw.comk=1Settingthederivativewithrespecttoπkequaltozero,weobtainXNtnk+λ=0.πkn=1Re-arrangingthengivesXN−πkλ=tnk=Nk.(102)n=1Summingbothsidesoverkweﬁndthatλ=−N,andusingthistoeliminateλweobtain(4.159).4.12Differentiating(4.59)weobtaindσe−a课后答案网=da(1+e−a)2e−a=σ(a)1+e−a1+e−a1=σ(a)−www.hackshp.cn1+e−a1+e−a=σ(a)(1−σ(a)).4.13Westartbycomputingthederivativeof(4.90)w.r.t.yn∂E1−tntn=−(103)∂yn1−ynynyn(1−tn)−tn(1−yn)=yn(1−yn)yn−yntn−tn+yntn=(104)yn(1−yn)yn−tn=.(105)yn(1−yn)khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 44Solutions4.17–4.19From(4.88),weseethat∂yn∂σ(an)==σ(an)(1−σ(an))=yn(1−yn).(106)∂an∂anFinally,wehave∇an=φn(107)where∇denotesthegradientwithrespecttow.Combining(105),(106)and(107)usingthechainrule,weobtainkhdaw.comXN∂E∂yn∇E=∇an∂yn∂ann=1XN=(yn−tn)φnn=1asrequired.4.17From(4.104)wehave2∂yeakeakk=P−P=yk(1−yk),∂akeaieaiii∂yeakeajk=−P2=−ykyj,j6=k.∂ajeai课后答案网iCombiningtheseresultsweobtain(4.106).4.19Usingthecross-entropyerrorfunction(4.90),andfollowingExercise4.13,wehave∂Eyn−tn=.(108)www.hackshp.cn∂ynyn(1−yn)Also∇an=φn.(109)From(4.115)and(4.116)wehave∂yn∂Φ(an)1−a2==√en.(110)∂an∂an2πCombining(108),(109)and(110),wegetXNXN∂E∂ynyn−tn1−a2∇E=∇an=√enφn.(111)∂yn∂anyn(1−yn)2πn=1n=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution4.2345InordertoﬁndtheexpressionfortheHessian,itisisconvenienttoﬁrstdetermine∂yn−tnyn(1−yn)(yn−tn)(1−2yn)=−∂ynyn(1−yn)yn2(1−yn)2yn2(1−yn)2y2+t−2ytnnnn=.(112)yn2(1−yn)2Thenusing(109)–(112)wehaveNX∂yn−tn1−a2∇∇E=√enφn∇yn∂ynyn(1−yn)2πn=1khdaw.comyn−tn1−a2+√en(−2an)φn∇anyn(1−yn)2πXN2−2a2Tyn+tn−2yntn1−a2enφnφn=√en−2an(yn−tn)√.yn(1−yn)2π2πyn(1−yn)n=14.23NOTE:InPRML,thetextoftheexercisecontainsatypographicalerror.Followingtheequation,itshouldsaythatHisthematrixofsecondderivativesofthenegativeloglikelihood.TheBICapproximationcanbeviewedasalargeNapproximationtothelogmodelevidence.From(4.138),wehaveA=−∇∇lnp(D|θMAP)p(θMAP)=H−∇∇lnp(θMAP)andif课后答案网p(θ)=N(θ|m,V0),thisbecomes−1A=H+V0.Ifweassumethatthepriorisbroad,orequivalentlythatthenumberofdatapoints−1islarge,wecanneglectthetermV0comparedtoH.Usingthisresult,(4.137)canberewrittenintheformwww.hackshp.cn1−11lnp(D)"lnp(D|θMAP)−(θMAP−m)V0(θMAP−m)−ln|H|+const22(113)asrequired.Notethatthephrasingofthequestionismisleading,sincetheassump-tionofabroadprior,oroflargeN,isrequiredinordertoderivethisform,aswellasinthesubsequentsimpliﬁcation.Wenowagaininvokethebroadpriorassumption,allowingustoneglectthesecondtermontherighthandsideof(113)relativetotheﬁrstterm.Sinceweassumei.i.d.data,H=−∇∇lnp(D|θMAP)consistsofasumofterms,onetermforeachdatum,andwecanconsiderthefollowingapproximation:XNH=Hn=NHbn=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 46Solutions5.2–5.5whereHisthecontributionfromthenthdatapointandnXN1Hb=Hn.Nn=1Combiningthiswiththepropertiesofthedeterminant,wehaveln|H|=ln|NHb|=lnNM|Hb|=MlnN+ln|Hb|khdaw.comwhereMisthedimensionalityofθ.NotethatweareassumingthatHbhasfullrankM.Finally,usingthisresulttogether(113),weobtain(4.139)bydroppingtheln|Hb|sincethisO(1)comparedtolnN.Chapter5NeuralNetworks5.2Thelikelihoodfunctionforani.i.d.dataset,{(x1,t1),...,(xN,tN)},undertheconditionaldistribution(5.16)isgivenbyYNNt|y(x,w),β−1I.nnn=1Ifwetakethelogarithmofthis,using(2.43),weget课后答案网XNlnNt|y(x,w),β−1Innn=1XN1T=−(tn−y(xn,w))(βI)(tn−y(xn,w))+const2www.hackshp.cnn=1XNβ2=−ktn−y(xn,w)k+const,2n=1where`const"comprisestermswhichareindependentofw.Theﬁrsttermontherighthandsideisproportionaltothenegativeof(5.11)andhencemaximizingthelog-likelihoodisequivalenttominimizingthesum-of-squareserror.5.5Forthegiveninterpretationofyk(x,w),theconditionaldistributionofthetargetvectorforamulticlassneuralnetworkisYKp(t|w,...,w)=ytk.1Kkk=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions5.6–5.947Thus,foradatasetofNpoints,thelikelihoodfunctionwillbeYNYKp(T|w,...,w)=ytnk.1Knkn=1k=1Takingthenegativelogarithminordertoderiveanerrorfunctionweobtain(5.24)asrequired.Notethatthisisthesameresultasforthemulticlasslogisticregressionmodel,givenby(4.108).5.6Differentiating(5.21)withrespecttotheactivationancorrespondingtoaparticularkhdaw.comdatapointn,weobtain∂E1∂yn1∂yn=−tn+(1−tn).(114)∂anyn∂an1−yn∂anFrom(4.88),wehave∂yn=yn(1−yn).(115)∂anSubstituting(115)into(114),weget∂Eyn(1−yn)yn(1−yn)=−tn+(1−tn)∂anyn(1−yn)=yn−tnasrequired.5.9Thissimplycorrespondstoascalingandshiftingofthebinaryoutputs,whichdi-rectlygivestheactivationfunction,usingthenotationfrom(5.19),intheform课后答案网y=2σ(a)−1.Thecorrespondingerrorfunctioncanbeconstructedfrom(5.21)byapplyingtheinversetransformtoynandtn,yieldingNX1+tn1+yn1+tn1+ynE(w)=−ln+1−ln1−2222www.hackshp.cnnXN1=−{(1+tn)ln(1+yn)+(1−tn)ln(1−yn)}+Nln22nwherethelasttermcanbedropped,sinceitisindependentofw.Toﬁndthecorrespondingactivationfunctionwesimplyapplythelineartransforma-tiontothelogisticsigmoidgivenby(5.19),whichgives2y(a)=2σ(a)−1=−11+e−a1−e−aea/2−e−a/2==1+e−aea/2+e−a/2=tanh(a/2).khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 48Solutions5.10–5.115.10From(5.33)and(5.35)wehaveTTuiHui=uiλiui=λi.AssumethatHispositivedeﬁnite,sothat(5.37)holds.Thenbysettingv=uiitfollowsthatTλi=uiHui>0(116)forallvaluesofi.Thus,ifHispositivedeﬁnite,allofitseigenvalueswillbepositive.khdaw.comConversely,assumethat(116)holds.Then,foranyvector,v,wecanmakeuseof(5.38)togive!T!XXTvHv=ciuiHcjujij!T!XX=ciuiλjcjujijX2=λici>0iwherewehaveused(5.33)and(5.34)alongwith(116).Thus,ifalloftheeigenvaluesarepositive,theHessianmatrixwillbepositivedeﬁnite.5.11Westartbymakingthechangeofvariablegivenby(5.35)whichallowsthe课后答案网errorfunctiontobewrittenintheform(5.36).SettingthevalueoftheerrorfunctionE(w)toaconstantvalueCweobtain1XE(w?)+λα2=C.ii2www.hackshp.cniRe-arranginggivesX2?λiαi=2C−2E(w)=CeiwhereCeisalsoaconstant.Thisistheequationforanellipsewhoseaxesarealignedwiththecoordinatesdescribedbythevariables{αi}.Thelengthofaxisjisfoundbysettingαi=0foralli6=j,andsolvingforαjgiving!1/2Ceαj=λjwhichisinverselyproportionaltothesquarerootofthecorrespondingeigenvalue.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions5.12–5.25495.12From(5.37)weseethat,ifHispositivedeﬁnite,thenthesecondtermin(5.32)willbepositivewhenever(w−w?)isnon-zero.ThusthesmallestvaluewhichE(w)cantakeisE(w?),andsow?istheminimumofE(w).Conversely,ifw?istheminimumofE(w),then,foranyvectorw6=w?,E(w)>E(w?).Thiswillonlybethecaseifthesecondtermof(5.32)ispositiveforallvaluesofw6=w?(sincetheﬁrsttermisindependentofw).Sincew−w?canbesettoanyvectorofrealnumbers,itfollowsfromthedeﬁnition(5.37)thatHmustbepositivedeﬁnite.5.19Ifwetakethegradientof(5.21)withrespecttow,weobtainkhdaw.comXNXN∂E∇E(w)=∇an=(yn−tn)∇an,∂ann=1n=1wherewehaveusedtheresultprovedearlierinthesolutiontoExercise5.6.TakingthesecondderivativeswehaveNX∂yn∇∇E(w)=∇an∇an+(yn−tn)∇∇an.∂ann=1Droppingthelasttermandusingtheresult(4.88)forthederivativeofthelogisticsigmoidfunction,provedinthesolutiontoExercise4.12,weﬁnallygetXNXNT∇∇E(w)"yn(1−yn)∇an∇an=yn(1−yn)bnbnn=1n=1where课后答案网bn≡∇an.5.25Thegradientof(5.195)isgiven∇E=H(w−w?)andhenceupdateformula(5.196)becomeswww.hackshp.cnw(τ)=w(τ−1)−ρH(w(τ−1)−w?).Pre-multiplyingbothsideswithuTwegetjw(τ)=uTw(τ)(117)jjT(τ−1)T(τ−1)?=ujw−ρujH(w−w)=w(τ−1)−ρηuT(w−w?)jjj(τ−1)(τ−1)?=wj−ρηj(wj−wj),(118)wherewehaveused(5.198).Toshowthatw(τ)={1−(1−ρη)τ}w?jjjkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 50Solution5.27forτ=1,2,...,wecanuseproofbyinduction.Forτ=1,werecallthatw(0)=0andinsertthisinto(118),giving(1)(0)(0)?wj=wj−ρηj(wj−wj)=ρηw?jj={1−(1−ρη)}w?.jjNowweassumethattheresultholdsforτ=N−1andthenmakeuseof(118)(N)(N−1)(N−1)?khdaw.comwj=wj−ρηj(wj−wj)(N−1)?=wj(1−ρηj)+ρηjwj=1−(1−ρη)N−1w?(1−ρη)+ρηw?jjjjj=(1−ρη)−(1−ρη)Nw?+ρηw?jjjjj=1−(1−ρη)Nw?jjasrequired.Providedthat|1−ρη|<1thenwehave(1−ρη)τ→0asτ→∞,andhencejj1−(1−ρη)N→1andw(τ)→w?.jIfτisﬁnitebutη(ρτ)−1,τmuststillbelarge,sinceηρτ1,eventhoughjj|1−ρη|<1.Ifτislarge,itfollowsfromtheargumentabovethatw(τ)"w?.jjjIf,ontheotherhand,η(ρτ)−1,thismeansthatρηmustbesmall,sinceρητjjj1andτ课后答案网isanintegergreaterthanorequaltoone.Ifweexpand,(1−ρη)τ=1−τρη+O(ρη2)jjjandinsertthisinto(5.197),weget|w(τ)|=|{1−(1−ρη)τ}w?|www.hackshp.cnjjj2?=|1−(1−τρηj+O(ρηj))wj|"τρη|w?||w?|jjjRecallthatinSection3.5.3weshowedthatwhentheregularizationparameter(calledαinthatsection)ismuchlargerthanoneoftheeigenvalues(calledλjinthatsection)thenthecorrespondingparametervaluewiwillbeclosetozero.Conversely,whenαismuchsmallerthanλithenwiwillbeclosetoitsmaximumlikelihoodvalue.Thusαisplayingananalogousroletoρτ.5.27Ifs(x,ξ)=x+ξ,then∂sk∂s=Iki,i.e.,=I,∂ξi∂ξkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution5.2851andsincetheﬁrstorderderivativeisconstant,therearenohigherorderderivatives.Wenowmakeuseofthisresulttoobtainthederivativesofyw.r.t.ξi:∂yX∂y∂sk∂y===bi∂ξi∂sk∂ξi∂sik∂y∂biX∂bi∂sk∂bi====Bij∂ξi∂ξj∂ξj∂sk∂ξj∂sjkUsingtheseresults,wecanwritetheexpansionofEeasfollows:ZZZkhdaw.comEe=1{y(x)−t}2p(t|x)p(x)p(ξ)dξdxdt2ZZZT+{y(x)−t}bξp(ξ)p(t|x)p(x)dξdxdtZZZ1TT+ξ{y(x)−t}B+bbξp(ξ)p(t|x)p(x)dξdxdt.2Themiddletermwillagaindisappear,sinceE[ξ]=0andthuswecanwriteEeontheformof(5.131)withZZZ1TTΩ=ξ{y(x)−t}B+bbξp(ξ)p(t|x)p(x)dξdxdt.2AgaintheﬁrsttermwithintheparenthesisvanishestoleadingorderinξandweareleftwithZZ1课后答案网Ω"ξTbbTξp(ξ)p(x)dξdx2ZZ1TT=Traceξξbbp(ξ)p(x)dξdx2Z1T=TraceIbbp(x)dx2www.hackshp.cnZZ1T12=bbp(x)dx=k∇y(x)kp(x)dx,22TwhereweusedthefactthatE[ξξ]=I.5.28Themodiﬁcationsonlyaffectderivativeswithrespecttoweightsintheconvolutionallayer.Theunitswithinafeaturemap(indexedm)havedifferentinputs,butallshareacommonweightvector,w(m).Thus,errorsδ(m)fromallunitswithinafeaturemapwillcontributetothederivativesofthecorrespondingweightvector.Inthissituation,(5.50)becomesX(m)X∂En∂En∂aj(m)(m)==δz.(m)(m)(m)jji∂w∂a∂wijjijkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 52Solutions5.29–5.34Herea(m)denotestheactivationofthejthunitinthemthfeaturemap,whereasjw(m)denotestheithelementofthecorrespondingfeaturevectorand,ﬁnally,z(m)ijidenotestheithinputforthejthunitinthemthfeaturemap;thelattermaybeanactualinputortheoutputofaprecedinglayer.(m)(m)Notethatδj=∂En/∂ajwilltypicallybecomputedrecursivelyfromtheδsoftheunitsinthefollowinglayer,using(5.55).Iftherearelayer(s)precedingtheconvolutionallayer,thestandardbackwardpropagationequationswillapply;theweightsintheconvolutionallayercanbetreatedasiftheywereindependentparam-khdaw.cometers,forthepurposeofcomputingtheδsfortheprecedinglayer"sunits.5.29Thisiseasilyveriﬁedbytakingthederivativeof(5.138),using(1.46)andstandardderivatives,yielding∂Ω1X2(wi−µj)=P2πjN(wi|µj,σj)2.∂wikπkN(wi|µk,σk)jσCombiningthiswith(5.139)and(5.140),weimmediatelyobtainthesecondtermof(5.141).5.34NOTE:InPRML,thel.h.s.of(5.154)shouldbereplacedwithγnk=γk(tn|xn).Accordingly,in(5.155)and(5.156),γkshouldbereplacedbyγnkandin(5.156),tlshouldbetnl.Westartbyusingthechainruletowrite课后答案网XK∂En∂En∂πj=.(119)∂aπ∂πj∂aπkj=1kNotethatbecauseofthecouplingbetweenoutputscausedbythesoftmaxactivationfunction,thedependenceontheactivationofasingleoutputunitinvolvesallthewww.hackshp.cnoutputunits.Fortheﬁrstfactorinsidethesumonther.h.s.of(119),standardderivativesappliedtothenthtermof(5.153)gives∂EnNnjγnj=−P=−.(120)∂πjKπNπjl=1lnlFortheforthesecondfactor,wehavefrom(4.106)that∂πjπ=πj(Ijk−πk).(121)∂akkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions5.39–5.4053Combining(119),(120)and(121),wegetXK∂Enγnjπ=−πj(Ijk−πk)∂akπjj=1XKXK=−γnj(Ijk−πk)=−γnk+γnjπk=πk−γnk,j=1j=1PKkhdaw.comwherewehaveusedthefactthat,by(5.154),j=1γnj=1foralln.5.39Using(4.135),wecanapproximate(5.174)asp(D|α,β)"p(D|wMAP,β)p(wMAP|α)Z1Texp−(w−wMAP)A(w−wMAP)dw,2whereAisgivenby(5.166),sincep(D|w,β)p(w|α)isproportionaltop(w|D,α,β).Using(4.135),(5.162)and(5.163),wecanrewritethisasYN(2π)W/2p(D|α,β)"N(t|y(x,w),β−1)N(w|0,α−1I).nnMAPMAP|A|1/2nTakingthelog课后答案网arithmofbothsidesandthenusing(2.42)and(2.43),weobtainthedesiredresult.5.40ForaK-classneuralnetwork,thelikelihoodfunctionisgivenbyYNYKy(x,w)tnkwww.hackshp.cnknnkandthecorrespondingerrorfunctionisgivenby(5.24).AgainwewoulduseaLaplaceapproximationfortheposteriordistributionovertheweights,butthecorrespondingHessianmatrix,H,in(5.166),wouldnowbederivedfrom(5.24).Similarly,(5.24),wouldreplacethebinarycrossentropyerrortermintheregularizederrorfunction(5.184).Thepredictivedistributionforanewpatternwouldagainhavetobeapproximated,sincetheresultingmarginalizationcannotbedoneanalytically.However,incon-trasttothetwo-classproblem,thereisnoobviouscandidateforthisapproximation,althoughGibbs(1997)discussesvariousalternatives.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 54Solutions6.1–6.5Chapter6KernelMethods6.1WeﬁrstofallnotethatJ(a)dependsonaonlythroughtheformKa.SincetypicallythenumberNofdatapointsisgreaterthanthenumberMofbasisfunctions,theTmatrixK=ΦΦwillberankdeﬁcient.TherewillthenbeMeigenvectorsofKhavingnon-zeroeigenvalues,andN−Meigenvectorswitheigenvaluezero.Wecanthendecomposea=a+awhereaTa=0andKa=0.Thusthevalueofk⊥k⊥⊥a⊥isnotdeterminedbyJ(a).Wecanremovetheambiguitybysettinga⊥=0,orkhdaw.comequivalentlybyaddingaregularizertermTa⊥a⊥2toJ(a)whereisasmallpositiveconstant.Thena=akwhereakliesinthespanTofK=ΦΦandhencecanbewrittenasalinearcombinationofthecolumnsofΦ,sothatincomponentnotationXMan=uiφi(xn)i=1orequivalentlyinvectornotationa=Φu.(122)Substituting(122)into(6.7)weobtain课后答案网1TλTTJ(u)=(KΦu−t)(KΦu−t)+uΦKΦu221TTTλTTT=ΦΦΦu−tΦΦΦu−t+uΦΦΦΦu(123)22TSincethematrixwww.hackshp.cnΦΦhasfullrankwecandeﬁneanequivalentparametrizationgivenbyTw=ΦΦuandsubstitutingthisinto(123)werecovertheoriginalregularizederrorfunction(6.2).6.5Theresults(6.13)and(6.14)areeasilyprovedbyusing(6.1)whichdeﬁnesthekernelintermsofthescalarproductbetweenthefeaturevectorsfortwoinputvectors.Ifk(x,x0)isavalidkernelthentheremustexistafeaturevectorφ(x)suchthat1k(x,x0)=φ(x)Tφ(x0).1Itfollowsthatck(x,x0)=u(x)Tu(x0)1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution6.755where1/2u(x)=cφ(x)andsock(x,x0)canbeexpressedasthescalarproductoffeaturevectors,andhence1isavalidkernel.Similarly,for(6.14)wecanwritef(x)k(x,x0)f(x0)=v(x)Tv(x0)khdaw.com1wherewehavedeﬁnedv(x)=f(x)φ(x).Again,weseethatf(x)k(x,x0)f(x0)canbeexpressedasthescalarproductof1featurevectors,andhenceisavalidkernel.Alternatively,theseresultscanbeprovedbeappealingtothegeneralresultthattheGrammatrix,K,whoseelementsaregivenbyk(xn,xm),shouldbepositivesemideﬁniteforallpossiblechoicesoftheset{xn},byfollowingasimilarargu-menttoSolution6.7below.6.7(6.17)ismosteasilyprovedbymakinguseoftheresult,discussedonpage295,thatanecessaryandsufﬁcientconditionforafunctionk(x,x0)tobeavalidkernelisthattheGrammatrix课后答案网K,whoseelementsaregivenbyk(xn,xm),shouldbepositivesemideﬁniteforallpossiblechoicesoftheset{xn}.AmatrixKispositivesemi-deﬁniteif,andonlyif,TaKa>0foranychoiceofthevectora.LetKbetheGrammatrixfork(x,x0)andletK112betheGrammatrixfork(x,x0).Thenwww.hackshp.cn2TTTa(K1+K2)a=aK1a+aK2a>0wherewehaveusedthefactthatK1andK2arepositivesemi-deﬁnitematrices,togetherwiththefactthatthesumoftwonon-negativenumberswillitselfbenon-negative.Thus,(6.17)deﬁnesavalidkernel.Toprove(6.18),wetaketheapproachadoptedinSolution6.5.Sinceweknowthatk(x,x0)andk(x,x0)arevalidkernels,weknowthatthereexistmappingsφ(x)12andψ(x)suchthatk(x,x0)=φ(x)Tφ(x0)andk(x,x0)=ψ(x)Tψ(x0).12khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 56Solutions6.12–6.14Hencek(x,x0)=k(x,x0)k(x,x0)12T0T0=φ(x)φ(x)ψ(x)ψ(x)XMXN=φ(x)φ(x0)ψ(x)ψ(x0)mmnnm=1n=1XMXN=φ(x)φ(x0)ψ(x)ψ(x0)mmnnm=1n=1khdaw.comXK=ϕ(x)ϕ(x0)kkk=1T0=ϕ(x)ϕ(x),whereK=MNandϕk(x)=φ((k−1)N)+1(x)ψ((k−1)N)+1(x),whereinturnanddenoteintegerdivisionandremainder,respectively.6.12NOTE:InPRML,thereisanerrorinthetextrelatingtothisexercise.Immediatelyfollowing(6.27),itsays:|A|denotesthenumberofsubsetsinA;itshouldhavesaid:|A|denotesthenumberofelementsinA.SinceAmaybeequaltoD(thesubsetrelationwasnotdeﬁnedtobestrict),φ(D)mustbedeﬁned.Thiswillmaptoavectorof2|D|1s,oneforeachpossiblesubsetofD,including课后答案网Ditselfaswellastheemptyset.ForA⊂D,φ(A)willhave1sinallpositionsthatcorrespondtosubsetsofAand0sinallotherpositions.Therefore,φ(A)Tφ(A)willcountthenumberofsubsetssharedbyAandA.However,this1212canjustaswellbeobtainedbycountingthenumberofelementsintheintersectionofA1andA2,andthenraising2tothisnumber,whichisexactlywhat(6.27)does.6.14InordertoevaluatetheFisherkernelfortheGaussianweﬁrstnotethatthecowww.hackshp.cnvari-anceisassumedtobeﬁxed,andhencetheparameterscompriseonlytheelementsofthemeanµ.TheﬁrststepistoevaluatetheFisherscoredeﬁnedby(6.32).Fromthedeﬁnition(2.43)oftheGaussianwehaveg(µ,x)=∇lnN(x|µ,S)=S−1(x−µ).µNextweevaluatetheFisherinformationmatrixusingthedeﬁnition(6.34),givingT−1T−1F=Exg(µ,x)g(µ,x)=SEx(x−µ)(x−µ)S.HeretheexpectationiswithrespecttotheoriginalGaussiandistribution,andsowecanusethestandardresultTEx(x−µ)(x−µ)=Skhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions6.17–6.2057fromwhichweobtainF=S−1.ThustheFisherkernelisgivenbyk(x,x0)=(x−µ)TS−1(x0−µ),whichwenoteisjustthesquaredMahalanobisdistance.6.17NOTE:InPRML,therearetypographicalerrorsinthetextrelatingtothisexercise.Inthesentencefollowingimmediatelyafter(6.39),f(x)shouldbereplacedbyy(x).Also,onthel.h.s.of(6.40),y(xn)shouldbereplacedbyy(x).Therewerealsokhdaw.comerrorsinAppendixD,whichmightcauseconfusion;pleaseconsulttheerrataonthePRMLwebsite.FollowingthediscussioninAppendixDwegiveaﬁrst-principlesderivationofthesolution.Firstconsideravariationinthefunctiony(x)oftheformy(x)→y(x)+η(x).Substitutinginto(6.39)weobtainXNZ12E[y+η]={y(xn+ξ)+η(xn+ξ)−tn}ν(ξ)dξ.2n=1Nowweexpandinpowersofandsetthecoefﬁcientof,whichcorrespondstothefunctionalﬁrstderivative,equaltozero,givingXNZ课后答案网{y(xn+ξ)−tn}η(xn+ξ)ν(ξ)dξ=0.(124)n=1Thismustholdforeverychoiceofthevariationfunctionη(x).Thuswecanchooseη(x)=δ(x−z)whereδ(·)istheDiracdeltafunction.Thisallowsustoevaluatetheintegraloverξgivingwww.hackshp.cnXNZXN{y(xn+ξ)−tn}δ(xn+ξ−z)ν(ξ)dξ={y(z)−tn}ν(z−xn).n=1n=1Substingthisbackinto(124)andrearrangingwethenobtaintherequiredresult(6.40).6.20Giventhejointdistribution(6.64),wecanidentifytN+1withxaandtwithxbin(2.65).NotethatthismeansthatweareprependingratherthanappendingtN+1totandCN+1thereforegetsredeﬁnedasckTCN+1=.kCNkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 58Solution6.21Itthenfollowsthatµa=0µb=0xb=tTTΣaa=cΣbb=CNΣab=Σba=kin(2.81)and(2.82),fromwhich(6.66)and(6.67)followsdirectly.6.21BoththeGaussianprocessandthelinearregressionmodelgiverisetoGaussianpredictivedistributionsp(tN+1|xN+1)sowesimplyneedtoshowthatthesehavethesamemeanandvariance.Todothiswemakeuseoftheexpression(6.54)forthekernelfunctiondeﬁnedintermsofthebasisfunctions.Using(6.62)thecovariancekhdaw.commatrixCNthentakestheform1T−1CN=ΦΦ+βIN(125)αwhereΦisthedesignmatrixwithelementsΦnk=φk(xn),andINdenotestheN×Nunitmatrix.ConsiderﬁrstthemeanoftheGaussianprocesspredictivedistribution,whichfrom(125),(6.54),(6.66)andthedeﬁnitionsinthetextpreceding(6.66)isgivenby−1TT−1T−1−1mN+1=αφ(xN+1)ΦαΦΦ+βINt.Wenowmakeuseofthematrixidentity(C.6)togiveT−1T−1−1T−1TTΦαΦΦ+βIN=αββΦΦ+αIMΦ=αβSNΦ.Thusthemeanbecomes课后答案网TTmN+1=βφ(xN+1)SNΦtwhichwerecognizeasthemeanofthepredictivedistributionforthelinearregressionmodelgivenby(3.58)withmNdeﬁnedby(3.53)andSNdeﬁnedby(3.54).Forthevariancewesimilarlysubstitutetheexpression(125)forthekernelfunc-tionintotheGaussianprocessvariancegivenby(6.67)andthenuse(6.54)andthewww.hackshp.cndeﬁnitionsinthetextpreceding(6.66)toobtain2−1T−1σN+1(xN+1)=αφ(xN+1)φ(xN+1)+β−2TT−1T−1−1−αφ(xN+1)ΦαΦΦ+βINΦφ(xN+1)=β−1+φ(x)Tα−1IN+1M−2T−1T−1−1−αΦαΦΦ+βINΦφ(xN+1).(126)Wenowmakeuseofthematrixidentity(C.7)togive−1−1T−1T−1−1−1αIM−αIMΦΦ(αIM)Φ+βINΦαIMT−1=αI+βΦΦ=SN,khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions6.23–7.159wherewehavealsoused(3.54).Substitutingthisin(126),weobtain21TσN(xN+1)=+φ(xN+1)SNφ(xN+1)βasderivedforthelinearregressionmodelinSection3.3.2.6.23NOTE:InPRML,atypographicalmistakeappearsinthetextoftheexerciseatlinethree,whereitshouldsay“...atrainingsetofinputvectorsx1,...,xN”.Ifweassumethatthetargetvariables,t1,...,tD,areindependentgiventheinputkhdaw.comvector,x,thisextensionisstraightforward.Usinganalogousnotationtotheunivariatecase,p(tN+1|T)=N(tN+1|m(xN+1),σ(xN+1)I),whereTisaN×DmatrixwiththevectorstT,...,tTasitsrows,1NTTm(xN+1)=kCNTandσ(xN+1)isgivenby(6.67).NotethatCN,whichonlydependontheinputvectors,isthesameintheuni-andmultivariatemodels.6.25SubstitutingthegradientandtheHessianintotheNewton-Raphsonformulaweob-tainanew=a+(C−1+W)−1t−σ−C−1aNNNNNNNN=(C−1+W)−1[t−σ+Wa]课后答案网NNNNNN=C(I+WC)−1[t−σ+Wa]NNNNNNNChapter7SparseKernelMachineswww.hackshp.cn7.1FromBayes"theoremwehavep(t|x)∝p(x|t)p(t)where,from(2.249),XN11p(x|t)=k(x,xn)δ(t,tn).NtZkn=1HereNtisthenumberofinputvectorswithlabelt(+1or−1)andN=N+1+N−1.δ(t,tn)equals1ift=tnand0otherwise.Zkisthenormalisationconstantforthekernel.Theminimummisclassiﬁcation-rateisachievedif,foreachnewinputkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 60Solution7.4vector,x˜,wechose˜ttomaximisep(˜t|x˜).Withequalclasspriors,thisisequivalenttomaximizingp(x˜|˜t)andthus1X1X+1iffk(x˜,xi)>k(x˜,xj)˜t=N+1N−1i:ti=+1j:tj=−1−1otherwise.Herewehavedroppedthefactor1/Zksinceitonlyactsasacommonscalingfactor.Usingtheencodingschemeforthelabel,thisclassiﬁcationrulecanbewritteninthemorecompactform!khdaw.comXNtn˜t=signk(x˜,xn).Ntnn=1Nowwetakek(x,x)=xTx,whichresultsinthekerneldensitynn1XTTx¯+p(x|t=+1)=xxn=x.N+1n:tn=+1Here,thesuminthemiddleexperssionrunsoverallvectorsxnforwhichtn=+1andx¯+denotesthemeanofthesevectors,withthecorrespondingdeﬁnitionforthenegativeclass.Notethatthisdensityisimproper,sinceitcannotbenormalized.However,wecanstillcomparelikelihoodsunderthisdensity,resultingintheclassi-ﬁcationrule+1ifx˜Tx¯+>x˜Tx¯−,˜t=课后答案网−1otherwise.Thesameargumentwouldofcoursealsoapplyinthefeaturespaceφ(x).7.4FromFigure4.1and(7.4),weseethatthevalueofthemargin112ρ=andso=kwk.www.hackshp.cnkwkρ2From(7.16)weseethat,forthemaximummarginsolution,thesecondtermof(7.7)vanishesandsowehave12L(w,b,a)=kwk.2Usingthistogetherwith(7.8),thedual(7.10)canbewrittenasXN1212kwk=an−kwk,22nfromwhichthedesiredresultfollows.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions7.8–7.10617.8Thisfollowsfrom(7.67)and(7.68),whichinturnfollowfromtheKKTconditions,(E.9)–(E.11),forµn,ξn,µbnandbξn,andtheresultsobtainedin(7.59)and(7.60).Forexample,forµnandξn,theKKTconditionsareξn>0µn>0µnξn=0(127)andfrom(7.59)wehavethatkhdaw.comµn=C−an.(128)Combining(127)and(128),weget(7.67);similarreasoningforµbnandbξnleadto(7.68).7.10Weﬁrstnotethatthisresultisgivenimmediatelyfrom(2.113)–(2.115),butthetasksetintheexercisewastopracticethetechniqueofcompletingthesquare.InthissolutionandthatofExercise7.12,webroadlyfollowthepresentationinSection3.5.1.Using(7.79)and(7.80),wecanwrite(7.84)inaformsimilarto(3.78)N/2YMZβ1p(t|X,α,β)=αiexp{−E(w)}dw(129)2π(2π)N/2i=1whereβ21TE(w)=kt−Φwk+wAw课后答案网22andA=diag(α).Completingthesquareoverw,weget1T−1E(w)=(w−m)Σ(w−m)+E(t)(130)www.hackshp.cn2wheremandΣaregivenby(7.82)and(7.83),respectively,and1E(t)=βtTt−mTΣ−1m.(131)2Using(130),wecanevaluatetheintegralin(129)toobtainZexp{−E(w)}dw=exp{−E(t)}(2π)M/2|Σ|1/2.(132)Consideringthisasafunctionoftweseefrom(7.83),thatweonlyneedtodealwiththefactorexp{−E(t)}.Using(7.82),(7.83),(C.7)and(7.86),wecanre-writekhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 62Solution7.12(131)asfollows1E(t)=βtTt−mTΣ−1m21TT−1T=βtt−βtΦΣΣΣΦtβ21TT=tβI−βΦΣΦβt21TT−1T=tβI−βΦ(A+βΦΦ)Φβt21T−1−1T−1khdaw.com=tβI+ΦAΦt21T−1=tCt.2Thisgivesusthelasttermonther.h.s.of(7.85);thetwoprecedingtermsaregivenimplicitly,astheyformthenormalizationconstantfortheposteriorGaussiandistri-butionp(t|X,α,β).7.12Usingtheresults(129)–(132)fromSolution7.10,wecanwrite(7.85)intheformof(3.86):XNN11Nlnp(t|X,α,β)=lnβ+lnαi−E(t)−ln|Σ|−ln(2π).(133)2222iBymakinguseof(131)and(7.83)togetherwith(C.22),wecantakethederivativesofthisw.r.tαi,yielding课后答案网∂1112lnp(t|X,α,β)=−Σii−mi.(134)∂αi2αi22Settingthistozeroandre-arranging,weobtain1−αiΣiiγiαi==,m2m2www.hackshp.cniiwherewehaveused(7.89).Similarly,forβweseethat∂1N2Tlnp(t|X,α,β)=−kt−Φmk−TrΣΦΦ.(135)∂β2βUsing(7.83),wecanrewritetheargumentofthetraceoperatorasΣΦTΦ=ΣΦTΦ+β−1ΣA−β−1ΣAT−1−1=Σ(ΦΦβ+A)β−βΣA=(A+βΦTΦ)−1(ΦTΦβ+A)β−1−β−1ΣA=(I−AΣ)β−1.(136)Heretheﬁrstfactoronther.h.s.ofthelastlineequals(7.89)writteninmatrixform.Wecanusethistoset(135)equaltozeroandthenre-arrangetoobtain(7.88).khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions7.15–8.1637.15Using(7.94),(7.95)and(7.97)–(7.99),wecanrewrite(7.85)asfollows1−1T−1lnp(t|X,α,β)=−Nln(2π)+ln|C−i||1+αiϕiC−iϕi|2−1T−1T−1C−iϕiϕiC−i+tC−t−iα+ϕTC−1ϕii−ii1T−1=−Nln(2π)+ln|C−i|+tC−it2−1T−11−1T−1TC−iϕiϕiC−i+2−ln|1+αiϕiC−iϕi|+tT−1tkhdaw.comαi+ϕiC−iϕi1q2=L(α)+lnα−ln(α+s)+i−iiii2αi+si=L(α−i)+λ(αi)7.18AstheRVMcanberegardedasaregularizedlogisticregressionmodel,wecanfollowthesequenceofstepsusedtoderive(4.91)inExercise4.13toderivetheﬁrsttermofther.h.s.of(7.110),whereasthesecondtermfollowsfromstandardmatrixderivatives(seeAppendixC).Notehowever,thatinExercise4.13wearedealingwiththenegativelog-likelhood.Toderive(7.111),wemakeuseof(106)and(107)fromExercise4.13.Ifwewritetheﬁrsttermofther.h.s.of(7.110)incomponentformwegetXNXN∂∂yn∂an(tn−yn)φni=−φni课后答案网∂wj∂an∂wjn=1n=1XN=−yn(1−yn)φnjφni,n=1which,writteninmatrixform,equalstheﬁrstterminsidetheparenthesisonther.h.s.of(7.111).Thesecondtermagainfollowsfromstandardmatrixderivatives.www.hackshp.cnChapter8ProbabilisticGraphicalModels8.1Wewanttoshowthat,for(8.5),XXXXYK...p(x)=...p(xk|pak)=1.x1xKx1xKk=1Weassumethatthenodesinthegraphhasbeennumberedsuchthatx1istherootnodeandnoarrowsleadfromahighernumberednodetoalowernumberednode.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 64Solutions8.2–8.9Wecanthenmarginalizeoverthenodesinreverseorder,startingwithxKXXXXKY−1...p(x)=...p(xK|paK)p(xk|pak)x1xKx1xKk=1XXKY−1=...p(xk|pak),x1xK−1k=1sinceeachoftheconditionaldistributionsisassumedtobecorrectlynormalizedandkhdaw.comnoneoftheothervariablesdependonxK.RepeatingthisprocessK−2timesweareleftwithXp(x1|∅)=1.x18.2Consideradirectedgraphinwhichthenodesofthegrapharenumberedsuchthatarenoedgesgoingfromanodetoalowernumberednode.Ifthereexistsadirectedcycleinthegraphthenthesubsetofnodesbelongingtothisdirectedcyclemustalsosatisfythesamenumberingproperty.Ifwetraversethecycleinthedirectionoftheedgesthenodenumberscannotbemonotonicallyincreasingsincewemustendupbackatthestartingnode.Itfollowsthatthecyclecannotbeadirectedcycle.8.5ThesolutionisgiveninFigure3.Figure3Thegraphicalrepresentationoftherelevancexnvectormachine(R课后答案网VM);Solution8.5.αiβwitnMN8.8a⊥⊥b,cwww.hackshp.cn|dcanbewrittenasp(a,b,c|d)=p(a|d)p(b,c|d).Summing(orintegrating)bothsideswithrespecttoc,weobtainp(a,b|d)=p(a|d)p(b|d)ora⊥⊥b|d,asdesired.8.9ConsiderFigure8.26.Inordertoapplythed-separationcriterionweneedtocon-siderallpossiblepathsfromthecentralnodexitoallpossiblenodesexternaltotheMarkovblanket.Therearethreepossiblecategoriesofsuchpaths.First,considerpathsviatheparentnodes.Sincethelinkfromtheparentnodetothenodexihasitstailconnectedtotheparentnode,itfollowsthatforanysuchpaththeparentnodekhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions8.12–8.1565mustbeeithertail-to-tailorhead-to-tailwithrespecttothepath.Thustheobserva-tionoftheparentnodewillblockanysuchpath.Secondconsiderpathsviaoneofthechildnodesofnodexiwhichdonotpassdirectlythroughanyoftheco-parents.Bydeﬁnitionsuchpathsmustpasstoachildofthechildnodeandhencewillbehead-to-tailwithrespecttothechildnodeandsowillbeblocked.Thethirdandﬁnalcategoryofpathpassesviaachildnodeofxiandthenaco-parentnode.Thispathwillbehead-to-headwithrespecttotheobservedchildnodeandhencewillnotbeblockedbytheobservedchildnode.However,thispathwilleithertail-to-tailorhead-to-tailwithrespecttotheco-parentnodeandhenceobservationoftheco-parentwillblockthispath.Wethereforeseethatallpossiblepathsleavingnodekhdaw.comxiwillbeblockedandsothedistributionofxi,conditionedonthevariablesintheMarkovblanket,willbeindependentofalloftheremainingvariablesinthegraph.8.12InanundirectedgraphofMnodestherecouldpotentiallybealinkbetweeneachpairofnodes.Thenumberofdistinctgraphsisthen2raisedtothepowerofthenumberofpotentiallinks.Toevaluatethenumberofdistinctlinks,notethatthereareMnodeseachofwhichcouldhavealinktoanyoftheotherM−1nodes,makingatotalofM(M−1)links.However,eachlinkiscountedtwicesince,inanundirectedgraph,alinkfromnodeatonodebisequivalenttoalinkfromnodebtonodea.ThenumberofdistinctpotentiallinksisthereforeM(M−1)/2andsothenumberofdistinctgraphsis2M(M−1)/2.Thesetof8possiblegraphsoverthreenodesisshowninFigure4.课后答案网www.hackshp.cnFigure4Thesetof8distinctundirectedgraphswhichcanbeconstructedoverM=3nodes.8.15Themarginaldistributionp(xn−1,xn)isobtainedbymarginalizingthejointdistri-butionp(x)overallvariablesexceptxn−1andxn,XXXXp(xn−1,xn)=......p(x).x1xn−2xn+1xNThisisanalogoustothemarginaldistributionforasinglevariable,givenby(8.50).FollowingthesamestepsasinthesinglevariablecasedescribedinSection8.4.1,khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 66Solutions8.18–8.20wearriveatamodiﬁedformof(8.52),1p(xn)=Z"#XXψn−2,n−1(xn−2,xn−1)···ψ1,2(x1,x2)···ψn−1,n(xn−1,xn)xn−2x1|{z}µα(xn−1)"#XXkhdaw.comψn,n+1(xn,xn+1)···ψN−1,N(xN−1,xN)···,xn+1xN|{z}µβ(xn)fromwhich(8.58)immediatelyfollows.8.18Thejointprobabilitydistributionoverthevariablesinageneraldirectedgraphicalmodelisgivenby(8.5).Intheparticularcaseofatree,eachnodehasasingleparent,sopakwillbeasingletonforeachnode,k,exceptfortherootnodeforwhichitwillempty.Thus,thejointprobabilitydistributionforatreewillbesimilartothejointprobabilitydistributionoverachain,(8.44),withthedifferencethatthesamevari-ablemayoccurtotherightoftheconditioningbarinseveralconditionalprobabilitydistributions,ratherthanjustone(inotherwords,althougheachnodecanonlyhaveoneparent,itcanhaveseveralchildren).Hence,theargumentinSection8.3.4,bywhich(8.44)isre-writtenas(8.45),canalsobeappliedtoprobabilitydistributionsovertrees.TheresultisaMarkovrandomﬁeldmodelwhereeachpotentialfunction课后答案网correspondstooneconditionalprobabilitydistributioninthedirectedtree.Thepriorfortherootnode,e.g.p(x1)in(8.44),canagainbeincorporatedinoneofthepoten-tialfunctionsassociatedwiththerootnodeor,alternatively,canbeincorporatedasasinglenodepotential.Thistransformationcanalsobeappliedintheotherdirection.Givenanundirectedtree,wepickanodearbitrarilyastheroot.Sincethegraphisatree,thereisawww.hackshp.cnuniquepathbetweeneverypairofnodes,so,startingatrootandworkingoutwards,wecandirectalltheedgesinthegraphtopointfromtheroottotheleafnodes.AnexampleisgiveninFigure5.Sinceeveryedgeinthetreecorrespondtoatwo-nodepotentialfunction,bynormalizingthisappropriately,weobtainaconditionalprobabilitydistributionforthechildgiventheparent.Sincethereisauniquepathbeweeneverypairofnodesinanundirectedtree,oncewehavechosentherootnode,theremainderoftheresultingdirectedtreeisgiven.Hence,fromanundirectedtreewithNnodes,wecanconstructNdifferentdirectedtrees,oneforeachchoiceofrootnode.8.20Wedotheinductionoverthesizeofthetreeandwegrowthetreeonenodeatatimewhile,atthesametime,weupdatethemessagepassingschedule.Notethatwecanbuildupanytreethisway.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions8.21–8.2367Figure5Thegraphontheleftisanx1x2x1x2undirectedtree.Ifwepickx4tobetherootnodeanddirectalltheedgesinthegraphtopointfromtherootx3x3totheleafnodes(x1,x2andx5),weobtainthedirectedtreeshownontheright.x4x5x4x5Forasinglerootnode,therequiredconditionholdstriviallytrue,sincetherearenokhdaw.commessagestobepassed.WethenassumethatitholdsforatreewithNnodes.Intheinductionstepweaddanewleafnodetosuchatree.Thisnewleafnodeneednottowaitforanymessagesfromothernodesinordertosenditsoutgoingmessageandsoitcanbescheduledtosenditﬁrst,beforeanyothermessagesaresent.Itsparentnodewillreceivethismessage,whereafterthemessagepropagationwillfollowtheschedulefortheoriginaltreewithNnodes,forwhichtheconditionisassumedtohold.Forthepropagationoftheoutwardmessagesfromtherootbacktotheleaves,weﬁrstfollowthepropagationschedulefortheoriginaltreewithNnodes,forwhichtheconditionisassumedtohold.Whenthishascompleted,theparentofthenewleafnodewillbereadytosenditsoutgoingmessagetothenewleafnode,therebycompletingthepropagationforthetreewithN+1nodes.8.21Tocomputep(xs),wemarginalizep(x)overallothervariables,analogouslyto(8.61),X课后答案网p(xs)=p(x).xxsUsing(8.59)andthedeﬁntionofFs(x,Xs)thatfollowed(8.62),wecanwritethisasXYYwww.hackshp.cnp(xs)=fs(xs)Fj(xi,Xij)xxsi∈ne(fs)j∈ne(xi)fsYXY=fs(xs)Fj(xi,Xij)i∈ne(fs)xxsj∈ne(xi)fsY=fs(xs)µxi→fs(xi),i∈ne(fs)whereinthelaststep,weused(8.67)and(8.68).Notethatthemarginalizationoverthedifferentsub-treesrootedintheneighboursoffswouldonlyrunovervariablesintherespectivesub-trees.8.23Thisfollowsfromthefactthatthemessagethatanode,xi,willsendtoafactorfs,consistsoftheproductofallothermessagesreceivedbyxi.From(8.63)and(8.69),khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 68Solutions8.28–9.1wehaveYp(xi)=µfs→xi(xi)s∈ne(xi)Y=µfs→xi(xi)µft→xi(xi)t∈ne(xi)fs=µfs→xi(xi)µxi→fs(xi).8.28Ifagraphhasoneormorecycles,thereexistsatleastonesetofnodesandedgessuchthat,startingfromanarbitrarynodeintheset,wecanvisitallthenodesinthekhdaw.comsetandreturntothestartingnode,withouttraversinganyedgemorethanonce.Consideroneparticularsuchcycle.Whenoneofthenodesn1inthecyclesendsamessagetooneofitsneighboursn2inthecycle,thiscausesapendingmessagesontheedgetothenextnoden3inthatcycle.Thussendingapendingmessagealonganedgeinthecyclealwaysgeneratesapendingmessageonthenextedgeinthatcycle.Sincethisistrueforeverynodeinthecycleitfollowsthattherewillalwaysexistatleastonependingmessageinthegraph.8.29Weshowthisbyinductionoverthenumberofnodesinthetree-structuredfactorgraph.Firstconsideragraphwithtwonodes,inwhichcaseonlytwomessageswillbesentacrossthesingleedge,oneineachdirection.Noneofthesemessageswillinduceanypendingmessagesandsothealgorithmterminates.WethenassumethatforafactorgraphwithNnodes,therewillbenopendingmessagesafteraﬁnitenumberofmessageshavebeensent.Givensuchagraph,wecanconstructanewgraphwith课后答案网N+1nodesbyaddinganewnode.Thisnewnodewillhaveasingleedgetotheoriginalgraph(sincethegraphmustremainatree)andsoifthisnewnodereceivesamessageonthisedge,itwillinducenopendingmessages.AmessagesentfromthenewnodewilltriggerpropagationofmessagesintheoriginalgraphwithNnodes,butbyassumption,afteraﬁnitenumberofmessageshavebeensent,therewillbenopendingmessagesandthealgorithmwillterminate.www.hackshp.cnChapter9MixtureModels9.1SinceboththeE-andtheM-stepminimisethedistortionmeasure(9.1),thealgorithmwillneverchangefromaparticularassignmentofdatapointstoprototypes,unlessthenewassignmenthasalowervaluefor(9.1).Sincethereisaﬁnitenumberofpossibleassignments,eachwithacorrespondinguniqueminimumof(9.1)w.r.t.theprototypes,{µk},theK-meansalgorithmwillconvergeafteraﬁnitenumberofsteps,whennore-assignmentofdatapointstoprototypeswillresultinadecreaseof(9.1).Whenno-reassignmenttakesplace,therealsowillnotbeanychangein{µk}.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions9.3–9.7699.3From(9.10)and(9.11),wehaveXXYKp(x)=p(x|z)p(z)=(πN(x|µ,Σ))zk.kkkzzk=1Exploitingthe1-of-Krepresentationforz,wecanre-writether.h.s.asXKYKXK(πN(x|µ,Σ))Ikj=πN(x|µ,Σ)kkkjjjkhdaw.comj=1k=1j=1whereIkj=1ifk=jand0otherwise.9.7Considerﬁrsttheoptimizationwithrespecttotheparameters{µk,Σk}.Forthiswecanignorethetermsin(9.36)whichdependonlnπk.Wenotethat,foreachdatapointn,thequantitiesznkareallzeroexceptforaparticularelementwhichequalsone.WecanthereforepartitionthedatasetintoKgroups,denotedXk,suchthatallthedatapointsxnassignedtocomponentkareingroupXk.Thecomplete-dataloglikelihoodfunctioncanthenbewritten()XKXlnp(X,Z|µ,Σ,π)=lnN(xn|µk,Σk).k=1n∈XkThisrepresentsthesumofKindependentterms,oneforeachcomponentinthemixture.WhenwemaximizethistermwithrespecttoµkandΣkwewillsimplybeﬁttingthekthcomponenttothedatasetX,forwhichwewillobtaintheusual课后答案网kmaximumlikelihoodresultsforasingleGaussian,asdiscussedinChapter2.ForthemixingcoefﬁcientsweneedonlyconsiderthetermsinlnPπkin(9.36),butwemustintroduceaLagrangemultipliertohandletheconstraintkπk=1.Thuswemaximize!XNXKXKwww.hackshp.cnznklnπk+λπk−1n=1k=1k=1whichgivesXNznk0=+λ.πkn=1Multiplyingthroughbyπkandsummingoverkweobtainλ=−N,fromwhichwehaveXN1Nkπk=znk=NNn=1whereNkisthenumberofdatapointsingroupXk.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 70Solutions9.8–9.159.8Using(2.43),wecanwritether.h.s.of(9.40)asXNXK1T−1−γ(znj)(xn−µj)Σ(xn−µj)+const.,2n=1j=1where`const."summarizestermsindependentofµj(forallj).Takingthederivativeofthisw.r.t.µk,wegetXN−1−1−γ(znk)Σµk−Σxn,khdaw.comn=1andsettingthistozeroandrearranging,weobtain(9.17).9.12SincetheexpectationofasumisthesumoftheexpectationswehaveXKXKE[x]=πkEk[x]=πkµkk=1k=1whereEk[x]denotestheexpectationofxunderthedistributionp(x|k).ToﬁndthecovarianceweusethegeneralrelationTTcov[x]=E[xx]−E[x]E[x]togiveTTcov[x]=E[xx]−E[x]E[x]XKTT课后答案网=πkEk[xx]−E[x]E[x]k=1XK=πΣ+µµT−E[x]E[x]T.kkkkk=19.15Thisiseasilyshownbycalculatingthederivativesof(9.55),sewww.hackshp.cnttingthemtozeroandsolveforµki.Usingstandardderivatives,wegetN∂Xxni1−xniEZ[lnp(X,Z|µ,π)]=γ(znk)−∂µkiµki1−µkin=1PPnγ(znk)xni−nγ(znk)µki=.µki(1−µki)Settingthistozeroandsolvingforµki,wegetPnγ(znk)xniµki=P,nγ(znk)whichequals(9.59)whenwritteninvectorform.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions9.17–9.25719.17Thisfollowsdirectlyfromtheequationfortheincompletelog-likelihood,(9.51).Thelargestvaluethattheargumenttothelogarithmonther.h.s.of(9.51)canhavePKis1,since∀n,k:06p(xn|µk)61,06πk61andkπk=1.Therefore,themaximumvalueforlnp(X|µ,π)equals0.9.20Ifwetakethederivativesof(9.62)w.r.t.α,weget∂M11TE[lnp(t,w|α,β)]=−Eww.∂α2α2khdaw.comSettingthisequaltozeroandre-arranging,weobtain(9.63).9.23NOTE:InPRML,thetasksetinthisexerciseistoshowthatthetwosetsofre-estimationequationsareformallyequivalent,withoutanyrestriction.However,itreallyshouldberestrictedtostationarypointsoftheobjectivefunction.Consideringthecasewhentheoptimizationhasconverged,wecanstartwithαi,asdeﬁnedby(7.87),anduse(7.89)tore-writethisas1−α?Σα?=iii,im2Nwhereα?=αnew=αisthevaluereachedatconvergence.Wecanre-writethisasiiiα?(m2+Σ)=1iiiiwhichiseasilyre-writtenas(9.67).For课后答案网β,westartfrom(9.68),whichwere-writeasP1kt−Φmk2γNii=+.β?Nβ?NAsintheα-case,β?=βnew=βisthevaluereachedatconvergence.Wecanre-writethisas!www.hackshp.cn1X2N−γi=kt−ΦmNk,β?iwhichcaneasilybere-writtenas(7.88).9.25ThisfollowsfromthefactthattheKullback-Leiblerdivergence,KL(qkp),isatitsminimum,0,whenqandpareidentical.Thismeansthat∂KL(qkp)=0,∂θsincep(Z|X,θ)dependsonθ.Therefore,ifwecomputethegradientofbothsidesof(9.70)w.r.t.θ,thecontributionfromthesecondtermonther.h.s.willbe0,andsothegradientoftheﬁrsttermmustequalthatofthel.h.s.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 72Solutions9.26–10.19.26From(9.18)wegetXoldoldNk=γ(znk).(137)nWegetNnewbyrecomputingtheresponsibilities,γ(z),foraspeciﬁcdatapoint,kmkxm,yieldingXnewoldnewNk=γ(znk)+γ(zmk).n6=mCombiningthiswith(137),weget(9.79).Similarly,from(9.17)wehavekhdaw.com1Xoldoldµk=oldγ(znk)xnNknandrecomputingtheresponsibilities,γ(zmk),weget!1Xnewoldnewµk=newγ(znk)xn+γ(zmk)xmNkn6=m1oldoldoldnew=newNkµk−γ(zmk)xm+γ(zmk)xmNk1newnewoldold=newNk−γ(zmk)+γ(zmk)µkNkoldnew−γ(zmk)xm+γ(zmk)xmγnew(z)−γold(z)课后答案网=µold+mkmk(x−µold),kNnewmkkwherewehaveused(9.79).Chapter10VariationalInferenceandEMwww.hackshp.cn10.1Startingfrom(10.3),weusetheproductruletogetherwith(10.4)togetZp(X,Z)L(q)=q(Z)lndZq(Z)Zp(X|Z)p(X)=q(Z)lndZq(Z)Zp(X|Z)=q(Z)ln+lnp(X)dZq(Z)=−KL(qkp)+lnp(X).Rearrangingthis,weimmediatelyget(10.2).khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution10.37310.3Startingfrom(10.16)andoptimizingw.r.t.qj(Zj),wegetZ"#XMKL(pkq)=−p(Z)lnqi(Zi)dZ+const.i=1Z!X=−p(Z)lnqj(Zj)+p(Z)lnqi(Zi)dZ+const.i6=jZ=−p(Z)lnqj(Zj)dZ+const.khdaw.comZ"Z#Y=−lnqj(Zj)p(Z)dZidZj+const.i6=jZ=−Fj(Zj)lnqj(Zj)dZj+const.,wheretermsindependentofqj(Zj)havebeenabsorbedintotheconstanttermandwehavedeﬁnedZYFj(Zj)=p(Z)dZi.i6=jWeuseaLagrangemultipliertoensurethatqj(Zj)integratestoone,yieldingZZ课后答案网−Fj(Zj)lnqj(Zj)dZj+λqj(Zj)dZj−1.UsingtheresultsfromAppendixD,wethentakethefunctionalderivativeofthisw.r.t.qjandsetthistozero,toobtainFj(Zj)−+λ=0.www.hackshp.cnqj(Zj)Fromthis,weseethatλqj(Zj)=Fj(Zj).IntegratingbothsidesoverZj,weseethat,sinceqj(Zj)mustintgratetoone,ZZ"Z#Yλ=Fj(Zj)dZj=p(Z)dZidZj=1,i6=jandthusZYqj(Zj)=Fj(Zj)=p(Z)dZi.i6=jkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 74Solutions10.5–10.1010.5Weassumethatq(Z)=q(z)q(θ)andsowecanoptimizew.r.t.q(z)andq(θ)inde-pendently.Forq(z),thisisequivalenttominimizingtheKullback-Leiblerdivergence,(10.4),whichherebecomesZZp(z,θ|X)KL(qkp)=−q(θ)q(z)lndzdθ.q(z)q(θ)Fortheparticularchosenformofq(θ),thisisequivalenttokhdaw.comZp(z,θ0|X)KL(qkp)=−q(z)lndz+const.q(z)Zp(z|θ0,X)p(θ0|X)=−q(z)lndz+const.q(z)Zp(z|θ0,X)=−q(z)lndz+const.,q(z)whereconstaccumulatesalltermsindependentofq(z).ThisKLdivergenceismin-imizedwhenq(z)=p(z|θ0,X),whichcorrespondsexactlytotheE-stepoftheEMalgorithm.Todetermineq(θ),weconsiderZZp(X,θ,z)课后答案网q(θ)q(z)lndzdθq(θ)q(z)ZZ=q(θ)Eq(z)[lnp(X,θ,z)]dθ−q(θ)lnq(θ)dθ+const.wherethelasttermsummarizestermsindependentofq(θ).Sinceq(θ)iscon-strainedtobeapointdensity,thecontributionfromtheentropyterm(whichformallywww.hackshp.cndiverges)willbeconstantandindependentofθ0.Thus,theoptimizationproblemisreducedtomaximizingexpectedcompletelogposteriordistributionEq(z)[lnp(X,θ0,z)],w.r.t.θ0,whichisequivalenttotheM-stepoftheEMalgorithm.10.10NOTE:InPRML,thereareerrorsthataffectthisexercise.Lmusedin(10.34)and(10.35)shouldreallybeL,whereasLmusedin(10.36)isgiveninSolution10.11below.ThiscompletelyanalogoustoSolution10.1.Startingfrom(10.35),wecanusethekhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions10.11–10.1375productruletoget,XXp(Z,X,m)L=q(Z|m)q(m)lnq(Z|m)q(m)mZXXp(Z,m|X)p(X)=q(Z|m)q(m)lnq(Z|m)q(m)mZXXp(Z,m|X)=q(Z|m)q(m)ln+lnp(X).q(Z|m)q(m)khdaw.commZRearrangingthis,weobtain(10.34).10.11NOTE:ConsultnoteprecedingSolution10.10forsomerelevantcorrections.WestartbyrewritingthelowerboundasfollowsXXp(Z,X,m)L=q(Z|m)q(m)lnq(Z|m)q(m)mZXX=q(Z|m)q(m){lnp(Z,X|m)+lnp(m)−lnq(Z|m)−lnq(m)}mZX=q(m)lnp(m)−lnq(m)m课后答案网X+q(Z|m){lnp(Z,X|m)−lnq(Z|m)}ZX=q(m){ln(p(m)exp{Lm})−lnq(m)},(138)www.hackshp.cnmwhereXp(Z,X|m)Lm=q(Z|m)ln.q(Z|m)ZWerecognize(138)asthenegativeKLdivergencebetweenq(m)andthe(notnec-essarilynormalized)distributionp(m)exp{Lm}.ThiswillbemaximizedwhentheKLdivergenceisminimized,whichwillbethecasewhenq(m)∝p(m)exp{Lm}.10.13Inordertoderivetheoptimalsolutionforq(µk,Λk)westartwiththeresult(10.54)khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 76Solution10.13andkeeponlythosetermwhichdependonµkorΛktogivelnq?(µ,Λ)=lnN(µ|m,βΛ)+lnW(Λ|W,ν)kkk00kk00XN+E[znk]lnN(xn|µk,Λk)+const.n=1β0T11−1=−(µk−m0)Λk(µk−m0)+ln|Λk|−TrΛkW0222XN(ν0−D−1)1T+ln|Λk|−E[znk](xn−µk)Λk(xn−µk)22khdaw.comn=1!XN1+E[znk]ln|Λk|+const.(139)2n=1Usingtheproductruleofprobability,wecanexpresslnq?(µ,Λ)aslnq?(µ|Λ)kkkk+lnq?(Λ).Letusﬁrstofallidentifythedistributionforµ.Todothisweneedkkonlyconsidertermsontherighthandsideof(139)whichdependonµk,givinglnq?(µ|Λ)kk"#"#XNXN1TT=−µkβ0+E[znk]Λkµk+µkΛkβ0m0+E[znk]xn2n=1n=1+const.1TT=−µk[β0+Nk]Λkµk+µkΛk[β0m0+Nkxk]+const.课后答案网2wherewehavemadeuseof(10.51)and(10.52).Thusweseethatlnq?(µ|Λ)kkdependsquadraticallyonµandhenceq?(µ|Λ)isaGaussiandistribution.Com-kkkpletingthesquareintheusualwayallowsustodeterminethemeanandprecisionofthisGaussian,givingq?(µ|Λ)=N(µ|m,βΛ)(140)www.hackshp.cnkkkkkkwhereβk=β0+Nk1mk=(β0m0+Nkxk).βkNextwedeterminetheformofq?(Λ)bymakinguseoftherelationklnq?(Λ)=lnq?(µ,Λ)−lnq?(µ|Λ).kkkkkOntherighthandsideofthisrelationwesubstituteforlnq?(µ,Λ)using(139),kkandwesubstituteforlnq?(µ|Λ)usingtheresult(140).Keepingonlythosetermskkkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution10.1677whichdependonΛkweobtain?β0T11−1lnq(Λk)=−(µk−m0)Λk(µk−m0)+ln|Λk|−TrΛkW0222XN(ν0−D−1)1T+ln|Λk|−E[znk](xn−µk)Λk(xn−µk)22n=1!XN1βkT+E[znk]ln|Λk|+(µk−mk)Λk(µ−mk)22n=11khdaw.com−ln|Λk|+const.2(νk−D−1)1−1=ln|Λk|−TrΛkWk+const.22Notethatthetermsinvolvingµhavecancelledoutasweexpectsinceq?(Λ)iskkindependentofµk.HerewehavedeﬁnedXNW−1=W−1+β(µ−m)(µ−m)T+E[z](x−µ)(x−µ)Tk00k0k0nknknkn=1T−βk(µk−mk)(µk−mk)−1β0NkT=W0+NkSk+(xk−m0)(xk−m0)β0+NkXN课后答案网νk=ν0+E[znk]n=1=ν0+Nk,wherewehavemadeuseoftheresultXNXNTTTwww.hackshp.cnE[znk]xnxn=E[znk](xn−xk)(xn−xk)+Nkxkxkn=1n=1T=NkSk+Nkxkxk(141)andwehavemadeuseof(10.53).Thusweseethatq?(Λ)isaWishartdistributionkoftheformq?(Λ)=W(Λ|W,ν).kkkk10.16Toderive(10.71)wemakeuseof(10.38)togiveE[lnp(D|z,µ,Λ)]XNXK1=E[znk]{E[ln|Λk|]−E[(xn−µk)Λk(xn−µk)]−Dln(2π)}.2n=1k=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 78Solution10.20WenowuseE[znk]=rnktogetherwith(10.64)andthedeﬁnitionofΛekgivenby(10.65)togiveXNXK1E[lnp(D|z,µ,Λ)]=rnklnΛek2n=1k=1−Dβ−1−ν(x−m)TW(x−m)−Dln(2π).kknkknkNowweusethedeﬁnitions(10.51)to(10.53)togetherwiththeresult(141)togive(10.71).khdaw.comWecanderive(10.72)simplybytakingthelogarithmofp(z|π)givenby(10.37)XNXKE[lnp(z|π)]=E[znk]E[lnπk]n=1k=1andthenmakinguseofE[znk]=rnktogetherwiththedeﬁnitionofπekgivenby(10.65).10.20Considerﬁrsttheposteriordistributionovertheprecisionofcomponentkgivenbyq?(Λ)=W(Λ|W,ν).kkkkFrom(10.63)weseethatforlargeNwehaveνk→Nk,andsimilarlyfrom(10.62)−1−1weseethatWk→NkSk.ThusthemeanofthedistributionoverΛk,givenby−1E[Λk]=νkWk→Skwhichisthemaximumlikelihoodvalue(thisassumesthatthequantitiesrnkreducetothecorrespondingEMvalues,whichisindeedthecaseasweshallshowshortly).Inordertoshowthatthisposteriorisalsosharplypeaked,课后答案网weconsiderthedifferentialentropy,H[Λk]givenby(B.82),andshowthat,asNk→∞,H[Λk]→0,correspondingtothedensitycollapsingtoaspike.FirstconsiderthenormalizingconstantB(W,ν)givenby(B.79).SinceW→N−1S−1andkkkkkνk→Nk,Dwww.hackshp.cnNkXNk+1−i−lnB(Wk,νk)→−(DlnNk+ln|Sk|−Dln2)+lnΓ.22i=1WethenmakeuseofStirling"sapproximation(1.146)toobtainNk+1−iNklnΓ"(lnNk−ln2−1)22whichleadstotheapproximatelimitNkDNk−lnB(Wk,νk)→−(lnNk−ln2−lnNk+ln2+1)−ln|Sk|22Nk=−(ln|Sk|+D).(142)2khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution10.2379−1−1Next,weuse(10.241)and(B.81)incombinationwithWk→NkSkandνk→NktoobtainthelimitNkE[ln|Λ|]→Dln+Dln2−DlnNk−ln|Sk|2=−ln|Sk|,whereweapproximatedtheargumenttothedigammafunctionbyNk/2.Substitut-ingthisand(142)into(B.82),wegetkhdaw.comH[Λ]→0whenNk→∞.Nextconsidertheposteriordistributionoverthemeanµofthekthcomponentgivenkbyq?(µ|Λ)=N(µ|m,βΛ).kkkkkkFrom(10.61)weseethatforlargeNthemeanmkofthisdistributionreducestoxkwhichisthecorrespondingmaximumlikelihoodvalue.From(10.60)weseethat−1βk→NkandThustheprecisionβkΛk→βkνkWk→NkSkwhichislargeforlargeNandhencethisdistributionissharplypeakedarounditsmean.Nowconsidertheposteriordistributionq?(π)givenby(10.57).ForlargeNwehaveαk→Nkandsofrom(B.17)and(B.19)weseethattheposteriordistributionbecomessharplypeakedarounditsmeanE[πk]=αk/α→Nk/Nwhichisthemaximumlikelihoodsolution.Forthedistributionq?(z)weconsidertheresponsibilitiesgivenby(10.67).Using(10.65)and(10.66),togetherwiththeasymptoticresultforthedigammafunction,课后答案网weagainobtainthemaximumlikelihoodexpressionfortheresponsibilitiesforlargeN.Finally,forthepredictivedistributionweﬁrstperformtheintegrationoverπ,asinthesolutiontoExercise10.19,togiveXKZZwww.hackshp.cnαkp(xb|D)=N(xb|µk,Λk)q(µk,Λk)dµkdΛk.αk=1TheintegrationsoverµkandΛkarethentrivialforlargeNsincethesearesharplypeakedandhenceapproximatedeltafunctions.WethereforeobtainXKNkp(xb|D)=N(xb|xk,Wk)Nk=1whichisamixtureofGaussians,withmixingcoefﬁcientsgivenbyNk/N.10.23Whenwearetreatingπasaparameter,thereisneitheraprior,noravariationalposteriordistribution,overπ.Therefore,theonlytermremainingfromthelowerkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 80Solution10.24bound,(10.70),thatinvolvesπisthesecondterm,(10.72).Notehowever,that(10.72)involvestheexpectationsoflnπkunderq(π),whereashere,weoperatedirectlywithπk,yieldingXNXKEq(Z)[lnp(Z|π)]=rnklnπk.n=1k=1AddingaLangrangeterm,asin(9.20),takingthederivativew.r.t.πkandsettingtheresulttozerowegetNk+λ=0,(143)khdaw.comπkwherewehaveused(10.51).Byre-arrangingthistoNk=−λπkPandsummingbothsidesoverk,weseethat−λ=kNk=N,whichwecanusetoeliminateλfrom(143)toget(10.83).10.24Thesingularitiesthatmayariseinmaximumlikelihoodestimationarecausedbyamixturecomponent,k,collapsingonadatapoint,xn,i.e.,rkn=1,µk=xnand|Λk|→∞.However,thepriordistributionp(µ,Λ)deﬁnedin(10.40)willpreventthisfromhappening,alsointhecaseofMAPestimation.Considertheproductoftheexpectedcompletelog-likelihoodandp(µ,Λ)asafunctionofΛk:Eq(Z)[lnp(X|Z,µ,Λ)p(µ,Λ)]课后答案网XN1T=rknln|Λk|−(xn−µk)Λk(xn−µk)2n=1T+ln|Λk|−β0(µk−m0)Λk(µk−m0)−1+(ν0−D−1)ln|Λk|−TrW0Λk+const.wherewehaveused(10.38),(10.40)and(10.50),togetherwiththedeﬁnitionsforwww.hackshp.cntheGaussianandWishartdistributions;thelasttermsummarizestermsindependentofΛk.Using(10.51)–(10.53),wecanrewritethisas(ν+N−D)ln|Λ|−Tr(W−1+β(µ−m)(µ−m)T+NS)Λ,0kk00k0k0kkkwherewehavedroppedtheconstantterm.Using(C.24)and(C.28),wecancomputethederivativeofthisw.r.t.Λkandsettingtheresultequaltozero,weﬁndtheMAPestimateforΛktobe−11−1TΛk=(W0+β0(µk−m0)(µk−m0)+NkSk).ν0+Nk−D−1−1Fromthisweseethat|Λk|canneverbecome0,becauseofthepresenceofW0(whichwemustchosetobepositivedeﬁnite)intheexpressiononther.h.s.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions10.29–10.328110.29Stardardrulesofdifferentiationgivedln(x)1=dxxd2ln(x)1=−.dx2x2Sinceitssecondderivativeisnegativeforallvalueofx,ln(x)isconcavefor0H(R0)orH(R)ym(x)form6=k,andhenceallαm=0form6=k.Ananalogousresultholdsfortheminimumvalue.Forothersettingsofα,ymin(x)0.Sinceαkisthesmallestoftheαvaluesitfollowsthatallofthecoefﬁcientsmustsatisfyαk>0.Similarly,considerthecaseinwhichPym(x)=1forallm.ThenyminP(x)=ymax(x)=1,whileyCOM(x)=mαm.From(14.56)itthenfollowsthatmαm=1,asrequired.khdaw.com14.6Ifwedifferentiate(14.23)w.r.t.αmweobtain!XNXN∂E1αm/2−αm/2(m)−αm/2(m)=(e+e)wnI(ym(xn)6=tn)−ewn.∂αm2n=1n=1Settingthisequaltozeroandrearranging,wegetPw(m)−αm/2nnI(ym(xn)6=tn)e1P==.w(m)eαm/2+e−αm/2eαm+1nnUsing(14.16),wecanrewritethisas1=m,eαm+1whichcanbe课后答案网furtherrewrittenasαm1−me=,mfromwhich(14.17)followsdirectly.14.9Thesum-of-squareserrorfortheadditivemodelof(14.21)isdeﬁnedaswww.hackshp.cnXN12E=(tn−fm(xn)).2n=1Using(14.21),wecanrewritethisasXN112(tn−fm−1(xn)−αmym(x)),22n=1wherewerecognizethetwoﬁrsttermsinsidethesquareastheresidualfromthe(m−1)-thmodel.Minimizingthiserrorw.r.t.ym(x)willbeequivalenttoﬁttingym(x)tothe(scaled)residuals.khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solutions14.13–14.179914.13Startingfromthemixturedistributionin(14.34),wefollowthesamestepsasformixturesofGaussians,presentedinSection9.2.WeintroduceaK-nomiallatentvariable,z,suchthatthejointdistributionoverzandtequalsYKT−1zkp(t,z)=p(t|z)p(z)=Nt|wkφ,βπk.k=1Givenasetofobservations,{(t,φ)}N,wecanwritethecompletelikelihoodnnn=1overtheseobservationsandthecorrespondingz1,...,zN,asYNYKkhdaw.comT−1znkπkN(tn|wkφn,β).n=1k=1Takingthelogarithm,weobtain(14.36).14.15Thepredictivedistributionfromthemixtureoflinearregressionmodelsforanewinputfeaturevector,φb,isobtainedfrom(14.34),withφreplacedbyφb.Calculatingtheexpectationoftunderthisdistribution,weobtainXKE[t|φb,θ]=πkE[t|φb,wk,β].k=1Dependingontheparameters,thisexpectationispotentiallyK-modal,withonemodeforeachmixturecomponent.However,theweightedcombinationofthesemodesoutputbythemixturemodelmaynotbeclosetoanysinglemode.Forexam-ple,thecombinationofthetwomodesintheleftpanelofFigure14.9willendupin课后答案网betweenthetwomodes,aregionwithnosignicantprobabilitymass.14.17Ifwedeﬁneψk(t|x)in(14.58)asXMψk(t|x)=λmkφmk(t|x),www.hackshp.cnm=1wecanrewrite(14.58)asXKXMp(t|x)=πkλmkφmk(t|x)k=1m=1XKXM=πkλmkφmk(t|x).k=1m=1Bychangingtheindexation,wecanwritethisasXLp(t|x)=ηlφl(t|x),l=1khdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com 100Solution14.17Figure8Left:anillustrationofaσ(vTx)1hierarchicalmixturemodel,wheretheinputdepen-π2dentmixingcoefﬁcientsaredeterminedbylinearTlogisticmodelsassociatedσ(v2x)withinteriornodes;theπ3ψ1(t|x)leafnodescorrespondtoπ1local(conditional)densitymodels.Right:apossi-bledivisionoftheinputψ2(t|x)ψ3(t|x)spaceintoregionswheredifferentmixingcoefﬁcientskhdaw.comdominate,underthemodelillustratedleft.whereL=KM,l=(k−1)M+m,ηl=πkλmkandφl(·)=φmk(·).ByPLconstruction,ηl>0andl=1ηl=1.Notethatthiswouldworkjustaswellifπkandλmkweretobedependentonx,aslongastheybothrespecttheconstraintsofbeingnon-negativeandsummingto1foreverypossiblevalueofx.Finally,consideratree-structured,hierarchicalmixturemodel,asillustratedintheleftpanelofFigure8.Onthetop(root)level,thisisamixturewithtwocomponents.Themixingcoefﬁcientsaregivenbyalinearlogisticregressionmodelandhenceareinputdependent.Theleftsub-treecorrespondtoalocalconditionaldensitymodel,ψ1(t|x).Intherightsub-tree,thestructurefromtherootisreplicated,withthedifferencethatbothsub-treescontainlocalconditionaldensitymodels,ψ2(t|x)andψ3(t|x课后答案网).Wecanwritetheresultingmixturemodelontheform(14.58)withmixingcoefﬁ-cientsTπ1(x)=σ(v1x)TTπ2(x)=(1−σ(v1x))σ(v2x)www.hackshp.cnTTπ3(x)=(1−σ(v1x))(1−σ(v2x)),whereσ(·)isdeﬁnedin(4.59)andv1andv2aretheparametervectorsofthelogisticregressionmodels.Notethatπ1(x)isindependentofthevalueofv2.Thiswouldnotbethecaseifthemixingcoefﬁcientsweremodelledusingasinglelevelsoftmaxmodel,Teukxπk(x)=P3uTx,ejjwheretheparametersuk,correspondingtoπk(x),willalsoaffecttheothermixingcoefﬁecients,πj6=k(x),throughthedenominator.Thisgivesthehierarchicalmodeldifferentpropertiesinthemodellingofthemixturecoefﬁcientsovertheinputspace,ascomparedtoalinearsoftmaxmodel.Anexampleisshownintherightpanelofkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com Solution14.17101Figure8,wheretheredlinesrepresentbordersofequalmixingcoefﬁcientsintheinputspace.Thesebordersareformedfromtwostraightlines,correspondingtothetwologisticunitsintheleftpanelof8.Acorrespondingdivisionoftheinputspacebyasoftmaxmodelwouldinvolvethreestraightlinesjoinedatasinglepoint,looking,e.g.,somethingliketheredlinesinFigure4.3inPRML;notethatalinearthree-classsoftmaxmodelcouldnotimplementthebordersshowinrightpanelofFigure8.khdaw.com课后答案网www.hackshp.cnkhdaw.com若侵犯了您的版权利益，敬请来信通知我们！℡www.khdaw.com'

Pattern recognition and machine learning (Christopher M. Bishop 著) Springe 101页

Pattern recognition and machine learning (Christopher M. Bishop 著) Springe

您可能关注的文档

相关文档

最近下载